# Rutgers University Newark Regression Methods Worksheet

Chapter 4: Model Adequacy Checking

Linear Regression Analysis, 6E — Montgomery, Peck & Vining

4.1 Introduction

• Assumptions:

1. The relationship between the response and the regressors is linear (at least approximately).
2. The error term ε has zero mean.
3. The error term ε has constant variance.
4. The errors are uncorrelated.
5. The errors are normally distributed (required for tests and intervals).

4.2 Residual Analysis

• Definition of residual (= data − fit): e_i = y_i − ŷ_i, i = 1, 2, …, n

• Approximate average variance of the residuals: SS_Res / (n − p) = MS_Res

4.2.2 Methods for Scaling Residuals

• Scaling helps in identifying outliers or extreme values.

Four methods:

1. Standardized residuals
2. Studentized residuals
3. PRESS residuals
4. R-student residuals

4.2.2 Methods for Scaling Residuals

1. Standardized Residuals

   d_i = e_i / √MS_Res,  i = 1, 2, …, n

– The d_i have mean zero and variance approximately equal to 1.
– Large values of d_i (|d_i| > 3) may indicate an outlier.
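To make the definition concrete, here is a minimal numpy sketch; the small data set is made up purely for illustration:

```python
import numpy as np

# Hypothetical data for illustration: simple linear fit y ~ x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta                            # ordinary residuals e_i
n, p = X.shape
ms_res = e @ e / (n - p)                    # MS_Res = SS_Res / (n - p)
d = e / np.sqrt(ms_res)                     # standardized residuals d_i
print(np.round(d, 3))
```

Note that the d_i sum to zero (the model has an intercept) and their sum of squares equals n − p by construction.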

4.2.2 Methods for Scaling Residuals

2. Studentized Residuals

– MS_Res is only an approximation of the variance of the ith residual.
– Improve the scaling by dividing e_i by its exact standard deviation, √(MS_Res (1 − h_ii)), where h_ii is the ith diagonal element of the hat matrix.

4.2.2 Methods for Scaling Residuals

2. Studentized Residuals

The studentized residuals are then

   r_i = e_i / √(MS_Res (1 − h_ii)),  i = 1, 2, …, n

– The r_i have mean zero and approximately unit variance.
– Studentized residuals are generally larger than the corresponding standardized residuals.

4.2.2 Methods for Scaling Residuals

3. PRESS Residuals

Examine the differences e_(i) = y_i − ŷ_(i):

– These are the differences between the actual response for the ith data point and the fitted value of the response for the ith data point, computed using all observations except the ith one.

4.2.2 Methods for Scaling Residuals

3. PRESS Residuals

• Logic: if the ith point is unusual, it can “overly” influence the regression model.

– If the ith point is used in fitting the model, then the residual for the ith point will be small.
– If the ith point is not used in fitting the model, then the residual will better reflect how unusual that point is.

4.2.2 Methods for Scaling Residuals

3. PRESS Residuals

• Prediction error: e_(i) = y_i − ŷ_(i)

• Calculated for each point, these are called PRESS residuals; they will be used later to calculate the “prediction error sum of squares.”

• The PRESS residuals can be calculated without refitting, using

   e_(i) = e_i / (1 − h_ii)
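The no-refitting identity is easy to check numerically; a sketch with made-up numbers, verifying e_i/(1 − h_ii) against actual leave-one-out fits:

```python
import numpy as np

# Made-up data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 12.3])
X = np.column_stack([np.ones_like(x), x])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # hat diagonals h_ii

e_press = e / (1 - h)   # PRESS residuals, no leave-one-out refitting needed
print(np.round(e_press, 3))
```

Each e_press[i] agrees exactly with the prediction error from a model fit without observation i.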


4.2.2 Methods for Scaling Residuals

3. PRESS Residuals

• The standardized PRESS residuals are

   e_(i) / √Var(e_(i)) = [e_i / (1 − h_ii)] / √(σ² / (1 − h_ii)) = e_i / √(σ² (1 − h_ii))

• Note: these are the studentized residuals when MS_Res is used as the estimate of σ².

4.2.2 Methods for Scaling Residuals

4. R-Student

• MS_Res is an “internal” estimate of variance.

• Instead, use a variance estimate based on all observations except the ith observation:

   S²_(i) = [(n − p) MS_Res − e_i² / (1 − h_ii)] / (n − p − 1)

4.2.2 Methods for Scaling Residuals

4. R-Student

• The R-student residual is

   t_i = e_i / √(S²_(i) (1 − h_ii)),  i = 1, 2, …, n

• This is an externally studentized residual.


Leverage and Influence

4.2.3 Residual Plots

• Normal probability plot of residuals
  – Checks the normality assumption

• Residuals against fitted values ŷ_i
  – Checks for nonconstant variance
  – Checks for nonlinearity
  – Look for potential outliers

• Do not plot residuals versus the observed y_i (why?)


4.2.3 Residual Plots

• Residuals against regressors in the model
  – Checks for nonconstant variance
  – Look for nonlinearity

• Residuals against regressors not in the model
  – If a pattern appears, it could indicate that adding that regressor might improve the model fit

• Residuals against time order
  – Checks for correlated errors


Example 4.4 The Delivery Time Data


Plot of Residuals in Time Sequence


4.2.4 Partial Regression and Partial Residual Plots

Partial Regression Plots

• Why are these used?
  – To determine whether the correct relationship between y and x_i has been identified.
  – To determine the marginal contribution of a variable, given that all other variables are in the model.

4.2.4 Partial Regression and Partial Residual Plots

Partial Regression Plots

• Method

Say we want to know the importance of, or the relationship between, y and some regressor variable x_i.

– Regress y against all regressors except x_i and calculate the residuals.
– Regress x_i against all other regressor variables and calculate the residuals.
– Plot these two sets of residuals against each other.
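The three steps above can be sketched directly with simulated data (the data and coefficients here are made up); the payoff, shown in the assertion below, is that the slope of the resulting scatter equals the full-model coefficient of the variable of interest:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.1, size=n)

def residuals(target, others):
    """Residuals from regressing `target` on `others` plus an intercept."""
    Z = np.column_stack([np.ones(n)] + list(others))
    b, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return target - Z @ b

# Partial regression (added-variable) construction for x2:
ey = residuals(y, [x1])         # step 1: regress y on everything except x2
ex = residuals(x2, [x1])        # step 2: regress x2 on the other regressors
slope = (ex @ ey) / (ex @ ex)   # step 3: slope of the ey-vs-ex scatter
print(round(slope, 3))
```

Plotting ey against ex gives the partial regression plot itself; the least-squares slope through that cloud recovers the coefficient of x2 from the full model.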

4.2.4 Partial Regression and Partial Residual Plots

Partial Regression Plots

• Interpretation
  – If the plot appears to be linear, then a linear relationship between y and x_i seems reasonable.
  – If the plot is curvilinear, a term such as x_i² or 1/x_i may be needed instead.
  – If x_i is a candidate variable and a horizontal “band” appears, then that variable adds no new information.

Example 4.5


4.2.4 Partial Regression and Partial Residual Plots

Partial Regression Plots – Comments

• Use with caution; they only suggest possible relationships.
• They do not generally detect interaction effects.
• If multicollinearity is present, partial regression plots can give incorrect information.
• The slope of the partial regression plot is the regression coefficient for the variable of interest!

4.2.5 Other Residual Plotting and Analysis Methods

• Plotting regressors against each other can give information about the relationship between the two:
  – may indicate correlation between the regressors;
  – may uncover remote points.

Note the location of these two points in the x-space.

4.3 The PRESS Statistic

• PRESS residual: e_(i) = e_i / (1 − h_ii)

• Prediction Error Sum of Squares (PRESS) statistic:

   PRESS = Σ_{i=1}^{n} [y_i − ŷ_(i)]² = Σ_{i=1}^{n} [e_i / (1 − h_ii)]²

• A small value of the PRESS statistic is desired.

• See Table 4.1

4.3 The PRESS Statistic

R² for Prediction Based on PRESS

   R²_prediction = 1 − PRESS / SS_T

• Interpretation: we expect the model to explain about that percentage of the variability in predicting a new observation.

• PRESS is a valuable statistic for comparing models.
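A sketch of the computation on an illustrative (made-up) data set:

```python
import numpy as np

# Illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.9, 4.1, 5.2, 5.8, 7.1])
X = np.column_stack([np.ones_like(x), x])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

press = np.sum((e / (1 - h))**2)   # PRESS statistic
ss_t = np.sum((y - y.mean())**2)   # total sum of squares
r2_pred = 1 - press / ss_t         # R^2 for prediction
print(round(r2_pred, 3))
```

Because PRESS is never smaller than SS_Res, the prediction R² is always below the ordinary R²; a large gap between the two suggests the model fits the sample much better than it will predict.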

4.4 Outliers

• An outlier is an observation that is considerably different from the others.
• Formal tests for outliers exist.
• Points with large residuals may be outliers.
• Impact can be assessed by removing the points and refitting.
• How should they be treated?

4.5 Lack of Fit of the Regression Model

A Formal Test for Lack of Fit

• Assumes
  – the normality, independence, and constant-variance assumptions have been met;
  – only the first-order (straight-line) form of the model is in doubt.

• Requires
  – replication of y for at least one level of x.

4.5 Lack of Fit of the Regression Model

A Formal Test for Lack of Fit

• With replication, we can obtain a “model-independent” estimate of σ².

• Say there are n_i observations of the response at the ith level of the regressor x_i, i = 1, 2, …, m.

• y_ij denotes the jth observation on the response at x_i, j = 1, 2, …, n_i.

• The total number of observations is n = Σ_{i=1}^{m} n_i.

4.5 Lack of Fit of the Regression Model

A Formal Test for Lack of Fit

• Partitioning of the residual sum of squares:

   SS_Res = SS_PE + SS_LOF

• SS_PE – pure-error sum of squares
• SS_LOF – lack-of-fit sum of squares

• Note that the (ij)th residual can be partitioned as y_ij − ŷ_i = (y_ij − ȳ_i) + (ȳ_i − ŷ_i), then squared and summed.

4.5 Lack of Fit of the Regression Model

A Formal Test for Lack of Fit

• If the assumption of constant variance is satisfied, then SS_PE is a “model-independent” measure of pure error.

• If the regression function really is linear, then the fitted values ŷ_i will be very close to the average responses ȳ_i, and SS_LOF will be quite small.

4.5 Lack of Fit of the Regression Model

A Formal Test for Lack of Fit

• Test statistic:

   F₀ = [SS_LOF / (m − 2)] / [SS_PE / (n − m)] = MS_LOF / MS_PE

• If F₀ > F_{α, m−2, n−m}, conclude that the regression function is not linear. Why?

4.5 Lack of Fit of the Regression Model

A Formal Test for Lack of Fit

• If the test indicates lack of fit, abandon the model and try a different one.

• If the test indicates no lack of fit, then MS_LOF and MS_PE are combined to estimate σ².

Example 4.8


An Approximate Procedure Based on Estimating Error from Near-Neighbors

See Example 4.10, pg. 167


Chapter 6: Diagnostics for Leverage and Influence

6.1 Importance of Detecting Influential Observations

• Leverage point:
  – unusual x-value;
  – very little effect on the regression coefficients.

6.1 Importance of Detecting Influential Observations

• Influence point:
  – unusual in both y and x.

6.2 Leverage

• The hat matrix is

   H = X(X′X)⁻¹X′

• The diagonal elements of the hat matrix are given by

   h_ii = x_i′(X′X)⁻¹x_i

• h_ii is a standardized measure of the distance of the ith observation from the center of the x-space.

6.2 Leverage

• The average size of the hat diagonals is p/n.
• Traditionally, any h_ii > 2p/n indicates a leverage point.
• An observation with a large h_ii and a large residual is likely to be influential.
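A small simulated illustration of the 2p/n rule (the data are synthetic; the last point is deliberately remote in x):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.append(rng.uniform(0.0, 1.0, 19), 5.0)   # last point is remote in x
X = np.column_stack([np.ones(20), x])

h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # hat diagonals h_ii
n, p = X.shape
flagged = np.where(h > 2 * p / n)[0]            # traditional 2p/n cutoff
print(flagged)
```

The hat diagonals sum to p (the trace of H), which is why p/n is their average size.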


Example 6.1 The Delivery Time Data

• Examine Table 6.1; if some possibly influential points are removed, here is what happens to the coefficient estimates and model statistics:

6.3 Measures of Influence

• The influence measures discussed here are those that measure the effect of deleting the ith observation:

1. Cook’s D_i, which measures the effect on the vector of coefficient estimates (equivalently, on the fitted values).
2. DFBETAS_j(i), which measures the effect on the jth coefficient estimate.
3. DFFITS_i, which measures the effect on the ith fitted value.
4. COVRATIO_i, which measures the effect on the variance-covariance matrix of the parameter estimates.

6.3 Measures of Influence: Cook’s D

   D_i = (r_i² / p) · h_ii / (1 − h_ii),  i = 1, 2, …, n

What contributes to D_i:

1. How well the model fits the ith observation, y_i (through the studentized residual r_i).
2. How far that point is from the rest of the data (through the leverage h_ii).

Large values of D_i indicate an influential point, usually if D_i > 1.
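A sketch of Cook's D on synthetic data with one point made both remote in x and poorly fit in y, which is exactly the combination the two contributions above reward:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.append(rng.uniform(0.0, 1.0, 19), 5.0)   # remote x-value
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=20)
y[19] += 3.0                                    # perturb the remote point in y
X = np.column_stack([np.ones(20), x])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
n, p = X.shape
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
ms_res = e @ e / (n - p)

r = e / np.sqrt(ms_res * (1 - h))   # studentized residuals
D = (r**2 / p) * h / (1 - h)        # Cook's distance
print(int(np.argmax(D)))
```

Only the perturbed high-leverage point exceeds the D_i > 1 cutoff; the remaining points stay well below it.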


6.4 Measures of Influence: DFFITS and DFBETAS

• DFBETAS measures how much the regression coefficient changes, in standard deviation units, if the ith observation is removed:

   DFBETAS_j,i = (β̂_j − β̂_j(i)) / √(S²_(i) C_jj)

where β̂_j(i) is the estimate of the jth coefficient when the ith observation is removed, and C_jj is the jth diagonal element of (X′X)⁻¹.

– A large DFBETAS indicates that the ith observation has considerable influence on the jth coefficient. In general, the cutoff is |DFBETAS_j,i| > 2/√n.

6.4 Measures of Influence: DFFITS and DFBETAS

• DFFITS measures the influence of the ith observation on the fitted value, again in standard deviation units:

   DFFITS_i = (ŷ_i − ŷ_(i)) / √(S²_(i) h_ii)

• Cutoff: if |DFFITS_i| > 2√(p/n), the point is most likely influential.
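A sketch of the computational route (simulated data): compute the external variance estimate S²_(i) from the closed form, then R-student, then DFFITS, with no deletion refits required:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 25, 2
x = rng.uniform(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.2, size=n)
X = np.column_stack([np.ones(n), x])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

# External variance estimate S^2_(i), then R-student, then DFFITS
s2_i = (e @ e - e**2 / (1 - h)) / (n - p - 1)
t = e / np.sqrt(s2_i * (1 - h))      # R-student
dffits = t * np.sqrt(h / (1 - h))    # computational form of DFFITS
print(np.round(dffits[:3], 3))
```

The test below confirms that both s2_i and dffits agree exactly with brute-force leave-one-out refits, which is the "computational equivalent" point made in the next slide.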

6.4 Measures of Influence: DFFITS and DFBETAS

Equivalencies

• See the computational equivalents of both DFBETAS and DFFITS (page 223). You will see that they are both functions of R-student and h_ii.


6.5 A Measure of Model Performance

• Information about the overall precision of estimation can be obtained through another statistic, COVRATIO_i.

6.5 A Measure of Model Performance

Cutoffs and Interpretation

• If COVRATIO_i > 1, the ith observation improves the precision of estimation.
• If COVRATIO_i < 1, the ith observation can degrade the precision.

Or,

• Cutoffs: COVRATIO_i > 1 + 3p/n or COVRATIO_i < 1 − 3p/n (the lower limit is really only useful when n > 3p).

6.6 Detecting Groups of Influential Observations

• The previous diagnostics were “single-observation” diagnostics.
• It is possible that a group of points has high leverage or exerts undue influence on the regression model.
• Multiple-observation deletion diagnostics can be implemented.

6.6 Detecting Groups of Influential Observations

• Cook’s D can be extended to incorporate multiple observations:

   D_i = (β̂_(i) − β̂)′ X′X (β̂_(i) − β̂) / (p · MS_Res)

where i denotes the m × 1 vector of indices specifying the points to be deleted, and β̂_(i) is the estimate with those m points removed.

• Large values of D_i indicate that the set of m points is influential.

6.7 Treatment of Influential Observations

• Should an influential point be discarded?

Yes, if:
  – there is an error in recording a measured value;
  – the sample point is invalid; or
  – the observation is not part of the population that was intended to be sampled.

No, if:
  – the influential point is a valid observation.

6.7 Treatment of Influential Observations

• Robust estimation techniques
  – These techniques offer an alternative to deleting an influential observation.
  – Observations are retained but downweighted in proportion to residual magnitude or influence.
  – Refer to Chapter 15 for more information on robust regression.

Chapter 7: Polynomial Regression Models

7.1 Introduction

A second-order polynomial in one variable:

   y = β₀ + β₁x + β₂x² + ε

A second-order polynomial in two variables:

   y = β₀ + β₁x₁ + β₂x₂ + β₁₁x₁² + β₂₂x₂² + β₁₂x₁x₂ + ε

7.2 Polynomial Models in One Variable

• The one-variable form, again:

   y = β₀ + β₁x + β₂x² + ε

• If we let x₁ = x and x₂ = x², we have the same type of model as in previous chapters – standard linear regression analysis applies.

• The expectation of y for a one-variable second-order polynomial model is

   E(y) = β₀ + β₁x + β₂x²
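The substitution x₁ = x, x₂ = x² means an ordinary least-squares fit recovers the polynomial; a minimal sketch with made-up, noise-free quadratic data so the coefficients come back exactly:

```python
import numpy as np

# Exact quadratic data, so the coefficients are recovered exactly
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 1.0 + 2.0 * x - 0.5 * x**2

# Treat x and x^2 as two ordinary regressors
X = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 6))
```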


7.2 Polynomial Models in One Variable

Cautions in fitting a polynomial in one variable:

1. Keep the order of the model as low as possible.
   – This is especially true if you are using the model as a predictor.
   – Transformations are often preferred over higher-order models.
   – Parsimony is a good thing: try to fit the data using the simplest model possible.
   – Remember: you can always fit a polynomial of order n − 1 to a set of data with n points, but this is undesirable.

7.2 Polynomial Models in One Variable

Cautions in fitting a polynomial in one variable:

2. Model-building strategy
   – One approach is to fit the lowest-order polynomial possible and build up (forward selection).
   – A second approach is to fit the highest-order polynomial of interest and remove terms (backward elimination).
   – In general, the two approaches may not give the same result. You should always try to fit the lowest-order model possible.

7.2 Polynomial Models in One Variable

Cautions in fitting a polynomial in one variable:

3. Extrapolation
   – Extrapolation can be dangerous when the model is a higher-order polynomial. The nature of the true underlying relationship may change, or be completely different from, the system that produced the data used to fit the model.

7.2 Polynomial Models in One Variable

Cautions in fitting a polynomial in one variable:

4. Ill-conditioning I
   – Ill-conditioning refers to the fact that as the order of the model increases, the inversion of the X′X matrix becomes inaccurate, and error can be introduced into the parameter estimates.
   – As the order of the model increases, multicollinearity increases.
   – Centering the variables first may remove some ill-conditioning, but not all of it.

7.2 Polynomial Models in One Variable

Cautions in fitting a polynomial in one variable:

5. Ill-conditioning II
   – Narrow ranges on the x variables can result in significant ill-conditioning and multicollinearity problems.

7.2 Polynomial Models in One Variable

Cautions in fitting a polynomial in one variable:

6. Hierarchy
   – A hierarchical model of order n contains all terms of order n and below:

      y = β₀ + β₁x + β₂x² + … + βₙ₋₁xⁿ⁻¹ + βₙxⁿ + ε

   – Two schools of thought: (1) maintain hierarchy; (2) maintaining hierarchy is not important.
   – What to do? Fit the model with only the significant terms, and use knowledge and understanding of the process to determine whether a hierarchical model is necessary (if you do not have one).

7.2 Polynomial Models in One Variable

Centering

– Sometimes, centering the regressor variables can minimize or eliminate at least some of the ill-conditioning that may be present in a polynomial model.

Centering

Consider the hardwood data in Example 7.1. The regression analysis is provided below:

The regression equation is
y = -6.67 + 11.8 x - 0.635 x2

Predictor      Coef    SE Coef       T       P    VIF
Constant     -6.674      3.400   -1.96   0.067
x            11.764      1.003   11.73   0.000   17.1
x2         -0.63455    0.06179  -10.27   0.000   17.1

S = 4.420    R-Sq = 90.9%    R-Sq(adj) = 89.7%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       2  3104.2  1552.1  79.43  0.000
Residual Error  16   312.6    19.5
Total           18  3416.9

Note that the variance inflation factors indicate that multicollinearity may be a problem.

Now, center the data using the mean of the regressor variable. The new data are given as:

xi − 7.2632    (xi − 7.2632)²       y
   -6.2632          39.2277       6.3
   -5.7632          33.2145      11.1
   -5.2632          27.7013      20.0
   -4.2632          18.1749      24.0
   -3.2632          10.6485      26.1
   -2.7632           7.6353      30.0
   -2.2632           5.1221      33.8
   -1.7632           3.1089      34.0
   -1.2632           1.5957      38.1
   -0.7632           0.5825      39.9
   -0.2632           0.0693      42.0
    0.7368           0.5429      46.1
    1.7368           3.0165      53.1
    2.7368           7.4901      52.0
    3.7368          13.9637      52.5
    4.7368          22.4373      48.0
    5.7368          32.9109      42.8
    6.7368          45.3845      27.8
    7.7368          59.8581      21.9

Now, a new model is fit:

   y = β₀ + β₁(x − 7.2632) + β₂(x − 7.2632)² + ε

The regression equation is
y = 45.3 + 2.55 xcent - 0.635 x2cent

Predictor      Coef    SE Coef       T       P    VIF
Constant     45.295      1.483   30.55   0.000
xcent        2.5463     0.2538   10.03   0.000    1.1
x2cent     -0.63455    0.06179  -10.27   0.000    1.1

S = 4.420    R-Sq = 90.9%    R-Sq(adj) = 89.7%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       2  3104.2  1552.1  79.43  0.000
Residual Error  16   312.6    19.5
Total           18  3416.9
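The drop in the VIFs from 17.1 to 1.1 can be reproduced qualitatively with simulated data on a similar positive range (this is a sketch, not the hardwood data itself):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(1.0, 14.0, 19)   # positive regressor on a range like Example 7.1

def vif_pair(u, v):
    """VIF shared by two regressors: 1 / (1 - r^2)."""
    r = np.corrcoef(u, v)[0, 1]
    return 1.0 / (1.0 - r**2)

vif_raw = vif_pair(x, x**2)            # x and x^2 are highly correlated
xc = x - x.mean()
vif_centered = vif_pair(xc, xc**2)     # centering removes most of the correlation
print(round(vif_raw, 1), round(vif_centered, 1))
```

Centering works here because the correlation between xc and xc² is driven only by the asymmetry of the x values about their mean, not by their location on the positive axis.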

7.2.2 Piecewise Polynomial Fitting (Splines)

• This is a technique that can be used when a function behaves differently over different ranges of x. Generally, divide the range of x into “homogeneous” segments and fit an appropriate function in each segment.

Splines:

a. Splines are piecewise polynomials of order k.
b. Splines have knots – the points at which the segments are joined. Too many knots can result in “overfitting” and will not necessarily provide more insight into the system.
c. Usually a cubic spline is sufficient – a polynomial of order 3.

7.2.2 Piecewise Polynomial Fitting (Splines)

• Cubic spline with continuous first and second derivatives.

• Say there are h knots, t₁ < t₂ < … < t_h. This cubic spline is given by

   E(y) = S(x) = Σ_{j=0}^{3} β₀ⱼ xʲ + Σ_{i=1}^{h} βᵢ (x − tᵢ)₊³

with

   (x − tᵢ)₊ = x − tᵢ if x > tᵢ, and 0 if x ≤ tᵢ

• Think of (x − tᵢ)₊ as an “indicator variable” – that is, “on” or “off.”
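Building the design matrix for this basis is mechanical; a minimal sketch (the knot locations 6.5 and 13 anticipate Example 7.2):

```python
import numpy as np

def cubic_spline_design(x, knots):
    """Design matrix for a cubic spline with continuous first and second
    derivatives: columns 1, x, x^2, x^3, then (x - t)_+^3 per knot t."""
    cols = [np.ones_like(x), x, x**2, x**3]
    for t in knots:
        cols.append(np.maximum(x - t, 0.0)**3)   # zero until the knot "switches on"
    return np.column_stack(cols)

x = np.linspace(0.0, 20.0, 41)
X = cubic_spline_design(x, knots=[6.5, 13.0])
print(X.shape)
```

Each knot column is identically zero below its knot and cubic above it, which is exactly the on/off behavior of (x − tᵢ)₊.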
7.2.2 Piecewise Polynomial Fitting (Splines)

• If the continuity restrictions are not necessarily appropriate, then the general spline is

   E(y) = S(x) = Σ_{j=0}^{3} β₀ⱼ xʲ + Σ_{i=1}^{h} Σ_{j=0}^{3} βᵢⱼ (x − tᵢ)₊ʲ
7.2.2 Piecewise Polynomial Fitting (Splines)

• To illustrate, consider the data in Example 7.2 – the voltage drop data. First, look at the plot of the data.

[Figure: scatter plot of voltage drop y (roughly 7 to 15) versus time x (0 to 20)]
• If we attempt to fit a standard quadratic model to these data, we would obtain the following Minitab output:

The regression equation is
y = 5.27 + 1.49 x - 0.0652 x2

Predictor       Coef     SE Coef       T       P    VIF
Constant      5.2657      0.4807   10.95   0.000
x             1.4872      0.1112   13.37   0.000   15.3
x2         -0.065198    0.005375  -12.13   0.000   15.3

S = 1.076    R-Sq = 83.2%    R-Sq(adj) = 82.4%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       2  218.66  109.33  94.35  0.000
Residual Error  38   44.03    1.16
Total           40  262.69

• This looks as though it may be a good fit, but examine the residual plots.
Obviously, something is missing. (Note: even if you include the x³ term, the residual plots are not acceptable.)

[Figure: residuals versus fitted values (response is y); residuals range from about −2 to 3 over fitted values 5 to 14, showing a clear pattern]
Example 7.2 (continued)

• A cubic spline is now investigated. Based on the plot of the original data and knowledge of the process, two knots are chosen.
• It appears that voltage behaves differently between time 0 and 6.5 seconds than it does between 6.5 and 13 seconds.
• It appears to behave differently yet again after 13 seconds.
• Therefore, h = 2 knots are chosen: t₁ = 6.5 and t₂ = 13.
Example 7.2 (continued)

• The cubic spline model is

   y = β₀₀ + β₀₁x + β₀₂x² + β₀₃x³ + β₁(x − 6.5)₊³ + β₂(x − 13)₊³ + ε

• Putting the original data in Minitab and then adding four new columns (one for each term beyond x), we obtain the following results from the regression analysis:
Example 7.2 (continued)

The regression equation is
y = 8.47 - 1.45 x + 0.490 x2 - 0.0295 x3 + 0.0247 x65 + 0.0271 x13

Predictor       Coef    SE Coef       T       P
Constant      8.4657     0.2005   42.22   0.000
x            -1.4531     0.1816   -8.00   0.000
x2           0.48989    0.04302   11.39   0.000
x3         -0.029467   0.002848  -10.35   0.000
x65         0.024706   0.004039    6.12   0.000
x13         0.027112   0.003578    7.58   0.000
Chapter 9: Multicollinearity
9.1 Introduction

• Multicollinearity is a problem that plagues many regression models. It impacts the estimates of the individual regression coefficients.

• Uses of regression:
1. Identifying the relative effects of the regressor variables,
2. Prediction and/or estimation, and
3. Selection of an appropriate set of variables for the model.
9.1 Introduction

• If all regressors are orthogonal, then multicollinearity is not a problem. This is a rare situation in regression analysis.

• More often than not, there are near-linear dependencies among the regressors such that

   Σ_{j=1}^{p} tⱼ Xⱼ = 0

is approximately true for some constants t₁, …, t_p, not all zero. If this sum holds exactly for a subset of the regressors, then (X′X)⁻¹ does not exist.
9.2 Sources of Multicollinearity

Four primary sources:

1. The data collection method employed
2. Constraints on the model or in the population
3. Model specification
4. An overdefined model
9.2 Sources of Multicollinearity

Data collection method employed

– Occurs when only a subsample of the entire sample space has been selected. (Soft-drink delivery: the number of cases and the distance tend to be correlated; that is, we may have data where only small numbers of cases are paired with short distances and large numbers of cases are paired with longer distances.) We may be able to reduce this multicollinearity through the sampling technique used; there is no physical reason why you can’t sample in that area.
9.2 Sources of Multicollinearity

Constraints on the model or in the population

– (Electricity consumption: two variables, x1 – family income and x2 – house size.) When physical constraints are present, multicollinearity will exist regardless of the collection method.
9.2 Sources of Multicollinearity

Model specification

– Polynomial terms can cause ill-conditioning in the X′X matrix. This is especially true if the range of a regressor variable x is small.
9.2 Sources of Multicollinearity

Overdefined model

– More regressor variables than observations. The best way to counter this is to remove regressor variables.

– Recommendations:
1) Redefine the model using a smaller set of regressors;
2) Do preliminary studies using subsets of the regressors; or
3) Use principal-components-type regression methods to remove regressors.
9.3 Effects of Multicollinearity

Strong multicollinearity can result in large variances and covariances for the least-squares estimates of the coefficients. Recall from Chapter 3 that C = (X′X)⁻¹ and

   Cⱼⱼ = 1 / (1 − Rⱼ²)

where Rⱼ² is the coefficient of determination from regressing xⱼ on the remaining regressors. Strong multicollinearity between xⱼ and any other regressor variable will cause Rⱼ² to be large, and thus Cⱼⱼ to be large. In other words, the variance of the least-squares estimate of the coefficient will be very large.
9.3 Effects of Multicollinearity

Strong multicollinearity can also produce least-squares estimates of the coefficients that are too large in absolute value. The squared distance between the least-squares estimate and the true parameter is

   L₁² = (β̂ − β)′(β̂ − β)

with expected value

   E(L₁²) = E[(β̂ − β)′(β̂ − β)] = σ² Tr(X′X)⁻¹
9.4 Multicollinearity Diagnostics

• Ideal characteristics of a multicollinearity diagnostic:

1. We want the procedure to correctly indicate whether multicollinearity is present; and
2. We want the procedure to provide some insight as to which regressors are causing the problem.
9.4.1 Examination of the Correlation Matrix

• If we center and scale the regressors in the X′X matrix, we have the correlation matrix. The pairwise correlation between two variables xᵢ and xⱼ is denoted rᵢⱼ; the off-diagonal elements of the centered and scaled X′X matrix (the X′X matrix in correlation form) are the pairwise correlations.

• If |rᵢⱼ| is close to unity, then there may be an indication of multicollinearity. But the opposite does not always hold: there may be instances where multicollinearity is present but the pairwise correlations do not indicate a problem. This can happen when more than two variables are involved in a near-linear dependency.

The correlation matrix fails to identify the multicollinearity problem in the Mason, Gunst & Webster data in Table 9.4, page 304.
9.4.2 Variance Inflation Factors

• As discussed in Chapter 3, variance inflation factors are very useful in determining whether multicollinearity is present:

   VIFⱼ = Cⱼⱼ = (1 − Rⱼ²)⁻¹

• VIFs greater than 5 to 10 are considered significant. The regressors with high VIFs probably have poorly estimated regression coefficients.
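The definition translates directly into code: regress each column on the others, take Rⱼ², and invert. A sketch on simulated data where two regressors are nearly copies of each other:

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j of X on the remaining columns (plus an intercept)."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        yj = X[:, j]
        Z = np.column_stack([np.ones(len(yj)), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(Z, yj, rcond=None)
        r2 = 1.0 - np.sum((yj - Z @ b)**2) / np.sum((yj - yj.mean())**2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(5)
x1 = rng.normal(size=30)
x2 = x1 + rng.normal(scale=0.1, size=30)   # nearly a copy of x1
x3 = rng.normal(size=30)
print(np.round(vif(np.column_stack([x1, x2, x3])), 1))
```

The collinear pair x1, x2 gets very large VIFs, while the independent x3 stays near 1.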

9.4.2 Variance Inflation Factors

VIFs: A Second Look and Interpretation

• The length of the normal-theory confidence interval on the jth regression coefficient can be written as

   Lⱼ = 2 (Cⱼⱼ σ̂²)^{1/2} t_{α/2, n−p−1}

9.4.2 Variance Inflation Factors

VIFs: A Second Look and Interpretation

• The length of the corresponding normal-theory confidence interval based on a design with orthogonal regressors (with the same sample size and the same root-mean-square values) is

   L* = 2 σ̂ t_{α/2, n−p−1}

9.4.2 Variance Inflation Factors
VIFs: A Second Look and Interpretation
• Take the ratio of these two: L_j / L* = C_jj^{1/2} = (VIF_j)^{1/2}. That is, the square root of the jth VIF gives a measure of how much longer the confidence interval for the jth regression coefficient is because of multicollinearity.
• For example, say VIF_3 = 10. Then √VIF_3 = √10 ≈ 3.16, so the confidence interval is about 3.16 times longer than if the regressors had been orthogonal (the best-case scenario).


9.4.3 Eigensystem Analysis of X'X
• The eigenvalues of X'X (denoted λ_1, λ_2, …, λ_p) can be used to measure multicollinearity. Small eigenvalues are indications of multicollinearity.
• The condition number of X'X is κ = λ_max / λ_min. This number measures the spread in the eigenvalues:
κ < 100: no serious problem
100 < κ < 1000: moderate to strong multicollinearity
κ > 1000: severe multicollinearity
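The condition number is easy to compute directly from the eigenvalues. A small numpy sketch on simulated data (unit-length scaling of the columns is assumed before the eigenanalysis, as is customary):

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=40)
x2 = x1 + 0.01 * rng.normal(size=40)          # a near-linear dependence
X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)
Xs = Xc / np.linalg.norm(Xc, axis=0)          # unit-length columns
lam = np.linalg.eigvalsh(Xs.T @ Xs)           # eigenvalues of X'X
kappa = lam.max() / lam.min()                 # condition number
print(kappa)                                  # far above 1000 here
```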


9.4.3 Eigensystem Analysis of X'X
• A large condition number indicates that multicollinearity exists; it does not tell us how many regressors are involved.
• The condition indices of X'X are κ_j = λ_max / λ_j, j = 1, 2, …, p.
• The number of condition indices that are large (greater than 1000) provides a measure of the number of near-linear dependencies in X'X.
• In SAS PROC REG, the COLLIN option on the model statement produces the eigenvalues, condition indices, etc.


9.5 Methods for Dealing with Multicollinearity
• Collect more data
• Respecify the model
• Ridge regression and related techniques (PC regression, LASSO, etc.)


9.5 Methods for Dealing with Multicollinearity
• Least-squares estimation gives an unbiased estimate, E(β̂) = β, with minimum variance, but this variance may still be very large, resulting in unstable estimates of the coefficients.
– Alternative: find an estimate that is biased but has smaller variance than the unbiased estimator.


9.5 Methods for Dealing with Multicollinearity
Ridge Estimator β̂_R:
β̂_R = (X'X + kI)⁻¹ X'y
    = (X'X + kI)⁻¹ X'X β̂
    = Z_k β̂
where k is a "biasing parameter," usually between 0 and 1.
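The ridge estimator is a one-line linear solve. A hedged numpy sketch on simulated, standardized data (the function name is ours):

```python
import numpy as np

def ridge(X, y, k):
    # beta_R = (X'X + kI)^(-1) X'y; X is assumed centered and scaled,
    # so the intercept is left out of the shrinkage
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=30)
b_ols = ridge(X, y, 0.0)    # k = 0 recovers ordinary least squares
b_r = ridge(X, y, 5.0)      # k > 0 shrinks the coefficient vector
print(np.linalg.norm(b_r) < np.linalg.norm(b_ols))   # True: ridge shrinks
```

Shrinkage follows because in the eigenbasis of X'X each component of β̂ is multiplied by λ_j/(λ_j + k) < 1.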


9.5 Methods for Dealing with Multicollinearity
The effect of k on the MSE.
Recall: MSE(β̂*) = Var(β̂*) + (bias)²
Now:
MSE(β̂_R) = Var(β̂_R) + (bias)²
          = σ² Σ_{j=1}^p λ_j / (λ_j + k)² + k² β'(X'X + kI)⁻² β
As k increases, the variance decreases and the bias increases. Choose k such that the reduction in variance exceeds the increase in bias.
The residual sum of squares is SS_Res = (y − Xβ̂_R)'(y − Xβ̂_R).


9.5 Methods for Dealing with Multicollinearity
• Ridge Trace
– Plots the coefficient estimates against k. If multicollinearity is severe, the ridge trace will show it. Choose k such that β̂_R is stable, and hope the resulting MSE is acceptable.
– Ridge regression is a good alternative if the model user wants to keep all regressors in the model.


More About Ridge Regression
• Methods for choosing k
• Relationship to other estimators
• Ridge regression and variable selection
• Generalized ridge regression (a procedure with a biasing parameter k_j for each regressor)


Generalized Regression Techniques


9.5.4 Principal-Component Regression


The eigenvalues suggest that a model based on 4 or 5 of the PCs would probably be adequate.


Models D and E are pretty similar.


Chapter 10
Variable Selection and Model Building


10.1 Introduction
In this chapter, we will cover some variable selection techniques. Keep in mind the following:
1. None of the variable selection techniques can guarantee the best regression equation for the dataset of interest.
2. The techniques may very well give different results.
3. Complete reliance on the algorithm's results is to be avoided. Other valuable information, such as experience with and knowledge of the data and the problem, should also be brought to bear.


10.1.1 Model-Building Problem
Two "conflicting" goals in regression model building:
1. We want as many regressors as possible, so that the "information content" in the variables will influence ŷ.
2. We want as few regressors as necessary, because the variance of ŷ increases as the number of regressors increases. (Also, more regressors can cost more money in data collection and model maintenance.)
A compromise between the two hopefully leads to the best regression equation.


10.1.2 Consequences of Model Misspecification
Say there are K regressor variables under investigation in a problem. Then
y = Xβ + ε
where X can be partitioned into two submatrices:
1) a matrix containing the intercept and the p − 1 regressors that are significant (to be retained in the model), denoted X_p; and
2) a matrix containing the remaining r regressors that are not significant and should be deleted from the model, denoted X_r.


Note that K + 1 = p + r. Our model is rewritten as
y = X_p β_p + X_r β_r + ε
For the full model:
1) β̂* = (X'X)⁻¹ X'y; β̂* consists of two parts, β̂*_p and β̂*_r
2) σ̂*² = (y'y − β̂*'X'y) / (n − K − 1) = y'[I − X(X'X)⁻¹X']y / (n − K − 1)
3) Fitted values are ŷ*_i


For the subset model:
1) β̂_p = (X_p'X_p)⁻¹ X_p'y
2) σ̂² = (y'y − β̂_p'X_p'y) / (n − p) = y'[I − X_p(X_p'X_p)⁻¹X_p']y / (n − p)
3) Fitted values are ŷ_i
What is the difference between β̂*_p and β̂_p?


10.1.2 Consequences of Model Misspecification
Some properties of the estimates β̂_p and σ̂² are:
1. E(β̂_p) = β_p + Aβ_r, where A = (X_p'X_p)⁻¹X_p'X_r. So β̂_p is a biased estimate of β_p unless the regression coefficients of the insignificant (deleted) variables are zero or the deleted variables are orthogonal to the retained variables (X_p'X_r = 0).
2. Var(β̂_p) = σ²(X_p'X_p)⁻¹ and Var(β̂*) = σ²(X'X)⁻¹. Var(β̂*_p) − Var(β̂_p) is a matrix such that all variances of regression coefficients in the full model are greater than or equal to the variances of the corresponding coefficients in the reduced model. In other words, deleting unnecessary variables will not increase the variances of the remaining coefficients.


10.1.2 Consequences of Model Misspecification
Some properties of the estimates β̂_p and σ̂² are:
3. MSE(β̂_p) < MSE(β̂*_p) when each coefficient in β̂*_r is smaller than its standard error. In a nutshell, the MSE for the subset model is better (smaller) than the MSE for the same coefficients when the full model is employed, provided the deleted variables really are insignificant.
10.1.2 Consequences of Model Misspecification
Some properties of the estimates β̂_p and σ̂² are:
4. For the subset model,
E(σ̂²) = σ² + β_r'X_r'[I − X_p(X_p'X_p)⁻¹X_p']X_r β_r / (n − p)
that is, for this model σ̂² is a biased-upward estimate of σ².
5. Prediction: from the full model, ŷ* = x'β̂*; from the subset model, ŷ = x_p'β̂_p. Then Var(ŷ*) ≥ MSE(ŷ).
10.1.2 Consequences of Model Misspecification
The summary of the five statements is:
• Deleting variables improves the precision of the parameter estimates of the retained variables.
• Deleting variables improves the precision of the variance of the predicted response.
• Deleting variables can induce bias into the estimates of the coefficients and the variance of the predicted response. (But if the deleted variables are "insignificant," the MSE of the biased estimates will be less than the variance of the unbiased estimates.)
• Retaining insignificant variables can increase the variance of the parameter estimates and the variance of the predicted response.
10.1.3 Criteria for Evaluating Subset Regression Models
Coefficient of Multiple Determination
• Say we are investigating a model with p terms,
R_p² = SS_R(p) / SS_T = 1 − SS_Res(p) / SS_T
• Models with large values of R_p² are preferred, but adding terms will always increase this value.
10.1.3 Criteria for Evaluating Subset Regression Models
Adjusted R²
• Say we are investigating a model with p terms,
R²_{adj,p} = 1 − [(n − 1)/(n − p)] (1 − R_p²)
• This value will not necessarily increase as additional terms are introduced into the model. We want a model with the maximum adjusted R².
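Both criteria are straightforward to compute. A numpy sketch on simulated data (the helper is ours; p counts all fitted coefficients, including the intercept, matching the formula above):

```python
import numpy as np

def r2_stats(X, y):
    n = len(y)
    Xa = np.column_stack([np.ones(n), X])     # add the intercept
    p = Xa.shape[1]
    beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    ss_res = np.sum((y - Xa @ beta) ** 2)
    ss_t = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_t
    r2_adj = 1 - (n - 1) / (n - p) * (1 - r2)
    return r2, r2_adj

rng = np.random.default_rng(4)
x = rng.normal(size=25)
y = 2 * x + rng.normal(size=25)
junk = rng.normal(size=25)                    # an irrelevant regressor
r2_a, adj_a = r2_stats(x.reshape(-1, 1), y)
r2_b, adj_b = r2_stats(np.column_stack([x, junk]), y)
print(r2_b >= r2_a)   # plain R^2 never decreases when a term is added
```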
10.1.3 Criteria for Evaluating Subset Regression Models
Residual Mean Square
• The MS_Res for a subset regression model is
MS_Res(p) = SS_Res(p) / (n − p)
• MS_Res(p) generally decreases as p increases, but it can eventually increase. The increase in MS_Res(p) occurs when the reduction in SS_Res(p) from adding a regressor to the model is not sufficient to compensate for the loss of one residual degree of freedom. We want a model with a minimum MS_Res(p).
10.1.3 Criteria for Evaluating Subset Regression Models
Mallows's Cp Statistic
• This criterion is related to the MSE of the fitted value, that is,
E[ŷ_i − E(y_i)]² = [E(y_i) − E(ŷ_i)]² + Var(ŷ_i)
where [E(y_i) − E(ŷ_i)]² is the squared bias. The total squared bias for a p-term model is
SS_B(p) = Σ_{i=1}^n [E(y_i) − E(ŷ_i)]²
10.1.3 Criteria for Evaluating Subset Regression Models
Mallows's Cp Statistic
• The standardized total squared error is
Γ_p = (1/σ²) { Σ_{i=1}^n [E(y_i) − E(ŷ_i)]² + Σ_{i=1}^n Var(ŷ_i) }
    = SS_B(p)/σ² + (1/σ²) Σ_{i=1}^n Var(ŷ_i)
• Making some appropriate substitutions, we can find the estimate of Γ_p, denoted Cp:
C_p = SS_Res(p)/σ̂² − n + 2p
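Cp is simple to compute once σ̂² (usually MS_Res from the full model) is in hand. A numpy sketch on simulated data (illustrative names; by construction, Cp for the full model equals p exactly):

```python
import numpy as np

def cp(X, y, sigma2_hat):
    # Mallows's Cp = SS_Res(p)/sigma2_hat - n + 2p,
    # where p counts the fitted coefficients in X (intercept included)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)
    return ss_res / sigma2_hat - n + 2 * p

rng = np.random.default_rng(5)
n = 40
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)            # x2 is inert
X_full = np.column_stack([np.ones(n), x1, x2])
beta_f, *_ = np.linalg.lstsq(X_full, y, rcond=None)
sigma2 = np.sum((y - X_full @ beta_f) ** 2) / (n - 3)  # full-model MS_Res
cp_full = cp(X_full, y, sigma2)
print(cp_full)                                 # exactly p = 3 by construction
print(cp(np.column_stack([np.ones(n), x1]), y, sigma2))
```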
10.1.3 Criteria for Evaluating Subset Regression Models
Mallows's Cp Statistic
• It can be shown that if the bias is zero, the expected value of Cp is
E[C_p | Bias = 0] = (n − p)σ̂²/σ̂² − n + 2p = p
10.1.3 Criteria for Evaluating Subset Regression Models
Mallows's Cp Statistic
Notes:
1. Cp is a measure of the variance in the fitted values plus (bias)². (Large bias can result from important variables being left out of the model.)
2. If Cp >> p, there is significant bias.
3. Small Cp values are desirable.
4. Beware of negative values of Cp. These can result when the MSE for the full model overestimates the true σ².


10.1.3 Criteria for Evaluating Subset Regression Models
Uses of Regression and Model Evaluation Criteria
• Regression equations may be used to make predictions, so minimizing the MSE of prediction may be an important criterion. The PRESS statistic can be used for comparisons of candidate models:
PRESS_p = Σ_{i=1}^n (y_i − ŷ_(i))² = Σ_{i=1}^n [e_i / (1 − h_ii)]²
• We want models with small values of PRESS.
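The shortcut above (computing each leave-one-out residual as e_i/(1 − h_ii), without refitting) can be verified directly against n explicit refits. A numpy sketch on simulated data (our helper name):

```python
import numpy as np

def press(X, y):
    # PRESS = sum of squared leave-one-out residuals, via e_i/(1 - h_ii)
    H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix
    e = y - H @ y                              # ordinary residuals
    return np.sum((e / (1 - np.diag(H))) ** 2)

rng = np.random.default_rng(6)
n = 20
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1 + x + rng.normal(size=n)

loo = 0.0                                      # brute-force check: n refits
for i in range(n):
    keep = np.arange(n) != i
    b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    loo += (y[i] - X[i] @ b) ** 2
print(np.isclose(press(X, y), loo))            # True: the identity is exact
```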


10.2 Computational Techniques for Variable Selection
10.2.1 All Possible Regressions
• Assume the intercept term is in all equations considered. Then, if there are K regressors, we would investigate 2^K possible regression equations. Use the criteria above to identify some candidate models, and complete the regression analysis on them.
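With K small, enumerating all 2^K equations is trivial. A Python sketch on simulated data (the intercept is always included; SS_Res is recorded per subset, though in practice Cp, adjusted R², and MS_Res would be recorded too):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(7)
n, K = 30, 3
Xr = rng.normal(size=(n, K))
y = 1 + 2 * Xr[:, 0] + rng.normal(size=n)

results = {}
for r in range(K + 1):
    for subset in combinations(range(K), r):
        X = np.column_stack([np.ones(n)] + [Xr[:, j] for j in subset])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        results[subset] = np.sum((y - X @ beta) ** 2)

print(len(results))                 # 2**K = 8 candidate equations
best = min(results, key=results.get)
print(best)                         # SS_Res alone always favors the full subset
```

This is why SS_Res (or R²) cannot be the only criterion: it always points to the largest model.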


Example 10.1 Hald Cement Data (Appendix Table B21)


10.2 Computational Techniques for Variable Selection
10.2.1 All Possible Regressions
Notes:
• Once some candidate models have been identified, run a regression analysis on each one individually and make comparisons (include the PRESS statistic).
• A caution about the regression coefficients: if the estimates of a particular coefficient tend to "jump around" from model to model, this can be an indication of multicollinearity. ("Jumping around" is a technical term; for example, some estimates are positive and then negative.)


10.2 Computational Techniques for Variable Selection
10.2.2 Stepwise Regression Methods
Three types of stepwise regression methods:
1. Forward selection
2. Backward elimination
3. Stepwise regression, a combination of forward and backward


10.2 Computational Techniques for Variable Selection
10.2.2 Stepwise Regression Methods
Forward Selection
• The procedure is based on the idea that no variables are in the model originally; they are added one at a time. The selection procedure is:
1. The first regressor selected for entry into the model is the one with the highest correlation with the response. If the F statistic corresponding to the model containing this variable is significant (larger than some predetermined value, F_in), that regressor is left in the model.
2. The second regressor examined is the one with the largest partial correlation with the response. If the F statistic corresponding to the addition of this variable is significant, the regressor is retained.
3. The process continues until all regressors are examined.
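The steps above can be sketched in a few lines: entering by the largest partial F, which at the first step is equivalent to choosing the regressor with the highest simple correlation with the response. A hedged Python implementation on simulated data (the cut-in value f_in and helper names are ours, not the textbook's):

```python
import numpy as np

def forward_select(Xr, y, f_in=8.0):
    n, K = Xr.shape
    chosen, remaining = [], list(range(K))

    def ss_res(cols):
        X = np.column_stack([np.ones(n)] + [Xr[:, j] for j in cols])
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        return np.sum((y - X @ b) ** 2)

    while remaining:
        base = ss_res(chosen)
        f_stats = {}
        for j in remaining:              # partial F for entering candidate j
            new = ss_res(chosen + [j])
            df = n - len(chosen) - 2     # residual df after adding regressor j
            f_stats[j] = (base - new) / (new / df)
        j_best = max(f_stats, key=f_stats.get)
        if f_stats[j_best] < f_in:       # nothing clears F_in: stop
            break
        chosen.append(j_best)
        remaining.remove(j_best)
    return chosen

rng = np.random.default_rng(8)
n = 60
X = rng.normal(size=(n, 4))
y = 3 * X[:, 1] - 2 * X[:, 3] + rng.normal(size=n)
sel = sorted(forward_select(X, y))
print(sel)   # the truly active regressors, columns 1 and 3, should be selected
```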


10.2 Computational Techniques for Variable Selection
10.2.2 Stepwise Regression Methods
Backward Elimination
The procedure is based on the idea that all variables are in the model originally; they are examined one at a time and removed if not significant:
1. The partial F statistic is calculated for each variable as if it were the last one added to the model. The regressor with the smallest F statistic is examined first, and it is removed if this value is less than some predetermined value F_out.
2. If this regressor is removed, the model is refit with the remaining regressor variables and the partial F statistics are calculated again. The regressor with the smallest partial F statistic is removed if that value is less than F_out.
3. The process continues until all regressors are examined.


10.2 Computational Techniques for Variable Selection
10.2.2 Stepwise Regression Methods
Stepwise Regression
This procedure is a modification of forward selection:
1. The contribution of each regressor variable already in the model is reassessed by way of its partial F statistic.
2. A regressor that makes it into the model may later be removed if it is found to be insignificant once other variables are added. If its partial F statistic is less than F_out, the variable is removed.
3. Stepwise regression therefore requires both an F_in value and an F_out value.


10.2 Computational Techniques for Variable Selection
10.2.2 Stepwise Regression Methods
Cautions:
• No one model may be the "best"
• The three stepwise techniques could result in different models
• Inexperienced analysts may use the final model simply because the procedure spit it out
Please look over the discussion of "Stopping Rules for Stepwise Procedures" on page 283.


Strategy for Regression Model Building


Chapter 13
Generalized Linear Models


Generalized Linear Models
• Traditional applications of linear models, such as DOX and multiple linear regression, assume that the response variable is
– Normally distributed
– Constant in variance
– Independent
• There are many situations where these assumptions are inappropriate:
– The response is either binary (0, 1) or a count
– The response is continuous but nonnormal


Some Approaches to These Problems
• Data transformation
– Induce approximate normality
– Stabilize variance
– Simplify model form
• Weighted least squares
– Often used to stabilize variance
• Generalized linear models (GLM)
– Approach is about 30 years old; it unifies linear and nonlinear regression models
– Response distribution is a member of the exponential family (normal, exponential, gamma, binomial, Poisson)


Generalized Linear Models
• Original applications were in the biopharmaceutical sciences
• Lots of recent interest in GLMs in industrial statistics
• GLMs are simple models that include linear regression and OLS as a special case
• Parameter estimation is by maximum likelihood (assuming that the response distribution is known)
• Inference on parameters is based on large-sample or asymptotic theory
• We will consider logistic regression, Poisson regression, then the GLM


References
• Montgomery, D. C., Peck, E. A., and Vining, G. G. (2021), Introduction to Linear Regression Analysis, 6th Edition, Wiley, New York (see Chapter 13)
• Myers, R. H., Montgomery, D. C., Vining, G. G., and Robinson, T. J. (2010), Generalized Linear Models with Applications in Engineering and the Sciences, 2nd Edition, Wiley, New York
• Hosmer, D. W. and Lemeshow, S. (2000), Applied Logistic Regression, 2nd Edition, Wiley, New York
• Lewis, S. L., Montgomery, D. C., and Myers, R. H. (2001), "Confidence Interval Coverage for Designed Experiments Analyzed with GLMs", Journal of Quality Technology 33, pp. 279-292
• Lewis, S. L., Montgomery, D. C., and Myers, R. H. (2001), "Examples of Designed Experiments with Nonnormal Responses", Journal of Quality Technology 33, pp. 265-278
• Myers, R. H. and Montgomery, D. C. (1997), "A Tutorial on Generalized Linear Models", Journal of Quality Technology 29, pp. 274-291


Binary Response Variables
• The outcome (or response, or endpoint) values 0, 1 can represent "success" and "failure"
• Occurs often in the biopharmaceutical field: dose-response studies, bioassays, clinical trials
• Industrial applications include failure analysis, fatigue testing, reliability testing
• For example, functional electrical testing on a semiconductor can yield:
– "success," in which case the device works
– "failure," due to a short, an open, or some other failure mode


Binary Response Variables
• Possible model:
y_i = β_0 + Σ_{j=1}^k β_j x_ij + ε_i = x_i'β + ε_i,  i = 1, 2, …, n;  y_i = 0 or 1
• The response y_i is a Bernoulli random variable:
P(y_i = 1) = π_i, with 0 ≤ π_i ≤ 1
P(y_i = 0) = 1 − π_i
E(y_i) = π_i = x_i'β
Var(y_i) = σ²_{y_i} = π_i(1 − π_i)


Problems With This Model
• The error terms take on only two values, so they can't possibly be normally distributed
• The variance of the observations is a function of the mean (see the previous slide)
• A linear response function could result in predicted values that fall outside the 0, 1 range, which is impossible because
0 ≤ E(y_i) = π_i = x_i'β ≤ 1


Binary Response Variables – The Challenger Data

Data for space shuttle launches and static tests prior to the launch of Challenger:

Temperature at Launch | At Least One O-ring Failure
53 | 1
56 | 1
57 | 1
63 | 0
66 | 0
67 | 0
67 | 0
67 | 0
68 | 0
69 | 0
70 | 0
70 | 1
70 | 1
70 | 1
72 | 0
73 | 0
75 | 0
75 | 1
76 | 0
76 | 0
78 | 0
79 | 0
80 | 0
81 | 0

[Scatter plot: O-Ring Fail (0/1) versus Temperature at Launch, roughly 50 to 80 °F]

Binary Response Variables
• There is a lot of empirical evidence that the response function should be nonlinear; an "S" shape is quite logical
• See the scatter plot of the Challenger data
• The logistic response function is a common choice:
E(y) = exp(x'β) / [1 + exp(x'β)] = 1 / [1 + exp(−x'β)]
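The S shape is easy to see numerically: the logistic function maps any linear-predictor value into (0, 1), symmetrically about η = 0. A minimal sketch:

```python
import numpy as np

def logistic(eta):
    # E(y) = exp(eta)/(1 + exp(eta)) = 1/(1 + exp(-eta))
    return 1.0 / (1.0 + np.exp(-eta))

eta = np.array([-5.0, 0.0, 5.0])
p_vals = logistic(eta)
print(p_vals)   # approximately [0.0067, 0.5, 0.9933]
```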


The Logistic Response Function
• The logistic response function can be easily linearized. Let η = x'β and E(y) = π.
• Define η = ln[π / (1 − π)]
• This is called the logit transformation


Logistic Regression Model
• Model: y_i = E(y_i) + ε_i, where
E(y_i) = π_i = exp(x_i'β) / [1 + exp(x_i'β)]
• The model parameters are estimated by the method of maximum likelihood (ML)


A Logistic Regression Model for the Challenger Data (Using Minitab)

Binary Logistic Regression: O-Ring Fail versus Temperature

Link Function: Logit

Response Information
Variable   Value   Count
O-Ring F   1       7   (Event)
           0       17
           Total   24

Logistic Regression Table
                                              Odds    95% CI
Predictor   Coef       SE Coef   Z      P     Ratio   Lower   Upper
Constant    10.875     5.703     1.91   0.057
Temperat    -0.17132   0.08344  -2.05   0.040  0.84    0.72    0.99

Log-Likelihood = -11.515


A Logistic Regression Model for the Challenger Data

Test that all slopes are zero: G = 5.944, DF = 1, P-Value = 0.015

Goodness-of-Fit Tests
Method            Chi-Square   DF   P
Pearson           14.049       15   0.522
Deviance          15.759       15   0.398
Hosmer-Lemeshow   11.834        8   0.159

The fitted model is
ŷ = exp(10.875 − 0.17132x) / [1 + exp(10.875 − 0.17132x)]
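Plugging the fitted coefficients from the Minitab output into this equation gives predicted failure probabilities at any temperature; 31 °F is a substantial extrapolation below the observed data, as noted on the next slide:

```python
import math

b0, b1 = 10.875, -0.17132      # fitted coefficients from the Minitab output

def p_fail(temp):
    eta = b0 + b1 * temp
    return math.exp(eta) / (1 + math.exp(eta))

print(round(p_fail(75), 3))    # a warm launch: small estimated probability
print(round(p_fail(31), 3))    # the Challenger launch temperature: near 1
```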


Logistic Regression Model for Challenger Data
Note that the fitted function has been extended down to 31 °F, the temperature at which Challenger was launched.


Maximum Likelihood Estimation in Logistic Regression
• The distribution of each observation y_i is
f_i(y_i) = π_i^{y_i} (1 − π_i)^{1 − y_i},  i = 1, 2, …, n
• The likelihood function is
L(y, β) = Π_{i=1}^n f_i(y_i) = Π_{i=1}^n π_i^{y_i} (1 − π_i)^{1 − y_i}
• We usually work with the log-likelihood:
ln L(y, β) = Σ_{i=1}^n ln f_i(y_i) = Σ_{i=1}^n y_i ln[π_i / (1 − π_i)] + Σ_{i=1}^n ln(1 − π_i)


Maximum Likelihood Estimation in Logistic Regression
• The maximum likelihood estimators (MLEs) of the model parameters are those values that maximize the likelihood (or log-likelihood) function
• ML has been around since the first part of the previous century
• It often gives estimators that are intuitively pleasing
• MLEs have nice properties: they are unbiased (for large samples), have minimum variance (or nearly so), and have an approximate normal distribution when n is large


Maximum Likelihood Estimation in Logistic Regression
• If we have n_i trials at each observation, we can write the log-likelihood as
ln L(y, β) = y'Xβ − Σ_{i=1}^n n_i ln[1 + exp(x_i'β)]
• The derivative of the log-likelihood is
∂ ln L(y, β)/∂β = X'y − Σ_{i=1}^n n_i [exp(x_i'β) / (1 + exp(x_i'β))] x_i
              = X'y − Σ_{i=1}^n n_i π_i x_i
              = X'y − X'μ   (because μ_i = n_i π_i)


Maximum Likelihood Estimation in Logistic Regression
• Setting this last result to zero gives the maximum likelihood score equations
X'(y − μ) = 0
• These equations look easy to solve… we have actually seen them before, in linear regression:
y = Xβ + ε, with E(y) = μ = Xβ
X'(y − μ) = 0 results from OLS or ML with normal errors:
since μ = Xβ, X'(y − μ) = X'(y − Xβ) = 0, so X'Xβ̂ = X'y and β̂ = (X'X)⁻¹X'y (OLS, or the normal-theory MLE)


Maximum Likelihood Estimation in Logistic Regression
• Solving the ML score equations in logistic regression isn't quite as easy, because
μ_i = n_i / [1 + exp(−x_i'β)],  i = 1, 2, …, n
• Logistic regression is a nonlinear model
• It turns out that the solution is actually fairly easy, and is based on iteratively reweighted least squares, or IRLS (see the Appendix for details)
• An iterative procedure is necessary because the parameter estimates must be updated from an initial "guess" through several steps
• Weights are necessary because the variance of the observations is not constant
• The weights are functions of the unknown parameters
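A bare-bones version of IRLS for logistic regression (ungrouped data, n_i = 1) can be written directly from the score equations. This is an illustrative sketch, not the textbook's appendix algorithm; it omits convergence checks and ignores the possibility of separation:

```python
import numpy as np

def logistic_irls(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])                # initial guess
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))
        w = pi * (1 - pi)                      # weights depend on current beta
        z = X @ beta + (y - pi) / w            # working response
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta

rng = np.random.default_rng(9)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
p_true = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))
y = (rng.uniform(size=n) < p_true).astype(float)
beta_hat = logistic_irls(X, y)
print(beta_hat)      # should land near the generating values (0.5, 1.5)
```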


Interpretation of the Parameters in Logistic Regression
• The log-odds at x is
η̂(x) = ln[π̂(x) / (1 − π̂(x))] = β̂_0 + β̂_1 x
• The log-odds at x + 1 is
η̂(x + 1) = ln[π̂(x + 1) / (1 − π̂(x + 1))] = β̂_0 + β̂_1 (x + 1)
• The difference in the log-odds is
η̂(x + 1) − η̂(x) = β̂_1


Interpretation of the Parameters in Logistic Regression
• The odds ratio is found by taking antilogs:
ÔR = Odds_{x+1} / Odds_x = e^{β̂_1}
• The odds ratio is interpreted as the estimated increase in the odds of "success" associated with a one-unit increase in the value of the predictor variable


Odds Ratio for the Challenger Data
ÔR = e^{−0.17132} = 0.84
This implies that every one-degree decrease in temperature increases the odds of O-ring failure by about 1/0.84 = 1.19, or 19 percent.
The temperature at the Challenger launch was 22 degrees below the lowest observed launch temperature, so now
ÔR = e^{22(−0.17132)} = 0.0231
This results in an increase in the odds of failure of 1/0.0231 = 43.34, or about 4200 percent!
There is a big extrapolation here, but if you had known this prior to launch, what decision would you have made?


Inference on the Model Parameters


See slide 15; Minitab calls this statistic "G".


Testing Goodness of Fit


Pearson chi-square goodness-of-fit statistic:


The Hosmer-Lemeshow goodness-of-fit statistic:


Refer to slide 15 for the Minitab output showing all three goodness-of-fit statistics for the Challenger data.


Likelihood Inference on the Model Parameters
• Deviance can also be used to test hypotheses about subsets of the model parameters (analogous to the extra-sum-of-squares method)
• Procedure:
η = X_1β_1 + X_2β_2, with p parameters in total; β_2 has r parameters
This full model has deviance λ(β)
H_0: β_2 = 0
H_1: β_2 ≠ 0
The reduced model is η = X_1β_1, with deviance λ(β_1)
The difference in deviance between the full and reduced models is
λ(β_2 | β_1) = λ(β_1) − λ(β), with r degrees of freedom
λ(β_2 | β_1) has a chi-square distribution under H_0: β_2 = 0
Large values of λ(β_2 | β_1) imply that H_0: β_2 = 0 should be rejected


Inference on the Model Parameters
• Tests on individual model coefficients can also be done using Wald inference
• Wald inference uses the result that the MLEs have an approximate normal distribution, so the distribution of
Z_0 = β̂_j / se(β̂_j)
is standard normal if the true value of the parameter is zero. Some computer programs report the square of Z_0 (which is chi-square), and others calculate the P-value using the t distribution.
See slide 14 for the Wald test on the temperature parameter for the Challenger data.


Another Logistic Regression Example: The Pneumoconiosis Data

• A 1959 article in Biometrics reported the data:


The fitted model:


Diagnostic Checking


Consider Fitting a More Complex Model


A More Complex Model
Is the expanded model useful? The Wald test on the (Years)² term indicates that the term is probably unnecessary.
Consider the difference in deviance between the full model and the reduced model: it has 1 degree of freedom, with a chi-square P-value of 0.0961.
Compare the P-values for the Wald and deviance tests.


Other Models for Binary Response Data

Logit model

Probit model

Complementary log-log model
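The three response functions can be compared numerically. A minimal sketch of the inverse links, each mapping a linear predictor η to a probability:

```python
import math

def logit_inv(eta):
    """Logistic (logit) inverse link: pi = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def probit_inv(eta):
    """Probit inverse link: pi = Phi(eta), the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def cloglog_inv(eta):
    """Complementary log-log inverse link: pi = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))
```

Logit and probit both give π = 0.5 at η = 0, while the complementary log-log gives 1 − e⁻¹ ≈ 0.632, reflecting its asymmetry.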


More than two categorical outcomes


Poisson Regression

• Consider now the case where the response is a count of some relatively

rare event:

– Defects in a unit of product

– Software bugs

– Particulate matter or some pollutant in the environment

– Number of Atlantic hurricanes

• We wish to model the relationship between the count response and one

or more regressor or predictor variables

• A logical model for the count response is the Poisson distribution

f(y) = e^(−μ) μ^y / y!,  y = 0, 1, …, and μ > 0
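A quick numerical check of this distribution, as a sketch in plain Python:

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability: f(y) = exp(-mu) * mu**y / y!"""
    return math.exp(-mu) * mu ** y / math.factorial(y)

# The probabilities sum to 1 and, as a count model requires,
# the mean of the distribution equals mu
mu = 3.0
support = range(60)
probs = [poisson_pmf(y, mu) for y in support]
mean = sum(y * p for y, p in zip(support, probs))
```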


Poisson Regression

• Poisson regression is another case where the response variance is related to the mean; in fact, in the Poisson distribution

E(y) = μ and Var(y) = μ

• The Poisson regression model is

yi = E(yi) + εi = μi + εi,  i = 1, 2, …, n

• We assume that there is a function g that relates the mean of the response to a linear predictor

g(μi) = ηi = β0 + β1xi1 + … + βkxik = xi′β


Poisson Regression

• The function g is called a link function

• The relationship between the mean of the response distribution and the linear predictor is

μi = g⁻¹(ηi) = g⁻¹(xi′β)

• Choice of the link function:

– Identity link

– Log link (very logical for the Poisson: no negative predicted values)

g(μi) = ln(μi) = xi′β

μi = g⁻¹(xi′β) = e^(xi′β)


Poisson Regression

• The usual form of the Poisson regression model is

yi = e^(xi′β) + εi,  i = 1, 2, …, n

• This is a special case of the GLM; Poisson response and a log link

• Parameter estimation in Poisson regression is essentially equivalent

to logistic regression; maximum likelihood, implemented by IRLS

• Wald (large sample) and Deviance (likelihood-based) based

inference is carried out the same way as in the logistic regression

model
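As a sketch of what IRLS does in the Poisson case, fitted to simulated data under a known model (not the book's data; numpy assumed available):

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    """Fit a log-link Poisson regression by IRLS (Fisher scoring).

    Each step solves (X'WX) delta = X'(y - mu) with W = diag(mu),
    which is the Fisher-scoring update for the Poisson log-likelihood
    under the canonical log link.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        W = mu                        # Var(y_i) = mu_i under the Poisson model
        XtWX = X.T @ (W[:, None] * X)
        score = X.T @ (y - mu)
        beta = beta + np.linalg.solve(XtWX, score)
    return beta

# Simulated data with true coefficients (0.5, 0.8), illustration only
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=500)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.5 + 0.8 * x))
beta_hat = poisson_irls(X, y)
```

The recovered coefficients land close to the simulating values, which is the point of the sketch rather than a claim about any particular software's output.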


An Example of Poisson Regression

• The aircraft damage data

• Response y = the number of locations where damage was inflicted

on the aircraft

• Regressors:

x1 = type of aircraft (0 = A-4, 1 = A-6)

x2 = bomb load (tons)

x3 = total months of crew experience


The table contains data from 30

strike missions

There is a lot of multicollinearity

in this data; the A-6 has a two-man

crew and is capable of carrying a

heavier bomb load

All three regressors tend to increase

monotonically


Based on the full model, we can remove x3

However, when x3 is removed, x1 (type of

aircraft) is no longer significant – this is not

shown, but easily verified

This is probably multicollinearity at work

Note the Type 1 and Type 3 analyses for

each variable

Note also that the P-values for the Wald

tests and the Type 3 analysis (based on

deviance) don’t agree


Let’s consider all of the subset regression models:

Deleting either x1 or x2 results in a two-variable model that is worse than the full model

Removing x3 gives a model equivalent to the full model, but as noted before, x1 is

insignificant

One of the single-variable models (x2) is equivalent to the full model


The one-variable model with x2 displays no lack of fit (deviance/df = 1.1791)

The prediction equation is

ŷ = e^(−1.6491 + 0.2282 x2)
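Evaluating this equation is straightforward; a small sketch:

```python
import math

def predicted_damage(bomb_load_tons):
    """Mean number of damage locations from the one-variable
    Poisson model: y-hat = exp(-1.6491 + 0.2282 * x2)."""
    return math.exp(-1.6491 + 0.2282 * bomb_load_tons)

# Because of the log link, each extra ton multiplies the
# predicted count by exp(0.2282), about 1.26
ratio = predicted_damage(6.0) / predicted_damage(5.0)
```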


Another Example Involving

Poisson Regression

• The mine fracture data

• The response is a count of the number of fractures in the mine

• The regressors are:

x1 = inner burden thickness (feet)

x2 = Percent extraction of the lower

previously mined seam

x3 = Lower seam height (feet)

x4 = Time in years that mine has been open


The * indicates the best model of a specific subset size

Note that the addition of a term cannot increase the deviance (reinforcing the analogy between deviance and the "usual" residual sum of squares)

To compare the model with only x1, x2, and x4 to the

full model, evaluate the difference in deviance:

38.03 – 37.86 = 0.17

with 1 df. This is not significant.
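The p-value for this 1-df chi-square comparison can be checked with a short sketch, using the identity P(χ²₁ > x) = erfc(√(x/2)):

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability for chi-square with 1 df:
    P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

# Difference in deviance from the slide: 38.03 - 37.86 = 0.17 with 1 df
p = chi2_sf_1df(38.03 - 37.86)
```

The resulting p-value is far above 0.05, agreeing with the slide's conclusion that dropping x3 costs nothing.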


There is no indication of lack of fit: deviance/df = 0.9508

The final model is:

ŷ = e^(−3.721 − 0.0015 x1 + 0.0627 x2 − 0.0317 x4)


The Generalized Linear Model

• Poisson and logistic regression are two special cases of the GLM:

– Binomial response with a logistic link

– Poisson response with a log link

• In the GLM, the response distribution must be a member of the exponential family:

f(yi, θi, φ) = exp{[yi θi − b(θi)] / a(φ) + h(yi, φ)}

φ = scale parameter

θi = natural location parameter(s)

• This includes the binomial, Poisson, normal, inverse normal, exponential, and gamma

distributions
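To see how a familiar member fits this form, take the Poisson: θ = ln μ, b(θ) = e^θ, a(φ) = 1, h(y, φ) = −ln y! (standard identifications, not spelled out on the slide). A sketch verifying the two forms agree:

```python
import math

def poisson_expfam(y, mu):
    """Poisson pmf in exponential-family form:
    exp{y*theta - b(theta) + h(y)} with theta = ln(mu),
    b(theta) = exp(theta), a(phi) = 1, h(y) = -ln(y!)."""
    theta = math.log(mu)
    return math.exp(y * theta - math.exp(theta) - math.log(math.factorial(y)))

def poisson_direct(y, mu):
    """Ordinary Poisson pmf for comparison."""
    return math.exp(-mu) * mu ** y / math.factorial(y)
```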


The Generalized Linear Model

• The relationship between the mean of the response

distribution and the linear predictor is determined by the

link function

μi = g⁻¹(ηi) = g⁻¹(xi′β)

• The canonical link is specified when

ηi = θi

• The canonical link depends on the choice of the response

distribution


Canonical Links for the GLM


Links for the GLM

• You do not have to use the canonical link; it just simplifies some of the mathematics.

• In fact, the log (non-canonical) link is very often used with

the exponential and gamma distributions, especially when

the response variable is nonnegative.

• Other links can be based on the power family (as in power-family transformations), or the complementary log-log function.


Parameter Estimation and Inference in the GLM

• Estimation is by maximum likelihood (and IRLS); for the

canonical link the score function is

X′(y − μ) = 0

• For the case of a non-canonical link,

X′Δ(y − μ) = 0,  Δ = diag(dθi/dηi)

• Wald inference and deviance-based inference is conducted

just as in logistic and Poisson regression


This is “classical data”; analyzed by

many.

y = cycles to failure, x1 = cycle length,

x2 = amplitude, x3 = load

The experimental design is a 3³ factorial

Most analysts begin by fitting a full

quadratic model using ordinary least

squares


[Box-Cox plot for power transforms: ln(Residual SS) versus λ. Current λ = 1; best λ = −0.19; 95% CI: −0.54 to 0.22. Recommended transform: log (λ = 0)]

Design-Expert V6 was used to analyze the data

A log transform is suggested


The Final Model is First-Order:

ŷ = e^(6.34 + 0.83 x1 − 0.63 x2 − 0.39 x3)

Response: Cycles    Transform: Natural log

ANOVA for Response Surface Linear Model (partial sum of squares)

| Source | Sum of Squares | DF | Mean Square | F Value | Prob > F |
|---|---|---|---|---|---|
| Model | 22.32 | 3 | 7.44 | 213.50 | < 0.0001 |
| A | 12.47 | 1 | 12.47 | 357.87 | < 0.0001 |
| B | 7.11 | 1 | 7.11 | 204.04 | < 0.0001 |
| C | 2.74 | 1 | 2.74 | 78.57 | < 0.0001 |
| Residual | 0.80 | 23 | 0.035 | | |
| Cor Total | 23.12 | 26 | | | |

Std. Dev. 0.19, Mean 6.34, C.V. 2.95, R-Squared 0.9653, Adj R-Squared 0.9608, Pred R-Squared 0.9520, PRESS 1.11, Adeq Precision 51.520

| Factor | Coefficient Estimate | DF | Standard Error | 95% CI Low | 95% CI High |
|---|---|---|---|---|---|
| Intercept | 6.34 | 1 | 0.036 | 6.26 | 6.41 |
| A | 0.83 | 1 | 0.044 | 0.74 | 0.92 |
| B | −0.63 | 1 | 0.044 | −0.72 | −0.54 |
| C | −0.39 | 1 | 0.044 | −0.48 | −0.30 |
[Figure: contour plot of ln(Cycles) and response surface of Cycles versus factors A and B, with actual factor C = −1.00]

Contour plot (log cycles) & response surface (cycles)
A GLM for the Worsted Yarn Data
• We selected a gamma response distribution with a log link
• The resulting GLM (from SAS) is
ŷ = e^(6.3489 + 0.8425 x1 − 0.6313 x2 − 0.3851 x3)
• Model is adequate; little difference between GLM & OLS
• Contour plots (predictions) very similar
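The similarity of the two fits is easy to confirm numerically; a sketch comparing the back-transformed OLS model with the gamma GLM at a couple of design points:

```python
import math

def yhat_ols(x1, x2, x3):
    """Back-transformed least-squares fit to ln(cycles)."""
    return math.exp(6.34 + 0.83 * x1 - 0.63 * x2 - 0.39 * x3)

def yhat_glm(x1, x2, x3):
    """Gamma GLM with log link (coefficients from the SAS fit)."""
    return math.exp(6.3489 + 0.8425 * x1 - 0.6313 * x2 - 0.3851 * x3)

# Relative difference at the center and at a corner of the design
rel_center = abs(yhat_ols(0, 0, 0) - yhat_glm(0, 0, 0)) / yhat_glm(0, 0, 0)
rel_corner = abs(yhat_ols(1, -1, -1) - yhat_glm(1, -1, -1)) / yhat_glm(1, -1, -1)
```

Both differences come out within a few percent, consistent with the "very similar" contour plots.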
The SAS PROC GENMOD output for the worsted yarn experiment, assuming a first-order model in the linear predictor
Scaled deviance divided by
df is the appropriate lack of
fit measure in the gamma
response situation
Comparison of the OLS and GLM
Models
A GLM for the Worsted Yarn Data
• Confidence intervals on the mean response are uniformly
shorter from the GLM than from least squares
• See Lewis, S. L., Montgomery, D. C., and Myers, R. H.
(2001), “Confidence Interval Coverage for Designed
Experiments Analyzed with GLMs”, JQT, 33, pp. 279-292
• While point estimates are very similar, the GLM provides
better precision of estimation
Residual Analysis in the GLM
• Analysis of residuals is important in any model-fitting procedure
• The ordinary or raw residuals are not the best choice for the GLM,
because the approximate normality and constant variance
assumptions are not satisfied
• Typically, deviance residuals are employed for model adequacy
checking in the GLM.
• The deviance residuals are the square roots of the contribution to the deviance from each observation, multiplied by the sign of the corresponding raw residual:

rDi = √di · sign(yi − ŷi)
Deviance Residuals:

• Logistic regression:

di = 2{ yi ln[yi / (ni π̂i)] + (ni − yi) ln[(1 − yi/ni) / (1 − π̂i)] },  π̂i = 1 / (1 + e^(−xi′β̂))

• Poisson regression:

di = 2{ yi ln(yi / e^(xi′β̂)) − (yi − e^(xi′β̂)) }
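A sketch of the Poisson deviance residuals (numpy assumed available; the y·ln(y/μ̂) term is taken as 0 when y = 0):

```python
import numpy as np

def poisson_deviance_residuals(y, mu_hat):
    """Deviance residuals for Poisson regression:
    r_Di = sign(y - mu_hat) * sqrt(d_i), where
    d_i = 2 * [y * ln(y / mu_hat) - (y - mu_hat)],
    with y * ln(y / mu_hat) defined as 0 when y = 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu_hat, dtype=float)
    # Guard the log so y = 0 contributes 0 to the first term
    term = np.where(y > 0, y * np.log(np.where(y > 0, y / mu, 1.0)), 0.0)
    d = 2.0 * (term - (y - mu))
    return np.sign(y - mu) * np.sqrt(np.maximum(d, 0.0))

r = poisson_deviance_residuals([0, 2, 5], [1.0, 2.0, 3.0])
```

An exact fit (y = μ̂) gives a residual of 0; under- and over-prediction give negative and positive residuals, respectively.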
Deviance Residual Plots
• Deviance residuals behave much like ordinary residuals in normal-theory linear models
• Normal probability plot is appropriate
• Plot versus fitted values, usually transformed to the constant-information scale:

Normal responses: ŷi

Binomial responses: 2 sin⁻¹(√ŷi)

Poisson responses: 2√ŷi

Gamma responses: 2 ln(ŷi)
Deviance Residual Plots for the Worsted Yarn Experiment
Overdispersion
• Occurs occasionally with Poisson or binomial data
• The variance of the response is greater than one would anticipate
based on the choice of response distribution
• For example, in the Poisson distribution, we expect the variance to be
approximately equal to the mean – if the observed variance is greater,
this indicates overdispersion
• Diagnosis – if deviance/df greatly exceeds unity, overdispersion may
be present
• There may be other reasons for deviance/df to be large, such as a poorly specified model, missing regressors, etc. (the same things that cause the mean square for error to be inflated in ordinary least squares modeling)
Overdispersion
• The most direct way to model overdispersion is with a multiplicative dispersion parameter, say φ, where

Var(y) = φ nπ(1 − π), binomial

Var(y) = φλ, Poisson

• A logical estimate for φ is deviance/df

• Unless overdispersion is accounted for, the standard errors will be too small.

• The adjustment consists of multiplying the standard errors by √(deviance/df)
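A sketch of the adjustment: the square-root factor is used so that variances scale by φ̂ = deviance/df.

```python
import math

def adjusted_se(se, deviance, df):
    """Inflate a standard error for overdispersion: the variance
    is scaled by phi-hat = deviance/df, so the SE is multiplied
    by its square root."""
    return se * math.sqrt(deviance / df)

# With deviance/df = 4.234, as in the wave-soldering example,
# every standard error grows by the factor sqrt(4.234) = 2.0577
factor = math.sqrt(4.234)
```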
The Wave-Soldering Experiment
• Response is the number of defects
• Seven design variables:
– A = prebake condition
– B = flux density
– C = conveyor speed
– D = preheat condition
– E = cooling time
– F = ultrasonic solder agitator
– G = solder temperature
The Wave-Soldering Experiment
One observation has been
discarded, as it was
suspected to be an outlier
This is a resolution IV
design
The Wave-Soldering Experiment
5 of 7 main effects significant; AC, AD, BC,
and BD also significant
Overdispersion is a possible problem, as
deviance/df is large
Overdispersion causes standard errors to be
underestimated, and this could lead to
identifying too many effects as significant
√(deviance/df) = √4.234 = 2.0577
After adjusting for overdispersion,
fewer effects are significant
C, G, AC, and BD are the important factors, assuming a 5% significance level
Note that the standard errors are larger
than they were before, having been
multiplied by √(deviance/df) = √4.234 = 2.0577
The Edited
Model for the
Wave-Soldering
Experiment
Generalized Linear Models
• The GLM is a unification of linear and nonlinear models that can
accommodate a wide variety of response distributions
• Can be used with both regression models and designed experiments
• Computer implementations in Minitab, JMP, SAS (PROC GENMOD), SPlus
• Logistic regression available in many basic packages
• GLMs are a useful alternative to data transformation, and should always be
considered when data transformations are not entirely satisfactory
• Unlike data transformations, GLMs directly attack the unequal variance
problem and use the maximum likelihood approach to account for the form
of the response distribution
Chapter 15
Other Topics in the use of
Regression Analysis
15.1.1 Need for Robust Regression
Finding the MLEs involves minimizing the sum of the absolute
errors, not the squared errors
15.1.2 M-Estimators
Example 15.1 The Stack Loss Data
Properties of Robust Estimators
15.2 Effect of Measurement Errors in the Regressors
15.3 Inverse Estimation – The Calibration Problem
Example 15.2 Thermocouple Calibration
15.4 Bootstrapping in Regression
15.4.1 Bootstrap Sampling in Regression
15.4.2 Bootstrap Confidence Intervals
Example 15.3 The Delivery Time Data
Example 15.4 The Puromycin Data
15.7 Designed Experiments for Regression
Average or Integrated Prediction Variance:
Designs for Second-Order Models
Standard designs include the central composite design
and the Box-Behnken design
Software will construct D and I optimal designs
Design efficiencies; D, G, and I
Regression Methods/Analysis HW #4 (7 questions on 2 pages)
June 29, 2022
1. (18 pts) Answer True or False. If your answer is False, explain why it is False.
a. If least squares lines are fitted to a data set using two sets of explanatory variables of the same size, we should choose the model with larger R². __________
b. For a multiple regression, the log transformation of the response variable is a possible technique that can be used to reduce
multicollinearity. __________
c. The goal of cross-validation of a fitted model is to simulate the use of observations not in the data set used to build the fitted
model. __________
d. Overfitting is more likely in larger data sets than in smaller data sets. __________
e. We can fix overfitting by adding predictors to the model. __________
f. A stepwise regression procedure should still be performed even if the number of variables is small, say k
