Cook’s Distance

Cook’s distance measures how much the fitted regression changes when a single case (run) is deleted. Relatively large values are associated with cases that combine high leverage with large studentized residuals.

An equivalent interpretation is as a standardized, weighted distance between two vectors of regression coefficients: the one estimated from the full data set and the one estimated with the i-th run removed.
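The two interpretations above can be checked numerically. The following is a minimal sketch, assuming an ordinary least-squares fit and using NumPy; the function names and the small data set are hypothetical and are not taken from the software. The first function uses the leverage and residual form, and the second refits the model with each run removed and measures the shift in the coefficient vector.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance from leverages and residuals (ordinary least squares).

    X is the n-by-p model matrix, including the intercept column.
    y is the response vector of length n.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    # Leverage of each run: diagonal of the hat matrix X (X'X)^-1 X'.
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    mse = resid @ resid / (n - p)
    return resid**2 / (p * mse) * h / (1.0 - h) ** 2


def cooks_distance_by_deletion(X, y):
    """Cook's distance as a scaled distance between coefficient vectors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_full
    mse = resid @ resid / (n - p)  # full-data mean squared error
    xtx = X.T @ X
    d = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop the i-th run and refit
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        diff = beta_i - beta_full
        d[i] = diff @ xtx @ diff / (p * mse)
    return d


# Hypothetical data: a straight-line model where the last run sits far
# from the other design points and off the trend of the rest.
x = np.array([1, 2, 3, 4, 5, 6, 7, 20], dtype=float)
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.2, 12.0])
X = np.column_stack([np.ones_like(x), x])  # intercept + slope, so p = 2

d_formula = cooks_distance(X, y)
d_deletion = cooks_distance_by_deletion(X, y)
print(np.round(d_formula, 3))
print("most influential run (0-based index):", int(np.argmax(d_formula)))
```

Both functions should return the same values up to rounding error. In this made-up data set the last run stands out because it has high leverage.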

Large values should be investigated – they could be caused by recording errors, an incorrect model, or a design point far from the remaining cases.

A value is “large” when it exceeds the red line on the plot. The line is set at the smaller of 1 and the F critical value at an alpha of 0.5 (the median of the F distribution) with p and n − p degrees of freedom, where p is the number of terms in the model, including the intercept, and n is the number of runs.

$$\min\left(F^{-1}_{(0.5,\,p,\,n-p)},\; 1\right)$$
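As an illustration of this limit, here is a minimal sketch that evaluates it with SciPy's `scipy.stats.f.ppf` (the inverse CDF of the F distribution). The function name `cooks_distance_limit` and its argument names are hypothetical; only the formula comes from the text above.

```python
from scipy.stats import f


def cooks_distance_limit(n_runs, n_terms):
    """Smaller of 1 and the F quantile at 0.5 with (p, n - p) degrees of freedom.

    n_terms is p, the number of model terms including the intercept.
    n_runs is n, the number of runs.
    """
    f_median = f.ppf(0.5, dfn=n_terms, dfd=n_runs - n_terms)
    return min(float(f_median), 1.0)


# Example: 8 runs and a straight-line model (intercept + slope, p = 2).
limit_small_model = cooks_distance_limit(n_runs=8, n_terms=2)

# Example: 12 runs and 10 terms; here the F median exceeds 1, so the cap applies.
limit_large_model = cooks_distance_limit(n_runs=12, n_terms=10)

print(limit_small_model, limit_large_model)
```

The cap at 1 only matters when the model has many terms relative to the residual degrees of freedom. Otherwise the F median falls below 1 and sets the limit.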

If more than one outlier emerges from other diagnostics plots, Cook’s Distance can be used to prioritize which runs to investigate first.

Note

Never ignore a run just because the diagnostics plots indicate it may be a problem. Verify that the data is wrong in some way before ignoring it.