This section describes each case statistic. The values in this report table are used to produce the diagnostic graphs.

**Run Order**: The randomized order for the experiments.

**Actual Value**: The measured response data for this particular run, y_{i}.

**Predicted Value**: The value predicted from the model, generated using the
prediction equation. It includes block and center-point corrections when they
are part of the design.

\(\hat{\bar{Y}} = X\hat{\beta}\)

**Residual**: Difference between Actual and Predicted values for each point.

\(e = Y - \hat{\bar{Y}}\)

**Leverage**: Leverage of a point varies from 0 to 1 and indicates how much an
individual design point influences the model’s predicted values. A leverage of 1
means the predicted value at that particular case will exactly equal the
observed value of the experiment, i.e., the residual will be 0. The sum of
leverage values across all cases equals the number of coefficients (including
the constant) fit by the model. The maximum leverage an experiment can have is
1/k, where k is the number of times the experiment is replicated.

\(H = X(X^T X)^{-1}X^T\)

\(Leverage = diag(H)\)

where **X** is the model matrix, with one row for each run in the design (n)
and one column for each term in the model (p). **H** is therefore an n x n
symmetric matrix, often called the hat matrix. The diagonal elements of
**H** are the leverages. Leverage represents the fraction of the error
variance associated with the point estimate that is carried into the model. A
leverage of 1 means that any error (experimental, measurement, etc.) associated
with an observation is carried into the model and included in the prediction.
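The hat-matrix formulas above can be sketched in numpy. The small design below (a 2^2 factorial plus one center point, with an intercept column) is invented for illustration:

```python
import numpy as np

# Invented design: 2^2 factorial plus one center point, intercept first.
X = np.array([[1.0, -1.0, -1.0],
              [1.0,  1.0, -1.0],
              [1.0, -1.0,  1.0],
              [1.0,  1.0,  1.0],
              [1.0,  0.0,  0.0]])   # center point

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix, n x n
leverage = np.diag(H)

print(leverage.round(3))
# The leverages sum to p, the number of model terms (3 here).
print(leverage.sum())
```

Note that the replicate-free corner points carry much higher leverage than the center point, and the leverages sum to the number of coefficients, as stated above.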

**Internally Studentized Residual**: The residual divided by the estimated
standard deviation (Std Dev) of that residual. It measures the number of standard
deviations separating the actual and predicted values.

\(r_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - Leverage_i}}\)
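A minimal numpy sketch of this formula, using invented data (the X and y below are illustrative only):

```python
import numpy as np

# Invented one-factor design with intercept, for illustration only.
X = np.array([[1.0, -1.0],
              [1.0,  0.0],
              [1.0,  1.0],
              [1.0,  2.0]])
y = np.array([1.0, 2.1, 2.9, 4.2])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares fit
e = y - X @ beta                               # raw residuals
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)  # leverages
n, p = X.shape
sigma2 = (e @ e) / (n - p)                     # residual mean square

r = e / np.sqrt(sigma2 * (1.0 - h))            # internally studentized
print(r.round(3))
```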

**Externally Studentized Residual (a.k.a. Outlier t-value, RStudent)**:
Calculated by leaving each run out of the analysis, one at a time, and
estimating the response from the remaining runs. (See Weisberg page 115.) The
t-value is the number of standard deviations between this predicted value and
the actual response. It tests whether the run in question follows the model
with coefficients estimated from the rest of the runs, that is, whether this
run is consistent with the rest of the data for this model. Runs with large
t-values should be investigated.

\(\hat{\sigma}^2_{(-i)} = \frac{(n - p)\cdot{\hat{\sigma}^2} - {\frac{e^2_i}{(1 - Leverage_i)}}}{n - p - 1}\)

\(t_i = \frac{e_i}{\sqrt{{\hat{\sigma}^2_{(-i)}}(1 - Leverage_i)}}\)

where n is the total number of runs and p is the number of terms in the model, including the intercept. The deletion variance can also be computed by brute force, refitting the model with each run left out one at a time.
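The closed-form deletion variance can be checked against the brute-force refits directly. A sketch with invented data:

```python
import numpy as np

# Invented one-factor design with intercept, for illustration only.
X = np.array([[1.0, -1.0],
              [1.0,  0.0],
              [1.0,  1.0],
              [1.0,  2.0],
              [1.0,  3.0]])
y = np.array([0.9, 2.0, 3.2, 3.8, 6.0])
n, p = X.shape

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
sigma2 = (e @ e) / (n - p)

# Closed-form deletion variance and outlier t-values
s2_del = ((n - p) * sigma2 - e**2 / (1 - h)) / (n - p - 1)
t = e / np.sqrt(s2_del * (1 - h))

# Brute force: refit with run i left out and compare
for i in range(n):
    Xi, yi = np.delete(X, i, axis=0), np.delete(y, i)
    bi, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
    ei = yi - Xi @ bi
    assert np.isclose(s2_del[i], (ei @ ei) / ((n - 1) - p))
print(t.round(3))
```

The assertion confirms that the closed form reproduces the residual mean square of each leave-one-out fit exactly.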

**DFFITS**: Measures the influence the i^{th} observation has on the predicted value.
(See Myers page 284.) It is the studentized difference between the predicted
value with observation i and the predicted value without observation i:

\(\hat{Y}_{(-i)} = Y - \frac{e}{1 - Leverage}\)

\(DFFITS = \frac{\hat{\bar{Y}} - \hat{Y}_{(-i)}}{\sqrt{\hat{\sigma}^2_{(-i)} \cdot Leverage}}\)
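A sketch of this calculation with invented data, cross-checking the closed-form deleted prediction for run 0 against a direct refit:

```python
import numpy as np

# Invented one-factor design with intercept, for illustration only.
X = np.array([[1.0, -1.0],
              [1.0,  0.0],
              [1.0,  1.0],
              [1.0,  2.0],
              [1.0,  4.0]])
y = np.array([1.1, 1.9, 3.1, 4.0, 7.2])
n, p = X.shape

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ beta
e = y - yhat
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
s2 = (e @ e) / (n - p)
s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)

yhat_del = y - e / (1 - h)                       # deleted predictions
dffits = (yhat - yhat_del) / np.sqrt(s2_del * h)

# Direct refit without run 0 reproduces the deleted prediction there
X0, y0 = np.delete(X, 0, axis=0), np.delete(y, 0)
b0, *_ = np.linalg.lstsq(X0, y0, rcond=None)
assert np.isclose(yhat_del[0], X[0] @ b0)
print(dffits.round(3))
```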

**DFBETAS**: Not shown on the report, but present on the diagnostic graphs.
This statistic is calculated for each coefficient at each run; the influence
tool has a pull-down to pick which term's graph is shown. DFBETAS shows the
influence the i^{th} observation has on each regression coefficient. (See
Myers page 284.) DFBETAS_{j,(-i)} is the number of standard errors the
j^{th} coefficient changes if the i^{th} observation is removed.

\(DFBETAS_{j, (-i)} = \frac{\hat{\beta}_j - \hat{\beta}_{j,(-i)}}{\sqrt{\hat{\sigma}^2_{(-i)} \cdot (X^T X)^{-1}_{jj}}}\)

A large DFBETAS_{j,(-i)} value indicates that the i^{th} observation
has extra influence on the j^{th} regression coefficient.
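A sketch of DFBETAS computed by direct refits with each run removed (the data are invented for illustration):

```python
import numpy as np

# Invented one-factor design with intercept, for illustration only.
X = np.array([[1.0, -1.0],
              [1.0,  0.0],
              [1.0,  1.0],
              [1.0,  2.0],
              [1.0,  4.0]])
y = np.array([1.1, 1.9, 3.1, 4.0, 7.2])
n, p = X.shape

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
s2 = (e @ e) / (n - p)
s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
C = np.linalg.inv(X.T @ X)       # (X'X)^-1; diagonal scales each coefficient

dfbetas = np.empty((n, p))
for i in range(n):
    Xi, yi = np.delete(X, i, axis=0), np.delete(y, i)
    beta_i, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
    dfbetas[i] = (beta - beta_i) / np.sqrt(s2_del[i] * np.diag(C))
print(dfbetas.round(3))   # one row per run, one column per coefficient
```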

**Cook’s Distance**: A measure of how much the regression would change if the case
is omitted from the analysis. Relatively large values are associated with cases
with high leverage and large studentized residuals. Cases with large Di values
relative to the other cases should be investigated - they could be caused by
recording errors, an incorrect model, or a design point far from the remaining
cases.

Cook’s distance (D_{i}) is a product of the square of the i^{th}
internally studentized residual and a monotonic function of the leverage:

\(D_i = \frac{r^2_i}{p}\left(\frac{Leverage_i}{1 - Leverage_i}\right)\)

A large value in D may be due to large r, large leverage, or both.

Cook’s distance can be thought of as the average squared difference between the
predictions from the full data set and those from a reduced data set (deleting
the i^{th} observation), compared to the error mean square of the fitted
model. An equivalent interpretation of D is as a standardized, weighted
distance between the vector of regression coefficients obtained from the full
model and the vector obtained after deleting the i^{th} case. If the value of
D is substantially less than 1, deleting the i^{th} case will not change the
estimates of the regression coefficients very much.

In a perfectly balanced orthogonal array, Cook’s distance and the externally studentized residual are directly related and thus give the same information. In general regression problems, the two statistics can contain considerably different information; in other words, different runs may be identified for investigation.
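A sketch of Cook's distance with invented data, taking p as the number of model terms including the intercept, and cross-checking run 0 against the coefficient-distance interpretation described above:

```python
import numpy as np

# Invented one-factor design with intercept, for illustration only.
X = np.array([[1.0, -1.0],
              [1.0,  0.0],
              [1.0,  1.0],
              [1.0,  2.0],
              [1.0,  4.0]])
y = np.array([1.1, 1.9, 3.1, 4.0, 7.2])
n, p = X.shape

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
s2 = (e @ e) / (n - p)
r = e / np.sqrt(s2 * (1 - h))      # internally studentized residuals

D = (r**2 / p) * (h / (1 - h))     # Cook's distance

# Coefficient-distance interpretation, checked for run 0:
X0, y0 = np.delete(X, 0, axis=0), np.delete(y, 0)
b0, *_ = np.linalg.lstsq(X0, y0, rcond=None)
d = beta - b0
assert np.isclose(D[0], d @ (X.T @ X) @ d / (p * s2))
print(D.round(3))
```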

**Standard Order**: A conventional “textbook” ordering of the array of low and
high factor levels.

For further reading:

Christensen, Pearson, and Johnson. Case-deletion diagnostics for mixed models.
*Technometrics*, 34(1):38–45, 1992.

Raymond H. Myers. *Classical and Modern Regression with Applications*. Duxbury
Press, 1986.

Sanford Weisberg. *Applied Linear Regression*. John Wiley & Sons, Inc., 1985.