Response Transformations

Transformation of the response is an important component of any data analysis. Transformation is needed if the error (residuals) is a function of the magnitude of the response (predicted values). Design-Expert provides extensive diagnostic capabilities to check if the statistical assumptions underlying the data analysis are met. The normal plot of the residuals tests their normality. The residuals versus predicted response values plot will indicate a problem if a pattern exists. Unless the ratio of the maximum response to the minimum response is large, transforming the response will not make much difference.
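The max-to-min ratio mentioned above is easy to compute before looking at any diagnostic plot. The following is a minimal sketch in Python, not part of Design-Expert; the function name `response_ratio` and the sample data are hypothetical, and what counts as a "large" ratio is left to the analyst.

```python
def response_ratio(responses):
    """Return max/min for a column of strictly positive responses."""
    lo, hi = min(responses), max(responses)
    if lo <= 0:
        # The ratio is not meaningful when the response can be zero or negative.
        raise ValueError("ratio requires strictly positive responses")
    return hi / lo


# Hypothetical response columns: one narrow, one spanning several decades.
narrow = [48.0, 52.5, 50.1, 55.0]
wide = [1.2, 15.0, 140.0, 980.0]

print("narrow ratio:", response_ratio(narrow))
print("wide ratio:", response_ratio(wide))
```

A response like `wide` is the kind for which a transformation is likely to change the analysis, while for `narrow` it will probably make little difference.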

The Box-Cox plot, available from the Diagnostics button, recommends a transformation from the power family. It cannot recommend the two non-power-law transformations, the logit for bounded data and the arcsine square root for proportions; these must be chosen based on the type of response. When the response is a proportion, the Box-Cox plot will often recommend a square-root transformation instead, and when the response is bounded it will often recommend the log transformation.

Design-Expert provides a broad range of possible transformations. Most are from the power family; the two additional transformations are the logit and the arcsine square root.

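Most data transformations can be described by the power function: the standard deviation σ is a function of the mean μ raised to a power α. If the standard deviation associated with an observation is proportional to the mean raised to the α power, then transforming the observation by the 1 − α (or λ) power gives a scale satisfying the equal variance requirement of the statistical model.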
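One way to see why a power transformation can equalize the variance is a first-order (delta-method) approximation. The symbols below are notation assumed for illustration: α is the power relating the standard deviation to the mean, λ is the power applied to the response, and c is a proportionality constant. The result is approximate, not exact.

\[\sigma_{Y} = c\,\mu^{\alpha} \quad\Rightarrow\quad \sigma_{Y^{\lambda}} \approx \left|\frac{d\,\mu^{\lambda}}{d\mu}\right|\,\sigma_{Y} = |\lambda|\,\mu^{\lambda-1}\,c\,\mu^{\alpha} = |\lambda|\,c\,\mu^{\lambda+\alpha-1}\]

The right-hand side no longer depends on μ when λ = 1 − α. For example, α = 1/2 leads to the square root (λ = 1/2), and α = 1 leads to the limiting case λ = 0, which is taken to be the log.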

The appropriate choice of a response transformation relies on subject-matter knowledge and/or statistical considerations. The available transformations and examples for their use are:

Power Law/Standard

Square Root – count, frequency data

Natural log – variance or growth data

Base 10 log – variance or growth data

Inverse square root

Inverse – rate/time, decay rate

Power – for more extreme transformation needs

The power transformation allows transformation to any power in the range –3 to +3, provided the data are positive. You may add a constant to the data to avoid powers of negative numbers. If the standard deviation associated with an observation is proportional to the mean raised to some power, then transforming the observation by a power gives a scale satisfying the equal variance requirement of the ANOVA. The Box-Cox plot is provided in the Diagnostics plots to help you choose an appropriate power transformation.
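The idea behind a Box-Cox recommendation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Design-Expert's implementation: it uses the standard Box-Cox profile log-likelihood for a mean-only model (the software would use the residuals of the fitted model), and the names `boxcox_transform`, `boxcox_loglik`, and `best_lambda` are hypothetical. The grid spans the –3 to +3 range quoted above.

```python
import math


def boxcox_transform(y, lam):
    """Box-Cox power transform; lam = 0 is defined as the natural log."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def boxcox_loglik(y, lam):
    """Profile log-likelihood of lam, assuming a mean-only model."""
    n = len(y)
    z = boxcox_transform(y, lam)
    mean_z = sum(z) / n
    var_z = sum((v - mean_z) ** 2 for v in z) / n
    if var_z == 0.0:
        raise ValueError("transformed data have zero variance")
    # The second term is the Jacobian of the transformation.
    return -0.5 * n * math.log(var_z) + (lam - 1.0) * sum(math.log(v) for v in y)


def best_lambda(y, lo=-3.0, hi=3.0, step=0.05):
    """Grid search for the power with the highest log-likelihood."""
    steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(steps + 1)]
    return max(grid, key=lambda lam: boxcox_loglik(y, lam))


# Hypothetical data that are symmetric on the log scale.
y = [math.exp(x / 4.0) for x in range(-8, 9)]
print("suggested lambda:", best_lambda(y))
```

For data like these the search should land at or very near 0, which points to the log transformation. In practice the recommended power is usually rounded to a familiar value (–1, –0.5, 0, 0.5, 1) when that value is statistically plausible.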

Logistic Regression

Logistic regression analysis estimates the odds of an event.

Logistic regression models the odds (or chance) of an outcome based on input factors. Because odds is a ratio, what will actually be modeled is the logarithm of the odds given by:

\[Logit(p) = \log_{e}\left(\frac{p(y=1)}{1 - p(y=1)}\right) = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \cdots + \beta_{k}x_{k}\]
\[\hat{p}=\frac{e^{\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\cdots+ \beta_{k}x_{k}}}{1+e^{\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\cdots+ \beta_{k}x_{k}}}=\frac{1}{1+e^{-(\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\cdots+ \beta_{k}x_{k})}}\]

We fit a model (z) for Logit(p):

\[z = \beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\cdots+ \beta_{k}x_{k}\]

Then we apply the inverse transformation:

\[\hat{p} = \frac{1}{1+e^{-(z)}}\]
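The two steps above (evaluate the linear model z, then invert the logit) can be sketched in Python. This is a minimal illustration only: the coefficients are hypothetical rather than fitted, the function names are assumptions, and estimating the β values (typically by maximum likelihood) is not shown.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def linear_predictor(betas, x):
    """z = beta_0 + beta_1*x_1 + ... + beta_k*x_k; betas[0] is the intercept."""
    return betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))


def inverse_logit(z):
    """p = 1 / (1 + exp(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


betas = [-1.0, 0.8, 0.5]   # hypothetical coefficients, not fitted values
x = [1.0, 2.0]             # hypothetical factor settings
z = linear_predictor(betas, x)
p_hat = inverse_logit(z)
print("z =", z, "p_hat =", p_hat)
```

Because `inverse_logit` always returns a value between 0 and 1, predictions on the probability scale stay within the valid range regardless of how extreme z becomes.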

Poisson Regression

Poisson regression is used to model count data.

Special Cases

Logit

The logit transformation is used when the response has lower and upper physical limits that cannot be reached. One example is the yield of a chemical reaction: the physical bounds are 0% and 100%, but in practice the actual yields will not quite reach 100% due to impurities, energy loss, etc. The logit transform spreads out the values near the boundaries. When using this transformation, it is very important to set the lower and upper limits to the natural limits of the response.

\[\log_{e}\left[\frac{Y - \text{lower limit of } Y}{\text{upper limit of } Y - Y}\right]\]
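As a small illustration of the formula, the Python sketch below applies the bounded logit to yield values with limits of 0 and 100 and maps a transformed value back to the original scale. The function names and data are hypothetical, and the inverse is derived algebraically from the formula rather than taken from the software.

```python
import math


def bounded_logit(y, lower, upper):
    """log((y - lower) / (upper - y)) for lower < y < upper."""
    if not lower < y < upper:
        raise ValueError("y must lie strictly between the limits")
    return math.log((y - lower) / (upper - y))


def inverse_bounded_logit(t, lower, upper):
    """Map a transformed value back to the original (lower, upper) scale."""
    return lower + (upper - lower) / (1.0 + math.exp(-t))


# Hypothetical yields in percent, with natural limits 0 and 100.
for y in [50.0, 90.0, 99.0, 99.9]:
    print(y, "->", bounded_logit(y, 0.0, 100.0))
```

The printed values show the spreading effect near a boundary: equal-looking steps toward 100% become progressively larger steps on the transformed scale. A value exactly at either limit cannot be transformed, which is why the limits must be the true unreachable bounds.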

Arcsine square root

The arcsine square root should be used for proportion data, that is, fractions between 0 and 1 inclusive. The assumption is that the settings of each run generate a batch of size n, and that each individual member of the batch has a binomial outcome, either passing or failing a specified criterion.

\[\arcsin\left(\sqrt{Y}\right)\]
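A minimal Python sketch of this transformation and its inverse follows. The function names and the pass counts are hypothetical; the result is in radians, and the inverse shown is valid for transformed values between 0 and π/2.

```python
import math


def arcsine_sqrt(p):
    """arcsin(sqrt(p)) for a proportion 0 <= p <= 1, in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie between 0 and 1 inclusive")
    return math.asin(math.sqrt(p))


def inverse_arcsine_sqrt(t):
    """Back-transform to a proportion; valid for 0 <= t <= pi/2."""
    return math.sin(t) ** 2


# Hypothetical runs: number passing out of a batch of n = 20.
batch_size = 20
passes = [0, 5, 10, 18, 20]
for k in passes:
    p = k / batch_size
    print(p, "->", arcsine_sqrt(p))
```

Unlike the logit, this transformation is defined at exactly 0 and 1, so runs in which every unit passes or every unit fails can still be analyzed.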
