## Greg's DOE Adventure: Important Statistical Concepts behind DOE

posted by Greg on May 3, 2019

If you read my previous post, you will remember that design of experiments (DOE) is a systematic method used to find cause and effect. That systematic method includes a lot of (frightening music here!) statistics.

[I’ll be honest here. I was a biology major in college. I was forced to take a statistics course or two. I didn’t really understand why I had to take it. I also didn’t understand what was being taught. I know a lot of others who didn’t understand it as well. But it’s now starting to come into focus.]

Before getting into the concepts of DOE, we must get into the basic concepts of statistics (as they relate to DOE).

Basic Statistical Concepts:

Variability
In an experiment or process, you have inputs you control, the output you measure, and uncontrollable factors that influence the process (things like humidity). These uncontrollable factors (along with other things like sampling differences and measurement error) are what lead to variation in your results.

Mean/Average
We all pretty much know what this is right? Add up all your scores, divide by the number of scores, and you have the average score.

Normal distribution
Also known as a bell curve due to its shape. The peak of the curve is the average, and then it tails off to the left and right.

Variance
Variance is a measure of the variability in a system (see above). Let’s say you have a bunch of data points for an experiment. You can find the average of those points (above). For each data point subtract that average (so you see how far away each piece of data is away from the average). Then square that. Why? That way you get rid of the negative numbers; we only want positive numbers. Why? Because the next step is to add them all up, and you want a sum of all the differences without negative numbers getting in the way. Now divide that number by the number of data points you started with. You are essentially taking an average of the squares of the differences from the mean.

That is your variance. Summarized by the following equation:

$$s^2 = \frac{\Sigma(Y_i - \bar{Y})^2}{(n - 1)}$$

In this equation:

Yi is a data point
Ȳ is the average of all the data points
n is the number of data points

Standard Deviation
Take the square root of the variance. The variance is the average of the squares of the differences from the mean. Now you are taking the square root of that number to get back to the original units. One item I just found out: even though standard deviations are in the original units, you can’t add and subtract them. You have to keep it as variance (s2), do your math, then convert back.

## Choosing the Best Design for Process Optimization

posted by Shari on Aug. 29, 2017

Ever wonder what the difference is between the various response surface method (RSM) optimization design options? To help you choose the best design for your experiment, I’ve put together a list of things you should know about each of the three primary response surface designs—Central Composite, Box-Behnken, and Optimal.

Central Composite Design (CCD)

• Developed for estimating a quadratic model
• Created from a two-level factorial design, and augmented with center points and axial points
• Relatively insensitive to missing data
• Features five levels for each factor (Note: The number of levels can be reduced by choosing alpha=1.0, a face-centered CCD which has only three levels for each factor.)
• Provides excellent prediction capability near the center (bullseye) of the design space

Box-Behnken Design (BBD)

• Created for estimating a quadratic model
• Requires only three levels for each factor
• Requires specific positioning of design points
• Provides strong coefficient estimates near the center of the design space, but falls short at the corners of the cube (no design points there)
• BBD vs CCD: If you end up missing any runs, the accuracy of the remaining runs in the BBD becomes critical to the dependability of the model, so go with the more robust CCD if you often lose runs or mismeasure responses.

Optimal Design

• Customize for fitting a linear, quadratic or cubic model (Note: In Design-Expert® software you can change the user preferences to get up to a 6th order model.)
• Produce many levels when augmented as suggested by Design-Expert, but these can be limited by choosing the discrete factor option
• Design points are positioned mathematically according to the number of factors and the desired model, therefore the points are not at any specific positions—they are simply spread out in the design space to meet the optimality criteria, particularly when using the coordinate exchange algorithm
• The default optimality for “I” chooses points to minimize the integral of the prediction variance across the design space, thus providing good response estimation throughout the experimental region.
• Other comments: If you have knowledge of the subject matter, you can edit the desired model by removing the terms that you know are not significant or can't exist. This will decrease the required number of runs. Also, you can also add constraints to your design space, for instance to exclude particular factor combinations that must be avoided, e.g., high-temperature and high time for cooking.

For an in-depth exploration of both factorial and response surface methods, attend Stat-Ease’s Modern DOE for Process Optimization workshop.

Shari Kraber
shari@statease.com

## The Importance of Center Points in Central Composite Designs

posted by Pat on March 24, 2017

A central composite design (CCD) is a type of response surface design that will give you very good predictions in the middle of the design space.  Many people ask how many center points (CPs) they need to put into a CCD. The number of CPs chosen (typically 5 or 6) influences how the design functions.

Two things need to be considered when choosing the number of CPs in a central composite design:

1)  Replicated center points are used to estimate pure error for the lack of fit test. Lack of fit indicates how well the model you have chosen fits the data.  With fewer than five or six replicates, the lack of fit test has very low power.  You can compare the critical F-values (with a 5% risk level) for a three-factor CCD with 6 center points, versus a design with 3 center points.  The 6 center point design will require a critical F-value for lack of fit of 5.05, while the 3 center point design uses a critical F-value of 19.30.  This means that the design with only 3 center points is less likely to show a significant lack of fit, even if it is there, making the test almost meaningless.

TIP: True “replicates” are runs that are performed at random intervals during the experiment. It is very important that they capture the true normal process variation! Do not run all the center points grouped together as then most likely their variation will underestimate the real process variation.0

2)  The default number of center points provides near uniform precision designs.  This means that the prediction error inside a sphere that has a radius equal to the +/- 1 levels is nearly uniform.  Thus, your predictions in this region (+/- 1) are equally good.  Too few center points inflate the error in the region you are most interested in.  This effect (a “bump” in the middle of the graph) can be seen by viewing the standard error plot, as shown in Figures 1 & 2 below. (To see this graph, click on Design Evaluation, Graph and then View, 3D Surface after setting up a design.)

Ask yourself this—where do you want the best predictions? Most likely at the middle of the design space. Reducing the number of center points away from the default will substantially damage the prediction capability here! Although it can seem tedious to run all of these replicates, the number of center points does ensure that the analysis of the design can be done well, and that the design is statistically sound.