As a chemical engineer with roots as an R&D process developer, I find the main appeal of design of experiments (DOE) to be its ability to handle multiple factors simultaneously. Traditional scientific methods restrict experimenters to one factor at a time (OFAT), which is inefficient and does not reveal interactions. However, a simple comparative OFAT experiment often suffices for a process improvement. If this is all that’s needed, you may as well do it right statistically. As industrial-statistical guru George Box reportedly said, “DOE is a wonderful comparison machine.”

A fellow named William Sealy Gosset developed the statistical tools for simple comparative experiments (SCE) in the early 1900s. As Head Experimental Brewer for Guinness in Dublin, he evaluated the soft resin content of hops from various regions, a critical factor in optimizing the bitterness and preservation of their beer.¹ To compare the results from one source versus another with statistical rigor, Gosset invented the t-test, a great tool for DOE even today (and far easier to do with modern software!).

The t-test simply compares the difference between two means relative to the standard deviation of that difference. The result can be easily interpreted with a modicum of knowledge about normal distributions: as t increases beyond 2 (that is, the means differ by more than about two standard deviations of the difference), the difference becomes more and more significant. Gosset’s breakthrough came from adjusting the distribution for small sample sizes, which makes the tails of the bell-shaped curve slightly fatter and the peak somewhat lower, as shown in Figure 1. The correction, in this case for a test comparing a sample of 4 results at one level versus 4 at the other, is minor but very important for getting the statistics right.

Figure 1. Normal curve versus t-distribution (probabilities plotted by standard deviations from zero)
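To put numbers on the shape change in Figure 1, here is a minimal Python sketch (standard library only) that compares the Student’s t density for 6 degrees of freedom (4 + 4 results minus 2, the pooled case described above) with the standard normal curve. The helper names `normal_pdf`, `t_pdf` and `t_cdf_pos` are our own, not Stat-Ease functions, and the t cumulative probability is approximated by Simpson’s-rule integration rather than a closed-form formula.

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf_pos(x, df, steps=1000):
    """P(T <= x) for x >= 0, by Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

df = 4 + 4 - 2  # two groups of 4 results, pooled variance

# Peak height: the t curve's "head" sits lower than the normal curve's
peak_normal, peak_t = normal_pdf(0.0), t_pdf(0.0, df)

# Two-tailed area beyond 2 standard deviations: the t tails are fatter
tail_normal = 1 - math.erf(2 / math.sqrt(2))
tail_t = 2 * (1 - t_cdf_pos(2.0, df))

print(f"peak height: normal {peak_normal:.4f}, t(6) {peak_t:.4f}")
print(f"P(|x| > 2):  normal {tail_normal:.4f}, t(6) {tail_t:.4f}")
```

With only 6 degrees of freedom, the t curve peaks at about 0.383 versus 0.399 for the normal curve, and its two-tailed area beyond 2 is about 0.092 versus 0.046, so ignoring Gosset’s correction would roughly halve the p-value at that point.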

To illustrate a simple comparative DOE, consider a case study on the filling of 16-ounce plastic bottles with two production machines, line 1 and line 2.² The packaging engineers must assess whether the two lines differ. To make this determination, they set up an experiment to randomly select 10 bottles from each machine. Stat-Ease software makes this easy via its Factorial, Randomized, Multilevel Categorical design option, as shown by the screenshot in Figure 2.

Figure 2. Setting up a simple comparative DOE in Stat-Ease software

The resulting volumes in ounces are shown below (mean outcome shown in parentheses).

- Line 1: 16.03, 16.04, 16.05, 16.05, 16.02, 16.01, 15.96, 15.98, 16.02, 15.99 (16.02)
- Line 2: 16.02, 15.97, 15.96, 16.01, 15.99, 16.03, 16.04, 16.02, 16.01, 16.00 (16.01)
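For readers who want to check the arithmetic behind the software output discussed next, the Python sketch below runs a classic pooled-variance (equal-variance) two-sample t-test on the fill volumes above. It assumes the first list is line 1 and the second is line 2, uses only the standard library, and gets the p-value from a Simpson’s-rule integration of the t density rather than from a statistics package; `pooled_t_test` is a hypothetical helper name.

```python
import math
from statistics import mean, variance

line1 = [16.03, 16.04, 16.05, 16.05, 16.02, 16.01, 15.96, 15.98, 16.02, 15.99]
line2 = [16.02, 15.97, 15.96, 16.01, 15.99, 16.03, 16.04, 16.02, 16.01, 16.00]

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf_pos(x, df, steps=1000):
    """P(T <= x) for x >= 0, by Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def pooled_t_test(a, b):
    """Two-sided, equal-variance two-sample t-test: returns (t, df, s_pooled, p)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    s_pooled = math.sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df)
    se_diff = s_pooled * math.sqrt(1 / n1 + 1 / n2)  # standard deviation of the difference
    t = (mean(a) - mean(b)) / se_diff
    p = 2 * (1 - t_cdf_pos(abs(t), df))
    return t, df, s_pooled, p

t, df, s_pooled, p = pooled_t_test(line1, line2)
print(f"difference = {mean(line1) - mean(line2):.3f} oz, s_pooled = {s_pooled:.4f} oz")
print(f"t = {t:.4f} with {df} df, two-tailed p = {p:.4f}")
```

The unrounded means are 16.015 and 16.005 oz, the pooled standard deviation comes out at about 0.028 oz, and t comes out at about 0.799, in line with the values reported below.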

Stat-Ease software translates the mean difference between the two machines (0.01 ounce) into a t value of 0.7989, that is, less than one standard deviation apart, which produces a p-value of 0.4347, far above the generally accepted threshold of p < 0.05 for significance. Its Model Graph in Figure 3 displays all the raw data, the means of each level, and their least significant difference (LSD) bars based on a t-test at p = 0.05. Notice how the bars overlap from left to right: clearly the difference is not significant.

Figure 3. Graph showing effect on fill from one machine line to the other
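The LSD behind bars like those in Figure 3 is the critical t value times the standard deviation of the difference. The Python sketch below assumes the pooled standard deviation of 0.028 oz (the sigma cited in the power discussion that follows), 10 bottles per line and a two-tailed p of 0.05. Drawing each bar as the mean ± LSD/2, so that bars overlap exactly when the difference is smaller than the LSD, is our assumption about how such a plot can be built, not a description of Stat-Ease’s internals.

```python
import math

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf_pos(x, df, steps=1000):
    """P(T <= x) for x >= 0, by Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile for 0.5 < p < 1 by bisection, assuming the answer lies in [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf_pos(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n1 = n2 = 10                     # bottles per line
s_pooled = 0.028                 # oz, pooled standard deviation reported for these data
mean1, mean2 = 16.015, 16.005    # oz, unrounded means computed from the listed volumes

df = n1 + n2 - 2
t_crit = t_quantile(0.975, df)   # two-tailed test at p = 0.05
lsd = t_crit * s_pooled * math.sqrt(1 / n1 + 1 / n2)

# Assumed bar convention: mean +/- LSD/2, so bars overlap exactly when |difference| < LSD
bar1 = (mean1 - lsd / 2, mean1 + lsd / 2)
bar2 = (mean2 - lsd / 2, mean2 + lsd / 2)
overlap = bar1[0] <= bar2[1] and bar2[0] <= bar1[1]

print(f"t_crit = {t_crit:.4f}, LSD = {lsd:.4f} oz, difference = {mean1 - mean2:.3f} oz")
print("bars overlap: not significant" if overlap else "bars separate: significant")
```

With 18 degrees of freedom the critical t is about 2.101, giving an LSD of roughly 0.026 oz, well over twice the observed 0.01 oz difference.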

Thus, from the stats and a first glance at the effect graph, it seems that the packaging engineers need not worry about any differences between the two machine lines. But hold on before jumping to a final conclusion: what if a difference of 0.01 ounce adds up to a big expense over a long period of time? The managers overseeing the annual profit and loss for the filling operation would then be greatly concerned. Before doing any designed experiment, it pays to do a power calculation to work out how many runs are needed to see a minimal difference of importance (the signal, ‘delta’) relative to the variation (the noise, ‘sigma’). In this case, a sample size of 10 per line for a delta of 0.01 ounce with a sigma (standard deviation) of 0.028 ounces (provided by Stat-Ease software) generates a power of only 11.8%, far short of the generally acceptable level of 80%. Further calculations reveal that if such a small difference really needed to be detected, they should fill 125 or more bottles on each line.
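A power calculation like this one can be sketched with the noncentral t distribution, the standard textbook approach for a two-sided, two-sample t-test. We have not confirmed that Stat-Ease uses exactly this algorithm, so treat the Python sketch below as a cross-check only. It assumes equal group sizes n, delta = 0.01 oz, sigma = 0.028 oz and alpha = 0.05, and it evaluates the noncentral t probability by averaging normal tail areas over the chi-square distribution of the variance estimate. All function names are hypothetical.

```python
import math

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def t_pdf(x, df):
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf_pos(x, df, steps=1000):
    """P(T <= x) for x >= 0, by Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile for 0.5 < p < 1 by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf_pos(mid, df) < p else (lo, mid)
    return (lo + hi) / 2

def chi2_pdf(v, k):
    return math.exp((k / 2 - 1) * math.log(v) - v / 2
                    - (k / 2) * math.log(2) - math.lgamma(k / 2))

def power_two_sample(n, delta, sigma, alpha=0.05, steps=4000):
    """Power of a two-sided pooled t-test with n runs per group (noncentral t)."""
    df = 2 * n - 2
    ncp = (delta / sigma) * math.sqrt(n / 2)        # noncentrality parameter
    c = t_quantile(1 - alpha / 2, df)               # critical t value
    upper = df + 20 * math.sqrt(2 * df) + 20        # chi-square mass beyond this is negligible
    h = upper / steps
    power = 0.0
    for i in range(steps):
        v = (i + 0.5) * h                           # midpoint rule avoids v = 0
        s = math.sqrt(v / df)
        # T' = (Z + ncp) / s falls in either rejection tail
        reject = (1 - norm_cdf(c * s - ncp)) + norm_cdf(-c * s - ncp)
        power += reject * chi2_pdf(v, df) * h
    return power

def n_for_power(delta, sigma, target=0.80, alpha=0.05, n_max=2000):
    """Smallest n per group reaching the target power (binary search; power rises with n)."""
    lo, hi = 2, n_max
    while lo < hi:
        mid = (lo + hi) // 2
        if power_two_sample(mid, delta, sigma, alpha) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo

print(f"power with 10 bottles per line: {power_two_sample(10, 0.01, 0.028):.3f}")
print(f"bottles per line for 80% power: {n_for_power(0.01, 0.028)}")
```

Compare the printed values with the 11.8% and 125 bottles quoted above; small differences are possible if the software rounds sigma differently or uses another power algorithm.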

In conclusion, it turns out that simple comparative DOEs are not all that simple to do correctly from a statistical perspective. Some keys to getting these two-level OFAT experiments done right are:

- Randomizing the run order (a DOE fundamental for washing out the impact of time-related lurking factors such as steadily increasing temperature or humidity).
- Performing at least 4 runs at each level—more if needed to achieve adequate power (always calculate this before pressing ahead!).
- Blocking out known sources of variation via a paired t-test.³ For example, when assessing two runners, rather than having each of them run a number of time trials one after the other, race them together side by side, thus eliminating the impact of changing wind and other environmental conditions.
- Always deploying a non-directional two-tailed t-test⁴ (a fun alliteration!), as done by default in Stat-Ease software. The option for a one-tailed t-test requires an assumption that one level of the tested factor will certainly be superior to the other (i.e., directional), which may produce false-positive significance; before going this route, consult with our StatHelp consulting team.
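As a concrete illustration of the first key, randomizing the run order, here is a minimal Python sketch for a two-level comparison like the bottle study: it lists 10 runs per level and shuffles them with the standard `random` module. The function name and seed value are hypothetical (the seed is only there so the plan can be reproduced); Stat-Ease software generates its own randomized run sheet, and this is not its algorithm.

```python
import random

def randomized_run_order(levels, runs_per_level, seed=None):
    """Return a shuffled run sheet of (run_number, level) pairs."""
    runs = [level for level in levels for _ in range(runs_per_level)]
    rng = random.Random(seed)  # seeded generator so the plan can be reproduced
    rng.shuffle(runs)
    return list(enumerate(runs, start=1))

plan = randomized_run_order(["line 1", "line 2"], runs_per_level=10, seed=2024)
for run, level in plan:
    print(run, level)
```

Each level still gets exactly 10 runs; only their position in time is randomized, so a slow drift in temperature or humidity is spread over both levels instead of biasing one.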

1. For more background on Gosset and his work for Guinness, see my 8/9/24 StatsMadeEasy blog on The secret sauce in Guinness beer?
2. From Chapter 2, “Simple Comparative Experiments”, problem 2.24, *Design and Analysis of Experiments, 8th Edition*, Douglas C. Montgomery, John Wiley and Sons, New York, NY, 2013.
3. “Letter to a Young Statistician: On ‘Student’ and the Lanarkshire Milk Experiment”, *Chance Magazine*: Volume 37, No. 1, Stephen T. Ziliak.
4. Wikipedia, One- and two-tailed tests.

- One Factor tutorials in Program Help.
- Stat-Ease Academy eLearning PreDOE course, which includes a t-statistic software tutorial.

Observing process improvement teams at Imperial Chemical Industries in the late 1940s, George Box, the prime mover for response surface methods (RSM), realized that, as a practical matter, statistical plans for experimentation must be very flexible and allow for a series of iterations. Box and other industrial statisticians continued to hone the strategy of experimentation to the point where it became standard practice for stats-savvy industrial researchers.

Via their Management and Technology Center (sadly, now defunct), Du Pont then trained legions of engineers, scientists, and quality professionals on a “Strategy of Experimentation” called “SCO” for its sequence of **s**creening, **c**haracterization and **o**ptimization. This now-proven SCO strategy of experimentation, illustrated in the flow chart below, begins with fractional two-level designs to screen for previously unknown factors. During this initial phase, experimenters seek to discover the vital few factors that create statistically significant effects of practical importance for the goal of process improvement.

The ideal DOE for screening resolves main effects free of any two-factor interactions (2FIs) in a broad and shallow two-level factorial design. I recommend the “resolution IV” choices color-coded yellow on our “Regular Two-Level” builder (shown below). To get a handy (pun intended) primer on resolution, watch at least the first part of this Institute of Quality and Reliability YouTube video on Fractional Factorial Designs, Confounding and Resolution Codes.

If you would like to screen more than 8 factors, choose one of our unique “Min-Run Screen” designs. However, I advise you to accept the program default of adding 2 runs, which makes the experiment less susceptible to botched runs.

Stat-Ease® 360 and Design-Expert® software conveniently color-code and label different designs.
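To see what a resolution IV screening design buys you, the Python sketch below builds the classic 8-run half fraction for four factors, A to D, using the textbook generator D = ABC (our choice for illustration; the Stat-Ease builder may use different generators). It then checks which two-factor interaction columns coincide with each main-effect column: in a resolution IV design none do, while the 2FIs are confounded with one another in pairs.

```python
from itertools import combinations, product

FACTORS = "ABCD"

# 2^(4-1) half fraction: full factorial in A, B, C, with D set by the generator D = ABC
runs = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]

def col(term):
    """Column of +/-1 values for a main effect ("A") or an interaction ("AB")."""
    values = []
    for run in runs:
        v = 1
        for letter in term:
            v *= run[FACTORS.index(letter)]
        values.append(v)
    return tuple(values)

def aliased(x, y):
    """Two effects are confounded if their columns are equal or exact negatives."""
    return x == y or x == tuple(-v for v in y)

two_fis = ["".join(p) for p in combinations(FACTORS, 2)]

# Main effects versus 2FIs: resolution IV means no main effect is aliased with any 2FI
main_vs_2fi = {m: [t for t in two_fis if aliased(col(m), col(t))] for m in FACTORS}

# 2FIs versus each other: these are aliased in pairs
fi_pairs = [(s, t) for s, t in combinations(two_fis, 2) if aliased(col(s), col(t))]

print(main_vs_2fi)   # every list is empty
print(fi_pairs)      # AB with CD, AC with BD, AD with BC
```

Main effects here are aliased only with three-factor interactions (for example, A with BCD), which is why resolution IV is a safe choice for screening, while untangling the paired 2FIs is left to the characterization phase.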

After throwing the trivial many factors off to the side (preferably by holding them fixed or blocking them out), the experimental program enters the characterization phase (the “C”), where interactions become evident. This requires designs of resolution V or better (green Regular Two-Level or Min-Run Characterization), or possibly full (white) two-level factorial designs. Also, add center points at this stage so that curvature can be detected.
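The idea behind detecting curvature with center points can be sketched with the textbook single-degree-of-freedom test found in standard texts such as Montgomery’s: compare the average of the factorial runs with the average of the center runs, and judge the gap against pure error from the replicated centers. This is a hedged Python illustration; the function name is hypothetical, and the test reported by Stat-Ease software may pool error differently.

```python
from statistics import mean, variance

def curvature_check(factorial_y, center_y):
    """Single-df curvature test for a two-level factorial with replicated center points.

    Returns (curvature estimate, sum of squares for curvature, F ratio vs. pure error).
    """
    n_f, n_c = len(factorial_y), len(center_y)
    if n_c < 2:
        raise ValueError("need at least 2 center points to estimate pure error")
    gap = mean(factorial_y) - mean(center_y)        # near zero if the surface is planar
    ss_curvature = n_f * n_c * gap ** 2 / (n_f + n_c)
    ms_pure_error = variance(center_y)              # pure error with n_c - 1 df
    return gap, ss_curvature, ss_curvature / ms_pure_error
```

Pass the responses from the factorial corners and from the center runs; an F ratio well above the critical F value for 1 and n_c − 1 degrees of freedom flags curvature. The center points must not all be identical, or the pure-error estimate is zero.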

If you encounter significant curvature (per the very informative test provided in our software), use our design tools to augment your factorial design into a central composite design for response surface methods (RSM). You then enter the optimization phase (the “O”).
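Augmenting a two-level factorial into a central composite design means keeping the factorial (and center) runs and adding axial, or star, points at ±α on each factor axis. Here is a minimal Python sketch in coded units, assuming a full-factorial core and the rotatable choice α = (number of factorial runs)^(1/4); design packages typically offer other α values, so this default is only an illustration, and the function name is hypothetical.

```python
from itertools import product

def central_composite(k, n_center=4, alpha=None):
    """Coded points of a CCD for k factors: factorial core, axial (star) points, center runs."""
    factorial = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    if alpha is None:
        alpha = len(factorial) ** 0.25   # rotatable choice for a full 2^k core
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha      # one factor at +/-alpha, the others at center
            axial.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return factorial, axial, center, alpha

factorial, axial, center, alpha = central_composite(2)
print(f"alpha = {alpha:.4f}")
for point in axial:
    print(point)
```

When augmenting an existing factorial, only the axial points (and usually a few extra center points) are new runs, typically performed as a second block.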

However, if curvature is of no concern, skip to ruggedness (the “R” that finalizes the “SCOR”) and, hopefully, confirm with a low-resolution (red) two-level design or a Plackett-Burman design (found under “Miscellaneous” in the “Factorial” section). Ideally, you will then find that your improved process can withstand field conditions. If not, you will need to go back to the beginning for a do-over.
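For the ruggedness step, a 12-run Plackett-Burman design can test up to 11 factors. The Python sketch below builds it the classic way, from 11 cyclic shifts of the published generator row (+ + - + + + - - - + -) plus a final row of all minus signs, then checks that every column is balanced and that all columns are mutually orthogonal. The Stat-Ease “Miscellaneous” builder may order rows or columns differently.

```python
from itertools import combinations

# Generator row for the 12-run Plackett-Burman design (Plackett and Burman, 1946)
GENERATOR = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]

def plackett_burman_12():
    """12 runs x 11 factor columns of coded +/-1 levels."""
    n = len(GENERATOR)
    rows = [[GENERATOR[(j - i) % n] for j in range(n)] for i in range(n)]  # cyclic shifts
    rows.append([-1] * n)                                                   # all-low run
    return rows

design = plackett_burman_12()
columns = list(zip(*design))

balanced = all(sum(c) == 0 for c in columns)          # six highs and six lows per factor
orthogonal = all(sum(x * y for x, y in zip(c1, c2)) == 0
                 for c1, c2 in combinations(columns, 2))
print(f"{len(design)} runs, {len(columns)} factors, balanced={balanced}, orthogonal={orthogonal}")
```

Orthogonality keeps the main effects estimable independently of one another, but like other low-resolution designs, each main effect is partially confounded with many 2FIs, which is acceptable when you hope to see no effects at all.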

The SCOR strategy, with some modification due to the nature of mixture DOE, works equally well for developing product formulations as it does for process improvement. For background, see my October 2022 blog on Strategy of Experiments for Formulations: Try Screening First!

Stat-Ease provides all the tools and training needed to deploy the SCOR strategy of experiments. For more details, watch my January webinar on YouTube. Then to master it, attend our Modern DOE for Process Optimization workshop.

Know the SCOR for a winning strategy of experiments!

There are a couple of features in the latest release of Design-Expert and Stat-Ease 360 software programs (version 22.0) that I really love and want to draw your attention to. These features are accessible to everyone, whether you are a novice or an expert in design of experiments.

First, the **Analysis Summary** in the Post Analysis section: This provides a quick view of all response analyses in a set of tables, making it easy to compare model terms, statistics such as R-squared values, equations and more. We are pleased to now have this feature, which has been requested many times! When you have a large number of responses, understanding the similarities and differences between the models may lead to additional insights into your product or process.

Second, the **Custom Graphs** (previously Graph Columns): Functionality and flexibility have been greatly expanded so that you can now plot analysis or diagnostic values, as well as design column information. Customize the colors, shapes and sizes of the points to tell your story in the way that makes sense to your audience.

Figure 1 (left) shows the layout of points in a central composite design, where the points are colored by their space point type (factorial, axial or center points) and then sized by the response value. We can visualize where in the design space the responses are smaller versus larger.

In Figure 2 (right), I had a set of existing runs that I wanted to visualize in the design space. Then I augmented the design with new runs. I set the Color By option to Block to clearly see the new (green) runs that were added to the design space.

These new features offer many new ways to visualize your design, response data, and other pieces of the analysis. What stories will you tell?

At the outset of my chemical engineering career, I spent 2 years working with various R&D groups for a petroleum company in Southern California. One of my rotations brought me to their tertiary oil-recovery lab, which featured a wall of shelves filled to the brim with hundreds of surfactants. It amazed me how the chemist would seemingly know just the right combination of anionic, nonionic, cationic and amphoteric varieties to blend for the desired performance. I often wondered, though, whether empirical screening might have paid off by revealing a few surprisingly better ingredients. Then, after settling on the vital few components, an in-depth experiment may very well have led to the discovery of previously unknown synergisms. However, this was before the advent of personal computers and software for mixture design of experiments (DOE), and, thus, such experimentation was extremely daunting for non-statisticians.

Nowadays I help many formulators make the most of mixture DOE via Stat-Ease software’s easy-to-use statistical tools. I was very encouraged to see this 2021 meta-analysis, which found 200 or so recent publications (2016-2020) demonstrating the successful application of mixture DOE for food, beverage and pharmaceutical formulation development. I believe that this number can be multiplied many-fold to extrapolate these findings to other process industries: chemicals, coatings, cosmetics, plastics, and so forth. Also, keep in mind that most successes never get published; they are kept confidential until patented.

However, though I am very heartened by the widespread adoption of mixture DOE, screening remains underutilized, judging by my experience and the very meager yield of publications from 2016 to the present in a Google Scholar search. I believe the main reasons are:

- Formulators prefer to rely on their profound knowledge of the chemistry for selection of ingredients (refer to my story about surfactants for tertiary oil recovery)
- The number of possibilities gets overwhelming; for example, this 2016 *Nature* publication reports that experimenters on a pear cell suspension culture got thrown off by the 65 blends they believed were required for simplex screening of 20 components (too bad, because, as shown in the Stat-Ease software screenshot below, cutting out the optional check blends and constraint-plane centroids would have reduced this substantially).

- Misapplying factorial screening to mixtures, which unfortunately happens a lot because these process-focused experiments are simpler and more commonly used. This is really a shame, as pointed out in this Stat-Ease blog post.
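To make the run-count trade-off concrete, the Python sketch below lays out a simplex screening design in the textbook style of Snee and Marquardt: the q pure-component vertices and the overall centroid, plus the optional interior check blends (halfway between each vertex and the centroid) and constraint-plane centroids (each component left out in turn). It is only an illustration with a hypothetical function name; it does not attempt to reproduce the 65 blends reported in the paper or the exact defaults in Stat-Ease software.

```python
from fractions import Fraction

def simplex_screening(q, check_blends=True, plane_centroids=True):
    """Blends (as exact fractions summing to 1) for screening q mixture components."""
    blends = [[Fraction(int(i == j)) for j in range(q)] for i in range(q)]  # pure components
    blends.append([Fraction(1, q)] * q)                                     # overall centroid
    if check_blends:
        # interior check blends: halfway between each vertex and the overall centroid
        blends += [[Fraction(q + 1, 2 * q) if j == i else Fraction(1, 2 * q) for j in range(q)]
                   for i in range(q)]
    if plane_centroids:
        # constraint-plane centroids: component i absent, the rest in equal proportions
        blends += [[Fraction(0) if j == i else Fraction(1, q - 1) for j in range(q)]
                   for i in range(q)]
    return blends

full = simplex_screening(20)
lean = simplex_screening(20, check_blends=False, plane_centroids=False)
print(len(full), "blends with all options;", len(lean), "with vertices and centroid only")
```

For 20 components this layout gives 61 blends with both options and 21 without them, which shows how much of the run budget the optional points consume.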

I feel sure that it pays to screen down many components to a vital few before doing an in-depth optimization study. Stat-Ease software provides some great options for doing so. Give screening a try!!

For more details on mixture screening designs and a solid strategy of experiments for optimizing formulations, see my webinar on Strategy of Experiments for Optimal Formulation. If you would like to speak with our team about putting mixture DOE to good use for your R&D, please contact us.

Thank you to our presenters and all the attendees who showed up to our 2022 Online DOE Summit! We're proud to host this annual, premier DOE conference to help connect practitioners of design of experiments and spread best practices & tips throughout the global research community. Nearly 300 scientists from around the world were able to make it to the live sessions, and many more will be able to view the recordings on the Stat-Ease YouTube channel in the coming months.

Due to a scheduling conflict, we had to move Martin Bezener's talk on "The Latest and Greatest in Design-Expert and Stat-Ease 360." This presentation will provide a briefing on the major innovations now available with our advanced software product, Stat-Ease 360, and a bit of what's in store for the future. Attend the whole talk to be entered into a drawing for a free copy of the book *DOE Simplified: Practical Tools for Effective Experimentation, 3rd Edition*. New date and time: **Wednesday, October 12, 2022** at 10 am US Central time.

Even if you registered for the Summit already, you'll need to register for the new time on October 12. Click this link to head to the registration page. If you are not able to attend the live session, go to the Stat-Ease YouTube channel for the recording.

Want to be notified about our upcoming live webinars throughout the year, or about other educational opportunities? Think you'll be ready to speak on your own DOE experiences next year? Sign up for our mailing list! We send emails every month to let you know what's happening at Stat-Ease. If you just want the highlights, sign up for the *DOE FAQ Alert* to receive a newsletter from Engineering Consultant Mark Anderson every other month.

Thank you again for helping to make the 2022 Online DOE Summit a huge success, and we'll see you again in 2023!