Issue: Volume 3, Number 2
Date: February 2003
From: Mark J. Anderson, Stat-Ease, Inc.
Here's another set of frequently asked questions (FAQs) about doing design of experiments (DOE), plus alerts to timely information and free software updates. If you missed previous DOE FAQ Alerts, click on the links below. Feel free to forward this newsletter to your colleagues. They can subscribe by going to http://www.statease.com/doealertreg.html.
I wrote the following paragraph before the Columbia space shuttle disaster this past weekend. The multinational crew had successfully completed over 80 experiments during their mission. These astronaut scientists will be sorely missed. The loss of the shuttle and its impact on space research is a terrible setback.
Here's an appetizer to get this Alert off to a good start: http://www.firstscience.com/site/articles/beer.asp. It details what happens when beer is brewed in a weightless environment, based on data from two separate space shuttle experiments (recall that last month my appetizer featured a launch video, so I am on a roll!). I came across this fascinating site while trying to find a case study on the application of DOE to brewing beer. This was done on behalf of a DOE FAQ Alert reader - a Brewery Master from India. If any of you know of such a study, please let me know so I can pass it along. Considering the role that beer played in the development of modern-day statistics (via the pioneering work of W.S. Gosset, Master Brewer for Guinness), it's surprisingly hard to find any evidence of modern-day applications of DOE in this field. I am sure I am just not looking in the right places.
Next time you enjoy a beer or any other beverage in the company of your colleagues, don't forget to raise your glass to the gallant crew of Columbia. I pass along a salute for fallen aviators like them: "We toast our hearty comrades who have fallen from the skies, and were gently caught by God's own hands to be with him on high."
Here's what I cover in the body text of this DOE FAQ Alert (topics that delve into statistical detail are designated "Expert"):
1. FAQ: How to make a Pareto chart of effects, enhanced with Lenth's benchmarks, via Stat-Ease software and Microsoft Excel
2. Expert-FAQ: How to estimate error from a d-optimal general factorial design
3. Reader feedback: How DOE saved a company 50 million dollars ($50,000,000) in raw material costs
4. Events alert: Kansas City and other places to see Stat-Ease and hear talks by their DOE experts
5. Workshop alert: Special presentations in Philadelphia and Seattle, plus regularly scheduled workshops at our home training facility in Minneapolis (link to the complete schedule)
PS. Quote for the month - Lenth's views on neural nets, genetic algorithms, data mining, and the like
1 - FAQ: How to make a Pareto chart of effects, enhanced with Lenth's benchmarks, via Stat-Ease software and Microsoft Excel
From: United Kingdom
"Mark, I am really interested in being able to get an ordered bar [Pareto] chart out of your software, which shows the importance of the effects in rank order as an alternative to the half-normal plot. I feel that this provides an intuitive way of identifying the most important effects. We have found the half-normal plot difficult to explain to scientists. Is there any way of doing this in Stat-Ease software, or any plans to put it in to future versions?"
From: North Carolina
"Mark, with respect to future upgrades, may I suggest that a Pareto chart of effects would be a very useful addition to the output. None of my associates presents a half-normal plot to a management group that generally has had no DOE training. They have to export to Excel, clean it up, make a Pareto chart and import again to the report document. While the utility of the half-normal plot is obvious to the analyst, it is a mystery to the unlearned and a distraction to explain. A simple Pareto chart would be a great time-saver and strengthen the output."
From a statistical point of view, the half-normal plot of effects is far more appropriate than the Pareto chart for separating the vital few (likely significant) from the trivial many (likely due to normal error).* However, I've found that the Pareto chart helps focus people's attention on the bigger effects, so perhaps it would be good to use in support of what's identified by the half-normal plot. I passed your suggestion along to my colleagues who decide on what goes into future versions of Design-Ease® and Design-Expert® software. Something is now in the works for version 7. Just don't ask me when it will be released - it will be a while!
In the meantime, here's a work-around, via Microsoft Excel, for those of you who really want a Pareto chart of effects:
1. Go to Effects, then View, Effects List.
2. Skipping over the Intercept, drag your cursor over the list of alphabetical effects. Do not include lines for curvature, lack of fit or pure error, which may or may not be estimated, depending on your design.
3. Do an Edit, Copy to the Microsoft Windows clipboard.
4. Open Excel and do an Edit, Paste.
5. Right-click on the headers for all but the columns labeled "Term" and "Effect". Delete these superfluous columns.
6. Right-click on the Effect column and Insert a new column. Label this "|Effect|".
7. Click on the first empty cell under |Effect| and click the "fx" button. Select the Math and Trig category, which brings up the ABS function first on the list. Press OK and then click on the numerical effect for Effect A. Press OK again.
8. Place your mouse cursor over the resulting calculation so it changes to a + (copy function) and then drag it down to perform the ABS function on the remaining effects. If you like, include the benchmarks provided by Lenth** (labeled "ME" and "SME"). Otherwise delete these two extra rows from the bottom of the list.
9. Click back on the first |Effect| number (the absolute value of the first effect). Then go to Data, Sort. Select Sort by |Effect| in Descending order and press OK.
10. Drag your mouse over the Term and |Effect| columns (so they become highlighted) and then click on the Chart wizard. Press Next 3 times and Finish.
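For those who prefer scripting, the sorting at the heart of the Excel procedure can be sketched in a few lines of Python. The effect values below are invented purely for illustration; they do not come from any real design:

```python
# A minimal sketch of the Excel work-around: take the effects copied
# from the software, rank them by absolute value, and print a crude
# text Pareto chart.
effects = {
    "A": 21.6, "B": -14.5, "C": 1.2, "AB": 8.7,
    "AC": -0.9, "BC": 0.4, "ABC": -0.6,
}

# Steps 6-8: compute |Effect| for each term
abs_effects = {term: abs(e) for term, e in effects.items()}

# Step 9: sort in descending order of |Effect|
ranked = sorted(abs_effects.items(), key=lambda kv: kv[1], reverse=True)

# Step 10: a simple text stand-in for the bar chart
for term, val in ranked:
    print(f"{term:>4} | {'#' * int(round(val))} {val:.1f}")
```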
The end result is a nice-looking Pareto chart with Lenth's benchmarks interspersed (if you have not previously deleted them). Be careful when you interpret such a chart. It's not very clear where to draw the line between the vital few and trivial many effects, even with the aid of Lenth's benchmarks, which work well only on medium-sized designs of 16 runs or so. FYI, we've found that Lenth's method under-selects on small designs and over-selects on large designs. The half-normal plot of effects really works best for identifying potentially significant effects, so don't overlook it in favor of the Pareto chart.
I hope I got this all detailed correctly. Now I can appreciate why folks like you who want the Pareto chart are asking us to put it in our software. Thank you for the suggestion.
*See Chapter 3 of "DOE Simplified: Practical Tools for Effective Experimentation". For information on this soft cover book, and a link to purchase it, go to http://www.statease.com/doe_simp.html.
**See the paper by Russell Lenth entitled "Quick and Easy Analysis of Unreplicated Factorials" published in Technometrics, November 1989, Volume 31, Number 4, Page 469. Lenth's method is based on the sparsity of effects principle (only a few main effects and two-factor interactions will be significant). The most conservative benchmark for significance is called the simultaneous margin of error (SME). If you want to reduce the chance of missing significant effects, use the uncorrected margin of error (Lenth's ME). However, this less conservative measure increases the likelihood of picking insignificant effects. If you run replicates and get an estimate of pure error, we take this information into account in our software by making a further modification of the Lenth method. Lenth's method can be used as a starting point for selecting model terms, but it should always be verified by looking at the half-normal probability plot of effects. Don't be surprised to find obviously significant effects overlooked, or obviously insignificant effects picked. Modify the selection accordingly by clicking effects on or off.
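For the statistically curious, here's a rough Python sketch of the core of Lenth's method as laid out in the cited paper: a robust "pseudo standard error" (PSE) computed from the effects themselves. The effect values and t-values below are placeholders for illustration (the t-values should be looked up for d = m/3 degrees of freedom); this is not the Stat-Ease implementation, which adds further refinements when pure error is available.

```python
# A bare-bones sketch of Lenth's pseudo standard error (PSE).
from statistics import median

def lenth_benchmarks(effects, t_me, t_sme):
    """Given m effect estimates and externally supplied t-values
    (for d = m/3 degrees of freedom), return (PSE, ME, SME)."""
    abs_e = [abs(e) for e in effects]
    s0 = 1.5 * median(abs_e)              # initial robust scale estimate
    # Trim effects that look active, then re-estimate the scale:
    pse = 1.5 * median([a for a in abs_e if a < 2.5 * s0])
    return pse, t_me * pse, t_sme * pse

# Invented effects: two large (likely active), the rest noise-like.
# The t-values here are placeholders, not taken from Lenth's tables.
pse, me, sme = lenth_benchmarks(
    [21.6, -14.5, 1.2, -0.9, 0.4, -0.6, 0.8], t_me=2.57, t_sme=5.22)
```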
2 - Expert-FAQ: How to estimate error from a d-optimal general factorial design
"I have 1 factor [A] at four levels and 3 factors [B,C,D] at two levels. The d-optimal option [on the Factorial tab in Stat-Ease software] chooses a 19-run design [a subset of all 32 (4x2x2x2) possible combinations to fit a two-factor interaction model]. During the design-building process, I cannot key in or select the replicates. The box for this is grayed out. Without replication, how is the error term calculated? The only error terms I can select are main effects or two-factor interactions. How can I replicate a few key runs to produce estimates of pure error?"
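As an aside before the answer: the 19-run count follows directly from a degree-of-freedom tally for the two-factor-interaction model, since a d-optimal design must include at least as many runs as model coefficients. This quick Python check (just arithmetic, not the d-optimal algorithm itself) shows where 19 comes from:

```python
# Tally the coefficients in a 2FI model for one 4-level factor (A)
# and three 2-level factors (B, C, D).
from itertools import combinations

levels = {"A": 4, "B": 2, "C": 2, "D": 2}

intercept = 1
main = sum(k - 1 for k in levels.values())            # 3 + 1 + 1 + 1 = 6
two_fi = sum((levels[f1] - 1) * (levels[f2] - 1)
             for f1, f2 in combinations(levels, 2))   # 9 + 3 = 12
print(intercept + main + two_fi)                      # minimum runs: 19
```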
Answer (from a Stat-Ease consultant):
"If you'd like to get better error estimates, I suggest two approaches to augment the d-optimal designs generated by default with our software.
1. Add higher-order terms to the model you specify in the d-optimal design wizard by clicking on the "Edit Model" button. In your case it would be best to add the three-factor interaction (3fi) made up of the three two-level factors (BCD). The variation from this 3fi effect can then be assumed to be a measure of error when doing the analysis.
2. The other option is to replicate some key runs as you suggest. Here's how to do this with Stat-Ease software: Do a right mouse-click over the gray square to the left of the desired run and choose "Duplicate" from the menu that pops up. The software then adds a replicate at the bottom of the list. After you finish editing the design, right-click over the column-header for Run numbers and select Randomize to re-order the runs."
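To see what those duplicated runs buy you at analysis time, here's a minimal sketch (with invented factor settings and responses) of how pure error pools across replicated settings: within each group of identical runs, squared deviations from the group mean accumulate into a pure-error sum of squares.

```python
# Pool pure error from replicated runs in a general factorial design.
from collections import defaultdict

runs = [  # (factor settings A, B, C, D, response) - all values invented
    (("a1", "lo", "lo", "hi"), 12.1),
    (("a1", "lo", "lo", "hi"), 11.7),   # duplicate of the run above
    (("a3", "hi", "lo", "lo"), 15.0),
    (("a3", "hi", "lo", "lo"), 15.6),   # duplicate
    (("a2", "hi", "hi", "hi"), 9.4),    # unreplicated: no error info
]

groups = defaultdict(list)
for settings, y in runs:
    groups[settings].append(y)

ss_pe, df_pe = 0.0, 0
for ys in groups.values():
    mean = sum(ys) / len(ys)
    ss_pe += sum((y - mean) ** 2 for y in ys)   # within-group scatter
    df_pe += len(ys) - 1                        # one df per extra replicate

print(ss_pe, df_pe)  # pure-error SS and its degrees of freedom
```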
(Learn more about analyzing data from general factorial designs, including d-optimal, by attending the 3-day computer-intensive workshop "Experiment Design Made Easy." For a complete description see http://www.statease.com/clasedme.html. Link from this page to the course outline and schedule. Then, if you like, enroll online.)
3 - Reader feedback: How DOE saved a company 50 million dollars ($50,000,000) in raw material costs
From: Gary Knapp, President, GFK
"25 years ago, I worked as a Senior Research (Chemical) Engineer for the Ortho Division (Agricultural Chemicals) of Chevron Chemical Co. Ortho was converting our Orthene organophosphate insecticide plant to take advantage of a new manufacturing process that would eventually save the company over $50,000,000 in raw material costs.
While the plant was undergoing modification, the process development team took advantage of the time to once more verify each new reaction in the Pilot Plant. To our consternation, we found that a critical reaction no longer gave the expected conversions and yields which, unless resolved, would have eliminated the anticipated savings. We tried to determine the cause of the problem by changing one variable at a time. There were at least five major variables: reaction time, temperature, catalyst concentration, moisture, reactant history. The one-factor-at-a-time (OFAT) experimentation was to no avail - we could not discover any consistent trends.
At this point we contacted one of Chevron's Senior Statisticians, Jake Sredni,* who proposed a two-level fractional factorial design. He even suggested we throw in some extra variables, since unless they have an effect, their presence won't affect the study. Those were strange concepts for OFAT experimenters. We carried out the statistical experimental design, and found that none of the variables had an effect!
We quickly realized how valuable this result was and we started to think outside the box. What about the chemical analysis? Could it have been the cause for our inconsistent results? We had naturally consulted with the analytical section all along, but had always been assured that everything was fine. With the statistical results in hand, we finally spiked some of our samples and discovered that the recovery of the spike was always low, and that it also varied randomly.
The whole problem turned out to be analytical. In the end, it was a well-designed set of experiments that pointed us in the right direction! The plant conversion was a success, and the expected savings materialized. I've continued to use factorial designs as a key screening tool.
P. S. This story has an ironic twist, since we were an agricultural chemical company: It is my understanding that DOE was invented, or first used, for agricultural research. Experiments usually lasted for a whole season. The time factor alone made efficient experimentation a necessity. The problem of so many outside variables changing from season to season made one-variable-at-a-time experimentation virtually impossible.
*For background on Jake Sredni, my "saviour" back in 1980, see http://makeashorterlink.com/?N1B932743. Note that Dr. Sredni studied under the renowned statistician Prof George E. P. Box. I'm not a good statistician, but at least I know where to go for help! Jake was very good to work with - we solved the problem together. He, like Mark and his partner Pat at Stat-Ease, is a Chemical Engineer. It's obviously a good combination (I'm also a Chemical Engineer!)."
4 - Events alert: Kansas City and other places to see Stat-Ease and hear talks by their DOE experts
FIRST NOTICE: Sign up for the 57th Annual Quality Congress in Kansas City, Missouri on May 19-21 at http://aqc.asq.org/. See us at Booth #115 and attend my May 20th talk (T310): "How to Use Graphs to Diagnose and Deal with Bad Experimental Data".
SECOND NOTICE: We've been asked to pass along an alert for the Conference on New Directions in Experimental Design to be held in Chicago, Illinois on May 14-17, 2003. The focus of the conference will be on design of experiments in the pharmaceutical and related industries. See http://www.math.uic.edu/~kjryan/dae2003.html for more details.
Click http://www.statease.com/events.html for a listing of where Stat-Ease consultants will be giving talks and doing DOE demos. We hope to see you sometime in the near future!
5 - Workshop alert: Special presentations in Philadelphia and Seattle, plus regularly scheduled workshops at our home training facility in Minneapolis (link to the complete schedule)
In addition to the ongoing schedule of classes taught in our Minneapolis training facility, Stat-Ease will be on the road presenting its "Experiment Design Made Easy" workshop in:
- Philadelphia, PA on April 1-3
- Seattle, WA on May 6-8.
We hope to see you there or here in Minneapolis.
See the Stat-Ease web site for schedule and site information on all workshops open to the public. To enroll, click the "register online" link or call Stat-Ease at 1.612.378.9449. If spots remain available, bring along several colleagues and take advantage of quantity discounts in tuition, or consider bringing in an expert from Stat-Ease to teach a private class at your site. Call us to get a quote.
I hope you learned something from this issue. Address your questions and comments to me at:
Mark J. Anderson, PE, CQE
Principal, Stat-Ease, Inc. (http://www.statease.com)
Minneapolis, Minnesota USA
PS. Quote for the month - Lenth's views on neural nets, genetic algorithms, data mining, and the like:
"Engineers are quite comfortable these days - in fact, far too comfortable - with results from the blackest of black boxes: neural nets, genetic algorithms, data mining, and the like."
- Russell Lenth (http://www.stat.uiowa.edu/~rlenth/)
Trademarks: Design-Ease, Design-Expert and Stat-Ease are registered trademarks of Stat-Ease, Inc.
Acknowledgements to contributors:
- Students of Stat-Ease training and users of Stat-Ease software
- Fellow Stat-Ease consultants Pat Whitcomb and Shari Kraber (see http://www.statease.com/consult.html for resumes)
- Statistical advisor to Stat-Ease: Dr. Gary Oehlert (http://www.statease.com/garyoehl.html)
- Stat-Ease programmers, especially Tryg Helseth (http://www.statease.com/pgmstaff.html)
- Heidi Hansel, Stat-Ease marketing director, and all the remaining staff.
Interested in previous DOE FAQ Alert e-mail newsletters? To view a past issue, choose it below.
#1 - Mar 01, #2 - Apr 01, #3 - May 01, #4 - Jun 01, #5 - Jul 01 , #6 - Aug 01, #7 - Sep 01, #8 - Oct 01, #9 - Nov 01, #10 - Dec 01, #2-1 Jan 02, #2-2 Feb 02, #2-3 Mar 02, #2-4 Apr 02, #2-5 May 02, #2-6 Jun 02, #2-7 Jul 02, #2-8 Aug 02, #2-9 Sep 02, #2-10 Oct 02, #2-11 Nov 02, #2-12 Dec 02, #3-1 Jan 03, #3-2 Feb 03 (see above)