The DOE FAQ Alert Vol. 4 No. 6

Issue: Volume 4, Number 6
Date: June 2004
From: Mark J. Anderson, Stat-Ease, Inc. (www.statease.com)

Dear Experimenter,

Here's another set of frequently asked questions (FAQs) about doing design of experiments (DOE), plus alerts to timely information and free software updates. If you missed previous DOE FAQ Alerts, please click on the links at the bottom of this page. If you have a question that needs answering, click the Search tab and enter the key words. This finds not only answers from previous Alerts, but also other documents posted to the Stat-Ease web site.

Feel free to forward this newsletter to your colleagues. They can subscribe by going to http://www.statease.com/doealertreg.html. If this newsletter prompts you to ask your own questions about DOE, please address them to stathelp@statease.com.

Here's an appetizer to get this Alert off to a good start:
http://www.sciencedaily.com/. It brings up "Science Daily"--a free, award-winning online magazine for the latest scientific discoveries and research. Browse or easily search this comprehensive guide to what's happening in the fields of science, including statistics.

Also, check out the neat posters by the Statistical Graphics section of the American Statistical Association (ASA) offered at http://www.public.iastate.edu/~dicook/Stat.Graphics/posters.html. I especially like poster 3 ("Pick a box..."), which illustrates the value of graphics versus raw statistics (1 graph = 1000 numbers?).

Here's what I cover in the body text of this DOE FAQ Alert (topics that delve into statistical detail are designated "Expert"):

1. Newsletter alert: The May issue of the Stat-Teaser features "Katie's Coke versus Pepsi DOE"--a lesson on aliasing
2. Warning--DOE under attack: Follow the link to "Limitations of Experimental Design..." and read my rebuttal
3. Info alert: Link to forums on DOE and other statistical tools
4. Reader reply: Another way to explain degrees of freedom
5. Events alert: Link to my May 2004 Annual Quality Congress talk on screening designs; also, see a list of upcoming appearances
6. Workshop alert: See when and where to learn about DOE--response surface methods (RSM) is next on the educational agenda

PS. Quote for the month--Galileo on being a lone voice of reason against the ignorance of the masses.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Newsletter alert: The May issue of the Stat-Teaser features "Katie's Coke versus Pepsi DOE"--a lesson on aliasing

Many of you by now may have received a printed copy of the latest Stat-Teaser, but others, by choice or because you reside outside of North America, will get your first look at the May issue at http://www.statease.com/news/news0405.pdf.

The feature article, "Katie's Coke versus Pepsi DOE," details a fundamental mistake in design by a novice experimenter--my youngest daughter. Her Coke versus Pepsi taste test turned out to be a good lesson on inadvertent aliasing of effects. However, I did experience a significant jolt from both caffeinated colas! That jolt proved very helpful for my moonlighting work on my next book--"RSM Simplified," which, like "DOE Simplified," will be co-authored by Patrick Whitcomb.

Other stories in the Stat-Teaser provide:
- Answers about lack of fit (by consultant Shari Kraber)
- A great example on the use of mixture design for optimal formulation* by an R&D group at Sigma-Aldrich, Biotechnology
- A spotlight on South African statistician Nico Laubscher, who contributed a fabulous photo of a majestic lion seen in Kruger National Park.

*(To master these powerful tools of DOE, attend our "Mixture Design for Optimal Formulation" workshop. For a description, see http://www.statease.com/clas_mix.html. Link from this page to the course outline and schedule. You can enroll online by linking to the Stat-Ease e-commerce page for workshops.)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2. Warning--DOE under attack: Follow the link to "Limitations of Experimental Design..." and read my rebuttal

"Desktop Engineering" published an article touting neural networks versus DOE: http://www.deskeng.com/articles/04/may/web/main.htm. This prompted me to write a rebuttal to the editor, which I will share with you readers. Here's the title and abstract of the original article.

"CASE STUDY: Limitations of Experimental Design for Function Approximation and Resulting Penalties in Multi-Objective Optimization" by David Lengacher and Andrew Turner

Abstract
Though most engineers and analysts in manufacturing and service industries have experience using multiple linear regression to approximate how the inputs (controls) of a process affect the outputs, there is less awareness of the best approach for nonlinear processes. The most popular choice is to use design of experiments (DOE) of the form 3^k (the more common 2^k design would only produce a linear equation). However, this method requires that two conditions be met:
- the input space must be rectangular in order to avoid risky extrapolation, and
- randomized replications must be executed at each test point, which in turn, usually necessitates work stoppages.

While DOE provides powerful function approximation abilities, these two conditions are not easily met in industry, making DOEs difficult to execute in the real world. But more frequently, an engineer has access to historic data of a type that DOEs cannot use, and the process he is studying has both a nonrectangular (dependent) input space and produces more than one output. This case study examines a situation where the need for accurate prediction is critical. Furthermore, we will provide a comparative analysis of the best approximation methods available; namely neural networks and experimental design."

Answer (my rebuttal):
David Lengacher and Andrew Turner have presented a very unfair case against design of experiments (DOE) by restricting their view only to the 3^k option. By concocting an example that is not rectangular they literally force a square peg into a triangular hole, which guarantees that DOE will come out poorly in comparison to methods that fill the space. Perhaps they are unaware of DOE tools, specifically response surface methods (RSM), that deal with multilinear constraints such as those that Lengacher and Turner (L&T) present in their case.

For example, L&T's specification of Y<X (where each factor ranges from 1 to 5) can be re-arranged algebraically as 0<X-Y--a multilinear constraint. Based on a statistical algorithm called "D-optimal," I used specialized software called Design-Expert® to generate a 16-run DOE (shown below) geared for a nonlinear function such as that used by L&T in their simulation.

Run  Type    (X, Y)
 1   Vertex  (1, 1)
 2   Vertex  (1, 1)
 3   Vertex  (5, 1)
 4   Vertex  (5, 1)
 5   Vertex  (5, 5)
 6   Vertex  (5, 5)
 7   Edge    (3, 1)
 8   Edge    (3, 3)
 9   Edge    (5, 3)
10   Check   (2.33, 1.67)
11   Check   (4.33, 1.67)
12   Check   (4.33, 3.67)
13   Center  (3.67, 2.33)
14   Center  (3.67, 2.33)
15   Center  (3.67, 2.33)
16   Center  (3.67, 2.33)

This test plan shows under the "Type" column where each run is located geometrically. The points labeled "Vertex" fall at a corner of the triangular space that L&T established as the feasible region for their hypothetical process. The ones identified as "Edge" are located at the centers of the sides forming the constraints. The "Center" is the overall centroid of the experimental region. "Check" points fill in gaps within the interior space.
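The geometry of these point types can be checked with a few lines of Python. This is only a sketch of the arithmetic, not the D-optimal algorithm that Design-Expert uses to select runs. It assumes the "Check" points lie midway between the centroid and each vertex (a rule that happens to reproduce the tabled values), and it treats the constraint as Y <= X, because the tabled vertices (1, 1) and (5, 5) sit on the boundary. The helper names are made up for illustration.

```python
# Candidate points for the triangular region 1 <= Y <= X <= 5.
# Assumption: "Check" points are midpoints between the centroid and each
# vertex; this reproduces the table but is not a documented software rule.

def midpoint(p, q):
    """Point halfway between p and q."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def rounded(p):
    """Round a point to two decimals, as in the table."""
    return (round(p[0], 2), round(p[1], 2))

def feasible(p, tol=1e-9):
    """True if p satisfies 1 <= Y <= X <= 5 (boundary included)."""
    x, y = p
    return 1 - tol <= y <= x + tol and x <= 5 + tol

vertices = [(1.0, 1.0), (5.0, 1.0), (5.0, 5.0)]

# Edge points: centers of the three sides of the triangle.
edges = [midpoint(vertices[i], vertices[(i + 1) % 3]) for i in range(3)]

# Center: the overall centroid, i.e. the average of the vertices.
centroid = (sum(v[0] for v in vertices) / 3, sum(v[1] for v in vertices) / 3)

# Check points: fill the interior between the centroid and each corner.
checks = [midpoint(centroid, v) for v in vertices]

for label, pts in [("Vertex", vertices), ("Edge", edges),
                   ("Check", checks), ("Center", [centroid])]:
    for p in pts:
        print(f"{label:<6} {rounded(p)}  feasible={feasible(p)}")
```

Every point passes the feasibility test, which is the sense in which the design needs no extrapolation outside the constrained region.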

Notice that a number of the points, most at the center, are replicated for estimating error. The run order should be randomized to counteract lurking variables such as machine wear. Thus the replicates would occur at varying intervals, providing a good barometer of any drift in the process.
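For readers who want to randomize a plan like this by hand, here is a minimal sketch using only Python's standard library. The seed value is arbitrary and is there only so the shuffled order can be reproduced. Design-Expert performs its own randomization, so this illustrates the idea rather than the software's procedure.

```python
import random

# Standard-order plan copied from the 16-run table: (type, X, Y).
plan = [
    ("Vertex", 1, 1), ("Vertex", 1, 1),
    ("Vertex", 5, 1), ("Vertex", 5, 1),
    ("Vertex", 5, 5), ("Vertex", 5, 5),
    ("Edge", 3, 1), ("Edge", 3, 3), ("Edge", 5, 3),
    ("Check", 2.33, 1.67), ("Check", 4.33, 1.67), ("Check", 4.33, 3.67),
    ("Center", 3.67, 2.33), ("Center", 3.67, 2.33),
    ("Center", 3.67, 2.33), ("Center", 3.67, 2.33),
]

rng = random.Random(2004)  # arbitrary seed, chosen only for reproducibility

# Draw every standard-order index exactly once, in random order.
run_order = rng.sample(range(len(plan)), k=len(plan))

for position, idx in enumerate(run_order, start=1):
    kind, x, y = plan[idx]
    print(f"run {position:2d}: std #{idx + 1:2d}  {kind:<6} ({x}, {y})")
```

Because the replicated vertex and center points end up scattered through the run sequence, any drift over time shows up as disagreement among replicates instead of being mistaken for a factor effect.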

Using Design-Expert software, I re-created the simulations of L&T for both responses--including the standard deviation. Then I ran my 16-run RSM design. It generated very close approximations throughout the feasible space with no need for extrapolation.

Finally, I ran Design-Expert's numerical optimization with L&T's specification that both responses be minimized. The software produced a most desirable solution of (5, 3.65) for the (X,Y) inputs--very close to the result of (5, 3.75) suggested by L&T.

Readers must draw their own conclusions on the efficacy of DOE versus the alternatives proposed by Lengacher and Turner. However, they should be aware that a fair comparison requires the more sophisticated tools of DOE, namely D-optimal RSM designs.

Last, but not least, it should be noted that the article by Lengacher and Turner advocates use of happenstance, historical data fitted via neural network, versus a pro-active experiment, such as what I've laid out using appropriate response surface methods for design of experiments. Models based on happenstance data often prove to be very unstable due to highly correlated inputs--very typical from tightly-controlled processes, such as that described by L&T.
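One common way to put a number on "highly correlated inputs" is the variance inflation factor (VIF). For a model with two linear terms, the VIF equals 1/(1 - r^2), where r is the correlation between the inputs. The sketch below compares the 16-run plan with a small set of happenstance readings. Those historical readings are invented purely for illustration; they are not L&T's data.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_inputs(a, b):
    """VIF for either term in a two-input linear model: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)

# Designed inputs: the 16 runs from the table (X and Y columns).
x_doe = [1, 1, 5, 5, 5, 5, 3, 3, 5, 2.33, 4.33, 4.33, 3.67, 3.67, 3.67, 3.67]
y_doe = [1, 1, 1, 1, 5, 5, 1, 3, 3, 1.67, 1.67, 3.67, 2.33, 2.33, 2.33, 2.33]

# HYPOTHETICAL happenstance data: Y is held close to X, as might happen
# in a tightly controlled process. Invented for this sketch only.
x_hist = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
y_hist = [1.6, 2.1, 2.4, 3.1, 3.4, 4.1]

vif_doe = vif_two_inputs(x_doe, y_doe)
vif_hist = vif_two_inputs(x_hist, y_hist)

print(f"designed plan:     r = {pearson_r(x_doe, y_doe):.2f}, VIF = {vif_doe:.1f}")
print(f"happenstance data: r = {pearson_r(x_hist, y_hist):.2f}, VIF = {vif_hist:.1f}")
```

The triangular region forces some correlation even in the designed plan, but spreading runs to the corners and edges keeps it modest. Readings that hug the Y = X line inflate the variance of the fitted coefficients many times over.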

I would be happy to expand this letter into a complete article.

Mark J. Anderson, Principal, Stat-Ease, Inc.

Here's the response from "Desktop Engineering":
-----Original Message-----
From: Tony Lockwood, Editorial Director
RE: Letter to "Desktop Engineering" Editor rebutting
"Limitations of Experimental Design..."

"I must commend you on your alacrity and thoroughness. I have never had an article generate so complete a response so quickly. I will be forwarding this to the authors for consideration.

I welcome your offer to expand your thoughts into a full-length article. I'd be delighted to take it under consideration for publication."

I have since submitted a complete article on this subject.
- Mark

(Learn more about D-optimal design by attending the three-day computer-intensive workshop "Response Surface Methods for Process Optimization." See http://www.statease.com/clas_rsm.html for a complete description. Link from this page to the course outline and schedule. Then, if you like, enroll online.)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

3. Info alert: Link to forums on DOE and other statistical tools

Question
From: Netherlands

"Would you know of a newsgroup that covers areas like DOE that has knowledgeable people frequenting them? I wouldn't want to misuse your time for questions I could ask (expert) volunteers!"

Answer
From: Shari Kraber, Statistical Consultant

"The ASQ Stats Division recently set up a small group of forums on the following topics: Basic Statistics, DOE, Process Capability, and Six Sigma: http://www.asqstatdiv.org/discussiongroups.htm. I monitor the Basic Stats and DOE forums. So far, there are fewer than two dozen people registered as members, so activity is minimal. As these forums are advertised more, hopefully they will become a source of information for people."

Heidi Hansel, Stat-Ease Marketing Director, has these suggestions for discussions on statistical tools:
- http://www.isixsigma.com/forum/ *
- http://news-reader.org/sci.stat.consult/
- http://news-reader.org/sci.stat.math/

*(For a question on DOE and standard error that I answered, see http://www.isixsigma.com/forum/showmessage.asp?messageID=39709.)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

4. Reader reply: Another way to explain degrees of freedom

From: Kip Hilshafer, Research Associate, Stepan Company, Illinois

Re: DOE FAQ Alert, Volume 4, Number 5 - May, 2004,* FAQ #1: Explanation for "degrees of freedom"
*[Posted at http://www.statease.com/news/faqalert4-5.html]

"The problem is, mathematicians and statisticians may have good ideas, but that does not mean they write well. Your explanation is very good. I came up with an analogy that the chemists at Stepan liked.

Consider fitting a linear equation (y = mx + b) through data. Because there are two parameters, the slope and y-intercept, which must be determined, there are two unknowns. As a system of equations or collection of data, two independent experiments are required for two simultaneous equations for an exact solution (and a "full-rank" matrix). The two independent data pairs of (x,y) constitute two degrees of freedom (df) and each parameter in the line (m and b) requires a df. Thus, to solve, you need two and have two; life is good.

To fit y = mx + b to three ordered pairs, though, you still require two df for the equation but with three independent measurements, you have 3 df. From there, it's like the First Law of Thermodynamics and not believing in witchcraft--nothing just vanishes into thin air. You have three df and use only two for the line, then one is left over, and the remaining one is error. Your friend has heard and may (justifiably) be confused with df total = df model + df error. Just like thermodynamics, you can't get something for nothing and nothing gets annihilated. Using five points to figure the same line figures as 5 - 2 = 3 df for error.

This is merely a formalization of what most people knew all along: The more data, the better the estimate, because the estimate of error improves. And so on ...

Thus, degrees of freedom might be viewed as a bookkeeping or accounting procedure to keep track of everything. Once you use more stuff, it has to go somewhere and you must know where it goes. It's not witchcraft. Does this help/roil/cause more harm than good?"
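Kip's bookkeeping is easy to check numerically. The sketch below fits y = mx + b by ordinary least squares and tallies the df the way he describes: one per data point in total, two spent on the slope and intercept, and the rest left for error. The five data pairs are made up for illustration. Note that many ANOVA tables instead report a "corrected" total of n - 1 df with 1 df for the model, since the intercept is accounted for separately; the error df comes out the same either way.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

def df_budget(n_points, n_params=2):
    """Kip's accounting: df total = df model + df error."""
    return {"total": n_points, "model": n_params, "error": n_points - n_params}

# Made-up data for illustration: five (x, y) pairs near a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

m, b = fit_line(xs, ys)
budget = df_budget(len(xs))

# Residual sum of squares, then the error variance estimate (mean square error).
sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
mse = sse / budget["error"]

print(f"slope = {m:.3f}, intercept = {b:.3f}")
print(f"df: {budget}")
print(f"SSE = {sse:.4f}, MSE = SSE / df error = {mse:.4f}")
```

With only two points the error df is zero, so the line fits exactly and nothing is left over to estimate error. Each additional point adds one df to the error line of the ledger.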

Answer:

I think this is great, but I await the verdict from my more statistically-savvy colleagues (and authors of the original explanation).

[Pat Whitcomb, Stat-Ease Consultant, then replied that having achieved a master's degree in chemical engineering, he liked Kip's analogy to thermodynamics (although only a bachelor's level chemical engineer,* so did I). However, Pat feared that for many readers adding thermodynamics to statistics may be going from bad to worse.]

*(Perhaps what persuaded me to pursue a master's in business rather than chemical engineering was a University of Minnesota required course with the terrifying title: "Statistical Thermodynamics". The teacher, a grad student who clearly resented being called away from his research, made this incredibly difficult subject all the more incomprehensible by:
1. Not being fluent in English.
2. Speaking only while facing the chalkboard, so we could not hear him.
3. Not assigning a reference text.
4. Erasing with his left hand as he wrote with his right, so we had little chance of transcribing notes.
I must confess that for the final exam I scored only 15 out of a possible 100 points. However, the teacher graded on the normal curve and the mean was 10, so he gave me an "A". What really astounded me is that one student scored a 90! I never determined the special cause for this outlier, but it was hard to imagine anyone doing that well without some sort of special relationship to the teacher. - Mark)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

5. Events alert: Link to my May 2004 Annual Quality Congress talk on screening designs; also, see a list of upcoming appearances

On May 26 in Toronto, I presented a talk (co-authored by Pat Whitcomb) titled "Screening Process Factors in the Presence Of Interactions" to the Annual Quality Congress of the American Society for Quality. It introduced a new, more efficient type of fractional two-level factorial design of experiments (DOE) tailored for screening of process factors. These designs are referred to as "Min Res IV" because they require a minimal number of factor combinations (runs) to resolve main effects from two-factor interactions (resolution IV). To view the proceedings, click on http://www.statease.com/pubs/aqc2004.pdf.
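To see what resolution IV buys you, consider the textbook 8-run half-fraction for four factors, built with the generator D = ABC. This is a standard design used here only to illustrate the resolution IV property. It is not the Min Res IV construction presented in the talk, which is detailed in the proceedings. The sketch checks that every main effect is orthogonal to every two-factor interaction, and lists which two-factor interactions are aliased with each other.

```python
from itertools import combinations, product

factors = "ABCD"

# 2^(4-1) half-fraction: full factorial in A, B, C with generator D = ABC.
runs = [{"A": a, "B": b, "C": c, "D": a * b * c}
        for a, b, c in product((-1, 1), repeat=3)]

def column(term):
    """Contrast column for an effect such as 'A' or 'AB' (product of factor levels)."""
    col = []
    for run in runs:
        value = 1
        for f in term:
            value *= run[f]
        col.append(value)
    return col

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

mains = list(factors)
two_fis = ["".join(pair) for pair in combinations(factors, 2)]

# Resolution IV: no main effect is aliased with any two-factor interaction.
main_clear = all(dot(column(m), column(t)) == 0 for m in mains for t in two_fis)

# The price: two-factor interactions are aliased with one another.
aliased_pairs = [(s, t) for s, t in combinations(two_fis, 2)
                 if column(s) == column(t)]

print("runs:", len(runs))
print("main effects clear of two-factor interactions:", main_clear)
print("aliased two-factor interactions:", aliased_pairs)
```

The main effects come through cleanly in only 8 runs, which is what makes such designs attractive for screening. Sorting out which member of an aliased pair of interactions is active takes follow-up runs.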

See http://www.statease.com/events.html for a list of appearances by Stat-Ease professionals. We hope to see you sometime in the near future!

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

6. Workshop alert: See when and where to learn about DOE--response surface methods (RSM) is next on the educational agenda

If you've mastered the basics of DOE, take the next step by attending "Response Surface Methods for Process Optimization"--a three-day, computer-intensive workshop, which will be presented on June 22-24 at the Stat-Ease training center in Minneapolis.

See http://www.statease.com/clas_pub.html for schedule and site information on all Stat-Ease workshops open to the public. To enroll, click the "register online" link on our web site or call Stat-Ease at 1.612.378.9449. If spots remain available, bring along several colleagues and take advantage of quantity discounts in tuition, or consider bringing in an expert from Stat-Ease to teach a private class at your site. Call us to get a quote.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I hope you learned something from this issue. Address your general questions and comments to me at: mark@statease.com.

Sincerely,

Mark

Mark J. Anderson, PE, CQE
Principal, Stat-Ease, Inc. (http://www.statease.com)
Minneapolis, Minnesota USA

PS. Quote for the month--Galileo on being a lone voice of reason against the ignorance of the masses:

"In questions of science the authority of a thousand is not worth the humble reasoning of a single individual."

- Galileo Galilei

Trademarks: Design-Ease, Design-Expert and Stat-Ease are registered trademarks of Stat-Ease, Inc.

Acknowledgements to contributors:

- Students of Stat-Ease training and users of Stat-Ease software
- Fellow Stat-Ease consultants Pat Whitcomb and Shari Kraber (see http://www.statease.com/consult.html for resumes)
- Statistical advisor to Stat-Ease: Dr. Gary Oehlert (http://www.statease.com/garyoehl.html)
- Stat-Ease programmers, especially Tryg Helseth (http://www.statease.com/pgmstaff.html)
- Heidi Hansel, Stat-Ease marketing director, and all the remaining staff

Interested in previous DOE FAQ Alert e-mail newsletters?
To view a past issue, choose it below.

#1 Mar 01, #2 Apr 01, #3 May 01, #4 Jun 01, #5 Jul 01, #6 Aug 01, #7 Sep 01, #8 Oct 01, #9 Nov 01, #10 Dec 01, #2-1 Jan 02, #2-2 Feb 02, #2-3 Mar 02, #2-4 Apr 02, #2-5 May 02, #2-6 Jun 02, #2-7 Jul 02, #2-8 Aug 02, #2-9 Sep 02, #2-10 Oct 02, #2-11 Nov 02, #2-12 Dec 02, #3-1 Jan 03, #3-2 Feb 03, #3-3 Mar 03, #3-4 Apr 03, #3-5 May 03, #3-6 Jun 03, #3-7 Jul 03, #3-8 Aug 03, #3-9 Sep 03, #3-10 Oct 03, #3-11 Nov 03, #3-12 Dec 03, #4-1 Jan 04, #4-2 Feb 04, #4-3 Mar 04, #4-4 Apr 04, #4-5 May 04, #4-6 Jun 04 (see above)


Stat-Ease, Inc.
2021 E. Hennepin Avenue, Ste 480
Minneapolis, MN 55413-2726
e-mail: info@statease.com
p: 612.378.9449, f: 612.378.2152