>>>>>>>>>>>>>>>>>>>>>>>>>>DOE FAQ Alert<<<<<<<<<<<<<<<<<<<<<<<<<<
Issue: Volume 1, Number 4
Date: June 2001
From: Mark J. Anderson, Stat-Ease, Inc. (http://www.statease.com)
"Statistics Made Easy" (tm)

TO UNSUBSCRIBE FOLLOW THE INSTRUCTIONS AT THE END OF THIS E-MAIL.

Dear Experimenter,

Here's our fourth issue in an ongoing series of e-mails with answers to frequently asked questions (FAQs) about doing design of experiments (DOE), plus alerts to timely information and free software updates. If you missed the prior DOE FAQ Alert (or earlier ones), go to http://www.statease.com/doealert.html . Feel free to forward this newsletter to your colleagues. They can subscribe by going to http://www.statease.com/doealertreg.html .

Before I get into the meat of this message, I offer this link as an appetizer from an educational website: http://www.brainpop.com/specials/scientificmethod/index.weml (site requires subscription to view). Upon arrival the site loads a cartoon movie (be patient!). Press play to gain insights on the scientific method. This fun-looking site is aimed at 5th to 8th graders, so if you like it, pass it along to any youngsters you think might benefit. (Can you suggest any links that experimenters might find fun and interesting? If so, send me an e-mail with the link embedded.)

Here's what I cover in this DOE FAQ Alert:

1. FAQ: Why choose a probability of 0.05 (p-value) as the criterion for statistical significance?
2. X-FAQ: Criteria for p-values when doing multiple pairwise significance testing
3. Software Alert: Upgrade patch available for V6.04 of Design-Expert(R) version 6 (DX6) software. (If you own DX6, click on the link for a free upgrade. Otherwise, follow the link to the free trial version.)
4. Info Alert: New "Stat-Teaser" newsletter features a bread DOE and other helpful articles (click on the link to see it!)
5. Events Alert: A heads-up on DOE talks and demos.
6. Workshop Alert: Coming soon to Philadelphia and Seattle
PS. Statistics quote for the month from Nero Wolfe, fictional detective.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1 - FAQ: Why choose a probability of 0.05 (p-value) as the criterion for statistical significance?

-----Original Question-----
From: Indiana

"I have a question for you: Do you have a reference to back up the following rule of thumb for p-values*?
p <= 0.05 ==> significant
0.1 > p > 0.05 ==> may be significant
p >= 0.1 ==> not significant"

*(If you need background on the term "p-value" to understand this question, see page 24 in Chapter 2 of "DOE Simplified, Practical Tools for Effective Experimentation." Details on the book can be found at http://www.statease.com/doe_simp.html along with an excerpt from Chapter 2 that contains the referenced material on p-values. Also, if you own Design-Expert or Design-Ease software, you will find a definition of "p-value" and many other statistical terms in the Glossary under Help. The "DOE Simplified" book also offers a glossary of terms.)

Answer:
>The p-value represents the risk of falsely rejecting the null hypothesis. The "p >= 0.1 ==> not significant" rule is pretty universal. Less universal may be the additional rules we use in our workshops that "p <= 0.05 ==> significant" and "0.1 > p > 0.05 ==> may be significant." How much risk you can tolerate depends on the cost of falsely rejecting the null hypothesis. You have to quantify this cost and decide on the risk you are willing to accept.<
(Answered by Pat Whitcomb, Principal, Stat-Ease, Inc.)
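Note from MJA: For readers who like to see a rule of thumb in action, here is a minimal sketch (my own illustration, not part of Pat's answer) that applies the cutoffs above to the p-value from an ordinary two-sample t-test. The data values are made up, and the choice of Python with SciPy is simply an assumption for the example, not output from any Stat-Ease product.

    # Illustrative only: classify a p-value using the rule of thumb above.
    # Requires SciPy; the "control" and "modified" numbers are invented data.
    from scipy import stats

    control  = [18.2, 17.9, 18.5, 18.1, 18.4, 18.0]
    modified = [18.9, 19.1, 18.7, 19.3, 18.8, 19.0]

    t_stat, p_value = stats.ttest_ind(control, modified)

    if p_value <= 0.05:
        verdict = "significant"
    elif p_value < 0.10:
        verdict = "may be significant"
    else:
        verdict = "not significant"

    print("t = %.2f, p = %.4f ==> %s" % (t_stat, p_value, verdict))

Whatever tool you use, remember Pat's caution: the cutoffs are conventions, so weigh them against the cost of drawing a wrong conclusion.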
I completely agree with Pat's assessment of generally acceptable p-values. How these specific values evolved is somewhat murky. In "The History of Statistics" (published by the Belknap Press of Harvard University Press), Stephen Stigler relates a story about the statistician Laplace, who used a p-value of 0.01 as a measure of significance for a study on how the moon affected barometric pressure. This is the earliest reference I can find to p-values.

Sir Ronald Fisher, who introduced the concept of significance testing in the early part of the 20th century, said this: "If P is between 0.1 and 0.9 there is certainly no reason to suspect the hypothesis tested. If it is below 0.02 it is strongly indicated that the hypothesis fails to account for the whole of the facts. We shall not often be astray if we draw a conventional line at 0.05...." [from: "Statistical Methods for Research Workers." London: Oliver and Boyd, 1950:80.] Fisher argued that interpretation of the p-value was ultimately up to the researcher. For example, a p-value of around 0.05 might provide incentive to perform another experiment rather than provide immediate resolution as to whether to accept or reject the null hypothesis.

In "Statistics for Experimenters," the authors (Box, Hunter and Hunter) go a bit higher in p-value than Pat (and most other statisticians) at the upper end. They say "one begins to be slightly suspicious of a discrepancy at the 0.20 level, somewhat convinced of its reality at the 0.05 level, and fairly confident of it at the 0.01 level." But they then go on to say: "Significance testing in general has been a greatly overworked procedure...[It's] better to provide an interval within which the value of the parameter would be expected to lie." For this reason we now print confidence intervals on the model coefficients in version 6 of our Design-Ease and Design-Expert software. You should also make use of the Point Prediction feature in our software, which provides confidence and prediction intervals on the predicted response(s).

In the end you must decide what to do based on the statistics and your subject matter knowledge. Box, Hunter and Hunter acknowledge that "In practice, an experimenter's prior belief in the possibility of a particular type of discrepancy must affect his attitude."

(Learn more about significance testing and basic DOE by attending the 3-day computer-intensive workshop "Experiment Design Made Easy." Go to http://www.statease.com/clas_edme.html for a description and links to the course outline and schedule.)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2 - X-FAQ: Criteria for p-values when doing multiple pairwise significance testing

-----Original Question-----
From: New Jersey

"I have a question which has come up with regard to the F-test as in the "bowling" tutorial.* It was indicated that we should never try to use the t-test to compare averages between any two treatments UNLESS the F-test showed overall significance. However, I would like to know exactly WHY that was said?? Even if the overall F-test does not show significance, isn't it possible that, say, one of four treatments might be significantly better or worse than the other three, even though there is no statistically significant difference among those other three?? I know that for some sets of data, the F-test could indicate a "prob>F" that exceeds 0.10, yet one or more of the t-test comparisons will indicate a "prob>t" that's less than 0.05. So I do not believe it to be a true statement that the F-test will ALWAYS indicate significance if even one of the t-test comparisons indicates significance. However, it appears to me that the converse is true: if the F-test shows significance, then at least one of the t-tests will always indicate significance. I would like to be able to conclude with 95% confidence that the t-test could still indicate significance even if the F-test is close but not quite into the significance range. Do you get my dilemma? Can you answer this? Thanks a lot!"

*(Refer to Design-Expert User's Guide Section 2: "One Factor Tutorial," or http://www.statease.com/x6ug/DX02-Factor-One.pdf .)

Answer:
>The problem with doing all the individual t-tests is that of multiple comparisons. If you do multiple comparisons, each with a 5% risk of a type I error, the overall risk of a type I error is much greater than 5%. The maximum risk is roughly k (the number of comparisons) times the per-comparison error risk. If you have 5 means there are 10 pairwise t-tests, and the maximum risk of falsely rejecting the null hypothesis is 10 times 5%, or 50%. The actual risk in this case is about 29%, not 50%. By first performing the F-test we provide protection from type I error creep by not looking for differences unless we know the null hypothesis has been rejected with our overall risk set at 5%. There are also schemes that correct the t-values to control the overall type I error. If you want to learn more about this, look in statistics textbooks for "multiple comparisons."<
(Answered by Pat Whitcomb, Principal, Stat-Ease, Inc.)

PS from MJA: See what the online "Engineering Statistics Handbook" by NIST/SEMATECH says about this issue by linking to http://www-09.nist.gov/div898/handbook/prc/section4/prc47.htm .

(Dear advanced readers: What can you add to clarify this FAQ and the related one above? Mark)
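Another note from MJA: If you would like to see where figures like the 50% upper bound and the roughly 29% actual risk come from, the simulation sketched below may help. It is strictly my own illustration, not part of Pat's answer and not based on any Stat-Ease software; the choice of Python with NumPy and SciPy, the group size of 20, and the number of trials are all arbitrary assumptions. It simply counts how often at least one of the 10 pairwise t-tests among 5 groups comes out "significant" at the 0.05 level even though all five groups are drawn from the same population.

    # Illustrative simulation: family-wise type I error for all 10 pairwise
    # t-tests among 5 groups that truly share the same mean. Group size,
    # trial count and random seed are arbitrary choices for this sketch.
    import itertools
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_groups, n_per_group, n_trials, alpha = 5, 20, 10000, 0.05

    false_alarms = 0
    for _ in range(n_trials):
        groups = [rng.normal(0.0, 1.0, n_per_group) for _ in range(n_groups)]
        # Declare a false alarm if ANY pairwise comparison looks significant.
        p_values = [stats.ttest_ind(a, b)[1]
                    for a, b in itertools.combinations(groups, 2)]
        if min(p_values) <= alpha:
            false_alarms += 1

    print("Upper bound (10 x 5%%): %.0f%%" % (100 * 10 * alpha))
    print("Simulated family-wise error rate: %.1f%%" % (100.0 * false_alarms / n_trials))

On a run like this you should see a rate well under the 50% bound and much closer to the figure Pat quotes (the exact value depends on the group sizes), which is why the protected F-test, or a formal multiple-comparison correction, is the safer route.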
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

3 - Software Alert: Upgrade patch available for V6.04 of Design-Expert(R) version 6 (DX6) software

If you own a permanently licensed copy of Design-Expert version 6, go to http://www.statease.com/soft_ftp.html#dx6updt for a patch that will update your software (individual or networked) with the latest enhancements. If you do not currently use Stat-Ease software, download a fully functional free trial of DX6 at http://www.statease.com/dx6descr.html , which you can use at no cost for 30 days.

The latest version of DX6 offers a new design option for response surface methods called "Historical Data." This feature makes it easy to create a blank layout to enter happenstance data for up to 10 numeric factors and 10 categorical factors and the associated responses. Typically these will be copied into DX from a Windows-based spreadsheet such as Excel. Then you can apply Design-Expert's powerful tools for regression modeling, 3D graphics and multiple-response optimization.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

4 - Info Alert: New "Stat-Teaser" newsletter features a bread DOE and other helpful articles

Get a free download of the June 2001 Stat-Teaser in PDF by clicking on http://www.statease.com/newsltr.html . Find out how I applied DOE methods to improve the performance of a bread-making machine. The analysis revealed an unexpected interaction between two key ingredients.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

5 - Events Alert: A heads-up on DOE talks and demos

Click on http://www.statease.com/events.html for a listing of where Stat-Ease consultants will be giving talks and doing DOE demos. The next event of international interest will be the Joint Statistical Meetings in Atlanta this August. We hope to see some of you there!

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

6 - Workshop Alert: Coming soon to Philadelphia and Seattle

On July 12th Stat-Ease travels to Philadelphia for a presentation of "DOE Simplified" (DOES), a one-day overview of DOE based on the book of the same name. See http://www.statease.com/does.html for class content. Although "DOE Simplified" is fun and informative, it's only intended to get people started on the path to more effective experimentation. We hope that participants will then be motivated to take the next step by attending our "Experiment Design Made Easy" (EDME) workshop, which will be presented next in Seattle on July 10-12. We will return to Seattle on Sept. 13 with the one-day "DOE Simplified" presentation.

See http://www.statease.com/clas_pub.html for a schedule and sites for all Stat-Ease workshops open to the public. To enroll, call Stat-Ease at 612-378-9449. Don't delay; seats sometimes fill up fast. If spots remain available, bring along several colleagues and take advantage of quantity discounts in tuition, or consider bringing in an expert from Stat-Ease to teach a private class at your site. Call us to get a quote.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I hope you learned something from this issue. Address your questions and comments to me at: Mark@StatEase.com

PLEASE DO NOT SEND ME REQUESTS TO SUBSCRIBE OR UNSUBSCRIBE - FOLLOW THE INSTRUCTIONS AT THE END OF THIS MESSAGE.

Sincerely,
Mark

Mark J. Anderson, PE, CQE
Principal, Stat-Ease, Inc. (http://www.statease.com)
Minneapolis, Minnesota USA

PS. Statistics quote for the month:

"In a world that operates largely at random, coincidences are to be expected, but each one of them must always be mistrusted." - Line spoken by the actor playing Detective Nero Wolfe on the A&E television show, originally aired 4/28/01.

Trademarks: Design-Ease, Design-Expert and Stat-Ease are registered trademarks of Stat-Ease, Inc.

Acknowledgements to contributors:
- Students of Stat-Ease training and users of Stat-Ease software
- Fellow Stat-Ease consultants Pat Whitcomb and Shari Kraber (see http://www.statease.com/consult.html for resumes)
- Statistical advisor to Stat-Ease: Dr. Gary Oehlert ( http://www.statease.com/garyoehl.html )
- Stat-Ease programmers, especially Tryg Helseth ( http://www.statease.com/pgmstaff.html )
- Heidi Hansel, Stat-Ease communications specialist, and all the remaining staff

DOE FAQ Alert - Copyright 2001 Stat-Ease, Inc. All rights reserved.