Stat-Ease Blog


Christmas Trees on my Effects Plot?

posted by Shari Kraber on Dec. 3, 2020

As a Stat-Ease statistical consultant, I am often asked, “What are the green triangles (Christmas trees) on my half-normal plot of effects?”

Factorial design analysis uses a half-normal probability plot to identify the largest effects to model, leaving the remaining small effects to provide an error estimate. Green triangles appear when you have included replicates in the design, often at the center point. Unlike the orange and blue squares, which are factor-effect estimates, the green triangles are noise-effect estimates, or “pure error”. The triangles represent the amount of variation in the replicates, and the number of triangles corresponds to the degrees of freedom (df) from the replicates. For example, five center points carry four df, so four triangles appear.

The triangles are positioned among the factor effects to reflect the relative size of that noise. Ideally, they land in the lower-left corner, near zero (see Figure 1). In that position they combine with the smallest (insignificant) effects and help position the red line; factor effects that jump off that line to the right are most likely significant. Think of the triangles as an extra piece of information that increases your ability to find significant effects.

Figure 1: Half-normal plot of effects with the pure-error triangles in the lower-left corner. [SE_BlogGraph1.png]
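To make the mechanics concrete, here is a minimal Python sketch of a half-normal effects plot that includes pure-error points. Everything in it is invented for illustration: the 2^3 design, the responses, and the five center-point readings. The scaling of the pure-error points is a rough stand-in for the Larntz-Whitcomb construction referenced at the end of this post, not Design-Expert’s exact computation.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical 2^3 factorial (coded -1/+1) with five center-point replicates
X = np.array([(a, b, c) for c in (-1, 1) for b in (-1, 1) for a in (-1, 1)])
y = np.array([52.0, 61.0, 54.0, 63.0, 55.0, 70.0, 58.0, 71.0])
centers = np.array([60.2, 59.1, 61.0, 60.5, 59.7])  # 5 replicates -> 4 df

# Effect estimate for each factorial column: mean(high) - mean(low)
cols = {
    "A": X[:, 0], "B": X[:, 1], "C": X[:, 2],
    "AB": X[:, 0] * X[:, 1], "AC": X[:, 0] * X[:, 2],
    "BC": X[:, 1] * X[:, 2], "ABC": X[:, 0] * X[:, 1] * X[:, 2],
}
effects = {k: y[v > 0].mean() - y[v < 0].mean() for k, v in cols.items()}

# Convert the 4 pure-error df into 4 "noise effects" via orthonormal
# contrasts of the center points, scaled by 2/sqrt(n) so they sit on the
# same scale as the factor effects (whose standard error is 2*sigma/sqrt(n))
H = np.array([[1, -1, 0, 0, 0],
              [1, 1, -2, 0, 0],
              [1, 1, 1, -3, 0],
              [1, 1, 1, 1, -4]], dtype=float)
H /= np.linalg.norm(H, axis=1, keepdims=True)
pure_error = np.abs(H @ centers) * 2 / np.sqrt(len(y))

# Half-normal plot: sorted absolute values vs. half-normal quantiles
vals = np.concatenate([np.abs(list(effects.values())), pure_error])
is_effect = np.array([True] * len(effects) + [False] * len(pure_error))
order = np.argsort(vals)
quantiles = stats.halfnorm.ppf((np.arange(1, len(vals) + 1) - 0.5) / len(vals))

plt.scatter(vals[order][is_effect[order]], quantiles[is_effect[order]],
            marker="s", color="orange", label="factor effects")
plt.scatter(vals[order][~is_effect[order]], quantiles[~is_effect[order]],
            marker="^", color="green", label="pure error (4 df)")
plt.xlabel("|effect|")
plt.ylabel("half-normal quantile")
plt.legend()
plt.show()
```

Run it and the two large invented effects (A and C) jump off to the right, while the green triangles cluster near zero with the trivial effects, just as in Figure 1.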

Once in a while we encounter an effects plot that looks like Figure 2. “What does it mean when the green triangles are out of place, sitting on the upper right side instead of the lower left?”

This indicates that the variation between the replicates is greater than the largest factor effects! Since this error is part of the normal process variation, you cannot declare any of the factor effects statistically significant. At this point, first check the replicate data to make sure it was both measured and recorded correctly. Then carefully consider the sources of process variation and how they could be reduced. In a situation like this, either reduce the noise or widen the factor ranges; either way, you improve the signal-to-noise ratio and give the significant effects a chance to stand out.

Figure 2: Half-normal plot of effects with the pure-error triangles in the upper right. [SE_BlogGraph2.png]

- Shari Kraber

For statistical details, read “Use of Replication in Almost Unreplicated Factorials” by Larntz and Whitcomb.

For more frequently asked questions, sign up for Mark’s bi-monthly e-mail, The DOE FAQ Alert.


Breaking beyond A/B splits for better business experiments

posted by Mark Anderson on Oct. 12, 2020

Design of experiments (DOE), being such an effective combination of multifactor testing and statistical tools, hits the spot for engineers and scientists doing industrial R&D. However, as documented in my white paper on Achieving Breakthroughs in Non-Manufacturing Processes via Design of Experiments (DOE), this statistical methodology works equally well for business processes. Yet non-manufacturing experimenters rarely make it beyond the simple one-factor-at-a-time (OFAT) comparisons known as A/B splits, most recently embraced, to my great disappointment, by Harvard Business Review*. But to give HBR some credit, this 2020 feature on experimentation at least mentions “multivariate” (I prefer “multifactor”) testing as a better alternative.

To see an illuminating example of multifactor testing applied to marketing, see my April 21 StatsMadeEasy blog: Business community discovers that “Experimentation Works”.

Another great case for applying multifactor DOE came from Kontsevaia and Berger in a study published by the International Journal of Business, Economics and Management**. To maximize impressions per social-media post, they applied a two-level fractional-factorial design, varying six factors in 16 runs (a construction sketch in code follows the factor list):

A. Type of Day/Day of the week: Weekend (Sat, Sun) vs Workday (Thu, Fri)

B. Social Media Channel: LinkedIn vs Twitter

C. Image present: No vs Yes

D. Time of Day: Afternoon (3-6pm) vs Morning (7-10am)

E. Length of Message: Long (at least 70 characters) vs Short (under 70 characters)

F. Hashtag present: No vs Yes
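For readers who want to see how such a design is put together, here is a minimal sketch in Python. The generators (E = ABC, F = BCD) and the assignment of low/high codes to the level labels are assumptions made for illustration; the published study’s exact layout may differ.

```python
import itertools
import numpy as np

# A 16-run, two-level fractional factorial for six factors (2^(6-2)).
# Factors A-D form a full 2^4 design; E and F come from the generators
# E = ABC and F = BCD, a standard resolution IV choice (assumed here).
base = np.array(list(itertools.product((-1, 1), repeat=4)))
A, B, C, D = base.T
E = A * B * C
F = B * C * D
design = np.column_stack([A, B, C, D, E, F])

# First label = low (-1), second = high (+1); the coding is arbitrary.
labels = {
    "A": ("Weekend", "Workday"),
    "B": ("LinkedIn", "Twitter"),
    "C": ("No image", "Image"),
    "D": ("Afternoon", "Morning"),
    "E": ("Long msg", "Short msg"),
    "F": ("No hashtag", "Hashtag"),
}
for run, row in enumerate(design, start=1):
    settings = [labels[f][(v + 1) // 2] for f, v in zip(labels, row)]
    print(f"Run {run:2d}: " + ", ".join(settings))
```

A design of this resolution estimates all six main effects clear of two-factor interactions, though pairs of interactions remain aliased with one another; untangling a specific pair may take a small follow-up experiment.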

The multifactor marketing test revealed the choice of channel for maximum impressions to be highly dependent on posts going out on weekends versus workdays. This valuable insight on a two-factor interaction (AB) would never have been revealed by a simple OFAT split.
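To see why an OFAT split can miss an interaction like this, consider a toy calculation with invented impression counts in which the better channel flips between weekends and workdays. Averaged over day type, which is all a simple channel-only A/B split measures, the two channels look identical:

```python
# Hypothetical mean impressions (numbers invented for illustration).
# The better channel depends on the day type: a two-factor interaction.
impressions = {
    ("Weekend", "LinkedIn"): 120, ("Weekend", "Twitter"): 310,
    ("Workday", "LinkedIn"): 340, ("Workday", "Twitter"): 150,
}
for channel in ("LinkedIn", "Twitter"):
    avg = (impressions[("Weekend", channel)]
           + impressions[("Workday", channel)]) / 2
    print(f"{channel}: {avg:.0f} impressions on average")
# Both channels average 230, so the A/B split sees no difference,
# even though the day-by-channel swings are huge.
```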

Design-Expert® software makes multifactor business experiments like this very easy for non-statisticians to design, analyze and optimize for greatly increased returns. Aided by Stat-Ease you can put DOE to work for your enterprise and make a big hit career-wise.

*“Building a Culture of Experimentation”, Stefan Thomke, Harvard Business Review, March-April 2020.

**“Analyzing Factors Affecting the Success of Social Media Posts for B2B Networks: A Fractional-Factorial Design Approach”, Kontsevaia and Berger, International Journal of Business, Economics and Management, August 2020.


Magic of multifactor testing revealed by fun physics experiment

posted by Shari on Sept. 16, 2020


If you haven’t discovered Mark Anderson’s StatsMadeEasy blog yet, check it out! Mark offers a wry look at all things statistical and/or scientific from an engineering perspective. You will find posts on topics as varied as nature, science, sports, politics, and DOE. His latest post, Magic of multifactor testing, involves a fun DOE with bouncy balls.

The first article walks through the setup of Mark’s DOE. At the bottom of Part 1, click on the Related Links tab to find Part 2 (Results) and Part 3 (Data and Details).



That’s a Wrap!

posted by Greg on July 1, 2020

The 2020 Online DOE Summit was a remarkable success! If you missed any of it, read on!

We just wrapped up the summit, and what a success it was. A group of influential speakers kicked off discussions of design of experiments (DOE), and hundreds of attendees logged in to each talk to soak up that knowledge. Thank you to everyone who participated.

We created the summit because of the COVID-19 pandemic. Originally scheduled for the middle of June, our 8th European DOE Meeting was canceled in January. After thinking it over for a bit, we decided to move the meeting online: this would be the only way to hold a meeting for a while, and the cost to the audience would be zero.

All the speakers lined up for the European meeting agreed to make the move to a virtual event. We set up a schedule, gave the event a new name, and emailed everyone the dates. The 2020 Online DOE Summit was born.

Our first group of presentations consisted of a kickoff talk, three keynotes, and a tutorial. Many of these talks revolved around current directions in DOE. Even though DOE has been around for decades, it is an evolving practice with new techniques and advice coming up all the time. Each speaker discussed broad concepts in design of experiments.

[Click on the title of the talk for a video recording of the presentation.]

Kickoff: Know the SCOR for Multifactor Strategy of Experimentation
Mark Anderson: Principal of Stat-Ease, Inc.
Talk Topic: Laying out a strategy for multifactor design of experiments

Keynote: My Lifelong Journey with DOE
Pat Whitcomb: Founding Principal of Stat-Ease, Inc.
Talk Topic: Pat explores his lifelong journey with design of experiments, with a view to the future

Keynote: Some Experiences in Modern Experimental Design
Marcus Perry: Editor in Chief, Quality Engineering; Professor of Statistics, The University of Alabama
Talk Topic: Handling non-standard situations in today’s DOE environment

Keynote: Innovative Mixture-Process Models
Geoff Vining: Professor of Statistics, Virginia Tech
Talk Topic: An overview of KCV designs that limit runs in experiments involving both mixture components and process variables

Tutorial: Strategies for Sequential Experimentation
Martin Bezener: Director of Research & Development, Stat-Ease, Inc.
Talk Topic: This presentation explores how it may be more efficient to divide an experiment into smaller pieces. Learn how to use resources in a smarter, more adaptive manner.

In the second week of the summit, we presented a separate set of talks, each detailing a real-world experiment. Presenters discussed the actual experiments they had worked on and how they used DOE in each case.

Simultaneous and Quick Determination of Two Ingredients Concentrations in a Solution Using a UV-Vis Spectroscopy Chemometric Model
Samd Guizani: Process Scientist, Ferring International Center

Use of DOE for 3D Printer Ink Formulation Development
Uri Zadok: Senior Research Chemist, Stratasys

Using Experimental Design to Optimize the Surfactant Package Properties of a Metalworking Cleaner
Mathijs Uljé: Development Chemist, Quaker Houghton

Optimizing Multi-Step Processes with DoE – A Cryopreservation Protocol for Plant Cells as a Case
Johannes Buyel: Head of Department of Bioprocess Engineering, Aachen University

In all, this was a great summit. The presenters were spot on about the current state of DOE, whether covering modern concepts or real-life experiments. The audience took away many useful ideas and practices. It was a classic case of making lemonade from lemons.

Thanks all!


Greg's DOE Adventure - Factorial Design, Part 2

posted by Greg on April 15, 2020

[Disclaimer: I’m not a statistician. Nor do I want you to think that I am. I am a marketing guy (with a few years of biochemistry lab experience) learning the basics of statistics, and design of experiments (DOE) in particular. This series of blog posts is meant to be a light-hearted chronicle of my travels in the land of DOE, not a statistics textbook. So please, take it as it is meant to be taken. Thanks!]

Keep your experiment planned, but random

When I wrote my introduction to factorial design (Greg’s DOE Adventure - Factorial Design, Part 1), there were a couple of points that I left out. I’ll amend that post here to talk about making sure your experiment is planned out yet random.

Wait. What?

You’ll see. Let me explain.

Getting organized

During the initial phase of an experiment, make sure it is well planned out. First, think about the factors that affect the outcome of your experiment. You want the list to be as all-encompassing as possible: anything that may change the outcome goes on it. Then pare it down to the factors you expect to be the biggest contributors.

Once you have done that, you can set the levels at which to run each factor. You want the low and high levels to be as far apart as practical: not so low that you won’t see an effect (if your experiment is cooking something, don’t set the temperature so low that nothing happens), and not so high that it becomes dangerous (as in cooking, you don’t want to burn your product).

Finally, you want to make sure your experiment is balanced across its factors. Taking the cooking example a little further, suppose you have three factors to test: time, temperature, and ingredient quality. Let’s also say each is tested at two levels: low and high (symbolized by minus and plus signs, respectively). We can write this out in a table:

Run   Time   Temperature   Ingredient Quality
 1     -          -                -
 2     +          -                -
 3     -          +                -
 4     +          +                -
 5     -          -                +
 6     +          -                +
 7     -          +                +
 8     +          +                +

This table contains all eight possible combinations of the three factors. It’s called an ‘orthogonal array’ because it’s balanced: each column has the same number of pluses and minuses (four of each, in this case), and every pair of columns contains each combination of levels equally often. That balance keeps the factor estimates uncorrelated and independent of one another.
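Here is a quick way to build that table and verify the balance yourself, a minimal sketch using NumPy for the arithmetic:

```python
import itertools
import numpy as np

# The 2^3 full factorial from the table above: every combination of
# time, temperature, and ingredient quality at low (-1) and high (+1)
design = np.array(list(itertools.product((-1, 1), repeat=3)))

# Balance: each column has as many pluses as minuses, so it sums to zero
print(design.sum(axis=0))   # -> [0 0 0]

# Orthogonality: every pair of columns has a zero dot product
print(design.T @ design)    # -> 8s on the diagonal, 0s everywhere else
```

The zeros off the diagonal of that cross-product matrix are exactly what “orthogonal” means here: no factor column can masquerade as another.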

With these steps, you have ensured that your experiment is well planned out and balanced when looking at your factors.

Always randomize

At the start of this post, I said that an experiment should be planned out, yet random. Well, we have covered the planned-out part; now let’s get into the random part.

In any experimentation, influence from external sources (variables you are not studying) should be kept to a minimum. One way to do this is randomizing your runs.

As an example, look again at the table above and say it represents the order in which the runs were performed: all the low-temperature runs first, then all the high-temperature runs together. This makes sense, right? Perform all the runs at one temperature before adjusting up to the next setting.

The problem is, what if an issue with your oven causes the temperature to fluctuate more early in the experiment and less later on? That time-related issue introduces variation (bias) into your results that you didn’t know about.

To reduce the influence of this variable, randomize your run order. It may take more time to adjust the oven for every run, but it keeps that unwanted variation from biasing your results.
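Randomizing the order is easy to do in software. Here is a minimal sketch using NumPy’s random generator to shuffle the standard order of the eight cooking runs (the seed is there only to make the example reproducible):

```python
import itertools
import numpy as np

# The 2^3 cooking design in standard order
design = np.array(list(itertools.product((-1, 1), repeat=3)))

# Shuffle the run order so time-related drift (oven warm-up, operator
# fatigue) is spread randomly across the factor settings
rng = np.random.default_rng(seed=2020)  # seed only for reproducibility
for run, idx in enumerate(rng.permutation(len(design)), start=1):
    time_lvl, temp_lvl, quality_lvl = design[idx]
    print(f"Run {run}: standard row {idx + 1} "
          f"(time={time_lvl:+d}, temp={temp_lvl:+d}, quality={quality_lvl:+d})")
```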

Temperature is a popular example for illustrating randomization, but the same goes for any factor with time-related problems: warm-up time on a machine, say, or the physical tiring of an operator. Randomize to guard against bias as much as you can when running an experiment.

Conclusions

Hopefully, you see now why I said to keep your experiments planned but random. It sounds like an oxymoron, but it’s not. Not in the way I’m talking about it here!