Stat-Ease Blog

Beware of totally leveraged runs!

posted by Mark Anderson on Aug. 18, 2025

A challenge for a statistical sleuth

A few weeks ago, a process engineer hoping to glean a model of yield as a function of 8 factors asked me to explain why the analysis of variance (ANOVA) failed to produce p values. See this deficiency on the left side of the software output shown in Figure 1. On the right side, notice the dire warning about the fit statistics. The missing p values and other non-available (“NA”) stats created great concern about the validity of the entire analysis.

Design-Expert software screenshot showing the right-click menu for a factor.

Figure 1: Alarming results in ANOVA and fit statistics

The tip-off for what went wrong can be found in the footnote: “Case(s) with leverage of 1.” After poring over the inputs, which stemmed from existing data rather than a designed experiment, I discovered that many of the rows had been duplicated. Removing these ‘dups’ left only 9 unique runs to fit a linear model featuring 8 coefficients for the 8 factors (main-effect slopes) plus 1 coefficient required for the intercept. The statistical software did the best it could with this ‘mission impossible.’ It did nothing wrong.

Creating total leverage as in this multifactor case can be likened to fitting a line to two points. It leaves no degrees of freedom (df) for estimating error (as shown in the Figure 1 ANOVA). Thus, the F-test cannot be performed and, therefore, no p values can be estimated.

A model can be generated (barely!), but the lack of statistical tests provides no confidence in the outcome, literally (zero).
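To see why nine unique runs cannot support nine coefficients, here is a minimal Python sketch. It assumes NumPy is available, and the factor settings are random placeholders rather than the engineer's actual data. The helper name `leverages` is my own. It computes the diagonal of the hat matrix, which is the usual textbook definition of leverage.

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix H = X (X'X)^-1 X', computed via a thin QR.

    Assumes the model matrix X has full column rank.
    """
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

rng = np.random.default_rng(0)               # arbitrary seed; values are placeholders
factors = rng.uniform(-1, 1, size=(9, 8))    # 9 unique runs, 8 factors
X = np.column_stack([np.ones(9), factors])   # intercept + 8 main effects = 9 coefficients

h = leverages(X)
n_runs, n_coef = X.shape
df_error = n_runs - n_coef                   # degrees of freedom left to estimate error

print(np.round(h, 6))   # every run has leverage 1
print(df_error)         # 0
```

With zero error df there is no denominator for the F-test, which matches the missing p values in the ANOVA.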

The remedy is very simple: Collect more data!

What is leverage?

Leverage is a numerical value between 0 and 1 that indicates the potential for a design point to influence the model fit. It’s strictly a function of the design itself—not the responses. Thus, leverage can be assessed before running the experiment.

A leverage of 1 means that the model will exactly fit the observation. That is never good because, unless that point falls exactly where it ought to be, your predictive model will be off kilter.

Leverage (“L”) is an easy statistic to master. On average, it equals the number of coefficients in your model divided by the number of unique runs (dups do not count!).

You have seen what happens when all the runs are completely leveraged (L=1). But even one run at a leverage of 1 creates issues. For example, consider a hypothetical experiment aimed at establishing a linear fit of a key process attribute Y on a single factor X. The researchers intend to make 20 runs split evenly between two levels. However, due to circumstances beyond their control, they achieve only one run at the high level. The 10 points at the low end come in at a leverage of 0.1 each, so none of them individually exerts much influence on the fit. That’s good. But the single point at the high level exhibits a leverage of 1, so it will be exactly fitted wherever it may go. That’s not good, but it may be OK if the result is where it ought to be. However, if something unusual happens at the high level, there will be no way of knowing. I would be very skeptical of such an experiment. It is best to go for a complete ‘do over.’
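The hypothetical experiment above can be checked with a short Python sketch (NumPy assumed). The response values are invented purely for illustration. The point is that the fitted value at the lone high-level run always equals whatever was observed there.

```python
import numpy as np

# Coded factor levels: 10 runs at the low level, 1 run at the high level.
x = np.array([-1.0] * 10 + [1.0])
X = np.column_stack([np.ones_like(x), x])   # intercept + slope

H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
h = np.diag(H)                              # leverages
print(np.round(h, 3))                       # ten values of 0.1, then 1.0

# Made-up responses at the low level; try two very different high-level results.
y_low = np.array([50.1, 49.4, 51.2, 50.6, 48.9, 50.3, 49.8, 51.0, 50.2, 49.5])
fitted_high = []
for y_high in (60.0, 95.0):
    y = np.append(y_low, y_high)
    fitted = H @ y
    fitted_high.append(fitted[-1])
    print(y_high, round(fitted[-1], 6))     # fitted value equals the observed value
```

Whether the high-level result is 60 or 95, the residual at that run is zero, so the diagnostics cannot flag it.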

Watch for leverages close to 1.0. Consider replicating these points, or make sure they are run very carefully.

What if no runs exhibit leverage of 1, but some are highly leveraged relative to others?

Some designs, such as standard two-level factorials with no center points, produce runs with equal leverage. However, others do not. For example, a two-level design on 4 factors with 4 center points, fitted with the full 16-term factorial model, features 16 runs with a leverage of 0.9875, far exceeding the center-point leverages of 0.05. Nevertheless, under the generally accepted guideline that leverages less than 2 times the average cause no great concern, this design gets a pass, the average leverage being 0.8. A two-level design with center points is like a teeter-totter: points at the center sit at the fulcrum and thus have very low leverage.

I advise that you focus only on runs with leverage greater than 2 times the average leverage (or any with leverage of 1, of course). It is best to identify high-leverage points before running the experiment via a design evaluation and, if affordable, replicate them, thus reducing their leverage.

Do not be greatly concerned if leverages get flagged after you reduce insignificant terms from your model. For example, see the case study by our founder Pat Whitcomb in his article on “Bad Leverages” in the March 1998 Stat-Teaser, a must-read if you want to get a good grasp on leverage.

Keep in mind that, despite being flagged for high leverage (2x average), a design point may generate a response that typifies how the process behaves at that setting. In that case it does not invalidate the model. Apply your subject-matter knowledge and/or ask an expert colleague to be the judge of that.
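The leverages quoted for the 4-factor design can be reproduced with the following Python sketch (NumPy assumed). It builds the model matrix for the full factorial model, meaning the intercept plus every main effect and interaction. That model choice is my assumption, and it is the one that yields these numbers.

```python
import itertools
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix via a thin QR (X assumed full column rank)."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

# 16 factorial runs in coded units plus 4 center points.
factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=4)))
design = np.vstack([factorial, np.zeros((4, 4))])

# Intercept plus all main effects and interactions: 16 coefficients in total.
columns = [np.ones(len(design))]
for order in range(1, 5):
    for idx in itertools.combinations(range(4), order):
        columns.append(np.prod(design[:, list(idx)], axis=1))
X = np.column_stack(columns)

h = leverages(X)
average = X.shape[1] / X.shape[0]     # coefficients / runs = 16 / 20
flagged = h > 2 * average             # the "twice the average" guideline

print(np.round(h[:16], 4))            # factorial points: 0.9875 each
print(np.round(h[16:], 4))            # center points: 0.05 each
print(average, flagged.any())         # 0.8 False
```

Because twice the average is 1.6 and no leverage can exceed 1, nothing is flagged here.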

General advice on leverage and situations to avoid

If you use standard DOE templates or optimal tools to lay out an experiment, it is unlikely that your design will include points with leverage over twice the average leverage. But, if you override the defaults and warnings in your software, issues with leverage can arise. For example, I often see published factorial designs with only 1 center point rather than the 3 or 4 that our software advises. This creates a leverage of 1 for the curvature test, which is not good. Believe it or not, as a peer reviewer for a number of technical journals I’ve also seen many manuscripts that lay out the recommended number of center points for standard designs (e.g., 4 for a two-level factorial) but report identical results for all of them. As already explained, when it comes to leverage do not be duped by ‘dups.’
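The center-point issue can be illustrated with a small Python sketch (NumPy assumed). It represents the curvature test as an indicator column that is 1 at center points and 0 elsewhere. That is one common way to express the test and is an assumption on my part, not necessarily how any particular software implements it. The 2-factor design and the function name `center_point_leverage` are likewise hypothetical.

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix via a thin QR (X assumed full column rank)."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

def center_point_leverage(n_center):
    """Leverage of each center point in a 2x2 factorial with a curvature term."""
    factorial = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
    design = np.vstack([factorial, np.zeros((n_center, 2))])
    # Assumed curvature term: indicator that a run is a center point.
    curvature = np.concatenate([np.zeros(4), np.ones(n_center)])
    X = np.column_stack([np.ones(len(design)), design, curvature])
    return leverages(X)[4:]

one_cp = center_point_leverage(1)
four_cp = center_point_leverage(4)
print(np.round(one_cp, 4))    # a single center point has leverage 1
print(np.round(four_cp, 4))   # four center points have leverage 0.25 each
```

With a single center point, the curvature estimate rests entirely on one unverifiable observation.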

The dangers of happenstance data

I am particularly wary of historical data with runs done haphazardly (no plan). These often create a cloud of points at one end with very few at the opposite extreme. For example, see the scatter plot in Figure 2 (real data from a study of infection rates after varying number of days at various hospitals in the USA).

Figure 2: A real-life dataset with a badly leveraged point

In this case, the point at the upper right exhibits a leverage of 0.99 versus an average of 0.17 for the other 12 points. If possible, replicating such a high-leverage point would be very helpful, reducing its leverage by about half. Better yet, add two replicates, for three runs in all at that setting, to reduce this problematic point’s leverage to about one-third of its original value. Though it does not emerge as an outlier in the diagnostics (very unlikely for a highly leveraged point, because it will be closely fitted), this particular result must be carefully evaluated and ignored if determined to be exceptional.
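The benefit of replication can be sketched in Python (NumPy assumed). The x-values below are made up to mimic a cloud of points with one far-off setting. They are not the hospital data in Figure 2, and the straight-line model is also an assumption.

```python
import numpy as np

def leverages(x):
    """Leverages for a straight-line fit (intercept + slope) at settings x."""
    X = np.column_stack([np.ones_like(x), x])
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

cloud = np.array([1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5], dtype=float)  # made-up cluster
far = 30.0                                                           # made-up extreme setting

results = {}
for copies in (1, 2, 3):
    x = np.concatenate([cloud, np.full(copies, far)])
    h = leverages(x)
    results[copies] = h[-1]                 # leverage of each run at the far setting
    twice_average = 2 * 2 / len(x)          # 2 coefficients / number of runs, doubled
    print(copies, round(h[-1], 3), round(twice_average, 3))
```

For these invented settings, the far point's leverage falls from roughly 0.97 with one run to about 0.49 with two and 0.33 with three. It still exceeds twice the average, but each individual result now has far less sway over the fit.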

Conclusion

Pay attention to leverage, ideally before you complete your experiment, but if you are developing a model from existing data, do so in the diagnostics from your statistical software. Beware of totally leveraged runs, the worst-case scenario. If things are not quite this bad, watch for leverages more than twice the average and, if possible, replicate those runs. Otherwise, apply engineering and scientific expertise to decide if the results can be accepted.

July Publication Roundup

posted by Rachel Poleke, Mark Anderson on Aug. 4, 2025

Here's the latest Publication Roundup! In these monthly posts, we'll feature recent papers that cited Design-Expert® or Stat-Ease® 360 software. Please submit your paper to us if you haven't seen it featured yet!

Featured Articles

Microwave-assisted extraction of bioactive compounds from Urtica dioica using solvent-based process optimization and characterization
Scientific Reports volume 15, Article number: 25375 (2025)
Authors: Anjali Sahal, Afzal Hussain, Ritesh Mishra, Sakshi Pandey, Ankita Dobhal, Waseem Ahmad, Vinod Kumar, Umesh Chandra Lohani, Sanjay Kumar

Mark's comments: Kudos to this team for deploying a Box-Behnken response-surface-method design, convenient in requiring only 3 levels of each of their 3 factors (power, time and sample-to-solvent ratio), to optimize their process. Because the paper provides all the raw data, I could easily copy it out, import it into my Stat-Ease software and check the modeling. No major issues were uncovered. The authors did well by diagnosing residuals and making use of our numerical optimization tools to find the most desirable factor combination for their multiple-response goals.

Be sure to check out this important study, and the other research listed below!

More new publications from July

Improving the heterotrophic media of three Chlorella vulgaris mutants toward optimal color, biomass and protein productivity
Scientific Reports volume 15, Article number: 23325 (2025)
Authors: Mafalda Trovão, Miguel Cunha, Gonçalo Espírito Santo, Humberto Pedroso, Ana Reis, Ana Barros, Nádia Correia, Lisa Schüler, Monya Costa, Sara Ferreira, Helena Cardoso, Márcia Ventura, João Varela, Joana Silva, Filomena Freitas, Hugo Pereira
Calibration and establishment for the discrete element simulation parameters of pepper stem during harvest period
Scientific Reports volume 15, Article number: 21143 (2025)
Authors: Jiaxuan Yang, Jin Lei, Xinyan Qin, Zhi Wang, Jianglong Zhang, Lijian Lu
Design Expert Software Being Used to Explore the Factors Affecting the “Water Garden”
American Journal of Analytical Chemistry, 16, 107-116
Authors: Zelin Miu, Yichen Lu
Quality improvement of recycled carbon black from waste tire pyrolysis for replacing carbon black N330
Scientific Reports volume 15, Article number: 23726 (2025)
Authors: Tawan Laithong, Tarinee Nampitch, Peerapon Ourapeepon, Natacha Phetyim
Development and in vitro characterization of embelin bilosomes for enhanced oral bioavailability
Journal of Research in Pharmacy, Year 2025, Volume: 29 Issue: 4, 1616 - 1626, 05.07.2025
Authors: Shreya Firake, Devanshi Pethani, Jeet Patil, Avinash Bhujbal, Rahul Gondake, Dhanashree Sanap, Sneha Agrawal
Performance optimization and mechanism research of C20 coal gangue concrete based on response surface and water resistance
AIP Advances, 15, 075316 (2025)
Authors: Yong Cui, Xiwen Yin, Qiuge Yu
Multi-objective optimization of boiler combustion efficiency and emissions using genetic algorithm and recurrent neural network in 660-MW coal-fired power plant
Eastern-European Journal of Enterprise Technologies, 3(8 (135)), 23–33
Authors: Mohamad Arwan Efendy, Ahmad Syihan Auzani, Sholahudin Sholahudin
Optimization and characterization of polyhydroxybutyrate produced by Vreelandella piezotolerans using orange peel waste
Scientific Reports volume 15, Article number: 25873 (2025)
Authors: Mahmoud H. Hendy, Amr M. Shehabeldine, Amr H. Hashem, Ahmed F. El-Sayed, Hussein H El-Sheikh
Multi-functional electrodialysis process to treat hyper-saline reverse osmosis brine: producing high value-added HCl, NaOH and energy consumption calculation
Environmental Sciences Europe volume 37, Article number: 121 (2025)
Authors: Haia M. Elsayd, Gamal K. Hassan, Ahmed A. Affy, M. Hanafy, Tamer S. Ahmed
Quality by Design-Based Method for Simultaneous Determination of Glimepiride and Lovastatin in Self-Nano Emulsifying Drug Delivery System
Separation Science Plus, 8: e70097
Authors: Priyanka Paul, Raj Kamal, Thakur Gurjeet Singh, Ankit Awasthi, Rohit Bhatia
Sustainable Adsorbents for Wastewater Treatment: Template-Free Mesoporous Silica from Coal Fly Ash
Chemical Engineering & Technology, 48: e70077
Authors: Thapelo Manyepedza, Emmanuel Gaolefufa, Gaone Koodirile, Dr. Isaac N. Beas, Dr. Joshua Gorimbo, Bakang Modukanele, Dr. Moses T. Kabomo

June 2025 Publication Roundup

posted by Rachel Poleke, Mark Anderson on July 1, 2025

Featured Article

Orange-Fleshed Sweet Potatoes, Grain Amaranth, Biofortified Beans, and Maize Composite Flour Formulation Optimization and Product Characterization
Food Science and Nutrition, Volume 13, Issue 6. June 2025.
Authors: Julius Byamukama, Robert Mugabi, Dorothy Nakimbugwe, John Muyonga

Mark's comments: "It's good to see response surface methods for optimization of food recipes via mixture design. I appreciate publications that include all the data needed to assess the predictive modeling. Kudos to the EU for funding research like this that alleviates malnutrition in vulnerable populations."

Be sure to check out this important study, and the other research listed below!

More new publications from June

Optimization and induction effect evaluation of complex inducer of Aquilaria sinensis based on factorial design
Scientific Reports, volume 15, Article number: 19656 (2025)
Authors: Qiuyue Ding, Baoyi Qin, Shimin Deng, Jie Chen, Ziwei Liu, Weiping Zhou, Xiaoying Chen, Weimin Zhang, Xin Zhou, Xiaoxia Gao
Adsorption of crystal violet using thiazolium ionic liquid-crosslinked alginate hydrogels: Modelling using Box-Behnken experimental design
International Journal of Biological Macromolecules, Volume 318, Part 1, July 2025, 144951
Authors: Merve Ceylan, Jülide Hızal, Elif Nur Özer, Ivaylo Tankov, Rumyana Yankova
High-Performance Ultrathin Membrane with Molecular Arrangement Ordered for Proton Exchange Membrane
ACS Applied Polymer Materials, published June 10, 2025
Authors: Yuqing Zhang, Ailing Zhang, Kaixiang Zhou, Yongjiang Li
Impact of addition of polyvinyl chloride on the properties of clayey soil using experimental approach and optimization for geotechnical engineering applications
Scientific Reports, volume 15, Article number: 19901 (2025)
Authors: Ghania Boukhatem, Said Berdoudi, Messaouda Bencheikh, Mohammed Benzerara, Mehmet Serkan Kırgız, N. Nagaprasad, Krishnaraj Ramaswamy
Copper Corrosion in Blended Diesel-Biodiesel: Corrosion Rate Evaluation and Characterization
Chemical Engineering & Technology, e70053, 12 June 2025
Authors: Lekan Taofeek Popoola, Celestine Chidi Nwogbu, Usman Taura, Yuli Panca Asmara, Alfred Ogbodo Agbo, Paul C. Okonkwo
The Effects of Novel Thymoquinone-Loaded Nanovesicles as a Promising Avenue to Modulate Autism Associated Dysregulation by Restoring Oxidative Stress in Autism in Mice
International Journal of Nanomedicine, 24 June 2025 Volume 2025:20 Pages 8041—8061
Authors: Nermin Eissa, Jana K Alwattar, Petrilla Jayaprakash, Dana Chkier, Aala Osama Ahmed, Anum Ahmed, Rameen Rizwan, Sulthan Mujeeb, Mohamad Rahal, Bassem Sadek

May 2025 Publication Roundup

posted by Rachel Poleke, Mark Anderson on June 2, 2025

Featured Article

Enhanced 4-chlorophenol adsorption from aqueous solution using eco-friendly nanocomposite
Ecological Engineering & Environmental Technology, 26(5), pp.174-189
Authors: Fadia A. Sulaiman, Rasha Khalid Sabri Mhemid, Noor A. Mohammed

Mark's comments: It is great to see the application of response surface methods (RSM) to reduce the release of toxic chlorophenols to our environment, particularly via such an eco-friendly process utilizing a natural polymer, xanthan gum. The 3D graphics are compelling and well supported by the reported statistics. I also appreciate that all the raw data is included, making it possible for me to reproduce the results.

Be sure to check out this important study, and the other research listed below!

More new publications from May

Design of Experiments Assisted Formulation Optimization and Evaluation of Efavirenz Solid Dispersion Adsorbate for Improvement in Dissolution and Flow Properties
Drug Design, Development and Therapy, 2025;19:3715-3734
Authors: Mujtaba MA, Rashid MA, Alhamhoom Y, Gangane P, Jagtap MJ, Akbar MJ, Wathore SA, Kaleem M, Elhassan GO, Khalid M
Design of Experiments Approach for the Development of a Validated UPLC-Q-ToF/MS Method to Quantitate Soy-Derived Bioactive Peptide Lunasin in Rabbit Plasma: Application to a Pharmacokinetic Study
Biomedical Chromatography, 2025, 39: e70098
Authors: Kowmudi, G., Anoop, K., Varshini, M., Nagappan, K., Konanki, S., Praveen, T
Development of a Stability-Indicating RP-HPLC Method for Pioglitazone in Cubosomal and Biological Matrices: A Quality by Design-Driven, Lean Six Sigma, and Green Chemistry Approach
Separation Science Plus, 2025, 8: e70055
Authors: Vaibhavi D. Torgal, Vinayak Mastiholimath, Rahul Koli
Optimization process of coffee pulp wines combined with the artificial neural network and response surface methodology
Scientific Reports, volume 15, Article number: 16684 (2025)
Authors: Rongsuo Hu, Fei Xu, Liyan Zhao, Wenjiang Dong
Formulation optimization of furosemide floating-bioadhesive matrix tablets using waste-derived Citrus aurantifolia peel pectin as a polymer
Scientific Reports, volume 15, Article number: 16704 (2025)
Authors: Ebrahim Abdela Siraj, Yohannes Mulualem, Fantahun Molla, Ashagrachew Tewabe Yayehrad, Anteneh Belete
Bio-induced overproduction of heterocycloanthracin-like bacteriocin in Lysinibacillus macroides by Aspergillus austroafricanus: optimization of medium conditions and evaluation of potential applications
BMC Biotechnology volume 25, Article number: 41 (2025)
Authors: Philomena Edet, Maurice Ekpenyong, Atim Asitok, David Ubi, Cecilia Echa, Uwamere Edeghor, Sylvester Antai
Structural characterization, gelling properties, and beef preservation applications of pectin extracted from sweetpotato residue using a hydrothermal method
International Journal of Biological Macromolecules, Volume 314, 2025, 144348
Authors: Linchong Hui, Chan Zhang, Junjie Yu, Man Liu, Kunlong Yang, Ling Shen, Bingqian Hu, Jun Tian, Yong-xin Li
Calibration of soil contact parameters for planting sand shrubs in the desert regions of Inner Mongolia
Scientific Reports volume 15, Article number: 17231 (2025)
Authors: Zhang Nannan, Pei Chenghui, Zhang Yantang, Cui Shaoyu, Liang Lingzhi, Liu Zhigang
Development of a novel tailor-made cocktail from recombinant crude enzymes for efficient saccharification of pretreated elephant grass
International Journal of Biological Macromolecules, 26 May 2025, 144645
Authors: Aishwarya Aishwarya, Arun Goyal
Sustainable discharge printing of marigold-dyed cotton with eucalyptus wood ash extract and its optimisation by response surface methodology
Coloration Technology, first published: 27 May 2025
Authors: Harshal Patil, Devansh Chaudhari, Ashok Athalye
Chitosan-coated nanostructured lipid carriers of amantadine for nose-to-brain delivery: formulation optimization, in vitro-ex vivo characterization, and in vivo anti-parkinsonism assessment
International Journal of Biological Macromolecules, Volume 316, Part 2, June 2025, 144497
Authors: Archita Kapoor, Abdul Hafeez, Poonam Kushwaha, Nargis Ara

Salvaging a designed experiment via covariate analysis

posted by Mark Anderson on May 16, 2025

Ideally all variables other than those included in an experiment are held constant or blocked out in a controlled fashion. However, sometimes a variable that one knows will create an important effect, such as ambient temperature or humidity, cannot be controlled. In such cases it pays to collect measurements run by run. Then the results can be analyzed with and without this ‘covariate.’

Douglas Montgomery provides a great example of analysis of covariance in section 15.3 of his textbook Design and Analysis of Experiments. It details a simple comparative experiment aimed at assessing the breaking strength in pounds of monofilament-fiber produced by three machines. The process engineer collected five samples at random from each machine, measuring the diameter of each (knowing this could affect the outcome) and testing them out. The results by machine are shown below with the diameters, measured in mils (thousandths of an inch), provided in the parentheses:

36 (20), 41 (25), 39 (24), 42 (25), 49 (32)
40 (22), 48 (28), 39 (22), 45 (30), 44 (28)
35 (21), 37 (23), 42 (26), 34 (21), 32 (15)

The data on diameter can be easily captured via a second response column alongside the strength measures. Montgomery reports that “there is no reason to believe that machines produce fibers of different diameters.” Therefore, creating a new factor column, copying in the diameters and regressing out its impact on strength leads to a clearer view of the differences attributed to the machines.

I will now show you the procedure for handling a covariate with Stat-Ease software. However, before doing so, analyze the experiment as planned and save this work so you can do a before and after comparison.

Figure 1 illustrates how to insert a new factor. As seen in the screenshot, I recommend this be done before the first controlled factor.

Figure 1: Inserting a new factor column for the covariate entered initially as a response

The Edit Info dialog box then appears. Type in the name and units of measure for the covariate and the actual range from low to high.

Figure 2: Detailing the covariate as a factor, including the actual range

Press “Yes” to confirm the change in actual values when the warning pops up.

Warning box for changing actual values to coded values.

Figure 3: Warning about actual values.

After the new factor column appears, the rows will be crossed out. However, when you copy over the covariate data, the software stops being so ‘cross’ (pun intended).

Press ahead to the analysis. Include only the main effect of the covariate in your model. The remainder of the terms involving controlled factors may go beyond linear if estimable. As a start, select the same terms as done before adding the covariate.

In this case, the model must be linear due to there being only one factor (machine) and it being categorical. The p-value on the effect increases from 0.0442 (significant at p<0.05) with only the machine modeled—not the diameter—to 0.1181 (not significant!) with diameter included as a covariate. The story becomes even more interesting by viewing the effects plots.

Effect plot for Strength without covariate.

Figure 4: No covariate.

Effect plot for Strength with covariate.

Figure 5: With covariate accounted for.

You can see that the least significant difference (LSD) bars decrease considerably from Figure 4 (without the covariate) to Figure 5 (with it). That is a good sign: the fitting becomes far more precise by taking diameter (the covariate) into account. However, as Montgomery says, the process engineer reaches “exactly the opposite conclusion.” Machine 3 looks very weak (literally!) without considering the monofilament diameter, but under the covariate analysis it becomes more closely aligned with the other two machines.
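For readers who want to cross-check the machine-effect p values outside the software, here is a minimal Python sketch using only NumPy. It is not how Stat-Ease or Montgomery's software computes the result. It fits nested least-squares models to the data listed earlier and applies a partial F-test. The function names are my own, and the closed-form p value works only because the machine effect has exactly 2 degrees of freedom.

```python
import numpy as np

# Breaking strength (lb) and diameter (mils) from the data listed earlier, by machine.
strength = np.array([36, 41, 39, 42, 49,
                     40, 48, 39, 45, 44,
                     35, 37, 42, 34, 32], dtype=float)
diameter = np.array([20, 25, 24, 25, 32,
                     22, 28, 22, 30, 28,
                     21, 23, 26, 21, 15], dtype=float)
machine = np.repeat([0, 1, 2], 5)

def residual_ss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def machine_test(base_columns):
    """Partial F-test for the 2-df machine effect, given the other model columns."""
    reduced = np.column_stack(base_columns)
    dummies = (machine[:, None] == np.array([1, 2])).astype(float)
    full = np.column_stack([reduced, dummies])
    df_error = len(strength) - full.shape[1]
    rss_reduced = residual_ss(reduced, strength)
    rss_full = residual_ss(full, strength)
    f = ((rss_reduced - rss_full) / 2) / (rss_full / df_error)
    # Closed-form upper tail of the F distribution, valid only for 2 numerator df.
    p = (1 + 2 * f / df_error) ** (-df_error / 2)
    return f, p

ones = np.ones_like(strength)
f_plain, p_plain = machine_test([ones])            # machine only
f_cov, p_cov = machine_test([ones, diameter])      # machine adjusted for diameter

print(f"machine only:  F = {f_plain:.2f}, p = {p_plain:.4f}")
print(f"with diameter: F = {f_cov:.2f}, p = {p_cov:.4f}")
```

This should reproduce the p values of 0.0442 and 0.1181 quoted above for the machine effect without and with the covariate.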

In conclusion, this case illustrates the value of recording external variables run-by-run throughout your experiment whenever possible. They then can be studied via covariate analysis for a more precise model of your factors and their effects.

This case is a bit tricky due to the question of whether fiber strength by machine differs due to them producing differing diameters, in which case this should be modeled as the primary response. A far less problematic example would be an experiment investigating the drying time of different types of paint in an uncontrolled environment. Obviously, the type of paint does not affect the temperature or humidity. By recording ambient conditions, the coating researcher could then see if they varied greatly during the experiment and, if so, include the data on these uncontrolled variables in the model via covariate analysis. That would be very wise!

PS: Joe Carriere, a fellow consultant at Stat-Ease, suggested I discuss this topic, which is very appealing to me as a chemical process engineer. He found the monofilament machine example, which I found very helpful (it was also good to see agreement in statistical results between our software and the one used by Montgomery).

PPS: For more advice on covariates, see this topic in Help.
