Note

Screenshots may differ slightly depending on software version.

# Weibull Regression in Python (Stat-Ease 360® only)

## Introduction

This tutorial demonstrates the Weibull analysis of lifetime data using Stat-Ease 360® software tools for interfacing with Python. In this tutorial you will perform a Weibull regression on lifetime data using a Python graphical user interface (GUI) and export the regression equation for the mean lifetime to an equation-only response to be maximized using Stat-Ease 360’s numerical optimization.

Note

The prerequisites for completing this tutorial include a working Python installation with the following packages: lifelines, PySimpleGUI, pandas, numpy, and matplotlib. If you haven’t already done so, you may wish to complete the Python Introduction before continuing. See the documentation for your particular Python installation for instructions on installing the packages required by this tutorial.

## Example Experiment

An experiment was performed to increase the lifetime of cylindrical roller ball bearings by comparing the effects of standard (std) and modified (mod) types of outer ring osculation, inner ring heat treatment, and cage design. The lifetimes (Life) or times to failure in hours were measured for these ball bearing cylinders for each of the eight treatment combinations in a $$2^3$$ factorial design [Hel89].

Open the ball bearings lifetime data set in Stat-Ease 360 by selecting Help, Tutorial Data, and Ball Bearings. Next, click on the Design node to display the eight experimental runs.

## Preliminary Analysis

To demonstrate why one might wish to perform a non-traditional regression, we first perform a preliminary analysis of the ball bearing data set.

Click on the node for the first analysis, R1:Life, and then click on the Effects tab to examine the half-normal plot as shown below.

Examination of the half-normal plot indicates that the only significant factors are outer ring osculation (A), the inner ring heat treatment (B), and their interaction (AB). These terms have been selected in the plot. The Cage Design (C) is not significant. However, this assessment relies on the assumptions of normality and constant variance, both of which should be checked by fit diagnostics.

Click on the Diagnostics tab and examine the Normal Plot tab.

The normal plot of residuals indicates that the assumption of normality of residuals may be violated. The residuals don’t all fall on a roughly straight line that would indicate a normal distribution.

Next, examine the Resid. vs. Pred. tab.

The characteristic “megaphone” or “funnel” pattern in the residuals versus predicted plot suggests that the assumption of constant variance, which is important to standard regression, is violated. The variance appears to be a function of the mean rather than constant: in this case, a larger predicted mean response corresponds to a larger variance.

This is typical of log-normal data, and the Box-Cox plot recommends the log transformation.
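As a rough illustration of why a log transformation suits this kind of funnel pattern, the sketch below draws two synthetic log-normal groups with different medians and compares their spread on the raw and log scales. The data are simulated, not the ball bearing lifetimes, and the medians, the spread `sigma`, and the helper name `draw_lognormal` are assumptions made only for the example.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the synthetic draws are repeatable


def draw_lognormal(median, sigma, n):
    """Return n synthetic log-normal values with the given median."""
    return [random.lognormvariate(math.log(median), sigma) for _ in range(n)]


# Two synthetic groups: same spread on the log scale, different medians.
low = draw_lognormal(median=20.0, sigma=0.5, n=2000)
high = draw_lognormal(median=100.0, sigma=0.5, n=2000)

# On the raw scale the spread grows with the mean (the funnel pattern).
sd_raw_low = statistics.stdev(low)
sd_raw_high = statistics.stdev(high)

# On the log scale the spread is roughly the same in both groups.
sd_log_low = statistics.stdev([math.log(v) for v in low])
sd_log_high = statistics.stdev([math.log(v) for v in high])

print(f"raw scale SD: {sd_raw_low:.1f} vs {sd_raw_high:.1f}")
print(f"log scale SD: {sd_log_low:.2f} vs {sd_log_high:.2f}")
```

The raw-scale standard deviations differ by roughly the same factor as the medians, while the log-scale standard deviations are close to each other, which is the behavior the log transformation is meant to exploit.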

At this stage, we could apply the recommended log transformation and analyze the transformed data. This traditional method often produces acceptable results, but at the expense of the added complication of a transformation. An alternative method is to perform an analysis that assumes a distribution more appropriate to the data. We turn to such a method next.

## Weibull Regression

One distribution widely used in the analysis of lifetime data is the Weibull distribution [RPMF22]. In what follows, we will leverage Stat-Ease 360’s ability to interface with Python to perform a Weibull regression using the lifelines Python package and its tools for Weibull regression [DP19]. See the lifelines online documentation for more information.

For the purpose of this tutorial, we suppose that the Life data for the ball bearing follow a Weibull distribution with probability of survival to time $$t$$ given by,

$S(t|\mathbf{x}) = \text{Survival Function} = \exp\left(~-\left(\frac{t}{e^{\lambda(\mathbf{x})}}\right)^\rho~\right), t > 0$

Here, $$\mathbf{x}' = (A,B,C,...)$$ is a point in the design space (factor settings), $$\rho$$ is the distribution’s shape parameter, and $$e^{\lambda(\mathbf{x})}$$ is the scale parameter, where $$\lambda(\mathbf{x})$$ is the polynomial log-scale function with coefficients $$\mathbf{\beta}$$ evaluated at $$\mathbf{x}$$,

$\lambda(\mathbf{x}) = \beta_0 + \beta_A A + \beta_B B+...+ \beta_{AB} A B + \beta_{BC} B C +...$

The parameters to be estimated in the regression are the shape parameter $$\rho$$ and the coefficients $$\beta_0, \beta_A, \beta_B, \beta_{AB},...$$, depending on the terms in the model.

Since the experimenters were interested in maximizing the average life of the ball bearings, the equation of interest is the mean lifetime. For the Weibull distribution the mean lifetime is,

$\hat{t}(\mathbf{x}) = \text{Mean lifetime} = e^{\lambda(\mathbf{x})}~\Gamma\left(1+1/\rho\right)$

Here, $$\Gamma$$ refers to the gamma function.
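The mean lifetime formula is easy to check numerically. The following is a minimal sketch using only the Python standard library; the function names and the values of `eta` and `rho` are illustrative assumptions, where `eta` stands for the polynomial $$\lambda(\mathbf{x})$$ evaluated at a design point, so that the Weibull scale is `exp(eta)` as in the mean lifetime formula.

```python
import math


def weibull_survival(t, eta, rho):
    """P(lifetime > t) for a Weibull with scale exp(eta) and shape rho."""
    return math.exp(-((t / math.exp(eta)) ** rho))


def weibull_mean(eta, rho):
    """Mean lifetime: exp(eta) * Gamma(1 + 1/rho)."""
    return math.exp(eta) * math.gamma(1.0 + 1.0 / rho)


# Hypothetical values, chosen only to exercise the formulas.
eta, rho = 4.0, 2.0

mean_life = weibull_mean(eta, rho)

# At t equal to the scale parameter, the survival probability is exp(-1)
# regardless of the shape parameter.
surv_at_scale = weibull_survival(math.exp(eta), eta, rho)

print("mean lifetime:", mean_life)
print("survival at t = scale:", surv_at_scale)
```

Because $$\Gamma(1+1/\rho)$$ does not depend on the factor settings, maximizing the mean lifetime over $$\mathbf{x}$$ is equivalent to maximizing $$\lambda(\mathbf{x})$$.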

This is the equation that will be exported to Stat-Ease 360 once the parameters have been estimated.

Change the view back to Design and click the Python script editor icon. This brings up the script editor with a blank sheet on the left for writing Python code and a sheet on the right for output.

Load the Weibull regression GUI example script by selecting Help, Examples, and Weibull Regression from the script editor’s menu.

To run the script and launch the GUI, click the Python icon, this time in the script editor toolbar (see the previous screenshot).

This opens the Weibull Regression interface.

From left to right and down, the GUI consists of a section for selecting the model to use in the Weibull regression, a section for selecting the duration (lifetime) column, a section for available commands, and an output section at the bottom.

By default, the Weibull regression GUI starts with the main effects model selected, but it is informative to fit the highest-order model possible first. In the Model Term Selection list, add the interactions by clicking on them, and then click the Analyze button in the commands section. Other than the yellow highlighting, the screen should appear as below.

The effects highlighted in yellow on the lower right all involve factor C:Cage Design. The analysis shows that the terms C, A*C, and B*C are not significant at the 0.05 level. (Note that the colon, in “A:C” for example, indicates an interaction term and can be read as “A*C”.) So, there is no evidence that cage design or its interactions with the other factors affect the lifetime of the ball bearings. The other terms, including the Intercept, A, B, A*B, and $$\rho$$, are all significant at the 0.05 level or better. That is, there is some evidence that the outer ring osculation and inner ring heat treatment significantly affect ball bearing life. This finding is consistent with the preliminary analysis.

These tests are represented graphically on the left, where the confidence intervals for the individual parameters are compared to zero. Parameters whose confidence interval crosses zero are not significant at the 0.05 level. The horizontal axis is the log of the accelerated failure rate for each term. (Note: The accelerated failure rate for each parameter is just that parameter exponentiated, so taking the log gives the parameter value. The locations of the little boxes in the plot correspond to the “coef” column in the table on the right, and the ends of the confidence intervals correspond to the “coef lower 95%” and “coef upper 95%” columns, respectively.)
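The rule of thumb in the plot can be written down directly. The sketch below is only an illustration: the numbers in `rows` are hypothetical stand-ins for the “coef”, “coef lower 95%”, and “coef upper 95%” columns, not values from the tutorial's output, and `excludes_zero` is a helper name introduced here.

```python
import math


def excludes_zero(lower, upper):
    """True when a confidence interval lies entirely on one side of zero."""
    return lower > 0.0 or upper < 0.0


# Hypothetical (coef, lower 95%, upper 95%) rows, for illustration only.
rows = {
    "A": (0.90, 0.35, 1.45),
    "C": (0.10, -0.40, 0.60),
}

# A term is flagged as significant at the 0.05 level when its 95% interval
# does not cross zero.
significant = {term: excludes_zero(lo, hi) for term, (_, lo, hi) in rows.items()}

# Exponentiating a coefficient undoes the log on the horizontal axis.
exp_coef = {term: math.exp(coef) for term, (coef, _, _) in rows.items()}

for term in rows:
    print(term, "significant:", significant[term], "exp(coef):", exp_coef[term])
```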

To reanalyze the model with only the significant terms, click on C, A*C, and B*C to remove them from the model. Then click the Analyze button again.

This time the yellow highlighting picks out the coefficients. The coefficients are also displayed in full precision as part of the equation at the bottom of the screen on the right.

Since the first level for A:Outer Ring Osculation and B:Inner Ring Heat Treatment is the standard (std) level, a positive coefficient for A and B means that the modified (mod) design for these components significantly increases the lifetime of the ball bearings. The presence of the interaction, A*B, means that these two factors interact to increase the lifetime even more than the main effects alone.
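To make the interpretation of the coefficients concrete, the sketch below computes how the mean lifetime changes relative to the all-standard setting. The coefficient values are hypothetical, not the tutorial's estimates, and the 0/1 coding (0 = std, 1 = mod) is an assumption consistent with the statement that the first level is the reference. Because the factor $$\Gamma(1+1/\rho)$$ is the same at every setting, it cancels in these ratios.

```python
import math

# Hypothetical log-scale coefficients (not the tutorial's estimates).
beta = {"Intercept": 3.0, "A": 0.2, "B": 0.3, "A*B": 0.8}


def linear_predictor(a, b):
    """Polynomial lambda(x) with 0 = std and 1 = mod (assumed coding)."""
    return (beta["Intercept"] + beta["A"] * a + beta["B"] * b
            + beta["A*B"] * a * b)


baseline = linear_predictor(0, 0)

# Ratio of mean lifetime at each setting to the std/std mean lifetime.
ratios = {
    (a, b): math.exp(linear_predictor(a, b) - baseline)
    for a in (0, 1)
    for b in (0, 1)
}

for (a, b), ratio in ratios.items():
    print(f"A={'mod' if a else 'std'}, B={'mod' if b else 'std'}: x{ratio:.2f}")
```

With a positive interaction coefficient, the ratio at mod/mod exceeds the product of the two single-factor ratios, which is what "more than the main effects alone" means on the multiplicative scale.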

Export the equation to Stat-Ease 360 by pressing the Export Equation button. A confirmation “Equation exported to response: Weibull-Life” should be displayed at the bottom of the screen on the right.

You can now close the Weibull Regression GUI by clicking the Close button. Then close the Script Editor to return to the Design node.

## The Exported Equation

The spreadsheet now contains an additional response, R2:Weibull-Life. The values in this column are the predictions for the mean lifetime at the design points. There is also an additional response Analysis node for the Weibull-Life equation.

To view the exported equation, right click the R2:Weibull-Life column and select Simulate….

A page for selecting the simulation type is displayed first. The Weibull regression equation was exported as an equation-only simulation so that it can be used immediately in numerical optimization.

Click the Next button to view the exported Weibull regression equation.

To review this equation in the future, simply repeat the steps above.

Click Finish to close the simulation dialog and then click on the R2:Weibull-Life analysis node. Because there is an interaction present, click on the Interaction graph. The setup should appear as below.

The interaction graph indicates that the lifetime (Weibull-Life) of the ball bearings is significantly greater when using the modified type for both the outer ring osculation (A) and the inner ring heat treatment (B) as opposed to using standard (std) for one and modified for the other. This is the interaction effect.

We next verify this result using Numerical Optimization.

## Numerical Optimization

A response analysis based only on an equation can be used immediately in optimization. Click on the Numerical Optimization node and then click on the R2:Weibull-Life column in the list. Choose the maximize goal type to maximize the mean lifetime of the roller ball bearings.

Click on the Solutions tab and then Ramps. This view summarizes the result for the first solution.

In this case, the modified versions of the outer ring osculation, inner ring heat treatment, and cage design are the settings that maximize the lifetime of the roller ball bearings.

However, recall that the cage design (C) was excluded as insignificant in the Weibull regression. Use the Factors Tool to select the second solution from the Solution dropdown.

The only difference between the first and second solution is the setting for C:Cage Design, modified versus standard.

Finally, click on the Report tab to view the solutions in list form. The solutions are listed in descending order of desirability.

As with the ramps, we see that the two top solutions have the same predicted lifetime and differ only in the setting for the insignificant factor, C:Cage Design. This outcome was worthy of mention in the original study because one of the cage designs – we are not told which one – was much less expensive than the other. So, even discovering insignificant factors can have practical consequences.
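The tie between the top two solutions follows directly from the form of the reduced model. The sketch below is a brute-force stand-in for the optimization, not Stat-Ease 360's desirability-based algorithm: it evaluates a mean lifetime equation at all eight factor combinations and keeps the maximizers. The coefficients, the shape `rho`, and the 0/1 coding (0 = std, 1 = mod) are hypothetical.

```python
import itertools
import math

# Hypothetical reduced-model parameters (not the tutorial's estimates).
beta = {"Intercept": 3.0, "A": 0.2, "B": 0.3, "A*B": 0.8}
rho = 2.0
levels = ("std", "mod")  # index 0 = std, index 1 = mod (assumed coding)


def mean_life(a, b, c):
    """Mean lifetime; c is accepted but unused because C is not in the model."""
    eta = (beta["Intercept"] + beta["A"] * a + beta["B"] * b
           + beta["A*B"] * a * b)
    return math.exp(eta) * math.gamma(1.0 + 1.0 / rho)


# Evaluate the equation at every combination of the three two-level factors.
candidates = [
    ((levels[a], levels[b], levels[c]), mean_life(a, b, c))
    for a, b, c in itertools.product((0, 1), repeat=3)
]

best = max(value for _, value in candidates)
solutions = [setting for setting, value in candidates
             if math.isclose(value, best)]

print("best mean lifetime:", best)
print("maximizing settings (A, B, C):", solutions)
```

Since C does not appear in the equation, every maximizer comes in a pair that differs only in the cage design, mirroring the two solutions reported in the tutorial.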

## Summary and Conclusion

This tutorial has demonstrated how to use Stat-Ease 360’s Python interface and the lifelines Python package to obtain a Weibull equation that can be used in Stat-Ease 360’s numerical optimization.

The Weibull regression script also serves as an example of GUI creation in Python that can be modified for your own applications. For more information on graphical user interface creation in Python, see the PySimpleGUI documentation.

## References

DP19

Cameron Davidson-Pilon. Lifelines: survival analysis in python. Journal of Open Source Software, 4(40):1317, 2019. URL: https://doi.org/10.21105/joss.01317, doi:10.21105/joss.01317.

HMvdW+20

Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin Sheppard, Tyler Reddy, Warren Weckesser, Hameer Abbasi, Christoph Gohlke, and Travis E. Oliphant. Array programming with NumPy. Nature, 585(7825):357–362, September 2020. URL: https://doi.org/10.1038/s41586-020-2649-2, doi:10.1038/s41586-020-2649-2.

Hel89

C. Hellstrand. The necessity of modern quality improvement and some experience with its implementation in the manufacture of rolling bearings. Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences, 327:529–537, 1989.

Hun07

J. D. Hunter. Matplotlib: a 2d graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007. doi:10.1109/MCSE.2007.55.

RPMF22

Steven E. Rigdon, Rong Pan, Douglas C. Montgomery, and Laura J. Freeman. Design of Experiments for Reliability Achievement. John Wiley & Sons, Inc., 2022. ISBN 978-1-1192-3769-3.

WesMcKinney10

Wes McKinney. Data structures for statistical computing in Python. In Stéfan van der Walt and Jarrod Millman, editors, Proceedings of the 9th Python in Science Conference, 56–61. 2010. doi:10.25080/Majora-92bf1922-00a.