Note

Screenshots may differ slightly depending on software version.

This tutorial demonstrates the Weibull analysis of lifetime data using Stat-Ease 360® software tools for interfacing with Python. In this tutorial you will perform a Weibull regression on lifetime data using a Python graphical user interface (GUI) and export the regression equation for the mean lifetime to an equation-only response to be maximized using Stat-Ease 360’s numerical optimization.

Note

The prerequisites for completing this tutorial include a working Python installation with the following packages: `lifelines`, `PySimpleGUI`, `pandas`, `numpy`, and `matplotlib`. If you haven’t already done so, you may wish to complete the Python Introduction before continuing. See the documentation for your particular Python installation for instructions on installing the packages required by this tutorial.
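One way to confirm these prerequisites is to ask Python whether it can locate each package. The sketch below assumes it is run in the same Python interpreter that Stat-Ease 360 is configured to use. It only checks that each package can be found, not which version is installed.

```python
import importlib.util

# Import names of the packages listed in the prerequisites above.
REQUIRED = ["lifelines", "PySimpleGUI", "pandas", "numpy", "matplotlib"]


def missing_packages(names):
    """Return the names that the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```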

An experiment was performed to increase the lifetime of cylindrical roller ball bearings by comparing the effects of standard (std) and modified (mod) types of outer ring osculation, inner ring heat treatment, and cage design. The lifetimes (Life), or times to failure in hours, were measured for the bearings at each of the eight treatment combinations of a \(2^3\) factorial design [Hel89].
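The structure of the \(2^3\) design can be written out directly. The sketch below lists the eight std/mod combinations in standard order, with factor A changing fastest. It describes only the design, not the data: the run order in the Stat-Ease 360 data set may differ, and no response values are included.

```python
from itertools import product

# Factors of the ball bearing experiment; each has a standard and a modified level.
FACTORS = {
    "A": "Outer Ring Osculation",
    "B": "Inner Ring Heat Treatment",
    "C": "Cage Design",
}
LEVELS = ("std", "mod")


def full_factorial(factors, levels):
    """All level combinations of a two-level full factorial, in standard order."""
    names = list(factors)
    # product() varies its last position fastest, so each combination is
    # reversed to make the first factor (A) change fastest instead.
    return [dict(zip(names, reversed(combo)))
            for combo in product(levels, repeat=len(names))]
```

For example, `for run in full_factorial(FACTORS, LEVELS): print(run)` prints one dictionary per treatment combination, starting with all factors at `std`.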

Open the ball bearings lifetime data set in Stat-Ease 360 by selecting **Help**, **Tutorial Data**, and **Ball Bearings**. Next, click on the **Design** node to display the eight experimental runs.

To demonstrate why one might wish to perform a non-traditional regression, we first perform a preliminary analysis of the ball bearing data set.

Click on the node for the first analysis, **R1:Life**, and then click on the **Effects** tab to examine the half-normal plot as shown below.

Examination of the half-normal plot indicates that the only significant effects are outer ring osculation (A), inner ring heat treatment (B), and their interaction (AB). These terms have been selected in the plot. Cage design (C) is not significant. However, this assessment relies on the assumptions of normality and constant variance, both of which should be checked with the fit diagnostics.
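A half-normal plot ranks the absolute effect estimates of the \(2^3\) design against half-normal quantiles; effects that fall well off the line through the small ones are candidates for significance. The sketch below shows one common way to compute the plotted points, using the plotting position \((i - 0.5)/m\) for the \(i\)-th smallest of \(m\) effects. This convention is an assumption and may not match the exact one Stat-Ease 360 uses.

```python
import math
from itertools import product
from statistics import NormalDist, fmean

# Effects estimable from a 2^3 full factorial.
TERMS = ["A", "B", "C", "AB", "AC", "BC", "ABC"]


def coded_runs():
    """The 8 runs in standard order, coded -1 (std) / +1 (mod); A changes fastest."""
    return [dict(zip("CBA", combo)) for combo in product((-1, 1), repeat=3)]


def effects(y):
    """Effect = mean(y where contrast is +1) - mean(y where contrast is -1).

    `y` must list the responses in the same standard order as coded_runs().
    """
    est = {}
    for term in TERMS:
        # An interaction contrast is the product of its factors' codes.
        signs = [math.prod(run[f] for f in term) for run in coded_runs()]
        plus = [yi for yi, s in zip(y, signs) if s > 0]
        minus = [yi for yi, s in zip(y, signs) if s < 0]
        est[term] = fmean(plus) - fmean(minus)
    return est


def half_normal_points(est):
    """(term, |effect|, half-normal quantile) triples, smallest effect first."""
    m = len(est)
    ordered = sorted(est.items(), key=lambda kv: abs(kv[1]))
    std_normal = NormalDist()
    return [
        # Plotting position p = (i - 0.5) / m, mapped onto the half-normal scale.
        (term, abs(value), std_normal.inv_cdf(0.5 + 0.5 * (i - 0.5) / m))
        for i, (term, value) in enumerate(ordered, start=1)
    ]
```

Plotting the |effect| values against the quantiles, for example with `matplotlib`, gives a plot of the same kind as the one on the **Effects** tab.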

Click on the **Diagnostics** tab and examine the **Normal Plot** tab.

The normal plot of residuals indicates that the assumption of normally distributed residuals may be violated: the residuals do not fall along the roughly straight line that would indicate a normal distribution.

Next, examine the **Resid. vs. Pred.** tab.

The characteristic “megaphone” or “funnel” pattern in the residuals versus predicted plot suggests that the constant-variance assumption of standard regression is violated. The variance appears to be a function of the mean rather than constant; in this case, larger predicted mean responses correspond to larger variance.

This pattern is typical of log-normal data, and the Box-Cox plot recommends the log transformation.
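A Box-Cox plot profiles the residual sum of squares of the chosen model over a range of power transformations \(\lambda\) and suggests the power with the smallest value; \(\lambda = 0\) corresponds to the log. The minimal sketch below computes \(\ln(SS_{resid})\) for the \(2^3\) design, assuming the geometric-mean-scaled form of the transform that is commonly used for DOE Box-Cox plots and a model in A, B, and AB. Stat-Ease 360’s own computation, including its confidence interval on \(\lambda\), may differ in detail.

```python
import math
from itertools import product


def coded_runs():
    """The 8 runs of a 2^3 design in standard order, coded -1/+1 (A fastest)."""
    return [dict(zip("CBA", combo)) for combo in product((-1, 1), repeat=3)]


def residual_ss(y, terms):
    """Residual sum of squares after fitting an intercept plus `terms` to y.

    The coded 2^3 columns are mutually orthogonal, so each least-squares
    coefficient is sum(column * y) / n and no matrix solve is needed.
    """
    runs = coded_runs()
    n = len(y)
    fitted = [sum(y) / n] * n
    for term in terms:
        col = [math.prod(run[f] for f in term) for run in runs]
        coef = sum(c * yi for c, yi in zip(col, y)) / n
        fitted = [fi + coef * c for fi, c in zip(fitted, col)]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def boxcox_ln_ss(y, terms, lam):
    """ln(residual SS) of the model after a geometric-mean-scaled Box-Cox transform."""
    gm = math.exp(sum(math.log(v) for v in y) / len(y))  # geometric mean of y
    if lam == 0:
        z = [gm * math.log(v) for v in y]  # the lambda -> 0 limit is the log
    else:
        # expm1 keeps (y**lam - 1) accurate when lam is close to zero
        z = [math.expm1(lam * math.log(v)) / (lam * gm ** (lam - 1)) for v in y]
    return math.log(residual_ss(z, terms))
```

Evaluating `boxcox_ln_ss(y, ["A", "B", "AB"], lam)` over a grid such as \(\lambda = -2, -1.5, \ldots, 2\) gives a curve analogous to the Box-Cox plot. The \(\lambda\) with the smallest value is the suggested power.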

At this stage, we could apply the recommended log transformation and analyze the transformed data. This traditional method often produces acceptable results, but at the expense of the added complication of a transformation. An alternative method is to perform an analysis that assumes a distribution more appropriate to the data. We turn to such a method next.

One distribution widely used in the analysis of lifetime data is the Weibull distribution [RPMF22]. In what follows, we will leverage Stat-Ease 360’s ability to interface with Python to perform a Weibull regression using the `lifelines` Python package and its tools for Weibull regression [DP19]. See the lifelines online documentation for more information.

For the purpose of this tutorial, we suppose that the Life data for the ball bearings follow a Weibull distribution with probability of survival to time \(t\) given by

\[S(t|\mathbf{x}) = \text{Survival Function} = \exp\left(~-\left(\frac{t}{e^{\lambda(\mathbf{x})}}\right)^\rho~\right), t > 0\]

Here, \(\mathbf{x}' = (A,B,C,...)\) is a point in the design space (factor settings), \(\rho\) is the distribution’s shape parameter, and \(\lambda(\mathbf{x})\) is the polynomial in the factors, with coefficients \(\mathbf{\beta}\), evaluated at \(\mathbf{x}\). Its exponential, \(e^{\lambda(\mathbf{x})}\), is the distribution’s scale parameter:

\[\lambda(\mathbf{x}) = \beta_0 + \beta_A A + \beta_B B+...+ \beta_{AB} A B + \beta_{BC} B C +...\]

The parameters to be estimated in the regression are \(\rho\) and the coefficients \(\beta_0, \beta_A, \beta_B, \beta_{AB},...\), depending on the terms in the model.

Since the experimenters were interested in maximizing the average life of the ball bearings, the equation of interest is the mean lifetime. For the Weibull distribution the mean lifetime is,

\[\hat{t}(\mathbf{x}) = \text{Mean lifetime} = e^{\lambda(\mathbf{x})}~\Gamma\left(1+1/\rho\right)\]

Here, \(\Gamma\) refers to the gamma function.

This is the equation that will be exported to Stat-Ease 360 once the parameters have been estimated.
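The formulas above translate directly into code. The sketch below evaluates \(\lambda(\mathbf{x})\), the survival function, and the mean lifetime at one factor setting, treating \(e^{\lambda(\mathbf{x})}\) as the Weibull scale, as the mean-lifetime formula implies. It further assumes a model in A, B, and AB, with std coded 0 and mod coded 1. The coefficient values are hypothetical placeholders, not results from the ball bearing data.

```python
import math

# Hypothetical coefficients for a model in A, B, and AB. These are made-up
# placeholders, NOT the values estimated from the ball bearing data.
BETA = {"Intercept": 3.0, "A": 0.8, "B": 0.9, "AB": 0.7}
RHO = 2.0  # hypothetical Weibull shape parameter


def log_scale(beta, a, b):
    """lambda(x) for the A, B, AB model; a and b are 0 (std) or 1 (mod)."""
    return beta["Intercept"] + beta["A"] * a + beta["B"] * b + beta["AB"] * a * b


def survival(t, lam_x, rho):
    """S(t|x) = exp(-(t / e^lambda(x))^rho), with e^lambda(x) the Weibull scale."""
    return math.exp(-((t / math.exp(lam_x)) ** rho))


def mean_lifetime(lam_x, rho):
    """Mean lifetime e^lambda(x) * Gamma(1 + 1/rho)."""
    return math.exp(lam_x) * math.gamma(1 + 1 / rho)
```

For example, `mean_lifetime(log_scale(BETA, 1, 1), RHO)` gives the mean lifetime with both A and B at the modified level under these made-up coefficients. Numerically integrating `survival` from 0 to a large \(t\) returns the same value, since the mean of a positive random variable is the integral of its survival function.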

Change the view back to **Design** and click the Python script editor icon. This brings up the script editor with a blank sheet on the left for writing Python code and a sheet on the right for output.

Load the Weibull regression GUI example script by selecting **Help**, **Examples**, and **Weibull Regression** from the script editor’s menu.

To run the script and launch the GUI, click the Python icon, this time in the script editor toolbar (see the previous screenshot).

This opens the Weibull Regression interface.

Reading from left to right and top to bottom, the GUI consists of a section for selecting the model to use in the Weibull regression, a section for selecting the duration (lifetime) column, a section of available commands, and an output section at the bottom.

By default, the Weibull regression GUI starts with the main effects model selected, but it is informative to fit the highest-order model possible first. In the **Model Term Selection** list, add the interactions by clicking on them, and then click the **Analyze** button in the commands section. Other than the yellow highlighting, the screen should appear as below.

The effects highlighted in yellow on the lower right all involve factor C:Cage Design. The analysis shows that the terms C, A*C, and B*C are not significant at the 0.05 level. (Note that the colon, in “A:C” for example, indicates an interaction term; it can be read as “A*C”.) So there is no evidence that cage design, or its interactions with the other factors, affects the lifetime of the ball bearings. The other terms, including the Intercept, A, B, A*B, and \(\rho\), are all significant at the 0.05 level or better. That is, there is evidence that the outer ring osculation and inner ring heat treatment significantly affect ball bearing life. This finding is consistent with the preliminary analysis.

These tests are represented graphically on the left, where the confidence intervals for the individual parameters are compared to zero. Parameters whose confidence intervals cross zero are not significant at the 0.05 level. The horizontal axis is the log of the accelerated failure rate for each term. (Note: The accelerated failure rate for each parameter is just that parameter exponentiated, so taking the log gives the parameter value. The locations of the small boxes in the plot correspond to the “coef” column in the table on the right, and the ends of the confidence intervals correspond to the “coef lower 95%” and “coef upper 95%” columns, respectively.)
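The table and plot rest on normal-approximation (Wald) statistics. As we understand it, this is how lifelines computes its `z`, `p`, and “coef lower 95%”/“coef upper 95%” columns from each coefficient and its standard error. The sketch below reproduces those calculations and shows why a 95% interval that crosses zero goes with \(p \geq 0.05\). The inputs in the checks are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def wald_summary(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-approximation CI for one term."""
    z = coef / se
    p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    crit = STD_NORMAL.inv_cdf(0.5 + level / 2)  # about 1.96 when level=0.95
    return z, p, (coef - crit * se, coef + crit * se)


def crosses_zero(ci):
    """True if the confidence interval (lower, upper) contains zero."""
    lower, upper = ci
    return lower <= 0 <= upper


def acceleration_factor(coef):
    """The exponentiated coefficient; the plot's axis shows its log, i.e. coef."""
    return math.exp(coef)
```

Both views test the same thing: the interval excludes zero exactly when \(|z|\) exceeds the critical value (about 1.96), which is exactly when \(p < 0.05\). For example, a coefficient of 0.1 with standard error 0.1 gives \(z = 1\), \(p \approx 0.32\), and an interval that crosses zero.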

To reanalyze the model with only significant terms, click on C, A*C, and B*C to remove them from the model. Then click the **Analyze** button again.

This time the yellow highlighting picks out the coefficients. The coefficients are also displayed in full precision as part of the equation at the bottom of the screen on the right.

Because the standard (std) level is the first (reference) level for both A:Outer Ring Osculation and B:Inner Ring Heat Treatment, a positive coefficient for A or B means that the modified (mod) design of that component significantly increases the lifetime of the ball bearings. The presence of the interaction, A*B, means that these two factors interact to increase the lifetime even more than the main effects alone.
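To see what the signs mean, assume std is coded 0 and mod is coded 1 (an assumption consistent with std being the reference level). The ratio of mean lifetimes for mod versus std in A is then \(e^{\beta_A}\) when B is std and \(e^{\beta_A + \beta_{AB}}\) when B is mod, because the intercept and \(\Gamma(1+1/\rho)\) cancel. A positive \(\beta_A\) therefore lengthens life, and a positive \(\beta_{AB}\) makes the improvement from A larger when B is also modified. The sketch below checks this with hypothetical coefficients, not the fitted values.

```python
import math

# Hypothetical coefficients (not the fitted values); std is coded 0, mod is 1.
BETA = {"Intercept": 3.0, "A": 0.8, "B": 0.9, "AB": 0.7}
RHO = 2.0


def mean_life(a, b, beta=BETA, rho=RHO):
    """Weibull mean lifetime exp(lambda(x)) * Gamma(1 + 1/rho) at A=a, B=b."""
    lam = beta["Intercept"] + beta["A"] * a + beta["B"] * b + beta["AB"] * a * b
    return math.exp(lam) * math.gamma(1 + 1 / rho)


def effect_of_a(b, beta=BETA):
    """Multiplicative change in mean life when A goes from std to mod at B=b.

    The intercept and the Gamma term cancel, leaving exp(beta_A + beta_AB * b).
    """
    return math.exp(beta["A"] + beta["AB"] * b)
```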

Export the equation to Stat-Ease 360 by pressing the **Export Equation** button. A confirmation “Equation exported to response: Weibull-Life” should be displayed at the bottom of the
screen on the right.

You can now close the Weibull Regression GUI by clicking the **Close** button. Then close the Script Editor to return to the **Design** node.

The spreadsheet now contains an additional response, **R2:Weibull-Life**. The values in this column are the predictions for the mean lifetime at the design points. There is also an additional response **Analysis** node for the Weibull-Life equation.

To view the exported equation, right-click the **R2:Weibull-Life** column and select **Simulate…**.

A page for selecting the simulation type is displayed first. The Weibull regression equation was exported as an equation-only simulation so that it can be used immediately in numerical optimization.

Click the **Next** button to view the exported Weibull regression equation.

To review this equation in the future, simply repeat the steps above.

Click **Finish** to close the simulation dialog and then click on the **R2:Weibull-Life** analysis node. Because there is an interaction present, click on the **Interaction** graph. The setup should appear as below.

The interaction graph indicates that the lifetime (Weibull-Life) of the ball bearings is significantly greater when using the modified type for both the outer ring osculation (A) and the inner ring heat treatment (B) as opposed to using standard (std) for one and modified for the other. This is the interaction effect.

We next verify this result using **Numerical Optimization**.

A response analysis based only on an equation can be used immediately in optimization. Click on the **Numerical Optimization** node and then click on the **R2:Weibull-Life** column in the list. Choose the maximize goal type to maximize the mean lifetime of the roller ball bearings.
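Numerical optimization in Stat-Ease 360 works with desirability functions. For a maximize goal, the standard Derringer-Suich form rises from 0 at a lower limit to 1 at an upper limit. A sketch is below; the limits and the weight are parameters you would set (or accept as defaults) on the criteria screen. Treat this as an illustration of the idea rather than the program’s exact computation.

```python
def desirability_max(y, low, high, weight=1.0):
    """Derringer-Suich desirability for a 'maximize' goal.

    Returns 0 at or below `low`, 1 at or above `high`, and
    ((y - low) / (high - low)) ** weight in between.
    """
    if high <= low:
        raise ValueError("high must exceed low")
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** weight
```

A weight above 1 makes the function harder to satisfy near the lower limit; a weight below 1 makes it easier.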

Click on the **Solutions** tab and then **Ramps**. This view summarizes the result for the first solution.

In this case, the modified versions of the outer ring osculation, inner ring heat treatment, and cage design are the settings that maximize the lifetime of the roller ball bearings.

However, recall that the cage design (C) was excluded as insignificant in the Weibull regression. Use the **Factors Tool** to select the second solution from the **Solution** dropdown.

The only difference between the first and second solutions is the setting for C:Cage Design, modified versus standard.

Finally, click on the **Report** tab to view the solutions in list form. The solutions are listed in descending order of desirability.

As with the ramps, we see that the two top solutions have the same predicted lifetime and differ only in the setting for the insignificant factor, C:Cage Design. This outcome was worthy of mention in the original study because one of the cage designs – we are not told which one – was much less expensive than the other. So, even discovering insignificant factors can have practical consequences.
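Because C does not appear in the reduced model, any search over the factor settings must produce ties in C. The brute-force sketch below is an illustration, not how Stat-Ease 360 searches. It scores all eight std/mod combinations with hypothetical coefficients for the A, B, AB model and shows that the top two settings differ only in C.

```python
import math
from itertools import product

# Hypothetical coefficients for the reduced model (A, B, AB); C was dropped.
BETA = {"Intercept": 3.0, "A": 0.8, "B": 0.9, "AB": 0.7}
RHO = 2.0
CODE = {"std": 0, "mod": 1}  # assumed coding, std as the reference level


def predicted_mean(setting):
    """Mean lifetime for a dict like {"A": "mod", "B": "std", "C": "mod"}."""
    a, b = CODE[setting["A"]], CODE[setting["B"]]
    lam = BETA["Intercept"] + BETA["A"] * a + BETA["B"] * b + BETA["AB"] * a * b
    return math.exp(lam) * math.gamma(1 + 1 / RHO)


def ranked_settings():
    """All 8 factor settings, highest predicted mean lifetime first."""
    settings = [dict(zip("ABC", combo)) for combo in product(CODE, repeat=3)]
    return sorted(settings, key=predicted_mean, reverse=True)
```

The C setting never enters `predicted_mean`, so every pair of settings that differs only in C receives the same score.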

This tutorial has demonstrated how to use Stat-Ease 360’s Python interface and the `lifelines` Python package to obtain a Weibull equation that can be used in Stat-Ease 360’s numerical optimization.

The Weibull regression script also serves as an example of GUI creation in Python that can be modified for your own applications. For more information on graphical user interface creation in Python, see the PySimpleGUI documentation.

References

- DP19
  Cameron Davidson-Pilon. Lifelines: survival analysis in python. *Journal of Open Source Software*, 4(40):1317, 2019. URL: https://doi.org/10.21105/joss.01317, doi:10.21105/joss.01317.
- HMvdW+20
  Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin Sheppard, Tyler Reddy, Warren Weckesser, Hameer Abbasi, Christoph Gohlke, and Travis E. Oliphant. Array programming with NumPy. *Nature*, 585(7825):357–362, September 2020. URL: https://doi.org/10.1038/s41586-020-2649-2, doi:10.1038/s41586-020-2649-2.
- Hel89
  C. Hellstrand. The necessity of modern quality improvement and some experience with its implementation in the manufacture of rolling bearings. *Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences*, 327:529–537, 1989.
- Hun07
  J. D. Hunter. Matplotlib: a 2d graphics environment. *Computing in Science & Engineering*, 9(3):90–95, 2007. doi:10.1109/MCSE.2007.55.
- RPMF22
  Steven E. Rigdon, Rong Pan, Douglas C. Montgomery, and Laura J. Freeman. *Design of Experiments for Reliability Achievement*. John Wiley & Sons, Inc., 2022. ISBN 978-1-1192-3769-3.
- WesMcKinney10
  Wes McKinney. Data Structures for Statistical Computing in Python. In Stéfan van der Walt and Jarrod Millman, editors, *Proceedings of the 9th Python in Science Conference*, 56–61. 2010. doi:10.25080/Majora-92bf1922-00a.