# One-Factor Categoric (pt 2)

## Digging Deeper Into Diagnostics

If your bowling data is active in Design-Expert® software from Part 1 of this tutorial, continue on. Otherwise, load the Bowling data by clicking on the Help menu and selecting Tutorial Data then the Bowling item. Then, under the Analysis branch (you may already be here) click the Score node and press the Diagnostics tab.

We’re now going to look at a new graph in the Diagnostics Tool. In the layout toolbar, select the single split icon to maximize the current plot.

Now select the DFFITS tab.

This statistic, which stands for **dif**ference in **fits**, measures the change in each predicted value that occurs when that response is deleted. The larger the absolute value of DFFITS for a run, the more that run influences the fitted model. (For more details on this statistic and the related deletion diagnostic, DFBETAS, see our program Help or refer to Raymond Myers’ Classical and Modern Regression with Applications, 2nd Edition (PWS Pub. Co., 1990).)
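To make the deletion idea concrete, here is a minimal Python sketch that computes DFFITS literally: it removes one run at a time and refits a one-factor model. The scores are illustrative only. A few echo games mentioned in this tutorial and the rest are invented, so the printed values will not match what Design-Expert shows. The function names `fit` and `dffits` are our own, not part of any software.

```python
from math import sqrt

# Illustrative scores only: NOT the tutorial's actual data set.
scores = {
    "Pat":   [160, 150, 140, 167, 157, 148],
    "Mark":  [165, 180, 170, 195, 185, 175],
    "Shari": [166, 158, 145, 161, 151, 156],
}

def fit(groups):
    """Fit the one-factor model, where each prediction is its group mean.
    Returns (group means, residual mean square)."""
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    sse = sum((y - means[g]) ** 2 for g, v in groups.items() for y in v)
    df = sum(len(v) for v in groups.values()) - len(groups)
    return means, sse / df

def dffits(groups):
    """DFFITS for every run, found by deleting that run and refitting."""
    full_means, _ = fit(groups)
    out = []
    for g, values in groups.items():
        h = 1 / len(values)  # leverage of a run in a one-factor model
        for i, y in enumerate(values):
            reduced = dict(groups)
            reduced[g] = values[:i] + values[i + 1:]
            del_means, del_mse = fit(reduced)
            change = full_means[g] - del_means[g]  # shift in this run's prediction
            out.append((g, y, change / sqrt(del_mse * h)))
    return out

results = dffits(scores)
for bowler, score, value in results:
    print(f"{bowler:6s} {score:4d} {value:7.3f}")
```

With these made-up numbers the largest absolute DFFITS belongs to the 195 game, because that score sits farthest from its own bowler's average.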

Notice that one point lies above the rest. (The pattern on your graph may differ from what we show here due to randomized run order, but this isn’t a concern in this discussion.) The top-most point is Mark’s high game, which earlier created controversy, particularly among competitors Pat and Shari. Mark’s point falls inside the benchmark limits of plus-or-minus 1.22 for DFFITS. So, taking all other diagnostics into consideration, we don’t advise that this particular run be investigated further. Nevertheless, for purposes of learning how to use new Design-Expert software features, right-click Mark’s top point with your mouse and select Highlight Point as shown below.
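The tutorial does not say where the 1.22 benchmark comes from. As a hedged piece of arithmetic only, a limit of the form multiplier × √(p/n) reproduces it when the multiplier is 3, p is 3 (one model term per bowler) and n is 18 runs. Both the form of the rule and the multiplier are assumptions on our part. Check the program Help for the rule actually applied.

```python
from math import sqrt

runs = 18       # three bowlers times six games each (from the tutorial)
terms = 3       # assumption: one model term per bowler
multiplier = 3  # assumption: picked because it reproduces the quoted 1.22

limit = multiplier * sqrt(terms / runs)
print(round(limit, 2))

# For comparison, a widely quoted textbook cutoff uses a multiplier of 2.
textbook_limit = 2 * sqrt(terms / runs)
print(round(textbook_limit, 2))
```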

Myers demonstrates mathematically that the DFFITS statistic is really the externally studentized residual multiplied by a factor that depends only on the run’s leverage, and that factor grows as leverage increases. Select the Leverage tab and you’ll see that all runs exhibit equal leverage here because an equal number of runs were made at each treatment level (all three bowlers rolled six games each).

Therefore, the DFFITS plot exhibits a pattern identical to that shown on the externally studentized residual graph, which you studied in the preceding tutorial. The reason we’re reviewing this is to set the stage for what you’ll do later in this tutorial: unbalance the leverages to make this session more significant for diagnostic purposes.
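For reference, the standard textbook form of this relationship can be written as follows. The notation is ours, not the tutorial's: $t_i$ is the externally studentized residual of run $i$ and $h_{ii}$ is its leverage.

$$
\mathrm{DFFITS}_i = t_i \sqrt{\frac{h_{ii}}{1 - h_{ii}}}
$$

In a one-factor design where each bowler has six runs, every run has leverage $h_{ii} = 1/6$, so the multiplier is the same constant for all 18 runs:

$$
\sqrt{\frac{1/6}{1 - 1/6}} = \sqrt{\frac{1}{5}} \approx 0.447
$$

Multiplying every residual by the same constant rescales the vertical axis but leaves the pattern of points unchanged, which is why the two plots look alike here.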

Note

Pop-Out View: Now is a good time to go back to the DFFITS plot and select Pop-Out View from the View menu.

Next go to the Resid. vs Run tab and verify the statement above that in this case these two plots (DFFITS and residuals versus run) exhibit the same pattern. (You may need to press Control-Tab to get the windows you want on the same screen.)

After comparing, close the pop-out window so your screen doesn’t get too cluttered.

Here’s one final Design-Expert software feature for you before we leave the Diagnostics. Select the Report tab to get a table of statistics case-by-case in standard order for the entire experiment. For those of you who prefer numbers over pictures (statisticians for sure!), this should satisfy your appetite. Notice that Mark’s high 195 game is highlighted in blue text as shown below. That’s because we highlighted it on the graphs. If there were outliers or other out of bounds runs, they would be colored on the graph, too.

Remember, you can right-click any value in reports of this nature within Design-Expert software to view context-sensitive Help with statistical details.

## Modifying the Design Layout

Design-Expert offers great flexibility when modifying data in its design layout. We’ll see in this bowling scenario how our software allows you to modify an existing design with added blocks and factor levels.

The outcome of the bowling match appears to be definitive, especially from Mark’s perspective. But Pat and Shari demand one more chance to prove themselves worthy of the team. They still think Mark’s high 195 game was a fluke, even though this isn’t supported by the diagnostic analysis. Mark objects and a dispute ensues.

Attempting compromise, the team captain decides to toss out the highest and lowest games for each of the three bowlers and replace them with two new scores each. But Ben, a newly hired programmer and avid bowler, arrives at the alley and is allowed to participate in this second block of runs. (Yes, this makes little sense, but it will add some interest to this tour of Design-Expert’s flexibility for design and analysis of experiments – no matter how convoluted they become in actuality.)

It quickly becomes apparent that this new kid does things differently. He’s a lefty with a huge hook that’s hard to control. To aggravate this variability, Ben does something very different from other bowlers – he does not put his thumb in the ball’s hole made for that purpose. When Ben’s odd approach works, the pins go flying. But as likely as not, that ball slides off into the left gutter or careens over the edge on the right.

The results for Ben and the three original bowling team candidates are below.

To enter this new data (and ignore some of the old), click the Design node in the navigation pane (left sidebar). You should now see the bowling data from the first tutorial. Mark’s high 195 game remains highlighted in blue text (assuming you highlighted it as instructed earlier in this tutorial while performing the diagnostics).

Right-click the top-left column header and click Block. This design attribute is now needed to accommodate the new bowler’s (Ben’s) incoming score data.

Right-click the Response column header and choose Sort Ascending.

Mark’s best game now drops to the very bottom. Let’s single him out first to placate Pat and Shari. Right-click the square button at the left of the last row (Mark’s 195 score). Click Set Row Status, then Ignore as shown below.

By the way, it’s OK to change your mind when modifying your design layout: You can ‘un-ignore’ a row by clicking Set Row Status, Normal.

Now let’s really get Pat’s and Shari’s hopes high by excluding their low games from consideration. Click the square button (in the Select column) to the left of the top row (Pat’s low 140 game) and, while pressing down the Shift key, also click the button in the Select column’s second row (Shari’s low 145 game). Release the Shift key. Keep your mouse within the Select column’s first or second row, right-click and choose Set Row Status, Ignore for these two low games, as shown below.

Now move down a few rows and click the square button in the Select column’s row showing Mark’s low 165 game.

Notice the two rows below Mark’s low 165 game – the high games for Shari (166) and Pat (167). It’s now time for Shari and Pat to pay the price for complaining. While first pressing and holding down the Shift key, click the following two square buttons in the Select column’s row: Shari’s high 166 game and Pat’s high 167 game. Release the Shift key. Three rows should now be highlighted in light blue as shown below. Keep your mouse within the Select column’s highlighted three rows, right-click and choose Set Row Status, Ignore.

Now let’s restore the original layout order. Right-click the Factor 1 (A: Bowler) column header, then choose Sort Ascending. Compare your screen with what we show below. If there are differences, fix them now to match this screenshot. However, remember that the run number is random, so you don’t need to fix that.
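The effect of these row-status changes can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not how Design-Expert stores its data. The six scores named in the steps above are used as given, and the remaining scores are invented fillers.

```python
# Scores named in the text (140, 145, 165, 166, 167, 195, 160) plus made-up fillers.
games = [
    ("Pat", 160), ("Pat", 150), ("Pat", 140), ("Pat", 167), ("Pat", 157), ("Pat", 148),
    ("Mark", 165), ("Mark", 180), ("Mark", 170), ("Mark", 195), ("Mark", 185), ("Mark", 175),
    ("Shari", 166), ("Shari", 158), ("Shari", 145), ("Shari", 161), ("Shari", 151), ("Shari", 156),
]
rows = [{"bowler": b, "score": s, "status": "Normal"} for b, s in games]

# Ignore each bowler's lowest and highest game; nothing is deleted.
for bowler in {r["bowler"] for r in rows}:
    own = [r for r in rows if r["bowler"] == bowler]
    min(own, key=lambda r: r["score"])["status"] = "Ignore"
    max(own, key=lambda r: r["score"])["status"] = "Ignore"

# Only rows with Normal status would take part in the analysis.
active = [r for r in rows if r["status"] == "Normal"]
ignored = sorted((r["score"], r["bowler"]) for r in rows if r["status"] == "Ignore")

print(len(active), "active rows")
print(ignored)
```

Because the ignored rows keep their values, setting their status back to Normal restores them without retyping anything.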

Now create a new block (needed for the second round of bowling) by clicking the Block column header, selecting Levels, then clicking the ellipsis next to Block 1 under the Block section of the Design Properties pane (on the left side of the screen).

You’ll see a form allowing you to assign names to the block(s). Don’t bother doing this now. As shown below, change Number of Levels to 2. Press the Tab key to see the change take effect.

Click OK. It seems that nothing changed, but actually the program now knows that you will be conducting another block of runs.

Now you are ready to begin adding and/or duplicating rows. This can be accomplished in different ways, depending on your ingenuity. We’ll follow routes revealing as many of the editing features as possible, although they may not demonstrate the most elegant approaches. As shown below, right-click the Select column’s square button at the left of the first row (Pat’s 160 game) to bring up the editing menu. Then choose Insert Row -> Before This Row.

You now see a new row containing blanks for the bowler and the score. (Don’t worry if it’s being ignored – crossed out, that is – for the moment.) Click the first row’s block cell directly below the block field header, then click the list arrow.

Select Block 2 as shown below.

Click the blank field for bowler and press the list arrow. Select Pat. (We’re using categorical factors here, but if this were a numerical field, you’d enter a value.)

Again, right-click the Select column’s square button at the left of the first row to bring up the editing menu as shown below. Click Duplicate.

Design-Expert may pop up a warning like the one shown below.

The program is recognizing a potential problem here and is alerting you that only one bowler is in the second block. You need not worry at this stage because you will be adding others. Click the check option Do not show this warning again. Don’t worry – you will not be unprotected indefinitely. This warning will be re-enabled the next time you start the program.

Press OK to proceed.

Right-click the Block column header and choose Sort Ascending.

Two new rows are now seen at the bottom of your design layout. We need two new rows apiece for Shari and Mark. Let’s simply duplicate Pat’s two new rows and update the names. Do this by first clicking the Select column’s square button at the left of Pat’s first new row, so it is highlighted. Then while holding down the Shift key, click the Select column’s square button at the left of Pat’s second new row. Both rows should now be highlighted.

Now right-click any Select column’s square button at the left of the highlighted block and select Duplicate. (If the warning screen pops up again, click OK.)

In the first duplicated row, click the field for Bowler and select Mark.

Do the same for the last row. You now should have two new rows each for Pat and Mark. Click the Select column’s square button at the left of Mark’s first new row, so it is highlighted. Then while holding down the Shift key, click the Select column’s square button at the left of Mark’s second new row. Both rows should now be highlighted. As before, right-click any Select column’s square button at the left of the highlighted block and select Duplicate.

In the first duplicated row, click the field for Bowler and select Shari. Do the same for the last row.

But what about the new guy – Ben? We need to identify him as a new competitor in this bowling contest. Do this by clicking the header for factor 1, then selecting Levels (under Categoric Factor in the Design Properties pane), and clicking the ellipsis (…) to the right.

Change Number of Levels to 4 (see below), then click in the Name field for level 4 and enter Ben.

Press OK. Now duplicate two more rows by clicking the Select column’s square button at the left of the first of Shari’s two new games at the bottom of the list. While holding down the Shift key, click the Select column’s square button at the left of the last run (Control click to select multiple, non-consecutive rows). Finally, right-click any Select column’s square button at the left of the highlighted block and select Duplicate.

In both of these new duplicated rows, click the fields for Bowler and select Ben.

Note

An important aside: Always randomize your run orders for actual experiments. For purposes of this tutorial, this will just be a bother, so do this only if you wish to try it out, but it’s very easy to do – simply right-click the Run column-header and do this for Block 2 as shown.

How to randomize the run order in the second block

To make it easier to enter the results, double-click the Factor 1 (A: Bowler) column header to Sort Ascending. Then double-click the Block column header to Sort Ascending. Now enter the eight new scores as shown below.

Go ahead now and re-analyze your data by clicking the Score node under Analysis. Move through Transform and click on the Effects tab. A warning pops up that the design is not “orthogonal.”

This is a mathematical artifact of our ad hoc addition of runs in a second block. It will not create any material impact on the outcome so just press on via OK and click the square appearing at the end of the green triangles (error estimates) on the half-normal plot of effects. This puts A-Bowler in your model.
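One way to see why the edited layout is no longer orthogonal is to count the active runs in each block-by-bowler cell. The sketch below applies the proportional-frequencies condition, a standard test for whether two categorical factors are balanced against each other. Whether Design-Expert uses this exact check is an assumption. The counts follow from the steps above: four active games per original bowler in block 1, and two games per bowler in block 2.

```python
# Active (not ignored) runs per cell after the edits described above.
counts = {
    "Block 1": {"Pat": 4, "Mark": 4, "Shari": 4, "Ben": 0},
    "Block 2": {"Pat": 2, "Mark": 2, "Shari": 2, "Ben": 2},
}

total = sum(sum(row.values()) for row in counts.values())
block_totals = {block: sum(row.values()) for block, row in counts.items()}
bowler_totals = {}
for row in counts.values():
    for name, n in row.items():
        bowler_totals[name] = bowler_totals.get(name, 0) + n

def proportional(cells):
    """True when every cell count equals (block total x bowler total) / grand total."""
    return all(
        n * total == block_totals[block] * bowler_totals[name]
        for block, row in cells.items()
        for name, n in row.items()
    )

balanced = proportional(counts)
print("runs:", total, "proportional:", balanced)
```

Ben has no games in block 1, so the cell counts cannot be proportional, and the block and bowler effects are partly entangled.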

Proceed to ANOVA (overlooking this model being not significant) and then to Diagnostics. Now go to the Normal Plot. As you will see, something is abnormal about this data. Do you notice that the residuals now line up very poorly, especially at the extreme points as shown below? On the floating Diagnostics Tool change Color by to A: Bowler.

Now (referring to the color key at the left of the plot) you see that the results from Ben do not fit with the others (his games are the two outliers – low and high). Considering his odd, unstable style of bowling, this should be no surprise. Click the Resid. vs Run button to bring up the externally studentized residuals – a good tool for detecting outliers. Drag your mouse over Ben’s residuals at the far right (if you have to, rope them each individually).

Both points should now be highlighted. We must ignore or delete them. (Sorry Ben, odd behavior by programmers is considered normal at Stat-Ease, but not when it comes to bowling!)

Click the Design node (upper left) to get back to the home base of the design layout. Notice that Ben’s games are conveniently highlighted in blue text so they can easily be deleted.

Note

Ignoring data: It provides no advantage in this case, which features only one response measure, but you can ignore a specific result by right-clicking that cell and setting Set Cell Status to Ignore as shown below.

Ignoring a single cell – an option that’s not recommended for this case

In this case you could ignore his entire runs (we explained how to do this earlier). Better yet, simply delete them altogether. No offense to Ben, but given that he only bowled two games and his unorthodox style creates such abnormal variability, it is best now to click the Select column’s square button at the left of his first score of 200 (making him feel really bad), shift-click the button below it for the second game of 130 (not so sorry to see this gone!), then without moving your mouse, right-click and select Delete Row(s).

Click Yes on the warning that pops up about categoric contrasts (a safety precaution). Then go ahead and re-analyze the results.

It turns out that the added games cause no change in the overall conclusions as to who’s the better bowler. Mark remains on top. It would now be appropriate to recover the low and high games for each bowler from block 1. Because this data was not deleted, only ignored, getting it back is simply a matter of right-clicking to the left of each of the six suspect rows and changing Set Row Status to Normal. (Or, if you’re adept at manipulating lines of text or data with your mouse, do all rows at once using a click and shift-click.) Give this a try! Then re-analyze one last time.

By working through this exercise, you now see how easy it is to manipulate Design-Expert’s design layout.

P.S. Still feeling bad about deleting Ben’s scores? Don’t worry – he gets to bowl with Pat and Shari in a lesser league. After bowling for an entire year (roughly 100 games), it will become clear whether Ben’s crazy way of bowling will pay off by achieving a good average overall. After all, his two-game average of 165 wasn’t so bad, just inconsistent (high variability). With more data, his true ability will become more apparent.
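As a quick check of that arithmetic, this short Python sketch summarizes Ben's two games (200 and 130, the scores given earlier) with the standard library's `statistics` module. The variable names are ours.

```python
from statistics import mean, stdev

ben_scores = [200, 130]  # Ben's two games from block 2

average = mean(ben_scores)
spread = stdev(ben_scores)  # sample standard deviation

print(f"average = {average}, standard deviation = {spread:.1f}")
```

With only two games the standard deviation is a very rough estimate, which is the point of the closing remark: it takes many more games to pin down his true ability.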