Tutorial: O-ring Failure Rates Prior to the Challenger Shuttle Loss

Coping with missing information

In this tutorial, we will use a real data set where unwise interpretation of incomplete data had serious consequences to illustrate how such selection effects could be modeled. You will

Background

On January 28, 1986, the Space Shuttle Challenger was destroyed in an explosion during launch. The cause was eventually found to be the failure of an O-ring seal that normally prevents hot gas from leaking between two segments of the solid rocket motors during their burn. The ambient temperature of just 36 degrees Fahrenheit, far colder than at any previous launch, was determined to be a significant factor in the failure.

A relevant excerpt from the Report of the Presidential Commission on the Space Shuttle Challenger Accident reads:

Temperature Effects

The record of the fateful series of NASA and Thiokol meetings, telephone conferences, notes, and facsimile transmissions on January 27th, the night before the launch of flight 51L, shows that only limited consideration was given to the past history of O-ring damage in terms of temperature. The managers compared as a function of temperature the flights for which thermal distress of O-rings had been observed-not the frequency of occurrence based on all flights (Figure 6). In such a comparison, there is nothing irregular in the distribution of O-ring "distress" over the spectrum of joint temperatures at launch between 53 degrees Fahrenheit and 75 degrees Fahrenheit. When the entire history of flight experience is considered, including "normal" flights with no erosion or blow-by, the comparison is substantially different (Figure 7).

This comparison of flight history indicates that only three incidents of O-ring thermal distress occurred out of twenty flights with O-ring temperatures at 66 degrees Fahrenheit or above, whereas, all four flights with O-ring temperatures at 63 degrees Fahrenheit or below experienced O-ring thermal distress.

Consideration of the entire launch temperature history indicates that the probability of O-ring distress is increased to almost a certainty if the temperature of the joint is less than 65.

Top: number of incidents as a function of temperature, showing only launches with at least 1 incident; Bottom: same, including launches that suffered 0 incidents

The figure above shows the number of incidents of O-ring damage found in previous missions as a function of the temperature at launch. We transcribe these data into arrays, storing the launch temperatures in oring_temps and the corresponding number of incidents in oring_incidents.

Here's a quick plot to check that we transcribed the data correctly (cf. the figure above).

For this notebook, we will simplify the data for each launch from integer (how many incidents of O-ring damage) to boolean (was there any damage, or not). This cell stores the temperatures corresponding to "failure" (any incidents) and "success" (no incidents).
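A minimal sketch of this step, assuming oring_temps and oring_incidents are NumPy arrays of equal length. The values below are hypothetical toy numbers chosen only so the snippet runs; they are not the transcribed Challenger data.

```python
import numpy as np

# Hypothetical toy values for illustration only (NOT the real transcription).
oring_temps = np.array([53.0, 57.0, 63.0, 66.0, 70.0, 70.0, 75.0, 78.0])
oring_incidents = np.array([3, 1, 1, 0, 1, 0, 2, 0])

# Collapse integer incident counts to a boolean "any damage" flag.
failed = oring_incidents > 0

failure_temps = oring_temps[failed]    # launches with >= 1 incident
success_temps = oring_temps[~failed]   # launches with 0 incidents
Nfailure = len(failure_temps)
Nsuccess = len(success_temps)
```

Boolean masking keeps each temperature paired with its own outcome, so no sorting or index bookkeeping is needed.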

1. Defining a model

Before worrying about missing data, let's define a model that we might want to fit to the complete data. We're interested in whether the probability of having zero O-ring incidents (or non-zero incidents, conversely) is a function of temperature. One possible parametrization that allows this is the logistic function, which squeezes the real line onto the range (0,1).

For reasons that may become clear later, I suggest defining the model in terms of the probability of success (zero incidents):

$P_\mathrm{success}(T|T_0,\beta,P_\mathrm{cold},P_\mathrm{hot}) = P_\mathrm{cold} + \frac{P_\mathrm{hot} - P_\mathrm{cold}}{1 + e^{-\beta(T-T_0)}}$,

with parameters $T_0$ and $\beta$ respectively determining the center and width of the logistic function, and $P_\mathrm{cold}$ and $P_\mathrm{hot}$ determining the probabilities of success at very low and very high temperatures (which need not be 0 or 1).

As we'll see in a moment, a model like this provides a smooth, linear-ish transition between two extreme values, without imposing the strong prior that $P_\mathrm{success}$ must drop to zero at some point, for example.

1a. Implement this function and have a look

Plot the function for a few different parameter values. If you've never worked with the logistic function (or a similar sigmoid function) before, this will give you an idea of how flexible it is.
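One possible implementation is sketched below. The function name p_success and its argument order are assumptions, not names fixed by the notebook. It evaluates $1/(1+e^{-x})$ as `exp(-logaddexp(0, -x))`, which is mathematically identical but avoids overflow warnings for large $|\beta(T-T_0)|$.

```python
import numpy as np

def p_success(T, T0, beta, P_cold, P_hot):
    """P_success(T) = P_cold + (P_hot - P_cold) / (1 + exp(-beta (T - T0)))."""
    x = beta * (np.asarray(T, dtype=float) - T0)
    # exp(-logaddexp(0, -x)) equals 1 / (1 + exp(-x)) but never overflows
    return P_cold + (P_hot - P_cold) * np.exp(-np.logaddexp(0.0, -x))

T_grid = np.arange(30.0, 90.0, 10.0)   # 30, 40, ..., 80 F
for beta in (0.1, 0.3, 1.0):
    curve = p_success(T_grid, T0=60.0, beta=beta, P_cold=0.2, P_hot=0.9)
    print(f"beta={beta}:", np.round(curve, 2))
```

Passing the same T_grid and curves to a plotting library (e.g. matplotlib) shows how the transition sharpens as $\beta$ grows, while $P_\mathrm{cold}$ and $P_\mathrm{hot}$ set the plateaus.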

1b. PGM and priors

Given the definition of the data and model above, draw the PGM for this problem, write down the corresponding probability expressions, and write down the likelihood (all assuming we have the complete data set).
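For reference, one way to write the complete-data likelihood, assuming launches are independent and each outcome is a Bernoulli trial whose success probability is $P_\mathrm{success}(T_i)$ at the known launch temperature $T_i$ (so in the PGM, the four parameters and each $T_i$ feed into the outcome of launch $i$):

$$
\mathcal{L}(T_0,\beta,P_\mathrm{cold},P_\mathrm{hot}) = \prod_{i\,\in\,\mathrm{successes}} P_\mathrm{success}(T_i|T_0,\beta,P_\mathrm{cold},P_\mathrm{hot}) \prod_{j\,\in\,\mathrm{failures}} \left[1 - P_\mathrm{success}(T_j|T_0,\beta,P_\mathrm{cold},P_\mathrm{hot})\right]
$$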

Choosing priors is a little tricky because we're interested in the model's predictions at $T=36$ degrees F, which is an extrapolation even for the complete data set.

We'd like our model to be consistent with no trend a priori; that way we can see relatively straightforwardly whether the data require there to be a trend. A pleasingly symmetric way to allow this is to put identical, independent priors on $P_\mathrm{cold}$ and $P_\mathrm{hot}$, in particular including the possibility that $P_\mathrm{cold} > P_\mathrm{hot}$ even though that isn't what we're looking for. Thus, a solution with $P_\mathrm{cold}=P_\mathrm{hot}$, i.e. no trend, is perfectly allowed.

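Our temperature data are given in integer degrees, so it doesn't make sense to allow values of $\beta$ much greater than 1, since the data could not resolve such a sudden change (which would make $P_\mathrm{success}$ increasingly resemble a step function). We also require $\beta>0$ (it's a "rate" parameter). This costs no generality: flipping the sign of $\beta$ while swapping $P_\mathrm{cold}$ and $P_\mathrm{hot}$ produces exactly the same curve, so the restriction simply removes that duplicate solution.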
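A quick numerical check that restricting the sign of $\beta$ loses nothing. Because $1 - \frac{1}{1+e^{x}} = \frac{1}{1+e^{-x}}$, negating $\beta$ and swapping $P_\mathrm{cold}$ with $P_\mathrm{hot}$ should reproduce the same curve. The parameter values here are arbitrary illustrations.

```python
import numpy as np

def p_success(T, T0, beta, P_cold, P_hot):
    # Generalized logistic from the text; stable form of 1 / (1 + e^{-x})
    x = beta * (np.asarray(T, dtype=float) - T0)
    return P_cold + (P_hot - P_cold) * np.exp(-np.logaddexp(0.0, -x))

T = np.linspace(30.0, 85.0, 56)
original = p_success(T, T0=60.0, beta=0.4, P_cold=0.2, P_hot=0.9)
# Flip the sign of beta AND swap P_cold <-> P_hot
mirrored = p_success(T, T0=60.0, beta=-0.4, P_cold=0.9, P_hot=0.2)
print(np.allclose(original, mirrored))  # True
```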

In principle, we might allow $T_0$ to take any value. But, arguably, the most sensible thing we can do with such limited information is test whether there is evidence for a trend in the probability of O-ring failure within the range of the available data (or, a little more casually, the range of the figure from the report, above). Given the flexibility already provided by the choices above, there's little obvious benefit to allowing $T_0$ to vary more than this.

In summary, my suggestion is the following uniform priors:

As always, you're welcome to mess around with other priors if you disagree. However, for the work you turn in, use the priors above.

Implement a log-prior function below.
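A minimal sketch, assuming the parameter vector is ordered $(T_0, \beta, P_\mathrm{cold}, P_\mathrm{hot})$ and that the bounds are passed in rather than hard-coded. The bound values in the usage lines are hypothetical placeholders, not necessarily the ranges intended above.

```python
import numpy as np

def ln_prior(params, T0_range, beta_max):
    """Log of independent uniform priors, up to an additive constant.

    params   : (T0, beta, P_cold, P_hot); this ordering is an assumption.
    T0_range : (T0_min, T0_max), e.g. the temperature span of the data.
    beta_max : upper limit on beta (the text argues for values of order 1).
    """
    T0, beta, P_cold, P_hot = params
    T0_min, T0_max = T0_range
    if not (T0_min <= T0 <= T0_max):
        return -np.inf
    if not (0.0 < beta <= beta_max):          # beta > 0 strictly
        return -np.inf
    # Identical priors on the two plateau probabilities
    if not (0.0 <= P_cold <= 1.0 and 0.0 <= P_hot <= 1.0):
        return -np.inf
    return 0.0  # flat inside the box; normalization is irrelevant for MCMC

# Usage with hypothetical bounds:
print(ln_prior((60.0, 0.3, 0.5, 0.5), T0_range=(50.0, 80.0), beta_max=1.0))
print(ln_prior((60.0, 0.0, 0.5, 0.5), T0_range=(50.0, 80.0), beta_max=1.0))
```

Returning `-np.inf` outside the support is the usual convention for MCMC samplers such as emcee, which then reject the proposed step.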

1c. Model fitting code

Since the point of this tutorial is model design rather than carrying out a fit, a bunch of code is given below. Naturally, you should ensure that you understand what the code is doing, even though there's nothing to add.

Here we follow a similar, though simpler, approach to the object-oriented code used in the model evaluation/selection notebooks, since the models we'll compare all have the same set of free parameters. The Model object will take log-prior and log-likelihood functions as inputs to its constructor (instead of deriving new classes for different likelihoods), and will deal with the computational aspects of fitting the parameters. It will also provide a posterior prediction for the thing we actually care about: the failure probability at a given temperature. To do this, we need to marginalize over the model parameters; that is, we compute the posterior-weighted average of $1-P_\mathrm{success}$ at the temperature of interest over the parameter space.
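To illustrate what that marginalization amounts to, here is a standalone sketch (not the notebook's Model code). It assumes the posterior is represented by MCMC draws ordered $(T_0, \beta, P_\mathrm{cold}, P_\mathrm{hot})$. Because the draws are already distributed according to the posterior, the posterior-weighted average is just the plain mean over samples. The toy samples are drawn uniformly purely so the snippet runs; they are not fit results.

```python
import numpy as np

def p_success(T, T0, beta, P_cold, P_hot):
    """Generalized logistic from the text; broadcasts over array arguments."""
    x = beta * (np.asarray(T, dtype=float) - T0)
    # exp(-logaddexp(0, -x)) == 1 / (1 + exp(-x)), computed without overflow
    return P_cold + (P_hot - P_cold) * np.exp(-np.logaddexp(0.0, -x))

def failure_prob_draws(samples, T_grid):
    """P_failure = 1 - P_success for every posterior draw and temperature.

    samples : array (n_samples, 4) of (T0, beta, P_cold, P_hot)
    Returns an array of shape (n_samples, len(T_grid)).
    """
    samples = np.asarray(samples, dtype=float)
    T = np.asarray(T_grid, dtype=float)[None, :]            # (1, n_T)
    T0, beta, P_cold, P_hot = (samples[:, [k]] for k in range(4))  # (n, 1)
    return 1.0 - p_success(T, T0, beta, P_cold, P_hot)       # (n, n_T)

# Toy stand-in for a flattened, burned-in chain (NOT real results)
rng = np.random.default_rng(1)
toy_samples = np.column_stack([
    rng.uniform(55.0, 75.0, 500), rng.uniform(0.05, 1.0, 500),
    rng.uniform(0.0, 1.0, 500), rng.uniform(0.0, 1.0, 500)])
T_grid = np.linspace(30.0, 85.0, 56)
P_fail_draws = failure_prob_draws(toy_samples, T_grid)
P_fail_marginal = P_fail_draws.mean(axis=0)  # posterior-weighted average
```

Keeping the full (n_samples, n_T) array, rather than only its mean, also allows percentile-based credible intervals at each temperature.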

2. Solution for complete data

First, let's see what the solution looks like when there are no missing data. Complete the likelihood function appropriate for a complete data set below.
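A minimal sketch of a complete-data log-likelihood, following the Bernoulli product over successes and failures. The signature (data passed explicitly, parameters ordered $(T_0, \beta, P_\mathrm{cold}, P_\mathrm{hot})$) is an assumption; the notebook's version may read the data from global scope instead.

```python
import numpy as np

def p_success(T, T0, beta, P_cold, P_hot):
    # Generalized logistic from the text; stable form of 1 / (1 + e^{-x})
    x = beta * (np.asarray(T, dtype=float) - T0)
    return P_cold + (P_hot - P_cold) * np.exp(-np.logaddexp(0.0, -x))

def ln_like_complete(params, failure_temps, success_temps):
    """Sum of log P_success over successes plus log(1 - P_success) over failures."""
    T0, beta, P_cold, P_hot = params
    ps_success = p_success(success_temps, T0, beta, P_cold, P_hot)
    ps_failure = p_success(failure_temps, T0, beta, P_cold, P_hot)
    # log(0) -> -inf is the right answer when the model forbids an observed
    # outcome; silence numpy's divide-by-zero warning in that case.
    with np.errstate(divide="ignore"):
        return np.sum(np.log(ps_success)) + np.sum(np.log1p(-ps_failure))
```

`np.log1p(-p)` computes $\ln(1-p)$ more accurately than `np.log(1 - p)` when $p$ is small.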

Now we put the Model code to work. The default options below should work well enough, but do keep an eye on the usual basic diagnostics provided below, and make any necessary changes. First we instantiate the model...

... and run the fit. Note that the parameters are not likely to be individually well constrained by the data, compared with the prior. We don't necessarily care about this; the important question is what the posterior predictive distribution for the probability of failure at a given temperature looks like. (We do, of course, need the chains to be converged and adequately sampled.)

Here are the usual diagnostics:

Finally, remove burn-in and plot the marginal posteriors:

Assuming that went well, let's visualize the predicted failure probability as a function of temperature. The solid and dashed lines show the posterior-predictive median and percentile-based 68% credible interval for $P_\mathrm{failure} = 1 - P_\mathrm{success}$ at each temperature.

Does this curve make sense compared with inspection of the data? Any surprises?

Checkpoint: Let's look at the probability of failure at 36 F. This will print the posterior prediction median and CI lower and upper bounds. For comparison, I find approximately $0.83_{-0.25}^{+0.13}$.
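The median and percentile-based interval quoted in the checkpoints can be computed from the per-draw failure probabilities at a single temperature roughly as follows. The function name is hypothetical, and the toy Beta-distributed draws stand in for real posterior output only so the snippet runs.

```python
import numpy as np

def credible_summary(draws, level=68.0):
    """Return (lower, median, upper) of a percentile-based central interval."""
    tail = (100.0 - level) / 2.0
    lower, median, upper = np.percentile(draws, [tail, 50.0, 100.0 - tail])
    return lower, median, upper

# Toy draws standing in for per-sample P_failure at 36 F (NOT real results)
rng = np.random.default_rng(0)
toy_draws = rng.beta(8.0, 2.0, size=4000)
lo, med, hi = credible_summary(toy_draws)
print(f"P_failure = {med:.2f} (-{med - lo:.2f}, +{hi - med:.2f})")
```

With `level=68.0` the bounds are the 16th and 84th percentiles, matching the $x_{-a}^{+b}$ style used in the checkpoints.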

3. Censored (but somewhat informed) success temperatures

Imagine we are in a slightly better situation than that shown in the top panel of Figure 6 from the report. Namely, we are given

  1. the temperatures of launches where there were O-ring failures (failure_temps and Nfailure above),
  2. the number of launches with no failures (Nsuccess = len(success_temps)),
  3. a range of temperatures containing the successful launches, but not the precise temperatures of each.

For (3), we'll simply use the actual min and max of success_temps, and implement the prior on the unknown temperatures as uniform over this range. In the next section, we'll look at the results with a less informed prior on the success temperatures.

3a. Censored model definition

Work out how to adjust your PGM and expression for the likelihood to reflect our ignorance of the temperatures of successful launches.
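One way the result can look, assuming each unknown success temperature has an independent uniform prior $p(T) = 1/(T_\mathrm{max}-T_\mathrm{min})$ on $[T_\mathrm{min}, T_\mathrm{max}]$ (standing for success_Tmin and success_Tmax), and abbreviating $\theta = (T_0,\beta,P_\mathrm{cold},P_\mathrm{hot})$. Each censored launch contributes its prior-averaged success probability, which becomes closed-form by using $\int \frac{dT}{1+e^{-\beta(T-T_0)}} = \frac{1}{\beta}\ln\left(1+e^{\beta(T-T_0)}\right)$:

$$
\bar P_\mathrm{success}(\theta) = \frac{1}{T_\mathrm{max}-T_\mathrm{min}}\int_{T_\mathrm{min}}^{T_\mathrm{max}} P_\mathrm{success}(T|\theta)\,dT = P_\mathrm{cold} + \frac{P_\mathrm{hot}-P_\mathrm{cold}}{\beta\,(T_\mathrm{max}-T_\mathrm{min})}\,\ln\frac{1+e^{\beta(T_\mathrm{max}-T_0)}}{1+e^{\beta(T_\mathrm{min}-T_0)}}
$$

$$
\mathcal{L}(\theta) = \left[\bar P_\mathrm{success}(\theta)\right]^{N_\mathrm{success}} \prod_{j\,\in\,\mathrm{failures}} \left[1 - P_\mathrm{success}(T_j|\theta)\right]
$$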

Implement the (log)-likelihood for the censored model. Here are some hints/suggestions if you want them:

  1. This doesn't require as dramatic a change to the model as truncation would, more a re-definition of the sampling distribution for the censored points.
  2. A model component that was previously fixed by observation or effectively determined precisely is now indeterminate.
  3. We can marginalize over our newfound ignorance analytically, taking advantage of the fact that the integral of the logistic function is analytic (see Wikipedia).
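A sketch of how hint 3 might translate into code, under the same assumptions as before (parameters ordered $(T_0, \beta, P_\mathrm{cold}, P_\mathrm{hot})$, uniform prior on each unknown success temperature). Each of the Nsuccess censored launches contributes the average of $P_\mathrm{success}$ over $[T_\mathrm{min}, T_\mathrm{max}]$, obtained from the antiderivative $\ln(1+e^{\beta(T-T_0)})/\beta$. Passing the range as arguments, rather than reading success_Tmin and success_Tmax from global scope, is a design choice of this sketch.

```python
import numpy as np

def p_success(T, T0, beta, P_cold, P_hot):
    # Generalized logistic from the text; stable form of 1 / (1 + e^{-x})
    x = beta * (np.asarray(T, dtype=float) - T0)
    return P_cold + (P_hot - P_cold) * np.exp(-np.logaddexp(0.0, -x))

def mean_p_success(T0, beta, P_cold, P_hot, Tmin, Tmax):
    """Average of P_success for T ~ Uniform(Tmin, Tmax); requires beta > 0."""
    # logaddexp(0, x) = ln(1 + e^x), evaluated without overflow
    upper = np.logaddexp(0.0, beta * (Tmax - T0))
    lower = np.logaddexp(0.0, beta * (Tmin - T0))
    return P_cold + (P_hot - P_cold) * (upper - lower) / (beta * (Tmax - Tmin))

def ln_like_censored(params, failure_temps, Nsuccess, success_Tmin, success_Tmax):
    """Known-temperature failures plus Nsuccess successes at unknown temperatures."""
    T0, beta, P_cold, P_hot = params
    ps_failure = p_success(failure_temps, T0, beta, P_cold, P_hot)
    ps_bar = mean_p_success(T0, beta, P_cold, P_hot, success_Tmin, success_Tmax)
    with np.errstate(divide="ignore"):
        return Nsuccess * np.log(ps_bar) + np.sum(np.log1p(-ps_failure))
```

Widening the range in section 4 only requires calling ln_like_censored with different success_Tmin and success_Tmax values.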

3b. Censored model fit

We can now carry out the usual steps. Again, the choices made below will probably work, but change them if need be and check the usual diagnostics.

Now let's compare the posterior predictions to the previous result.

Does your censored model make predictions consistent with those of the model fitted to the complete data? If there are clear differences, do they make sense in light of what information has been hidden? Is there still evidence for a temperature-dependent failure rate?

Checkpoint: Looking at a balmy temperature of 75 F this time, I find a failure probability of approximately $0.18_{-0.08}^{+0.10}$.

4. Censored (less informed) success temperatures

As a point of comparison, let's fit a model in which the temperature range for the censored (success) data is much less well constrained. This is arguably more analogous to what we might do by eye if presented with the first figure in this notebook, knowing that successful launches were absent from the figure, but without the context that those launches had all taken place in warm weather.

In particular, let's take the prior on the success temperatures to be uniform over the range shown in the figure. We followed poor practice by defining success_Tmin and success_Tmax at global scope earlier, and then using them from global scope in ln_like_censored, but it does mean we can just redefine them below and then re-use the likelihood function. If your implementation differs (i.e., was more sensible), you might need to change some more details.

This seems like it will lead to somewhat different posterior predictions. Let's check.

The vertical, dotted line added in this plot marks the ambient temperature of 36 F at the Challenger launch.

Does this more censored model make predictions consistent with those of the model fitted to the complete data? If there are clear differences, do they make sense in light of what information has been hidden? Is there still evidence for a temperature-dependent failure rate?

Checkpoint: looking now at 60 F, I find a failure probability of approximately $0.32_{-0.10}^{+0.11}$.

5. Draw a conclusion

In light of all your work above, comment on the report's assertion that "Consideration of the entire launch temperature history indicates that the probability of O-ring distress is increased to almost a certainty if the temperature of the joint is less than 65."

6. OPTIONAL: Truncated success temperatures

Modify the model such that the launches with zero O-ring incidents are truncated rather than censored. We no longer know how many such data points there are in the complete data set. If you want, you can still use prior information on the temperature range that incident-free launches happened in (as above), and some vague prior on the total number of launches (say, 25-ish). Run through the analysis in this scenario and compare the results with those above.

Note that emcee cannot readily sample over an integer-valued parameter, so the most straightforward implementation would involve marginalizing over the number of truncated data points within the likelihood function.