Tutorial: Credible Intervals/Regions

Here we'll compare different conventions for defining "best values", credible intervals (CI's), and credible regions (CR's) to summarize posterior PDFs.

You will learn how to find "best values", credible intervals, and credible regions that summarize a posterior PDF represented by samples.

In principle, the calculations involved in doing these things are not difficult, but we also don't think they're terribly illuminating. On the other hand, you'll need to know how to produce CI's and CR's to summarize the results of pretty much every inference from now on. The upshot is that, more than most, this tutorial resembles a walkthrough of particular software.

First, we define two concrete test cases to work with. Per the goals for this notebook, these will not be in the form of posterior PDF functions, but rather lists of samples from posterior PDFs. (If you do find yourself wanting to compute CI's from a posterior function, you could always evaluate the function over a grid, and coerce those results into the form of the 1D or 2D histogram objects used below, then proceed in the same way.)

We'll find 1D "best values" and CI's for each of these, and afterward turn to the case of a joint CR for parameters $x$ and $y$, where x=test1D_1 and y=test1D_2.

We'll see histograms of these samples below, but in brief test1D_1 is a skewed but single-peaked distribution, while test1D_2 is a bimodal mixture of 2 Gaussians.
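For concreteness, here is one way such test samples could be generated. The particular distributions and parameters below are purely illustrative stand-ins chosen to match the descriptions above; the actual samples used in this notebook may be produced differently.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(42)
nsamples = 100000

# a skewed but single-peaked distribution, e.g. a gamma distribution
test1D_1 = st.gamma.rvs(2.0, scale=1.0, size=nsamples, random_state=rng)

# a bimodal mixture of 2 Gaussians
pick = rng.random(nsamples) < 0.6
test1D_2 = np.where(pick,
                    rng.normal(0.0, 1.0, nsamples),
                    rng.normal(4.0, 0.75, nsamples))
```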

We'll find intervals corresponding to "$1\sigma$" and "$2\sigma$" equivalent probability. Recall that these are $\approx$0.683 and 0.954, or more precisely:
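For instance, one way to get the precise values with scipy:

```python
import scipy.stats as st

# probability within +/- 1 and 2 standard deviations of the mean of a Gaussian
st.chi2.cdf([1.0, 4.0], df=1)                         # ~ [0.6827, 0.9545]
# equivalently
st.norm.cdf([1.0, 2.0]) - st.norm.cdf([-1.0, -2.0])   # same values
```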

1D intervals

Let's start with the 1D best value and CI for test1D_1.

First, we'll find the mode and highest posterior density CI's using the only package we know of that does the latter correctly.

With this package, there are actually two steps. The first converts the samples into a histogram, optionally doing some smoothing, and optionally accounting for non-uniform weights (a feature of some samplers we will see later). After that, there is a second function call to find the mode and CI's. This is so that the second function can also be used in cases where we have, e.g., direct evaluations of the posterior on a grid (we would just have to coerce those into the same format as the histogram that gets made from samples).

A simple call of the first function, whist, produces the following:

Note that, by default, this function does some smoothing, specifically using a kernel density estimate method. This is why the blue curve (which is what is actually returned) is smoother than the raw histogram shown in the background. The mode of the blue curve may therefore not be identical to the $x$ value where the raw histogram is highest. The mode of an unsmoothed histogram is often thrown off by Monte Carlo noise (this is the reason for smoothing in the first place). In general, you should smooth until noisy features that you're pretty sure shouldn't be there are gone, and no more than that - assuming you're confident enough to make that call (if not, you may need to get more samples). Hence, this is a plot that one would want to look at with real eyeballs, if only briefly, to ensure that results based on the smoothed or unsmoothed histogram are reasonable.

There may be times when you don't want to smooth, or want finer control over the smoothing (e.g., if you believe the identified mode is still off due to remaining noise). See the docstring for details; for example, one could instead do no smoothing with a set number of bins

or smoothing on a fixed scale with more bins

For reference, take a look at the returned object, a dictionary that straightforwardly holds the abscissae and estimated density of the PDF.
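For concreteness, the calls might look something like the sketch below. The import path, the keyword names (plot, smooth, bins), and the dictionary keys ('x', 'density') are all assumptions on our part, based on the descriptions above; check the whist docstring for the real interface.

```python
import matplotlib.pyplot as plt
import incredible as cr   # assumed import path

# default: histogram the samples with KDE smoothing (keyword names assumed)
h1 = cr.whist(test1D_1, plot=plt)

# hypothetical variants corresponding to the options described above:
# h1 = cr.whist(test1D_1, plot=plt, smooth=0, bins=20)    # no smoothing, fixed bins
# h1 = cr.whist(test1D_1, plot=plt, smooth=1, bins=100)   # fixed smoothing scale, more bins

# the return value is a simple dictionary of abscissae and estimated density
print(h1.keys())   # e.g. 'x' and 'density' (key names assumed)
```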

Next, assuming this looks reasonable (we'll use the first call to whist above), we feed the output into whist_ci:

This plot should look familiar from the notes. It shows the mode and the $1\sigma$ and $2\sigma$ intervals, whose bounding probabilities are indicated by the horizontal lines. Here you should look at the intersections of the blue/solid and green/dashed lines with the black/solid PDF - if they aren't all coming together in the same place, then the underlying histogram needs to be binned more finely. (Try this with the bin=20 version above for comparison.)

The return value contains a number of useful (and some redundant) values. Of most interest are the mode and interval min and max values. low and high are just the distance of min and max from the mode. Finally, center is the average of min and max, and width is half the difference between min and max. These are convenient for cases where the CI is symmetric about the mode (at the precision we would be reporting them).

For completeness, level contains the probability defining the interval that each entry belongs to, prob is the probability contained within each entry (these are identical when an interval is simply connected; we'll see the distinction below), and density is the height of the green lines in the figure above.

In text, we would normally report the constraint as $\mathrm{mode}^{+\mathrm{high}}_{-\mathrm{low}}$, or $\mathrm{center}\pm\mathrm{width}$ if the CI is symmetric.
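A sketch of how the output might be used, based on the key names described above (mode, low, high, etc.). Whether the interval entries are scalars or arrays, and the plot keyword, are assumptions; print the dictionary to check the actual structure.

```python
# find the mode and HPD intervals from the smoothed histogram
# ('plot' keyword assumed; see the whist_ci docstring)
ci1 = cr.whist_ci(h1, plot=plt)
print(ci1)   # check the exact structure

# report the 1 sigma constraint as mode^{+high}_{-low}
# (indexing with [0] assumes the first entry is the 0.683 interval)
print("x = {:.2f} +{:.2f} -{:.2f}".format(ci1['mode'], ci1['high'][0], ci1['low'][0]))
```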

Next, compute the alternative "best fit" and CI's using the median/quantiles method from the notes. Store the median as a scalar and the intervals as shape (2,) arrays holding the endpoints of each interval.
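A minimal numpy sketch of this step (the variable names here are ours, chosen for illustration):

```python
import numpy as np

# median as the "best value"
med1 = np.median(test1D_1)

# central intervals containing 68.3% and 95.4% of the samples,
# each stored as a shape (2,) array of endpoints
quant1_1 = np.quantile(test1D_1, [0.5 - 0.683/2., 0.5 + 0.683/2.])
quant1_2 = np.quantile(test1D_1, [0.5 - 0.954/2., 0.5 + 0.954/2.])
```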

Even though their use is not recommended, we'll also compare to the mean/standard deviation. The naively defined 1 and 2 $\sigma$ intervals would be mea1 +/- std1 and mea1 +/- 2*std1.
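For example, using the mea1/std1 names from above (the interval variable names are ours):

```python
# sample mean and standard deviation
mea1 = np.mean(test1D_1)
std1 = np.std(test1D_1)

# naive "1 and 2 sigma" intervals centered on the mean
naive1_1 = np.array([mea1 - std1, mea1 + std1])
naive1_2 = np.array([mea1 - 2*std1, mea1 + 2*std1])
```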

Here we'll show the 1 and 2 $\sigma$ median/quantile (orange) and mean/standard deviation (green) summaries over the plot from earlier (at arbitrary y-axis values). You can see that none of them quite agree with one another, and that the mean/standard deviation $2\sigma$ interval includes negative values where the PDF is zero!
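A hedged matplotlib sketch of such a comparison plot, assuming the h1 dictionary keys and the variable names defined above (the y-axis positions are arbitrary):

```python
plt.plot(h1['x'], h1['density'], 'k-', label='smoothed PDF')   # key names assumed
# median/quantile summaries (orange)
plt.plot(quant1_2, [0.05, 0.05], '--', color='C1')
plt.plot(quant1_1, [0.05, 0.05], '-', color='C1', label='median/quantiles')
plt.plot(med1, 0.05, 'o', color='C1')
# mean/standard deviation summaries (green)
plt.plot(naive1_2, [0.03, 0.03], '--', color='C2')
plt.plot(naive1_1, [0.03, 0.03], '-', color='C2', label='mean/std. dev.')
plt.plot(mea1, 0.03, 'o', color='C2')
plt.xlabel('x'); plt.ylabel('p(x)'); plt.legend();
```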

Go through the same procedure for test1D_2 next.

Interesting... this is an example of a multimodal PDF where the $1\sigma$ highest-density CI is multiply connected (i.e. not contiguous). Note that ci2 contains two entries for level 0.683 to reflect this. The corresponding values in prob show the integrated probability in each of them, if that's useful.
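If it helps, the pieces of the multiply connected interval can be listed side by side. This sketch assumes ci2 holds the whist_ci output for test1D_2 and that the level, min, max and prob entries are parallel arrays, which is worth verifying by printing ci2:

```python
# one line per interval; two of these should share level ~0.683
for lev, lo, hi, p in zip(ci2['level'], ci2['min'], ci2['max'], ci2['prob']):
    print("level {:.3f}: [{:.2f}, {:.2f}] contains probability {:.3f}".format(lev, lo, hi, p))
```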

How would one summarize the $1\sigma$ highest density CI in text? The short answer is: honestly, and that means using more characters than $x \pm \sigma_x$. One could simply say that the CI consists of two disjoint intervals, and what those intervals are. At the same time, the existence of a multiply connected CI for one parameter suggests that there may be similar structure in the marginalized posteriors of other parameters, or pairs of parameters, so we'd want to look carefully at all of them to see whether that's the case. And, of course, in a real situation this structure might actually tell us something interesting about the model that should be commented on.

2D regions

Next we have the case of 2D credible regions (CR's), where we'll take test1D_1 and test1D_2 to be samples of two parameters from the same model.

To find the HPD regions, we have analogous functions to the ones used above. However, whist2d does not have a KDE smoothing option, so in general you will want to manually adjust the number of bins (in each parameter) and the size of the smoothing kernel (in units of bins). For example, compare the following:

The return value of this function is a dictionary with x, y and z entries, corresponding to the two parameter values at the bin centers, and the density in each bin. These have dimensionality 1, 1, and 2, respectively.
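A rough sketch of how this step might look. Whether whist2d takes two 1D sample arrays or a single 2D array, and the bins/smooth/plot keyword names, are assumptions (see the docstring); the x/y/z keys come from the description above.

```python
# 2D histogram of the joint samples, with bin counts and a smoothing scale
# (in bins) chosen by hand; argument names assumed
h2 = cr.whist2d(test1D_1, test1D_2, bins=50, smooth=1, plot=plt)

# 1D bin centers in each parameter, and the 2D density
print(h2['x'].shape, h2['y'].shape, h2['z'].shape)
```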

To find the CR's, we call whist2d_ci.

This returns the mode and the list of points making up the contours in the above plot in a slightly complicated structure that you can examine if you're interested.
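Schematically, the call might look like this (the plot keyword is again an assumption):

```python
# mode and contour vertices for the 1 and 2 sigma CRs;
# a contour plot is produced as part of the call
ci2d = cr.whist2d_ci(h2, plot=plt)
print(ci2d.keys())   # explore the structure if you're curious
```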

One should test whether the amount of smoothing used is too much. (This is important in the 1D case also, but the KDE method is more likely to undersmooth.) Overlaying smoothed and less smoothed contours is a useful way of doing this.

In this comparison, the more smoothed contours should look like... a smoothed version of the squiggly ones, as opposed to being much broader.

Note that the function that extracts the contours in whist2d_ci (distinct from the one that plots them) is not foolproof - it seems especially likely to screw up if there are too many complicated squiggles. So it is (a) extra-important to do this check, and (b) best to use whist2d for the check rather than the function (below) that simply plots contours once you've found them. Once you know what the contours are supposed to look like, you can make sure nothing pathological has happened when they get re-plotted (as below). If you do find yourself looking at weirdly blocky contours when using ci2D_plot, when everything looked fine with whist2d_ci, there isn't currently a great solution; however, the pathology seems to be rare enough that e.g. throwing away a few samples might be enough to avoid it.

There is no 2D analog of the median/quantile convention for CI's, but we can compare with the equivalent (still not recommended) regions defined via the parameter means and covariance.
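For reference, here is one way such mean/covariance ellipses could be constructed by hand (the notebook may use a dedicated helper for this; the variable names and grid choices are ours). For a 2D Gaussian, the contour enclosing probability $p$ corresponds to $\Delta\chi^2$ given by the chi-square quantile with 2 degrees of freedom:

```python
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

# Gaussian approximation from the sample means and covariance
mu = np.array([np.mean(test1D_1), np.mean(test1D_2)])
cov = np.cov(test1D_1, test1D_2)
icov = np.linalg.inv(cov)

# Delta chi^2 levels enclosing 68.3% and 95.4% probability for 2 degrees of freedom
dchi2 = st.chi2.ppf([0.683, 0.954], df=2)

# evaluate the quadratic form (x-mu)^T C^-1 (x-mu) on a grid and contour it
xg = np.linspace(test1D_1.min(), test1D_1.max(), 200)
yg = np.linspace(test1D_2.min(), test1D_2.max(), 200)
X, Y = np.meshgrid(xg, yg)
d = np.stack([X - mu[0], Y - mu[1]], axis=-1)
Q = np.einsum('...i,ij,...j->...', d, icov, d)
plt.contour(X, Y, Q, levels=dchi2, colors='C2')
plt.xlabel('x'); plt.ylabel('y');
```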

Note that the stand-alone contour plotting function ci2D_plot adds a single contour level at a time, so it was called twice.

Options to the various lower-level matplotlib functions that are called can be used to customize things, e.g.:

An increasingly common way to visualize multiple 1D and 2D marginalized PDFs from a high-dimensional posterior distribution is the so-called "triangle plot" that has the 1D PDFs on its diagonal and each corresponding 2D PDF off the diagonal, in a triangular matrix of plots. For example, this call finds all of the CI's and CR's (note that it has to pop up a contour plot due to the way whist2d_ci works internally).

We can display the triangle as follows:

Again, lots of options can be customized, and other things can be added to any of the panels via the returned array of matplotlib axes. For example,

Alternative packages

There are other packages that can be better for getting a quick look at plots like those above, in particular corner and pygtc. We'll use both of these in later tutorials. They have the advantage that you get the whole triangle plot from a single function call. The downside is that all the calculations are redone each time you call the function, and it isn't so straightforward to do the sanity checks mentioned above. So we recommend using one of those packages only for quickly visualizing samples from a PDF. In contrast, with incredible it's straightforward to do the hard calculations once, and then tweak the display to your heart's content, so it might be more suitable for publication graphics (in addition to providing the CI calculations). To summarize the options as of this writing: