Tutorial: Credible Intervals/Regions
Here we'll compare different conventions for defining "best values" and credible intervals (CI's) to summarize posterior PDFs.
You will
- compare conventions that report the posterior mode and highest density region; median and quantiles of the posterior; and mean and standard deviation of the posterior (the latter being highly discouraged)
- compute the above from samples from a PDF rather than from an analytic function, since in the future samples are usually the only characterization of the posterior that we will have
- use these methods to produce credible intervals (CI's) for a single parameter, and credible regions (CR's) for 2 parameters
In principle, the internal calculations involved in doing these things are not difficult, but we also don't think they're terribly illuminating. On the other hand, you'll need to know how to produce CI's and CR's to summarize the results of pretty much every inference from now on. The upshot is that this tutorial resembles a walkthrough of particular software more than most of the others do.
# !pip install incredible pygtc
from os import getcwd
from yaml import safe_load
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
%matplotlib inline
from pygtc import plotGTC
thisTutorial = 'credible_regions'
if getcwd() == '/content':
    # assume we are in Colab, and the user's data directory is linked to their drive/Physics267_data
    from google.colab import drive
    drive.mount('/content/drive')
    datapath = '/content/drive/MyDrive/Physics267_data/' + thisTutorial + '/'
else:
    # assume we are running locally somewhere and have the data under ./data/
    datapath = 'data/'
Generate samples to analyze
Normally getting samples from the posterior would be the most challenging part of the problem, but for now we're focusing on what happens afterwards. For our first example, we'll use the distribution that was the posterior in the Bayes' Law tutorial, though instead of using those particular parameters again we'll just assign you new ones for variety.
beta_pars = safe_load(open(datapath+'beta.yaml', 'r').read())
beta_pars
Let's generate a good number of samples from this PDF. (Yes, this is a PDF that we know and so we could compute things directly from its functional form, but in general this won't be the case, so we will practice working with samples.)
theta1 = st.beta.rvs(beta_pars['alpha'], beta_pars['beta'], size=100000)
Here's what they look like:
plt.rcParams['figure.figsize'] = (4.0, 3.0)
plt.hist(theta1, bins=100, density=True, histtype='step');
plt.xlabel(r'$\theta_1$');
To give ourselves a funkier second example, we'll also generate samples from a bimodal distribution, made from mixing 2 Gaussians.
theta2 = np.concatenate((st.norm.rvs(loc=-2.0, size=60000), st.norm.rvs(loc=2.0, size=40000)))
plt.hist(theta2, bins=100, density=True, histtype='step');
plt.xlabel(r'$\theta_2$');
You'll find 1D "best values" and CI's for each of these, and afterward turn to the case of a joint 2D CR for parameters $\theta_1$ and $\theta_2$.
In particular, you'll find intervals corresponding to "$1\sigma$" and "$2\sigma$" equivalent probability. Recall that these are $\approx$0.683 and 0.954, or more precisely:
st.chi2.cdf([1.0, 4.0], df=1)
We reiterate that nothing you do below should depend on knowledge of the PDFs that these samples came from. In general, we won't be able to write marginalized posteriors down simply, nor would it be straightforward to compute anything of interest analytically. Instead, we're practicing using samples to characterize the distribution they were generated from.
1D intervals
Let's start with the 1D best value and CI for theta1.
First, we'll find the mode and highest posterior density CI's using the only package we know of that does the latter correctly.
import incredible as cr
With this package, there are actually two steps. The first converts the samples into a histogram, optionally doing some smoothing, and optionally accounting for non-uniform weights (a feature of some samplers we will see later). After that, there is a second function call to find the mode and CI's. This is so that the second function can also be used in cases where we have, e.g., direct evaluations of the posterior on a grid (we would just have to coerce those into the same format as the histogram that gets made from samples).
A simple call of the first function, whist, produces the following:
h1 = cr.whist(theta1, plot=plt);
The returned object is a simple dictionary holding the abscissae (h1['x']) and estimated density (h1['density']) of the PDF.
Note that, by default, this function does some smoothing, specifically using a kernel density estimate method. This is why the blue curve (which is what is actually returned) is smoother than the raw histogram shown in the background. It also means that the mode of the blue curve may not be identical to the $x$ value where the raw histogram is highest. The mode of an unsmoothed histogram will often be thrown off by Monte Carlo noise (this is one reason for smoothing in the first place). In general, one should smooth until noisy features that you're pretty sure shouldn't be there are gone, and no more than that - assuming you're confident enough to make that call (if not, you may need to get more samples). Hence, this is a plot that one would want to look at with real eyeballs, if only briefly, to ensure that results based on the smoothed or unsmoothed histogram are reasonable.
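For instance, one quick way to see this (a sketch using only numpy and the h1 dictionary described above; the 100-bin raw histogram is an arbitrary choice) is to compare the two mode estimates directly:
# Sketch: compare the mode of the smoothed density estimate with the peak of a raw
# histogram of the samples. They need not agree exactly, but should be close.
mode_smoothed = h1['x'][np.argmax(h1['density'])]
counts, edges = np.histogram(theta1, bins=100, density=True)
mode_raw = 0.5 * (edges[np.argmax(counts)] + edges[np.argmax(counts)+1])
print(mode_smoothed, mode_raw)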
There may be times when you don't want to smooth, or want finer control over the smoothing (e.g., if you believe the identified mode is still off due to remaining noise). See the docstring for details; for example, one could instead do no smoothing with a specified number of bins:
h1_bin20 = cr.whist(theta1, plot=plt, bins=20, smooth=False);
or smoothing on a fixed scale with more bins:
cr.whist(theta1, plot=plt, bins=200, smooth=1);
Next, assuming this looks reasonable (we'll use the first call to whist above), we feed the output into whist_ci:
ci1 = cr.whist_ci(h1, plot=plt);
This plot should look familiar from the notes. It shows the mode and the 1 and 2 sigma intervals, whose bounding probabilities are indicated by the horizontal lines. Here you should look at the intersection of the blue/solid and green/dashed lines with the black/solid PDF - if they aren't all coming together in the same place, then the underlying histogram needs to be binned more finely. For example, the following, with bins=20, will probably be too chunky.
ci1_bin20 = cr.whist_ci(h1_bin20, plot=plt);
The return value contains a number of useful (and some redundant) values. Of most interest are the mode and the interval min and max values. low and high are just the distances of min and max from the mode. Finally, center is the average of min and max, and width is half the difference between min and max. These are convenient for cases where the CI is symmetric about the mode (at the precision we would be reporting it), or if we think the mode has been thrown off center due to insufficient smoothing but the interval really should be symmetric.
ci1
For completeness, level contains the probability defining the interval that each entry belongs to, prob is the probability contained within each entry (these are identical when an interval is simply connected; we'll see the distinction below), and density is the height of the green lines in the figure above.
In text, we would normally report the constraint as $\mathrm{mode}^{+\mathrm{high}}_{-\mathrm{low}}$, or $\mathrm{center}\pm\mathrm{width}$ if the CI is symmetric.
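As a concrete illustration of that reporting convention, here is a minimal sketch, assuming ci1['mode'] is a scalar and ci1['low']/ci1['high'] are arrays with one entry per interval (check ci1['level'] to see which entry is the ~68.3% one):
# Sketch: format the ~68.3% constraint as mode^{+high}_{-low}, taking the first entry
# to be the 1-sigma interval (verify against ci1['level'] before trusting this).
print('{:.3f}^{{+{:.3f}}}_{{-{:.3f}}}'.format(ci1['mode'], ci1['high'][0], ci1['low'][0]))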
Next, compute the alternative "best fit" and CI's using the median/percentiles method from the notes. Store the median as a scalar and the intervals as shape (2,) arrays holding the endpoints of each interval.
# med1 = ...
# quant1_1 = ...
# quant1_2 = ...
# YOUR CODE HERE
raise NotImplementedError()
print(med1, quant1_1, quant1_2)
If you have any doubts, you can check these results against the median and ppf functions of st.beta. They won't be identical, since you're working with a finite set of samples, but they should be very close (better than 0.01). This won't catch you if you're plugging the wrong probabilities into both functions, so be careful with that.
Even though their use is not recommended, we'll also compare to the mean/standard deviation. The naively defined 1 and 2 $\sigma$ intervals would be mea1 +/- std1 and mea1 +/- 2*std1.
mea1 = np.mean(theta1)
std1 = np.std(theta1)
# print the intervals in the same format as above, for ease of comparison
print(mea1, [mea1-std1, mea1+std1], [mea1-2*std1, mea1+2*std1])
Here we'll overlay the 1 and 2 $\sigma$ median/quantile (orange) and mean/standard deviation (green) summaries on the plot from earlier (at arbitrary y-axis values). You can see that none of them quite agree with one another, and that the mean/standard deviation $2\sigma$ interval includes negative values, where the PDF is zero!
ci1 = cr.whist_ci(h1, plot=plt, plot_levels=False);
plt.plot(quant1_2, [2.,2.], color='C1')
plt.plot(quant1_1, [2.05,2.05], color='C1')
plt.plot(med1, 2.05, 'o', color='C1', label='median/quantiles')
plt.plot(mea1, 2.55, 'o', color='C2',label='mean/stdev')
plt.plot([mea1-std1, mea1+std1], [2.55,2.55], color='C2')
plt.plot([mea1-2*std1, mea1+2*std1], [2.5,2.5], color='C2');
plt.legend();
Go through the same procedure for theta2 next.
h2 = cr.whist(theta2, plot=plt);
# med2 = ...
# quant2_1 = ...
# quant2_2 = ...
# YOUR CODE HERE
raise NotImplementedError()
print(med2, quant2_1, quant2_2)
mea2 = np.mean(theta2)
std2 = np.std(theta2)
print(mea2, [mea2-std2, mea2+std2], [mea2-2*std2, mea2+2*std2])
ci2 = cr.whist_ci(h2, plot=plt, plot_levels=False);
plt.plot(quant2_2, [0.2,0.2], color='C1')
plt.plot(quant2_1, [0.205,0.205], color='C1')
plt.plot(med2, 0.205, 'o', color='C1', label='median/quantiles')
plt.plot(mea2, 0.155, 'o', color='C2', label='mean/stdev')
plt.plot([mea2-std2, mea2+std2], [0.155,0.155], color='C2')
plt.plot([mea2-2*std2, mea2+2*std2], [0.15,0.15], color='C2')
plt.legend();
ci2
Interesting... this is an example of a multimodal PDF where the $1\sigma$ highest-density CI is multiply connected (i.e. not contiguous). Note that ci2 contains two entries for level 0.683 to reflect this. The corresponding values in prob show the integrated probability in each of them, if that's useful.
How would one summarize the $1\sigma$ highest density CI in text? The short answer is: honestly, and that means using more characters than $x \pm \sigma_x$. One could simply say that the CI consists of two disjoint intervals, and what those intervals are. At the same time, the existence of a multiply connected CI for one parameter suggests that there may be similar structure in the marginalized posteriors of other parameters, or pairs of parameters, so we'd want to look carefully at all of them to see if that's the case. And, of course, in a real situation this structure might actually tell us something interesting about the model that should be commented on.
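As an illustration, the disjoint intervals could be pulled out of ci2 along the following lines (a sketch, assuming level, min, max and prob are parallel arrays as described above):
# Sketch: print each piece of the ~68.3% highest-density CI along with the probability it contains.
for lev, lo, hi, p in zip(ci2['level'], ci2['min'], ci2['max'], ci2['prob']):
    if abs(lev - 0.683) < 0.01:
        print('[{:.2f}, {:.2f}] containing probability {:.3f}'.format(lo, hi, p))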
2D regions
Next we have the case of 2D credible regions (CR's), where we'll take theta1 and theta2 to be samples of two parameters from the same model.
To find the HPD regions, we have analogous functions to the ones used above. However, whist2d does not have a KDE smoothing option, so in general you will want to manually adjust the number of bins (in each parameter) and the size of the smoothing kernel (in units of bins). For example, compare the following:
h3 = cr.whist2d(theta1, theta2, bins=25, smooth=0, plot=True);
h3 = cr.whist2d(theta1, theta2, bins=50, smooth=1, plot=True);
The return value of this function is a dictionary with x, y and z entries, corresponding to the two parameter values at the bin centers, and the density in each bin. These have dimensionality 1, 1, and 2, respectively.
To find the CR's, we call whist2d_ci.
ci3 = cr.whist2d_ci(h3, plot=plt);
This returns the mode (blue point above) and the list of points making up the contours in the above plot in a slightly complicated dictionary that you can examine if you're interested.
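If you do want to look inside, listing the keys is a harmless starting point (a sketch, assuming ci3 behaves like a standard dictionary):
print(ci3.keys())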
One should test whether the amount of smoothing used is too much. (This is important in the 1D case also, but the KDE method with default settings is more likely to undersmooth.) Overlaying smoothed and less smoothed contours is a useful way of doing this.
h3a = cr.whist2d(theta1, theta2, bins=100, smooth=0);
ci3a = cr.whist2d_ci(h3a, mode_fmt=None, contour_color='r', plot=plt);
h3 = cr.whist2d(theta1, theta2, bins=50, smooth=1);
ci3 = cr.whist2d_ci(h3, mode_fmt=None, plot=plt);
In this comparison, the more smoothed contours should look like... a smoothed version of the squiggly ones, as opposed to being much broader.
For contrast, here is what that comparison would look like if we used less binning and more smoothing.
# this is how you plot contours from whist2d_ci without having to repeat the calculations
cr.ci2D_plot(ci3a, plt, Line2D_kwargs={'color':'r'});
h3b = cr.whist2d(theta1, theta2, bins=25, smooth=2); # too much smoothing!
ci3b = cr.whist2d_ci(h3b, mode_fmt=None, plot=plt);
There is no 2D analog of the median/quantile convention for CI's, but we can compare with the equivalent (still not recommended) regions defined via the parameter means and covariance.
cr.ci2D_plot(ci3, plt);
cv = np.cov(theta1, theta2)
cr.cov_ellipse(cv, center=[mea1,mea2], level=0.68268949, plot=plt, fmt='r--');
cr.cov_ellipse(cv, center=[mea1,mea2], level=0.95449974, plot=plt, fmt='r--');
Note that the stand-alone contour plotting function ci2D_plot adds a single contour level at a time, so it was called twice.
Options to the various lower-level matplotlib functions that are called can be used to customize things, e.g.:
# plotting contours for different levels with different colors is clunky currently,
# requiring a separate call for each (specified by `which`)
cr.ci2D_plot(ci3, plt, which=[1], fill=True, fill_kwargs={'color':str(0.954)});
cr.ci2D_plot(ci3, plt, which=[0], fill=True, fill_kwargs={'color':str(0.683)});
An increasingly common way to visualize multiple 1D and 2D marginalized PDFs from a high-dimensional posterior distribution is the so-called "triangle plot" that has the 1D PDFs on its diagonal and each corresponding 2D PDF off the diagonal, in a triangular matrix of plots. For example, this call finds all of the CI's and CR's.
tri = cr.whist_triangle(np.array([theta1, theta2]).T, bins=50, smooth2D=1);
(Under the hood, tri is a list-of-lists in lower-triangular matrix format, holding the various histogram and contour outputs we looked at above, so its contents can be manually filled in/replaced if you are tweaking everything to be beautiful.)
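For example, one quick way to see what is stored where (a sketch; the exact contents depend on incredible's internal format):
# Each row of the lower-triangular list-of-lists holds the 1D results on the diagonal and
# the corresponding 2D results off the diagonal.
for i, row in enumerate(tri):
    print(i, [type(entry).__name__ for entry in row])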
We can display the triangle as follows:
cr.whist_triangle_plot(tri, paramNames=[r'$x$', r'$y$']);
Again, lots of options can be customized, and other things can be added to any of the panels via the returned array of matplotlib axes. For example,
f, a = cr.whist_triangle_plot(tri, paramNames=[r'$x$', r'$y$'], linecolor1D='b',
fillcolors2D=[(0.026, 0.818, 1.000), (0.714, 0.936, 1.000)]);
cr.cov_ellipse(cv, center=[mea1,mea2], level=0.68268949, plot=a[1][0], fmt='r--');
cr.cov_ellipse(cv, center=[mea1,mea2], level=0.95449974, plot=a[1][0], fmt='r--');
Alternative packages
There are other packages that can be better for getting a quick look at plots like those above, in particular pygtc and corner. We'll use these in later tutorials. They have the advantage that you get the whole triangle plot from a single function call. The downside is that all the calculations are redone each time you call the function, and it isn't so straightforward to do the sanity checks mentioned above. So we recommend using one of those packages only for quickly visualizing samples from a PDF. In contrast, with incredible it's straightforward to do the hard calculations once, and then tweak the display to your heart's content, so it might be more suitable for publication graphics (in addition to providing the CI calculations). To summarize the options as of this writing:
- pygtc: looks nicer than corner; can put multiple posteriors on the same plot
- corner: arguably a little easier to use than pygtc, but uglier; downgraded because the default 2D probability levels shown are not the conventional ones
- incredible: provides fine-tuned control over CR calculations, and the ability to save results rather than just a plot, but needs some hand-holding
- arviz: a relatively new addition that looks like it may become the one-stop-shop for CRs and MCMC diagnostics eventually; steeper learning curve, but worth a look
For comparison with the plots above, here is an example of a basic call to pygtc.
plotGTC(np.array([theta1, theta2]).T, paramNames=[r'$x$', r'$y$'],
figureSize=5, customLabelFont={'size':12}, customTickFont={'size':12}, customLegendFont={'size':16});