In this notebook, we will use Metropolis sampling to solve the inference problem that we previously struggled to carry out on a grid. Have a look back at that tutorial if you need a refresher on the general setup. You will:
* use `pygtc` to make a triangle plot

TutorialName = 'toy_metro'
exec(open('tbc.py').read()) # define TBC and TBC_above
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st
%matplotlib inline
from pygtc import plotGTC
import incredible as cr
We'll begin by reading in the mock data set you worked with in the grid-based tutorial, and storing it in an `Image` object, as we did before.
mock_image = np.loadtxt('data/toy_photometry.dat')
class Image:
def __init__(self, image):
self.im = image
self.imx,self.imy = np.meshgrid(range(image.shape[1]), range(image.shape[0]))
data = Image(mock_image)
plt.imshow(data.im, origin='lower');
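As a quick aside (a standalone toy example, not using the data above), here is what `np.meshgrid` produces for a tiny array, clarifying that `imx` holds each pixel's column index and `imy` its row index:

```python
import numpy as np

# a tiny 2x3 "image" just to show the coordinate arrays
im = np.zeros((2, 3))
imx, imy = np.meshgrid(range(im.shape[1]), range(im.shape[0]))

print(imx)  # column index of each pixel: [[0 1 2], [0 1 2]]
print(imy)  # row index of each pixel:    [[0 0 0], [1 1 1]]
```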
We'll still want to refer to the truth model used to generate the above data:
paramnames = ['x0', 'y0', 'mu0', 'sigma'] # the canonical parameter order, because I said so
truth = {'x0':17.1, 'y0':12.75, 'mu0':50.0, 'sigma':3.0}
This time we're going to do a fit using the Metropolis algorithm. Unlike the case of Gibbs sampling, this doesn't require (conversely: doesn't get to directly exploit) any nice mathematical properties of the posterior, so you can recycle the `log_prior` and `log_likelihood` functions from your grid-based analysis.
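If you need a reminder of the mechanics before filling these in, here is a toy sketch with made-up numbers (not the solution for this model's likelihood): if pixel counts were independent Poisson draws with known per-pixel means, the log-likelihood would be a sum of log-PMFs.

```python
import numpy as np
import scipy.stats as st

# hypothetical toy counts and model mean counts, same shape
toy_counts = np.array([3, 7, 2, 5])
toy_means = np.array([4.0, 6.0, 2.5, 5.5])

# independent pixels: total log-likelihood = sum of per-pixel log-PMFs
lnlike = np.sum(st.poisson.logpmf(toy_counts, toy_means))
print(lnlike)
```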
def log_prior(x0, y0, mu0, sigma):
TBC()
def mean_img(x0, y0, mu0, sigma):
'''
Return an array with the expected mean counts in each pixel for these model parameters
'''
TBC()
def log_likelihood(data, **params):
'''
`data` is an Image object, as defined above
'''
mu = mean_img(**params)
TBC()
TBC_above()
And here is the free log-posterior function again:
def log_posterior(data, **params):
lnp = log_prior(**params)
if np.isfinite(lnp):
lnp += log_likelihood(data, **params)
return lnp
As always, let's check that they return finite values and run without obvious bugs.
print( log_prior(**truth) )
print( log_likelihood(data, **truth) )
print( log_posterior(data, **truth) )
Next, we need a proposal distribution. I'll use a multivariate Gaussian centered on the current position. This is translationally invariant, so later on we can use the simple Metropolis acceptance rule instead of the slightly more complex Metropolis-Hastings rule. A Gaussian isn't necessarily the best choice in general, since the most likely proposals are very small steps, but it will do for the moment.
To further keep it simple, let's make the proposal independent in each parameter (a diagonal covariance matrix for the 4-dimensional Gaussian). Similarly to the grid method, you'll want to guess the appropriate order of magnitude for steps in each parameter, which is the same order as the width of the posterior, and you may need to return to this point to adjust them after seeing the performance.
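To see the acceptance rule in isolation before applying it to our model, here is a self-contained 1D sketch (targeting a standard normal, nothing to do with our image model) with a symmetric Gaussian proposal: accept the move if $\ln u < \ln p_\mathrm{trial} - \ln p_\mathrm{current}$ for a uniform random number $u$.

```python
import numpy as np

rng = np.random.default_rng(42)

def log_target(x):
    return -0.5 * x**2  # unnormalized standard normal

x, lnp = 0.0, log_target(0.0)
samples = []
for _ in range(20000):
    x_try = x + rng.normal(scale=1.0)          # symmetric proposal
    lnp_try = log_target(x_try)
    if np.log(rng.uniform()) < lnp_try - lnp:  # Metropolis rule
        x, lnp = x_try, lnp_try                # accept
    samples.append(x)                          # (rejected steps repeat x)

samples = np.asarray(samples)
print(samples.mean(), samples.std())  # should be near 0 and 1
```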
Since we're assuming a diagonal covariance, let's go ahead and just represent the proposal distribution as 4 univariate Gaussians, as in the dictionary below.
Aside: you may not have seen it before in this class, but calling `scipy.stats.norm()` as below produces what scipy calls a "frozen" probability distribution object, with fixed parameters. The standard deviation is whatever you specify via the `scale` argument, and the (unspecified) mean remains at the default value of 0.0. This means that each entry should be interpreted as a distribution for the displacement of a proposal from the current position of a chain. In other words, the proposal distribution density for a change to $x_0$ could be computed with `proposal_distribution['x0'].pdf(x0_proposed - x0_current)`. Other methods of `scipy` probability distributions, in particular random number generation, are also available; the main difference from how we've used them before is that we can't/don't need to specify parameter values in the call, because they've already been frozen in.
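For instance (a standalone illustration with an arbitrary scale, not one of our proposal widths):

```python
import scipy.stats as st

frozen = st.norm(scale=2.0)       # the mean stays at the default 0.0
print(frozen.pdf(0.0))            # density of a zero displacement (~0.1995)
print(frozen.rvs(random_state=1)) # draw a random displacement
```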
TBC()
# proposal_distribution = {'x0':st.norm(scale=...) ,
# 'y0':st.norm(scale=...),
# 'mu0':st.norm(scale=...),
# 'sigma':st.norm(scale=...)}
Next, define a function that returns a proposed point in parameter space, given the current location and the above dictionary of proposal distributions.
Technical note: You might be tempted to begin this function with a command like `proposal = current_params`. If so, remember that, in Python, `b = a` does not make a copy of `a` if `a` is a dictionary (or a `numpy` array, for that matter). Both `b` and `a` would point to the same data in memory. The safest/quickest way to get a new dictionary with the same structure as `a`, whose values can then be safely overwritten, is with `b = a.copy()`.
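A quick demonstration of the aliasing pitfall described above (with made-up values):

```python
a = {'x0': 1.0, 'y0': 2.0}

b = a            # NOT a copy: b and a reference the same dict
b['x0'] = 99.0
print(a['x0'])   # 99.0 -- the "original" changed too

a = {'x0': 1.0, 'y0': 2.0}
c = a.copy()     # a new dict whose values can be safely overwritten
c['x0'] = 99.0
print(a['x0'])   # 1.0 -- a is untouched
```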
def propose(current_params, dists):
"""
current_params: dictionary holding current position in parameter space
dists: dictionary of proposal distributions
Return value: a new dictionary holding the proposed destination in parameter space
"""
TBC()
TBC_above()
See if it works:
print('Test starting position:', truth)
print('Test proposal:', propose(truth, proposal_distribution))
Finally, the sampler itself. Write a function that takes the current parameter values and log-posterior value as input (along with the data and proposal distributions), and returns the next set of parameters values and corresponding log-posterior as a tuple. These can be identical to the inputs, if the proposal is rejected.
def step(data, current_params, current_lnP, proposal_dists):
"""
data: Image object
current_params: dictionary of parameter values
current_lnP: log-posterior density corresponding to current_params
proposal_dists: dictionary of proposal distributions
Return value: a tuple holding the next parameter dictionary and corresponding log-posterior density
"""
TBC()
# trial_params = ...
# trial_lnP = ...
# if [accept/reject condition]:
# return (trial_params, trial_lnP)
# else:
# return (current_params, current_lnP)
TBC_above()
And, again, make sure it works without crashing:
# try it several times to get a mixture of acceptances and rejections, hopefully
for i in range(10):
print(step(data, truth, log_posterior(data, **truth), proposal_distribution))
There's fundamentally no difference between fitting the 2-parameter and 4-parameter models with this method. But, for symmetry with the other notebooks, let's start with the 2-parameter fit (with $x_0$ and $y_0$ fixed to the truth) anyway.
Given the work you've done above, a quick and dirty way to accomplish this is to use a delta-function proposal distribution for $x_0$ and $y_0$, ensuring that they never change in value. That way we don't need to change any of the existing code to explicitly account for some parameters being fixed.
proposal_distribution2 = proposal_distribution.copy()
proposal_distribution2['x0'] = st.norm(scale=0.0)
proposal_distribution2['y0'] = st.norm(scale=0.0)
Now let's run a chain, starting the free parameters, $\mu_0$ and $\sigma$, from random but broadly reasonable values.
params = truth.copy() # to get x0 and y0
params['mu0'] = st.uniform.rvs()*100.0
params['sigma'] = st.uniform.rvs()*9.9 + 0.1
Make sure you understand how the cell below is making use of the functions you defined above.
%%time
nsamples = 10000
samples2 = np.zeros((nsamples, 2))
current_lnP = log_posterior(data, **params)
for i in range(samples2.shape[0]):
params,current_lnP = step(data, params, current_lnP, proposal_distribution2)
samples2[i,:] = [params[k] for k in ['mu0', 'sigma']]
Let's do the most basic (yet still extremely important) visual check to see how our sampler performed, looking at traces of the Markov chain for each parameter. (It's ok if you haven't read the notes on MCMC Diagnostics yet; we will go more in-depth later.) These trace plots show the value of each parameter as a function of iteration, and we'll add a line showing the value that was used to create the mock data.
param_labels = [r'$x_0$', r'$y_0$', r'$\mu_0$', r'$\sigma$']
plt.rcParams['figure.figsize'] = (16.0, 6.0)
fig, ax = plt.subplots(2,1);
cr.plot_traces(samples2, ax, labels=[r'$\mu_0$', r'$\sigma$'], truths=[truth['mu0'], truth['sigma']])
Exactly what you see here (in particular, how correlated each sequence appears and how many repeated values there are) will depend on the width of your proposal distributions. But, hopefully, you can see the sequence finding its way to a part of parameter space that it likes, and then continuing to jump around there. If you want to, you can go back and adjust the proposal scales to be smaller (if there are many rejected proposals) or larger (if the accepted steps are tiny compared with the long-term variation).
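If you want a number rather than eyeballing the traces, one quick way to estimate the acceptance fraction from a stored chain (a sketch that relies on rejected proposals repeating the previous row exactly, which holds for continuous parameters) is to count how often consecutive samples differ:

```python
import numpy as np

def acceptance_fraction(chain):
    # fraction of steps where at least one parameter changed
    return np.mean(np.any(np.diff(chain, axis=0) != 0.0, axis=1))

# e.g. acceptance_fraction(samples2)
```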
We can use `plotGTC` to quickly visualize the posterior. This package shows us all the 1D marginalized posteriors and every pair of 2D marginalized posteriors (as contour plots), after some smoothing, in a triangular grid. However, before doing so, we want to remove the "burn-in" period during which the chain is finding its way to the region of high posterior probability. I set this to be the initial 1000 steps below, but you might need to adjust it for your results (though if your burn-in phase is longer than this, it might be worth re-running with a smaller proposal scale).
Dashed lines show the truth values the data were simulated from.
burnin = 1000 # adjust if needed
samples2 = samples2[burnin:,:]
plotGTC(samples2, paramNames=[r'$\mu_0$', r'$\sigma$'], truths=[truth['mu0'], truth['sigma']],
figureSize=5, customLabelFont={'size':12}, customTickFont={'size':12});
Checkpoint: The cell below will compare your samples with some we have generated. They won't be identical, but should be extremely close if you've used the priors and data specified above.
ours = np.loadtxt('solutions/metro.dat.gz')
plotGTC([samples2, ours], paramNames=[r'$\mu_0$', r'$\sigma$'], chainLabels=['yours', 'ours'],
figureSize=5, customLabelFont={'size':12}, customTickFont={'size':12}, customLegendFont={'size':16});
In order to compare to the corresponding grid analysis, let's compute credible intervals with the same code:
plt.rcParams['figure.figsize'] = (12.0, 4.0)
fig, ax = plt.subplots(1,2)
marg1d = [cr.whist(samples2[:,i]) for i in range(samples2.shape[1])]
ci1d = [cr.whist_ci(marg, plot=axes) for marg,axes in zip(marg1d,ax)]
for i in range(2):
ax[i].set_xlabel(param_labels[i+2])
ax[i].axvline(truth[paramnames[i+2]], color='C1', label='truth')
ax[0].legend();
ci1d
Compare the `center` and `width` summaries of each parameter with what you got from the grid analysis. Do you have the same results, to the precision that one would normally report?
TBC() # answer in Markdown
Let's now fit for the source position also ($x_0$ and $y_0$). All we really need to change compared with the last section is to use the original proposal distribution dictionary, which contained non-delta-function proposals for all the parameters. We'll again start at a random but broadly reasonable point in parameter space.
params = {'mu0':st.uniform.rvs()*100.0,
'sigma':st.uniform.rvs()*9.9 + 0.1,
'x0':st.uniform.rvs()*32.0,
'y0':st.uniform.rvs()*32.0}
%%time
nsamples = 10000
samples4 = np.zeros((nsamples, 4))
current_lnP = log_posterior(data, **params)
for i in range(samples4.shape[0]):
params,current_lnP = step(data, params, current_lnP, proposal_distribution)
samples4[i,:] = [params[k] for k in paramnames]
Let's have a look at the traces:
plt.rcParams['figure.figsize'] = (16.0, 12.0)
fig, ax = plt.subplots(4,1);
cr.plot_traces(samples4, ax, labels=param_labels, truths=[truth[k] for k in paramnames])
Again, remove the clear burn-in phase and plot the triangle:
burnin = 1000 # adjust if needed
samples4 = samples4[burnin:,:]
plotGTC(samples4, paramNames=param_labels, truths=[truth[k] for k in paramnames],
figureSize=8, customLabelFont={'size':12}, customTickFont={'size':12});
Checkpoint: Once again, you can compare these to our own solutions; the posteriors should look very similar.
ours = np.loadtxt('solutions/metro4.dat.gz')
plotGTC([samples4, ours], paramNames=param_labels, chainLabels=['yours', 'ours'],
figureSize=8, customLabelFont={'size':12}, customTickFont={'size':12}, customLegendFont={'size':16});
Let's find the credible intervals, for completeness. We won't bother with the goodness of fit, since the grid analysis did address that.
plt.rcParams['figure.figsize'] = (20.0, 4.0)
fig, ax = plt.subplots(1,len(paramnames))
marg1d = [cr.whist(samples4[:,i]) for i in range(samples4.shape[1])]
ci1d = [cr.whist_ci(marg, plot=axes) for marg,axes in zip(marg1d,ax)]
for i in range(len(paramnames)):
ax[i].set_xlabel(param_labels[i])
ax[i].axvline(truth[paramnames[i]], color='C1', label='truth')
ax[0].legend();
Any qualitative observations on how the marginalized posteriors compare with their counterparts from the grid analysis?
TBC() # answer in Markdown
ci1d # for posterity
For later notebooks, we'll want to see how multiple, independent chains with different starting points behave when using this method. The cell below will take care of running 4 chains, started at different positions.
%%time
chains = [np.zeros((10000,4)) for j in range(4)]
for samples in chains:
params = {'mu0':st.uniform.rvs()*100.0,
'sigma':st.uniform.rvs()*9.9 + 0.1,
'x0':st.uniform.rvs()*32.0,
'y0':st.uniform.rvs()*32.0}
current_lnP = log_posterior(data, **params)
for i in range(samples.shape[0]):
params,current_lnP = step(data, params, current_lnP, proposal_distribution)
samples[i,:] = [params[k] for k in paramnames]
Now we can look at a more colorful version of the trace plots, showing all of the chains simultaneously:
plt.rcParams['figure.figsize'] = (16.0, 12.0)
fig, ax = plt.subplots(len(param_labels), 1);
cr.plot_traces(chains, ax, labels=param_labels, Line2D_kwargs={'markersize':1.0},
truths=[truth[k] for k in paramnames], truth_kwargs={'color':'k', 'linestyle':'--'})
We'll save them for later, without removing burn-in.
for i,samples in enumerate(chains):
np.savetxt('saved/toy_metro_chain_'+str(i)+'.txt.gz', samples, header=' '.join(paramnames))
There you have it - at the cost of some random walking, you have now fit a model with enough free parameters to make a grid-based solution uncomfortably slow. Metropolis and Metropolis-Hastings are extremely widely applicable and therefore powerful, though one needs to think about how to provide an efficient proposal distribution. We'll see the consequences of not doing so in MCMC Diagnostics and some intelligent proposal strategies in More Sampling Methods.
If you have already done the Gibbs Sampling tutorial, comment on the differences you see between the chains that the two methods produce.
TBC() # answer in Markdown