Here we will make use of the data introduced in the XMM Image Intro notebook. Many of our early examples of generative models, as well as the Bayes on a Grid and Gibbs/Metropolis tutorials, were based around the task of performing photometry on image data, specifically with images sparse enough that the number of photon detections per pixel is typically small. This is exactly the regime we are in with the XMM data. Specifically, you will
It might sound like there's not much new here, and that's somewhat true. This tutorial is more about putting into practice what you've learned with less scaffolding than we normally provide. It's also the case that this data set is not entirely trivial to wrestle with (and that's with us already cutting some corners, as usual). Plus, we thought it was silly to have a course in statistical methods in astrophysics that didn't model and fit an image of the sky sooner or later.
TutorialName = 'xmm2'
exec(open('tbc.py').read()) # define TBC and TBC_above
import astropy.io.fits as pyfits
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from astropy.visualization import LogStretch
logstretch = LogStretch()
import scipy.stats as st
from scipy.optimize import minimize
The class below is from the XMM Image Intro notebook, with one addition: a mask
array which holds a True
or False
for each pixel, telling us whether we should pay attention to that pixel in the likelihood. This allows us to, by default, ignore the pixels where the exposure map is zero. We'll also use it to ignore most point-like sources.
class Image:
def __init__(self, imdata, exdata, imx=None, imy=None):
self.im = imdata
self.ex = exdata
self.mask = (self.ex > 0.0)
if imx is None or imy is None:
# Note the swapped order of shape[1] and shape[0]. This is the only time we should need to remember this!
self.imx, self.imy = np.meshgrid(np.arange(imdata.shape[1]), np.arange(imdata.shape[0]))
else:
self.imx = imx
self.imy = imy
def cutout(self, x0, x1, y0, y1):
# return the subset of self bounded by pixel indices x0, x1, y0 and y1 (inclusive)
return Image(self.im[y0:y1+1,x0:x1+1], self.ex[y0:y1+1,x0:x1+1],
self.imx[y0:y1+1,x0:x1+1], self.imy[y0:y1+1,x0:x1+1])
def extent(self):
return [np.min(self.imx), np.max(self.imx), np.min(self.imy), np.max(self.imy)]
def display(self, log_image=True):
fig, ax = plt.subplots(1,2);
extent = self.extent()
if log_image:
ax[0].imshow(logstretch(self.im*self.mask), cmap='gray', origin='lower', extent=extent);
ax[0].set_title('image (log scale)');
else:
ax[0].imshow(self.im*self.mask, cmap='gray', origin='lower', extent=extent);
ax[0].set_title('image');
ax[1].imshow(self.ex*self.mask, cmap='gray', origin='lower', extent=extent);
ax[1].set_title('exposure map');
As before, we'll read in and display the data...
datadir = 'data/'
imagefile = datadir + 'P0098010101M2U009IMAGE_3000.FTZ'
expmapfile = datadir + 'P0098010101M2U009EXPMAP3000.FTZ'
imfits = pyfits.open(imagefile)
exfits = pyfits.open(expmapfile)
im = imfits[0].data
ex = exfits[0].data
orig = Image(im, ex)
plt.rcParams['figure.figsize'] = (10.0, 10.0)
orig.display()
... and crop the image slightly to better isolate the actual data.
data = orig.cutout(100, 580, 100, 580)
data.display()
As discussed in the Data Intro, the image contains a cluster of galaxies and a number of point-like Active Galactic Nuclei (AGN), most of which are relatively faint and may not be obvious in the images above. In some previous year, we made a by-eye list of pixel $x,y$ positions and radii (also in pixels) for the AGN, where the radii are supposed to encompass the regions where the AGN brightness clearly exceeds the uniform background. As you might guess, there are more sophisticated ways of doing this, but this works pretty well as an approach to removing bright point-like sources (provided you're willing to spend the human time making such a list).
pts = np.loadtxt(datadir+'P0098010101M2U009_ptsrc.txt')
pts
As we'll discuss more in the Model section below, we'll want to mask (set data.mask[i,j]=False
) all of these AGN except for the brightest one, which not coincidentally has the largest "radius" in the table above. Do so.
TBC() # set data.mask=False within all of the AGN masks but the one with the largest radius
Let's check that this did what we want by re-displaying the data; pixels where data.mask==False
are set to zero in the displayed images.
data.display()
You might be wondering why we're choosing to leave one AGN unmasked here. The idea is that, since we know the AGN is tony compared with the telescope's PSF, it could provide us with a way to measure the size of the PSF from the image itself. In real life we would never do this - XMM is a space telescope that doesn't look through the atmosphere, so its PSF is plenty stable and can be well determined from observations of even brighter AGN. On the other hand, there are times when an AGN is projected onto an extended source of interest, and we want to model them both rather than throwing away a chunk of the target. So the problem we'll be solving is, at least, illustrative.
To recap, our model needs to explain
We'll address each of these in turn.
We'll use a common parametric model for the surface brightness (counts per unit time per pixel) of galaxy clusters: the azimuthally symmetric beta model,
$S_\mathrm{CL}(r) = S_0 \left[1.0 + \left(\frac{r}{r_c}\right)^2\right]^{-3\beta + 1/2}$,
where $r$ is projected distance from the cluster center. Note that this model describes a 2D surface brightness distribution, since $r^2 = (\Delta x)^2 + (\Delta y)^2$. Also note that the pixel spacingis a perfectly respectable units of projected distance for our purposes.
To reduce parameter degeneracies to a manageable level (remember how we like to do that?), we generally make the log of $S_0$, rather than $S_0$ itself, a parameter. In that case, to be explicit, the parameters of this part of the model are:
To get a little intuition for what this function looks like, write a function that evaluates $S(r; S_0, r_c, \beta)$ and make a (log-log) plot of it for a few choices of parameter values, showing the limiting behavior for $r\ll r_c$ and $r\gg r_c$.
TBC() # function definition and plot
The unmasked AGN in our image is point-like (really, it is), so we can describe it using just
Of course, the AGN doesn't appear point like because it's been smeared by the PSF, which we'll get to below. If we wanted to define a surface brightness for the AGN, it would involve a Dirac delta function, as in
$S_\mathrm{AGN}(x,y) = F_a \delta(x-x_a) \delta(y-y_a)$.
As described in the Data Intro, there are several contributors to the background, some of which are vignetted by the telescope optics and some of which are not. Breaking this down would be too much for us to deal with from just this data one data set. Our expectation (not actually verified here) is that the particle-induced background, which is roughly uniform and unvignetted, is generally dominant by far, so we will model only that component. This gives us a single parameter,
Note that there is no "per second" in this case because (see below) an unvignetted model will not be multiplied by the exposure map. (This is purely a conventional thing. We could certainly parametrize the background as the count rate per pixel, using the actual exposure time of the observation. But, given that the data products mix the exposure time and vignetting correction into a single map, the approach above is simpler.)
The models for the cluster and AGN above describe the X-rays that arrive at the telescope aperture. Two subsequent effects need to be accounted for before we can make a prediction for the data, which is the number of counts in each image pixel. First, the PSF smears out the image, and second the exposure map simultaneously accounts for the vignetting (reducing the apparent brightness of everything far away from the center of the field of view) and the exposure time (converting count rates to counts). As defined above, the expectation for the background is uniform, and needs neither of these corrections (it's already a prediction for the counts in each pixel).
Putting this all together, the mean counts/pixel predicted by the model can be written
$\mu(x,y) = b + T(x,y) \cdot \mathrm{PSF} \otimes \left[ S_\mathrm{AGN} + S_\mathrm{CL} \right]$,
where $T(x,y)$ is the exposure map and $\otimes$ indicates a convolution.
The exposure map is given to us, and we don't have a basis for assigning any uncertainty, so we'll take it to be fixed. The PSF we'll model as a symmetric 2D Gaussian; this is not entirely accurate, but will do for our purposes. This gives us yet another parameter,
Over to you: draw the PGM and enumerate the corresponding probabilistic relationships for the model.
TBC() # answer in Markdown
For simplicity and reproducability, we'll use wide, uniform priors on each of the parameters defined above, with the stipulation that the cluster center and AGN must be located within the image held by data
. In addition there are the usual mathematical/physical bounds to do with parameters encoding distances being non-negative, and our understandable desire to not divide by zero. Note that there is also a physical bound on $\beta$ from the requirement that the total flux of the cluster be finite. For completeness, list the prior you will use for each parameter here.
TBC() # answer in Markdown
We're not going to dictate what algorithm you use to find the posterior distribution. Before doing so, however, we'd like you to find an approximate posterior mode using scipy.optimize.minimize
. Experience has shown that this is an exceptionally good idea for this problem, rather than trying to start a standard MCMC chain from a guessed initial position.
For concreteness, write a function called cash
that returns $-2\ln(\mathrm{posterior})$ which you can minimize to find the posterior mode. (For the sampling distribution relevant to this problem, an author named Cash was responsible for pointing out the appropriate function form, and the fact that it isn't $\chi^2$.) cash
should take a parameter array as it's argument, since that's what minimize
expects to work with.
For most MCMC samplers you might use, writing a log-posterior function that cash
simply calls would be most convenient. But that's up to you.
Your implementation should also include a separate function, modelImage
, that evaluates $\mu(x,y)$ as given above. Being able to visualize the model prediction will make debugging much easier, trust us. To enable code in the notebook below to work, the argument list for this function should be the list of parameters in the order we define below (the actual argument names can vary):
param_names = ['b', 'x0', 'y0', 'lnS0', 'rc', 'beta', 'rpsf', 'xa', 'ya', 'lnFa']
param_labels = [r'$b$', r'$x_0$', r'$y_0$', r'$\ln S_0$', r'$r_c$', r'$\beta$', r'$R_\mathrm{psf}$',
r'$x_a$', r'$y_a$', r'$\ln F_a$']
One last, very important thing. This is going to be (for this class) a relatively expensive posterior to evaluate, at least if coded naively. Therefore, please do not actually perform the PSF convolution of the cluster model (and recall that the convolution is analytic for a point-like source). This is purely to save us time. However, in this particular case the cluster happens to be much more extended than the PSF, to it won't have a huge impact on the model we fit. For a much less massive or more distant cluster, this would be a truly terrible corner to cut.
Off you go then.
def modelImage(b, x0, y0, lnS0, rc, beta, rpsf, xa, ya, lnFa):
# return an image containing the predicted expected counts in each pixel, mu(x,y)
TBC()
def cash(params):
# return -2*log(posterior)
# `params' is an array of parameters in canonical order (given by `param_names', above)
# accessing `data' from global scope rather than through an argument is highly recommended here - it's bad
# coding style but better coding efficiency
TBC()
TBC_above()
Next, we'll want initial values for each parameter. This problem turns out to be pretty sensitive to the initial guess - not only will Markov chains converge very slowly if we start them in a silly place (which is one reason we're finding the posterior mode first), but even the optimization will struggle if the initial guess is too unreasonable. So it's worth taking some time to do this right. Here are my suggestions:
If you have a tool like SAOImage DS9, some of those suggestions will be much easier to check out, but they can also all be done here in the notebook. Either way, fill in the guess dictionary below, including your reasoning in comments if it differed from above:
TBC()
# guess = {'b':...,
# 'x0':...,
# 'y0':...,
# 'lnS0':...,
# 'rc':...,
# 'beta':...,
# 'rpsf':...,
# 'xa':...,
# 'ya':...,
# 'lnFa':...,
# }
guessvec = [guess[k] for k in param_names]
Now we can have a look at the model image produced by this guess (and revise the guess if need be):
plt.rcParams['figure.figsize'] = (5.0, 5.0)
plt.imshow(logstretch(modelImage(*guessvec)), origin='lower', extent=data.extent());
And we can make sure that the cash
function returns something:
cash(guessvec)
Before turning things over to scipy.optimize.minimize
, we need to be able to tell it the allowed range for each parameter. This is done through a list (in parameter order) of lower and upper bounds, with None
standing for $\pm\infty$. Define this, consistent with your priors above, here.
TBC()
# bounds = [(...,...), # b
# (...,...), # x0
# (...,...), # y0
# (...,...), # lnS0
# (...,...), # rc
# (...,...), # beta
# (...,...), # rpsf
# (...,...), # xa
# (...,...), # ya
# (...,...), # lnFa
# ]
Finally, let's run the minimizer (expect this to take 30-60 seconds, depending on your guess) ...
%%time
bestfit = minimize(cash, guessvec, bounds=bounds)
bestfit
... and look at the result a little more naturally:
bestvec = bestfit.x
best = {k:v for k,v in zip(param_names,bestvec)}
best
Checkpoint: it should be possible for the minimizer to run successfully, i.e. without giving up and/or throwing an error.
Let's see if this set of parameters looks any different by eye:
plt.rcParams['figure.figsize'] = (5.0, 5.0)
plt.imshow(logstretch(modelImage(*bestvec)), origin='lower', extent=data.extent());
Next, you'll produce samples from the posterior, using your choice of method. Be warned that this posterior function (at least as I implemented it) is relatively slow. If you are using an MCMC method, do make sure you've converged, but after that it isn't necessary to run super-long to get lots of minimally correlated samples. (For a real research project we would either bite the bullet and run for a long time, or, more likely, do a better job coding.)
In your notebook, we expect to see:
If you use nested sampling or some other non-MCMC method, substitute the equivalent diagnostics and plots.
TBC()
Checkpoint: As we've done before, we can quickly compare your posterior samples to our own solutions. If your samples are not equally weighted, you will need to add the weights
argument to plotGTC
below.
solution = np.loadtxt('solutions/xray.dat.gz')
plotGTC([samples, solution], paramNames=param_labels, chainLabels=['yours', 'ours'],
figureSize=16, customLabelFont={'size':12}, customTickFont={'size':12}, customLegendFont={'size':16});
As ever, we're not done until we've done at least a simple check that the fitted model is consistent with generating the data. As in our toy photometry problem from earlier, we can do a quick frequentist goodness-of-fit test of the best fit (whose parameters we saved in best
and bestvec
) by comparing the Cash statistic of the best fit with its theoretical expectation if the model is correct. You can refer to that notebook for the basic outline. Here, go a little further by computing the p-value associated with $C$, i.e. the probability of getting a $C$ for the best fit that is at least as large as what we did get.
Note that the theoretical expectations computed by the cashstatistic
package are for a particular form of the Cash statistic which may or may not differ from your cash
function by a constant, so you'll want to also use the package's function to compute $C$ from the best fit for the comparison (or at least check that your function is consistent).
Below, compute and print
TBC()
Checkpoint: I get a pee value of ~0.03. So-so.
To get a feel for specifically which part of the model might be less than ideal, make a visualization that compares $S(r)$, azimuthally averaged brightness as a function of radius about the best-fit cluster center, from the data and a number of posterior samples. Looking at the image, we would expect to see a steady drop from the cluster center outwards, with a small bump due to the AGN, eventually approaching a constant (if the background is indeed uniform). The AGN itself will be washed out by averaging it with all the other azimuths at the same radius, so a second plot centered on the AGN position, out to smaller radial distance, might be a good idea.
TBC()
Comment on what you see in light of the Cash statistic test above.
TBC() # answer in Markdown
Here we are, finally having done some analysis of an astronomical image. Specifically, you've now performed X-ray photometry on a point-like source and an extended source. Admittedly, we cut a few corners, but this is still a more complete job than a surprising amount of the literature does. Hopefully you also learned a few things about juggling and fitting models to real data.