Tutorial: X-ray Image Modeling

Here we will make use of the data introduced in the XMM Image Intro notebook. Many of our early examples of generative models, as well as the Bayes on a Grid and Gibbs/Metropolis tutorials, were based around the task of performing photometry on image data, specifically with images sparse enough that the number of photon detections per pixel is typically small. This is exactly the regime we are in with the XMM data. Specifically, you will

  1. specify a generative model for an X-ray image containing a galaxy cluster, a point-like source, and a uniform background;
  2. find the posterior mode;
  3. obtain samples from the posterior distribution;
  4. check the goodness of fit.

It might sound like there's not much new here, and that's somewhat true. This tutorial is more about putting into practice what you've learned with less scaffolding than we normally provide. It's also the case that this data set is not entirely trivial to wrestle with (and that's with us already cutting some corners, as usual). Plus, we thought it was silly to have a course in statistical methods in astrophysics that didn't model and fit an image of the sky sooner or later.

Data

The class below is from the XMM Image Intro notebook, with one addition: a mask array which holds a True or False for each pixel, telling us whether we should pay attention to that pixel in the likelihood. This allows us to, by default, ignore the pixels where the exposure map is zero. We'll also use it to ignore most point-like sources.
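For orientation, here is a minimal sketch of what such a container might look like. The attribute names (im, ex, imx, imy, mask) and the use of astropy are assumptions for the purposes of this sketch; the actual class in the XMM Image Intro notebook differs in detail.

```python
import numpy as np
from astropy.io import fits

class Image:
    """Minimal sketch of an image container with a pixel mask.
    Attribute names are assumptions; the class in the XMM Image
    Intro notebook differs in detail."""
    def __init__(self, imgfile, expfile):
        self.im = fits.getdata(imgfile).astype(float)  # counts per pixel
        self.ex = fits.getdata(expfile).astype(float)  # exposure map (s)
        # mask: by default, ignore pixels with zero exposure
        self.mask = self.ex > 0.0
        # pixel coordinate grids, handy for radius calculations later
        ny, nx = self.im.shape
        self.imx, self.imy = np.meshgrid(np.arange(nx), np.arange(ny))
```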

As before, we'll read in and display the data...

... and crop the image slightly to better isolate the actual data.
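In code, the display and crop steps might look something like the following sketch; the file names, log stretch, and crop range are all placeholders, not the values used in the notebook.

```python
import numpy as np
import matplotlib.pyplot as plt

# hypothetical usage: file names are placeholders
data = Image('image.fits', 'expmap.fits')

fig, ax = plt.subplots(1, 2, figsize=(10.0, 5.0))
ax[0].imshow(np.log10(data.im + 1.0), origin='lower')  # log stretch
ax[1].imshow(data.ex, origin='lower')

# crop all of the arrays consistently (placeholder range)
i0, i1, j0, j1 = 100, 500, 100, 500
for attr in ('im', 'ex', 'mask', 'imx', 'imy'):
    setattr(data, attr, getattr(data, attr)[i0:i1, j0:j1])
```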

As discussed in the Data Intro, the image contains a cluster of galaxies and a number of point-like Active Galactic Nuclei (AGN), most of which are relatively faint and may not be obvious in the images above. In some previous year, we made a by-eye list of pixel $x,y$ positions and radii (also in pixels) for the AGN, where the radii are supposed to encompass the regions where the AGN brightness clearly exceeds the uniform background. As you might guess, there are more sophisticated ways of doing this, but this works pretty well as an approach to removing bright point-like sources (provided you're willing to spend the human time making such a list).

As we'll discuss more in the Model section below, we'll want to mask (set data.mask[i,j]=False) all of these AGN except for the brightest one, which not coincidentally has the largest "radius" in the table above. Do so.
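One way to do so, assuming the by-eye list is stored as rows of $(x, y, \mathrm{radius})$ in an array called `agn`, with `ibright` indexing the brightest AGN (both names are hypothetical):

```python
# agn: rows of (x, y, radius) in pixel units; ibright indexes the
# brightest AGN (both are hypothetical names for the table above)
for k, (x, y, r) in enumerate(agn):
    if k == ibright:
        continue  # leave the brightest AGN unmasked
    inside = (data.imx - x)**2 + (data.imy - y)**2 <= r**2
    data.mask[inside] = False
```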

Let's check that this did what we want by re-displaying the data; pixels where data.mask==False are set to zero in the displayed images.

You might be wondering why we're choosing to leave one AGN unmasked here. The idea is that, since we know the AGN is tiny compared with the telescope's PSF, it could provide us with a way to measure the size of the PSF from the image itself. In real life we would never do this - XMM is a space telescope that doesn't look through the atmosphere, so its PSF is plenty stable and can be well determined from observations of even brighter AGN. On the other hand, there are times when an AGN is projected onto an extended source of interest, and we want to model them both rather than throwing away a chunk of the target. So the problem we'll be solving is, at least, illustrative.

Model specification

To recap, our model needs to explain

  1. Galaxy cluster Abell 1835 (the big blob in the center).
  2. Various other sources (smaller blobs). These are point-like sources - mostly active galactic nuclei (AGN) - that have been smeared out by the telescope's point spread function (PSF).
  3. A roughly uniform background, consisting of unresolved AGN, diffuse X-rays from the Galactic halo and local hot bubble, and events due to particles (solar wind protons and cosmic rays) interacting with the CCD.

We'll address each of these in turn.

Cluster model

We'll use a common parametric model for the surface brightness (counts per unit time per pixel) of galaxy clusters: the azimuthally symmetric beta model,

$S_\mathrm{CL}(r) = S_0 \left[1 + \left(\frac{r}{r_c}\right)^2\right]^{-3\beta + 1/2}$,

where $r$ is projected distance from the cluster center. Note that this model describes a 2D surface brightness distribution, since $r^2 = (\Delta x)^2 + (\Delta y)^2$. Also note that the pixel spacing is a perfectly respectable unit of projected distance for our purposes.

To reduce parameter degeneracies to a manageable level (remember how we like to do that?), we generally make the log of $S_0$, rather than $S_0$ itself, a parameter. In that case, to be explicit, the parameters of this part of the model are:

  1. $x_0$, the $x$ coordinate of the cluster center;
  2. $y_0$, the $y$ coordinate of the cluster center;
  3. $\ln(S_0)$, log of the normalization in surface brightness units (counts/s/pixel);
  4. $r_c$, a radial scale (called the "core radius");
  5. $\beta$, which determines the slope of the profile (typical value is 2/3, and values from 1/2 to 1 are common enough).

To get a little intuition for what this function looks like, write a function that evaluates $S(r; S_0, r_c, \beta)$ and make a (log-log) plot of it for a few choices of parameter values, showing the limiting behavior for $r\ll r_c$ and $r\gg r_c$.
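A minimal sketch (the parameter values below are arbitrary choices for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

def betaModel(r, S0, rc, beta):
    """Azimuthally symmetric beta model for cluster surface brightness."""
    return S0 * (1.0 + (r / rc)**2)**(-3.0 * beta + 0.5)

r = np.logspace(-2.0, 2.0, 200)  # in units of r_c, since rc=1 below
for beta in [0.5, 2.0 / 3.0, 1.0]:
    plt.loglog(r, betaModel(r, 1.0, 1.0, beta), label=r'$\beta=%.2f$' % beta)
# for r << r_c the profile is flat; for r >> r_c it falls as r^(1-6*beta)
plt.xlabel(r'$r/r_c$')
plt.ylabel(r'$S(r)/S_0$')
plt.legend();
```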

AGN model

The unmasked AGN in our image is point-like (really, it is), so we can describe it using just

  1. $x_a$, the $x$ coordinate of the AGN;
  2. $y_a$, the $y$ coordinate of the AGN;
  3. $\ln(F_a)$, the log of the count rate (not really a flux, but we'll use the symbol $F$ anyway) of the AGN. As with the cluster model, parametrizing the log allows us to more easily explore over orders of magnitude in flux.

Of course, the AGN doesn't appear point-like because it's been smeared by the PSF, which we'll get to below. If we wanted to define a surface brightness for the AGN, it would involve a Dirac delta function, as in

$S_\mathrm{AGN}(x,y) = F_a \delta(x-x_a) \delta(y-y_a)$.

Background model

As described in the Data Intro, there are several contributors to the background, some of which are vignetted by the telescope optics and some of which are not. Breaking this down would be too much for us to deal with from just this one data set. Our expectation (not actually verified here) is that the particle-induced background, which is roughly uniform and unvignetted, is by far the dominant component, so we will model only that component. This gives us a single parameter,

  1. $b$, the average background counts per pixel.

Note that there is no "per second" in this case because (see below) an unvignetted model will not be multiplied by the exposure map. (This is purely a conventional thing. We could certainly parametrize the background as the count rate per pixel, using the actual exposure time of the observation. But, given that the data products mix the exposure time and vignetting correction into a single map, the approach above is simpler.)

From a model of the sky to predicted counts

The models for the cluster and AGN above describe the X-rays that arrive at the telescope aperture. Two subsequent effects need to be accounted for before we can make a prediction for the data, which is the number of counts in each image pixel. First, the PSF smears out the image; second, the exposure map simultaneously accounts for the vignetting (reducing the apparent brightness of everything far away from the center of the field of view) and the exposure time (converting count rates to counts). As defined above, the expectation for the background is uniform, and needs neither of these corrections (it's already a prediction for the counts in each pixel).

Putting this all together, the mean counts/pixel predicted by the model can be written

$\mu(x,y) = b + T(x,y) \cdot \mathrm{PSF} \otimes \left[ S_\mathrm{AGN} + S_\mathrm{CL} \right]$,

where $T(x,y)$ is the exposure map and $\otimes$ indicates a convolution.

The exposure map is given to us, and we don't have a basis for assigning any uncertainty, so we'll take it to be fixed. The PSF we'll model as a symmetric 2D Gaussian; this is not entirely accurate, but will do for our purposes. This gives us yet another parameter,

  1. $R_\mathrm{PSF}$, the standard deviation of the PSF in pixel units.

Translation to a generative model

Over to you: draw the PGM and enumerate the corresponding probabilistic relationships for the model.

Priors

For simplicity and reproducibility, we'll use wide, uniform priors on each of the parameters defined above, with the stipulation that the cluster center and AGN must be located within the image held by data. In addition there are the usual mathematical/physical bounds to do with parameters encoding distances being non-negative, and our understandable desire to not divide by zero. Note that there is also a physical bound on $\beta$ from the requirement that the total flux of the cluster be finite: since $S_\mathrm{CL} \propto r^{1-6\beta}$ at large radii, the integrated flux converges only for $\beta > 1/2$. For completeness, list the prior you will use for each parameter here.

Finding the posterior mode

We're not going to dictate what algorithm you use to find the posterior distribution. Before doing that, however, we'd like you to find an approximate posterior mode using scipy.optimize.minimize. Experience has shown this to be an exceptionally good idea for this problem, compared with starting a standard MCMC chain from a guessed initial position.

For concreteness, write a function called cash that returns $-2\ln(\mathrm{posterior})$ which you can minimize to find the posterior mode. (For the sampling distribution relevant to this problem, an author named Cash was responsible for pointing out the appropriate functional form, and the fact that it isn't $\chi^2$.) cash should take a parameter array as its argument, since that's what minimize expects to work with.

For most MCMC samplers you might use, writing a log-posterior function that cash simply calls would be most convenient. But that's up to you.

Your implementation should also include a separate function, modelImage, that evaluates $\mu(x,y)$ as given above. Being able to visualize the model prediction will make debugging much easier, trust us. To enable code in the notebook below to work, the argument list for this function should be the list of parameters in the order we define below (the actual argument names can vary):

One last, very important thing. This is going to be (for this class) a relatively expensive posterior to evaluate, at least if coded naively. Therefore, please do not actually perform the PSF convolution of the cluster model (and recall that the convolution is analytic for a point-like source). This is purely to save us time. However, in this particular case the cluster happens to be much more extended than the PSF, so it won't have a huge impact on the model we fit. For a much less massive or more distant cluster, this would be a truly terrible corner to cut.

Off you go then.
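In case it helps to see the pieces assembled, here is one possible sketch. The parameter ordering, the use of the global `data`, and the attribute names all follow the hypothetical conventions of the sketches above, so adjust everything to match your own code.

```python
import numpy as np

def modelImage(x0, y0, lnS0, rc, beta, xa, ya, lnFa, b, rpsf):
    """Mean counts/pixel. The cluster is deliberately NOT convolved
    with the PSF (see above), while the PSF-convolved point source is
    an analytic Gaussian. The parameter order here is an assumption."""
    # cluster surface brightness (counts/s/pixel): the beta model
    r2 = (data.imx - x0)**2 + (data.imy - y0)**2
    cluster = np.exp(lnS0) * (1.0 + r2 / rc**2)**(-3.0 * beta + 0.5)
    # AGN: delta function convolved with a symmetric Gaussian PSF
    ra2 = (data.imx - xa)**2 + (data.imy - ya)**2
    agn = np.exp(lnFa) * np.exp(-0.5 * ra2 / rpsf**2) / (2.0 * np.pi * rpsf**2)
    # the exposure map converts rates to counts; b is already counts/pixel
    return data.ex * (cluster + agn) + b

def cash(params):
    """-2 ln(posterior), up to a constant, for uniform priors
    (bounds enforcement is left to the minimizer/sampler)."""
    mu = modelImage(*params)[data.mask]
    if np.any(mu <= 0.0):
        return np.inf  # guard against log(0)
    n = data.im[data.mask]
    return 2.0 * np.sum(mu - n * np.log(mu))
```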

Next, we'll want initial values for each parameter. This problem turns out to be pretty sensitive to the initial guess - not only will Markov chains converge very slowly if we start them in a silly place (which is one reason we're finding the posterior mode first), but even the optimization will struggle if the initial guess is too unreasonable. So it's worth taking some time to do this right. Here are my suggestions:

If you have a tool like SAOImage DS9, some of those suggestions will be much easier to check out, but they can also all be done here in the notebook. Either way, fill in the guess dictionary below, including your reasoning in comments if it differed from above:
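For concreteness, a skeleton of such a dictionary, in the hypothetical parameter order used above. Every number below is a placeholder to be replaced by values read off your own image.

```python
import numpy as np

# every value is a placeholder - replace with numbers read off the image
guess = {'x0': 200.0,   # cluster center x (pixels): the big blob
         'y0': 200.0,   # cluster center y (pixels)
         'lnS0': -9.0,  # ln of central surface brightness (counts/s/pixel)
         'rc': 20.0,    # core radius (pixels): half-width of the bright core
         'beta': 0.67,  # the canonical value is a natural starting point
         'xa': 300.0,   # AGN x (pixels): the unmasked point source
         'ya': 150.0,   # AGN y (pixels)
         'lnFa': -5.0,  # ln of the AGN count rate
         'b': 0.1,      # mean counts/pixel far from any source
         'rpsf': 2.0}   # PSF width (pixels)
paramnames = list(guess.keys())
guessvec = np.array([guess[p] for p in paramnames])
```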

Now we can have a look at the model image produced by this guess (and revise the guess if need be):

And we can make sure that the cash function returns something:

Before turning things over to scipy.optimize.minimize, we need to be able to tell it the allowed range for each parameter. This is done through a list (in parameter order) of lower and upper bounds, with None standing for $\pm\infty$. Define this, consistent with your priors above, here.
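A sketch consistent with the hypothetical ordering above; the small positive floors are one way to avoid dividing by zero.

```python
# one possible set of bounds, in the same parameter order as modelImage
xlo, xhi = data.imx.min(), data.imx.max()
ylo, yhi = data.imy.min(), data.imy.max()
bounds = [(xlo, xhi), (ylo, yhi),  # cluster center within the image
          (None, None),            # ln(S0): unbounded
          (1e-2, None),            # rc > 0 (small floor avoids /0)
          (0.5, None),             # beta: finite total flux
          (xlo, xhi), (ylo, yhi),  # AGN position within the image
          (None, None),            # ln(Fa): unbounded
          (0.0, None),             # b >= 0
          (1e-2, None)]            # R_PSF > 0
```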

Finally, let's run the minimizer (expect this to take 30-60 seconds, depending on your guess) ...
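A sketch of this step, reusing the hypothetical names defined above; the choice of method is a suggestion, not a requirement.

```python
from scipy.optimize import minimize

# L-BFGS-B respects the bounds list defined above
result = minimize(cash, guessvec, method='L-BFGS-B', bounds=bounds)
print(result.success, result.fun)
bestvec = result.x
best = dict(zip(paramnames, bestvec))
```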

... and look at the result a little more naturally:

Checkpoint: it should be possible for the minimizer to run successfully, i.e. without giving up and/or throwing an error.

Let's see if this set of parameters looks any different by eye:

Finding the posterior distribution

Next, you'll produce samples from the posterior, using your choice of method. Be warned that this posterior function (at least as I implemented it) is relatively slow. If you are using an MCMC method, do make sure you've converged, but after that it isn't necessary to run super-long to get lots of minimally correlated samples. (For a real research project we would either bite the bullet and run for a long time, or, more likely, do a better job coding.)
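If you opt for MCMC, a minimal emcee setup might look like the following sketch; the walker count and chain length are placeholders, and `cash`, `bounds`, and `bestvec` are the hypothetical names from the sketches above.

```python
import numpy as np
import emcee

def log_posterior(params):
    # enforce the uniform prior bounds, then convert cash to ln(posterior)
    for p, (lo, hi) in zip(params, bounds):
        if (lo is not None and p < lo) or (hi is not None and p > hi):
            return -np.inf
    return -0.5 * cash(params)

nwalkers, ndim = 32, len(bestvec)
# start walkers in a tight ball around the posterior mode found above
# (make sure the ball stays inside your prior bounds)
scatter = 1e-3 * np.maximum(np.abs(bestvec), 1e-3)
start = bestvec + scatter * np.random.randn(nwalkers, ndim)
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_posterior)
sampler.run_mcmc(start, 2000, progress=True)
samples = sampler.get_chain(discard=500, thin=10, flat=True)
```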

In your notebook, we expect to see:

If you use nested sampling or some other non-MCMC method, substitute the equivalent diagnostics and plots.

Checkpoint: As we've done before, we can quickly compare your posterior samples to our own solutions. If your samples are not equally weighted, you will need to add the weights argument to plotGTC below.

Checking the goodness of fit

As ever, we're not done until we've done at least a simple check that the fitted model is consistent with generating the data. As in our toy photometry problem from earlier, we can do a quick frequentist goodness-of-fit test of the best fit (whose parameters we saved in best and bestvec) by comparing the Cash statistic of the best fit with its theoretical expectation if the model is correct. You can refer to that notebook for the basic outline. Here, go a little further by computing the p-value associated with $C$, i.e. the probability of getting a $C$ for the best fit that is at least as large as what we did get.

Note that the theoretical expectations computed by the cashstatistic package are for a particular form of the Cash statistic which may or may not differ from your cash function by a constant, so you'll want to also use the package's function to compute $C$ from the best fit for the comparison (or at least check that your function is consistent).

Below, compute and print
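However you organize that printout, the p-value step itself might look like the following sketch, which treats the distribution of the statistic as approximately Gaussian with the mean and variance computed by the cashstatistic package (we deliberately don't reproduce that package's API here; consult its documentation for how to obtain these).

```python
import numpy as np
from scipy.stats import norm

def cash_pvalue(C, C_expected, C_variance):
    """P(C' >= C) under a Gaussian approximation to the distribution
    of the Cash statistic; C_expected and C_variance would come from
    the cashstatistic package, summed over unmasked pixels."""
    return norm.sf((C - C_expected) / np.sqrt(C_variance))
```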

Checkpoint: I get a p-value of ~0.03. So-so.

To get a feel for specifically which part of the model might be less than ideal, make a visualization that compares $S(r)$, azimuthally averaged brightness as a function of radius about the best-fit cluster center, from the data and a number of posterior samples. Looking at the image, we would expect to see a steady drop from the cluster center outwards, with a small bump due to the AGN, eventually approaching a constant (if the background is indeed uniform). The AGN itself will be washed out by averaging it with all the other azimuths at the same radius, so a second plot centered on the AGN position, out to smaller radial distance, might be a good idea.
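A sketch of the azimuthal averaging, reusing the hypothetical names and data attributes from above (bin choices and the outer radius are placeholders):

```python
import numpy as np
import matplotlib.pyplot as plt

def radialProfile(image, mask, x0, y0, rmax, nbins=50):
    """Azimuthally averaged counts/pixel about (x0, y0), using only
    unmasked pixels."""
    r = np.sqrt((data.imx - x0)**2 + (data.imy - y0)**2)
    edges = np.linspace(0.0, rmax, nbins + 1)
    prof = np.full(nbins, np.nan)
    for k in range(nbins):
        sel = mask & (r >= edges[k]) & (r < edges[k + 1])
        if sel.any():
            prof[k] = image[sel].mean()
    return 0.5 * (edges[:-1] + edges[1:]), prof

# data vs. a handful of posterior samples, about the best-fit center
rmid, pdata = radialProfile(data.im, data.mask, best['x0'], best['y0'], 150.0)
plt.plot(rmid, pdata, 'o', label='data')
for p in samples[np.random.choice(len(samples), 10, replace=False)]:
    rmid, pmod = radialProfile(modelImage(*p), data.mask,
                               best['x0'], best['y0'], 150.0)
    plt.plot(rmid, pmod, color='C1', alpha=0.4)
plt.semilogy()
plt.xlabel('r (pixels)')
plt.ylabel('counts/pixel')
plt.legend();
```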

Comment on what you see in light of the Cash statistic test above.

Parting thoughts

Here we are, finally having done some analysis of an astronomical image. Specifically, you've now performed X-ray photometry on a point-like source and an extended source. Admittedly, we cut a few corners, but this is still a more complete job than a surprising amount of the literature does. Hopefully you also learned a few things about wrangling real data and fitting models to them.