Tutorial: X-ray Image Data¶
In which we introduce a real astronomical data set, namely an image produced from X-ray CCD data. There's a fair bit of domain-specific information here which is not strictly necessary for this class, but it's useful stuff to see if you haven't worked with imaging data before (regardless of wavelength). Do note that we are still glossing over some things, though, since a more rigorous analysis that accounts for all instrumental and systematic effects would be more involved than we want any tutorials in this class to be.
In terms of the statistical analysis, we will be sticking with things you've done before, just practicing in a more realistic context where we might want to use a few tricks to speed up our calculations. We will make a few pointed suggestions along those lines, including the use of numba
, which you don't have to take... but this is a problem where taking 20% off the run time is noticeable. Unlike previous notebooks, we won't prescribe a specific algorithm or package for you to use. Exactly how to carry out the fits is up to you, although you are expected to follow best practices (testing convergence, etc.). This is not a terribly difficult inference, but - warning! - it can be tricky to get chains to sample efficiently. It's good practice for situations that are not quite as nice as our pedagogical examples from earlier. It will probably take longer than typical to run (possibly quite a bit longer).
In more detail, you will
- be introduced to imaging data (specifically X-ray);
- fit a simple/symmetric model for the spatial distribution of emission from the gas in a cluster of galaxies, plus an instrumental background;
- fit a second model that includes a dense core of bright gas at the cluster center;
- compare the two for your own data, and (in conjunction with classmates) see how this varies among clusters.
from os import getcwd
from os.path import exists as file_exists
from yaml import safe_load, safe_dump
from astropy.io import fits
from astropy.visualization import LogStretch
from astropy.wcs import WCS
logstretch = LogStretch()
from numba import njit
import numpy as np
from regions import Regions
import scipy.stats as st
import matplotlib.pyplot as plt
%matplotlib inline
import incredible as cr
from pygtc import plotGTC
thisTutorial = 'xray_image'
if getcwd() == '/content':
    # assume we are in Colab, and the user's data directory is linked to their drive/Physics267_data
    from google.colab import drive
    drive.mount('/content/drive')
    datapath = '/content/drive/MyDrive/Physics267_data/' + thisTutorial + '/'
else:
    # assume we are running locally somewhere and have the data under ./data/
    datapath = 'data/'
Nature of the data¶
Modern X-ray CCDs are technologically similar to the CCDs used in optical astronomy: when a photon hits a pixel, one or more electrons are promoted into the conduction band and trapped there until being read out. The main practical difference is that X-ray photons are rarer and their energies much higher.
This means that:
- Only for exceptionally bright sources will we ever have $>1$ photon hit a given pixel in an integration, if we read out the CCD every few seconds.
- We do not get 1 electron promoted per photon, as is the case in visible wavelength CCDs. Instead, the number of electrons is roughly proportional to the photon's energy, which means that these imaging devices are actually imaging spectrometers.
- When we say "counts" in this context, we mean "pixel activation events" rather than number of electrons trapped, so that (as in optical astronomy) we're referring to the number of photons detected (or other events that look like photons to the detector).
For simplicity, we will not be dealing with "raw" data as described above, but with images that have been made from it. In particular, the spectral information has been thrown away, apart from limiting the counts in our image to a particular energy band (0.6-2.0 keV in this case). The images have also been binned by a factor of 4 to reduce the computational overhead (i.e. the number of image pixels) for this notebook. The original data are from the Chandra X-ray Observatory, and have been cleaned, filtered, and had other things done to them that are beyond the scope of this problem.
The 2 files we need are:
- the image, in units of counts per image pixel ($4\times4$ detector pixels)
- the exposure map
The latter is a little confusing, as it plays multiple roles. In particular, the exposure map encodes
- vignetting, which effectively decreases the telescope's sensitivity with increasing distance from the optical axis;
- the quantum efficiency of each detector pixel, including dead or damaged pixels (recorded like the flat field in optical bands);
- the effective collecting area of the telescope for the photon energies recorded in the image.
The first two of these are functionally like the flat field image used in optical astronomy. The third point means that the exposure map has units of area, specifically $\mathrm{cm}^2$. Oddly enough, in this case, the exposure map does not include the actual exposure time of the observation, so we will need to multiply that in ourselves later, after which the map will have units of $\mathrm{cm}^2\cdot\mathrm{s}$. This is exactly what we need to transform a model of the sky in $\mathrm{counts}\,\mathrm{s}^{-1}\,\mathrm{cm}^{-2}\,\mathrm{pixel}^{-1}$ into a prediction for the expected number of counts in each image pixel. (The model defined this way is in terms of surface brightness, i.e. counts or energy per unit area, time, and solid angle.) Note that, from here on out, "pixel" will refer to an entry in the data arrays (an image pixel), not a physical pixel in the detector.
Both files are in FITS image format, which we can read in using astropy.io.fits
.
imfits = fits.open(datapath + 'image_0.6-2.0keV.fits.gz')
exfits = fits.open(datapath + 'expmap_1.25kev.fits.gz')
Let's see what we've got:
imfits.info()
Filename: ../../data/public/xray_image/image_0.6-2.0keV.fits.gz
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU     983   (883, 883)   int16
exfits.info()
Filename: ../../data/public/xray_image/expmap_1.25kev.fits.gz
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU     173   (883, 883)   float32
In general, FITS files can contain multiple data structures, but each of these has only an image array and supporting header information. The image in imfits
is of integer type (remember, counts!), while exfits
is of floating point type.
Here we extract the actual array of data from each object into numpy
arrays.
im = imfits[0].data
ex = exfits[0].data
print(im.shape, im.dtype)
print(ex.shape, ex.dtype)
(883, 883) >i2
(883, 883) >f4
Note: If we wanted im
to be floating point for some reason, we would need to cast it, as in im = imfits[0].data.astype('float64')
. We'll use this type conversion below, but it's not necessary here.
As mentioned above, we will need the exposure time to convert a model for surface brightness into a prediction for the counts in each pixel. So let's go ahead and extract that information from the FITS header, and multiply it into the exposure map.
ex *= imfits[0].header['exposure']
Let's have a look at the image and exposure map. It's often helpful to stretch images on a logarithmic scale because some sources can vary spatially in brightness by orders of magnitude. (It's often still not easy to see any detail without manually adjusting the bias and contrast, but we'll see a simple way to do that interactively in a minute.) The exposure map varies much less, so a linear scale works better in that case.
plt.rcParams['figure.figsize'] = (12.0, 6.0)
fig, ax = plt.subplots(1,2);
ax[0].imshow(logstretch(im), cmap='gray', origin='lower');
ax[0].set_title('image (log scale)');
ax[1].imshow(ex, cmap='gray', origin='lower');
ax[1].set_title('exposure map');
Aside: An endless source of confusion and bugs is the fact that images are conventionally indexed by line and sample (vertical and horizontal position), while we conventionally order coordinates in 2 dimensions $x,y$ (horizontal and vertical). In addition, the usual convention for FITS images orders the lines from the bottom of the image up rather than the top down. Hence, to display the image the right way up, we need to use the
origin='lower'
option to imshow
above.
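For example (the pixel position below is arbitrary, purely for illustration), pulling out a single value while respecting these conventions looks like:

x, y = 500, 300  # an arbitrary example position
print('counts at (x=%d, y=%d):' % (x, y), im[y, x])  # first index is the line (y), second the sample (x)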
Note that information from multiple CCDs has been combined here, and that X and Y in the image arrays correspond to celestial coordinates (right ascension and declination) rather than X and Y on a given detector or in the focal plane. There are fancy packages that can display such images with celestial coordinates on the axes (the translation from image pixel position to celestial coordinates is encoded in the FITS header), but we won't need to deal with such things in this exercise.
In the image, we can see:
- A galaxy cluster (the big blob).
- Various other sources (smaller blobs). These are point-like sources - mostly active galactic nuclei (AGN) - that have been smeared out by the telescope's point spread function (PSF). You might even be able to see clearly how the size of the PSF increases with distance from the optical axis.
- A roughly uniform background, consisting of fainter AGN, diffuse X-rays from the Galactic halo and local hot bubble, and events due to particles (solar wind protons and cosmic rays) interacting with the CCD.
- In the data used for the public solutions, we can also see a bright streak connecting the brightest part of the cluster to the upper edge of the CCD. This is due to charge transfer inefficiency, as opposed to a physical structure in the Universe. Fortunately, the surface brightness needs to be extremely large for this effect to show up above the background, so you're unlikely to see it in your own data.
The exposure map shows:
- An overall gradient with distance from the optical axis - this is the vignetting function of the telescope.
- Fuzzy boundaries between the CCDs (fuzzy because of dithering), and a number of "bad columns" where the exposure has been set to zero (usually this is due to issues with the readout electronics rather than damaged detector pixels). The latter may not correspond to zeros in our exposure map, again because of dithering, and because we have binned up the original, detector-resolution map.
The effects of (2) are also visible in the counts image. The vignetting is less obvious, since we mostly see background in regions where the vignetting is significant, and most of the background is due to particles that are not vignetted (they aren't focused by the optics, and hit the detector from all directions approximately equally).
The exact details will of course vary depending on which data set you were assigned, but in general the comments above should apply.
Working with the data¶
As we've seen above, the image is encoded in a standard numpy
array. But, recalling the aside, we need to be careful about indexing the array. Specifically, we need to remember that the first array index corresponds to $y$ (the vertical direction in the image), and the second index corresponds to $x$ (horizontal). Alternatively, we could create arrays that hold the $x$ and $y$ indices corresponding to each pixel. This is a little wasteful in terms of computer memory, but allows us to do calculations involving the positions of and distances between points in the image without constantly relying on our human memories to recall something counterintuitive.
The following cell adds such index arrays (imx
and imy
) to a data dictionary, along with the image and exposure map.
data = {'im': im, 'ex':ex}
data['imx'], data['imy'] = np.meshgrid(np.arange(im.shape[1]), np.arange(im.shape[0]))
In case it wasn't clear, a quick look at imx
and imy
probably explains what they're good for:
plt.rcParams['figure.figsize'] = (6.0, 6.0)
fig, ax = plt.subplots(1,2);
ax[0].imshow(data['imx'], cmap='gray', origin='lower');
ax[0].set_title('imx');
ax[1].imshow(data['imy'], cmap='gray', origin='lower');
ax[1].set_title('imy');
The range of values in each is what you'd expect for indices into arrays of this size:
print(data['imx'].min(), data['imx'].max())
print(data['imy'].min(), data['imy'].max())
0 882
0 882
With these defined, we can easily compute, e.g., the distance of each pixel from a given location $(x_0,y_0)$, without ever having to think about the indexing conventions again.
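For instance, a minimal sketch (the reference point here is an arbitrary placeholder, not something you need):

x0, y0 = 441.0, 441.0  # hypothetical reference position
dist = np.sqrt((data['imx'] - x0)**2 + (data['imy'] - y0)**2)  # distance of every pixel from (x0, y0)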
Before jumping into the analysis, there is one more practical issue to take care of, namely the various sources in the image that are not the galaxy cluster of interest. In principle, we could fit a very complex model that includes all of these sources. But we don't actually care about them, and doing so would require modeling the point spread function of the telescope, which varies in a complicated way over the field of view (it can, however, be ignored for the cluster itself, given its inherent fuzziness). A better option is to simply mask (i.e. ignore) parts of the image where non-cluster sources are visible.
There are clever ways to algorithmically detect sources (and specifically point-like sources) in images like this, given knowledge of how the PSF varies across the field of view, but they're not foolproof, and in any case are beyond the scope for us. Instead, we will just manually identify the regions to be masked, using a tool called JS9. (This is a Javascript approximation of DS9; if you're already familiar with that, have it installed and prefer to use it, then go ahead.) It should magically appear when running the following cell.
%%html
<div class="JS9Menubar"></div>
<div class="JS9"></div>
<link type="text/css" rel="stylesheet" href="//js9.si.edu/jupyter/js9-allinone.css">
<script type="text/javascript" src="//js9.si.edu/jupyter/js9-allinone.js"></script>
As convenient as it is to have a tool like this just pop up in a notebook, it isn't (as far as we know) possible to directly open a file from Google drive, so if you're used to running these notebooks in Colab, you will still need to download the image FITS file to your local hard drive. Then open it using the menu
- File $\rightarrow$ open local...
Like above, we will want to log-scale it
- Scale $\rightarrow$ log
Move the mouse around over the image. At the top, you can see: the number of counts under the mouse cursor, the celestial coordinates corresponding to that position, and the physical coordinates (yet another coordinate system, defined with respect to the detector module).
Left click + drag will change the contrast and bias; right click + drag will pan the view. It's worth taking a minute to play with the contrast and bias to see how you might change these in order to (1) best see the detail of the cluster or (2) best see the point sources showing up above the background (and/or above the cluster).
On to defining the regions to mask. Click Regions $\rightarrow$ circle. A green circle should appear in the middle of the window. Your goal is to add regions like this that enclose
- the point-like sources,
- the fuzzy edges between the CCDs, and
- any CCDs we will not use.
For the latter two, you should use the rectangle shape, which can be elongated and rotated, rather than the circle.
You don't have to go crazy circling anything that might possibly be a point source (it's easy to go too far and mask statistical fluctuations in the background), but do remove anything that is reasonably clear, keeping in mind that they will appear larger the farther they are from the optical axis. The Edit $\rightarrow$ copy/paste shortcuts are your friend here. As for which CCDs not to use:
- If your data include a $2\times2$ grid of CCDs, then any additional CCDs not in the $2\times2$ grid should be removed.
- Otherwise, your data will look like 3 or 4 CCDs in a row, possibly with 1 or 2 off to the side. In this case, 2 of the CCDs in the row will clearly have many more counts than the others. These are the only 2 that should be used.
If in doubt, refer to the solutions as a guide. (Note that the solutions also include a box covering the streak mentioned above, which you probably will not have to deal with.)
When you're done, click Regions $\rightarrow$ list, and save the text from the window that pops up as a string in the cell below. You can also save a region file (this just contains the same text) to disk for safe keeping, in case you want to come back and edit the regions in JS9 again.
# region_string = """
# FK5
# circle(...
# ...
# box(...
# ...
# """
We will now use the regions
package, in conjunction with the coordinate information in the FITS header, to define an array of the same size as the image, that holds the value True
for pixels within the union of the regions above and False
outside of them.
wcs = WCS(imfits[0].header)
regions = Regions.parse(region_string, format='ds9')
# One wonders why the below is not a single function call... perhaps in some future version...
regions_pix = [region.to_pixel(wcs) for region in regions]
region_mask = np.full(im.shape, 0.0)
for region in regions_pix:
    region_mask += region.to_mask().to_image(im.shape)
region_mask = (region_mask != 0.0)
WARNING: FITSFixedWarning: 'datfix' made the change 'Set DATEREF to '1998-01-01' from MJDREF. Set MJD-END to 53973.732083 from DATE-END'. [astropy.wcs.wcs]
In case that explanation was too convoluted, this is what the mask looks like:
plt.rcParams['figure.figsize'] = (12.0, 6.0)
fig, ax = plt.subplots(1,2);
ax[0].imshow(logstretch(im), cmap='gray', origin='lower');
ax[0].set_title('image (log scale)');
ax[1].imshow(region_mask, cmap='gray', origin='lower');
ax[1].set_title('region mask from JS9');
We will similarly want to mask pixels that have very low values in the exposure map. Recall that (other than the places where there isn't an actual CCD), these are largely due to bad columns or damaged detector pixels. Rather than perfectly trust these lowered exposure values, it's safer to entirely mask these. Below, define an exp_mask
that works like the region_mask
(i.e. Boolean type, and True
in the pixels you want to mask) that masks pixels where the exposure map is less than half of its maximum value. Note that you do not need to invoke the regions
package to do this; it's just a function of the exposure map array that we already have.
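If a starting point helps, the description above translates directly into a one-liner along these lines (yours can of course differ):

exp_mask = ex < 0.5 * ex.max()  # True where the exposure is less than half its maximum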
# exp_mask = ...
plt.rcParams['figure.figsize'] = (12.0, 6.0)
fig, ax = plt.subplots(1,2);
ax[0].imshow(ex, cmap='gray', origin='lower');
ax[0].set_title('exposure map');
ax[1].imshow(exp_mask, cmap='gray', origin='lower');
ax[1].set_title('low exposure mask');
Let's visualize what's left of the image if we take away the union of those two masks. You should still see the extended emission from the cluster, but the other sources and regions we identified above should be blank (again, compare with the solutions if in doubt).
combined_mask = np.logical_or(region_mask, exp_mask)
plt.rcParams['figure.figsize'] = (16.0, 6.0)
fig, ax = plt.subplots(1,3);
ax[0].imshow(logstretch(im), cmap='gray', origin='lower');
ax[0].set_title('image (log scale)');
ax[1].imshow(logstretch(np.where(combined_mask, np.nan, im)), cmap='gray', origin='lower');
ax[1].set_title('masked image (log scale)');
ax[2].imshow(np.where(combined_mask, 0.0, ex), cmap='gray', origin='lower');
ax[2].set_title('masked exposure map');
Notionally, we can use combined_mask
to limit all the calculations we'll need to do (e.g. summing the log-likelihood over pixels) to the useful part of the image. In practice, the code will be significantly more efficient if we re-define the data that we want to use as 1D arrays that only include the pixels of interest. This is another reason that having our imx
and imy
arrays is nice - to take advantage of this trick, we need to have the x
and y
positions of each pixel explicitly saved so we will know where they belong. In case some of these numpy
functions are unfamiliar, the following saves an array of indices, corresponding to the unmasked pixels, into the image or exposure map arrays once they have been collapsed to 1D.
goodpix = np.flatnonzero(~combined_mask)
We can then use the flatten
function to change the shape of each array to 1D, and subscript them with goodpix
to extract just the pixels we care about. We also change the data types of the image and exposure map here, because the final computational speed-up we will recommend (later on) requires them to be 64-bit for whatever reason.
data['flatim'] = data['im'].flatten()[goodpix].astype('int64') # numba refuses to work with short int
data['flatex'] = data['ex'].flatten()[goodpix].astype('float64') # numba refuses to work with single precision float
data['flatx'] = data['imx'].flatten()[goodpix]
data['flaty'] = data['imy'].flatten()[goodpix]
data['goodpix'] = goodpix
Just to be absolutely sure that we haven't lost something vital, and to test your array juggling skills, let's use these flattened/shortened arrays to reconstruct the part of the image that we care about. Define a new_im
array that should be identical to data['im']
in the pixels we care about and nan
in those we don't care about, using only the flattened data above. (You can also use the shape
of data['im']
, just not its contents.)
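In case you want a hint, here is one possible sketch (there are several equally good approaches, e.g. using goodpix instead):

# start from an all-NaN image and scatter the flattened counts back into the
# unmasked pixels using their saved (x, y) positions -- note the y-first indexing
new_im = np.full(data['im'].shape, np.nan)
new_im[data['flaty'], data['flatx']] = data['flatim']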
# new_im = ...
plt.rcParams['figure.figsize'] = (12.0, 6.0)
fig, ax = plt.subplots(1,2);
ax[0].imshow(logstretch(data['im']), cmap='gray', origin='lower');
ax[0].set_title('original image (log scale)');
ax[1].imshow(logstretch(new_im), cmap='gray', origin='lower');
ax[1].set_title('rebuilt image (log scale)');
Let's make sure they're exactly the same:
assert np.all(np.logical_or(data['im']==new_im, np.isnan(new_im)))
Defining the model¶
This is an unusual problem for us in that it's entirely possible that no simple model we can write down will be a genuinely good fit to the data. After all, the object we're looking at is a complex thing, well resolved by the telescope, and possibly hosting many kinds of non-trivial yet detectable features. So, instead, we will keep things simple and use a model that will not perfectly explain the data, but will be good enough for some purposes.
More specifically, we'll use the azimuthally symmetric "beta model" (no relation to the beta distribution), a common parametric model for the surface brightness of galaxy clusters:
$S_\mathrm{CL}(x,y) = S_\mathrm{CL}(r) = S_0 \left[1.0 + \left(\frac{r}{r_c}\right)^2\right]^{-3\beta + 1/2}$,
where $r$ is projected distance from the cluster center. Note that this model describes a 2D surface brightness distribution, since $r^2 = (\Delta x)^2 + (\Delta y)^2$. Also note that we can get away with using the Pythagorean theorem like this to compute angular distance over sufficiently small areas (like our field of view), even though the sky is a sphere rather than a plane.$^1$
We are basically free to choose what units we will define $S_0$ and $r_c$ to have, as long as the model can eventually make a prediction for the counts in each pixel. The most convenient choice for our limited purposes is for $S_0$ to have units of $\mathrm{counts}\,\mathrm{s}^{-1}\,\mathrm{cm}^{-2}\,\mathrm{pixel}^{-1}$ and $r_c$ to have units of pixels - this bakes in specifics about our telescope and binning (remember, these are image pixels), but that's fine for us.
The beta model isn't the entirety of the model we'll need, but let's take a second to code up the function above first.
@njit
def betaModel(r, S0, rc, beta):
    '''
    Evaluate the beta model given above as a function of radius
    '''
    # direct transcription of the expression for S_CL(r) above
    return S0 * (1.0 + (r/rc)**2)**(-3.0*beta + 0.5)
Long aside: the @njit
decorator here tells numba
(a package we imported above) that this function is one that it should replace with a compiled version. The benefit is that executing the function can become significantly faster than if Python were interpreting it each time. There is a necessary downside, which is that numba
must be able to identify the data type of every variable that's used in the function the first time it's executed (this is when the compilation happens, "Just In Time"), and those variables must have the same data type in all future calls to the function.$^2$ The upshot is that we end up writing code that is a little more barebones and less easily generalizable, e.g. avoiding the use of object oriented methods that numba
isn't sophisticated enough to decipher, at least in njit
functions.
In this particular case, we found speedups of ~20% using numba
throughout the model evaluation, which seemed worth the extra effort. So, if you for some reason cannot get the notebook to work using numba
, you can just remove the @njit
line(s) and do the problem without it; you'll just end up waiting longer. But this functionality is useful enough in real life, where calculations can take significant time, that it's worth learning about and giving it a try.
It's always a good idea to test any function we write, and this is even more true if the first execution can potentially throw a compilation error. So here we go:
betaModel(1.0, 1.0, 1.0, 0.67) # get it compiled and check for errors of whatever kind
If you have the patience, you can see the difference that compilation makes by running the timing cell below, then going back and redefining the function with @njit
commented out and re-running it. (This isn't perfect, because both are pretty fast and the for loop itself will eat up some time, but you will probably still see a factor of a few difference.)
%%time
for i in range(1000000):
    betaModel(1.0, 0.0, 1.0, 0.67);
CPU times: user 317 ms, sys: 1.55 ms, total: 318 ms Wall time: 318 ms
Moving on, the plot below shows the impact of the $\beta$ parameter on this function. Broadly,
- $S_0$ normalizes the profile,
- $r_c$ is a "core radius" within which the profile is approximately flat, and
- $\beta$ determines the slope of the profile at large radii.
In addition, we will have the position of the cluster center in $x$ and $y$ as free parameters: $(x_0,y_0)$.
Incidentally, $\beta=2/3$ is a "canonical" value, in that the beta model with this value describes an isothermal sphere of gas in hydrostatic equilibrium (i.e. an oversimplified model). That doesn't mean that $2/3$ is the value that you should or will find, but it at least provides a broad idea of what might be reasonable.
plt.rcParams['figure.figsize'] = (6., 4.)
rgrid = 10.**np.linspace(-1., 1.)
plt.loglog(rgrid, betaModel(rgrid, 1.0, 1.0, 0.5), label=r'$\beta=0.5$')
plt.loglog(rgrid, betaModel(rgrid, 1.0, 1.0, 0.67), label=r'$\beta=0.67$')
plt.loglog(rgrid, betaModel(rgrid, 1.0, 1.0, 1.0), label=r'$\beta=1.0$')
plt.legend();
plt.xlabel(r'$r/r_\mathrm{c}$');
plt.ylabel(r'$S(r)/S_0$');
While we cleverly masked out the visible non-cluster sources above, the relatively uniform background still needs to be included in our model. In practice, this background consists of both real X-rays that are focused by the optics and therefore vignetted, and events from particle showers due to cosmic rays interacting with mass in the spacecraft; the latter hit the detector from all directions and are not vignetted by the optics. To keep things simple, we will model only the particle-induced component, since it's much larger than the real X-ray background once all of the point-like sources bright enough to see have been removed. Since this background is not vignetted, the simplest approach is to define it such that we will not need to multiply it by the exposure map to predict how many counts in each pixel it produces. That is, we will parametrize the background as a uniform expected number of background counts per pixel, $B$, with details like the length of the observation implicitly built in. The complete model prediction for the counts in a pixel is then
$N(x,y) = S_\mathrm{CL}(x,y) \, E(x,y) + B$
where $E(x,y)$ is the exposure map.
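As a concrete (and optional) illustration, this prediction can be evaluated over the flattened arrays defined earlier; the parameter values below are arbitrary placeholders, not suggestions:

x0, y0, S0, rc, beta, B = 441.0, 441.0, 1e-5, 5.0, 0.67, 0.05  # hypothetical values
r = np.sqrt((data['flatx'] - x0)**2 + (data['flaty'] - y0)**2)
N_expected = betaModel(r, S0, rc, beta) * data['flatex'] + B  # expected counts in each unmasked pixel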
To recap, our model parameters are
- $x_0$: x pixel coordinate of the cluster center,
- $y_0$: y pixel coordinate of the cluster center,
- $S_0$: cluster surface brightness normalization,
- $r_c$: cluster core radius,
- $\beta$: cluster slope parameter,
- $B$: background count density.
We'll be comparing this model with one where the cluster is described by the sum of 2 concentric beta models,
$S_\mathrm{CL}(r) = S_0 \left[1.0 + \left(\frac{r}{r_c}\right)^2\right]^{-3\beta + 1/2} + S_0' \left[1.0 + \left(\frac{r}{r_c'}\right)^2\right]^{-3\beta' + 1/2}$
where the second beta model shares the same center, $(x_0,y_0)$, and is otherwise parametrized by
- $S_0'$,
- $r_c'$,
- $\beta'$.
Think about what priors might make sense for these different parameters in general; as usual, we will arrive at consensus in class.
paramnames = ['x0', 'y0', 'S0', 'rc', 'beta', 'B']
paramnames2 = paramnames + ['S02', 'rc2', 'beta2']
Below, as we always do, specify the full generative model in expressions and a PGM. We may as well include both beta models here, since we can always get back to the single beta model by setting $S_0'=0$.
Space for your generative model
Visualizing the model¶
Comparing the data and model visually is a little tricky in this case. Once we get away from the brightest part of the cluster, the most common values in the image are likely 0 and 1, which is to say that Poisson noise dominates the value in any given pixel. So, for example, if we made a residual image of a given model prediction minus the data, we would still see a great deal of noise at those radii, and it wouldn't simply tell us whether our background parameter had a reasonable value.
Given that we're fitting an azimuthally symmetric model to the cluster to begin with, it seems reasonable to instead compare the data and model in one dimension, as a function of radius from the (model) cluster center. We lose some information by doing this, but on the other hand it does solve the issue above neatly by combining lots of pixels at large radii together before doing a comparison.
Because we are the greatest, here is a complete class that computes and stores such profiles. In detail, it stores the sum of the image, of the exposure map, and the number of pixels in a series of quasi-sensibly defined annuli about the given center. From these we can simply compute the average counts per pixel or the average surface brightness in each annulus, or just look at the number of counts in each.
class Profile:
    def __init__(self, img, x0, y0, rmax, nbins=32):
        r2 = np.exp(np.linspace(0.0, np.log(rmax), nbins))
        r1 = np.concatenate(([0.0], r2[:-1]))
        N = np.zeros(r1.shape)
        E = np.zeros(r1.shape)
        npix = np.zeros(r1.shape)
        d = np.sqrt((img['flatx'] - x0)**2 + (img['flaty'] - y0)**2)
        for i in range(len(N)):
            j = np.logical_and(d >= r1[i], d < r2[i])
            N[i] += img['flatim'][j].sum()
            E[i] += img['flatex'][j].sum()
            npix[i] += j.sum()
        r = np.sqrt(r1 * r2)
        r[0] = np.sqrt(0.3 * r2[0]) # fiat
        j = np.flatnonzero(npix > 0) # divide by zero warnings are annoying
        self.r1 = r1[j]
        self.r2 = r2[j]
        self.r = r[j]
        self.N = N[j]
        self.E = E[j]
        self.npix = npix[j]
This gives us 3 potentially useful ways of looking at the data with respect to some reference point: the number of counts, the average counts per pixel, or the average surface brightness as a function of radius. Below, we see these 3 profiles, arbitrarily centered on the mean pixel position.
stupid_profile = Profile(data, data['flatx'].mean(), data['flaty'].mean(), 600.); # hopefully 600 is large enough for a maximum radius?
fig, ax = plt.subplots(1,3, figsize=(20.0, 4.0))
ax[0].loglog(stupid_profile.r, stupid_profile.N, '.');
ax[1].loglog(stupid_profile.r, stupid_profile.N/stupid_profile.npix, '.');
ax[2].loglog(stupid_profile.r, stupid_profile.N/stupid_profile.E, '.');
ax[0].set_ylabel('Total counts'); ax[1].set_ylabel('Average counts/pixel'); ax[2].set_ylabel('Average surface brightness');
for i in range(3): ax[i].set_xlabel('Radius (pixels)');
These plots might look very silly if the center of the profile is not actually near the center of the cluster. Normally, we'd expect to see the surface brightness decreasing monotonically with radius before flattening as the background becomes dominant. The important thing is that, finally, we have a way to compare the data and model straightforwardly. Which means that you can try out some different values, and narrow down on a set of parameters that are at least plausible. Use the cells below to do so for the single-beta-model case; each plot will include the beta model, the background, and their sum. You can use the image and the profiles above to inform initial guesses at some of them, before refining them.
# guess = {'x0':..., 'y0':..., 'S0':..., 'rc':..., 'beta':..., 'B':...}
guess_profile = Profile(data, guess['x0'], guess['y0'], 600.); # hopefully 600 is large enough for a maximum radius?
fig, ax = plt.subplots(1,3, figsize=(20.0, 4.0))
ax[0].loglog(guess_profile.r, guess_profile.N, '.');
ax[0].plot(guess_profile.r, betaModel(guess_profile.r, guess['S0'], guess['rc'], guess['beta'])*guess_profile.E + guess['B']*guess_profile.npix, '-', label='Total');
ax[0].plot(guess_profile.r, betaModel(guess_profile.r, guess['S0'], guess['rc'], guess['beta'])*guess_profile.E, '--', label='beta');
ax[0].plot(guess_profile.r, guess['B']*guess_profile.npix, '--', label='background');
ax[0].legend()
ax[1].loglog(guess_profile.r, guess_profile.N/guess_profile.npix, '.');
ax[1].plot(guess_profile.r, betaModel(guess_profile.r, guess['S0'], guess['rc'], guess['beta'])*guess_profile.E/guess_profile.npix + guess['B'], '-');
ax[1].plot(guess_profile.r, betaModel(guess_profile.r, guess['S0'], guess['rc'], guess['beta'])*guess_profile.E/guess_profile.npix, '--');
ax[1].plot(guess_profile.r, guess['B']+0*guess_profile.npix, '--');
ax[2].loglog(guess_profile.r, guess_profile.N/guess_profile.E, '.');
ax[2].plot(guess_profile.r, betaModel(guess_profile.r, guess['S0'], guess['rc'], guess['beta']) + guess['B']/guess_profile.E*guess_profile.npix, '-');
ax[2].plot(guess_profile.r, betaModel(guess_profile.r, guess['S0'], guess['rc'], guess['beta']), '--');
ax[2].plot(guess_profile.r, guess['B']/guess_profile.E*guess_profile.npix, '--');
ax[0].set_ylabel('Total counts'); ax[1].set_ylabel('Average counts/pixel'); ax[2].set_ylabel('Average surface brightness');
for i in range(3): ax[i].set_xlabel('Radius (pixels)');
Do inference for a single beta model + background¶
You are now on your own to sample the posterior of the above model (with one beta component) given your data.
Requirements:
- Produce equally weighted samples from the posterior and store them (after removing any burn-in and doing any thinning you feel necessary) in chains, in the form of multiple (at least 4) chains, as either a list of $N_\mathrm{samples} \times N_\mathrm{params}$ arrays, or a single $N_\mathrm{chains} \times N_\mathrm{samples} \times N_\mathrm{params}$ array (i.e., in one of the formats that incredible understands). This doesn't mean you necessarily have to use an MCMC method, but if you use some other method you must be able to extract results in the above format (even if it means artificially breaking samples into multiple "chains"). There will be cells below for plotting traces, and tests for convergence and a minimum number of effectively independent samples, that will assume this format, and that must be passed. Parameters should be ordered as in paramnames.
- "Approximate Methods" methods are not allowed.
- If your solution includes finding the posterior maximum, show profile plots for it like those above for the guess.
- Include
%%time
at the start of any cell that takes a while to run.
Suggestions:
- If/when writing a function to evaluate the log-posterior (and/or log-prior, log-likelihood), use the @njit decorator to squeeze some extra efficiency out of it. It's probably worth it here, and it's definitely worth learning how to use numba.
- njit aside, using a scipy distribution's logpdf method is (in this case) noticeably slower than directly coding the log PDF, especially if you leave out normalizing terms that don't depend on the model parameters. (See the sketch after this list for what that can look like.)
- The normalization and core radius parameters of the cluster model are strongly degenerate (this is well known among the clusteratti, and is related to preserving the total flux of the model; you might even have noticed it when refining your guess above). It's extremely common for the normalization to be parametrized by the log of $S_0$ rather than $S_0$ directly to reduce the degeneracy. (Don't forget that priors are not generally invariant under reparametrization.)
- Depending on what method you employ, it may be a good idea to find the posterior maximum before launching chains.
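To make the log PDF suggestion concrete, here is a minimal, entirely optional sketch of an njit-compiled Poisson log-likelihood with the parameter-independent $\log(N!)$ term dropped and the normalization handled as $\ln S_0$, as suggested above. It is not the solution (there are no priors here, and the parametrization and sampler are up to you); the function name and argument order are our own invention.

@njit
def lnlike_sketch(x0, y0, lnS0, rc, beta, B, flatim, flatex, flatx, flaty):
    # expected counts in each unmasked pixel: cluster surface brightness * exposure map + background
    r = np.sqrt((flatx - x0)**2 + (flaty - y0)**2)
    mu = betaModel(r, np.exp(lnS0), rc, beta) * flatex + B
    # Poisson log-likelihood summed over pixels, without the -log(N!) term
    return np.sum(flatim * np.log(mu) - mu)

You would call it with the flattened data arrays, e.g. lnlike_sketch(x0, y0, lnS0, rc, beta, B, data['flatim'], data['flatex'], data['flatx'], data['flaty']).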
Also note: the solutions are intentionally minimal and vague, since it's no fun if you all decide to just solve things exactly the way we did. This doesn't mean that you shouldn't add various checks of your work, even if you don't see them there. Anyway, here are a bunch of cells for your solution (you don't have to use all of them).
%%time
print("Here we are benchmarking a function that will be called many times.")
Here we are benchmarking a function that will be called many times.
CPU times: user 793 ms, sys: 6.18 ms, total: 799 ms Wall time: 804 ms
%%time
print("Here we are doing some kind of calculation.")
Here we are doing some kind of calculation.
CPU times: user 2.87 s, sys: 14 ms, total: 2.88 s Wall time: 2.89 s
%%time
print("Now we finally get samples from the posterior.")
Now we finally get samples from the posterior.
CPU times: user 8min 31s, sys: 4.17 s, total: 8min 35s Wall time: 8min 54s
Don't forget to check whatever diagnostics apply to your analysis.
Also, don't forget to coerce the results into the format described above.
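For example, if your sampler hands you a single $(N_\mathrm{total}, N_\mathrm{params})$ array after burn-in removal and thinning (here called samples, a hypothetical name), one way to artificially split it into 4 "chains" is:

nchains = 4
n = (samples.shape[0] // nchains) * nchains     # drop any leftover samples
chains = samples[:n].reshape(nchains, -1, samples.shape[1])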
Testing the single-beta fit¶
Welcome back! We will now look at standard diagnostics of the chains you stored in chains
.
fig, ax = plt.subplots(len(paramnames), 1, figsize=(20, len(paramnames)*3));
if chains.__class__ == list:
    cr.plot_traces(chains[:min(8,len(chains))], ax, labels=paramnames);
else:
    cr.plot_traces(chains[:min(8,chains.shape[0]),:,:], ax, labels=paramnames);
R = cr.GelmanRubinR(chains)
print("R =", R)
assert np.all(R < 1.1) # this is permissive, but fine for practice, I guess
R = [1.00906184 1.00228357 1.00263847 1.00343508 1.0057456 1.01370015]
maxlag = 750 # you might need to change this
neff = cr.effective_samples(chains, maxlag=maxlag, throw=True)
print("neff =", neff)
assert np.all(neff > 200) # this is not a lot, but fine for practice, I guess
neff = [695.34157466 659.57945742 613.12109672 604.57153732 618.27613321 723.61318041]
Assuming these tests were passed, we'll combine the chains and look at a triangle.
chain1 = np.concatenate(chains, axis=0)
plotGTC(chain1, paramNames=paramnames, figureSize=8, customLabelFont={'size':12}, customTickFont={'size':12});
Finally, let's compare a model representative of the peak of the posterior (hopefully the median will do) to the data in the form of a profile:
fitvec = np.median(chain1, axis=0)
fit = {p:v for p,v in zip(paramnames, fitvec)}
fit_profile = Profile(data, fit['x0'], fit['y0'], 600.); # hopefully 600 is large enough for a maximum radius?
plt.rcParams['figure.figsize'] = (6., 4.)
plt.loglog(fit_profile.r, fit_profile.N/fit_profile.E, '.');
plt.plot(fit_profile.r, betaModel(fit_profile.r, fit['S0'], fit['rc'], fit['beta']) + fit['B']/fit_profile.E*fit_profile.npix, '-', label='Total');
plt.plot(fit_profile.r, betaModel(fit_profile.r, fit['S0'], fit['rc'], fit['beta']), '--', label='beta');
plt.plot(fit_profile.r, fit['B']/fit_profile.E*fit_profile.npix, '--', label='background');
plt.ylabel('Average surface brightness');
plt.xlabel('Radius (pixels)');
plt.legend();
Is this model obviously deficient in some way, or does it look like a decent fit?
I_have_contemplated_this_deeply = False # change to True when true
assert I_have_contemplated_this_deeply
Do inference for a double beta model + background¶
Same as above, for a model with 2 beta components sharing the same center describing the cluster. Store the resulting samples in chains2
, with parameters ordered as in paramnames2
.
%%time
print("Doing something")
Doing something
CPU times: user 25.6 s, sys: 118 ms, total: 25.7 s Wall time: 25.8 s
%%time
print("Getting samples from the posterior")
Getting samples from the posterior
CPU times: user 37min 16s, sys: 15.4 s, total: 37min 32s Wall time: 38min 52s
Testing the double-beta fit¶
Welcome back! We will now look at the same diagnostics as before, for chains2
.
fig, ax = plt.subplots(len(paramnames2), 1, figsize=(20, len(paramnames2)*3));
if chains2.__class__ == list:
    cr.plot_traces(chains2[:min(8,len(chains2))], ax, labels=paramnames2);
else:
    cr.plot_traces(chains2[:min(8,chains2.shape[0]),:,:], ax, labels=paramnames2);
R = cr.GelmanRubinR(chains2)
print("R =", R)
assert np.all(R < 1.1) # see previous comment
R = [1.01007049 1.00990772 1.03011342 1.03870945 1.03788039 1.02453344 1.03833828 1.03903007 1.02631541]
maxlag = 750 # might need to change this
neff = cr.effective_samples(chains2, maxlag=maxlag, throw=True)
print("neff =", neff)
assert np.all(neff > 150) # see previous comment, but even more so
neff = [911.56116356 598.72063906 265.32221939 203.2590344 198.55110551 397.47457603 199.03412226 206.23140254 317.81870231]
Assuming these tests were passed, we'll combine the chains and look at a triangle.
chain2 = np.concatenate(chains2, axis=0)
plotGTC(chain2, paramNames=paramnames2, figureSize=12, customLabelFont={'size':12}, customTickFont={'size':12});
Finally, let's compare a model representative of the peak of the posterior (hopefully the median will do) to the data in the form of a profile, as well as to the single-beta model.
fitvec2 = np.median(chain2, axis=0)
fit2 = {p:v for p,v in zip(paramnames2, fitvec2)}
fit2_profile = Profile(data, fit2['x0'], fit2['y0'], 600.); # hopefully 600 is large enough for a maximum radius?
fig, ax = plt.subplots(1, 2, figsize=(12, 4));
ax[0].loglog(fit2_profile.r, fit2_profile.N/fit2_profile.E, '.');
ax[0].plot(fit2_profile.r, betaModel(fit2_profile.r, fit2['S0'], fit2['rc'], fit2['beta']) + betaModel(fit2_profile.r, fit2['S02'], fit2['rc2'], fit2['beta2']) + fit2['B']/fit2_profile.E*fit2_profile.npix, '-', label='Total');
ax[0].plot(fit2_profile.r, betaModel(fit2_profile.r, fit2['S0'], fit2['rc'], fit2['beta']), '--', label='beta1');
ax[0].plot(fit2_profile.r, betaModel(fit2_profile.r, fit2['S02'], fit2['rc2'], fit2['beta2']), '--', label='beta2');
ax[0].plot(fit2_profile.r, fit2['B']/fit2_profile.E*fit2_profile.npix, '--', label='background');
ax[0].set_ylabel('Average surface brightness');
ax[0].set_xlabel('Radius (pixels)');
ax[0].legend(); ax[0].set_title('Double-beta model');
ax[0].set_ylim(np.min(fit2_profile.N/fit2_profile.E)*0.1, None);
ax[1].loglog(fit_profile.r, fit_profile.N/fit_profile.E, '.');
ax[1].plot(fit_profile.r, betaModel(fit_profile.r, fit['S0'], fit['rc'], fit['beta']) + fit['B']/fit_profile.E*fit_profile.npix, '-', label='Total');
ax[1].plot(fit_profile.r, betaModel(fit_profile.r, fit['S0'], fit['rc'], fit['beta']), '--', label='beta1');
ax[1].plot(fit_profile.r, fit['B']/fit_profile.E*fit_profile.npix, '--', label='background');
ax[1].set_ylabel('Average surface brightness');
ax[1].set_xlabel('Radius (pixels)');
ax[1].legend(); ax[1].set_title('Single-beta model');
ax[1].set_ylim(np.min(fit_profile.N/fit_profile.E)*0.1, None);
Would you say the double-beta model is an improvement over the single-beta?
I_have_contemplated_this_also = False # change to True when true
assert I_have_contemplated_this_also
Quick and dirty model comparison¶
We'll finish by doing a quick comparison of the two models with the DIC, which you've computed in a previous tutorial. You will need a function to compute the log-likelihood, if you somehow managed to get this far without one (remember, the deviance is in terms of the log-likelihood, not the log-posterior). You can thin each chain to a length of 1000 for this part, to make things go faster.
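As a refresher (and only as a sketch), the DIC calculation can go along these lines, assuming a hypothetical loglike(theta) function that returns the log-likelihood for a parameter vector, and a thinned $(N_\mathrm{samples}, N_\mathrm{params})$ array samples for the model in question:

D = np.array([-2.0 * loglike(theta) for theta in samples])  # deviance of each sample
D_at_mean = -2.0 * loglike(samples.mean(axis=0))            # deviance at the posterior mean
pD = D.mean() - D_at_mean                                   # effective number of parameters
DIC = D_at_mean + 2.0 * pD                                  # equivalently, 2*D.mean() - D_at_mean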
# DIC1 = ...
# DIC2 = ...
print("DIC for single-beta:", DIC1)
print("DIC for double-beta:", DIC2)
DIC for single-beta: -379347.2511380915
DIC for double-beta: -381236.94055270386
Does the difference between DIC1
and DIC2
match your intuition above?
Parting thoughts¶
As advertised, this notebook hasn't really tested any statistical concepts that you haven't used before, but instead practiced them in a more complete (and more specifically astrophysical) context than we have mostly done up to now. If you're wondering, the analysis you did above is a slight improvement over what you'd find in most of the literature (where typically the cluster center is asserted and fixed, and the fit done to a 1D profile about it), while simultaneously not doing justice to the modeling of the X-ray background.
Because we're curious how they'll compare among the various data sets used by the class, below are the posterior means of the two models again, and the difference in DIC between them. Since we don't have any particular astrophysical question in mind, we will forgo calculating credible intervals and so on, this one time. You're welcome.
print(safe_dump({p:float(fit[p]) for p in fit.keys()}))
B: 0.05760377559837962
S0: 1.0913064031979797e-05
beta: 0.5523713419075944
rc: 4.297661488980443
x0: 481.2308121007575
y0: 552.9641413615096
print(safe_dump({p:float(fit2[p]) for p in fit2.keys()}))
B: 0.08224146654627448
S0: 1.060412887584387e-05
S02: 1.087409233494593e-06
beta: 1.051480053347683
beta2: 0.6754178792032468
rc: 6.3288677042804835
rc2: 17.851522070789038
x0: 481.24341164621484
y0: 553.016665909281
print("DIC1-DIC2:", DIC1-DIC2)
DIC1-DIC2: 1889.6894146123668
Total wall time for solutions: 49.0 min 4.447072744369507 sec!!!
Endnotes¶
Note 1¶
Using $\Delta$RA and $\Delta$Dec (or any global angular coordinates) in the Pythagorean Theorem is generally not correct, however, even for small distances. This is a whole thing that we will not delve further into, because geometry. Meditate upon the appearance of Antarctica in a Mercator projection if you doubt us.
Note 2¶
You know, like in a real programming language. Also, get off my lawn. Ok, I'm done.