Tutorial: X-ray Image Data

In which we introduce a real astronomical data set, namely an image produced from X-ray CCD data. There's a fair bit of domain-specific information here which is not strictly necessary for this class, but it's useful stuff to see if you haven't worked with imaging data before (regardless of wavelength). Do note that we are still glossing over some things, though, since a completely rigorous analysis that accounts for all instrumental and systematic effects would be more involved than we want any tutorials in this class to be. In any case, this notebook is just about looking at the data and describing what's there.

Note: This tutorial counts for nothing and does not need to be turned in. It simply serves as an introduction to the data set used in a later tutorial.

Nature of the data

Modern X-ray CCDs are technologically similar to the CCDs used in optical astronomy: when a photon hits a pixel, one or more electrons are promoted into the conduction band and trapped there until being read out. The main practical difference is that X-ray photons are rarer and their energies much higher.

This means that:

Let's look at some processed data from XMM-Newton for galaxy cluster Abell 1835.

Here the raw "event list" of pixel activations has been processed to form an image, meaning that, other than a broad selection on photon energy, the spectral information has been discarded.

XMM actually has 3 CCD cameras, but we'll just work with 1 for simplicity, and with just one of the available observations.

We'll still need 2 files:

These files are in the course repository,$^1$ respectively:

These are an image produced from 1-3 keV events captured by the MOS2 camera in XMM's first observation of A1835, way back in 2001, and the corresponding exposure map.

Both files are in FITS image format, which we can read in using astropy.io.fits (here aliased to its older name, pyfits).

Let's see what we've got:

imfits is a FITS object, containing multiple data structures. The image itself is an array of integer type (remember, counts!), and size 648x648 pixels, stored in the primary "header data unit" or HDU, and accessed via the data method of the FITS object. The other HDUs hold tables containing the "good time intervals" defined in earlier data processing, which we can ignore for our purposes.

exfits contains only the exposure map, with floating point type.

Here we extract the image (that is, the array) data from each object as numpy arrays.

Note: If we wanted im to be floating point for some reason, we would need to cast it, as in im = imfits[0].data.astype('float64').

Let's have a look the image and exposure map. It's often helpful to stretch images on a logarithmic scale because some sources can differ in brightness by orders of magnitude. The exposure map varies much less, so a linear scale works better in that case.

Aside: An endless source of confusion and bugs is the fact that images are conventionally indexed by line and sample (vertical and horizontal position), while we conventionally order coordinates in 2 dimensions $x,y$ (horizontal and vertical). In addition, the usual convention for FITS images orders the lines from the bottom of the image up rather than the top down. Hence, to display the image the right way up, we need to use the origin='lower' option to imshow above.

Note that information from 7 different CCDs in the MOS2 camera have been combined here, and that X and Y in the image arrays correspond to celestial coordinates (right ascension and declination) rather than X and Y on a given detector or in the focal plane.

In the image, we can see:

  1. Galaxy cluster Abell 1835 (the big blob in the center).
  2. Various other sources (smaller blobs). These are point-like sources - mostly active galactic nuclei (AGN) - that have been smeared out by the telescope's point spread function (PSF).
  3. A roughly uniform background, consisting of unresolved AGN, diffuse X-rays from the Galactic halo and local hot bubble, and events due to particles (solar wind protons and cosmic rays) interacting with the CCD.

The exposure map shows:

  1. An overall gradient with radius - this is the vignetting function of the telescope.
  2. Clear boundaries between the 7 CCDs that make up the MOS2 camera, and a number of "bad rows/columns" where the exposure has been set to zero (usually this is due to issues with the readout electronics rather than damaged pixels).
  3. A vaguely circular cut-out shape along the edge. This is applied in preprocessing to eliminate pixels where the effective exposure is essentially zero. All of the CCDs are, in fact, square. In telescopes of this kind, it's common for dome of the "corner" regions of the field of view to be exposed to calibration sources which we wouldn't want included in our science image. Corners that lack such sources are also sometimes used to get a measurement of the portion of the background that is not focussed by the optics (particle-induced events).

Features (2) and (3) are also visible in the counts image. The vignetting is less obvious, since we mostly see background in regions where the vignetting is significant, and part of the background is due to particles that are not vignetted (they aren't focused by the optics, and hit the detector from all directions approximately equally).

Working with the data

As we've seen above, the image is encoded in a standard numpy array. But, recalling the aside, we need to be careful about indexing the array. Specifically, we need to remember that the first array index corresponds to $y$ (the vertical direction in the image), and the second index corresponds to $x$ (horizontal). Alternatively, we could create arrays that hold the $x$ and $y$ indices corresponding to each pixel. (Note that we will be using the word "pixel" to refer to an entry in the image/exposure map arrays, not a physical pixel in the camera.) This is a little wasteful in terms of computer memory, but allows us to do calculations involving the positions of and distances between points in the image without constantly relying on our human memories to recall something counterintuitive.

Therefore, below is a simple class that should assist in defining these index arrays (imx and imy) and displaying the data. It also provides a way to cut out a subset of the image, carrying along the exposure map and index arrays consistently; this isn't strictly necessary for us, but we might as well remove the empty space around the imaged field of view.

Let's create such an object holding the original data:

To quickly illustrate what imx and imy are for, in case it wasn't clear, let's look at them:

The range of values in each is what you'd expect for indices into arrays of this size:

Next, let's make a cut-out around the very center of the image, and look at it.

Note that imx and imy in stamp hold the Image coordinates of each pixel with respect to the original image, which seems like a potentially useful thing.

Something we will want to do when modeling these data is to compute the distance of each pixel from some specified position, in units of pixels. As a quick test, complete the following to compute an array holding the distance of each pixel in the stamp image from $x=330, y=340$ (roughly the center of the cluster). The entire field of view is a fraction of a degree in size, so we can use Euclidean distance instead of worrying about the curvature of the sky for this purpose.

Checkpoint: The minimum should, of course, be dist=0, and should clearly correspond to the origin we chose above.

So now we can easily compute the distance between all pixels and some reference, in units of the pixel size. We mentioned above that the field of view is small enough to use the Euclidean distance formula, which prompts the question - just what is the size of a pixel in this image?

To find out, we can consult the FITS header of the image. The relevant keywords are near the bottom of the header, and begin CTYPE, CRPIX, CVAL, etc. Note that the exact keywords can vary, since FITS files can have multiple coordinate systems defined. In this case, the header tells us that axis 1 (X) is Right Ascension, axis 2 (Y) is DEClination, and both have a pixel length of 0.0011111... in units of degrees, so 4 arcseconds. (CDELT1 is negative because RA increases to the left by convention. This is another convention that takes some getting used to - even experienced astronomers occasionally get confused by the fact that east is left on the sky - but unlike some conventions, this one actually makes sense if you think about it for long enough.)

If we need to, header values can be extracted easily. This extracts the pixel size in the X direction, and converts it to arcseconds (CUNIT1 tells us that it is originally in units of degrees).

Finishing up

That's it! You can now move on to the real tutorial that uses these data.

Endnotes

Note 1

The data are in the public domain, and can be downloaded from https://heasarc.gsfc.nasa.gov. However, the images and exposure maps are not raw data, and the files in the archive occasionally change slightly as newer, presumably better calibration products are used to regenerate them. To be able to compare quantiatively with our solutions, you should use the files in the course repository (though the differences from the archive versions are not large).