Lab 1.2: Uncertainty Quantification

Lab 1.2: Uncertainty Quantification#

Due: Wednesday, April 30 by 5:00 pm.

In this lab, we will be adding error bars to the color-magnitude diagrams we made in Lab 1.1. By the end of this lab, you will be able to

estimate the gain of the camera and compare to the manufacturer’s value
estimate the error from readout, dark current, and shot noise
propagate into a final uncertainty on magnitudes and color

By this point, you have successfully created calibrated color-magnitude diagrams for both M35 and M3 on each telescope. Amazing! However, in scientific research, it’s not enough to simply present our results — we must also understand the uncertainties that come with them. Every flux measurement, every calibration step, carries with it some level of error. It’s important to quantify these uncertainties because it tells us how much we trust what we see, and where we might need to be cautious.

This is the last part of Lab 1, so we will be leaving most of the structure of the code up to you as preparation for your independent projects. You will need to submit code which calculates all the relevant uncertainties and the same color-magnitude diagrams as Lab 1.1, but now with error bars on both axes!

Poisson Noise#

As the photons from distant sources hit our camera, they kick out electrons that we count in each pixel and create an image. In this document we will assume that one photon produces one electron and use the terms interchangeably unless otherwise specified. The number of electrons has a Poisson distribution, and the uncertainty in this number is called “shot noise”.

If we detect $N$ electrons, the uncertainty on this value due to the shot noise is $\sqrt{N}$. In other words, the signal-to-noise ratio is $S/N=N/\sqrt{N}=\sqrt{N}$. For example, if we detect 10000 electrons from a star, the shot noise uncertainty will be 100 electrons.

But there’s a catch — while the distribution of electrons is Poissonian, the distribution of counts in our image is not. This is because the number of counts output by our detector is not one-to-one with the number of electrons detected.

Gain#

To transform image counts into electrons into the other, we need to determine the gain of the camera. It is usually provided by the camera manufacturer, but we are also going to measure it.

The gain is defined as the ratio of the number of electrons to counts in a pixel:

\[\mathrm{gain} = \frac{n_{e^-}}{n_{\mathrm{counts}}}.\]

This essentially means that 1 count is outputted for every $G$ photons that hit a given pixel in our detector. You might be wondering why this is necessary or even useful. One of the reasons is the existence of the readout noise. Even in an image where all the pixels contain the same number of electrons, we still get fluctuations in the output from the readout. Introducing the gain smooths out these fluctuations in the output image.

We can measure the gain by measuring the differences between the expected and measured standard deviations in our images. For example, suppose 10,000 photons from a star hit our detector. From the properties of the Poisson distribution, we know that the uncertainty (standard deviation) on this number is $\sqrt{10000} = 100$. If the gain of our camera is exactly 1 electron per count, we get:

\[\mu_{\mathrm{counts}} = \frac{\mu_{e^-}}{\mathrm{gain}} = \frac{10000}{1} = 10000\]

\[\sigma_{\mathrm{counts}} = \frac{\sigma_{e^-}}{\mathrm{gain}} = \frac{100}{1} = 100\]

We see that Poisson statistics are obeyed, i.e., $\sigma_{\mathrm{counts}} = \sqrt{\mu_{\mathrm{counts}}}$.

Suppose, now, that the gain is 4 $e^-$/count instead. Then, we get:

\[\mu_{\mathrm{counts}} = \frac{\mu_{e^-}}{\mathrm{gain}} = \frac{10000}{4} = 2500\]

\[\sigma_{\mathrm{counts}} = \frac{\sigma_{e^-}}{\mathrm{gain}} = \frac{100}{4} = 25\]

Clearly, in this case, $\sigma_{\mathrm{counts}} \neq \sqrt{\mu_{\mathrm{counts}}}$.

This then gives us a tool to measure the gain:

\[\mathrm{gain} = \Big ( \frac{\sqrt{\mu_{counts}}}{\sigma_{\mathrm{counts}}} \Big )^2 = \frac{\mu_{\text{counts}}}{\sigma_{\mathrm{counts}}^2}.\]

##Measuring the gain – a recipe

Flat fields offer us a better way to measure the gain because they are (theoretically) uniformly illuminated, i.e., every pixel is receiving the same number of photons, making it easier to assess how $\mu_{\rm{counts}}$ and $\sigma_{\rm{counts}}$ scale.

In the first part of this notebook, we will take you through how to estimate the gain from the flats you’ve already taken.

# If using Colab, mount your Google Drive to access data in the shared folder
from google.colab import drive
drive.mount('/content/drive')

# Import packages!

# numpy is a core package for numerical computing in Python, mostly it does fast array operations
import numpy as np

# matplotlib is a common plotting library (plotly is another good one)
import matplotlib.pyplot as plt
%matplotlib inline

# astropy is useful for reading FITS files, doing coordinate transformations, converting units, and much more
import astropy
import astropy.stats
from astropy.io import fits

# glob and os are useful for navigating your file system
import glob
import os

# miscellaneous
from tqdm.notebook import tqdm
import warnings
from collections import defaultdict

# Specify data directory, and which object we want from which date
data_dir = '/content/drive/MyDrive/PHYSICS100_S2025/data/'
date_string = # TODO: pick a night of observing
telescope_name = # TODO: pick a telescope

# Gather all the filenames for the object and dates
filenames = {k: glob.glob(os.path.join(data_dir, 'calibration', telescope_name, date_string, f'{k}*.fit')) for k in ['flat', 'dark', 'bias']}

# TODO: load the calibration data from the night and telescope chosen above
# Hint: see code from Lab 1.0 and 1.1

# TODO: Compute average bias

# TODO: Compute average darks

# TODO: Pick the set of flats from a single filter (B or V), and for each exposure time:
#
#       1) Create corrected flats by subtracting the average bias and dark from the flat. (Do not normalize!)
#          Store these corrected flats in, e.g., a dictionary with exposure times as the keys.
#
#       2) In neighboring pairs of flats (of the same exposure length), e.g., F_A and F_B:
#             a) Calculate the ratio of the mean signal levels r = F_A/F_B.
#             b) Multiply flat B by this ratio r. This brings flat B to the same average signal level as flat A without affecting its
#                noise structure or the pixel-to-pixel sensitivity variation.
#             c) Subtract the (scaled) flat B from flat A. This ~removes the pixel sensitivity variations.
#             d) Store these pair-differenced flats in, e.g., a dictionary with exposure times as the keys.

all_flats = {}
diff_flats = {}

for time in data['flat']['B'].keys():

  # NOTE: these objects are dictionaries with time as the key!
  all_flats[time] = []
  diff_flats[time] = []

  for flat in data['flat']['B'][time]:

    corrected_flat = # TODO: compute corrected flat with dark and bias
    all_flats[time].append(corrected_flat)

  for i in range(len(all_flats[time])):
    # TODO: append i pair-differenced flats to diff_flats[time]
    # Hint: make sure you are scaling the flats relative to each other

# TODO: Plot one of your pair-differenced flats. The average count value of this image should be ~0. Is this the case?

plt.figure(figsize=(10,6))
plt.imshow(diff_flats[0.3][0], cmap='Grays')
plt.colorbar()
plt.show()

# TODO: In the above image, identify a ~uniform (feature-free) 50x50 pixel region.
#       Then, for every flat in every exposure time, calculate the mean count value in this region.
#       Store these values in a single array.

mean_counts = []

for time in all_flats.keys():
    for flat in all_flats[time]:

      block = flat[:, :] # TODO: pick x and y pixel values that correspond to a 50x50 uniform box
      mean_counts.append() # TODO: append the mean of the chosen block to mean_counts

mean_counts = np.array(mean_counts)

# TODO: For every pair-differenced flat in every exposure time, calculate the variance in the count value in the same region.
#       Divide the resulting variance by 2 to correct for the fact that the variance doubles when you subtract two flats from another.
#       Store these values in a single array.

var_counts = []

for time in diff_flats.keys():
  for flat in diff_flats[time]:

    block = flat[:, :] # TODO: pick the same block as above

    var_counts.append() # TODO: append the variance of the chosen block to var_counts

var_counts = np.array(var_counts)

plt.scatter(, ) # TODO: Plot mean counts as a function of variance. You should see a roughly linear relationship between these two values.

plt.xlabel(r'$\sigma_{\rm{counts}}^2$', fontsize=16, labelpad=10)
plt.ylabel(r'$\mu_{\rm{counts}}$', fontsize=16, labelpad=10)

plt.show()

# TODO: Estimate the gain by using the fact that gain = mean(counts) / variance(counts).

Question: Does the gain you found above match the gain from the header? (Hint: if you are off by a factor of 2, where could that factor have come from? If this is the case, be sure to fix it before submitting!)

Answer:

Sources of Uncertainty#

Broadly speaking, there are four significant sources of noise in our data. They are as follows:

1. Readout noise#

Turning the electrons in a CCD into an image on our computer screen is a relatively complicated process involving transporting the electrons within the CCD, putting them into a capacitor and measuring the voltage, and so on. On average, this voltage measurement correct, but random scatter does occur — this is the readout noise. To measure this effect, we need an image that in principle should have the same pixel value everywhere. Can you think of what type of image exhibits this behavior?

That’s right — the bias! Recall that the bias frame is a “zero-second” exposure dark, meaning all the counts that we measure should come from the charge that is injected uniformly to all pixels (the minimum “bias” signal that brings our CCDs into the linear response region). Any fluctuations that we see are due to the readout noise.

We will call this readout noise $\sigma_{\text{readout}}$. Note that there will be read noise for every individual frame that goes into your coadded image, so think carefully how to estimate the total error. (Hint: the variance of a sum of Gaussians is the sum of the individual variances.)

2. Dark current#

Since our cameras are not cooled to absolute zero, they produce (thermal) electrons even when not exposed to light. We can measure this dark current in our master dark image. We will call this thermal noise $\sigma_{\text{dark}}$. You may neglect this error if you can show it is negligible compared to other sources of noise.

3. Background#

Recall from Lab 1.1 that there is background flux that we subtract from our star flux. The background flux also has noise that we must estimate. One approach is to assume Poisson statistics, but due to the many possible sources of background this may not be a good assumption.

Instead, we will estimate the background noise using the variance of pixels in the background annulus of each star. This will give us a per-pixel background which we can then scale according to the area of the aperture to compute $\sigma_\text{sky}$.

4. Source#

Finally, there is intrinsic Poisson noise in the number of photons coming from our source. We will call this $\sigma_\text{source}$ and estimate this from the counts within each star’s aperture. Each star will have a substantially different value here. Don’t forget to account for the gain before assuming Poisson statistics!

Combined uncertainty#

Since we assume the above effects to be independent from each other, to get a total uncertainty, we can simply sum their variances. Taking a square root of the result gives us the total uncertainty:

\[\sigma_{\text{tot}} = \sqrt{N_{pixels,source}N_{images}\sigma_{\text{readout}}^2 + N_{pixels, source}N_{images}\sigma_{\text{dark}}^2 + \sigma_{\text{source}}^2 + N_{pixels, source}\sigma_{\text{sky}}^2}.\]

Remember: all these different noise components need to be determined from the number of electrons, not the image counts. The simplest approach would be to correct your fluxes according to the gain at the very start, to avoid mistakes. And remember – each star will have a different $\sigma_\text{tot}$!

Propagating to magnitudes#

In the end, we need to place error bars on not the fluxes ($f_*$) but the magnitudes ($m$) of our stars. Remember, flux is related to instrumental magnitude via

\[m_{\mathrm{instr.}} = -2.5 \log_{10}(f_*).\]

In other words, the instrumental magnitude you measure for a star is a function of an uncertain variable. You can propagate your error on $f_*$ to an error on $m_{\mathrm{instr.}}$ using the following.

For a function $g$ of an uncertain variable $x \pm \sigma_x$, the uncertainty on $g$ (i.e., $\sigma_g$) is given by: $$\sigma_g = \sqrt{ \Big( \frac{\partial g}{\partial x} \Big )^2 \sigma_x^2}.$$

# TODO: Put error bars on each point on your color-magnitude diagrams, there should be errors in color and magnitude
!
#       (Hint: Reuse code from Lab 1.1 to make your lives much easier.)

We have given the general idea of how to measure and save uncertainties below. Note that this code block assumes your calibrated, coadded images live in a dictionary called data with keys corresponding to filters.

# Find the flux (in counts/second) within an aperture around each star
source_data = {}

r1 = # TODO: pick an inner radius
r2 = # TODO: pick an outer radius

for filter in ['B', 'V']:

    #TODO : measure source counts

    # this code maps inner circle and outer annuli to actual pixel location in the image
    ny, nx = data[filter].shape
    Y, X = np.ogrid[:ny, :nx]
    var_bkg = []

    for i in tqdm(range(len(sources))):

      dR = np.sqrt((X - sources['x'][i])**2 + (Y - sources['y'][i])**2)
      annulus_mask = (dR >= r1) & (dR < r2) & ~np.isnan(data[filter])
      annulus_pixels = data[filter][annulus_mask]
      var_bkg.append() # TODO: append measured background variance to var_bkg

    # TODO: perform flux and magnitude caluclations
    # TODO: perform uncertainty calculations and add to source_data dictionary

    source_data[filter]['flux'] =
    source_data[filter]['mag'] =

    source_data[filter]['flux_err'] =
    source_data[filter]['mag_err'] =

# TODO: Plot your color magnitude diagram with error bars!
# Note that magnitudes should be corrected for the zero point offset.
# Magnitude correction can be done before or after calculating uncertainties
# since we assume this adds no error to our magnitudes

Question: Do you notice a pattern in the size of the error bars? Why could this be?