Dr. Adam Mantz is a Research Scientist in the Kavli Institute for Particle Astrophysics and Cosmology (KIPAC). He has a background in observational astrophysics and cosmology, specializing in studies of clusters of galaxies and the intracluster medium. For much of the past two years, he was on "sabbatical", working part-time on medical imaging research in the Division of Oncology of the Stanford School of Medicine. He has been involved in the development and teaching of this course since its inception, and has adapted parts of it for the LSSTC Data Science Fellowship Program. Any time the first-person "I" appears in these notes, you may blame him. Any time the first-person "we" appears, it's probably still him being presumptuous.$^1$
Our goal is that students taking this course will:
- Understand and apply the principles underlying statistical inference from data, and the role of probabilistic modeling in inference
- Gain familiarity with various numerical algorithms used in statistical inference
- Apply the above principles and skills to realistic astrophysical inference problems
In a nutshell, this class is about how data are turned into conclusions. The examples and tutorials are taken from astrophysics, but otherwise the content is extremely general.
To be a little more concrete, consider this cartoon of the scientific process:
"Turning data into conclusions" broadly refers to the bold items. In particular, we will not spend time on "data reduction", by which we mean the initial processing steps applied to raw data recorded by some instrument. These details tend to be highly specific to a given detector (or detector type), wavelength range, telescope system, and so on. On the other hand, we will be introducing and working with real astronomical data, so you'll get some experience with it if this is new to you. Even so, there's far too much variety for us to touch on every type of data you might encounter!
This course will also be focused on a particular approach to drawing conclusions from data, known as Bayesian analysis. This is because, as we'll see, Bayes' Law provides a rigorous basis for formalizing what the phrase "turning data into conclusions" (more properly called inference) actually means. The upshot is that we will not be marching through a survey of ad hoc methods and the problems they might be applied to. Instead, the goal will be to carefully define the question we are asking of the data, and then decide what techniques and/or (justifiable) approximations will get us to an answer. This, we hope, will be appealing to students with a physics background.
Finally, this course aims to be practical. So, lest the previous paragraph give you the wrong idea, we will not have problem sets where you are asked to prove theorems or show that such-and-such is true. (No more than one, anyway.) With the exception of a bit of review at the start, the assigned problems will be tutorials that walk through a real-life analysis, albeit usually a simplified one, so that you gain experience putting the methods we will discuss into practice. The goal is that, by the end, you will be able to apply the reasoning and methods of Bayesian analysis in your own work.
Strictly speaking, any procedure applied to data could be called a "statistical method". However, not all such methods allow us to rigorously draw conclusions of interest - in full generality, all we can conclude from the application of some method is what the output of that method is. In contrast, our goal with inference is usually to learn something about a model from the data, not just the number(s) that an algorithm turns the data into. This requires quantifying the probability of various possibilities in light of the data, and this in turn means applying a principled, mathematical framework - or, at least, starting from one, even if we end up making approximations. That isn't to say that there's no place for other methods, but we think it's important and generally useful to understand Bayesian reasoning, regardless. Not to mention, it's enough to fill a quarter on its own.
So, to be perfectly upfront, this class will not significantly delve into
Again, these methods have their uses (and can be interpreted in the Bayesian framework in the right limits), but will not be a major focus.
We use the Python programming language extensively, and it is not generally feasible to substitute a different language. Some previous experience with Python, or the capacity and desire to pick it up really, really fast, is therefore required. You do not need to have any particular experience beyond the basics, or with any specific packages like numpy or scipy, though it doesn't hurt.
Previous exposure to the concepts and calculus of probability would be very helpful, though we will review them briefly. In this context, undergraduate quantum mechanics or statistical mechanics will do, as would (naturally) an undergraduate statistics/probability course. Again, this is a case where the amount of effort it takes to get going at the beginning of the course will depend on how much of the learning curve you've already climbed.
Notes: What were once lecture materials for this class have been expanded and turned into (we hope) readable notes. If you're reading this, then you know where to find them already. There is no textbook associated with the class, so the notes (in combination with the tutorials) are intended to be comprehensive.
Tutorials refer to Python-language Jupyter notebooks where you will put the material covered in the notes into practice. Often these will consist of a partially worked inference problem, where the code required to illustrate an understanding of the subject at hand is left out, and most other "background noise" code (the kind that doesn't demonstrate an understanding of statistics, yet is annoying to write) is provided. Normally, a tutorial notebook will have several "checkpoints" where you can compare your results to a known solution to make sure that you're on the right track.
Things this course doesn't have:
Here is a very abbreviated summary of some of the critical ideas in this course. If they don't make sense now, don't worry; we'll be unpacking them much more later on.
i) All data we collect include some degree of randomness
This can be intrinsic to the source, a result of the measurement process, or effective randomness due to the involvement of some process we don't know or understand perfectly.
ii) Any conclusions we draw must therefore incorporate uncertainty
This means we should describe both the data and conclusions in the language of mathematical probability.
Our conclusion will take the form: the probability that something is true in light of (given) the data we collected,
$P(\mathrm{something}|\mathrm{data})$.
By the basic laws of probability, this can be written
$P(\mathrm{something}|\mathrm{data}) = \frac{P(\mathrm{data}|\mathrm{something}) \, P(\mathrm{something})}{P(\mathrm{data})}$.
We'll spend much more time understanding this later, but, importantly, it means that
iii) There is a correct answer
As in physics, the theory tells us the solution. The challenge is in evaluating it.
In other words, we aren't free to pick or dream up a procedure that kinda, sorta feels like it should be about right. The equation above is indisputably correct, but how to use it, and when/how to approximate it, is something to be learned.
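To make the equation above concrete, here is a minimal numerical sketch of Bayes' Law for an invented toy problem with just two competing hypotheses, H1 and H2 (all of the probabilities below are made up for illustration, not taken from any real analysis):

```python
# Prior probabilities, P(something): what we believed before seeing the data.
# These are invented numbers for a hypothetical two-hypothesis problem.
prior = {"H1": 0.5, "H2": 0.5}

# Likelihoods, P(data|something): how probable the observed data are
# under each hypothesis (again, invented for illustration).
likelihood = {"H1": 0.8, "H2": 0.3}

# Evidence, P(data): a sum over hypotheses (the law of total probability).
evidence = sum(likelihood[h] * prior[h] for h in prior)

# Posterior, P(something|data), via Bayes' Law.
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}

print(posterior)  # the posterior probabilities sum to 1
```

The hard part in real problems is not this arithmetic, but writing down likelihoods and priors that honestly describe the experiment; that is what most of the course is about.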
Within this framework,
iv) Data are constants
Even though they are generated randomly by the Universe, data that we have already collected are fixed numbers.
Much of our job boils down to building a model that predicts, probabilistically, what data we might have measured instead.
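As a sketch of what "predicting, probabilistically, what data we might have measured instead" can look like, suppose (hypothetically) our model says a detector records photon counts drawn from a Poisson distribution with some mean rate; the observed count is a fixed number, but the model lets us generate the ensemble of data sets we might have seen (the rate, seed, and observed value below are all invented):

```python
import numpy as np

rng = np.random.default_rng(42)
mu = 5.0       # model parameter: mean expected counts (invented)
observed = 7   # the data we actually collected: a fixed number

# The model predicts the distribution of data we *might* have measured:
mock_data = rng.poisson(mu, size=10000)

# e.g., the model-predicted probability of measuring 7 or more counts
frac = np.mean(mock_data >= observed)
print(f"P(counts >= {observed}) is roughly {frac:.3f}")
```

The observed value never changes here; all the randomness lives in the model's predictions, which is exactly the sense in which data are constants.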
v) Things we don't know with perfect precision can be mathematically described as "random"
That is, we use probabilities to model things that are uncertain, even if they are not "truly" random.
As the quarter unfolds, we will cover