Notes: Getting Started
We will be using Jupyter Notebooks with IPython extensively in this course. The simplest way to work with these is on Google Colab. If you prefer to work offline or on another system, I strongly recommend a separate Python installation (or at least a separate conda environment) for this course to avoid package conflicts (instructions below). All the class notebooks are tested both on Colab and on a clean conda installation before the term begins. If you ignore this advice and end up with computing issues, you will find little sympathy.
The tutorial notebooks can be found here. Every enrolled student will be given access to their own data folder on Google Drive, containing data files that are unique to them. Others can use the "public" data folder. What to do with these files depends on whether you are working in Colab or not (see below).
If you are not comfortable with programming in general
... then you will not get much out of this course, given that working with data inevitably requires coding. We recommend getting some practical experience with programming first, rather than trying to learn it concurrently.
If you are comfortable programming but are new to Python
... then you have a learning curve ahead of you, even though there will be a lot of example code to learn from. We recommend checking out these tutorials for Python and NumPy (especially the Quickstart and "absolute basics for beginners"). Everyone can no doubt also appreciate the documentation for the Python standard library, NumPy and SciPy.
Setting up Colab
Colab is a cloud-based, Jupyter-like environment for working with notebooks like those used in this class. Within Colab, you can directly access your data folder on Drive using code already present in the notebooks. This code assumes that your folder is accessible at MyDrive/Physics267_data. You can arrange this with Drive's "Organize: Add shortcut" functionality: create a shortcut to the data folder in your root Drive folder, then rename that shortcut "Physics267_data".
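For orientation, here is a minimal sketch of what Drive-access code in Colab typically looks like. The class notebooks' own code may differ. The sketch assumes the conventional mount point /content/drive, used with google.colab.drive.mount, and a shortcut named Physics267_data in your root Drive folder. colab_data_dir is a hypothetical helper, not part of the course materials.

```python
from pathlib import PurePosixPath

# Conventional Colab mount point (an assumption; the course code may use another).
DRIVE_MOUNT = "/content/drive"


def colab_data_dir(assignment, mount=True):
    """Hypothetical helper: return the Drive path holding one assignment's data.

    Assumes a shortcut named Physics267_data exists in the root of "My Drive".
    """
    if mount:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import drive
        drive.mount(DRIVE_MOUNT)  # asks you to authorize access to your Drive
    return PurePosixPath(DRIVE_MOUNT) / "MyDrive" / "Physics267_data" / assignment


# In Colab: data_dir = colab_data_dir("assignment")
```

The mount does not survive a runtime reset, so it has to be repeated whenever a deactivated notebook is reopened.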
For those who are familiar with Jupyter but not Colab, a couple of things to know:
- You have no permanent conda/pip environment, but each notebook instance has a semi-permanent state. Instances are deactivated after a period of inactivity, at which point re-opening a notebook is like restarting the kernel in Jupyter.
- Many standard packages are preinstalled, and others can be installed from within a notebook with !pip install <package>. This code is already present in the notebooks where needed, although you will need to uncomment it.
- Whether Colab saves a notebook with or without its outputs is a per-notebook setting that you can change.
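A restarted Colab instance forgets anything installed with !pip install, so install cells must be re-run. One way to make such cells cheap to re-run is to install only what is not already importable. The following is a sketch under that idea. ensure_installed is a hypothetical helper, not part of the class notebooks. It calls pip through the running interpreter via subprocess, and it assumes you supply the mapping from import name to pip name, since the two can differ.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(packages):
    """Install any missing packages with pip; return the pip names that were installed.

    `packages` maps an importable top-level module name to its pip distribution name.
    """
    missing = [
        dist for module, dist in packages.items()
        if importlib.util.find_spec(module) is None
    ]
    if missing:
        # Similar to running `!pip install ...`, but tied to this interpreter's pip.
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])
    return missing


# Example: installs pygtc only if it cannot already be imported.
# ensure_installed({"pygtc": "pygtc"})
```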
The shared nature of a cloud service and the need to reinstall packages after an instance has been restarted can be irritants. Still, Colab is a good option if you are not comfortable managing packages on your own system, or are unable to. As of mid-August 2024, all the class notebooks were functional in Colab, although this could change in the future as Google updates the preinstalled packages.
Setting up another system
Python
Assuming you will be using a system that you have administrative access to, the first thing you will need is Python (version 3 or later). Note that the Python included as part of macOS doesn't count: it is highly restricted because it is part of the OS, so you will want to install a separate copy if you are using a Mac.
The fastest and recommended way to get up and running is with Miniconda. This provides a user-specific installation, so there is no possibility of conflicts or permissions issues with the system you are working on. If you're tempted to go for full-blown Anaconda instead... please don't. Similarly, if your plan is to use an installation you already have (without creating a fresh environment), please don't. Every single package conflict or related issue we have encountered has been with a student who decided that they, uniquely, would be fine without a clean Miniconda installation. They were wrong.
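A quick way to confirm which Python you are actually running, from a script or a notebook cell, is sketched below. In a correctly set-up course environment, the executable should point inside your Miniconda directory rather than to a system location such as /usr/bin/python3. conda_env should name the environment you created (p267 if you use the commands below). This assumes conda's usual behavior of setting CONDA_DEFAULT_ENV on activation. describe_interpreter is a hypothetical helper.

```python
import os
import sys


def describe_interpreter():
    """Summarize which Python is running, to catch use of the wrong installation."""
    return {
        "version": sys.version.split()[0],
        # The most direct indicator of which installation is in use.
        "executable": sys.executable,
        # Set by `conda activate`; None if no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "is_python3": sys.version_info.major >= 3,
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key:>10}: {value}")
```

Inside Jupyter, CONDA_DEFAULT_ENV reflects the shell that launched the server, so trust sys.executable if the two disagree.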
Jupyter and additional Python packages
You'll need a number of Python packages in this course. The shopping list is given in the requirements.txt file, which to the best of our knowledge reflects which packages are available through conda (which should be used preferentially) as opposed to pip (except for Jupyter, which as of 2023 should be installed through pip to avoid known issues). You will need to add conda-forge to the default list of channels to get everything. Probably the simplest and most robust method is to use this environment file, which enumerates specific versions of all packages and is functional for all notebooks in this class as of mid-August 2024. After downloading the file, run
conda create -n p267 -c conda-forge --file environment.txt
If you would rather do things manually, we still strongly advise you to install the necessary packages (and only these) in a separate, named environment, as in
conda create -n p267 -c conda-forge astropy dynesty emcee matplotlib numba numpy pandas pygtc regions scipy statsmodels
followed by
conda activate p267
pip install bcgs incredible jupyterlab lmc lrgs
However, we can't guarantee that future package updates won't break anything.
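Once the environment is built, a quick check that every package named in the conda and pip commands above is present can save debugging time later. The sketch below uses importlib.metadata from the standard library (Python 3.8+). Packages installed through conda usually register the same metadata, but that is an assumption worth checking if something reports as missing. The list is copied from the commands above and may need updating if requirements.txt changes. The check reports what is installed, not whether versions match the environment file.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the conda and pip commands above.
COURSE_PACKAGES = [
    "astropy", "dynesty", "emcee", "matplotlib", "numba", "numpy", "pandas",
    "pygtc", "regions", "scipy", "statsmodels",
    "bcgs", "incredible", "jupyterlab", "lmc", "lrgs",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(COURSE_PACKAGES).items():
        print(f"{name:<12} {ver or 'MISSING'}")
```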
Getting data
If you are not in Colab, the notebooks assume that the data for a given notebook are in a directory called data at the same location as the notebook itself. You will have to download each folder from Google Drive and arrange this yourself. For example, the Drive folder Physics267_data/assignment would be present locally as tutorials/assignment/data if the notebook itself is tutorials/assignment/assignment.ipynb.
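Arranging the folders by hand is straightforward, but it can also be scripted. Below is a minimal sketch. It assumes you have already downloaded one assignment's folder (and unpacked it, if Drive delivered a zip archive). local_data_dir and install_data are hypothetical helpers, not part of the course materials.

```python
import shutil
from pathlib import Path


def local_data_dir(notebook_path):
    """Directory the notebooks expect: a folder named `data` beside the notebook."""
    return Path(notebook_path).parent / "data"


def install_data(downloaded_folder, notebook_path):
    """Copy a folder downloaded from Drive into place next to the notebook.

    Returns the destination directory.
    """
    dest = local_data_dir(notebook_path)
    # dirs_exist_ok (Python 3.8+) allows re-running this to refresh files.
    shutil.copytree(downloaded_folder, dest, dirs_exist_ok=True)
    return dest


# install_data("Downloads/assignment", "tutorials/assignment/assignment.ipynb")
# copies the folder to tutorials/assignment/data
```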
Working with the notebooks
The Demo tutorial notebook walks through the typical structure of a tutorial and how tutorials are completed.