In this notebook, we will work through an application of the change of variables formula for probability densities:
$p(y) = p(x)\left|\frac{dx}{dy}\right|$.
Let $\theta$ be uniformly distributed on $-0.95\frac{\pi}{2} < \theta < 0.95\frac{\pi}{2}$, and consider the function $b(\theta)=\tan(\theta)$.
TutorialName = 'probability_transformations'
exec(open('tbc.py').read()) # define TBC and TBC_above
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Given the PDF of $\theta$ and the function $b(\theta)$, what is the PDF of $b$, $p(b)$? (As simple as it is, you might want to explicitly write down $p(\theta)$ first.)
TBC()
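If you want to check your derivation against something, here is a sketch of one way it can go, consistent with the change of variables formula at the top of the notebook. Since $\theta$ is uniform on an interval of total width $0.95\pi$,

$$p(\theta) = \frac{1}{0.95\pi} \quad \mathrm{for}~ |\theta| < 0.95\frac{\pi}{2},$$

and zero otherwise. With $b = \tan(\theta)$ we have $\theta = \arctan(b)$ and $\frac{d\theta}{db} = \frac{1}{1+b^2}$, so

$$p(b) = p(\theta)\left|\frac{d\theta}{db}\right| = \frac{1}{0.95\pi\,(1+b^2)} \quad \mathrm{for}~ |b| < \tan\left(0.95\frac{\pi}{2}\right),$$

and zero outside that range.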
Next, define $p(b)$ as a function.
# p(b) according to the change of variables formula
def p_b(b):
return TBC()
TBC_above()
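For reference, one way the completed function might look, assuming the derivation sketched above ($p(\theta) = 1/(0.95\pi)$ on its support, and $|d\theta/db| = 1/(1+b^2)$). This is only a sketch, not necessarily the notebook's official solution; the name `p_b_sketch` is used to avoid clobbering your own `p_b`:

```python
import numpy as np

def p_b_sketch(b):
    # tan maps (-0.95*pi/2, 0.95*pi/2) onto (-tan(0.475*pi), tan(0.475*pi)),
    # so p(b) is zero outside that range
    b = np.asarray(b, dtype=float)
    bmax = np.tan(0.475 * np.pi)
    # change of variables: p(b) = p(theta) * |dtheta/db| = 1/(0.95*pi) * 1/(1+b^2)
    return np.where(np.abs(b) < bmax, 1.0 / (0.95 * np.pi * (1.0 + b**2)), 0.0)
```

Note that the density integrates to exactly 1 over its finite support, since $\int db/(1+b^2) = \arctan(b)$ undoes the original transformation.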
We can also carry out the transformation numerically on a grid. The cell below defines a grid of $\theta$ values covering the interval where $p(\theta)$ is non-zero.
theta_grid = np.arange(-0.95, 0.96, 0.05)*0.5*np.pi
Evaluate $p(\theta)$ at these points.
TBC() # tgrid_p_theta =
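If you're stuck, a minimal sketch (assuming the uniform density above): $p(\theta)$ is simply the constant $1/(0.95\pi)$ at every grid point.

```python
import numpy as np

theta_grid = np.arange(-0.95, 0.96, 0.05) * 0.5 * np.pi
# p(theta) is uniform on (-0.95*pi/2, 0.95*pi/2), so the density is constant on the grid
tgrid_p_theta = np.full_like(theta_grid, 1.0 / (0.95 * np.pi))
```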
If that was done correctly, a trapezoid-rule integration over the grid evaluations should return 1.0, or something very close to it.
# check normalization
np.trapz(y=tgrid_p_theta, x=theta_grid)
Next, transform the gridded evaluations of $p(\theta)$ to $p(b)$ by applying the same transformation of variables as before.
# transform to p(b)
TBC() # tgrid_p_b =
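One possible completion, again a sketch rather than the official solution: multiply each grid evaluation by the Jacobian factor $|d\theta/db| = 1/(1+b^2) = \cos^2\theta$ at that grid point.

```python
import numpy as np

theta_grid = np.arange(-0.95, 0.96, 0.05) * 0.5 * np.pi
tgrid_p_theta = np.full_like(theta_grid, 1.0 / (0.95 * np.pi))
# p(b) = p(theta) * |dtheta/db|, and dtheta/db = 1/(1 + tan(theta)^2) = cos(theta)^2
tgrid_p_b = tgrid_p_theta * np.cos(theta_grid)**2
```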
Let's plot the function and grid evaluations for sanity's sake.
b_grid = np.arange(-13.0, 13.01, 0.01)
plt.rcParams['figure.figsize'] = (14.0, 5.0)
plt.plot(np.tan(theta_grid), tgrid_p_b, 'bo', label='grid');
plt.plot(b_grid, p_b(b_grid), 'r-', label='analytic');
plt.xlabel('b');
plt.ylabel('p(b)');
plt.legend();
We can check that the gridded values are still normalized (accounting for the fact that the grid is not evenly spaced in $b$). Note that the numerical integral here could well be off by a few percent, since the transformed grid spacing does not cover the tails of the function well.
# check normalization
np.trapz(y=tgrid_p_b, x=np.tan(theta_grid))
As a further sanity check, you can compare your solution to the one saved below. The difference should be essentially zero everywhere.
sol = np.loadtxt('solutions/trans.dat')
plt.rcParams['figure.figsize'] = (14.0, 5.0)
plt.plot(np.tan(theta_grid), tgrid_p_b-sol, 'bo');
plt.xlabel('b');
plt.ylabel('my p(b) grid - known solution');
This transformation business is kind of a pain, what with the calculus and possibly ending up with non-uniform grid points. It turns out that life is much more straightforward when dealing with samples of a PDF rather than manipulating the PDF itself.
To demonstrate, generate a large number (say, $10^5$) of samples from $p(\theta)$, and straightforwardly transform them to samples of $b$ using the definition of $b(\theta)$.
TBC()
# theta_samples = np.random.uniform( ...
# b_samples =
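A sketch of how this might look. The hint above uses `np.random.uniform`; this version uses NumPy's newer `default_rng` interface (with an arbitrary seed for reproducibility), which is equivalent for our purposes. No Jacobian factor is needed when transforming samples: we just apply $b(\theta)$ directly.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded so the sketch is reproducible
n_samples = 100000
theta_samples = rng.uniform(-0.95 * np.pi / 2, 0.95 * np.pi / 2, n_samples)
# transforming samples is just function application, no Jacobian required
b_samples = np.tan(theta_samples)
```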
We can now compute an estimate of the PDF of $b$ from these samples, say a histogram. In the limit of many samples, the histogram should agree very well with the analytic PDF; you can play around with the number of samples and histogram bins to see how that changes things.
plt.rcParams['figure.figsize'] = (14.0, 5.0)
plt.plot(np.tan(theta_grid), tgrid_p_b, 'bo', label='grid');
plt.plot(b_grid, p_b(b_grid), 'r-', label='analytic');
plt.hist(b_samples, bins=100, density=True, histtype='step', color='k', linewidth=2, label='samples');
plt.xlabel('b');
plt.ylabel('p(b)');
plt.legend();