So far in the notes and problems, we've mostly avoided one of the most commonly used probability distributions, the Gaussian or normal distribution:
$\mathrm{Normal}(x|\mu,\sigma) \equiv p_\mathrm{Normal}(x|\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp \left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$.$^1$
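To keep the parametrization straight, here is a quick sketch comparing a hand-coded density (written in terms of the standard deviation, as in the equation above) with `scipy.stats.norm`; the specific input values are arbitrary:

```python
import numpy as np
import scipy.stats as st

def normal_pdf(x, mu, sigma):
    # density of Normal(x|mu,sigma), with sigma the standard deviation
    return np.exp(-0.5 * ((x - mu) / sigma)**2) / (np.sqrt(2.0 * np.pi) * sigma)

# agrees with scipy, whose `scale` argument is also the standard deviation
print(normal_pdf(1.0, 0.0, 2.0), st.norm.pdf(1.0, loc=0.0, scale=2.0))
```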
There are two reasons for this:
That said, it is certainly common to use Gaussian distributions in practice, particularly in cases where
Gaussians have lots of nice mathematical features that make them convenient to work with when we can. For example, see a list of identities for the multivariate Gaussian here and a whole lot of math here.
There are a couple of cases that it's useful to work through if you haven't before, to build intuition. This is also good practice to reinforce some of the probability manipulations and reasoning you've learned.
We'll look at
TutorialName = 'gaussians'
exec(open('tbc.py').read()) # define TBC and TBC_above
import numpy as np
import scipy.stats as st
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
%matplotlib inline
The product of Gaussians comes up, for example, when the sampling distributions for different data points are independent Gaussians, or when the sampling distribution and prior are both Gaussian (this is a conjugate pair).
So, consider the product $\mathrm{Normal}(x|\mu_1,\sigma_1) \cdot \mathrm{Normal}(x|\mu_2,\sigma_2)$.
Show that this can be manipulated into a different product of two Gaussians,
$\mathrm{Normal}(x|\mu_1,\sigma_1) \, \mathrm{Normal}(x|\mu_2,\sigma_2) = \mathrm{Normal}(x|\mu_a,\sigma_a) \, \mathrm{Normal}(0|\mu_b,\sigma_b)$,
where
$\sigma_a = \left(\sigma_1^{-2} + \sigma_2^{-2}\right)^{-1/2}$; $\mu_a = \sigma_a^2 \left(\frac{\mu_1}{\sigma_1^2} + \frac{\mu_2}{\sigma_2^2}\right)$; $\mu_b = \mu_1 - \mu_2$; $\sigma_b = \left(\sigma_1^2 + \sigma_2^2\right)^{1/2}$.
If $x$ were a model parameter, and $\mu_i$ and $\sigma_i$ were independent measurements of $x$ with error bars, how do you interpret each term of this factorization?
TBC() # answer in markdown
Check your solution by plugging in some values for $x$, $\mu_i$ and $\sigma_i$. The cell below will then compare the log of both sides of the equation above. The exact value you get will depend on exactly what numbers you choose, but generally (checkpoint) it should be pretty small, say $<10^{-10}$ in absolute value.
TBC()
# pick some values (where m is mu, s sigma)
# x =
# m1 =
# s1 =
# m2 =
# s2 =
# compute things
# sa =
# ma =
# mb =
# sb =
print('This should be a pretty small number:',
st.norm.logpdf(x,m1,s1) + st.norm.logpdf(x,m2,s2) - ( st.norm.logpdf(x,ma,sa) + st.norm.logpdf(0,mb,sb) ) )
For later, note that a multidimensional version of this identity also holds. Here $\Sigma$ is a covariance matrix, so we are missing some "squares" compared with the expressions above in terms of standard deviations (lower-case $\sigma$), but otherwise things are very similar.
$\mathrm{Normal}(x|\mu_1,\Sigma_1) \, \mathrm{Normal}(x|\mu_2,\Sigma_2) = \mathrm{Normal}(x|\mu_a,\Sigma_a) \, \mathrm{Normal}(0|\mu_b,\Sigma_b)$,
where
$\Sigma_a = \left(\Sigma_1^{-1} + \Sigma_2^{-1}\right)^{-1}$; $\mu_a = \Sigma_a\left(\Sigma_1^{-1}\mu_1 + \Sigma_2^{-1}\mu_2\right)$; $\mu_b = \mu_1 - \mu_2$; $\Sigma_b = \Sigma_1 + \Sigma_2$.
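A quick numerical sanity check of the multivariate identity can be sketched as follows; it assumes the standard expressions written in the comments below (with $\Sigma_a^{-1} = \Sigma_1^{-1}+\Sigma_2^{-1}$, etc.), and the specific numbers are arbitrary:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(42)
x = rng.normal(size=2)
mu1, mu2 = np.array([0.5, -1.0]), np.array([1.0, 2.0])
A1, A2 = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
S1, S2 = A1 @ A1.T + np.eye(2), A2 @ A2.T + np.eye(2)  # positive definite covariances

# Sigma_a^-1 = Sigma_1^-1 + Sigma_2^-1;  mu_a = Sigma_a (Sigma_1^-1 mu_1 + Sigma_2^-1 mu_2)
Sa = np.linalg.inv(np.linalg.inv(S1) + np.linalg.inv(S2))
mua = Sa @ (np.linalg.solve(S1, mu1) + np.linalg.solve(S2, mu2))
# mu_b = mu_1 - mu_2;  Sigma_b = Sigma_1 + Sigma_2
mub, Sb = mu1 - mu2, S1 + S2

lhs = mvn.logpdf(x, mu1, S1) + mvn.logpdf(x, mu2, S2)
rhs = mvn.logpdf(x, mua, Sa) + mvn.logpdf(np.zeros(2), mub, Sb)
print(lhs - rhs)  # should be very close to zero
```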
When the sampling distribution is normal with a fixed variance, the conjugate prior for the mean is also normal. Using the identity from part 1, show this for the case of a single data point, $y$; that is,
$p(\mu|y,\sigma) \propto \mathrm{Normal}(y|\mu,\sigma)\,\mathrm{Normal}(\mu|m_0,s_0) \propto \mathrm{Normal}(\mu|m_1,s_1)$,
and find $m_1$ and $s_1$ in terms of $y$, $\sigma$, $m_0$ and $s_0$.
TBC() # answer in markdown
Again, check your work by choosing some fiducial values and using the cells below to look at the ratio
$\mathrm{Normal}(y|\mu,\sigma)\,\mathrm{Normal}(\mu|m_0,s_0) / \mathrm{Normal}(\mu|m_1,s_1)$
over a range of $\mu$. It should be constant.
TBC()
# pick some values
# y =
# sigma =
# m0 =
# s0 =
# compute things
# s1 =
# m1 =
# plot over a range that at least includes both y and m0
mugrid = np.linspace(min(y,m0), max(y,m0))
# we'll compare the log-probabilities, since that's a good habit to be in
diff = st.norm.logpdf(y, loc=mugrid, scale=sigma)+st.norm.logpdf(mugrid, loc=m0, scale=s0) - st.norm.logpdf(mugrid, loc=m1, scale=s1)
print('This should be approximately constant:')
plt.rcParams['figure.figsize'] = (7.0, 5.0)
plt.plot(mugrid, diff, 'b-');
plt.xlabel(r'$\mu$');
plt.ylabel('log-posterior difference');
Checkpoint: Remember that in this case the plotted function should be close to constant (probably so close that matplotlib shows it as a perfectly flat line), but not necessarily small in absolute value.
Consider the distribution $\mathrm{Normal}\left[y\,\big|\,\mu_y(x;a,b),\sigma_y\right]$, where $\mu_y(x;a,b)=a+bx$.
Show that this can be rearranged into a Gaussian over $x$, i.e.
$\mathrm{Normal}\left[x|\mu_x(y;a,b),\sigma_x(y;a,b)\right]$,
up to an errant factor of $b$.
TBC() # answer in markdown
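If you'd like to check your algebra numerically, one possible sketch follows. It assumes the rearrangement takes the form $\mu_x=(y-a)/b$ and $\sigma_x=\sigma_y/|b|$ (treat those as a hypothesis to test against your own derivation), with arbitrary fiducial numbers:

```python
import numpy as np
import scipy.stats as st

# fiducial values (arbitrary)
a, b, sigma_y = 1.0, 2.0, 0.5
x, y = 0.3, 1.8

# hypothesized rearrangement: mu_x = (y - a)/b, sigma_x = sigma_y/|b|
mu_x = (y - a) / b
sigma_x = sigma_y / abs(b)

lhs = st.norm.pdf(y, loc=a + b * x, scale=sigma_y)
rhs = st.norm.pdf(x, loc=mu_x, scale=sigma_x) / abs(b)  # the errant factor of b
print(np.isclose(lhs, rhs))  # True
```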
Classical weighted least squares (WLS) is a simple method for fitting a line to data that you've almost certainly seen before. Consider data consisting of $n$ triplets $(x_i,y_i,\sigma_i)$, where $x_i$ are assumed to be known perfectly and $\sigma_i$ is interpreted as a "measurement error" for $y_i$. WLS maximizes the likelihood function
$\mathcal{L}(a,b;x,y,\sigma) = \prod_{i=1}^n \mathrm{Normal}(y_i|a+bx_i,\sigma_i)$.
In fact, we can get away with being more general and allowing for the possibility that the different measurements are not independent, with their measurement errors jointly characterized by a known covariance matrix, $\Sigma$, rather than the individual $\sigma_i$:
$\mathcal{L}(a,b;x,y,\Sigma) = \mathrm{Normal}(y|X\beta,\Sigma) = \frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\exp \left[-\frac{1}{2}(y-X\beta)^\mathrm{T}\Sigma^{-1}(y-X\beta)\right]$,
where $X$ is called the design matrix, with each row equal to $(1, x_i)$, and $\beta = \left(\begin{array}{c}a\\b\end{array}\right)$.
With a certain amount of algebra, it can be shown that $\mathcal{L}$ is proportional to a bivariate Gaussian over $\beta$,
$\mathcal{L} \propto \mathrm{Normal}(\beta | \mu_\beta, \Sigma_\beta)$,
with
$\Sigma_\beta = (X^\mathrm{T}\Sigma^{-1}X)^{-1}$;
$\mu_\beta = \Sigma_\beta X^\mathrm{T}\Sigma^{-1} y$.
We will not drag you through the derivation, but hopefully the resemblance of these expressions to the simpler ones above makes them plausible.
In classical WLS, $\mu_\beta$ is the "best fit" estimate of $a$ and $b$, and $\Sigma_\beta$ is the covariance of the standard errors on those parameters.
The relative simplicity of the computations above, not to mention the fact that they are efficiently implemented in numerous packages, can be useful even in situations beyond the assumption-heavy scenario where WLS is derived. As a simple example, consider a case where the sampling distribution corresponds to the likelihood function above, but we wish to use an informative prior on $a$ and $b$.
Taking advantage of the results you derived above (all of which have straightforward multivariate analogs),
TBC() # answers in markdown
Below, we will explicitly show the correspondence in (3) for a WLS fit of some mock data.
# generate some fake data
a = 0.0
b = 1.0
n = 10
x = st.norm.rvs(size=n)
sigma = st.uniform.rvs(1.0, 2.0, size=n)
y = st.norm.rvs(loc=a+b*x, scale=sigma, size=n)
plt.rcParams['figure.figsize'] = (7.0, 5.0)
plt.errorbar(x, y, yerr=sigma, fmt='bo');
plt.xlabel('x');
plt.ylabel('y');
The next cell uses the statsmodels package to perform the WLS calculations. You are encouraged to implement the matrix algebra above to verify the results. What we get at the end are $\mu_\beta$ and $\Sigma_\beta$, as defined above.
import statsmodels.api as sm
model = sm.WLS(y, sm.add_constant(x), weights=sigma**-2)
wls = model.fit()
mu_beta = np.asarray(wls.params).reshape(-1, 1) # reshape casts this as a column vector
Sigma_beta = np.asarray(wls.normalized_cov_params)
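As encouraged above, the matrix algebra can also be implemented directly in numpy. The sketch below is self-contained, generating its own mock data (so the numbers will not match the fit above), and checks that $\mu_\beta$ satisfies the normal equations, $X^\mathrm{T}\Sigma^{-1}(y-X\mu_\beta)=0$:

```python
import numpy as np

# self-contained mock data (values are arbitrary, just for illustration)
rng = np.random.default_rng(0)
n = 10
x = rng.normal(size=n)
sigma = rng.uniform(1.0, 3.0, size=n)
y = rng.normal(loc=1.0 + 2.0 * x, scale=sigma)

X = np.column_stack([np.ones(n), x])  # design matrix, rows (1, x_i)
Sigma_inv = np.diag(sigma**-2)        # independent errors -> diagonal inverse covariance

# Sigma_beta = (X^T Sigma^-1 X)^-1;  mu_beta = Sigma_beta X^T Sigma^-1 y
Sigma_beta = np.linalg.inv(X.T @ Sigma_inv @ X)
mu_beta = Sigma_beta @ X.T @ Sigma_inv @ y

# mu_beta should satisfy the normal equations: X^T Sigma^-1 (y - X mu_beta) = 0
print(X.T @ Sigma_inv @ (y - X @ mu_beta))
```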
Now, compute the parameters of the posterior for $\beta$ based on $\mu_\beta$ and $\Sigma_\beta$ (parameters that appear in the sampling distribution) and the parameters of the conjugate prior. Set the prior parameters to be equivalent to the uniform distribution for the check below (you can put in something different to see how it looks later).
Transform post_mean to a shape (2,) numpy array for convenience (as opposed to, say, a 2x1 matrix).
TBC()
# define prior parameters
# do some calculations, possibly
# parameters of the posterior:
# post_cov = ...
# post_mean = ...
print('WLS mean and covariance:')
print(mu_beta)
print(Sigma_beta)
print('Posterior mean and covariance:')
print(post_mean)
print(post_cov)
Add your log-prior to the function definition below. We will then compare your analytic solution to a brute-force calculation of the posterior.
def log_post_brute(a, b):
    like = np.sum( st.norm.logpdf(y, loc=a+b*x, scale=sigma) )
    prior = TBC() # some logpdf
    return prior + like
TBC_above()
print('Difference between elegant and brute-force log posteriors for some random parameter values:')
print('(The third column should be basically constant, though non-zero.)\n')
for i in range(10):
    a = np.random.rand() * 10.0 - 5.0
    b = np.random.rand() * 10.0 - 5.0
    diff = st.multivariate_normal.logpdf([a,b], mean=post_mean, cov=post_cov) - log_post_brute(a,b)
    print([a, b, diff])
$^1$ In statistics literature, a more common convention is $p(x|\mu,\sigma^2)$, with the second parameter being the variance rather than the standard deviation; this is then consistent with the multivariate Gaussian notation in which the second parameter is the covariance matrix. However, most code implementations of the univariate normal distribution take the standard deviation as an argument rather than the variance, so we'll stick with that notation.