Nanos gigantium humeris insidentes!

Bayesian inference

  • October 12, 2013 5:21 pm
Statistical inference is the process of drawing conclusions from data that are subject to random variation.
Bayesian inference is a method of inference in which Bayes’ rule is used to update the probability estimate for a hypothesis as additional evidence is acquired. Bayesian updating is an important technique throughout statistics, and especially in mathematical statistics.


Notation

  • x, a data point in general. This may in fact be a vector of values.
  • \theta, the parameter of the data point’s distribution, i.e., x \sim p(x \mid \theta) . This may in fact be a vector of parameters.
  • \alpha, the hyperparameter of the parameter, i.e., \theta \sim p(\theta \mid \alpha) . This may in fact be a vector of hyperparameters.
  • \mathbf{X}, a set of n observed data points, i.e., x_1,\ldots,x_n.
  • \tilde{x}, a new data point whose distribution is to be predicted.

Bayesian inference

  • The prior distribution is the distribution of the parameter(s) before any data is observed, i.e. p(\theta|\alpha).
  • The prior distribution might not be easily determined. In this case, a non-informative prior such as the Jeffreys prior can be used to obtain a posterior distribution before updating it with newer observations.
  • The sampling distribution is the distribution of the observed data conditional on its parameters, i.e. p(\mathbf{X}|\theta) . This is also termed the likelihood, especially when viewed as a function of the parameter(s), sometimes written \mathbf{L}(\theta;\mathbf{X}) = p(\mathbf{X}|\theta) .
  • The marginal likelihood (sometimes also termed the evidence) is the distribution of the observed data marginalized over the parameter(s), i.e. p(\mathbf{X}|\alpha) = \int_{\theta} p(\mathbf{X}|\theta) p(\theta|\alpha) \mathrm{d}\!\theta  .
  • The posterior distribution is the distribution of the parameter(s) after taking into account the observed data. This is determined by Bayes’ rule, which forms the heart of Bayesian inference:
p(\theta|\mathbf{X},\alpha) = \frac{p(\mathbf{X}|\theta) p(\theta|\alpha)}{p(\mathbf{X}|\alpha)} \propto p(\mathbf{X}|\theta) p(\theta|\alpha)

Note that this is expressed in words as “posterior is proportional to likelihood times prior”, or sometimes as “posterior = likelihood times prior, over evidence”. In the above formula, the evidence term is a normalizing constant: it does not depend on \theta, and it ensures that the posterior integrates to one.
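Under illustrative assumptions (a hypothetical coin-flip model with a Beta(2, 2) prior; all numbers are made up for the example), “posterior = likelihood times prior, over evidence” can be sketched numerically on a grid:

```python
import numpy as np

# Minimal grid sketch of Bayes' rule (illustrative coin-flip example).
theta = np.linspace(0.0, 1.0, 1001)   # grid over the parameter theta
dtheta = theta[1] - theta[0]

prior = theta**(2 - 1) * (1 - theta)**(2 - 1)         # unnormalized Beta(2, 2) prior
prior /= prior.sum() * dtheta                         # p(theta | alpha)

heads, n = 7, 10                                      # observed data X: 7 heads in 10 flips
likelihood = theta**heads * (1 - theta)**(n - heads)  # p(X | theta)

evidence = (likelihood * prior).sum() * dtheta        # p(X | alpha), the normalizer
posterior = likelihood * prior / evidence             # likelihood times prior, over evidence

# Conjugacy check: the exact posterior is Beta(2 + 7, 2 + 3) = Beta(9, 5),
# whose mean is 9/14.
posterior_mean = (theta * posterior).sum() * dtheta
```

Dividing by the evidence is exactly the “regularizing” role described above: it rescales likelihood × prior so the result integrates to one over \theta.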

Simpler version

A simpler version to help with memorization.

  1. Formulation of a generative model:
    1. Likelihood:  p(x|\theta)
    2. Prior distribution:  p(\theta)
  2. Observation of new data: x
  3. Update of beliefs based on the observations, given the prior state of knowledge:
    1.  p(\theta|x)\propto p(x|\theta)p(\theta)
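The three steps can be sketched as a loop in which each posterior becomes the prior for the next observation (a toy Bernoulli model with a uniform prior; the data values are illustrative):

```python
import numpy as np

# Toy sequential Bayesian update: yesterday's posterior is today's prior.
theta = np.linspace(0.0, 1.0, 1001)
dtheta = theta[1] - theta[0]

belief = np.ones_like(theta)          # 1. prior p(theta): uniform (illustrative choice)
belief /= belief.sum() * dtheta

for x in [1, 0, 1, 1]:                # 2. new observations arrive one at a time
    likelihood = theta**x * (1 - theta)**(1 - x)   # p(x | theta)
    belief = likelihood * belief                   # 3. p(theta|x) ∝ p(x|theta) p(theta)
    belief /= belief.sum() * dtheta                # renormalize on the grid

# With 3 successes and 1 failure from a uniform prior, the exact posterior
# is Beta(4, 2), whose mean is 4/6.
posterior_mean = (theta * belief).sum() * dtheta
```

Note that the order of the observations does not matter here: updating on the whole batch at once gives the same posterior as updating one point at a time.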


Bayesian prediction

The posterior predictive distribution of a new data point \tilde{x}, marginalized over the posterior:
p(\tilde{x}|\mathbf{X},\alpha) = \int_{\theta} p(\tilde{x}|\theta) p(\theta|\mathbf{X},\alpha) \mathrm{d}\!\theta
The prior predictive distribution of \tilde{x}, marginalized over the prior:
p(\tilde{x}|\alpha) = \int_{\theta} p(\tilde{x}|\theta) p(\theta|\alpha) \mathrm{d}\!\theta
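For a Bernoulli model the predictive integral is easy to evaluate on a grid, since p(\tilde{x}=1|\theta) = \theta. The Beta(9, 5) posterior below is an illustrative assumption (e.g. a Beta(2, 2) prior after seeing 7 heads in 10 flips):

```python
import numpy as np

# Grid evaluation of the posterior predictive integral for a Bernoulli model.
theta = np.linspace(0.0, 1.0, 1001)
dtheta = theta[1] - theta[0]

posterior = theta**(9 - 1) * (1 - theta)**(5 - 1)   # unnormalized Beta(9, 5)
posterior /= posterior.sum() * dtheta               # p(theta | X, alpha)

# p(x~ = 1 | X, alpha) = ∫ theta · p(theta | X, alpha) dtheta
p_next_heads = (theta * posterior).sum() * dtheta
# Exact value under a Beta(9, 5) posterior: 9 / (9 + 5) = 9/14.
```

Averaging p(\tilde{x}|\theta) over the whole posterior, rather than plugging in a single point estimate of \theta, is what lets the prediction carry the remaining uncertainty about the parameter.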

