Nanos gigantium humeris insidentes! ("Dwarfs standing on the shoulders of giants!")

## Bayesian inference

**Statistical inference** is the process of drawing conclusions from data that are subject to random variation.

Bayesian inference is a method of inference in which Bayes’ rule is used to update the probability estimate for a hypothesis as additional evidence is acquired. Bayesian updating is an important technique throughout statistics, and especially in mathematical statistics.

### Definitions

- $x$, a data point in general. This may in fact be a vector of values.
- $\theta$, the parameter of the data point's distribution, i.e., $x \sim p(x \mid \theta)$. This may in fact be a vector of parameters.
- $\alpha$, the hyperparameter of the parameter, i.e., $\theta \sim p(\theta \mid \alpha)$. This may in fact be a vector of hyperparameters.
- $\mathbf{X}$, a set of $n$ observed data points, i.e., $x_1, \ldots, x_n$.
- $\tilde{x}$, a new data point whose distribution is to be predicted.
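The definitions above can be made concrete with a small generative sketch. The model below is an illustrative assumption, not part of the original text: a Beta$(a, b)$ prior (the hyperparameters $\alpha = (a, b)$) over a coin's heads probability $\theta$, and Bernoulli coin flips as the data points $x_i$.

```python
import random

random.seed(0)

# Hyperparameters alpha = (a, b) of a Beta(a, b) prior (illustrative choice).
a, b = 2.0, 2.0

# theta ~ p(theta | alpha): draw the parameter from its prior.
theta = random.betavariate(a, b)

# X = {x_1, ..., x_n}: observed data points, each x ~ p(x | theta),
# here modeled as Bernoulli(theta) coin flips.
X = [1 if random.random() < theta else 0 for _ in range(10)]
```

Here $\theta$ is a single number and each $x_i$ is a single bit, but as noted above both may in general be vectors.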

### Bayesian inference

- The prior distribution is the distribution of the parameter(s) before any data is observed, i.e., $p(\theta \mid \alpha)$.
- The prior distribution might not be easily determined. In that case, a non-informative prior such as the Jeffreys prior can be used to obtain a posterior distribution before updating it with newer observations.
- The sampling distribution is the distribution of the observed data conditional on its parameters, i.e., $p(\mathbf{X} \mid \theta)$. This is also termed the likelihood, especially when viewed as a function of the parameter(s), sometimes written $\operatorname{L}(\theta \mid \mathbf{X}) = p(\mathbf{X} \mid \theta)$.
- The marginal likelihood (sometimes also termed the *evidence*) is the distribution of the observed data marginalized over the parameter(s), i.e., $p(\mathbf{X} \mid \alpha) = \int p(\mathbf{X} \mid \theta)\, p(\theta \mid \alpha)\, \mathrm{d}\theta$.
- The posterior distribution is the distribution of the parameter(s) after taking into account the observed data. This is determined by Bayes' rule, which forms the heart of Bayesian inference:
  $$p(\theta \mid \mathbf{X}, \alpha) = \frac{p(\mathbf{X} \mid \theta)\, p(\theta \mid \alpha)}{p(\mathbf{X} \mid \alpha)} \propto p(\mathbf{X} \mid \theta)\, p(\theta \mid \alpha)$$

Note that this is expressed in words as "posterior is proportional to likelihood times prior", or sometimes as "posterior = likelihood times prior, over evidence". In the formula above, the evidence term acts as a normalizing constant, ensuring that the posterior integrates to one.
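For one concrete case, Bayes' rule has a closed form. The sketch below assumes a Beta$(a, b)$ prior on a coin's heads probability and Bernoulli observations (an illustrative conjugate pair, not from the original text): the posterior is again a Beta distribution, and the evidence is a ratio of Beta functions. The code checks "posterior = likelihood times prior, over evidence" pointwise.

```python
import math

# Beta(a, b) prior on a coin's heads probability theta (illustrative numbers).
a, b = 2.0, 2.0
heads, tails = 7, 3  # observed data X

# Conjugacy: Beta prior x Bernoulli likelihood gives a Beta posterior,
# so "posterior is proportional to likelihood times prior" has a closed form.
a_post, b_post = a + heads, b + tails

def beta_fn(p, q):
    # Beta function B(p, q) via the Gamma function.
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)

# The evidence p(X | alpha) is the constant that normalizes the posterior.
evidence = beta_fn(a_post, b_post) / beta_fn(a, b)

# Check Bayes' rule pointwise at an arbitrary theta:
theta = 0.5
prior = theta**(a - 1) * (1 - theta)**(b - 1) / beta_fn(a, b)
likelihood = theta**heads * (1 - theta)**tails
posterior = theta**(a_post - 1) * (1 - theta)**(b_post - 1) / beta_fn(a_post, b_post)
assert math.isclose(posterior, likelihood * prior / evidence)
```

The conjugate update `a + heads, b + tails` is exactly why the evidence integral never has to be computed explicitly in this model.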

### Simpler version

A simpler version, to help with memorizing:

- Formulation of a generative model:
  - likelihood: $x \sim p(x \mid \theta)$
  - prior distribution: $\theta \sim p(\theta)$
- Observation of new data: $x$
- Update of beliefs based upon observations, given a prior state of knowledge:
  $$p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta)$$
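The three steps above can be sketched as a sequential updating loop. The implementation below is a minimal grid approximation under assumed Bernoulli observations: it represents the belief over $\theta$ on a finite grid, multiplies in the likelihood of each new data point, and renormalizes.

```python
# Grid approximation of the update loop: start from a prior over theta,
# observe data points one at a time, and renormalize after each update.
N = 1001
grid = [i / (N - 1) for i in range(N)]  # candidate values of theta in [0, 1]
belief = [1.0 / N] * N                  # uniform prior p(theta)

def update(belief, x):
    # p(theta | x) is proportional to p(x | theta) p(theta),
    # with a Bernoulli likelihood p(x | theta).
    new = [p * (t if x == 1 else 1 - t) for p, t in zip(belief, grid)]
    z = sum(new)  # the evidence term, used only to renormalize
    return [p / z for p in new]

for x in [1, 1, 0, 1]:  # stream of observations
    belief = update(belief, x)

# Posterior mean is close to the exact conjugate answer (1 + 3) / (2 + 4) = 2/3.
mean = sum(p * t for p, t in zip(belief, grid))
```

Note that yesterday's posterior serves as today's prior: the loop body never distinguishes between the two.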

### Bayesian prediction

- The posterior predictive distribution is the distribution of a new data point $\tilde{x}$, marginalized over the posterior: $p(\tilde{x} \mid \mathbf{X}, \alpha) = \int p(\tilde{x} \mid \theta)\, p(\theta \mid \mathbf{X}, \alpha)\, \mathrm{d}\theta$

- The prior predictive distribution is the distribution of a new data point $\tilde{x}$, marginalized over the prior: $p(\tilde{x} \mid \alpha) = \int p(\tilde{x} \mid \theta)\, p(\theta \mid \alpha)\, \mathrm{d}\theta$
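Both predictive integrals can be evaluated in closed form for an illustrative conjugate model (an assumption, not from the original text): with a Beta$(a, b)$ prior and Bernoulli observations, marginalizing over a Beta distribution reduces each integral to that distribution's mean.

```python
# Prior and posterior predictive probabilities of heads for a new coin flip,
# under a Beta(a, b) prior with Bernoulli observations (illustrative numbers).
a, b = 2.0, 2.0
heads, tails = 7, 3  # observed data X

# Prior predictive: integrating theta against Beta(theta; a, b)
# gives the prior mean a / (a + b).
prior_pred = a / (a + b)

# Posterior predictive: the same integral against the Beta(a + heads, b + tails)
# posterior gives the posterior mean.
post_pred = (a + heads) / (a + b + heads + tails)
```

After seeing 7 heads in 10 flips, the predicted probability of heads moves from the prior's 1/2 toward the observed frequency, landing at 9/14.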