Maximum likelihood

Maximum likelihood estimation is a general method for estimating the parameters of econometric models from observed data. The following conditions should be met for the maximum likelihood principle to work:

1. The form of the joint pdf of \(y_t\) is known.
2. The specifications of the moments of the joint pdf are known.
3. The joint pdf can be evaluated for all values of the parameters \(\theta\).

If the distribution of \(y_t\) is misspecified, i.e. conditions (1) and (2) are violated, estimation is by quasi-maximum likelihood. If condition (1) is violated, a generalized method of moments estimator is required. If condition (2) is not satisfied, estimation relies on nonparametric methods. If condition (3) is not met, simulation-based estimation methods are used.

A time series represents the observed realization of draws from a joint pdf. The maximum likelihood principle makes use of this result by providing a general framework for estimating the unknown parameters \(\theta\) from the observed time series data \(\{y_1, y_2, \dots, y_T\}\).

The standard interpretation of the joint pdf is that \(f\) is a function of \(y_t\) for given parameters \(\theta\). When defining the maximum likelihood estimator, this interpretation is reversed, so that \(f\) is taken as a function of \(\theta\) for given \(y_t\), because we regard \(\{y_1, y_2, \dots, y_T\}\) as a realized data set which is no longer random. The maximum likelihood estimator is then obtained by finding the value of \(\theta\) which is "most likely" to have generated the observed data.

The likelihood function is simply a redefinition of the joint pdf. For many problems it is easier to work with the logarithm of this joint pdf. The log-likelihood function is

$$\begin{align} \ln L_T(\theta) &= \frac1T \ln f(y_1|x_1;\theta)\\[2ex] &+ \frac 1T \sum_{t=2}^T \ln f(y_t|y_{t-1}, \dots, y_1, x_t, x_{t-1}, \dots, x_1; \theta) \end{align} \tag{1.8}$$

where \(\theta\) collects the unknown parameters in a single argument and the subscript \(T\) indicates that the log-likelihood is an average over the sample of the logarithm of the density evaluated at \(y_t\). Because of this scaling by \(1/T\), the log-likelihood in (1.8) is also known as the average log-likelihood.
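As a concrete illustration of the decomposition in (1.8), the following sketch evaluates the average log-likelihood of a stationary AR(1) model with normal disturbances; the model, function name, and parameter values are assumptions chosen purely for illustration, not taken from the text.

```python
import math

def avg_loglik_ar1(y, rho, sigma2):
    """Average log-likelihood in the spirit of (1.8) for a stationary
    AR(1) model y_t = rho*y_{t-1} + e_t with e_t ~ N(0, sigma2).
    (Illustrative assumption: this particular model and density.)"""
    T = len(y)
    # Marginal density of the first observation: N(0, sigma2/(1 - rho^2))
    var1 = sigma2 / (1.0 - rho**2)
    loglik = -0.5 * (math.log(2 * math.pi * var1) + y[0]**2 / var1)
    # Conditional densities f(y_t | y_{t-1}, ...; theta) for t = 2, ..., T
    for t in range(1, T):
        resid = y[t] - rho * y[t - 1]
        loglik += -0.5 * (math.log(2 * math.pi * sigma2) + resid**2 / sigma2)
    return loglik / T  # dividing by T gives the average log-likelihood
```

The first observation enters through its marginal density, and every later observation through its density conditional on the past, mirroring the two terms of (1.8).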

When \(y_t\) is iid, the log-likelihood function is based on the joint pdf in (1.4):

$$\begin{align} \ln L_T(\theta) &= \frac 1T \sum_{t=1}^T \ln f(y_t; \theta) \end{align}$$

In all cases, the log-likelihood function is a scalar that represents a summary measure of the data for given \(\theta\).
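To make the iid case concrete, here is a minimal sketch, assuming (purely for illustration) that \(f\) is the normal density with \(\theta = (\mu, \sigma^2)\):

```python
import math

def avg_loglik_iid_normal(y, mu, sigma2):
    """Average log-likelihood (1/T) * sum_t ln f(y_t; theta) for iid data,
    with f taken to be the normal density (an illustrative assumption)."""
    T = len(y)
    return sum(-0.5 * (math.log(2 * math.pi * sigma2) + (yt - mu)**2 / sigma2)
               for yt in y) / T

# For given theta the function returns a single scalar summary of the data
print(avg_loglik_iid_normal([1.0, 2.0, 3.0], mu=2.0, sigma2=1.0))
```

Whatever the sample size, the output is one number for each candidate \(\theta\), which is what makes the log-likelihood a summary measure of the data.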

The maximum likelihood estimator of \(\theta\) is defined as the value \(\hat{\theta}\) that maximizes the log-likelihood function:

$$\hat{\theta} = \arg\max_{\theta} \ln L_T(\theta)$$
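For instance, in the iid normal case the maximizer is available in closed form; the sketch below (an illustrative assumption, with hypothetical data, not a result stated above) computes \(\hat{\theta}\) and checks that no nearby parameter value attains a higher average log-likelihood.

```python
import math

def avg_loglik(y, mu, sigma2):
    """Average log-likelihood for iid normal data (illustrative choice of f)."""
    T = len(y)
    return sum(-0.5 * (math.log(2 * math.pi * sigma2) + (yt - mu)**2 / sigma2)
               for yt in y) / T

def normal_mle(y):
    """Closed-form ML estimates for iid normal data:
    sample mean and (uncorrected) sample variance."""
    T = len(y)
    mu_hat = sum(y) / T
    sigma2_hat = sum((yt - mu_hat)**2 for yt in y) / T
    return mu_hat, sigma2_hat

y = [1.2, 0.7, 2.1, 1.6, 0.4]            # hypothetical data
mu_hat, sigma2_hat = normal_mle(y)
best = avg_loglik(y, mu_hat, sigma2_hat)
# No perturbation of the estimates improves the log-likelihood
assert all(best >= avg_loglik(y, mu_hat + d, sigma2_hat + s)
           for d in (-0.05, 0.0, 0.05) for s in (-0.05, 0.0, 0.05))
```

In models without a closed-form maximizer, the same definition applies but \(\hat{\theta}\) must be found numerically.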