Locally weighted regression

\((x^{(i)}, y^{(i)})\) is the training example \(i\), \(x^{(i)} \in \mathbb{R}^{n+1}\), \(y^{(i)} \in \mathbb{R}\)

\(m\) is the number of examples, \(n\) is the number of features

Hypothesis function

$$h_\theta(x) = \sum_{j=0}^n \theta_j x_j = \theta^T x$$

Loss function

$$J(\theta) = \frac12 \sum_{i=1}^m (h_\theta (x^{(i)}) - y^{(i)})^2$$
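A minimal NumPy sketch of these two formulas (the array names `X`, `y`, and `theta`, and the convention that `X` already contains an intercept column \(x_0 = 1\), are assumptions of this sketch, not part of the notes):

```python
import numpy as np

def h(theta, x):
    """Hypothesis h_theta(x) = theta^T x, where x includes the intercept term x_0 = 1."""
    return theta @ x

def J(theta, X, y):
    """Loss J(theta) = 1/2 * sum_i (h_theta(x^(i)) - y^(i))^2.

    X has shape (m, n+1) with a leading column of ones; y has shape (m,).
    """
    residuals = X @ theta - y
    return 0.5 * np.sum(residuals ** 2)
```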

A parametric learning algorithm fits a fixed set of parameters \(\theta\). A non-parametric learning algorithm requires keeping the training data set around to make predictions, which can be cumbersome for large data sets. Locally weighted linear regression is one example of a non-parametric learning algorithm.

For linear regression, to evaluate \(h\) at \(x\), we fit \(\theta\) to minimize \(J(\theta)\), and then return \(\theta^T x\).
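One way to carry out this fit (a sketch only; the notes just say to minimize \(J(\theta)\), so the choice of a least-squares solver here is an assumption):

```python
import numpy as np

def fit_linear_regression(X, y):
    """Fit theta by minimizing J(theta); here via an ordinary least-squares solve."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def predict(theta, x):
    """Return theta^T x for a query point x (with the intercept term included)."""
    return theta @ x
```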

For locally weighted regression, we look at a local neighborhood of the query point \(x\): we put more weight on training examples in a narrow range near \(x\), fit a straight line to those examples, and use that line to make the prediction at \(x\).

Fit \(\theta\) to minimize

$$J(\theta) = \sum_{i=1}^m w^{(i)} (h_\theta (x^{(i)}) - y^{(i)})^2$$

where the weight function is defined as

$$w^{(i)} = \exp\left(-\frac{(x^{(i)}-x)^2}{2}\right)$$

so that \(w^{(i)} \approx 1\) when \(x^{(i)}\) is close to the query point \(x\) and \(w^{(i)} \approx 0\) when it is far away; examples near \(x\) therefore dominate the fit.
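Putting the pieces together, a sketch of a locally weighted prediction at a single query point (the function name `lwr_predict` and the use of squared Euclidean distance for vector-valued \(x\) are assumptions of this sketch; the notes write the weight for scalar \(x\)):

```python
import numpy as np

def lwr_predict(X, y, x_query):
    """Locally weighted regression prediction at x_query.

    X: (m, n+1) design matrix with an intercept column, y: (m,) targets,
    x_query: (n+1,) query point including the intercept term.
    """
    # w^(i) = exp(-(x^(i) - x)^2 / 2), here with squared Euclidean distance
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / 2.0)

    # Minimize sum_i w^(i) (theta^T x^(i) - y^(i))^2: scale each row and target
    # by sqrt(w^(i)) and solve the resulting ordinary least-squares problem.
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return theta @ x_query
```

Note that the weighted fit depends on the query point, so \(\theta\) has to be recomputed for every prediction; this is why the whole training set must be kept around.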