Locally weighted regression

\((x^{(i)}, y^{(i)})\) is the training example \(i\), \(x^{(i)} \in \mathbb{R}^{n+1}\), \(y^{(i)} \in \mathbb{R}\)

\(m\) is the number of examples, \(n\) is the number of features

Hypothesis function

$$h_\theta(x) = \sum_{j=0}^n \theta_j x_j = \theta^T x$$

Loss function

$$J(\theta) = \frac12 \sum_{i=1}^m (h_\theta (x^{(i)}) - y^{(i)})^2$$
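A minimal NumPy sketch of these two formulas (the array names `X`, `y`, and `theta`, and the convention that `X` already contains an intercept column \(x_0 = 1\), are assumptions of this sketch, not part of the notes):

```python
import numpy as np

def h(theta, x):
    """Hypothesis h_theta(x) = theta^T x, where x includes the intercept term x_0 = 1."""
    return theta @ x

def J(theta, X, y):
    """Loss J(theta) = 1/2 * sum_i (h_theta(x^(i)) - y^(i))^2.

    X has shape (m, n+1) with a leading column of ones; y has shape (m,).
    """
    residuals = X @ theta - y
    return 0.5 * np.sum(residuals ** 2)
```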

A parametric learning algorithm fits a fixed set of parameters \(\theta\). A non-parametric learning algorithm requires keeping the training data set around to make predictions, which can be cumbersome for large data sets. Locally weighted linear regression is one example of a non-parametric learning algorithm.

For linear regression, to evaluate \(h\) at \(x\), we fit \(\theta\) to minimize \(J(\theta)\), and then return \(\theta^T x\).
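One way to carry out this fit (a sketch only; the notes just say to minimize \(J(\theta)\), so the choice of a least-squares solver here is an assumption):

```python
import numpy as np

def fit_linear_regression(X, y):
    """Fit theta by minimizing J(theta); here via an ordinary least-squares solve."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def predict(theta, x):
    """Return theta^T x for a query point x (with the intercept term included)."""
    return theta @ x
```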

For locally weighted regression, we look at a local neighborhood of the query point \(x\): we put more weight on training examples in a narrow range near \(x\), fit a straight line to those examples, and use that line to make the prediction at \(x\).

Fit \(\theta\) to minimize

$$J(\theta) = \sum_{i=1}^m w^{(i)} (h_\theta (x^{(i)}) - y^{(i)})^2$$

where the weight function is defined as

$$w^{(i)} = \exp\left(-\frac{(x^{(i)}-x)^2}{2}\right)$$

so that \(w^{(i)} \approx 1\) when \(x^{(i)}\) is close to the query point \(x\) and \(w^{(i)} \approx 0\) when it is far away; examples near \(x\) therefore dominate the fit.
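Putting the pieces together, a sketch of a locally weighted prediction at a single query point (the function name `lwr_predict` and the use of squared Euclidean distance for vector-valued \(x\) are assumptions of this sketch; the notes write the weight for scalar \(x\)):

```python
import numpy as np

def lwr_predict(X, y, x_query):
    """Locally weighted regression prediction at x_query.

    X: (m, n+1) design matrix with an intercept column, y: (m,) targets,
    x_query: (n+1,) query point including the intercept term.
    """
    # w^(i) = exp(-(x^(i) - x)^2 / 2), here with squared Euclidean distance
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / 2.0)

    # Minimize sum_i w^(i) (theta^T x^(i) - y^(i))^2: scale each row and target
    # by sqrt(w^(i)) and solve the resulting ordinary least-squares problem.
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return theta @ x_query
```

Note that the weighted fit depends on the query point, so \(\theta\) has to be recomputed for every prediction; this is why the whole training set must be kept around.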