Locally weighted regression
$(x^{(i)}, y^{(i)})$ denotes the $i$-th training example, with $x^{(i)} \in \mathbb{R}^{n+1}$ (the convention $x_0 = 1$ accounts for the extra dimension) and $y^{(i)} \in \mathbb{R}$
$m$ is the number of examples, $n$ is the number of features
Hypothesis function
$$h_\theta(x) = \sum_{j=0}^n \theta_j x_j = \theta^T x$$
Loss function
$$J(\theta) = \frac12 \sum_{i=1}^m (h_\theta (x^{(i)}) - y^{(i)})^2$$
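As a concrete illustration, here is a minimal NumPy sketch of the hypothesis and this cost function (the names `X`, `y`, `theta` are placeholders; `X` is assumed to be an $m \times (n+1)$ design matrix whose first column is the intercept term $x_0 = 1$):

```python
import numpy as np

def h(theta, X):
    """Hypothesis h_theta(x) = theta^T x, evaluated for every row of X."""
    return X @ theta                     # shape (m,)

def J(theta, X, y):
    """Cost J(theta) = 1/2 * sum_i (h_theta(x^(i)) - y^(i))^2."""
    residuals = h(theta, X) - y          # shape (m,)
    return 0.5 * np.sum(residuals ** 2)
```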
A parametric learning algorithm fits a fixed, finite set of parameters $\theta$ and can then discard the training data. A non-parametric learning algorithm needs to keep the training set around in order to make predictions, which can be cumbersome for large data sets. Locally weighted linear regression is one example of a non-parametric learning algorithm.
For linear regression, to evaluate $h$ at a query point $x$, we fit $\theta$ to minimize $J(\theta)$ and then return $\theta^T x$.
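As a sketch of that procedure (reusing the `X`, `y` above; solving the normal equations $X^T X \theta = X^T y$ is one standard way to minimize $J(\theta)$, and gradient descent would reach the same minimizer):

```python
def fit_linear_regression(X, y):
    """Fit theta once by minimizing J(theta), via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def predict_linear(theta, x_query):
    """Return theta^T x for a query point x (with its x_0 = 1 entry)."""
    return theta @ x_query
```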
For locally weighted regression, we instead look at a local neighborhood of the query point $x$: we put more weight on training examples in a narrow range near $x$, fit a straight line to those weighted examples, and use that line to predict the value at $x$.
Fit $\theta$ to minimize
$$J(\theta) = \sum_{i=1}^m w^{(i)} (h_\theta (x^{(i)}) - y^{(i)})^2$$
where the weight function is defined as
$$w^{(i)} = \exp\left(-\frac{(x^{(i)}-x)^2}{2}\right)$$
Here $x$ is the query point: $w^{(i)}$ is close to $1$ for training examples near $x$ and close to $0$ for examples far from $x$, so distant examples contribute little to the fit.
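Putting the pieces together, here is a minimal sketch of a locally weighted prediction (the bandwidth parameter `tau` is my addition, commonly used to control how wide the local neighborhood is; `tau = 1` reproduces the weight function above, and the weighted normal equations are one standard way to minimize this weighted cost):

```python
def lwr_predict(X, y, x_query, tau=1.0):
    """Locally weighted regression prediction at a single query point."""
    # Weight each training example by how close it is to the query point;
    # tau = 1 gives w^(i) = exp(-||x^(i) - x||^2 / 2) as above.
    diffs = X - x_query                                  # shape (m, n+1)
    w = np.exp(-np.sum(diffs ** 2, axis=1) / (2 * tau ** 2))

    # Minimize sum_i w^(i) (theta^T x^(i) - y^(i))^2 via the
    # weighted normal equations: (X^T W X) theta = X^T W y.
    W = np.diag(w)
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

    # A fresh theta is fit for every query point, which is why the whole
    # training set has to be kept around.
    return theta @ x_query
```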