High-dimensional space
The Law of Large Numbers
Generating random points in \(d\)-dimensional space using a Gaussian will cause the distance between all pairs of points to be nearly the same when \(d\) is large, because the squared distance
$$|\boldsymbol{y-z}|^2 = \sum_{i=1}^d(y_i-z_i)^2$$
can be viewed as the sum of \(d\) independent samples of a random variable \(x\) that is the squared difference of two Gaussian random variables
$$x_i = (y_i - z_i)^2$$
The Law of Large Numbers states that with high probability the average of the samples will be close to the expectation of the random variable:
$$\Pr\left( \left|\frac{x_1+x_2+\cdots+x_n}{n} - E(x)\right| \ge\epsilon \right) \le \frac{Var(x)}{n\epsilon^2}$$
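As a quick empirical check (a minimal NumPy sketch; the dimension \(d\), the number of trials, and the tolerance \(\epsilon\) below are arbitrary illustrative choices), we can sample the coordinate differences of two Gaussian points, average their squares, and compare the observed deviation probability against the bound \(Var(x)/(n\epsilon^2)\):

```python
import numpy as np

rng = np.random.default_rng(0)
d, trials, eps = 1000, 10_000, 0.2      # arbitrary illustrative choices

# x_i = (y_i - z_i)^2 where y_i, z_i are independent standard normals,
# so y_i - z_i ~ N(0, 2), E(x) = 2 and Var(x) = 8.
diffs = rng.standard_normal((trials, d)) - rng.standard_normal((trials, d))
means = (diffs ** 2).mean(axis=1)       # one average of d samples per trial

empirical = np.mean(np.abs(means - 2.0) >= eps)
bound = 8.0 / (d * eps ** 2)            # Var(x) / (n * eps^2) with n = d
print(f"empirical Pr(|mean - 2| >= {eps}) = {empirical:.4f}")
print(f"Chebyshev-style bound            = {bound:.4f}")
```

The empirical frequency comes out well below the bound, which is expected: the bound holds for any distribution with the given variance and is therefore loose for Gaussians.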
Compare it with the confidence limits for \(\mu\) when \(\sigma\) is known in statistics:
$$\begin{align} \Pr(|\overline X - \mu| \le \epsilon) &= \Pr(\frac{|\overline X - \mu|}{\sigma/\sqrt{n}} \le \frac{\epsilon}{\sigma/\sqrt{n}})\\[2ex] &=2\int_0^{\frac{\epsilon}{\sigma/\sqrt{n}}}\frac{e^{-Z^2/2}}{\sqrt{2\pi}}dZ \end{align}$$
where \(\mu\) is the population mean, \(\sigma\) is the population standard deviation, \(\overline X\) is the sample mean, \(n\) is the sample size (provided it is large enough that \(\overline X\) is approximately normally distributed), \(\epsilon\) is the error, and \(Z\) is the standard normal variate.
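For a concrete comparison (a small sketch using only the standard library; \(n\), \(\sigma\), and \(\epsilon\) are arbitrary illustrative values), the normal-approximation probability can be evaluated with the error function and set against the Chebyshev-style tail bound above:

```python
import math

n, sigma, eps = 100, 1.0, 0.25          # arbitrary illustrative values

# Normal approximation: Pr(|Xbar - mu| <= eps) = erf(eps * sqrt(n) / (sigma * sqrt(2)))
z = eps / (sigma / math.sqrt(n))
normal_prob = math.erf(z / math.sqrt(2))

# Law of Large Numbers bound: Pr(|Xbar - mu| >= eps) <= sigma^2 / (n * eps^2)
chebyshev_tail = sigma ** 2 / (n * eps ** 2)

print(f"normal approximation: Pr(deviation <= eps) ~ {normal_prob:.4f}")
print(f"Chebyshev-style bound: Pr(deviation >= eps) <= {chebyshev_tail:.4f}")
```

The normal approximation puts the tail probability near one percent here, while the distribution-free bound only guarantees sixteen percent; the weaker bound is the price of making no assumption about the distribution.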
The Law of Large Numbers can be proved by using Chebyshev's inequality and a few facts about expectations and variances (the last two requiring independence):
$$\begin{align} &E(x+y) = E(x) + E(y)\\ &Var(x-c) = Var(x)\\ &Var(cx) = c^2 Var(x)\\ &E(xy) = E(x) E(y)\\ &Var(x+y) = Var(x) + Var(y) \end{align}$$
Markov's inequality. Let \(x\) be a nonnegative random variable. Then for \(a \gt 0\),
$$\Pr(x \ge a) \le \frac{E(x)}{a}$$
Proof For a continuous nonnegative random variable \(x\) with probability density \(p\),
$$\begin{align} E(x) &= \int_0^\infty xp(x)dx\\[2ex] &= \int_0^a xp(x)dx + \int_a^\infty xp(x)dx\\[2ex] &\ge \int_a^\infty xp(x)dx\\[2ex] &\ge a\int_a^\infty p(x)dx\\[2ex] &= a \Pr(x \ge a) &\blacksquare \end{align}$$
The proof for discrete random variables works the same with sums instead of integrals.
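A minimal Monte Carlo sanity check of Markov's inequality (the exponential distribution and the thresholds are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=1_000_000)   # nonnegative, E(x) = 1

for a in (1.0, 2.0, 5.0):
    empirical = np.mean(x >= a)
    print(f"a={a}: Pr(x >= a) = {empirical:.4f}  <=  E(x)/a = {1.0 / a:.4f}")
```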
Chebyshev's inequality. Let \(x\) be a random variable with finite variance. Then for \(c \gt 0\),
$$\Pr(|x-E(x)| \ge c) \le \frac{Var(x)}{c^2}$$
Proof \(\Pr(|x-E(x)| \ge c) = \Pr(|x-E(x)|^2 \ge c^2)\). Note that \(y=|x-E(x)|^2\) is a nonnegative random variable and \(E(y)=Var(x)\). Applying Markov's inequality,
$$\begin{align} \Pr(|x-E(x)| \ge c) &= \Pr(|x-E(x)|^2 \ge c^2)\\[2ex] &\le \frac{E(|x-E(x)|^2)}{c^2}\\[2ex] &= \frac{Var(x)}{c^2} &\blacksquare \end{align}$$
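A similar check of Chebyshev's inequality, here with a uniform random variable (again arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=1_000_000)        # E(x) = 1/2, Var(x) = 1/12

for c in (0.25, 0.4, 0.45):
    empirical = np.mean(np.abs(x - 0.5) >= c)
    print(f"c={c}: Pr(|x - E(x)| >= c) = {empirical:.4f}  <=  Var(x)/c^2 = {(1 / 12) / c ** 2:.4f}")
```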
If we draw two random points \(\boldsymbol{y}\) and \(\boldsymbol{z}\) from a \(d\)-dimensional Gaussian with unit variance in each direction, then \(|\boldsymbol{y}|^2 \approx d\) and \(|\boldsymbol{z}|^2 \approx d\). Since for all \(i\),
$$\begin{align} E\left((y_i-z_i)^2\right) &= E(y_i^2) + E(z_i^2) - 2E(y_i z_i)\\[2ex] &= Var(y_i) + Var(z_i) - 2E(y_i)E(z_i)\\[2ex] &= 2 \end{align}$$
then
$$|\boldsymbol{y-z}|^2 = \sum_{i=1}^d (y_i-z_i)^2 \approx 2d$$
Since \(|\boldsymbol{y-z}|^2 \approx 2d \approx |\boldsymbol{y}|^2 + |\boldsymbol{z}|^2\), by the Pythagorean theorem the random \(d\)-dimensional vectors \(\boldsymbol{y}\) and \(\boldsymbol{z}\) must be approximately orthogonal.
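Both claims are easy to check numerically (a NumPy sketch; \(d\) and the number of points are arbitrary): squared norms should concentrate near \(d\), pairwise squared distances near \(2d\), and the angle between two random points near \(90^\circ\).

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_points = 1000, 200                          # arbitrary illustrative choices
pts = rng.standard_normal((n_points, d))         # points from a spherical Gaussian

sq_norms = (pts ** 2).sum(axis=1)                # expect approximately d
y, z = pts[::2], pts[1::2]                       # pair the points up
sq_dists = ((y - z) ** 2).sum(axis=1)            # expect approximately 2d

cosines = (y * z).sum(axis=1) / np.sqrt((y ** 2).sum(axis=1) * (z ** 2).sum(axis=1))
angles = np.degrees(np.arccos(cosines))          # expect approximately 90 degrees

print(f"mean |y|^2     = {sq_norms.mean():.1f}  (d  = {d})")
print(f"mean |y - z|^2 = {sq_dists.mean():.1f}  (2d = {2 * d})")
print(f"angles: mean {angles.mean():.1f} deg, std {angles.std():.1f} deg")
```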
Master Tail Bounds Theorem. Let \(x = x_1 + x_2 + \cdots + x_n,\) where \(x_1, x_2, \ldots, x_n\) are mutually independent random variables with zero mean and variance at most \(\sigma^2\). Let \(0 \le a \le \sqrt{2}n\sigma^2\). Assume that \(|E(x_i^s)| \le \sigma^2s!\) for \(s=3,4,\ldots, \lfloor a^2/(4n\sigma^2)\rfloor\). Then,
$$\Pr(|x| \ge a) \le 3e^{-a^2/(12n\sigma^2)}$$
The proof works by applying Markov's inequality to the random variable \(x^r\), where \(r\) is a large even number. Since \(r\) is even, \(x^r\) is nonnegative and \(|x| \ge a\) exactly when \(x^r \ge a^r\), so
$$\Pr(|x| \ge a) = \Pr(x^r \ge a^r) \le \frac{E(x^r)}{a^r}$$
Then write \(E(x^r)\) as \(E\left((x_1 + \cdots + x_n)^r\right)\) and expand the polynomial by applying the multinomial theorem. Use the fact that by independence \(E(x_i^{r_i} x_j^{r_j}) = E(x_i^{r_i})E(x_j^{r_j})\) to get a collection of simpler expectations that can be bounded using the assumption that \(|E(x_i^s)| \le \sigma^2 s!\).
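As a numerical illustration (a sketch, not part of the proof): if the \(x_i\) are independent standard normals, then \(|E(x_i^s)| \le \sigma^2 s!\) holds and \(x\) is exactly \(N(0, n)\), so the theorem's bound can be compared with the exact Gaussian tail. The bound is loose here, but it applies to any distribution satisfying the hypotheses.

```python
import math

n, sigma = 100, 1.0            # x = x_1 + ... + x_n is N(0, n) for standard normal x_i

for a in (20.0, 40.0, 60.0):   # all satisfy a <= sqrt(2) * n * sigma^2
    exact = math.erfc(a / math.sqrt(2 * n))              # Pr(|x| >= a) for N(0, n)
    bound = 3 * math.exp(-a ** 2 / (12 * n * sigma ** 2))
    print(f"a={a:>4}: exact tail {exact:.2e}   theorem bound {bound:.2e}")
```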
The Geometry of High Dimensions
Most of the volume of high-dimensional objects is concentrated near the surface. Consider any object \(A\) in \(R^d\). Shrink \(A\) by a small amount \(\epsilon\) to produce a new object \((1-\epsilon)A=\{(1-\epsilon)x|x\in A\}\). Then,
$$\text{volume}\left((1-\epsilon)A\right)=(1-\epsilon)^d\text{volume}(A)$$
A useful inequality of the exponential function: for all real \(x\),
$$1+x \le e^{x}$$
Proof If \(x \le -1\), then \(1+x \le 0 \lt e^x\), so the inequality holds trivially. Suppose \(x \ge -1\), so that \(2k+1+x \ge 2k \ge 0\) for every \(k \ge 1\) and each summand in the grouped series below is nonnegative. Then,
$$\begin{align} e^x &= \sum_{k=0}^\infty \frac{x^k}{k!}\\[2ex] &= 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots\\[2ex] &= 1 + x + \sum_{k \ge 1} \left( \frac{x^{2k}}{(2k)!} + \frac{x^{2k+1}}{(2k+1)!} \right)\\[2ex] &= 1 + x + \sum_{k \geq 1} x^{2k}\left( \frac{1}{(2k)!} + \frac{x}{(2k+1)!} \right)\\[2ex] &= 1 + x + \sum_{k \geq 1} x^{2k}\left( \frac{2k + 1 + x}{(2k+1)!} \right)\\[2ex] &\ge 1 + x &\blacksquare \end{align}$$
When we shrink each of the \(2d\) sides of a \(d\)-dimensional cube by a factor \(f\), its volume shrinks by a factor of \(f^d\). For any object \(A\) in \(R^d\),
$$\frac{\text{volume}\left((1-\epsilon)A\right)}{\text{volume}(A)} = (1-\epsilon)^d \le e^{-\epsilon d}$$
Fixing \(\epsilon\) and letting \(d \to \infty\), the above quantity approaches zero. This means that almost all of the volume of \(A\) is concentrated in the portion of \(A\) that does not belong to the region \((1-\epsilon)A\).
Let \(S\) denote the unit ball in \(d\) dimensions. At least a \(1-e^{-\epsilon d}\) fraction of the volume of the unit ball is concentrated in a small annulus \(S \setminus (1-\epsilon)S\) of width \(\epsilon\) at the boundary. If the ball has radius \(r\), most of its volume lies in an annulus of width \(O(\frac{r}{d})\) at the boundary.
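A quick numerical check (a sketch; the dimensions and the choice \(\epsilon = 1/d\) are for illustration): for a uniform point in the unit ball the radius has CDF \(r^d\), so we can sample radii as \(U^{1/d}\) and measure the fraction landing in the annulus, comparing it with the exact value \(1-(1-\epsilon)^d\) and the lower bound \(1-e^{-\epsilon d}\).

```python
import numpy as np

rng = np.random.default_rng(4)
n_points = 100_000

for d in (10, 50, 200):
    eps = 1.0 / d                                        # annulus of width 1/d
    # For a uniform point in the unit ball, Pr(radius <= r) = r^d,
    # so the radius can be sampled as U^(1/d) with U uniform on [0, 1].
    radii = rng.uniform(0.0, 1.0, size=n_points) ** (1.0 / d)
    in_annulus = np.mean(radii > 1.0 - eps)
    exact = 1.0 - (1.0 - eps) ** d
    lower_bound = 1.0 - np.exp(-eps * d)
    print(f"d={d:>3}: empirical {in_annulus:.3f}  exact {exact:.3f}  lower bound {lower_bound:.3f}")
```

With \(\epsilon = 1/d\) the fraction in the annulus stays near \(1 - e^{-1} \approx 0.63\) for every \(d\), which is the sense in which an annulus of width \(O(r/d)\) already captures a constant fraction of the volume.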