Journey into Machine Learning: (Weak) Law of Large Numbers and the Central Limit Theorem

Following the definitions of convergence in probability and convergence in distribution in the previous post, we now state two fundamental theorems that build on these concepts.

(Weak) Law of Large Numbers:
Let $\left\{X_i \right\}$ be a sequence of iid (independent and identically distributed) random variables with $\mathbb{E}\left[X_i\right] = \mu \lt \infty$ and $\text{Var}\left(X_i \right) = \sigma^2 \lt \infty$. Now define $$S_n:=\sum_{i=1}^{n} X_i$$ then $$\frac{S_n}{n}\rightarrow \mu$$ in probability as $n\rightarrow \infty$. That is, for every $\epsilon \gt 0$, $P\left( \left|\frac{S_n}{n}-\mu \right| \ge \epsilon \right) \rightarrow 0$.

Proof:
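Since the random variables are all iid, linearity of expectation gives $$\mathbb{E}\left[\frac{S_n}{n} \right] = \frac{1}{n} \mathbb{E}\left[S_n\right] = \frac{n \mu}{n} = \mu$$ Similarly, because the variance of a sum of independent random variables is the sum of their variances, $$\text{Var}\left( \frac{S_n}{n} \right) = \frac{1}{n^2} \text{Var}\left(S_n\right) = \frac{1}{n^2} \left(\mathbb{E}[S_n^2]- \mathbb{E}[S_n]^2 \right)=\frac{n \sigma^2}{n^2} = \frac{\sigma^2}{n}$$ We now call upon Chebyshev's Inequality, which states that if $X$ is a random variable with finite expectation $\mu$ and finite non-zero variance $\sigma^2$ then, for any $k \gt 0$, $$P\left( \left|X-\mu \right| \ge k \sigma \right) \le \frac{1}{k^2}$$It bounds the probability that $X$ lies far from its mean purely in terms of the variance, and it is completely distribution agnostic.

We can now apply Chebyshev's Inequality to $X=\frac{S_n}{n}$, which has mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$. Fix any $\epsilon \gt 0$ and choose $k = \frac{\epsilon \sqrt{n}}{\sigma}$, so that $k \frac{\sigma}{\sqrt{n}} = \epsilon$ and $\frac{1}{k^2} = \frac{\sigma^2}{n \epsilon^2}$ (if $\sigma = 0$ the result is immediate, since then $\frac{S_n}{n} = \mu$ with probability one):
$$P\left( \left|\frac{S_n}{n}-\mu \right| \ge \epsilon \right) \le \frac{\sigma^2}{n \epsilon^2}$$Hence
$$ P\left( \left|\frac{S_n}{n}-\mu \right| \ge \epsilon \right) \rightarrow 0$$ as $n \rightarrow \infty$, which is precisely the definition of convergence in probability.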
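
The bound in the proof can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes Exponential(1) variables (so $\mu = 1$ and $\sigma^2 = 1$), and the function names `deviation_probability` and `chebyshev_bound` are made up for this illustration. It estimates $P\left( \left|\frac{S_n}{n}-\mu \right| \ge \epsilon \right)$ by simulation and compares it with $\frac{\sigma^2}{n \epsilon^2}$.

```python
import numpy as np


def deviation_probability(n, eps, trials=2000, seed=0):
    """Monte Carlo estimate of P(|S_n/n - mu| >= eps) for Exponential(1) draws."""
    rng = np.random.default_rng(seed)
    # Each row is one realisation of X_1, ..., X_n.
    # Assumption for this sketch: Exponential(1), so mu = 1 and sigma^2 = 1.
    samples = rng.exponential(scale=1.0, size=(trials, n))
    sample_means = samples.mean(axis=1)
    return float(np.mean(np.abs(sample_means - 1.0) >= eps))


def chebyshev_bound(n, eps, sigma2=1.0):
    """Upper bound sigma^2 / (n * eps^2) from the proof, capped at 1."""
    return min(1.0, sigma2 / (n * eps ** 2))


if __name__ == "__main__":
    eps = 0.2
    for n in (10, 100, 1000):
        estimate = deviation_probability(n, eps)
        bound = chebyshev_bound(n, eps)
        print(f"n={n:5d}  estimate={estimate:.4f}  Chebyshev bound={bound:.4f}")
```

The estimated probability should sit below the bound and shrink as $n$ grows. Chebyshev's bound is typically quite loose, which is the price of making no assumption about the distribution.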

Central Limit Theorem (CLT):
We shall state a slightly restricted version of the CLT, which is fine for illustrative purposes. With $S_n$ defined as above, the CLT states:
Let $\left\{X_1, X_2, \ldots \right\}$ be a sequence of iid random variables with finite expectation $\mathbb{E}[X_i] = \mu < \infty$ and finite non-zero variance $\text{Var}\left(X_i\right) = \sigma^2 < \infty$. Then

$$ P \left( \frac{S_n-n \mu}{\sigma \sqrt{n}} \leq x \right) \rightarrow \Phi(x)$$
for every $x$ as $n \rightarrow \infty$, where $\Phi(x)$ is the cumulative distribution function of a Standard Normal variable. In other words, the standardised sum $\frac{S_n-n \mu}{\sigma \sqrt{n}}$ converges in distribution to a Standard Normal variable. We shall not prove the CLT, as only an understanding of what it says and how it can be applied is required.
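
As a small worked illustration of how the statement is applied (the fair coin here is an assumption chosen for the example, not something from the theorem itself): let each $X_i$ be 0 or 1 with probability $\frac{1}{2}$, so that $\mu = \frac{1}{2}$ and $\sigma = \frac{1}{2}$. For $n = 100$ tosses we have $n \mu = 50$ and $\sigma \sqrt{n} = 5$, so
$$P\left(S_{100} \le 55\right) = P\left(\frac{S_{100} - 50}{5} \le 1\right) \approx \Phi(1) \approx 0.8413$$
This is only an approximation for finite $n$. The CLT guarantees that the error vanishes as $n \rightarrow \infty$ but does not, in the form stated here, say how large it is for a given $n$.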

The CLT helps explain why Normal distributions are so common in modelling (and even in nature): under these mild conditions, regardless of the distribution of the individual $X_i$, the standardised sum $\frac{S_n-n \mu}{\sigma \sqrt{n}}$ is approximately Standard Normal for large enough $n$.
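
The "regardless of distribution" claim can be explored with a short simulation. This is a sketch under stated assumptions rather than a definitive experiment: the three distributions, the grid of $x$ values, and the helper names `phi`, `max_cdf_gap` and `DISTRIBUTIONS` are all choices made for this illustration. For each distribution it measures the largest gap, over the grid, between the empirical distribution function of $\frac{S_n-n \mu}{\sigma \sqrt{n}}$ and $\Phi(x)$.

```python
import math

import numpy as np


def phi(x):
    """Standard Normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def max_cdf_gap(draw, mu, sigma, n, trials=20000, seed=0):
    """Largest gap on a grid between the empirical CDF of the standardised sum and Phi."""
    rng = np.random.default_rng(seed)
    samples = draw(rng, (trials, n))  # one row per realisation of X_1, ..., X_n
    z = (samples.sum(axis=1) - n * mu) / (sigma * math.sqrt(n))
    grid = np.linspace(-3.0, 3.0, 61)
    empirical = np.array([np.mean(z <= x) for x in grid])
    normal = np.array([phi(x) for x in grid])
    return float(np.max(np.abs(empirical - normal)))


# Illustrative choices: name -> (sampler, mean, standard deviation).
DISTRIBUTIONS = {
    "uniform(0,1)": (lambda rng, size: rng.uniform(0.0, 1.0, size), 0.5, math.sqrt(1.0 / 12.0)),
    "exponential(1)": (lambda rng, size: rng.exponential(1.0, size), 1.0, 1.0),
    "bernoulli(0.3)": (lambda rng, size: rng.binomial(1, 0.3, size), 0.3, math.sqrt(0.3 * 0.7)),
}

if __name__ == "__main__":
    for name, (draw, mu, sigma) in DISTRIBUTIONS.items():
        gaps = [max_cdf_gap(draw, mu, sigma, n) for n in (1, 10, 100)]
        print(name, ["%.3f" % g for g in gaps])
```

The gaps should shrink as $n$ grows for all three distributions, although at different speeds: the skewed and discrete cases tend to need a larger $n$ than the symmetric uniform case. Part of the remaining gap at large $n$ is simulation noise from using a finite number of trials.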

Sunday, 25 August 2013
