1. Bayesian Statistical Inference
1.1. Terminology of Bayesian Inference
$x = (x_1, \cdots, x_n)$: observed value of the observation vector $X$
$p_\Theta$ or $f_\Theta$: prior distribution
the distribution of the unknown parameter $\Theta$, assumed before observing $x$
$p_{\Theta|X}$ or $f_{\Theta|X}$: posterior distribution
the distribution of the unknown parameter $\Theta$, obtained after observing $x$
1.2. Summary of Bayesian Inference
1. we start with a prior distribution $p_\Theta$ or $f_\Theta$ for the unknown random variable $\Theta$.
2. we have a model $p_{X|\Theta}$ or $f_{X|\Theta}$ of the observation vector $X$.
3. after observing the value $x$ of $X$, we form the posterior distribution of $\Theta$, using the appropriate version of Bayes' rule.
1.3. The four versions of Bayes' rule
1. if $\Theta$ is discrete, $X$ is discrete, then
$$p_{\Theta|X}(\theta|x) = \frac{p_\Theta(\theta) p_{X|\Theta}(x|\theta)}{\sum_{\theta^\prime} p_\Theta(\theta^\prime) p_{X|\Theta}(x|\theta^\prime)}$$
2. if $\Theta$ is discrete, $X$ is continuous, then
$$p_{\Theta|X}(\theta|x) = \frac{p_\Theta(\theta) f_{X|\Theta}(x|\theta)}{\sum_{\theta^\prime} p_\Theta(\theta^\prime) f_{X|\Theta}(x|\theta^\prime)}$$
3. if $\Theta$ is continuous, $X$ is discrete, then
$$f_{\Theta|X}(\theta|x) = \frac{f_\Theta(\theta) p_{X|\Theta}(x|\theta)}{\int f_\Theta(\theta^\prime) p_{X|\Theta}(x|\theta^\prime) d\theta^\prime}$$
4. if $\Theta$ is continuous, $X$ is continuous, then
$$f_{\Theta|X}(\theta|x) = \frac{f_\Theta(\theta) f_{X|\Theta}(x|\theta)}{\int f_\Theta(\theta^\prime) f_{X|\Theta}(x|\theta^\prime) d\theta^\prime}$$
1.4. Maximum a Posteriori Probability (MAP) rule
$$\hat{\theta} = \arg\max_{\theta}\, p_{\Theta|X}(\theta|x) \ (\Theta \text{ is discrete}), \quad \hat{\theta} = \arg\max_{\theta}\, f_{\Theta|X}(\theta|x) \ (\Theta \text{ is continuous})$$
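For concreteness, here is a minimal numeric sketch (Python; the coin setup and all numbers are made up for illustration, not from the text above) that applies the first version of Bayes' rule and then the MAP rule:

```python
import numpy as np

# Hypothetical setup: Theta is a coin's bias with three candidate values,
# X is a single flip (1 = heads). All numbers are illustrative assumptions.
thetas = np.array([0.2, 0.5, 0.8])    # candidate values of Theta
prior = np.array([0.3, 0.4, 0.3])     # p_Theta(theta)

def posterior(x, prior, thetas):
    """Version 1 of Bayes' rule: Theta discrete, X discrete."""
    likelihood = np.where(x == 1, thetas, 1 - thetas)  # p_{X|Theta}(x|theta)
    unnormalized = prior * likelihood
    return unnormalized / unnormalized.sum()           # normalize by the sum over theta'

post = posterior(x=1, prior=prior, thetas=thetas)
theta_map = thetas[np.argmax(post)]   # MAP rule: the theta maximizing the posterior
print(post)        # [0.12 0.4  0.48]
print(theta_map)   # 0.8
```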
1.5. Example of Bayesian Inference
Suppose that Juliet is always late for her dates with Romeo by $X \sim \text{Uniform}[0, \theta]$ hours.
Given that Juliet was $x_1$ hours late on the first date, update $f_\Theta$; that is, find $f_{\Theta|X}$ and $\hat \theta$.
$$f_\Theta(\theta) = \begin{cases} 1, & \text{if } 0 \le \theta \le 1 \\ 0, & \text{otherwise} \end{cases} \quad f_{X|\Theta}(x|\theta) = \begin{cases} 1/\theta, & \text{if } 0 \le x \le \theta \\ 0, & \text{otherwise} \end{cases}$$ $$f_{\Theta|X}(\theta|x_1) = \frac{f_\Theta(\theta) f_{X|\Theta}(x_1|\theta)}{\int_0^1 f_\Theta(\theta^\prime) f_{X|\Theta}(x_1|\theta^\prime) d\theta^\prime} = \frac{1/\theta}{\int_{x_1}^1 \frac{1}{\theta^\prime}d\theta^\prime} = \frac{1}{\theta \cdot |\log x_1|}, \quad \text{if } x_1 \le \theta \le 1$$
Since $f_{\Theta|X}$ is largest when $\theta = x_1$, we get $\hat \theta = x_1$.
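A quick numerical check of this example (a Python sketch; the observed value $x_1 = 0.3$ is an assumption for illustration): evaluating the posterior density on a grid confirms that it peaks at $\theta = x_1$.

```python
import numpy as np

x1 = 0.3                                  # assumed observed lateness (hours)
thetas = np.linspace(x1, 1.0, 1000)       # posterior support: x1 <= theta <= 1

# f_{Theta|X}(theta | x1) = 1 / (theta * |log x1|) on [x1, 1]
posterior = 1.0 / (thetas * abs(np.log(x1)))

print(thetas[np.argmax(posterior)])       # 0.3, i.e. theta_hat = x1

# Sanity check: the density integrates to ~1 over [x1, 1]
print((posterior * (thetas[1] - thetas[0])).sum())   # ~1.0
```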
2. Classical Statistical Inference
2.1. Estimation of the Mean and Variance of a Random Variable
Let $X_1, X_2, \cdots$ be i.i.d. random variables whose mean $\mu$ and variance $\sigma^2$ are unknown.
The estimator of the mean is the sample mean:
$$M_n=\frac{X_1+X_2+\ldots+X_n}{n},\ \ E\left[M_n\right]=\mu,\ \ \text{var}\left(M_n\right)=\frac{\sigma^2}{n}$$
The estimator of the variance is the sample variance:
$${\bar{S}_n}^2=\frac{1}{n}\sum_{i=1}^{n}{(X_i-M_n)}^2,\ \ \ \ E\left[{\bar{S}_n}^2\right]=\frac{n-1}{n}\sigma^2,\ \ {\hat{S}_n}^2=\frac{1}{n-1}\sum_{i=1}^{n}{(X_i-M_n)}^2,\ \ \ \ E\left[{\hat{S}_n}^2\right]=\sigma^2$$
Proof:
$$\begin{aligned} E\left[{\bar{S}_n}^2\right] &= \frac{1}{n}E\left[\sum_{i=1}^{n}\left({X_i}^2-2X_iM_n+{M_n}^2\right)\right] \\ &= \frac{1}{n}E\left[\sum_{i=1}^{n}{X_i}^2-2M_n\sum_{i=1}^{n}X_i+n{M_n}^2\right] \\ &= E\left[\frac{1}{n}\sum_{i=1}^{n}{X_i}^2-2{M_n}^2+{M_n}^2\right] \\ &= E\left[\frac{1}{n}\sum_{i=1}^{n}{X_i}^2-{M_n}^2\right] \\ &= \mu^2+\sigma^2-\left(\mu^2+\frac{\sigma^2}{n}\right) \\ &= \frac{n-1}{n}\sigma^2 \end{aligned}$$
where the third equality uses $\sum_{i=1}^{n} X_i = nM_n$, and the fifth uses $E\left[{X_i}^2\right] = \mu^2 + \sigma^2$ and $E\left[{M_n}^2\right] = \mu^2 + \frac{\sigma^2}{n}$.
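The bias derived above is easy to see by simulation; below is a Python sketch (the normal distribution and all constants are chosen only for the demo): the $1/n$ sample variance underestimates $\sigma^2$ by the factor $(n-1)/n$, while the $1/(n-1)$ version is unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 200_000
mu, sigma2 = 0.0, 1.0                        # assumed true mean and variance

# Each row is one experiment of n i.i.d. observations
samples = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
M_n = samples.mean(axis=1, keepdims=True)    # sample mean of each experiment

biased = ((samples - M_n) ** 2).sum(axis=1) / n          # divides by n
unbiased = ((samples - M_n) ** 2).sum(axis=1) / (n - 1)  # divides by n-1

print(biased.mean())     # ~ (n-1)/n * sigma^2 = 0.8
print(unbiased.mean())   # ~ sigma^2 = 1.0
```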
2.2. Maximum Likelihood Estimation (ML estimation)
Let the vector of observations $X = (X_1, \cdots, X_n)$ be described by $p_X(x; \theta)$, whose form depends on $\theta$.
Suppose that we observe a particular value $x = (x_1, \cdots, x_n)$ of $X$.
The maximum likelihood estimate is the value $\hat \theta$ of the parameter that maximizes $p_X(x_1, \cdots, x_n; \theta)$ over all $\theta$:
$$\hat{\theta}_n = \arg\max_{\theta}\, p_X(x_1, \cdots, x_n; \theta) \ (X \text{ is discrete}) \\ \hat{\theta}_n = \arg\max_{\theta}\, f_X(x_1, \cdots, x_n; \theta) \ (X \text{ is continuous})$$
It is natural to estimate the parameter by maximizing the probability of occurrence of the observed results,
so we estimate $\hat \theta$ by maximizing $p_X(x; \theta)$, which is referred to as the likelihood function.
In many experiments the observations $X_i$ are assumed to be independent, so the likelihood function takes the form
$$p_X(x_1, \cdots, x_n; \theta) = \prod_{i=1}^{n} p_{X_i}(x_i;\theta) \ (X \text{ is discrete}), \quad f_X(x_1, \cdots, x_n; \theta) = \prod_{i=1}^{n} f_{X_i}(x_i;\theta) \ (X \text{ is continuous})$$
- Important -
$p_{X_i}(x_i; \theta)$ does not mean "the probability that $\Theta$ equals $\theta$";
it means "the likelihood (= probability) that $x_i$ occurs when $\Theta$ is set to $\theta$."
In short, MLE is the process of finding the $\hat \theta$ that makes the occurrence of $x$ most likely.
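As a small illustration of the product form (a Python sketch with an assumed Gaussian observation model, which is not part of the text above), the likelihood of independent observations is the product of the individual densities, and a well-fitting $\theta$ yields a larger value:

```python
import numpy as np

def normal_pdf(x, theta, sigma=1.0):
    """f_{X_i}(x_i; theta): density of N(theta, sigma^2) at x (assumed model)."""
    return np.exp(-(x - theta) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def likelihood(xs, theta):
    """f_X(x_1,...,x_n; theta) = prod_i f_{X_i}(x_i; theta) for independent X_i."""
    return np.prod([normal_pdf(x, theta) for x in xs])

xs = [1.8, 2.3, 2.1]                  # hypothetical observations
print(likelihood(xs, theta=2.0))      # ~0.059: theta near the data fits well
print(likelihood(xs, theta=0.0))      # ~1e-4: a poorly fitting theta
```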
2.2.1. Example of ML estimation
Likelihood function: $X \sim \text{Bernoulli}(\theta), \quad p_X(x; \theta) = \begin{cases} \theta, & \text{if } x = 1 \\ 1-\theta, & \text{if } x = 0 \end{cases}$ ($\theta$ = the probability that Jeongwan wins a game)
Given that Jeongwan won 6 of his first 10 placement games and lost 4 ($x = (1, 0, 0, 0, 1, 1, 1, 0, 1, 0)$), find $\hat \theta$.
$$p_X(x;\theta) = \theta^6(1-\theta)^4 \\ \frac{dp_X}{d\theta}(x; \theta) = 6\theta^5(1-\theta)^4 - 4\theta^6(1-\theta)^3 = (6 - 10\theta)\,\theta^5(1-\theta)^3 \\ 6 - 10\theta = 0 \rightarrow \theta = \frac{6}{10}$$
Since $p_X(x; \theta)$ is largest when $\theta = \frac{6}{10}$, we get $\hat \theta = \frac{6}{10}$.
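A numeric confirmation of this example (Python sketch): evaluating the likelihood $\theta^6(1-\theta)^4$ on a grid of $\theta$ values recovers the closed-form answer.

```python
import numpy as np

x = np.array([1, 0, 0, 0, 1, 1, 1, 0, 1, 0])   # 6 wins, 4 losses
thetas = np.linspace(0.001, 0.999, 999)        # grid over (0, 1)

# p_X(x; theta) = theta^(#wins) * (1 - theta)^(#losses)
likelihood = thetas ** x.sum() * (1 - thetas) ** (len(x) - x.sum())

print(thetas[np.argmax(likelihood)])   # 0.6, matching theta_hat = 6/10
```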
- Bayesian statistics vs. classical statistics -
1. How $\Theta$ is estimated: Bayesian statistics estimates the parameter by updating the probability distribution of $\Theta$ through Bayes' rule,
while classical statistics estimates the parameter using the likelihood function, which measures how likely the observed $x$ is.
2. View of $\Theta$: classical statistics treats the unknown parameter $\Theta$ as a constant,
whereas Bayesian statistics treats the unknown parameter $\Theta$ as a random variable.
3. Pros and cons: in Bayesian statistics, the initial $p_\Theta$ is chosen subjectively,
but as the observations $x$ accumulate, the inference becomes increasingly objective.
Classical statistics estimates the distribution from the observations $x$ alone, with no subjective input,
but with few observations $x$ its reliability is too low to be useful.