Chap 4

4.1 Likelihood and Maximum Likelihood

Definitions

iid means independent and identically distributed.

Log-likelihood function:
$$
\ell_x(\mu) = \log f_{\mu}(x)
$$
The maximum likelihood estimate (MLE) is then:
$$
\text{MLE: } \hat{\mu} = \arg\max_{\mu \in \Omega} \ell_x(\mu)
$$
where $\Omega$ is the parameter space.

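Example

$$
x_i \overset{iid}{\sim} f_{\mu}(x), \quad i = 1, \dots, n
$$

For an iid sample $\vec{x} = (x_1, \dots, x_n)$, the log-likelihood is the sum of the per-observation log-likelihoods:
$$
\ell_{\vec{x}}(\mu) = \sum_{i=1}^n \log f_{\mu}(x_i) = \sum_{i=1}^n \ell_{x_i}(\mu)
$$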
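A minimal numerical sketch of the MLE definition, assuming a normal model $x_i \overset{iid}{\sim} N(\mu, 1)$. The model, the simulated data, and the helper names `log_likelihood` and `golden_section_argmax` are illustrative assumptions, not from the notes. The code sums the per-observation log-densities and maximizes over $\mu$; for this model the MLE should coincide with the sample mean.

```python
import math
import random
import statistics

# Illustrative sketch: the N(mu, 1) model and helper names are assumptions, not from the notes.


def log_likelihood(mu: float, xs: list[float]) -> float:
    """l_x(mu) = sum_i log f_mu(x_i) for x_i iid N(mu, 1)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2 for x in xs)


def golden_section_argmax(fn, lo: float, hi: float, tol: float = 1e-9) -> float:
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if fn(c) > fn(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2


rng = random.Random(0)
xs = [rng.gauss(3.0, 1.0) for _ in range(50)]  # simulated sample, true mu = 3

mu_hat = golden_section_argmax(lambda mu: log_likelihood(mu, xs), -10.0, 10.0)
print(f"MLE by search: {mu_hat:.6f}")
print(f"sample mean  : {statistics.fmean(xs):.6f}")
```

Golden-section search is used only because it needs nothing beyond the standard library; any one-dimensional optimizer would do.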

4.2 Fisher Information and the MLE

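Note: a dot above a symbol denotes differentiation with respect to the parameter; multiple dots denote repeated differentiation.

Definitions

Score function, the first derivative of the log-likelihood:
$$
\dot{\ell}_{\theta}(x) = \frac{\partial \log f_{\theta}(x)}{\partial \theta}
$$
Under regularity conditions (which allow differentiation under the integral sign), $E_{\theta}[\dot{\ell}_{\theta}(x)] = 0$. Proof:
$$
\begin{aligned}
E_{\theta}[\dot{\ell}_{\theta}(x)] = \int \frac{\partial \log f_{\theta}(x)}{\partial \theta} f_{\theta}(x)\, dx &= \int \frac{\dot{f}_{\theta}(x)}{f_{\theta}(x)} f_{\theta}(x)\, dx \\
&= \int \dot{f}_{\theta}(x)\, dx \\
&= \frac{\partial}{\partial \theta} \int f_{\theta}(x)\, dx \\
&= \frac{\partial}{\partial \theta} 1 = 0
\end{aligned}
$$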
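To see the mean-zero property numerically, here is a sketch assuming a Poisson($\theta$) model (an illustrative choice, not from the notes). The score is approximated by a central finite difference of $\log f_\theta(x)$, and the expectation is an exact sum over the support, truncated at a point where the remaining probability is negligible. Helper names are illustrative.

```python
import math

# Illustrative sketch: the Poisson model and helper names are assumptions, not from the notes.


def poisson_logpmf(x: int, theta: float) -> float:
    """log f_theta(x) = x log(theta) - theta - log(x!)."""
    return x * math.log(theta) - theta - math.lgamma(x + 1)


def score(x: int, theta: float, h: float = 1e-5) -> float:
    """Central finite-difference approximation of d/dtheta log f_theta(x)."""
    return (poisson_logpmf(x, theta + h) - poisson_logpmf(x, theta - h)) / (2 * h)


def expected_score(theta: float, x_max: int = 200) -> float:
    """E_theta[score] = sum_x score(x) f_theta(x), with the support truncated at x_max."""
    return sum(score(x, theta) * math.exp(poisson_logpmf(x, theta)) for x in range(x_max + 1))


for theta in (0.5, 2.0, 7.5):
    print(theta, expected_score(theta))  # each value should be close to 0
```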
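Fisher information:
$$
\mathcal{I}(\theta) = \mathrm{Var}_{\theta}[\dot{\ell}_{\theta}(x)] = E_{\theta}[\dot{\ell}_{\theta}(x)^2] - \left(E_{\theta}[\dot{\ell}_{\theta}(x)]\right)^2 = E_{\theta}[\dot{\ell}_{\theta}(x)^2]
$$
(The variance is the expectation of the square minus the square of the expectation; the second term is 0 because the score has mean zero.)

So the Fisher information is the variance of the score.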
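Because $\mathcal{I}(\theta)$ is a variance, it can be estimated by simulation. This sketch assumes an Exponential model with rate $\theta$, $f_\theta(x) = \theta e^{-\theta x}$, whose score is $1/\theta - x$ and whose Fisher information is $1/\theta^2$ (a standard result, not derived in the notes). The seed, $\theta$, and sample size are arbitrary.

```python
import random
import statistics

# Illustrative sketch: the Exponential(rate=theta) model is an assumption, not from the notes.


def exp_score(x: float, theta: float) -> float:
    """Score of f_theta(x) = theta * exp(-theta * x): d/dtheta (log theta - theta x)."""
    return 1.0 / theta - x


theta = 2.0
rng = random.Random(1)
samples = [rng.expovariate(theta) for _ in range(100_000)]  # expovariate takes the rate
scores = [exp_score(x, theta) for x in samples]

score_mean = statistics.fmean(scores)
score_var = statistics.fmean((s - score_mean) ** 2 for s in scores)
print("mean of score        :", score_mean)  # close to 0
print("variance of score    :", score_var)   # close to I(theta)
print("I(theta) = 1/theta**2:", 1 / theta**2)
```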
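It also equals $-E_{\theta}[\ddot{\ell}_{\theta}(x)]$, the negative of the expected second derivative. Proof:
$$
\dot{\ell}_{\theta}(x) = \frac{\dot{f}_{\theta}(x)}{f_{\theta}(x)}
$$

$$
\ddot{\ell}_{\theta}(x) = \frac{\ddot{f}_{\theta}(x) f_{\theta}(x) - \dot{f}_{\theta}(x)^2}{f_{\theta}(x)^2}
$$

For the first term:
$$
\begin{aligned}
E_{\theta}\left[\frac{\ddot{f}_{\theta}(x) f_{\theta}(x)}{f_{\theta}(x)^2}\right] &= \int \frac{\ddot{f}_{\theta}(x) f_{\theta}(x)}{f_{\theta}(x)^2} f_{\theta}(x)\, dx \\
&= \int \ddot{f}_{\theta}(x)\, dx \\
&= \frac{\partial}{\partial \theta} \int \dot{f}_{\theta}(x)\, dx \\
&= 0
\end{aligned}
$$
The last step uses $\int \dot{f}_{\theta}(x)\, dx = 0$, established in the proof that the score has mean zero.

For the second term:
$$
E_{\theta}\left[\frac{\dot{f}_{\theta}(x)^2}{f_{\theta}(x)^2}\right] = E_{\theta}\left[\left(\frac{\dot{f}_{\theta}(x)}{f_{\theta}(x)}\right)^2\right] = E_{\theta}[\dot{\ell}_{\theta}(x)^2]
$$
Subtracting the second expectation from the first gives $E_{\theta}[\ddot{\ell}_{\theta}(x)] = -E_{\theta}[\dot{\ell}_{\theta}(x)^2] = -\mathcal{I}(\theta)$, which proves the claim.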
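Both expressions for $\mathcal{I}(\theta)$ can be compared numerically. This sketch assumes a Poisson($\theta$) model, for which $\mathcal{I}(\theta) = 1/\theta$ (a standard result, not derived in the notes); derivatives are central finite differences and expectations are exact sums over a truncated support. The function names and step size are illustrative.

```python
import math

# Illustrative sketch: the Poisson model and helper names are assumptions, not from the notes.


def poisson_logpmf(x: int, theta: float) -> float:
    """log f_theta(x) = x log(theta) - theta - log(x!) for the Poisson(theta) model."""
    return x * math.log(theta) - theta - math.lgamma(x + 1)


def fisher_two_ways(theta: float, x_max: int = 200, h: float = 1e-4) -> tuple[float, float]:
    """Return (E[score^2], -E[second derivative]), truncating the support at x_max."""
    e_score_sq = 0.0
    e_neg_second = 0.0
    for x in range(x_max + 1):
        lo = poisson_logpmf(x, theta - h)
        mid = poisson_logpmf(x, theta)
        hi = poisson_logpmf(x, theta + h)
        first = (hi - lo) / (2 * h)             # central difference: score
        second = (hi - 2 * mid + lo) / (h * h)  # central difference: second derivative
        prob = math.exp(mid)                    # f_theta(x)
        e_score_sq += first**2 * prob
        e_neg_second -= second * prob
    return e_score_sq, e_neg_second


for theta in (0.5, 2.0, 7.5):
    a, b = fisher_two_ways(theta)
    print(f"theta={theta}: E[score^2]={a:.6f}, -E[l'']={b:.6f}, 1/theta={1 / theta:.6f}")
```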

For an iid sample $X_1, X_2, \dots, X_n$:
$$
L(\theta) = \prod_{i=1}^n f_{\theta}(x_i)
$$

$$
\ell_{\vec{x}}(\theta) = \log L(\theta) = \sum_{i=1}^n \log f_{\theta}(x_i)
$$

$$
\dot{\ell}_{\vec{x}}(\theta) = \sum_{i=1}^n \dot{\ell}_{x_i}(\theta)
$$

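Each term $\dot{\ell}_{x_i}(\theta)$ has mean 0 and variance $\mathcal{I}(\theta)$, so by the central limit theorem:
$$
\dot{\ell}_{\vec{x}}(\theta) \mathrel{\dot\sim} N(0, n \mathcal{I}(\theta))
$$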
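A small simulation can illustrate the normal approximation for the summed score. This sketch assumes the Bernoulli($\theta$) model from the example later in this section, with $\mathcal{I}(\theta) = 1/(\theta(1-\theta))$; $\theta$, $n$, the seed, and the number of replications are arbitrary illustrative values.

```python
import random
import statistics

# Illustrative sketch: Bernoulli model and simulation sizes are assumptions.


def bernoulli_score(x: int, theta: float) -> float:
    """Score of one Bernoulli(theta) observation: x/theta - (1 - x)/(1 - theta)."""
    return x / theta - (1 - x) / (1 - theta)


theta, n, reps = 0.3, 100, 5_000
rng = random.Random(2)

totals = []
for _ in range(reps):
    xs = [1 if rng.random() < theta else 0 for _ in range(n)]
    totals.append(sum(bernoulli_score(x, theta) for x in xs))  # summed score of one sample

total_mean = statistics.fmean(totals)
total_var = statistics.fmean((t - total_mean) ** 2 for t in totals)
print("mean of summed score:", total_mean)  # close to 0
print("variance            :", total_var)   # close to n * I(theta)
print("n * I(theta)        :", n / (theta * (1 - theta)))
```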
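Let $\hat{\theta}$ be the MLE, so that $\dot{\ell}_{\vec{x}}(\hat{\theta}) = 0$. A first-order Taylor expansion around the true $\theta$ gives:
$$
\begin{aligned}
0 = \dot{\ell}_{\vec{x}}(\hat{\theta}) &\approx \dot{\ell}_{\vec{x}}(\theta) + \ddot{\ell}_{\vec{x}}(\theta)(\hat{\theta}-\theta) \\
\hat{\theta} - \theta &\approx -\left(\ddot{\ell}_{\vec{x}}(\theta)\right)^{-1} \dot{\ell}_{\vec{x}}(\theta) \approx \left(n \mathcal{I}(\theta)\right)^{-1} \dot{\ell}_{\vec{x}}(\theta) \mathrel{\dot\sim} N\left(0, (n \mathcal{I}(\theta))^{-1}\right)
\end{aligned}
$$
where the second approximation uses $-\ddot{\ell}_{\vec{x}}(\theta) \approx n \mathcal{I}(\theta)$, by the law of large numbers.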
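The Taylor step above, solved for the root and repeated, is Newton's method for the score equation. A sketch under the assumption of an Exponential model with rate $\theta$, where $\mathcal{I}(\theta) = 1/\theta^2$ and the MLE has the closed form $1/\bar{x}$, lets us check that the simulated variance of $\hat\theta$ is close to $(n\mathcal{I}(\theta))^{-1} = \theta^2/n$. The model, starting value, and simulation sizes are illustrative choices.

```python
import random
import statistics

# Illustrative sketch: Exponential(rate=theta) model and helper names are assumptions.


def newton_mle_exponential(xs: list[float], theta0: float, iters: int = 50) -> float:
    """Solve the score equation of an Exponential(rate=theta) sample by Newton's method.

    Each step solves the first-order Taylor expansion of the score for its root:
    theta_new = theta - l_dot(theta) / l_ddot(theta).
    """
    n, s = len(xs), sum(xs)
    theta = theta0
    for _ in range(iters):
        l_dot = n / theta - s    # score of the whole sample
        l_ddot = -n / theta**2   # second derivative of the log-likelihood
        theta -= l_dot / l_ddot
    return theta


theta, n, reps = 2.0, 200, 4_000
rng = random.Random(3)

estimates = []
for _ in range(reps):
    xs = [rng.expovariate(theta) for _ in range(n)]
    # Starting at 1/max(xs), which lies in (0, n/sum(xs)], keeps Newton convergent here.
    estimates.append(newton_mle_exponential(xs, theta0=1.0 / max(xs)))

est_mean = statistics.fmean(estimates)
est_var = statistics.fmean((e - est_mean) ** 2 for e in estimates)
print("variance of MLE              :", est_var)
print("(n I(theta))^-1 = theta^2 / n:", theta**2 / n)
```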
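If $\tilde{\theta}$ is any unbiased estimator, $E_{\theta}(\tilde{\theta}) = \theta$, then $\mathrm{Var}(\tilde{\theta}) \geq (n \mathcal{I}(\theta))^{-1}$ (lecture slides, p. 12). That is, the variance of any unbiased estimator of $\theta$ has a lower bound (the Cramér-Rao lower bound). Equivalently, $\mathrm{Var}(\tilde{\theta})\, n \mathcal{I}(\theta) \geq 1$, and the left-hand side equals:
$$
\int (\tilde{\theta} - \theta)^2 f_{\theta}(\vec{x})\, d\vec{x} \int \dot{\ell}_{\vec{x}}(\theta)^2 f_{\theta}(\vec{x})\, d\vec{x}
$$
By the Cauchy-Schwarz inequality, $\left[\int f(x) g(x)\, dx\right]^2 \leq \int f^2(x)\, dx \int g^2(x)\, dx$. Applying it with $(\tilde{\theta} - \theta)\sqrt{f_{\theta}(\vec{x})}$ and $\dot{\ell}_{\vec{x}}(\theta)\sqrt{f_{\theta}(\vec{x})}$ in the roles of $f$ and $g$, the expression above satisfies:
$$
\geq \left[\int (\tilde{\theta} - \theta)\, \dot{\ell}_{\vec{x}}(\theta)\, f_{\theta}(\vec{x})\, d\vec{x}\right]^2 = 1
$$
Since the score has mean zero, the $-\theta$ term contributes nothing, and the quantity inside the square reduces to the following (using that $\tilde{\theta}$ depends only on the data, not on $\theta$):
$$
\begin{aligned}
\int \tilde{\theta}\, \dot{\ell}_{\vec{x}}(\theta) f_{\theta}(\vec{x})\, d\vec{x} &= \int \tilde{\theta}\, \frac{\dot{f}_{\theta}(\vec{x})}{f_{\theta}(\vec{x})} f_{\theta}(\vec{x})\, d\vec{x} \\
&= \int \frac{\partial \left(\tilde{\theta} f_{\theta}(\vec{x})\right)}{\partial \theta}\, d\vec{x} \\
&= \frac{\partial}{\partial \theta} \int \tilde{\theta} f_{\theta}(\vec{x})\, d\vec{x} \\
&= \frac{\partial \theta}{\partial \theta} \\
&= 1
\end{aligned}
$$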
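A Monte Carlo sketch of the bound, assuming $x_i \overset{iid}{\sim} N(\theta, 1)$ so that $\mathcal{I}(\theta) = 1$ and the bound is $1/n$. Both the sample mean and the sample median are unbiased for $\theta$ here (the median by symmetry, with $n$ odd); the mean attains the bound while the median's variance is larger, roughly $\pi/(2n)$ for large $n$. The model and simulation sizes are illustrative.

```python
import random
import statistics

# Illustrative sketch: N(theta, 1) model and simulation sizes are assumptions.


def mc_variance(values: list[float]) -> float:
    """Plain (population) variance of simulated values."""
    m = statistics.fmean(values)
    return statistics.fmean((v - m) ** 2 for v in values)


theta, n, reps = 0.0, 101, 4_000
rng = random.Random(4)

means, medians = [], []
for _ in range(reps):
    xs = [rng.gauss(theta, 1.0) for _ in range(n)]
    means.append(statistics.fmean(xs))
    medians.append(statistics.median(xs))

bound = 1.0 / n  # (n I(theta))^{-1} with I(theta) = 1 for N(theta, 1)
print("Cramer-Rao bound  :", bound)
print("Var(sample mean)  :", mc_variance(means))    # close to the bound
print("Var(sample median):", mc_variance(medians))  # larger, roughly pi / (2n)
```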
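Furthermore (I am not sure what this part is for):
$$
\begin{aligned}
\dot{\ell}_{\vec{x}}(\theta) &= \sum_{i=1}^n \dot{\ell}_{x_i}(\theta) = \sum_{i=1}^n \frac{\partial \log f_{\theta}(x_i)}{\partial \theta} \\
f_{\theta}(\vec{x}) &= \prod_{i=1}^n f_{\theta}(x_i)
\end{aligned}
$$
Hence the quantity inside the square on the right-hand side of the Cauchy-Schwarz bound can be written as:
$$
\int \tilde{\theta} \left[\sum_{i=1}^n \frac{\dot{f}_{\theta}(x_i)}{f_{\theta}(x_i)}\right] \prod_{i=1}^n f_{\theta}(x_i)\, dx_1 \cdots dx_n
$$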
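For a discrete model the $n$-fold integral becomes a finite sum, so the identity can be checked exactly. This sketch assumes the Bernoulli($\theta$) model of the next example and enumerates all $2^n$ outcomes; `estimator` stands for any unbiased $\tilde\theta$, and both choices below (sample mean, first observation) are illustrative. A constant estimator equal to $\theta$ gives 0, which is why the $-\theta$ term drops out.

```python
import itertools

# Illustrative sketch: Bernoulli model, n, theta and helper names are assumptions.


def expect_estimator_times_score(theta: float, n: int, estimator) -> float:
    """Exact sum over x in {0,1}^n of estimator(x) * score(x) * f_theta(x)."""
    total = 0.0
    for xs in itertools.product((0, 1), repeat=n):
        # Summed score: sum_i [x_i / theta - (1 - x_i) / (1 - theta)]
        score = sum(x / theta - (1 - x) / (1 - theta) for x in xs)
        # Joint pmf: prod_i f_theta(x_i)
        prob = 1.0
        for x in xs:
            prob *= theta if x == 1 else 1 - theta
        total += estimator(xs) * score * prob
    return total


def sample_mean(xs):
    """Unbiased estimator of theta."""
    return sum(xs) / len(xs)


def first_observation(xs):
    """Another (less efficient) unbiased estimator of theta."""
    return xs[0]


theta, n = 0.3, 8
print(expect_estimator_times_score(theta, n, sample_mean))        # should be 1
print(expect_estimator_times_score(theta, n, first_observation))  # should be 1
print(expect_estimator_times_score(theta, n, lambda xs: theta))   # the -theta term: 0
```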

Example

Bernoulli model:
$$
\begin{aligned}
P(x=1) &= \theta, \qquad P(x=0) = 1 - \theta \\
f_{\theta}(x) &= \theta^x(1-\theta)^{1-x} \\
\log f_{\theta}(x) &= x \log\theta + (1-x) \log (1-\theta) \\
\dot{\ell}_{\theta}(x) &= \frac{\partial \log f_{\theta}(x)}{\partial \theta} = \frac{\partial\left(x \log \theta + (1-x) \log(1-\theta)\right)}{\partial \theta} = \frac{x}{\theta} - \frac{1-x}{1-\theta} \\
-\ddot{\ell}_{\theta}(x) &= \frac{x}{\theta^2} + \frac{1-x}{(1-\theta)^2} \\
\mathcal{I}(\theta) &= E_{\theta}[-\ddot{\ell}_{\theta}(x)] = \frac{\theta}{\theta^2} + \frac{1-\theta}{(1-\theta)^2} = \frac{1}{\theta(1-\theta)}
\end{aligned}
$$
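A quick exact check of the Bernoulli result: with only two outcomes, both forms of the Fisher information (expected squared score and expected negative second derivative) are two-term sums. The function name is illustrative.

```python
# Illustrative sketch: helper name is an assumption; formulas are from the example above.


def bernoulli_fisher(theta: float) -> tuple[float, float]:
    """Return (E[score^2], E[-second derivative]) by summing over x in {0, 1}."""
    e_score_sq = 0.0
    e_neg_second = 0.0
    for x, prob in ((1, theta), (0, 1 - theta)):
        score = x / theta - (1 - x) / (1 - theta)
        neg_second = x / theta**2 + (1 - x) / (1 - theta) ** 2
        e_score_sq += prob * score**2
        e_neg_second += prob * neg_second
    return e_score_sq, e_neg_second


for theta in (0.1, 0.5, 0.9):
    a, b = bernoulli_fisher(theta)
    print(theta, a, b, 1 / (theta * (1 - theta)))  # all three columns agree
```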

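Example 2

An electrician measures a voltage $X$. Readings below 100 are displayed normally, but any voltage of 100 or more is displayed as 100. Suppose:
$$
X \sim N(\mu, 1), \qquad p = P(X \geq 100) = 1 - \Phi(100 - \mu)
$$
where $\varphi$ and $\Phi$ denote the standard normal density and CDF. Let $Z = 1$ if $X < 100$ and $Z = 0$ otherwise. The displayed reading $Y$ is then:
$$
Y =
\begin{cases}
100, & Z = 0 \ (\text{probability } p) \\
X \text{ with density } h(x) = \dfrac{\varphi(x-\mu)}{1-p},\ x < 100, & Z = 1 \ (\text{probability } 1-p)
\end{cases}
$$
Twelve readings are taken, of which $k$ show 100. The likelihood is:
$$
\begin{aligned}
L(\mu) &= p^k \prod_{i:\, z_i = 1} (1-p)\, h(x_i) \\
&= \left[1 - \Phi(100 - \mu)\right]^k \prod_{i:\, z_i = 1} \varphi(x_i - \mu)
\end{aligned}
$$
where each product runs over the $12 - k$ readings below 100.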
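A numerical sketch of maximizing this censored likelihood. The true mean, the seed, and the twelve simulated readings are illustrative assumptions, not data from the notes; $p = 1 - \Phi(100 - \mu)$ is computed from `math.erfc` so that it stays accurate far in the tail. Because some readings are capped at 100, the MLE should lie above the naive average of the displayed values.

```python
import math
import random
import statistics

# Illustrative sketch: simulated readings, true mean and helper names are assumptions.
LIMIT = 100.0  # the meter shows 100 for any voltage >= 100


def log_upper_tail(mu: float) -> float:
    """log p = log(1 - Phi(LIMIT - mu)) for X ~ N(mu, 1), computed via erfc."""
    return math.log(0.5 * math.erfc((LIMIT - mu) / math.sqrt(2)))


def censored_loglik(mu: float, readings: list[float]) -> float:
    """log L(mu) = k log p + sum over readings below LIMIT of log phi(x_i - mu)."""
    k = sum(1 for r in readings if r >= LIMIT)
    uncensored = sum(-0.5 * math.log(2 * math.pi) - 0.5 * (r - mu) ** 2
                     for r in readings if r < LIMIT)
    return (k * log_upper_tail(mu) if k else 0.0) + uncensored


def golden_section_argmax(fn, lo: float, hi: float, tol: float = 1e-9) -> float:
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if fn(c) > fn(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2


rng = random.Random(5)
readings = [min(rng.gauss(99.5, 1.0), LIMIT) for _ in range(12)]  # simulated displays
k = sum(1 for r in readings if r >= LIMIT)

mu_hat = golden_section_argmax(lambda mu: censored_loglik(mu, readings), 90.0, 110.0)
print("readings showing 100  :", k)
print("naive mean of displays:", statistics.fmean(readings))
print("censored-data MLE     :", mu_hat)
```

The log-likelihood is concave in $\mu$ (both $\log \Phi$ and the quadratic terms are), so a one-dimensional search over a bracket is enough here.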

Example 3

Find the Jeffreys prior.
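Assuming this example asks for the Jeffreys prior of the Bernoulli model in the example above (the notes do not say which model), the prior is $\pi(\theta) \propto \sqrt{\mathcal{I}(\theta)} = \theta^{-1/2}(1-\theta)^{-1/2}$, i.e. a Beta(1/2, 1/2) distribution. The sketch below checks the normalizing constant $B(1/2, 1/2) = \pi$ and that the density integrates to one; helper names are illustrative.

```python
import math

# Illustrative sketch: assumes the Bernoulli model; helper names are assumptions.


def fisher_info(theta: float) -> float:
    """Bernoulli Fisher information: 1 / (theta (1 - theta))."""
    return 1.0 / (theta * (1.0 - theta))


# Normalizing constant of sqrt(I(theta)) on (0, 1): B(1/2, 1/2) = Gamma(1/2)^2 / Gamma(1) = pi.
NORM = math.gamma(0.5) ** 2 / math.gamma(1.0)


def jeffreys_density(theta: float) -> float:
    """Jeffreys prior, proportional to sqrt(I(theta)); equals the Beta(1/2, 1/2) density."""
    return math.sqrt(fisher_info(theta)) / NORM


def total_mass(m: int = 10_000) -> float:
    """Integrate the density over (0, 1) via theta = sin(u)^2, u in (0, pi/2).

    The substitution removes the endpoint singularities (d theta = 2 sin u cos u du);
    the midpoint rule is then applied in u.
    """
    h = (math.pi / 2) / m
    total = 0.0
    for j in range(m):
        u = (j + 0.5) * h
        theta = math.sin(u) ** 2
        total += jeffreys_density(theta) * 2 * math.sin(u) * math.cos(u) * h
    return total


print("normalizing constant:", NORM, "pi:", math.pi)
print("total prior mass    :", total_mass())
```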

Statistical Theory and Methods, Chapter 4
