Statistical Theory and Methods, Chapter 4

Published on 2022-10-20


Chap 4

4.1 Likelihood and Maximum Likelihood

Definitions

iid means independent and identically distributed.

Log-likelihood function:
$$
\ell_x(\mu) = \log(f_{\mu}(x))
$$
The maximum likelihood estimate (MLE) is then:
$$
\text{MLE}: \hat{\mu} = \arg \max_{\mu \in \Omega} \{ \ell_x(\mu) \}
$$

Example

$$
x_i \overset{iid}{\sim} f_{\mu}(x)
$$

$$
\ell_x(\mu) = \sum_{i=1}^n \log f_{\mu}(x_i) = \sum_{i=1}^n \ell_{x_i}(\mu)
$$
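
As a minimal sketch of the definition above, the following assumes the model $f_{\mu}$ is $N(\mu, 1)$ (an illustrative choice, not fixed by these notes) and maximises the summed log-likelihood numerically. The helper names `log_likelihood` and `mle_by_ternary_search` and the simulated data are hypothetical. For this model the maximiser should agree with the sample mean.

```python
import math
import random


def log_likelihood(mu, xs):
    """Sum of per-observation log densities under N(mu, 1)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2 for x in xs)


def mle_by_ternary_search(xs, lo, hi, tol=1e-6):
    """Maximise the log-likelihood over [lo, hi]; valid because it is concave in mu."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(m1, xs) < log_likelihood(m2, xs):
            lo = m1  # the maximum lies to the right of m1
        else:
            hi = m2  # the maximum lies to the left of m2
    return (lo + hi) / 2


rng = random.Random(0)                               # fixed seed for reproducibility
xs = [rng.gauss(2.0, 1.0) for _ in range(200)]       # simulated iid sample
mu_hat = mle_by_ternary_search(xs, min(xs), max(xs))
sample_mean = sum(xs) / len(xs)
print(mu_hat, sample_mean)                           # the two values should nearly coincide
```

Ternary search is used here only because the normal log-likelihood is concave in $\mu$; other models may need a different optimiser.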

4.2 Fisher Information and the MLE

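Note: a dot above an expression denotes a derivative with respect to the parameter; several dots denote repeated differentiation.

Definitions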

  1. Score function: the first derivative of the log-likelihood.
    $$
    \dot{\ell_{\theta}}(x) = \frac{\partial \log f_{\theta}(x)}{\partial \theta}
    $$

  2. $E[\dot{\ell_{\theta}}(x)] = 0$ under regularity conditions (differentiation and integration can be interchanged). Proof:
    $$
    \begin{aligned}
    \int \frac{\partial \log f_{\theta}(x)}{\partial \theta} f_{\theta}(x) dx &= \int \frac{\dot{f_{\theta}}(x)}{f_{\theta}(x)} f_{\theta}(x) dx \\
    &= \int \dot{f_{\theta}}(x) dx \\
    &= \frac{\partial}{\partial \theta} \int f_{\theta}(x) dx \\
    &= \frac{\partial}{\partial \theta} 1 \\
    &= 0
    \end{aligned}
    $$

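  3. Fisher information:
    $$
    \mathcal{I}(\theta) = Var[\dot{\ell_{\theta}}(x)] = E[(\dot{\ell_{\theta}}(x))^2] - 0
    $$
    (the expectation of the square minus the square of the expectation, which is zero by item 2)

    The Fisher information is a variance.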

  4. The expression above also equals $-E[\ddot{\ell_{\theta}}(x)]$, the negative of the expected second derivative. Proof:
    $$
    \dot{\ell_{\theta}}(x) = \frac{\dot{f_{\theta}}(x)}{f_{\theta}(x)}
    $$

    $$
    \ddot{\ell_{\theta}}(x) = \frac{\ddot{f_{\theta}}(x)f_{\theta}(x) - (\dot{f_{\theta}}(x))^2}{f_{\theta}^2(x)}
    $$

    For the first term:
    $$
    \begin{aligned}
    E[\frac{\ddot{f_{\theta}}(x)f_{\theta}(x)}{f_{\theta}^2(x)}] &= \int \frac{\ddot{f_{\theta}}(x)f_{\theta}(x)}{f_{\theta}^2(x)} f_{\theta}(x) dx \\
    &= \int \ddot{f_{\theta}}(x) dx \\
    &= \frac{\partial}{\partial \theta} \int \dot{f_{\theta}}(x) dx \\
    &= 0
    \end{aligned}
    $$
    The last step follows from $\int \dot{f_{\theta}}(x) dx = 0$, shown in the proof of item 2.

    For the second term:
    $$
    E[\frac{(\dot{f_{\theta}}(x))^2}{f_{\theta}^2(x)}] = E[(\frac{\dot{f_{\theta}}(x)}{f_{\theta}(x)})^2] = E[(\dot{\ell_{\theta}}(x))^2]
    $$
    Subtracting the second expectation from the first gives $E[\ddot{\ell_{\theta}}(x)] = -E[(\dot{\ell_{\theta}}(x))^2]$, which proves the claim.

  5. For $X_1, X_2, ..., X_n$:
    $$
    L(\theta) = \prod_{i=1}^n f_{\theta}(x_i)
    $$

    $$
    \ell_{\vec{x}}(\theta) = \log L(\theta) = \sum_{i=1}^n \log f_{\theta}(x_i)
    $$

    $$
    \dot{\ell_{\vec{x}}}(\theta) = \sum_{i=1}^n \dot{\ell_{x_i}}(\theta)
    $$

    The summands are iid with mean 0 and variance $\mathcal{I}(\theta)$, so by the central limit theorem, approximately for large $n$:
    $$
    \dot{\ell_{\vec{x}}}(\theta) \dot{\sim} N(0, n \mathcal{I}(\theta))
    $$

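  6. Let $\hat{\theta}$ denote the MLE. A first-order Taylor expansion of the score around the true $\theta$ gives:
    $$
    \begin{aligned}
    \dot{\ell_{\vec{x}}}(\hat{\theta}) = 0 &\approx \dot{\ell_{\vec{x}}}(\theta) +\ddot{\ell_{\vec{x}}}(\theta)(\hat{\theta}-\theta) \\
    \hat{\theta} - \theta &\approx -(\ddot{\ell_{\vec{x}}}(\theta))^{-1} \dot{\ell_{\vec{x}}}(\theta) \approx (n \mathcal{I}(\theta))^{-1} \dot{\ell_{\vec{x}}}(\theta) \dot{\sim} N(0, (n \mathcal{I}(\theta))^{-1})
    \end{aligned}
    $$
    Here $-\ddot{\ell_{\vec{x}}}(\theta) \approx n \mathcal{I}(\theta)$ by the law of large numbers, and the variance is $(n \mathcal{I}(\theta))^{-2} \cdot n \mathcal{I}(\theta) = (n \mathcal{I}(\theta))^{-1}$.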

  7. If $\tilde{\theta}$ satisfies $E(\tilde{\theta}) = \theta$, then $Var(\tilde{\theta}) \geq (n \mathcal{I}(\theta))^{-1}$ (PPT p. 12). That is, the variance of any unbiased estimator of $\theta$ always has a lower bound. The inequality can also be written as $Var(\tilde{\theta}) \, n \mathcal{I}(\theta) \geq 1$, whose left-hand side equals:
    $$
    \int (\tilde{\theta} - \theta)^2 f_{\theta}(\vec{x}) d\vec{x} \int (\dot{\ell_{\vec{x}}}(\theta))^2 f_{\theta}(\vec{x}) d\vec{x}
    $$
    By the Cauchy-Schwarz inequality, $[ \int f(x) g(x) dx]^2 \leq \int f^2(x) dx \int g^2(x) dx$. Apply it with $f = (\tilde{\theta} - \theta) \sqrt{f_{\theta}(\vec{x})}$ and $g = \dot{\ell_{\vec{x}}}(\theta) \sqrt{f_{\theta}(\vec{x})}$.

    So the product above satisfies:
    $$
    \geq [\int (\tilde{\theta} - \theta) \dot{\ell_{\vec{x}}}(\theta) f_{\theta}(\vec{x}) d\vec{x}]^2 = 1
    $$
    The quantity inside the square equals 1: the term $\theta \int \dot{\ell_{\vec{x}}}(\theta) f_{\theta}(\vec{x}) d\vec{x}$ vanishes because the score has mean zero, and the remaining term is:
    $$
    \begin{aligned}
    \int \tilde{\theta} \dot{\ell_{\vec{x}}}(\theta) f_{\theta}(\vec{x}) d\vec{x} &= \int \tilde{\theta} \frac{\dot{f_{\theta}}(\vec{x})}{f_{\theta}(\vec{x})} f_{\theta}(\vec{x}) d\vec{x} \\
    &= \int \frac{\partial (\tilde{\theta} f_{\theta}(\vec{x}))}{\partial \theta} d\vec{x} \\
    &= \frac{\partial (\int \tilde{\theta}f_{\theta}(\vec{x}) d\vec{x})}{\partial \theta} \\
    &= \frac{\partial \theta}{\partial \theta} \\
    &= 1
    \end{aligned}
    $$
    In addition (the purpose of this part is unclear):
    $$
    \begin{aligned}
    \dot{\ell_{\vec{x}}}(\theta) &= \sum_{i=1}^n \dot{\ell_{x_i}}(\theta) = \sum_{i=1}^n \frac{\partial \log f_{\theta}(x_i)}{\partial \theta} \\
    f_{\theta}(\vec{x}) &= \prod_{i=1}^n f_{\theta}(x_i)
    \end{aligned}
    $$
    So the $\tilde{\theta}$ term inside the square on the right-hand side of the inequality can be written as:
    $$
    \int \tilde{\theta}[\sum_{i=1}^n \frac{\dot{f_{\theta}}(x_i)}{f_{\theta}(x_i)}] \prod_{i=1}^n f_{\theta}(x_i) dx_1 ... dx_n
    $$
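
The properties in the list above can be checked by simulation. The sketch below assumes an exponential model $f_{\theta}(x) = \theta e^{-\theta x}$ (chosen for illustration, it is not from these notes), for which the score is $1/\theta - x$ and $\mathcal{I}(\theta) = 1/\theta^2$. The values of `theta`, `n`, `reps` and the seed are arbitrary. It estimates the mean and variance of the score and compares the sampling variance of the MLE $\hat{\theta} = 1/\bar{x}$ with $(n \mathcal{I}(\theta))^{-1}$.

```python
import random
import statistics

theta = 2.0      # true rate (illustrative value)
n = 200          # observations per replication
reps = 3000      # number of simulated data sets
rng = random.Random(1)


def score(x, theta):
    """Derivative in theta of log f = log(theta) - theta * x."""
    return 1.0 / theta - x


# Items 2 and 3: the score has mean 0 and variance I(theta).
scores = [score(rng.expovariate(theta), theta) for _ in range(100000)]
score_mean = statistics.fmean(scores)
score_var = statistics.pvariance(scores)
fisher = 1.0 / theta ** 2          # -E[second derivative] = 1 / theta^2

# Item 6: the MLE (here n / sum of x) has variance close to 1 / (n * I(theta)).
mles = []
for _ in range(reps):
    xs = [rng.expovariate(theta) for _ in range(n)]
    mles.append(len(xs) / sum(xs))
mle_var = statistics.pvariance(mles)
asymptotic_var = 1.0 / (n * fisher)

print(score_mean, score_var, fisher)
print(mle_var, asymptotic_var)
```

The agreement is only approximate: the MLE here is slightly biased for finite $n$, so its variance need not sit exactly on the bound stated for unbiased estimators.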

Example

$$
\begin{aligned}
P(x=1) &= \theta \\
P(x=0) &= 1 - \theta \\
f_{\theta}(x) &= \theta^x(1-\theta)^{1-x} \\
\log f_{\theta}(x) &= x \log\theta + (1-x) \log (1-\theta) \\
\dot{\ell_{\theta}}(x) &= \frac{\partial \log f_{\theta}(x)}{\partial \theta} = \frac{\partial(x \log \theta + (1-x) \log(1-\theta))}{\partial \theta} = \frac{x}{\theta} - \frac{1-x}{1-\theta} \\
-\ddot{\ell_{\theta}}(x) &= \frac{x}{\theta^2} + \frac{1-x}{(1-\theta)^2} \\
\mathcal{I}(\theta) &= E[-\ddot{\ell_{\theta}}(x)] = \frac{\theta}{\theta^2} + \frac{1-\theta}{(1-\theta)^2} = \frac{1}{\theta(1-\theta)}
\end{aligned}
$$
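
The Bernoulli calculation can be verified exactly, since the expectation is a two-term sum. This is a small sketch: the function names and the value `theta = 0.3` are illustrative choices. It computes the information both as the variance of the score and as the expected negative second derivative, and compares each with $1/(\theta(1-\theta))$.

```python
def score(x, theta):
    """First derivative of x*log(theta) + (1-x)*log(1-theta)."""
    return x / theta - (1 - x) / (1 - theta)


def neg_second_derivative(x, theta):
    """Negative second derivative of the same log-likelihood."""
    return x / theta ** 2 + (1 - x) / (1 - theta) ** 2


def expectation(g, theta):
    """E[g(X, theta)] for X ~ Bernoulli(theta), summing over x in {0, 1}."""
    return theta * g(1, theta) + (1 - theta) * g(0, theta)


theta = 0.3
mean_score = expectation(score, theta)
info_from_variance = expectation(lambda x, t: score(x, t) ** 2, theta)
info_from_curvature = expectation(neg_second_derivative, theta)
closed_form = 1 / (theta * (1 - theta))

print(mean_score, info_from_variance, info_from_curvature, closed_form)
```

Up to floating-point rounding, the mean score is zero and the three information values coincide.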

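Example 2

Note: this example contains errors!

An electrician measures a voltage $X$. Readings below 100 are displayed normally; readings of 100 or above are displayed as 100. We have:
$$
\begin{aligned}
X &\sim N(\mu, 1) \\
P(X \geq 100) &= p
\end{aligned}
$$
and:
$$
X =
\left\{
\begin{aligned}
& 100, & Z=0 \\
& h(x) = \frac{\Phi(x-\mu)}{1-p}, & Z=1 \ (X < 100)
\end{aligned}
\right.
$$
In 12 observations, the reading 100 appears $k$ times:
$$
\begin{aligned}
\text{likelihood} &= p^k \prod_{i=1}^{12}(1-p)h(x_i) \\
&= p^k \prod_{i=1}^{12} \Phi(x_i-\mu)
\end{aligned}
$$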
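
Since the example above is flagged as containing errors, the following sketch shows one standard way to write a right-censored normal likelihood. It is offered as an assumption, not as the intended correction. Each reading below the limit contributes the density $\phi(x_i - \mu)$, and each capped reading contributes $P(X \geq 100) = 1 - \Phi(100 - \mu)$. The twelve readings are made-up data, and the grid search is only a crude maximiser.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def censored_log_likelihood(mu, readings, limit=100.0):
    """Log-likelihood for X ~ N(mu, 1) when readings are capped at `limit`."""
    total = 0.0
    for x in readings:
        if x >= limit:
            # capped reading: only the event X >= limit is observed
            total += math.log(1 - normal_cdf(limit - mu))
        else:
            # uncapped reading: the usual normal density
            total += math.log(normal_pdf(x - mu))
    return total


# Hypothetical data: 12 readings, 4 of them capped at 100.
readings = [98.7, 99.4, 100.0, 99.1, 100.0, 98.2,
            99.8, 100.0, 99.5, 98.9, 100.0, 99.3]
k = sum(1 for x in readings if x >= 100.0)

grid = [97.0 + 0.01 * i for i in range(601)]       # candidate mu values in [97, 103]
mu_hat = max(grid, key=lambda m: censored_log_likelihood(m, readings))
capped_mean = sum(readings) / len(readings)        # naive estimate ignoring censoring
print(k, mu_hat, capped_mean)
```

The maximiser lies above the naive mean of the capped readings, because each capped reading stands for a true value of at least 100.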

Example 3

Find the Jeffreys prior.

Likelihood ratio test