Chap 4
4.1 Likelihood and Maximum Likelihood
Definitions
iid means independent and identically distributed.
Log-likelihood function:
$$
\ell_x(\mu) = \log f_{\mu}(x)
$$
The maximum likelihood estimate (MLE) is then:
$$
\text{MLE: } \hat{\mu} = \arg \max_{\mu \in \Omega} \{ \ell_x(\mu) \}
$$
Example
$$
x_i \overset{iid}{\sim} f_{\mu}(x)
$$
$$
\ell_x(\mu) = \sum_{i=1}^n \log f_{\mu}(x_i) = \sum_{i=1}^n \ell_{x_i}(\mu)
$$
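A minimal numerical sketch of the sum-of-logs form, assuming a $N(\mu, 1)$ model and a handful of made-up observations (both are assumptions for illustration, not part of the notes). The helper names `log_likelihood` and `grid_mle` are hypothetical. For this model the maximizer should coincide with the sample mean, up to the grid resolution.

```python
import math

def log_likelihood(mu, xs):
    """Sum of per-observation log densities under the assumed N(mu, 1) model."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2 for x in xs)

def grid_mle(xs, lo, hi, steps=10001):
    """Return the grid point in [lo, hi] with the largest log-likelihood."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda mu: log_likelihood(mu, xs))

xs = [4.1, 5.3, 4.8, 5.9, 5.0]        # hypothetical iid observations
mu_hat = grid_mle(xs, 0.0, 10.0)      # arg max over a grid standing in for Omega
sample_mean = sum(xs) / len(xs)       # closed-form MLE for this particular model
print(mu_hat, sample_mean)
```

A grid search is used only to keep the sketch dependency-free; any one-dimensional optimizer would serve the same purpose.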
4.2 Fisher Information and the MLE
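Note: a dot above a symbol denotes differentiation with respect to the parameter; multiple dots denote repeated differentiation.
Definitions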
- Score function: the first derivative of the log-likelihood.
$$
\dot{\ell_{\theta}}(x) = \frac{\partial \log f_{\theta}(x)}{\partial \theta}
$$
- $E[\dot{\ell_{\theta}}(x)] = 0$ under regularity conditions. Proof:
$$
\begin{aligned}
\int \frac{\partial \log f_{\theta}(x)}{\partial \theta} f_{\theta}(x) dx &= \int \frac{\dot{f_{\theta}}(x)}{f_{\theta}(x)} f_{\theta}(x) dx \\
&= \int \dot{f_{\theta}}(x) dx \\
&= \frac{\partial}{\partial \theta} \int f_{\theta}(x) dx \\
&= 0
\end{aligned}
$$
The last step uses $\int f_{\theta}(x) dx = 1$ for every $\theta$.
- Fisher information:
$$
\mathcal{I}(\theta) = Var[\dot{\ell_{\theta}}(x)] = E[(\dot{\ell_{\theta}}(x))^2] - 0
$$
(The expectation of the square minus the square of the expectation, which is 0 here.) The Fisher information is a variance.
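The two facts above (mean-zero score, Fisher information as the variance of the score) can be checked by simulation. The sketch below assumes an Exponential model with rate $\theta$, $f_{\theta}(x) = \theta e^{-\theta x}$, chosen only because its score $1/\theta - x$ and its Fisher information $1/\theta^2$ are easy to write down; the model, the value of `theta`, and the sample size are assumptions for illustration.

```python
import random

def score(theta, x):
    """Score of the assumed density theta * exp(-theta * x): d/dtheta of (log theta - theta * x)."""
    return 1.0 / theta - x

random.seed(0)                       # fixed seed so the run is repeatable
theta = 2.0                          # hypothetical true parameter
draws = [random.expovariate(theta) for _ in range(100_000)]
scores = [score(theta, x) for x in draws]

mean_score = sum(scores) / len(scores)                                   # estimates E[score]
var_score = sum((s - mean_score) ** 2 for s in scores) / len(scores)     # estimates Var[score]
fisher_info = 1.0 / theta ** 2                                           # closed form for this model
print(mean_score, var_score, fisher_info)
```

The printed mean should be near 0 and the printed variance near `fisher_info`, with Monte Carlo error shrinking as the number of draws grows.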
- The Fisher information also equals $-E[\ddot{\ell_{\theta}}(x)]$, the negative of the expected second derivative. Proof:
$$
\dot{\ell_{\theta}}(x) = \frac{\dot{f_{\theta}}(x)}{f_{\theta}(x)}
$$
$$
\ddot{\ell_{\theta}}(x) = \frac{\ddot{f_{\theta}}(x)f_{\theta}(x) - (\dot{f_{\theta}}(x))^2}{f_{\theta}^2(x)}
$$
Now:
$$
\begin{aligned}
E[\frac{\ddot{f_{\theta}}(x)f_{\theta}(x)}{f_{\theta}^2(x)}] &= \int \frac{\ddot{f_{\theta}}(x)f_{\theta}(x)}{f_{\theta}^2(x)} f_{\theta}(x) dx \\
&= \int \ddot{f_{\theta}}(x) dx \\
&= \frac{\partial}{\partial \theta} \int \dot{f_{\theta}}(x) dx \\
&= 0
\end{aligned}
$$
The last step follows from $\int \dot{f_{\theta}}(x) dx = 0$, shown in the proof that the score has mean zero. Also:
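$$
E[\frac{(\dot{f_{\theta}}(x))^2}{f_{\theta}^2(x)}] = E[(\frac{\dot{f_{\theta}}(x)}{f_{\theta}(x)})^2] = E[(\dot{\ell_{\theta}}(x))^2]
$$
Subtracting this expectation from the previous one gives $E[\ddot{\ell_{\theta}}(x)] = -E[(\dot{\ell_{\theta}}(x))^2]$, which completes the proof.
- For an iid sample $X_1, X_2, \ldots, X_n$: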
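$$
L(\theta) = \prod_{i=1}^n f_{\theta}(x_i)
$$
$$
\ell_{\vec{x}}(\theta) = \log L(\theta) = \sum_{i=1}^n \log f_{\theta}(x_i)
$$
$$
\dot{\ell_{\vec{x}}}(\theta) = \sum_{i=1}^n \dot{\ell_{x_i}}(\theta)
$$
By the central limit theorem, since the summands are iid with mean 0 and variance $\mathcal{I}(\theta)$: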
$$
\dot{\ell_{\vec{x}}}(\theta) \ \dot{\sim}\ N(0, n \mathcal{I}(\theta))
$$
- Let $\hat{\theta}$ denote the MLE. A first-order Taylor expansion of the score around $\theta$ gives:
$$
\begin{aligned}
\dot{\ell_{\vec{x}}}(\hat{\theta}) = 0 &\approx \dot{\ell_{\vec{x}}}(\theta) + \ddot{\ell_{\vec{x}}}(\theta)(\hat{\theta}-\theta) \\
\hat{\theta} - \theta &\approx -(\ddot{\ell_{\vec{x}}}(\theta))^{-1} \dot{\ell_{\vec{x}}}(\theta) \approx (n \mathcal{I}(\theta))^{-1} \dot{\ell_{\vec{x}}}(\theta) \ \dot{\sim}\ N(0, (n \mathcal{I}(\theta))^{-1})
\end{aligned}
$$
- Cramér-Rao lower bound (PPT p. 12): if $\tilde{\theta}$ is unbiased, $E(\tilde{\theta}) = \theta$, then $Var(\tilde{\theta}) \geq (n \mathcal{I}(\theta))^{-1}$. In other words, the variance of any unbiased estimator of $\theta$ has a lower bound. The inequality can also be written as $Var(\tilde{\theta}) \, n \mathcal{I}(\theta) \geq 1$, and its left-hand side equals:
$$
\int (\tilde{\theta} - \theta)^2 f_{\theta}(\vec{x}) d\vec{x} \cdot \int (\dot{\ell_{\vec{x}}}(\theta))^2 f_{\theta}(\vec{x}) d\vec{x}
$$
By the Cauchy-Schwarz inequality, $[ \int f(x) g(x) dx]^2 \leq \int f^2(x) dx \int g^2(x) dx$, so the product above satisfies:
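$$
\geq [\int (\tilde{\theta} - \theta) \dot{\ell_{\vec{x}}}(\theta) f_{\theta}(\vec{x}) d\vec{x}]^2 = 1
$$
The integral inside the square equals 1. The $\theta$ term vanishes because $E[\dot{\ell_{\vec{x}}}(\theta)] = 0$, and the $\tilde{\theta}$ term is evaluated as follows: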
$$
\begin{aligned}
\int \tilde{\theta} \dot{\ell_{\vec{x}}}(\theta) f_{\theta}(\vec{x}) d\vec{x} &= \int \tilde{\theta} \frac{\dot{f_{\theta}}(\vec{x})}{f_{\theta}(\vec{x})} f_{\theta}(\vec{x}) d\vec{x} \\
&= \int \frac{\partial (\tilde{\theta} f_{\theta}(\vec{x}))}{\partial \theta} d\vec{x} \\
&= \frac{\partial}{\partial \theta} \int \tilde{\theta} f_{\theta}(\vec{x}) d\vec{x} \\
&= \frac{\partial \theta}{\partial \theta} \\
&= 1
\end{aligned}
$$
In addition (it is unclear to me what purpose this part serves):
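$$
\begin{aligned}
\dot{\ell_{\vec{x}}}(\theta) &= \sum_{i=1}^n \dot{\ell_{x_i}}(\theta) = \sum_{i=1}^n \frac{\partial \log f_{\theta}(x_i)}{\partial \theta} \\
f_{\theta}(\vec{x}) &= \prod_{i=1}^n f_{\theta}(x_i)
\end{aligned}
$$
Hence the $\tilde{\theta}$ term inside the square on the right-hand side of the Cauchy-Schwarz bound can be written as: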
$$
\int \tilde{\theta} [\sum_{i=1}^n \frac{\dot{f_{\theta}}(x_i)}{f_{\theta}(x_i)}] \prod_{i=1}^n f_{\theta}(x_i) dx_1 \ldots dx_n
$$
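The lower bound can be illustrated by simulation. The sketch below assumes a $N(\theta, 1)$ model, for which $\mathcal{I}(\theta) = 1$ and the bound is $1/n$, and compares two unbiased estimators of $\theta$: the sample mean (the MLE for this model) and the sample median. The model and the values of `theta`, `n`, and `reps` are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(1)                       # fixed seed so the run is repeatable
theta, n, reps = 3.0, 25, 20_000     # hypothetical settings
bound = 1.0 / n                      # (n * I(theta))^{-1} with I(theta) = 1 for N(theta, 1)

means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(theta, 1.0) for _ in range(n)]
    means.append(sum(sample) / n)              # MLE of theta in this model
    medians.append(statistics.median(sample))  # another unbiased estimator (by symmetry)

def variance(values):
    """Population-style variance of a list of simulated estimates."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

var_mean = variance(means)
var_median = variance(medians)
print(bound, var_mean, var_median)
```

In this setting the simulated variance of the mean should sit close to the bound, while the median's variance should be visibly larger, consistent with the inequality.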
Example
$$
\begin{aligned}
P(x=1) &= \theta \\
P(x=0) &= 1 - \theta \\
f_{\theta}(x) &= \theta^x(1-\theta)^{1-x} \\
\log f_{\theta}(x) &= x \log\theta + (1-x) \log (1-\theta) \\
\dot{\ell_{\theta}}(x) &= \frac{\partial \log f_{\theta}(x)}{\partial \theta} = \frac{\partial(x \log \theta + (1-x) \log(1-\theta))}{\partial \theta} = \frac{x}{\theta} - \frac{1-x}{1-\theta} \\
-\ddot{\ell_{\theta}}(x) &= \frac{x}{\theta^2} + \frac{1-x}{(1-\theta)^2} \\
\mathcal{I}(\theta) &= E[-\ddot{\ell_{\theta}}(x)] = \frac{1}{\theta} + \frac{1}{1-\theta} = \frac{1}{\theta(1-\theta)}
\end{aligned}
$$
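Because $x$ takes only two values here, the expectations can be computed exactly rather than simulated. The sketch below checks, for one arbitrarily chosen $\theta$, that the score has mean zero and that $E[(\dot{\ell_{\theta}}(x))^2]$, $E[-\ddot{\ell_{\theta}}(x)]$, and $1/(\theta(1-\theta))$ agree. The function names and the value `theta = 0.3` are hypothetical.

```python
def score(theta, x):
    """First derivative of x*log(theta) + (1-x)*log(1-theta) with respect to theta."""
    return x / theta - (1 - x) / (1 - theta)

def neg_second_derivative(theta, x):
    """Negative second derivative of the same log density."""
    return x / theta ** 2 + (1 - x) / (1 - theta) ** 2

def expect(g, theta):
    """Exact expectation of g(theta, X) for X taking value 1 w.p. theta and 0 otherwise."""
    return theta * g(theta, 1) + (1 - theta) * g(theta, 0)

theta = 0.3                                                  # hypothetical parameter value
mean_score = expect(score, theta)                            # E[score]
var_score = expect(lambda t, x: score(t, x) ** 2, theta)     # E[score^2]
info_from_curvature = expect(neg_second_derivative, theta)   # E[-second derivative]
closed_form = 1 / (theta * (1 - theta))
print(mean_score, var_score, info_from_curvature, closed_form)
```

All three Fisher information values should match up to floating-point rounding.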
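Example 2
Caution: this example may contain errors.
An electrician measures a voltage $X$. Readings below 100 are displayed correctly; readings of 100 or above are displayed as 100. We have:
$$
\begin{aligned}
X &\sim N(\mu, 1) \\
P(X \geq 100) &= p = 1 - \Phi(100 - \mu)
\end{aligned}
$$
Let $Z = 0$ when the reading is censored and $Z = 1$ otherwise. The displayed value is then:
$$
X =
\left\{
\begin{aligned}
&100, && Z=0 \\
&\text{a draw from } h(x) = \frac{\phi(x-\mu)}{1-p}, && Z=1 \ (X < 100)
\end{aligned}
\right.
$$
Here $\phi$ is the standard normal density and $\Phi$ its distribution function.
With 12 readings, of which $k$ display 100:
$$
\begin{aligned}
\text{likelihood} &= p^k \prod_{i:\, x_i < 100} (1-p) h(x_i) \\
&= p^k \prod_{i:\, x_i < 100} \phi(x_i-\mu)
\end{aligned}
$$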
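A rough sketch of how a likelihood of this censored type could be evaluated and maximized numerically. It assumes each reading displayed as 100 contributes $p = 1 - \Phi(100 - \mu)$ and each reading below 100 contributes the standard normal density at $x_i - \mu$; that reading of the example, the 12 readings, and all function names are assumptions made for illustration.

```python
import math

LIMIT = 100.0   # readings at or above this value are displayed as LIMIT

def upper_tail(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def norm_logpdf(z):
    """Log of the standard normal density at z."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

def censored_loglik(mu, readings):
    """Log-likelihood under the assumed censoring model."""
    log_p = math.log(upper_tail(LIMIT - mu))    # log P(X >= LIMIT)
    total = 0.0
    for x in readings:
        total += log_p if x >= LIMIT else norm_logpdf(x - mu)
    return total

# Hypothetical data: 12 readings, three of them censored at 100.
readings = [98.2, 99.1, 100.0, 97.5, 99.8, 100.0,
            98.9, 99.4, 100.0, 98.0, 99.6, 97.9]
grid = [95.0 + 0.001 * i for i in range(10001)]   # candidate mu values in [95, 105]
mu_hat = max(grid, key=lambda mu: censored_loglik(mu, readings))
print(mu_hat, sum(readings) / len(readings))
```

Under these assumptions the maximizer should land above the naive average of the displayed values, since that average treats every censored reading as if it were exactly 100.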
Example 3
Find the Jeffreys prior.