Principle

torch.nn.Linear applies a linear transformation to its input and returns the result.

Mathematically: $\bold{y} = \bold{x} \bold{A}^{T} + \bold{b}$

In code: output = input @ weight.T + bias
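A minimal sketch to check how the stored weight enters the computation. It assumes row-vector inputs (one sample per row) and uses a fixed seed only for repeatability; the names layer, x, manual, and matches are illustrative. The weight is stored with shape (out_features, in_features), so the manual product needs a transpose.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed only so the run is repeatable

layer = nn.Linear(2, 3)
x = torch.ones(4, 2)  # a batch of 4 row vectors, each of size in_features

# weight has shape (out_features, in_features), hence the transpose
manual = x @ layer.weight.T + layer.bias

# compare the layer's own output with the manual computation
matches = torch.allclose(layer(x), manual, atol=1e-6)
print(layer.weight.shape, matches)
```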

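The official documentation page also describes how weight and bias are initialized: both are sampled from the uniform distribution $U(-\sqrt{k}, \sqrt{k})$, where $k = \frac{1}{\text{in\_features}}$. Note that bias is created and initialized this way only when the constructor argument bias=True.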
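The bound can be checked empirically. The sketch below assumes the default initialization is left untouched; the layer sizes and the small tolerance for float32 rounding are arbitrary choices made for illustration.

```python
import math
import torch
from torch import nn

in_features, out_features = 100, 50  # arbitrary sizes for illustration
layer = nn.Linear(in_features, out_features)

# sqrt(k) with k = 1 / in_features
bound = math.sqrt(1.0 / in_features)
tol = 1e-6  # small slack for float32 rounding

# every freshly initialized entry should lie within [-sqrt(k), sqrt(k)]
weight_in_range = bool((layer.weight.abs() <= bound + tol).all())
bias_in_range = bool((layer.bias.abs() <= bound + tol).all())
print(bound, weight_in_range, bias_in_range)
```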

Usage

Constructor parameters

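in_features: the size of the last dimension of the input Tensor.
out_features: the size of the last dimension of the output Tensor.
bias=True: if True, a bias is added and initialized as described above; if False, the layer has no bias term.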
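A short sketch of what the bias flag changes, with illustrative variable names: with bias=False the module's bias attribute is None, so an all-zero input maps to an all-zero output.

```python
import torch
from torch import nn

with_bias = nn.Linear(2, 3)                 # bias=True is the default
without_bias = nn.Linear(2, 3, bias=False)  # no additive bias

print(with_bias.bias.shape)  # torch.Size([3])
print(without_bias.bias)     # None

# With no bias term, a zero input can only produce a zero output.
zero_out = without_bias(torch.zeros(1, 2))
print(zero_out)
```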

Example

import torch
from torch import nn
m = nn.Linear(2, 3)
input = torch.ones(1, 2)
print(input) # tensor([[1., 1.]])
output = m(input)
print(output) # e.g. tensor([[-0.7184,  1.2585,  0.4239]], grad_fn=<AddmmBackward0>); values vary with the random initialization

Intuition

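The layer applies a linear transformation to a tensor: the last dimension has size in_features before the transformation and out_features after it. My reading is that, when out_features is smaller than in_features, this compresses the information to some extent so that it fits the shape required by later computation.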
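A small sketch of the shape behavior, assuming an input with two arbitrary leading dimensions: only the last dimension changes, from in_features to out_features.

```python
import torch
from torch import nn

layer = nn.Linear(2, 3)

# leading dimensions (4, 5) are arbitrary; the last one must equal in_features
x = torch.randn(4, 5, 2)
y = layer(x)

print(x.shape)  # torch.Size([4, 5, 2])
print(y.shape)  # torch.Size([4, 5, 3])
```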

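For example, when building the NMT model in cs224n Assignment 4, the decoder's first hidden state $h_{0}^{dec}$ must be derived from the encoder's first and last hidden states. Each of those is $h \times 1$, so their concatenation is $2h \times 1$, while $h_{0}^{dec}$ must be $h \times 1$. We therefore apply a linear projection $W_{h}$ to carry out this transformation, which compresses the information.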
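The projection can be sketched roughly as follows. This is not the assignment's code: the sizes, the variable names (enc_first, enc_last, proj, dec_init), the batch-first row layout, and the choice of bias=False are all assumptions made for illustration, and random tensors stand in for the real encoder states.

```python
import torch
from torch import nn

h = 4      # hypothetical hidden size
batch = 2  # hypothetical batch size

# Stand-ins for the two encoder hidden states, each of shape (batch, h).
enc_first = torch.randn(batch, h)
enc_last = torch.randn(batch, h)

# W_h as a linear layer from 2h to h; bias=False is an assumption here.
proj = nn.Linear(2 * h, h, bias=False)

concat = torch.cat([enc_first, enc_last], dim=1)  # shape (batch, 2h)
dec_init = proj(concat)                           # shape (batch, h)
print(concat.shape, dec_init.shape)
```

Here each sample is a row rather than a column vector, so the $2h \times 1$ concatenation in the text corresponds to one row of length 2h.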
