Batch Normalization

Problem

As training updates the parameters of the lower layers, the distribution of inputs seen by the upper layers shifts substantially, forcing them to keep re-adapting; this makes training slow and unstable.

Solution

Batch Normalization: normalize the input to each layer so that every layer's input distribution stays stable during training.

  • Fix the mean and variance within each mini-batch
  • Normalize the inputs of each mini-batch using those statistics
  • Introduce learnable scale and shift parameters to adjust the normalized output (see the sketch after this list)
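Concretely, the layer normalizes with the batch statistics and then applies the learnable scale and shift. A minimal sketch of this forward computation (the 2D input x and the names bn_sketch, gamma, beta, eps are illustrative, not part of the implementation below):

import torch

def bn_sketch(x, gamma, beta, eps=1e-5):
    # Batch statistics: mean and (biased) variance over the batch dimension
    mean = x.mean(dim=0)
    var = ((x - mean) ** 2).mean(dim=0)
    # Normalize, then apply the learnable scale (gamma) and shift (beta)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_hat + beta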

Batch Normalization Layer

  • Learnable scale parameter (gamma) and shift parameter (beta)
  • Placement: on the output of fully connected and convolutional layers, before the activation function (see the example after this list)
  • Effect: normalizes the inputs, speeds up training, and regularizes the model
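For example, with PyTorch's built-in layers the ordering is fully connected layer → batch norm → activation (a sketch; the layer sizes 256 and 128 are arbitrary):

from torch import nn

block = nn.Sequential(
    nn.Linear(256, 128),   # fully connected layer
    nn.BatchNorm1d(128),   # normalize its output...
    nn.Sigmoid(),          # ...before the activation function
)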

Code Implementation

import torch
from torch import nn
from d2l import torch as d2l


def batch_norm(X, gamma, beta, moving_mean, moving_var, eps, momentum):
    if not torch.is_grad_enabled():
        # Inference mode: normalize with the moving (running) statistics
        X_hat = (X - moving_mean) / torch.sqrt(moving_var + eps)
    else:
        assert len(X.shape) in (2, 4), \
            "Batch normalization only supports 2D or 4D inputs"
        if len(X.shape) == 2:
            # Fully connected layer: statistics over the batch dimension
            mean = X.mean(dim=0)
            var = ((X - mean) ** 2).mean(dim=0)
        else:
            # Convolutional layer: per-channel statistics over the batch
            # and spatial dimensions
            mean = X.mean(dim=(0, 2, 3), keepdim=True)
            var = ((X - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)
        # Normalize with the statistics of the current mini-batch
        X_hat = (X - mean) / torch.sqrt(var + eps)
        # Update the moving averages of mean and variance
        moving_mean = momentum * moving_mean + (1.0 - momentum) * mean
        moving_var = momentum * moving_var + (1.0 - momentum) * var
    # Scale and shift
    Y = gamma * X_hat + beta
    # Return .data so the moving statistics are detached from the graph
    return Y, moving_mean.data, moving_var.data


class BatchNorm(nn.Module):
    def __init__(self, num_features, num_dims):
        super().__init__()
        if num_dims == 2:
            shape = (1, num_features)
        else:
            shape = (1, num_features, 1, 1)
        # Learnable scale and shift parameters
        self.gamma = nn.Parameter(torch.ones(shape))
        self.beta = nn.Parameter(torch.zeros(shape))
        # Moving statistics: updated during training, used at inference
        self.moving_mean = torch.zeros(shape)
        self.moving_var = torch.ones(shape)

    def forward(self, X):
        # Keep the moving statistics on the same device as the input
        if self.moving_mean.device != X.device:
            self.moving_mean = self.moving_mean.to(X.device)
            self.moving_var = self.moving_var.to(X.device)
        Y, self.moving_mean, self.moving_var = batch_norm(
            X, self.gamma, self.beta, self.moving_mean, self.moving_var,
            eps=1e-5, momentum=0.9)
        return Y


if __name__ == "__main__":
    # Test BatchNorm
    # X = torch.randn(4, 3, 2, 2)  # 4 samples, 3 channels, 2x2 feature maps
    # batch_norm_layer = BatchNorm(num_features=3, num_dims=4)
    # Y = batch_norm_layer(X)
    # print("Output shape:", Y.shape)  # should be (4, 3, 2, 2)

    # Apply to LeNet
    net = nn.Sequential(
        nn.Conv2d(1, 6, kernel_size=5),
        BatchNorm(6, num_dims=4),
        nn.Sigmoid(),
        nn.MaxPool2d(kernel_size=2, stride=2),
        nn.Conv2d(6, 16, kernel_size=5),
        BatchNorm(16, num_dims=4),
        nn.Sigmoid(),
        nn.MaxPool2d(kernel_size=2, stride=2),
        nn.Flatten(),
        nn.Linear(16 * 4 * 4, 120),
        BatchNorm(120, num_dims=2),
        nn.Sigmoid(),
        nn.Linear(120, 84),
        BatchNorm(84, num_dims=2),
        nn.Sigmoid(),
        nn.Linear(84, 10)
    )

    lr, num_epochs = 1.0, 10
    batch_size = 256
    train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
    d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
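The hand-written BatchNorm above is for illustration. In practice the same LeNet can be built with PyTorch's built-in nn.BatchNorm2d / nn.BatchNorm1d layers, which only need the number of features; a sketch of the equivalent network:

net_builtin = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.BatchNorm2d(6), nn.Sigmoid(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.BatchNorm2d(16), nn.Sigmoid(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 120), nn.BatchNorm1d(120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.BatchNorm1d(84), nn.Sigmoid(),
    nn.Linear(84, 10)
)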

