GoogLeNet

GoogLeNet is also known as Inception; yes, the same name as the movie *Inception*.

Compared with earlier networks, GoogLeNet has more layers, is deeper, and its structure is more complex.

However, despite the more complex structure, GoogLeNet's parameter count actually went down rather than up, because 1x1 convolutions are used to reduce the number of channels. This is one of the important reasons why GoogLeNet performs so well.
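To see where the savings come from, consider the 5x5 branch of inception(3a) below, which maps a 192-channel feature map to 32 channels. A back-of-the-envelope sketch (weight counts only, biases ignored):

```python
# Direct 5x5 convolution: 192 -> 32 channels
direct = 192 * 32 * 5 * 5                     # 153,600 weights
# With a 1x1 reduction to 16 channels first: 192 -> 16 -> 32
reduced = 192 * 16 * 1 * 1 + 16 * 32 * 5 * 5  # 3,072 + 12,800 = 15,872 weights
print(direct, reduced)                        # roughly a 10x saving
```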

The architecture parameter table of GoogLeNet is as follows:
(figure: GoogLeNet architecture parameter table)

type: the type of the layer
patch size/stride: the kernel size (convolution) or window size (pooling) / the stride of the convolution or pooling
output size: the shape of the feature map output by this layer

GoogLeNet reuses the inception module. Like the NiN basic block, it is a standalone block, which we'll call the inception block; the parameters it needs are the ones outlined in blue in the table.

The structure of the inception block is as follows:
(figure: inception block structure)

The inception block uses 4 parallel branches and concatenates their outputs along the channel dimension. This widens the network: instead of deciding whether a convolution or a pooling layer works better, or whether the kernel size should be 3x3 or 5x5, we hand all of these choices to the model and let it learn them itself.

Explanation of the inception parameters in the architecture table (a worked example follows the list):

`#1x1`: output channels of the 1x1 convolution (branch 1)
`#3x3 reduce`: output channels of the 1x1 convolution placed before the 3x3 convolution (branch 2)
`#3x3`: output channels of the 3x3 convolution (branch 2)
`#5x5 reduce`: output channels of the 1x1 convolution placed before the 5x5 convolution (branch 3)
`#5x5`: output channels of the 5x5 convolution (branch 3)
`pool proj`: output channels of the 1x1 convolution placed after the pooling layer (branch 4)
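For example, reading the inception(3a) row of the table, the four branches output 64, 128, 32, and 32 channels respectively, so the depth of the concatenated output is their sum (a quick arithmetic check):

```python
out_1x1, out_3x3, out_5x5, pool_proj = 64, 128, 32, 32  # inception(3a) row
print(out_1x1 + out_3x3 + out_5x5 + pool_proj)          # 256, the output depth in the table
```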

Note that the output feature map of the inception block has the same spatial size as its input; only the number of channels changes. That is why the outputs of the 4 branches can be concatenated along the channel dimension.
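This works because every branch pads so as to preserve the spatial size: with output size o = (i + 2p - k)/s + 1, a 3x3 convolution with padding 1, a 5x5 convolution with padding 2, and a 3x3 max-pool with stride 1 and padding 1 all map an i x i input to an i x i output. A quick PyTorch check (the channel counts here are just placeholders):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 192, 28, 28)
print(nn.Conv2d(192, 128, kernel_size=3, padding=1)(x).shape)     # [1, 128, 28, 28]
print(nn.Conv2d(192, 32, kernel_size=5, padding=2)(x).shape)      # [1, 32, 28, 28]
print(nn.MaxPool2d(kernel_size=3, stride=1, padding=1)(x).shape)  # [1, 192, 28, 28]
```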

With the architecture parameter table in hand, we can now implement GoogLeNet.

Implementing GoogLeNet in PyTorch

Since the convolution-batchnorm-activation pattern appears throughout the network, we can package it into a single module, conv_block:

```python
import torch
import torch.nn as nn


class conv_block(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super().__init__()
        self.relu = nn.ReLU()
        self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
        self.batchnorm = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        # convolution -> batch normalization -> ReLU
        return self.relu(self.batchnorm(self.conv(x)))
```
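A quick usage check (a minimal sketch; the shape follows from the padding arithmetic above):

```python
block = conv_block(3, 64, kernel_size=3, padding=1)
x = torch.randn(1, 3, 224, 224)
print(block(x).shape)  # torch.Size([1, 64, 224, 224])
```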

Then, following the structure diagram, implement the inception block:

```python
class Inception_block(nn.Module):
    # Inception parameter order: in_channels, out_1x1, red_3x3, out_3x3, red_5x5, out_5x5, out_1x1pool
    def __init__(
        self, in_channels, out_1x1, red_3x3, out_3x3, red_5x5, out_5x5, out_1x1pool
    ):
        """
        in_channels: number of input channels, i.e. the depth of the incoming feature map
        out_1x1: `#1x1`, output channels of the 1x1 convolution (branch 1)
        red_3x3: `#3x3 reduce`, output channels of the 1x1 convolution before the 3x3 convolution (branch 2)
        out_3x3: `#3x3`, output channels of the 3x3 convolution (branch 2)
        red_5x5: `#5x5 reduce`, output channels of the 1x1 convolution before the 5x5 convolution (branch 3)
        out_5x5: `#5x5`, output channels of the 5x5 convolution (branch 3)
        out_1x1pool: `pool proj`, output channels of the 1x1 convolution after the pooling layer (branch 4)
        Note that the pooling layer in branch 4 needs no external parameter, since pooling does not change the number of channels.
        """
        super().__init__()
        self.branch1 = conv_block(in_channels, out_1x1, kernel_size=(1, 1))

        self.branch2 = nn.Sequential(
            conv_block(in_channels, red_3x3, kernel_size=(1, 1)),
            conv_block(red_3x3, out_3x3, kernel_size=(3, 3), padding=(1, 1)),
        )

        self.branch3 = nn.Sequential(
            conv_block(in_channels, red_5x5, kernel_size=(1, 1)),
            conv_block(red_5x5, out_5x5, kernel_size=(5, 5), padding=(2, 2)),
        )

        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            conv_block(in_channels, out_1x1pool, kernel_size=(1, 1)),
        )

    def forward(self, x):
        # concatenate the outputs of the 4 branches along the channel dimension
        return torch.cat(
            [self.branch1(x), self.branch2(x), self.branch3(x), self.branch4(x)], 1
        )
```
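As a sanity check (a minimal sketch using the inception(3a) parameters from the table), the block should turn a 192-channel map into a 256-channel map of the same spatial size:

```python
inception3a = Inception_block(192, 64, 96, 128, 16, 32, 32)
x = torch.randn(1, 192, 28, 28)
print(inception3a(x).shape)  # torch.Size([1, 256, 28, 28]); 256 = 64 + 128 + 32 + 32
```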

Finally, implement GoogLeNet by following its architecture parameter table:

```python
class GoogLeNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()

        self.conv1 = conv_block(
            in_channels=3,
            out_channels=64,
            kernel_size=(7, 7),
            stride=(2, 2),
            padding=(3, 3),
        )

        self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.conv2 = conv_block(64, 192, kernel_size=3, stride=1, padding=1)
        self.maxpool2 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Inception parameter order: in_channels, out_1x1, red_3x3, out_3x3, red_5x5, out_5x5, out_1x1pool
        self.inception3a = Inception_block(192, 64, 96, 128, 16, 32, 32)
        self.inception3b = Inception_block(256, 128, 128, 192, 32, 96, 64)
        self.maxpool3 = nn.MaxPool2d(kernel_size=(3, 3), stride=2, padding=1)

        self.inception4a = Inception_block(480, 192, 96, 208, 16, 48, 64)
        self.inception4b = Inception_block(512, 160, 112, 224, 24, 64, 64)
        self.inception4c = Inception_block(512, 128, 128, 256, 24, 64, 64)
        self.inception4d = Inception_block(512, 112, 144, 288, 32, 64, 64)
        self.inception4e = Inception_block(528, 256, 160, 320, 32, 128, 128)
        self.maxpool4 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.inception5a = Inception_block(832, 256, 160, 320, 32, 128, 128)
        self.inception5b = Inception_block(832, 384, 192, 384, 48, 128, 128)

        self.avgpool = nn.AvgPool2d(kernel_size=7, stride=1)
        self.dropout = nn.Dropout(p=0.4)
        self.fc1 = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = self.maxpool2(x)

        x = self.inception3a(x)
        x = self.inception3b(x)
        x = self.maxpool3(x)

        x = self.inception4a(x)
        x = self.inception4b(x)
        x = self.inception4c(x)
        x = self.inception4d(x)
        x = self.inception4e(x)
        x = self.maxpool4(x)

        x = self.inception5a(x)
        x = self.inception5b(x)

        x = self.avgpool(x)
        x = x.reshape(x.shape[0], -1)
        x = self.dropout(x)
        x = self.fc1(x)

        return x
```

Let's test it by feeding in a batch of 4 RGB images of size 224x224.
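A minimal sketch of such a test (the expected output shape follows from num_classes=1000):

```python
model = GoogLeNet(num_classes=1000)
x = torch.randn(4, 3, 224, 224)
print(model(x).shape)  # torch.Size([4, 1000])
```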
