Update!!!

It seems the problem is not actually caused by Dropout: the same warning showed up when training an SGAN even though the Dropout layers' training=True had already been set. I now suspect vanishing gradients (this article points in that direction: https://www.jiqizhixin.com/articles/2018-11-27-24) or exploding gradients.

In this post, setting training=True may simply have happened to avoid the vanishing/exploding gradients; it could be pure coincidence (just a guess).

I will dig into the real cause later; leaving it at this for now.

Problem

Starting from the DCGAN code (https://fx0809.gitee.io/2020/10/07/DCGAN/), I wanted to reimplement the generator and the discriminator as subclasses of `tf.keras.Model`. The modified part of the code is as follows:

class Generator_model(tf.keras.Model):
    def __init__(self):
        super().__init__()

        self.dense=tf.keras.layers.Dense(7*7*256,use_bias=False)
        self.bn1=tf.keras.layers.BatchNormalization()
        self.leakyrelu1=tf.keras.layers.LeakyReLU()

        self.reshape=tf.keras.layers.Reshape((7,7,256))

        self.convT1=tf.keras.layers.Conv2DTranspose(128,(5,5),strides=(1,1),padding='same',use_bias=False)
        self.bn2=tf.keras.layers.BatchNormalization()
        self.leakyrelu2=tf.keras.layers.LeakyReLU()

        self.convT2=tf.keras.layers.Conv2DTranspose(64,(5,5),strides=(2,2),padding='same',use_bias=False)
        self.bn3=tf.keras.layers.BatchNormalization()
        self.leakyrelu3=tf.keras.layers.LeakyReLU()

        self.convT3=tf.keras.layers.Conv2DTranspose(1,(5,5),strides=(2,2),padding='same',use_bias=False,activation='tanh')

    def call(self,inputs,training=True):
        x=self.dense(inputs)
        x=self.bn1(x,training)
        x=self.leakyrelu1(x)

        x=self.reshape(x)

        x=self.convT1(x)
        x=self.bn2(x,training)
        x=self.leakyrelu2(x)

        x=self.convT2(x)
        x=self.bn3(x,training)
        x=self.leakyrelu3(x)

        x=self.convT3(x)

        return x


class Discriminator_model(tf.keras.Model):
    def __init__(self):
        super().__init__()

        self.conv1=tf.keras.layers.Conv2D(64,(5,5),strides=(2,2),padding='same')
        self.leakyrelu1=tf.keras.layers.LeakyReLU()
        self.dropout1=tf.keras.layers.Dropout(0.3)

        self.conv2=tf.keras.layers.Conv2D(128,(5,5),strides=(2,2),padding='same')
        self.leakyrelu2=tf.keras.layers.LeakyReLU()
        self.dropout2=tf.keras.layers.Dropout(0.3)

        self.flatten=tf.keras.layers.Flatten()

        self.dense=tf.keras.layers.Dense(1)

    def call(self,inputs,training=True):
        x=self.conv1(inputs)
        x=self.leakyrelu1(x)
        x=self.dropout1(x)

        x=self.conv2(inputs)
        x=self.leakyrelu2(x)
        x=self.dropout2(x)

        x=self.flatten(x)

        x=self.dense(x)

        return x
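Before plugging these classes into the training loop, a quick forward-pass check can confirm the expected shapes. This snippet is only an illustrative sketch of my own; the 100-dimensional noise vector follows the linked DCGAN post:

import tensorflow as tf

generator = Generator_model()
discriminator = Discriminator_model()

noise = tf.random.normal([1, 100])
fake_image = generator(noise, training=False)         # expected shape: (1, 28, 28, 1)
decision = discriminator(fake_image, training=False)  # expected shape: (1, 1), a raw logit
print(fake_image.shape, decision.shape)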

After running the complete script, the following warning keeps being printed:

WARNING:tensorflow:Gradients do not exist for variables ['discriminator_model/conv2d/kernel:0', 'discriminator_model/conv2d/bias:0'] when minimizing the loss.
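For context, this warning is printed by the optimizer when it is asked to apply a gradient of None to a trainable variable. Below is a minimal sketch of the custom training step in which it shows up; it assumes the same cross-entropy losses, Adam optimizers and 100-dimensional noise as the linked DCGAN post, and the names used here (cross_entropy, train_step, BATCH_SIZE, NOISE_DIM, ...) are my own, not code from this post:

generator = Generator_model()
discriminator = Discriminator_model()

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)

BATCH_SIZE = 256   # assumed batch size
NOISE_DIM = 100    # noise dimension, as in the DCGAN post

@tf.function
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, NOISE_DIM])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = cross_entropy(tf.ones_like(fake_output), fake_output)
        disc_loss = (cross_entropy(tf.ones_like(real_output), real_output)
                     + cross_entropy(tf.zeros_like(fake_output), fake_output))

    gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    # any variable whose gradient comes back as None triggers the
    # "Gradients do not exist for variables ..." warning inside apply_gradients;
    # gradient norms can also be inspected here when checking for vanishing or
    # exploding gradients, e.g. [tf.norm(g) for g in disc_grads if g is not None]

    generator_optimizer.apply_gradients(zip(gen_grads, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(disc_grads, discriminator.trainable_variables))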

Solution

Set the training argument to True when calling the discriminator's Dropout layers:

class Discriminator_model(tf.keras.Model):
    def __init__(self):
        super().__init__()

        self.conv1=tf.keras.layers.Conv2D(64,(5,5),strides=(2,2),padding='same')
        self.leakyrelu1=tf.keras.layers.LeakyReLU()
        self.dropout1=tf.keras.layers.Dropout(0.3)

        self.conv2=tf.keras.layers.Conv2D(128,(5,5),strides=(2,2),padding='same')
        self.leakyrelu2=tf.keras.layers.LeakyReLU()
        self.dropout2=tf.keras.layers.Dropout(0.3)

        self.flatten=tf.keras.layers.Flatten()

        self.dense=tf.keras.layers.Dense(1)

    def call(self,inputs,training=True):
        x=self.conv1(inputs)
        x=self.leakyrelu1(x)
        x=self.dropout1(x,training)

        x=self.conv2(inputs)
        x=self.leakyrelu2(x)
        x=self.dropout2(x,training)

        x=self.flatten(x)

        x=self.dense(x)

        return x

Analysis

The official documentation describes the Dropout layer as follows:

The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged.

Note that the Dropout layer only applies when training is set to True such that no values are dropped during inference. When using model.fit, training will be appropriately set to True automatically, and in other contexts, you can set the kwarg explicitly to True when calling the layer.

(This is in contrast to setting trainable=False for a Dropout layer. trainable does not affect the layer's behavior, as Dropout does not have any variables/weights that can be frozen during training.)

Perhaps, as "in other contexts, you can set the kwarg explicitly to True when calling the layer" says, when we build the model in some other way (for example via the call method used here, instead of model.fit), we have to set training=True manually.
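To see this behaviour in isolation (a quick illustrative check of my own, not from the original post), a Dropout layer can be called directly with and without training=True and the outputs compared:

import tensorflow as tf

dropout = tf.keras.layers.Dropout(0.3)
x = tf.ones((1, 10))

print(dropout(x, training=False))   # unchanged: nothing is dropped at inference time
print(dropout(x, training=True))    # about 30% of the units are zeroed, the rest scaled by 1/(1 - 0.3)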