My understanding is that a BatchNorm layer speeds up training. Can it also prevent a neural network from overfitting? How would you explain that?
1 Answer
BN was not originally designed to prevent vanishing gradients or overfitting.
BN makes the system more robust by constraining the parameter search space: compressing the search space improves the structural soundness of the optimization problem, which brings a series of performance benefits, such as faster convergence, better-behaved gradients, and reduced overfitting.
Specifically regarding overfitting: in BN, each mini-batch is sampled randomly, and its mean and variance are computed for normalization during training. At test time, the statistics accumulated during training (the running mean, etc.) are applied as a fixed, global normalization, which essentially removes the randomness introduced in the training phase. Batch Normalization therefore also provides a regularization effect, and in practice BN has indeed shown quite good performance at preventing overfitting.
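To make the mechanism concrete, here is a minimal sketch of a simplified 1-D BatchNorm forward pass. It is illustrative only: the learnable scale/shift parameters (gamma, beta) are omitted, and the function and state names (`bn_forward`, `running_mean`, `running_var`) are my own, not from any particular library.

```python
import numpy as np

def bn_forward(x, state, training, momentum=0.1, eps=1e-5):
    """Simplified 1-D BatchNorm forward pass (learnable gamma/beta omitted).

    x     : (batch_size, num_features) activations
    state : dict with 'running_mean' and 'running_var' arrays
    """
    if training:
        # Statistics come from the current random mini-batch, so the
        # normalized value of an example depends on its batchmates.
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        # Accumulate population estimates for use at test time.
        state["running_mean"] = (1 - momentum) * state["running_mean"] + momentum * mean
        state["running_var"]  = (1 - momentum) * state["running_var"]  + momentum * var
    else:
        # Test time: fixed population statistics, so the output is deterministic.
        mean, var = state["running_mean"], state["running_var"]
    return (x - mean) / np.sqrt(var + eps)

# Usage: start the running statistics at mean 0, variance 1 per feature.
state = {"running_mean": np.zeros(4), "running_var": np.ones(4)}
out_train = bn_forward(np.random.randn(8, 4), state, training=True)
out_test  = bn_forward(np.random.randn(8, 4), state, training=False)
```

Because the mean and variance are recomputed from each random mini-batch, the same example is normalized slightly differently on every pass during training; that injected noise is the regularizing effect described above.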
The original paper puts it this way:
When training with Batch Normalization, a training example is seen in conjunction with other examples in the mini-batch, and the training network no longer producing deterministic values for a given training example. In our experiments, we found this effect to be advantageous to the generalization of the network. Whereas Dropout (Srivastava et al., 2014) is typically used to reduce over-fitting, in a batch-normalized network we found that it can be either removed or reduced in strength.
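This non-determinism is easy to observe directly. The following sketch uses PyTorch's `nn.BatchNorm1d` (the variable names are illustrative) to normalize the same example inside two different random mini-batches in training mode, then shows that evaluation mode is deterministic:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(num_features=4)

x = torch.randn(1, 4)                        # one fixed training example
batch_a = torch.cat([x, torch.randn(7, 4)])  # x grouped with one set of batchmates
batch_b = torch.cat([x, torch.randn(7, 4)])  # x grouped with different batchmates

bn.train()                                   # normalize with mini-batch statistics
out_a = bn(batch_a)[0]
out_b = bn(batch_b)[0]
print(torch.allclose(out_a, out_b))          # False: same example, different outputs

bn.eval()                                    # normalize with accumulated running statistics
print(torch.allclose(bn(x), bn(x)))          # True: deterministic at test time
```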