Introduction
In 2019, Google introduced EfficientNet in the paper EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Like MnasNet, EfficientNet relies on neural architecture search (NAS). The difference is that here NAS is used to find a baseline network, called EfficientNet-B0, and the final family of networks is then obtained by scaling the parameters of this baseline.
Network Parameter Optimization Method
Earlier work had already shown that scaling up a convolutional network yields better performance, along one of three dimensions:

- scaling network depth: e.g., ResNet variants range from 18 to 200 layers;
- scaling network width (increasing the number of convolution kernels);
- scaling the input image resolution (less common).

However, previous studies focused on only one of these three at a time; scaling all three together usually required tedious manual network tuning, with unsatisfactory results. The figure below illustrates the different forms of network scaling:
The author scaled the same baseline network separately along depth, width, and input resolution; the resulting accuracy gains are shown in the figure below, where w, d, and r denote the multipliers of network width, network depth, and input resolution relative to the baseline network.
As the figure shows, enlarging any one of the three dimensions improves accuracy, but the gain shrinks as the multiplier grows.
Varying w under different (d, r) combinations gives the following results: the best accuracy now exceeds what single-dimension scaling achieves, different combinations behave differently, and the best reaches about 82%. The key to higher accuracy and efficiency is therefore to balance the scaling multipliers (d, r, w) of network depth, network width, and image resolution.
First, the author used NAS to design an effective baseline network, named EfficientNet-B0, whose structure is shown below. The architecture is very similar to MnasNet; in addition, an SE (squeeze-and-excitation) module is added inside each MBConv block.
Based on this baseline model, the author proposes a compound scaling method, which uses a single compound coefficient $\phi$ to determine the scaling multipliers of all three dimensions:
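In the paper, the three multipliers are tied to $\phi$ by

$$
d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi},
\qquad \text{s.t.}\; \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,\; \alpha \ge 1,\; \beta \ge 1,\; \gamma \ge 1,
$$

where $d$, $w$, $r$ scale depth, width, and resolution. Since FLOPs grow linearly with depth but quadratically with width and resolution, the constraint means total FLOPs grow by roughly $(\alpha \cdot \beta^{2} \cdot \gamma^{2})^{\phi} \approx 2^{\phi}$.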
Step 1: fix $\phi = 1$ and run a grid search on this small model, which yields the best coefficients $\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$.
Step 2: fix $\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$ and scale up the baseline network with different compound coefficients $\phi$ (from 1 to 7) to obtain EfficientNet-B1 through EfficientNet-B7.
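As a small illustration (not part of the original post), the multipliers implied by these coefficients for a given $\phi$ can be computed directly. Note that the coefficients actually used by the released B1–B7 variants (see the builder functions below) are rounded by hand and do not follow the formula exactly.

```python
# A minimal sketch of the compound scaling rule, using the grid-searched
# coefficients: alpha (depth), beta (width), gamma (resolution).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return the (depth, width, resolution) multipliers for a compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

for phi in range(1, 8):
    d, w, r = compound_scale(phi)
    # FLOPs grow roughly as (ALPHA * BETA**2 * GAMMA**2) ** phi ~= 2 ** phi.
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```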
Network Performance
Code Implementation
```python
import math

import tensorflow as tf


def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x * tf.nn.sigmoid(x)


def round_filters(filters, multiplier):
    """Scale a filter count by the width multiplier, rounding to a multiple of 8."""
    depth_divisor = 8
    min_depth = depth_divisor
    filters = filters * multiplier
    new_filters = max(min_depth, int(filters + depth_divisor / 2) // depth_divisor * depth_divisor)
    # Make sure rounding down never removes more than 10% of the filters.
    if new_filters < 0.9 * filters:
        new_filters += depth_divisor
    return int(new_filters)


def round_repeats(repeats, multiplier):
    """Scale the number of block repeats by the depth multiplier, rounding up."""
    if not multiplier:
        return repeats
    return int(math.ceil(multiplier * repeats))
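# Quick sanity checks (values worked out by hand, not from the original post):
#   round_filters(40, 1.2) -> 48  (40 * 1.2 = 48, already a multiple of 8)
#   round_repeats(3, 1.2)  -> 4   (ceil(3 * 1.2) = ceil(3.6) = 4)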
def conv_bn(x, filters, kernel_size, strides, activation=True):
    """Convolution + batch norm, optionally followed by the swish activation."""
    x = tf.keras.layers.Conv2D(filters=filters,
                               kernel_size=kernel_size,
                               strides=strides,
                               padding='same')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    if activation:
        x = swish(x)
    return x


def depthwiseConv_bn(x, kernel_size, strides):
    """Depthwise convolution + batch norm + ReLU6."""
    x = tf.keras.layers.DepthwiseConv2D(kernel_size,
                                        padding='same',
                                        strides=strides)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation(tf.nn.relu6)(x)
    return x
def Squeeze_excitation_layer(x):
    """Squeeze-and-excitation: reweight channels with a small two-layer bottleneck."""
    inputs = x
    squeeze = inputs.shape[-1] // 2   # integer division so Dense gets an int
    excitation = inputs.shape[-1]
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(squeeze)(x)
    x = swish(x)
    x = tf.keras.layers.Dense(excitation)(x)
    x = tf.keras.layers.Activation('sigmoid')(x)
    x = tf.keras.layers.Reshape((1, 1, excitation))(x)
    x = inputs * x   # channel-wise reweighting of the block input
    return x
def MBConv_idskip(x, filters, drop_connect_rate, kernel_size, strides, t):
    """One MBConv block: 1x1 expansion, depthwise conv, SE, 1x1 projection,
    with an identity skip connection when the shapes allow it."""
    x_input = x
    # Expansion phase: widen the channels by the expansion factor t.
    x = conv_bn(x, filters=x.shape[-1] * t, kernel_size=1, strides=1)
    x = depthwiseConv_bn(x, kernel_size=kernel_size, strides=strides)
    x = Squeeze_excitation_layer(x)
    x = swish(x)
    # Projection phase: 1x1 conv back to the target filters, no activation.
    x = conv_bn(x, filters=filters, kernel_size=1, strides=1, activation=False)
    # Identity skip, with plain Dropout standing in for the paper's drop-connect.
    if strides == 1 and x.shape[-1] == x_input.shape[-1]:
        if drop_connect_rate:
            x = tf.keras.layers.Dropout(rate=drop_connect_rate)(x)
        x = tf.keras.layers.add([x_input, x])
    return x


def MBConv(x, filters, drop_connect_rate, kernel_size, strides, t, n):
    """A stage of n MBConv blocks; only the first block may change the stride."""
    x = MBConv_idskip(x, filters, drop_connect_rate, kernel_size, strides, t)
    for _ in range(1, n):
        x = MBConv_idskip(x, filters, drop_connect_rate, kernel_size, strides=1, t=t)
    return x
class EfficientNet(tf.keras.Model):
    """EfficientNet built from the B0 architecture table, scaled by the width
    and depth coefficients. Note that layers are created inside call(), which
    is fine for a one-off forward/shape check, but a trainable model should
    create its layers in __init__."""

    def __init__(self, width_coefficient, depth_coefficient, dropout_rate, n_classes=1000):
        super().__init__()
        self.width_coefficient = width_coefficient
        self.depth_coefficient = depth_coefficient
        self.dropout_rate = dropout_rate
        self.n_classes = n_classes

    def call(self, inputs):
        # Stem: 3x3 convolution with stride 2.
        x = conv_bn(inputs, round_filters(32, self.width_coefficient),
                    kernel_size=3, strides=2, activation=True)
        # Seven MBConv stages, mirroring the EfficientNet-B0 table.
        x = MBConv(x, filters=round_filters(16, self.width_coefficient), kernel_size=3,
                   drop_connect_rate=self.dropout_rate, strides=1, t=1,
                   n=round_repeats(1, self.depth_coefficient))
        x = MBConv(x, filters=round_filters(24, self.width_coefficient), kernel_size=3,
                   drop_connect_rate=self.dropout_rate, strides=2, t=6,
                   n=round_repeats(2, self.depth_coefficient))
        x = MBConv(x, filters=round_filters(40, self.width_coefficient), kernel_size=5,
                   drop_connect_rate=self.dropout_rate, strides=2, t=6,
                   n=round_repeats(2, self.depth_coefficient))
        x = MBConv(x, filters=round_filters(80, self.width_coefficient), kernel_size=3,
                   drop_connect_rate=self.dropout_rate, strides=2, t=6,
                   n=round_repeats(3, self.depth_coefficient))
        x = MBConv(x, filters=round_filters(112, self.width_coefficient), kernel_size=5,
                   drop_connect_rate=self.dropout_rate, strides=1, t=6,
                   n=round_repeats(3, self.depth_coefficient))
        x = MBConv(x, filters=round_filters(192, self.width_coefficient), kernel_size=5,
                   drop_connect_rate=self.dropout_rate, strides=2, t=6,
                   n=round_repeats(4, self.depth_coefficient))
        x = MBConv(x, filters=round_filters(320, self.width_coefficient), kernel_size=3,
                   drop_connect_rate=self.dropout_rate, strides=1, t=6,
                   n=round_repeats(1, self.depth_coefficient))
        # Head: 1x1 convolution, global pooling, dropout, classifier.
        x = conv_bn(x, filters=round_filters(1280, self.width_coefficient),
                    kernel_size=1, strides=1)
        x = tf.keras.layers.GlobalAveragePooling2D()(x)
        x = tf.keras.layers.Dropout(rate=self.dropout_rate)(x)
        predictions = tf.keras.layers.Dense(self.n_classes, activation='softmax')(x)
        return predictions
def get_efficient_net(width_coefficient, depth_coefficient, resolution, dropout_rate):
    net = EfficientNet(width_coefficient=width_coefficient,
                       depth_coefficient=depth_coefficient,
                       dropout_rate=dropout_rate)
    net.build(input_shape=(None, resolution, resolution, 3))
    return net


# Arguments are (width_coefficient, depth_coefficient, resolution, dropout_rate),
# following the coefficients of the official implementation.
def efficient_net_b0():
    return get_efficient_net(1.0, 1.0, 224, 0.2)


def efficient_net_b1():
    return get_efficient_net(1.0, 1.1, 240, 0.2)


def efficient_net_b2():
    return get_efficient_net(1.1, 1.2, 260, 0.3)


def efficient_net_b3():
    return get_efficient_net(1.2, 1.4, 300, 0.3)


def efficient_net_b4():
    return get_efficient_net(1.4, 1.8, 380, 0.4)


def efficient_net_b5():
    return get_efficient_net(1.6, 2.2, 456, 0.4)


def efficient_net_b6():
    return get_efficient_net(1.8, 2.6, 528, 0.5)


def efficient_net_b7():
    return get_efficient_net(2.0, 3.1, 600, 0.5)
```
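As a quick smoke test (a minimal sketch, not from the original post; the input is random data, so the predictions are meaningless), the builders above can be exercised like this:

```python
import numpy as np

# Build EfficientNet-B0 and push one random 224x224 RGB image through it.
model = efficient_net_b0()
dummy = np.random.rand(1, 224, 224, 3).astype('float32')
probs = model(dummy)
print(probs.shape)  # (1, 1000): softmax scores over the default 1000 classes
```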