1. We propose a simple yet effective attention module (CBAM) that can be widely applied to boost representation power of CNNs.
2. We validate the effectiveness of our attention module through extensive ablation studies.
3. We verify that the performance of various networks is greatly improved on multiple benchmarks (ImageNet-1K, MS COCO, and VOC 2007) by plugging in our lightweight module; embedding it into a CNN yields clear gains over the original base network on these datasets.
Given an intermediate feature map $F \in \mathbb{R}^{C \times H \times W}$ as input, CBAM sequentially infers a 1D channel attention map $M_c \in \mathbb{R}^{C \times 1 \times 1}$ and a 2D spatial attention map $M_s \in \mathbb{R}^{1 \times H \times W}$, as illustrated in Fig. 1. The overall attention process can be summarized as:

$$F' = M_c(F) \otimes F, \qquad F'' = M_s(F') \otimes F' \tag{1}$$

where $\otimes$ denotes element-wise multiplication. During multiplication, the attention values are broadcast (copied) accordingly: channel attention values are broadcast along the spatial dimension, and vice versa. $F''$ is the final refined output. Fig. 2 depicts the computation process of each attention map. The following describes the details of each attention module.
Reading the formula directly: suppose F is an intermediate feature map produced during the network's forward pass, with size [N, C, H, W], where N is the batch size, C the number of channels, H the height, and W the width. $M_c(F)$ is the weight map produced by the channel attention mechanism, with size [N, C, 1, 1], and $M_s(F')$ is the weight map produced by the spatial attention mechanism, with size [N, 1, H, W]. The final output $F''$ therefore has the same size as the input F, but two different kinds of weights have been multiplied in: one expressing which channels carry more useful information, and one expressing which spatial locations deserve more attention. This is the essence of the attention mechanism, and these weights are learned during training.
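To make the shapes and broadcasting concrete, here is a minimal PyTorch sketch of the sequential refinement in Eq. (1). The internals of the two sub-modules (average/max pooling with a shared MLP for channel attention; channel-wise average/max plus a 7×7 convolution for spatial attention) follow the paper's later description, while the class and parameter names (ChannelAttention, SpatialAttention, reduction=16, kernel_size=7) are illustrative choices, not the authors' released code.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Channel attention: produces M_c(F) with shape [N, C, 1, 1]."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # Shared MLP (implemented with 1x1 convolutions) applied to both pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )

    def forward(self, x):
        avg_out = self.mlp(self.avg_pool(x))              # [N, C, 1, 1]
        max_out = self.mlp(self.max_pool(x))              # [N, C, 1, 1]
        return torch.sigmoid(avg_out + max_out)


class SpatialAttention(nn.Module):
    """Spatial attention: produces M_s(F') with shape [N, 1, H, W]."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)      # [N, 1, H, W]
        max_out, _ = torch.max(x, dim=1, keepdim=True)    # [N, 1, H, W]
        return torch.sigmoid(self.conv(torch.cat([avg_out, max_out], dim=1)))


class CBAM(nn.Module):
    """Sequentially applies channel then spatial attention, as in Eq. (1)."""

    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.channel_att = ChannelAttention(channels, reduction)
        self.spatial_att = SpatialAttention(kernel_size)

    def forward(self, x):
        x = self.channel_att(x) * x   # F'  = M_c(F)  ⊗ F : [N, C, 1, 1] broadcast over H, W
        x = self.spatial_att(x) * x   # F'' = M_s(F') ⊗ F': [N, 1, H, W] broadcast over C
        return x


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)   # an intermediate feature map [N, C, H, W]
    refined = CBAM(channels=64)(feat)
    print(refined.shape)                # torch.Size([2, 64, 32, 32]) -- same size as the input
```

Because both attention maps are multiplied in with broadcasting, the module preserves the input's [N, C, H, W] shape, which is why it can be dropped after any convolutional block of an existing network.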