1. We propose a simple yet effective attention module (CBAM) that can be widely applied to boost representation power of CNNs.
2. We validate the effectiveness of our attention module through extensive ablation studies.
3. We verify that the performance of various networks is greatly improved on multiple benchmarks (ImageNet-1K, MS COCO, and VOC 2007) by plugging in our lightweight module; embedding it into a CNN yields clear gains over the original base network on these datasets.
Given an intermediate feature map $F \in \mathbb{R}^{C \times H \times W}$ as input, CBAM sequentially infers a 1D channel attention map $M_c \in \mathbb{R}^{C \times 1 \times 1}$ and a 2D spatial attention map $M_s \in \mathbb{R}^{1 \times H \times W}$, as illustrated in Fig. 1. The overall attention process can be summarized as:

$$F' = M_c(F) \otimes F, \qquad F'' = M_s(F') \otimes F' \tag{1}$$

where $\otimes$ denotes element-wise multiplication. During multiplication, the attention values are broadcast (copied) accordingly: channel attention values are broadcast along the spatial dimension, and vice versa. $F''$ is the final refined output. Fig. 2 depicts the computation process of each attention map. The following describes the details of each attention module.
Reading the formula directly: suppose F is an intermediate feature map produced during the network's forward pass, with size [N, C, H, W], where N is the batch size, C the number of channels, H the height, and W the width. $M_c(F)$ is the weight map produced by the channel attention mechanism, with size [N, C, 1, 1], and $M_s(F')$ is the weight map produced by the spatial attention mechanism, with size [N, 1, H, W]. The final output $F''$ therefore has the same size as the input F, but two different kinds of weights have been multiplied in: one expressing which channels carry more useful information, and one expressing which spatial locations deserve more attention. This is the essence of the attention mechanism, and these weights are learned during training.
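To make the shapes and broadcasting concrete, here is a minimal PyTorch sketch of the sequential refinement in Eq. (1). The internals of the two sub-modules (average/max pooling with a shared MLP for channel attention; channel-wise average/max plus a 7×7 convolution for spatial attention) follow the paper's later description, while the class and parameter names (ChannelAttention, SpatialAttention, reduction=16, kernel_size=7) are illustrative choices, not the authors' released code.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Channel attention: produces M_c(F) with shape [N, C, 1, 1]."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # Shared MLP (implemented with 1x1 convolutions) applied to both pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )

    def forward(self, x):
        avg_out = self.mlp(self.avg_pool(x))              # [N, C, 1, 1]
        max_out = self.mlp(self.max_pool(x))              # [N, C, 1, 1]
        return torch.sigmoid(avg_out + max_out)


class SpatialAttention(nn.Module):
    """Spatial attention: produces M_s(F') with shape [N, 1, H, W]."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)      # [N, 1, H, W]
        max_out, _ = torch.max(x, dim=1, keepdim=True)    # [N, 1, H, W]
        return torch.sigmoid(self.conv(torch.cat([avg_out, max_out], dim=1)))


class CBAM(nn.Module):
    """Sequentially applies channel then spatial attention, as in Eq. (1)."""

    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.channel_att = ChannelAttention(channels, reduction)
        self.spatial_att = SpatialAttention(kernel_size)

    def forward(self, x):
        x = self.channel_att(x) * x   # F'  = M_c(F)  ⊗ F : [N, C, 1, 1] broadcast over H, W
        x = self.spatial_att(x) * x   # F'' = M_s(F') ⊗ F': [N, 1, H, W] broadcast over C
        return x


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)   # an intermediate feature map [N, C, H, W]
    refined = CBAM(channels=64)(feat)
    print(refined.shape)                # torch.Size([2, 64, 32, 32]) -- same size as the input
```

Because both attention maps are multiplied in with broadcasting, the module preserves the input's [N, C, H, W] shape, which is why it can be dropped after any convolutional block of an existing network.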