Paper: ITT-RNA: Imperfection Tolerable Training for RRAM-Crossbar based Deep Neural-network Accelerator
Authors: Zhuoran Song; Xiaoyao Liang; Li Jiang
Affiliation: Shanghai Jiao Tong University
Published in: TCAD
Neural networks (NNs) have solved many complex problems, and NN applications are both compute- and memory-intensive. FPGAs and ASICs are therefore used to accelerate these applications, but FPGAs suffer from high design cost and ASICs from a narrow application range. A high-energy-efficiency accelerator with low hardware cost and small design cost is thus highly desirable for future data-intensive NN computing.
RRAM-crossbar accelerators have several desirable characteristics:

- Small footprint
- Non-volatility
- Low power consumption
- Natural matrix-vector multiplication

This paper focuses on two categories of RRAM-crossbar imperfections:
- Resistance variations caused by fabrication imperfections.
- Stuck-at-faults (SAFs).

[16] does not cover the resistance variations. [13] trains the CNN purely on RRAM-crossbars, which decreases the RRAM lifetime; pure on-device training methods are therefore not well suited to tolerating process imperfections.
[17] and [18] train the model off-device, which causes the errors to accumulate and be magnified. Dedicated variation-tolerance mechanisms and training methods targeting the MLP and CNN are therefore needed.
In this work, the four contributions are:
- For the process imperfections of the SLP, we propose a bipartite-matching method that prevents large-weight synapses from being mapped to imperfect memristor cells, and an off-device training algorithm that leverages the inherent self-healing capability of NNs.
- For the resistance variations of the MLP, we propose a dynamic adjustment mechanism that extends the off-device training algorithm to alleviate the accumulation of errors across multiple layers.
- For the resistance variations of the CNN, we propose a software/hardware co-design and a threshold co-training algorithm.
- For SAFs, we propose a self-compensating mechanism.

The pooling layer is typically implemented by CMOS circuits. The input data (digital signals) are converted to a vector of analog signals by the DACs and fed into the RRAM-crossbars. Multiple filters are stored in the RRAM-crossbars, and the fully-connected layer is also implemented by RRAM-crossbars.
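A minimal sketch of the crossbar's natural matrix-vector multiplication under an idealized linear model (DAC/ADC conversion and any signed-weight encoding are omitted; the conductance and voltage ranges below are illustrative assumptions, not values from the paper):

```python
# Idealized model of the analog matrix-vector multiplication a crossbar performs:
# the current collected on each bit-line is the dot product of the input voltage
# vector (applied on the word-lines) and that column's conductances.
import numpy as np

def crossbar_mvm(conductance, voltages):
    """conductance: (rows, cols) in siemens; voltages: (rows,) in volts."""
    return voltages @ conductance        # (cols,) bit-line currents, I = V^T G

rng = np.random.default_rng(0)
G = rng.uniform(1e-6, 1e-4, size=(128, 64))   # toy conductance matrix
v = rng.uniform(0.0, 0.2, size=128)           # toy input voltages from the DACs
print(crossbar_mvm(G, v).shape)               # (64,)
```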
Weight Mapping Scheme
We assume one weight is mapped to one memristor cell; hence, the I-V characteristic of the devices is assumed to be linear in this paper. The cells are also assumed to be in the ideal state.
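As a rough illustration of the one-weight-per-cell mapping under the linear I-V assumption, the sketch below linearly scales weight magnitudes into a hypothetical conductance range (the bounds G_MIN/G_MAX and the handling of signed weights are assumptions, not the paper's exact scheme):

```python
# Linear weight-to-conductance mapping sketch (one weight per cell).
# Signed weights would additionally need a pair-based encoding such as
# the one discussed later.
import numpy as np

G_MIN, G_MAX = 1e-6, 1e-4    # assumed conductance range in siemens

def weight_to_conductance(w, w_max):
    """Linearly scale |w| in [0, w_max] to a conductance in [G_MIN, G_MAX]."""
    w = np.clip(np.abs(w), 0.0, w_max)
    return G_MIN + (w / w_max) * (G_MAX - G_MIN)

W = np.array([[0.02, -0.5], [0.3, 0.0]])
print(weight_to_conductance(W, w_max=1.0))
```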
Sources of Variations and Faults
This paper focuses on solving the resistance variation caused by fabrication, and then introduces SAFs [15]. The March-C algorithm [26]-[28] and the squeeze-search algorithm [15] are capable of testing the memristors for SAFs.
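For simulation purposes, the two imperfection types can be injected into a conductance matrix roughly as below; the lognormal variation model, the 10% fault rate, and the convention "SA1 = stuck at high conductance, SA0 = stuck at low conductance" are illustrative assumptions rather than the paper's exact fault model:

```python
# Inject resistance variation and SAFs into a conductance matrix for simulation.
import numpy as np

def inject_imperfections(G, sigma=0.1, saf_rate=0.10, g_sa0=1e-6, g_sa1=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    G = G * rng.lognormal(mean=0.0, sigma=sigma, size=G.shape)  # resistance variation
    saf = rng.random(G.shape) < saf_rate                        # cells with a SAF
    stuck_high = rng.random(G.shape) < 0.5
    G[saf & stuck_high] = g_sa1                                 # SA1 cells
    G[saf & ~stuck_high] = g_sa0                                # SA0 cells
    return G
```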
Variation and Fault Tolerance Methods
The paper reviews the drawbacks of Vortex [17]. We proposed the AFT algorithm in previous research [18], but AFT is not applicable to deep neural networks.
Motivation
All existing solutions for further improving the accuracy explore the self-healing capability of NNs: the NN is trained with the errors included, which is an old idea. The errors in an MLP are magnified and accumulated during forward propagation, so an MLP is harder to train than an SLP; we therefore propose the dynamic adjustment mechanism. To achieve high accuracy without sacrificing the robustness of RRAM-crossbars, we propose a software/hardware co-design method to overcome the resistance variations in CNNs. To overcome the SAFs in DNNs, we propose a self-compensating mechanism.

We use a greedy mapping algorithm and bipartite matching to find the mapping that yields the smallest SWV (see the sketch below). However, this method does not help when a weight matrix has a column full of large weights.
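A minimal sketch of the bipartite-matching step only (the greedy part is omitted), assuming the cost of assigning a logical weight column to a physical crossbar column is the sum of |weight| times the measured per-cell variation; the paper's exact SWV cost may differ:

```python
# Bipartite matching of logical weight columns to physical crossbar columns via
# the Hungarian algorithm.  The cost model stands in for the paper's SWV metric.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_columns(weights, variation):
    """weights, variation: (rows, cols) arrays; returns physical column per logical column."""
    # cost[i, j] = penalty of placing logical column i on physical column j
    cost = np.abs(weights).T @ variation     # (cols, cols)
    _, col_ind = linear_sum_assignment(cost)
    return col_ind

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(64, 64))               # toy weight matrix
var = np.abs(rng.normal(0.0, 0.05, size=(64, 64)))    # per-cell variation estimate
print(match_columns(W, var)[:8])
```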
An Off-device NN Training Algorithm
If an error occurs on a neuron, the input neurons correlated with that neuron are adjusted to compensate for it.
[Figure: weight compensation (weights_compensat.png) — image not available]
The weights are reduced using $W_{ij} \leftarrow W_{ij} \cdot t^{-1}$, with $t = 2$. For the MLP, a fixed t chosen without justification does not work well; dynamic adjustment means that we try several values of t and keep one only if the accuracy is increased.
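A minimal sketch of the dynamic adjustment idea: candidate values of t are tried, the layer's weights are rescaled by 1/t, and a value is kept only if a hypothetical `evaluate` callback (weights → accuracy) reports an improvement:

```python
# Dynamic adjustment sketch: try several t values and keep a rescaled layer
# only when the (variation-aware) validation accuracy improves.
import copy

def dynamic_adjust(weights, layer, evaluate, candidates=(1.5, 2.0, 3.0, 4.0)):
    """weights: dict of layer name -> array; evaluate: weights -> accuracy."""
    best_t, best_acc, best_w = None, evaluate(weights), weights
    for t in candidates:
        trial = copy.deepcopy(weights)
        trial[layer] = trial[layer] / t        # W_ij <- W_ij * t^-1
        acc = evaluate(trial)
        if acc > best_acc:                     # accept t only if accuracy increases
            best_t, best_acc, best_w = t, acc, trial
    return best_t, best_w
```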
The methods for the SLP and MLP do not work well for CNNs.
Bit-wise Redundant Mechanism
We propose the bit-wise redundant mechanism, in which a pair of memristor cells (drawn in a black dashed box in the paper's figure) denotes one weight. In such a pair, a zero weight will change its value according to the resistance variation, so the resistance variations should be detected before mapping the weights.
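One common way to realize a two-cells-per-weight representation is a differential pair with w ∝ g_a − g_b; whether the paper's bit-wise redundant mechanism uses exactly this encoding is an assumption here, but it matches the later description of cancelling a faulty cell with its neighbor:

```python
# Differential cell-pair encoding sketch: each signed weight is stored as the
# difference of two conductances, w ~ g_a - g_b.  G_MIN/G_MAX are hypothetical
# device bounds.
G_MIN, G_MAX = 0.0, 1.0   # arbitrary units

def encode_pair(w, w_max):
    """Split a signed weight into a (g_a, g_b) cell pair."""
    g = abs(w) / w_max * (G_MAX - G_MIN)
    return (G_MIN + g, G_MIN) if w >= 0 else (G_MIN, G_MIN + g)

def decode_pair(g_a, g_b, w_max):
    return (g_a - g_b) / (G_MAX - G_MIN) * w_max

g_a, g_b = encode_pair(-0.3, w_max=1.0)
print(decode_pair(g_a, g_b, w_max=1.0))   # ~ -0.3
```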
Threshold Co-training Algorithm
We train the CNN off-device until the accuracy reaches $\theta$; then we use on-device training [13] until the accuracy reaches $\gamma$.
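A minimal sketch of the two-phase flow, assuming hypothetical helpers `train_off_device_epoch`, `train_on_device_epoch`, and `evaluate`; θ and γ are the paper's accuracy thresholds, and the concrete default values below are placeholders:

```python
# Two-phase threshold co-training sketch.
def threshold_co_train(model, train_off_device_epoch, train_on_device_epoch,
                       evaluate, theta=0.95, gamma=0.98, max_epochs=100):
    # Phase 1: off-device (software) training until accuracy reaches theta.
    for _ in range(max_epochs):
        train_off_device_epoch(model)
        if evaluate(model) >= theta:
            break
    # Phase 2: on-device training [13] on the crossbar until accuracy reaches
    # gamma, which bounds the number of costly device write cycles.
    for _ in range(max_epochs):
        train_on_device_epoch(model)
        if evaluate(model) >= gamma:
            break
    return model
```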
The probability of having only a SA0 or only a SA1 fault is higher than that of the other cases, so we mainly focus on this first case.
- SA1: We enforce $F = 0$ and $W = 0$ so that the weight remains fixed after back-propagation; since the weight is zero, it does not affect the accuracy.
- SA0: Since almost 90% of the NN weights lie in $[-0.015, 0.015]$, we can change the neighboring cell in the pair to force the weight to zero.

The proposed method can guarantee $\leq 1.1\%$ loss of accuracy under resistance variations in the MLP and CNN. Moreover, it can guarantee $\leq 1\%$ loss of accuracy even when the stuck-at-fault (SAF) rate is $20\%$.
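A minimal sketch of the self-compensating idea on top of the assumed cell-pair encoding: SA1 cells are neutralized by matching the neighboring cell and freezing the weight's gradient, and SA0 cells are handled by reprogramming the neighbor so the effective weight is zero (the paper's exact flag handling may differ):

```python
# Self-compensation sketch over the assumed cell-pair encoding (w ~ g_a - g_b).
# saf codes: 0 = healthy, 1 = SA1, 2 = SA0 (convention assumed here).
import numpy as np

def self_compensate(g_a, g_b, saf_a, grad_mask):
    """g_a, g_b: pair conductances; grad_mask multiplies the weight gradients."""
    sa1 = saf_a == 1
    g_b[sa1] = g_a[sa1]        # SA1: match the neighbor so the weight reads zero ...
    grad_mask[sa1] = 0.0       # ... and freeze it during back-propagation (F = 0)

    sa0 = saf_a == 2
    g_b[sa0] = g_a[sa0]        # SA0: reprogram the neighbor so the near-zero weight is exactly zero
    return g_a, g_b, grad_mask
```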
To improve my writing skills, I will write some of my blogs in English. Please let me know if there are any mistakes (details, word choice, sentences, grammar, etc.).