ITT-RNA: Imperfection Tolerable Training for RRAM-Crossbar based Deep Neural-network Accelerator


Paper: ITT-RNA: Imperfection Tolerable Training for RRAM-Crossbar based Deep Neural-network Accelerator
Authors: Zhuoran Song, Xiaoyao Liang, Li Jiang
Affiliations: Shanghai Jiao Tong University
Published in: TCAD

Contents

- Abstract
- Introduction
- Background and Motivation
- Fault Tolerance Method for SLP and MLP
- Software and Hardware Co-design for Convolutional Neural Networks
- Self-compensating Mechanism for SAFs
- Conclusion

    Abstract

- The paper considers two categories of RRAM-crossbar imperfections: resistance variations and stuck-at-faults (SAFs).
- It optimizes the AFT algorithm the authors proposed in previous research, extending the solution from SLP to MLP, and then to CNN/DNN.
- Bipartite-matching method, aiming to minimize the summed weighted deviation (SWV): small-weight synapses are mapped onto large-variation memristors and large-weight synapses onto small-variation memristors. The Kuhn-Munkres (KM) algorithm finds the minimal-weight perfect matching in a bipartite graph G.
- Dynamic adjustment: try several empirical scaling values and keep the one that preserves accuracy.
- Bit-wise redundant mechanism: a pair of memristor cells (the black dashed box in the paper's figure) represents one weight. Within such a pair, a weight that should be zero drifts according to the resistance variation, so the resistance variations must be detected before the weights are mapped.
- Software/hardware co-design: off-device training first brings the accuracy to $\theta$, then on-device training raises it to $\gamma$.
- Experimental results show that the proposed method guarantees $\leq 1.1\%$ loss of accuracy under resistance variations for MLP and CNN, and $\leq 1\%$ loss of accuracy even when the stuck-at-fault (SAF) rate is $20\%$.

    Introduction

Neural networks have solved many complex problems, and NN applications are both compute- and memory-intensive. FPGAs and ASICs are used to accelerate these applications, but FPGAs carry a high design cost and ASICs serve only a narrow application range. A high energy-efficiency accelerator with low hardware cost and small design cost is therefore required and highly desirable for future data-intensive NN computing.

RRAM-crossbar accelerators have several desirable characteristics:

- Small footprint
- Non-volatility
- Low power consumption
- Natural matrix-vector multiplication

This paper focuses on two categories of RRAM-crossbar imperfections:

- Resistance variations caused by fabrication imperfections.
- SAFs.

[16] does not cover resistance variations. [13] trains CNNs purely on RRAM-crossbars, which decreases the RRAM lifetime. Pure on-device training methods are therefore not well suited for tolerating process imperfections.

[17] and [18] train the model off-device, which lets the errors accumulate and be magnified across layers. A dedicated variation-tolerance mechanism and training method targeting MLP and CNN is therefore needed.

This work makes four contributions:

1. For the process imperfections of SLP, we propose a bipartite-matching method to prevent large-weight synapses from being mapped to imperfect memristor cells, and an off-device training algorithm that leverages the inherent self-healing capability of NNs.
2. For the resistance variations of MLP, we propose a dynamic adjustment mechanism that extends the off-device training algorithm to alleviate the accumulation of errors across multiple layers.
3. For the resistance variations of CNN, we propose a software/hardware co-design and a threshold co-training algorithm.
4. For SAFs, we propose a self-compensating mechanism.

    Background and Motivation

    Mapping CNN to RRAM-crossbars

The pooling layer is typically implemented by CMOS circuits. The input data (digital signals) are converted into a vector of analog signals by DACs and fed into the RRAM-crossbars. Multiple filters are stored in the RRAM-crossbars, and the fully-connected layers are also implemented by RRAM-crossbars.
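
To make the mapping concrete, here is a minimal sketch (not taken from the paper) of the standard im2col-style view of why a convolution layer fits a crossbar's matrix-vector multiplication: each filter is unrolled into one crossbar column and each input patch becomes the analog input vector. All shapes and names are illustrative.

```python
import numpy as np

def conv_as_matvec(inputs, filters):
    """Illustrative im2col-style mapping: convolution as matrix-vector products.

    inputs:  (H, W, C) feature map
    filters: (K, K, C, F) filter bank; each filter becomes one crossbar column.
    Returns an (H-K+1, W-K+1, F) output (stride 1, no padding).
    """
    H, W, C = inputs.shape
    K, _, _, F = filters.shape
    # Each crossbar column stores one unrolled filter (K*K*C conductances).
    crossbar = filters.reshape(K * K * C, F)
    out = np.zeros((H - K + 1, W - K + 1, F))
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            patch = inputs[i:i + K, j:j + K, :].reshape(-1)  # DAC input vector
            out[i, j, :] = patch @ crossbar                   # analog MVM, then ADC
    return out
```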

    Weight Mapping Scheme

We assume that one weight is mapped to one memristor cell, that the I-V characteristic of the devices is linear, and that the cells are otherwise in an ideal state.
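
Under those assumptions, a weight matrix can be mapped onto the crossbar by a simple linear rescaling into the device's conductance range. The sketch below is illustrative only; the conductance bounds G_MIN/G_MAX and the mapping function are assumptions, not values from the paper.

```python
import numpy as np

# Illustrative conductance range of a memristor cell (in Siemens); not from the paper.
G_MIN, G_MAX = 1e-6, 1e-4

def weights_to_conductance(w):
    """Linearly map weights in [w.min(), w.max()] onto [G_MIN, G_MAX],
    assuming one weight per cell and a linear I-V characteristic."""
    w = np.asarray(w, dtype=float)
    w_min, w_max = w.min(), w.max()
    scale = (G_MAX - G_MIN) / max(w_max - w_min, 1e-12)
    return G_MIN + (w - w_min) * scale
```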

    Sources of Variations and Faults

This paper focuses on resistance variations caused by fabrication, and then introduces SAFs [15]. The March-C algorithm [26]-[28] and the squeeze-search algorithm [15] can test memristors for SAFs.

    Variation and Fault Tolerance Methods

This part reviews the drawbacks of Vortex [17] and the AFT algorithm we proposed in previous research [18]; however, the AFT algorithm is not applicable to deep neural networks.

Motivation

- The common thread of the existing solutions for further improving accuracy is to exploit the self-healing capability of NNs: the network is trained with the errors present. This is an old idea.
- The errors in an MLP are magnified and accumulated during forward propagation, so an MLP is harder to train than an SLP; we therefore propose the dynamic adjustment mechanism.
- To achieve high accuracy without sacrificing the robustness of RRAM-crossbars, we propose the software and hardware co-design method to overcome the resistance variations in CNNs.
- To overcome the SAFs in DNNs, we propose a self-compensating mechanism.

Fault Tolerance Method for SLP and MLP

Bipartite-matching Method

$$
SWV_{pq} = \begin{cases}
\sum_{j=1}^{n} |w_{pj} - t_{qj}|, & \text{for resistance variation}\\
\sum_{j=1}^{n} |w_{pj} - w_{max}|, & \text{for SA0 fault}\\
\sum_{j=1}^{n} |w_{pj} - w_{min}|, & \text{for SA1 fault}
\end{cases}
$$

$SWV$ denotes the summed weighted deviation. For an SA0 fault the resistance is zero, so the conductance corresponds to $w_{max}$. Under resistance variation, a weight changes as $t_{qj} \leftarrow w_{pj}\cdot e^{\theta_{qj}}$ with $\theta \sim N(0,\sigma^2)$ [25].

We use a greedy mapping algorithm and bipartite matching to find the mapping that yields the smallest SWV. However, this method cannot help when a weight matrix has a whole column of large weights, since some large weights must then land on imperfect cells anyway.
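
A minimal sketch of how the SWV matrix and the minimal-weight matching could be computed, using scipy's Hungarian solver in place of a hand-written KM implementation. The variation model follows the $t_{qj} = w_{pj} e^{\theta_{qj}}$ formula above; the matrix sizes and $\sigma$ are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def swv_matrix(W, theta):
    """SWV[p, q]: summed weighted deviation if weight row p is mapped to crossbar row q.
    W: (n, n) weight matrix; theta: (n, n) per-cell log-normal variation exponents."""
    n = W.shape[0]
    swv = np.zeros((n, n))
    for p in range(n):
        for q in range(n):
            t = W[p] * np.exp(theta[q])           # row p's weights perturbed by row q's variations
            swv[p, q] = np.abs(W[p] - t).sum()    # resistance-variation case of the SWV formula
    return swv

# Illustrative sizes and variation strength.
rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (8, 8))
theta = rng.normal(0, 0.2, (8, 8))                # theta ~ N(0, sigma^2), sigma = 0.2 assumed

rows, cols = linear_sum_assignment(swv_matrix(W, theta))  # minimal-weight perfect matching
print("weight row -> crossbar row:", dict(zip(rows.tolist(), cols.tolist())))
```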

    An off-device NN Training Algorithm

If an error occurs at a neuron, the weights of the input neurons connected to it are adjusted to compensate for the error.

(Figure: weight compensation, …/image/weights_compensat.png)

The weights are reduced using $W_{ij} \leftarrow W_{ij} \cdot t^{-1}$, with $t = 2$. For MLP there is no principled way to choose $t$: the dynamic adjustment mechanism simply tries several empirical values of $t$ and keeps the one that increases the accuracy.
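
A minimal sketch of that dynamic adjustment idea: scale the weights by $1/t$ for a few candidate values of $t$ and keep the value that gives the best validation accuracy. The `evaluate` callback and the candidate list are placeholders, not from the paper.

```python
import numpy as np

def dynamic_adjust(W, evaluate, candidates=(1.0, 2.0, 4.0)):
    """Try W / t for each empirical candidate t and keep the scaling that
    maximizes the accuracy returned by `evaluate` (a user-supplied function
    mapping a weight matrix to validation accuracy)."""
    best_t, best_acc = None, -np.inf
    for t in candidates:
        acc = evaluate(W / t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, W / best_t
```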

    Software and Hardware Co-design for Convolutional Neural Networks

The methods designed for SLP and MLP do not work well for CNN.

    Bit-wise Redundant Mechanism

We propose a bit-wise redundant mechanism, in which a pair of memristor cells (shown as a black dashed box in the paper's figure) represents one weight. Within such a pair, a weight that should be zero drifts away from zero according to the resistance variation, so the resistance variations must be detected before the weights are mapped.
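
As one way to picture the "detect before mapping" step (this is an assumption-laden sketch, not the paper's test flow): program a reference conductance into each cell, read it back, and recover the per-cell variation exponent. `write_fn`/`read_fn` are hypothetical device-access callbacks.

```python
import numpy as np

def detect_variation(write_fn, read_fn, n_cells, g_ref=1e-5):
    """Illustrative pre-mapping variation test: write a reference conductance
    to every cell, read it back, and recover theta = ln(g_read / g_ref)."""
    theta = np.zeros(n_cells)
    for i in range(n_cells):
        write_fn(i, g_ref)
        theta[i] = np.log(read_fn(i) / g_ref)
    return theta

# Usage with a simulated crossbar row: each cell multiplies its target by e^theta.
rng = np.random.default_rng(1)
true_theta = rng.normal(0, 0.2, 16)
cells = np.zeros(16)
theta_hat = detect_variation(
    write_fn=lambda i, g: cells.__setitem__(i, g * np.exp(true_theta[i])),
    read_fn=lambda i: cells[i],
    n_cells=16,
)
assert np.allclose(theta_hat, true_theta)
```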

    Threshold Co-training Algorithm

We first train the CNN off-device; once the accuracy reaches $\theta$, we switch to on-device training [13] until the accuracy reaches $\gamma$.
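
A minimal skeleton of that two-phase flow: off-device training until validation accuracy reaches $\theta$, then on-device training until it reaches $\gamma$. The step and evaluation callbacks, thresholds, and epoch cap are placeholders, not the paper's implementation.

```python
def threshold_co_training(off_device_step, on_device_step, evaluate,
                          theta=0.90, gamma=0.95, max_epochs=200):
    """Off-device training until accuracy >= theta, then on-device training
    (in the loop with the real crossbar) until accuracy >= gamma."""
    acc, epoch = 0.0, 0
    while acc < theta and epoch < max_epochs:       # phase 1: software-only training
        off_device_step()
        acc, epoch = evaluate(), epoch + 1
    while acc < gamma and epoch < max_epochs:       # phase 2: fine-tune on the device
        on_device_step()
        acc, epoch = evaluate(), epoch + 1
    return acc
```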

    Self-compensating Mechanism for SAFs

The probability that a weight suffers only a single SA0 or SA1 fault is higher than that of the other cases, so we mainly focus on this first case.

- SA1: we enforce $F=0$ and $W=0$ so that the weight stays fixed at zero after back-propagation; a zero weight does not affect the accuracy.
- SA0: since almost 90% of the NN weights lie in $[-0.015, 0.015]$, we can change the neighboring cell in the pair so that the effective weight becomes zero.
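
A hedged sketch of how those two rules could be applied, assuming the pair encodes a weight differentially as $w = cell_a - cell_b$ (the paper's exact bit-wise redundant encoding may differ). The fault codes and `w_max` are illustrative.

```python
import numpy as np

# Illustrative fault codes per cell.
OK, SA0, SA1 = 0, 1, 2

def self_compensate(weights, faults_a, faults_b, w_max=1.0):
    """Sketch of the self-compensating idea on a differential cell pair.

    SA1 (stuck at minimal conductance): enforce F = 0, W = 0 and freeze the
    weight so back-propagation cannot move it.
    SA0 (stuck at maximal conductance, the cell reads w_max): program the
    neighboring cell of the pair to w_max as well, so the pair cancels to ~0.
    """
    w = weights.copy()
    cell_a = np.where(w > 0, w, 0.0)             # positive part of the weight
    cell_b = np.where(w < 0, -w, 0.0)            # negative part of the weight
    mask = np.ones_like(w)                       # 1 = trainable, 0 = frozen

    sa1 = (faults_a == SA1) | (faults_b == SA1)
    cell_a[sa1], cell_b[sa1], w[sa1] = 0.0, 0.0, 0.0
    mask[sa1] = 0.0                              # fixed after back-propagation

    sa0 = (faults_a == SA0) | (faults_b == SA0)
    cell_a[sa0], cell_b[sa0], w[sa0] = w_max, w_max, 0.0
    return w, cell_a, cell_b, mask
```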

    Conclusion

The proposed method guarantees $\leq 1.1\%$ loss of accuracy under resistance variations for MLP and CNN. Moreover, it guarantees $\leq 1\%$ loss of accuracy even when the stuck-at-fault (SAF) rate is $20\%$.

In order to improve my writing skills, I will write some of my blogs in English. Please let me know if there are any mistakes (details, words, sentences, grammar, etc.).
