Paper: ITT-RNA: Imperfection Tolerable Training for RRAM-Crossbar based Deep Neural-network Accelerator
Authors: Zhuoran Song; Xiaoyao Liang; Li Jiang
Affiliation: Shanghai Jiao Tong University
Published in: TCAD
Neural networks (NNs) have solved many complex problems, and NN applications are both compute- and memory-intensive. FPGAs and ASICs are therefore used to accelerate these applications, but FPGAs suffer from high design cost and ASICs from a narrow application range. A high-energy-efficiency accelerator with low hardware cost and small design cost is thus highly desirable for future data-intensive NN computing.
RRAM-crossbar accelerators have several desirable characteristics:

- Small footprint
- Non-volatility
- Low power consumption
- Natural matrix-vector multiplication

This paper focuses on two categories of RRAM-crossbar imperfections:
- Resistance variations caused by fabrication imperfections.
- Stuck-at-faults (SAFs).

[16] does not cover the resistance variations. [13] trains the CNN purely on RRAM-crossbars, which decreases the RRAM lifetime; pure on-device training methods are therefore not well suited to tolerating process imperfections.
[17] and [18] train the model off-device, which causes the errors to accumulate and be magnified. Dedicated variation-tolerance mechanisms and training methods targeting the MLP and CNN are therefore needed.
In this work, the four contributions are:
- For the process imperfections of the SLP, we propose a bipartite-matching method that prevents large-weight synapses from being mapped to imperfect memristor cells, and an off-device training algorithm that leverages the inherent self-healing capability of NNs.
- For the resistance variations of the MLP, we propose a dynamic adjustment mechanism that extends the off-device training algorithm to alleviate the accumulation of errors across multiple layers.
- For the resistance variations of the CNN, we propose a software/hardware co-design and a threshold co-training algorithm.
- For SAFs, we propose a self-compensating mechanism.

The pooling layer is typically implemented by CMOS circuits. The input data (digital signals) are converted to a vector of analog signals by the DACs and fed into the RRAM-crossbars. Multiple filters are stored in the RRAM-crossbars, and the fully-connected layer is also implemented by RRAM-crossbars.
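A minimal sketch of the crossbar's natural matrix-vector multiplication under an idealized linear model (DAC/ADC conversion and any signed-weight encoding are omitted; the conductance and voltage ranges below are illustrative assumptions, not values from the paper):

```python
# Idealized model of the analog matrix-vector multiplication a crossbar performs:
# the current collected on each bit-line is the dot product of the input voltage
# vector (applied on the word-lines) and that column's conductances.
import numpy as np

def crossbar_mvm(conductance, voltages):
    """conductance: (rows, cols) in siemens; voltages: (rows,) in volts."""
    return voltages @ conductance        # (cols,) bit-line currents, I = V^T G

rng = np.random.default_rng(0)
G = rng.uniform(1e-6, 1e-4, size=(128, 64))   # toy conductance matrix
v = rng.uniform(0.0, 0.2, size=128)           # toy input voltages from the DACs
print(crossbar_mvm(G, v).shape)               # (64,)
```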
Weight Mapping Scheme
We assume one weight is mapped to one memristor cell; hence, the I-V characteristic of the devices is assumed to be linear in this paper. The cells are also assumed to be in the ideal state.
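As a rough illustration of the one-weight-per-cell mapping under the linear I-V assumption, the sketch below linearly scales weight magnitudes into a hypothetical conductance range (the bounds G_MIN/G_MAX and the handling of signed weights are assumptions, not the paper's exact scheme):

```python
# Linear weight-to-conductance mapping sketch (one weight per cell).
# Signed weights would additionally need a pair-based encoding such as
# the one discussed later.
import numpy as np

G_MIN, G_MAX = 1e-6, 1e-4    # assumed conductance range in siemens

def weight_to_conductance(w, w_max):
    """Linearly scale |w| in [0, w_max] to a conductance in [G_MIN, G_MAX]."""
    w = np.clip(np.abs(w), 0.0, w_max)
    return G_MIN + (w / w_max) * (G_MAX - G_MIN)

W = np.array([[0.02, -0.5], [0.3, 0.0]])
print(weight_to_conductance(W, w_max=1.0))
```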
Sources of Variations and Faults
This paper focuses on solving the resistance variation caused by fabrication, and then introduces SAFs [15]. The March-C algorithm [26]-[28] and the squeeze-search algorithm [15] are capable of testing the memristors for SAFs.
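For simulation purposes, the two imperfection types can be injected into a conductance matrix roughly as below; the lognormal variation model, the 10% fault rate, and the convention "SA1 = stuck at high conductance, SA0 = stuck at low conductance" are illustrative assumptions rather than the paper's exact fault model:

```python
# Inject resistance variation and SAFs into a conductance matrix for simulation.
import numpy as np

def inject_imperfections(G, sigma=0.1, saf_rate=0.10, g_sa0=1e-6, g_sa1=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    G = G * rng.lognormal(mean=0.0, sigma=sigma, size=G.shape)  # resistance variation
    saf = rng.random(G.shape) < saf_rate                        # cells with a SAF
    stuck_high = rng.random(G.shape) < 0.5
    G[saf & stuck_high] = g_sa1                                 # SA1 cells
    G[saf & ~stuck_high] = g_sa0                                # SA0 cells
    return G
```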
Variation and Fault Tolerance Methods
The paper reviews the drawbacks of Vortex [17]. We proposed the AFT algorithm in previous research [18], but AFT is not applicable to deep neural networks.
Motivation
All existing solutions for further improving the accuracy explore the self-healing capability of NNs: the NN is trained with the errors included, which is an old idea. The errors in an MLP are magnified and accumulated during forward propagation, so an MLP is harder to train than an SLP; we therefore propose the dynamic adjustment mechanism. To achieve high accuracy without sacrificing the robustness of RRAM-crossbars, we propose a software/hardware co-design method to overcome the resistance variations in CNNs. To overcome the SAFs in DNNs, we propose a self-compensating mechanism.

We use a greedy mapping algorithm and bipartite matching to find the mapping that yields the smallest SWV (see the sketch below). However, this method does not help when a weight matrix has a column full of large weights.
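A minimal sketch of the bipartite-matching step only (the greedy part is omitted), assuming the cost of assigning a logical weight column to a physical crossbar column is the sum of |weight| times the measured per-cell variation; the paper's exact SWV cost may differ:

```python
# Bipartite matching of logical weight columns to physical crossbar columns via
# the Hungarian algorithm.  The cost model stands in for the paper's SWV metric.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_columns(weights, variation):
    """weights, variation: (rows, cols) arrays; returns physical column per logical column."""
    # cost[i, j] = penalty of placing logical column i on physical column j
    cost = np.abs(weights).T @ variation     # (cols, cols)
    _, col_ind = linear_sum_assignment(cost)
    return col_ind

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(64, 64))               # toy weight matrix
var = np.abs(rng.normal(0.0, 0.05, size=(64, 64)))    # per-cell variation estimate
print(match_columns(W, var)[:8])
```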
An Off-device NN Training Algorithm
If an error occurs on a neuron, the input neurons correlated with that neuron are adjusted to compensate for it.
[Figure: weight compensation (weights_compensat.png) — image not available]
The weights are reduced using $W_{ij} \leftarrow W_{ij} \cdot t^{-1}$, with $t = 2$. For the MLP, a fixed t chosen without justification does not work well; dynamic adjustment means that we try several values of t and keep one only if the accuracy is increased.
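A minimal sketch of the dynamic adjustment idea: candidate values of t are tried, the layer's weights are rescaled by 1/t, and a value is kept only if a hypothetical `evaluate` callback (weights → accuracy) reports an improvement:

```python
# Dynamic adjustment sketch: try several t values and keep a rescaled layer
# only when the (variation-aware) validation accuracy improves.
import copy

def dynamic_adjust(weights, layer, evaluate, candidates=(1.5, 2.0, 3.0, 4.0)):
    """weights: dict of layer name -> array; evaluate: weights -> accuracy."""
    best_t, best_acc, best_w = None, evaluate(weights), weights
    for t in candidates:
        trial = copy.deepcopy(weights)
        trial[layer] = trial[layer] / t        # W_ij <- W_ij * t^-1
        acc = evaluate(trial)
        if acc > best_acc:                     # accept t only if accuracy increases
            best_t, best_acc, best_w = t, acc, trial
    return best_t, best_w
```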
The methods for the SLP and MLP do not work well for CNNs.
Bit-wise Redundant Mechanism
We propose the bit-wise redundant mechanism, in which a pair of memristor cells (drawn in a black dashed box in the paper's figure) denotes one weight. In such a pair, a zero weight will change its value according to the resistance variation, so the resistance variations should be detected before mapping the weights.
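One common way to realize a two-cells-per-weight representation is a differential pair with w ∝ g_a − g_b; whether the paper's bit-wise redundant mechanism uses exactly this encoding is an assumption here, but it matches the later description of cancelling a faulty cell with its neighbor:

```python
# Differential cell-pair encoding sketch: each signed weight is stored as the
# difference of two conductances, w ~ g_a - g_b.  G_MIN/G_MAX are hypothetical
# device bounds.
G_MIN, G_MAX = 0.0, 1.0   # arbitrary units

def encode_pair(w, w_max):
    """Split a signed weight into a (g_a, g_b) cell pair."""
    g = abs(w) / w_max * (G_MAX - G_MIN)
    return (G_MIN + g, G_MIN) if w >= 0 else (G_MIN, G_MIN + g)

def decode_pair(g_a, g_b, w_max):
    return (g_a - g_b) / (G_MAX - G_MIN) * w_max

g_a, g_b = encode_pair(-0.3, w_max=1.0)
print(decode_pair(g_a, g_b, w_max=1.0))   # ~ -0.3
```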
Threshold Co-training Algorithm
We train the CNN off-device until the accuracy reaches $\theta$; then we use on-device training [13] until the accuracy reaches $\gamma$.
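A minimal sketch of the two-phase flow, assuming hypothetical helpers `train_off_device_epoch`, `train_on_device_epoch`, and `evaluate`; θ and γ are the paper's accuracy thresholds, and the concrete default values below are placeholders:

```python
# Two-phase threshold co-training sketch.
def threshold_co_train(model, train_off_device_epoch, train_on_device_epoch,
                       evaluate, theta=0.95, gamma=0.98, max_epochs=100):
    # Phase 1: off-device (software) training until accuracy reaches theta.
    for _ in range(max_epochs):
        train_off_device_epoch(model)
        if evaluate(model) >= theta:
            break
    # Phase 2: on-device training [13] on the crossbar until accuracy reaches
    # gamma, which bounds the number of costly device write cycles.
    for _ in range(max_epochs):
        train_on_device_epoch(model)
        if evaluate(model) >= gamma:
            break
    return model
```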
The probability of having only a SA0 or only a SA1 fault is higher than that of the other cases, so we mainly focus on this first case.
- SA1: We enforce $F = 0$ and $W = 0$ so that the weight remains fixed after back-propagation; since the weight is zero, it does not affect the accuracy.
- SA0: Since almost 90% of the NN weights lie in $[-0.015, 0.015]$, we can change the neighboring cell in the pair to force the weight to zero.

The proposed method can guarantee $\leq 1.1\%$ loss of accuracy under resistance variations in the MLP and CNN. Moreover, it can guarantee $\leq 1\%$ loss of accuracy even when the stuck-at-fault (SAF) rate is $20\%$.
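A minimal sketch of the self-compensating idea on top of the assumed cell-pair encoding: SA1 cells are neutralized by matching the neighboring cell and freezing the weight's gradient, and SA0 cells are handled by reprogramming the neighbor so the effective weight is zero (the paper's exact flag handling may differ):

```python
# Self-compensation sketch over the assumed cell-pair encoding (w ~ g_a - g_b).
# saf codes: 0 = healthy, 1 = SA1, 2 = SA0 (convention assumed here).
import numpy as np

def self_compensate(g_a, g_b, saf_a, grad_mask):
    """g_a, g_b: pair conductances; grad_mask multiplies the weight gradients."""
    sa1 = saf_a == 1
    g_b[sa1] = g_a[sa1]        # SA1: match the neighbor so the weight reads zero ...
    grad_mask[sa1] = 0.0       # ... and freeze it during back-propagation (F = 0)

    sa0 = saf_a == 2
    g_b[sa0] = g_a[sa0]        # SA0: reprogram the neighbor so the near-zero weight is exactly zero
    return g_a, g_b, grad_mask
```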
To improve my writing skills, I will write some of my blogs in English. Please let me know if there are any mistakes (details, word choice, sentences, grammar, etc.).