Hefei University of Technology: Computer Architecture Lab Report (2020 Edition)

    Technology  2024-11-08

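      There is a lot of content here.

      I was short on time when plotting with MATLAB (racing the deadline), so the English axis labels and legends may contain grammar mistakes or poor word choices, but I did not want to rerun all the programs (there are a lot of figures). The MATLAB plotting code is attached at the end of the report, so anyone who needs it can adjust it.

      The class of 2017 should all have handed in their lab reports by now. If anyone copies this poorly written report, we may both end up with a zero, which is scary... (I hope my fellow 2017 students already submitted yesterday [2020-07-03].)

      I also ran into many pitfalls while installing SimpleScalar and will find time to write a short tutorial. When I searched online earlier I found nothing helpful and had to figure it out myself.

      If you find any mistakes or have questions, please leave a comment.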

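    Contents

    Lab 1: Pipeline Hazards and Instruction Scheduling
      1 Lab Tasks
        1.1 Pipeline Hazards
        1.2 Instruction Scheduling
      2 Method
      3 Results and Analysis
        3.1 Pipeline Hazards
        3.2 Instruction Scheduling and Loop Unrolling
      4 Key Program Code
      5 Reflections
    Lab 2: Branch Prediction
      1 Lab Tasks
      2 Method
      3 Results and Analysis
      4 Key Program Code
      5 Reflections
    Lab 3: Cache Performance Analysis
      1 Lab Tasks
      2 Method
      3 Results and Analysis
      4 Key Program Code
      5 Reflections
    References
    MATLAB Plotting Code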

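    Lab 1: Pipeline Hazards and Instruction Scheduling

    1 Lab Tasks

    1.1 Pipeline Hazards

      a. Use the WinMIPS64 simulator to run one of the following three programs (pick any one):
           the factorial program factorial.s
           the insertion sort program isort.s
           the multiplication program mult.s
         Run the program in single-step mode, in continuous mode, and with breakpoints set. Observe how it moves through the pipeline and inspect the contents of the CPU registers and memory. Become familiar with operating WinMIPS64.

      b. Write a MIPS64 assembly file *.s that contains structural hazards. Run it in WinMIPS64 and, from the simulation:

           identify the instruction pairs with structural hazards and the hardware unit involved; record the stall cycles caused by structural hazards and compute their share of the total execution cycles;

           discuss the effect of structural hazards on CPU performance and ways of resolving them.

      c. Write a MIPS64 assembly file *.s that contains data hazards. Without forwarding, run it in the WinMIPS64/WinDLX simulator. Record the stall cycles caused by data hazards and the total cycle count, and compute the stall cycles' share of the total.

      d. With forwarding enabled, run the program again in WinMIPS64. Repeat the measurements from step c and compute the speedup obtained with forwarding.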

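    1.2 Instruction Scheduling

      a. Use instruction scheduling to resolve structural and data hazards in the pipeline

         i. Write a MIPS64 assembly file *.s that contains both data hazards and structural hazards (you may set the latency of each functional unit yourself);
         ii. Run the program in WinMIPS64. Record how many times each kind of hazard occurs, which instruction combinations cause it, and the total number of clock cycles;
         iii. Schedule the instructions by hand to eliminate the hazards;
         iv. Run the scheduled program in WinMIPS64, observe how it moves through the pipeline, and record the total number of clock cycles;
         v. Compare performance before and after scheduling, and discuss why instruction scheduling matters for CPU performance.

      b. Improve performance with loop unrolling, register renaming, and instruction scheduling

         i. Write a MIPS64 assembly file *.s containing a simple loop whose iteration count is a multiple of 4;
         ii. Run the program in WinMIPS64. Record how many times each kind of hazard occurs and the total number of clock cycles;
         iii. Unroll the loop 3 times, so that a body made of 4 copies of the original loop body replaces the original loop. Then apply register renaming and instruction scheduling to the new loop body;
         iv. Run the modified program in WinMIPS64 and record how many times each kind of hazard occurs and the total number of clock cycles;
         v. Compare performance before and after loop unrolling and instruction scheduling.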
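      To make the transformation in step b.iii concrete, here is a minimal Python sketch. It is not MIPS64 and not the code used in this lab; it assumes a hypothetical loop body `c[i] = a[i] * b[i]`, and the names `rolled`, `unrolled_by_4`, and `t0`..`t3` are invented for the illustration. Unrolling 3 times leaves 4 copies of the body per iteration, and giving each copy its own temporary plays the role of register renaming, so the four products no longer depend on one another and can be reordered freely.

```python
def rolled(a, b):
    """Original loop: one element per iteration, one shared temporary."""
    c = [0.0] * len(a)
    for i in range(len(a)):
        t = a[i] * b[i]
        c[i] = t
    return c


def unrolled_by_4(a, b):
    """Loop unrolled 3 times: 4 copies of the body per iteration."""
    n = len(a)
    assert n % 4 == 0, "this sketch assumes a trip count that is a multiple of 4"
    c = [0.0] * n
    for i in range(0, n, 4):
        # "Renamed" temporaries: each copy of the body gets its own name.
        t0 = a[i] * b[i]
        t1 = a[i + 1] * b[i + 1]
        t2 = a[i + 2] * b[i + 2]
        t3 = a[i + 3] * b[i + 3]
        # "Scheduled": the stores are grouped after the independent multiplies.
        c[i] = t0
        c[i + 1] = t1
        c[i + 2] = t2
        c[i + 3] = t3
    return c


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
# Both versions must compute the same result.
print(rolled(a, b) == unrolled_by_4(a, b))
```

      The unrolled version runs the loop-control logic once per 4 elements instead of once per element, which is where the saving in branch and counter instructions comes from.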

    2 Method

      The MIPS assembly code is run in the WinMIPS64 simulator.

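    3 Results and Analysis

    3.1 Pipeline Hazards

    Running a program

    Figure 1. Architecture 1

      Insertion sort program isort.s
      Single-step:

    Figure 2. single cycle
    Figure 3. multiple cycle

      Continuous:

    Figure 4. Continuous run

    With breakpoints:

    Figure 5. break point_single cycle_1
    Figure 6. break point_single cycle_2
    Figure 7. break point_single cycle_3
    Figure 8. break point_single cycle_4
    Figure 9. break point_single cycle_5
    Figure 10. break point_single cycle_6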

    Structural hazards

      data_and_structure_stalls_1.s
      Execution mode: single cycle

    Figure 11. Architecture 2
    Figure 12. structural stall in EX (cycle 20)

      Instruction pair:

        ADD.D F3,F3,F1          #norm+=coeff[0];
        jal coeff_one           ; call coeff_one
        daddi r2,r0,8           ; r2 = 8
        L.D F1,coeff(r2)        # F1=coeff[1]

      Unit involved:

    EX

    Figure 13. structural stall in EX (cycle 28)

      Instruction pair:

        ADD.D F3,F3,F1          #norm+=coeff[1];
        jal coeff_two           ; call coeff_two
        daddi r2,r0,16          ; r2 = 16
        L.D F1,coeff(r2)        # F1=coeff[2]

      Unit involved:

    EX

    Figure 14. structural stall in EX (cycle 37)

      Instruction pair:

        ADD.D F3,F3,F1          #norm+=coeff[2];
        jal go_result           ; call go_result
        go_result: halt
        out: halt

      Unit involved:

    EX

      Total cycles: 41
      Structural-hazard stall cycles: 3
      Share: 3 / 41 ≈ 7.32%

      Ways to resolve structural hazards:

      One way to avoid structural hazards is to duplicate resources, for example by giving the pipelined machine separate instruction and data memories, or by splitting the cache into an instruction cache and a data cache.

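    Data hazards

      The architecture is the same as Architecture 2.

      data_stalls_3.s
      Execution mode: single cycle

      Without forwarding:

    Figure 15. RAW stall in ID (R10, cycle 47)

      Total cycles: 53
      Instructions: 26
      CPI: 2.038
      Data-hazard stall cycles: 18
      Share: 18 / 53 ≈ 33.96%

      With forwarding:
      Total cycles: 42
      Instructions: 26
      CPI: 1.615
      Data-hazard stall cycles: 7
      Share: 7 / 42 ≈ 16.67%

      Forwarding cuts the data-hazard stalls from 18 cycles to 7 and shortens execution from 53 cycles to 42, so overall performance is 53 / 42 ≈ 1.26 times the original.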
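      The figures above follow from three simple ratios. The short Python sketch below recomputes them from the cycle and instruction counts reported in this section; the function names `cpi`, `stall_share`, and `speedup` are chosen for this example only.

```python
def cpi(cycles, instructions):
    """Cycles per instruction."""
    return cycles / instructions


def stall_share(stall_cycles, total_cycles):
    """Fraction of all cycles spent stalled."""
    return stall_cycles / total_cycles


def speedup(cycles_before, cycles_after):
    """How many times faster the second run is (same program, same clock)."""
    return cycles_before / cycles_after


# Counts reported above for data_stalls_3.s
no_fwd = {"cycles": 53, "instructions": 26, "raw_stalls": 18}
fwd = {"cycles": 42, "instructions": 26, "raw_stalls": 7}

for name, run in (("without forwarding", no_fwd), ("with forwarding", fwd)):
    print(
        f"{name}: CPI = {cpi(run['cycles'], run['instructions']):.3f}, "
        f"stall share = {stall_share(run['raw_stalls'], run['cycles']):.2%}"
    )
print(f"speedup = {speedup(no_fwd['cycles'], fwd['cycles']):.2f}")
```

      Because both runs execute the same 26 instructions, the speedup is simply the ratio of the two cycle counts.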

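    3.2 Instruction Scheduling and Loop Unrolling

      Architecture:

    Figure 16. Architecture 3

      Initial code: lab3_0_init.s
      Forwarding enabled

    Figure 17. Running the initial code

      Total cycles: 163
      Instructions: 85
      CPI: 1.918
      Data hazards: 121 (121 RAW stalls)
      Structural hazards: 12
      Branch hazards: 7 (Branch Taken Stalls)

      Branch-optimised code: lab3_1_branch_optimised.s
      Forwarding enabled

    Figure 18. Running the branch-optimised code

      Total cycles: 153
      Instructions: 78
      CPI: 1.962
      Data hazards: 120 (120 RAW stalls)
      Structural hazards: 12
      Branch hazards: 5 (5 Branch Taken Stalls)

      Branch optimisation shortens execution from 163 cycles to 153, so performance is 163 / 153 ≈ 1.07 times that of the initial code.

      Scheduled code: lab3_2_scheduled.s
      Forwarding enabled

    Figure 19. Running the scheduled code

      Total cycles: 135
      Instructions: 78
      CPI: 1.731
      Data hazards: 78 (78 RAW stalls)
      Structural hazards: 6
      Branch hazards: 5 (Branch Taken Stalls)

      Instruction scheduling shortens execution to 135 cycles, so performance is 163 / 135 ≈ 1.21 times that of the initial code.

      Unrolled code: lab3_3_unrolled.s
      Forwarding enabled

    Figure 20. Running the unrolled code

      Total cycles: 141
      Instructions: 72
      CPI: 1.958
      Data hazards: 117 (117 RAW stalls)
      Structural hazards: 12
      Branch hazards: 2 (Branch Taken Stalls)

      Loop unrolling shortens execution to 141 cycles, so performance is 163 / 141 ≈ 1.16 times that of the initial code.

      Unrolled and scheduled code: lab3_4_unrolled_and_scheduled.s
      Forwarding enabled

    Figure 21. Running the unrolled and scheduled code

      Total cycles: 108
      Instructions: 72
      CPI: 1.5
      Data hazards: 36 (36 RAW stalls)
      Structural hazards: 18
      Branch hazards: 2 (Branch Taken Stalls)

      Combining loop unrolling with instruction scheduling shortens execution to 108 cycles, so performance is 163 / 108 ≈ 1.51 times that of the initial code.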
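      For a side-by-side view, the sketch below tabulates CPI and speedup for all five variants from the cycle and instruction counts reported above. The helper name `summarize` is made up for this example; every speedup is measured against lab3_0_init.s.

```python
# (total cycles, instruction count) as reported in section 3.2
runs = {
    "lab3_0_init": (163, 85),
    "lab3_1_branch_optimised": (153, 78),
    "lab3_2_scheduled": (135, 78),
    "lab3_3_unrolled": (141, 72),
    "lab3_4_unrolled_and_scheduled": (108, 72),
}


def summarize(runs, baseline):
    """Return {name: (CPI, speedup over baseline)}, rounded as in the report."""
    base_cycles = runs[baseline][0]
    return {
        name: (round(cycles / insts, 3), round(base_cycles / cycles, 2))
        for name, (cycles, insts) in runs.items()
    }


summary = summarize(runs, "lab3_0_init")
for name, (cpi, speedup) in summary.items():
    print(f"{name:32s} CPI = {cpi:<6} speedup = {speedup}")
```

      The table makes one detail easy to see: unrolling alone lowers the instruction count (72 instead of 78) but leaves the CPI almost unchanged, and it is the added scheduling that brings the CPI down to 1.5.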

    4 Key Program Code

      (1) Running a program

    #
    # Insertion sort algorithm
    # See http://www.cs.ubc.ca/spider/harrison/Java/InsertionSortAlgorithm.java.html
    # Note use of MIPS register pseudo-names, and # for comments
    #
            .data
    array:  .word 0x4F6961869342DC99,0x7A0B67101C85D9EE,0x5EF87A2B37CA911D,0x47EF58E8B7E01DD9
            .word 0x79A74EAB20CB53C9,0x6D26753D06F8E483,0x70F313AF126C0B47,0x745232A4035F1EF5
            .word 0x46036BDDE8D095FD,0x4DE3F1D89B5A43EA,0x5279659D102EABBA,0x4496CDA949E29089
            .word 0x6D594E2009B7D04A,0x4CE57C0D55905DE5,0x4115A0AC78A1848B,0x5051DAA648B3BDA6
            .word 0x71C3730CE11593C0,0x425A9FAE68370FC5,0x6B265F8485354426,0x4E935A849C713D01
            .word 0x773110588E5170D7,0x5B133F183803A780,0x49A52D37525C362C,0x4A0C150C49D8A123
            .word 0x7962EC77A41FB066,0x5D3A087AF3417D04,0x7076F96031DC3B2E,0x404EC3D105D02FDD
            .word 0x5484F578189A7A8B,0x65EA86F819037E03,0x4367E6F2AE35B27A,0x63C1CF869394DB43
            .word 0x59421109269E583C,0x6B9F1B529C8598EF,0x4C877DCC129AF1BD,0x58401EDBF56D884F
            .word 0x754C5475E3F8BFCF,0x1111111111111111,0x786213BFF3FAE203,0x53F6C77223F8D4B5
            .word 0x5304A0C74815DFBF,0x701BFCF2B7E84DED,0x72C3DEDE1BA476AD,0x557C05371C0A436C
            .word 0x741CECCDBAEBBBB3,0x577156E9E5C72202,0x641D1FEFF6E59822,0x623B6D2C45E6AFC6
            .word 0x6976994C37A754F0,0x4CE48C6E6963A020,0x4EDDBCD1CF3CD3AC,0x706AAA8FC1AE08E4
            .word 0x674DE62D8E4ACB59,0x791423B583AF7749,0x4589009608F70D0A,0x55159D9A3430F238
            .word 0x70BD250BE3048518,0x6D1B60128C603831,0x5397AB7F0E29CEE8,0x58EF0102374A9A97
            .word 0x625D9DBD94D1E2D1,0x5E8439437165FDF6,0x4F621F3A37353266,0x426B3ACC1149F170
            .word 0x59D789FA7FA3F476,0x4C4353E0D30D6D4B,0x492F120FA02F0B1C,0x720DFD78A97CFF59
            .word 0x5BC2140E14551D39,0x68718C039D4656B9,0x7FFFFFFFFFFFFFFF,0x48F63330CBC9A739
            .word 0x6E47955AFD5F8C20,0x44972B6AD10F9D2A,0x46578121CA1151A1,0x46281A1E7672B320
            .word 0x4094CC803E05BD98,0x5FF5B63C7812A363,0x6AF41E217F7612C5,0x4B7B4452B1E208AC
            .word 0x750F8A67FA5E72E4,0x51C8ECF29B5E8AD1,0x580550353D81B486,0x668CD4C5F3970ABF
            .word 0x480BEE00A16715AD,0x4888D5AC9EE02467,0x77C3DDBA62669040,0x48D55CDF7F706867
            .word 0x720670341FE6E445,0x6CAE4383191C2CC9,0x4F9E28BAD0270344,0x46DAD4328A8A3979
            .word 0x55B7AEB598729716,0x76D0F139C5FF97C5,0x4B876EB39C2DC380,0x781ADC2AD91E6FDF
            .word 0x53BDEAF8F4AA0625,0x624D7EA5B9A73772,0x75A02137A787850D,0x4259BDE1C33A32E6
    len:    .word 100

            .text
            daddi $t0,$zero,8       # $t0 = i = 8
            ld $t1,len($zero)       # $t1 = len
            dsll $t1,$t1,3          # $t1 = len*8
    for:    slt $t2,$t0,$t1         # i < len?
            beqz $t2,out            # no - exit
            dadd $t3,$zero,$t0      # $t3=j=i
            ld $t4,array($t0)       # $t4=B=a[i]
    loop:   slt $t2,$zero,$t3       # j>0 ?
            beqz $t2,over           # no - exit
            daddi $t5,$t3,-8        # $t5=j-1
            ld $t6,array($t5)       # get $t6=a[j-1]
            slt $t2,$t6,$t4         # a[j-1] < B ?
            beqz $t2,over
            sd $t6,array($t3)       # a[j]=a[j-1]
            dadd $t3,$zero,$t5      # j--
            j loop
    over:   sd $t4,array($t3)       # a[j] = B
            daddi $t0,$t0,8         # i++
            j for
    out:    halt

      (2) Structural hazards

            .data
    N_COEFFS:   .word 3
    coeff:      .double 5.0,2.0,-3.0
    N_SAMPLES:  .word 3
    sample:     .double 1,2,3,4,5,6,7,8,9,10
    result:     .double 0,0,0,0,0,0,0,0,0,0
    C_ZERO:     .double 0.0

            .text
    start:  ld r1,N_COEFFS(r0)      # r1 = N_COEFFS
            ld r2,N_SAMPLES(r0)     # r2 = N_SAMPLES
            slt r3,r1,r2            # N_COEFFS < N_SAMPLES?
            bnez r3,smooth          # yes - go to smooth
            beq r1,r2,smooth        # branch N_COEFFS = N_SAMPLES
            halt
    smooth: L.D F3,C_ZERO(r0)       # F3=norm=0.0;
            daddu r2,r0,r0          ; r2 = 0
            ld r1,N_COEFFS(r2)      # r1 = N_COEFFS
            L.D F1,coeff(r2)        # F1=coeff[0]
            L.D F0,C_ZERO(r0)       # F0=0
            c.lt.d F1,F0            #coeff[0]<0 c.lt.d freg,freg - set FP flag if less than
            bc1t neg_coeff_zero     #- branch to address if FP flag is true
            ADD.D F3,F3,F1          #norm+=coeff[0];
            jal coeff_one           ; call coeff_one
    neg_coeff_zero:
            SUB.D F3,F3,F1          #norm-=coeff[0];
            jal coeff_one           ; call coeff_one
    coeff_one:
            daddi r2,r0,8           ; r2 = 8
            L.D F1,coeff(r2)        # F1=coeff[1]
            c.lt.d F1,F0            #coeff[1]<0 c.lt.d freg,freg - set FP flag if less than
            bc1t neg_coeff_one      #- branch to address if FP flag is true
            ADD.D F3,F3,F1          #norm+=coeff[1];
            jal coeff_two           ; call coeff_two
    neg_coeff_one:
            SUB.D F3,F3,F1          #norm-=coeff[1];
            jal coeff_two           ; call coeff_two
    coeff_two:
            daddi r2,r0,16          ; r2 = 16
            L.D F1,coeff(r2)        # F1=coeff[2]
            c.lt.d F1,F0            #coeff[2]<0 c.lt.d freg,freg - set FP flag if less than
            bc1t neg_coeff_two      #- branch to address if FP flag is true
            ADD.D F3,F3,F1          #norm+=coeff[2];
            jal go_result           ; call go_result
    neg_coeff_two:
            SUB.D F3,F3,F1          #norm-=coeff[2];
            jal go_result           ; call go_result
    go_result:
            halt
    out:    halt

      (3) Data hazards

            .data
    N_COEFFS:   .word 3
    coeff:      .double 1.0,25.0,3.0
    N_SAMPLES:  .word 3
    sample:     .double 1,2,3,4,5,6,7,8,9,10
    result:     .double 0,0,0,0,0,0,0,0,0,0

            .text
    start:  ld r1,N_COEFFS(r0)      # r1 = N_COEFFS
            ld r2,N_SAMPLES(r0)     # r2 = N_SAMPLES
            slt $t2,r1,r2           # N_COEFFS < N_SAMPLES?
            bnez $t2,smooth         # yes - go to smooth
            beq r1,r2,smooth        # branch N_COEFFS = N_SAMPLES
            halt
    smooth: daddu r2,r0,r0          ; r2 = 0
            ld r1,N_COEFFS(r2)      # r1 = N_COEFFS
            dsll r1,r1,3            # r1 = N_COEFFS*8
    for:    slt $t2,r2,r1           # i < N_COEFFS?
            beqz $t2,out            # no - exit
            ld r5,coeff(r2)         # r5=a[i]
            daddi r2,r2,8           # i++
            j for
    out:    halt

      (4) Instruction scheduling and loop unrolling

    # lab3_0_init.s
            .data
    a:      .double 1.0, 2.0, 3.0, 4.0, 5.0, 6.0
    b:      .double 2.0, 3.0, 4.0, 5.0, 6.0, 7.0
    c:      .double 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
    d:      .double 0.0, 0.0, 0.0, 0.0, 0.0, 0.0

            .text
    main:   daddi r1, r0, 6         #r1 now contains n = 6
            daddi r2, r0, 10        #r2 now contains 10
            mtc1 r2, f1             #f1 now contains alpha = 10.0
            dadd r2, r0, r0         #r2 now contains i = 0
    loop:   slt r5, r2, r1          #r5 = r2<r1 (i<n)
            beq r5, r0, exit        #if r5 = 0, exit
            dsll r3, r2, 3          #r3 now contains 8*i or 8*r2
            l.d f2, a(r3)           #f2 contains a[i]
            l.d f3, b(r3)           #f3 contains b[i]
            l.d f5, d(r3)           #f5 contains d[i]
            mul.d f6, f2, f3        #f6 has a[i]*b[i]
            s.d f6, c(r3)           #c[i] = a[i]*b[i]
            mul.d f7, f6, f1        #f7 has c[i]*alpha
            add.d f8, f5, f7        #f8 now has d[i] + c[i]*alpha
            s.d f8, d(r3)           #successfully stored f8 to d[i]
            daddi r2, r2, 1         #i++
            j loop
    exit:   halt

    # lab3_1_branch_optimised.s
            .data
    a:      .double 1.0, 2.0, 3.0, 4.0, 5.0, 6.0
    b:      .double 2.0, 3.0, 4.0, 5.0, 6.0, 7.0
    c:      .double 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
    d:      .double 0.0, 0.0, 0.0, 0.0, 0.0, 0.0

            .text
    main:   daddi r1, r0, 6         #r1 now contains n = 6
            daddi r2, r0, 10        #r2 now contains 10
            mtc1 r2, f1             #f1 now contains alpha = 10.0
            dadd r2, r0, r0         #r2 now contains i = 0
            daddi r4, r1, -1        #r4 = n-1
    loop:   dsll r3, r2, 3          #r3 now contains 8*i or 8*r2
            l.d f2, a(r3)           #f2 contains a[i]
            l.d f3, b(r3)           #f3 contains b[i]
            l.d f5, d(r3)           #f5 contains d[i]
            mul.d f6, f2, f3        #f6 has a[i]*b[i]
            s.d f6, c(r3)           #c[i] = a[i]*b[i]
            mul.d f7, f6, f1        #f7 has c[i]*alpha
            add.d f8, f5, f7        #f8 now has d[i] + c[i]*alpha
            s.d f8, d(r3)           #successfully stored f8 to d[i]
            daddi r2, r2, 1         #i++
            slt r5, r4, r2          #r5 = r4<r2 (n-1<i)
            beq r5, r0, loop        #if r5 = 0, loop
    exit:   halt

    # lab3_2_scheduled.s
            .data
    a:      .double 1.0, 2.0, 3.0, 4.0, 5.0, 6.0
    b:      .double 2.0, 3.0, 4.0, 5.0, 6.0, 7.0
    c:      .double 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
    d:      .double 0.0, 0.0, 0.0, 0.0, 0.0, 0.0

            .text
    main:   daddi r1, r0, 6         #r1 now contains n = 6
            daddi r2, r0, 10        #r2 now contains 10
            mtc1 r2, f1             #f1 now contains alpha = 10.0
            dadd r2, r0, r0         #r2 now contains i = 0
            daddi r4, r1, -1        #r4 = n-1
    loop:   dsll r3, r2, 3          #r3 now contains 8*i or 8*r2
            #get values
            l.d f2, a(r3)           #f2 contains a[i]
            l.d f3, b(r3)           #f3 contains b[i]
            l.d f5, d(r3)           #f5 contains d[i]
            mul.d f6, f2, f3        #f6 has a[i]*b[i]
            #loop meta
            daddi r2, r2, 1         #i++
            slt r5, r4, r2          #r5 = r4<r2 (n-1<i)
            #end of loop meta
            mul.d f7, f6, f1        #f7 has c[i]*alpha
            add.d f8, f5, f7        #f8 now has d[i] + c[i]*alpha
            s.d f6, c(r3)           #stored f6 to c[i]
            s.d f8, d(r3)           #successfully stored f8 to d[i]
            beq r5, r0, loop        #if r5 = 0, loop
    exit:   halt

    # lab3_3_unrolled.s
            .data
    a:      .double 1.0, 2.0, 3.0, 4.0, 5.0, 6.0
    b:      .double 2.0, 3.0, 4.0, 5.0, 6.0, 7.0
    c:      .double 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
    d:      .double 0.0, 0.0, 0.0, 0.0, 0.0, 0.0

            .text
    main:   daddi r1, r0, 6         #r1 now contains n = 6
            daddi r2, r0, 10        #r2 now contains 10
            mtc1 r2, f1             #f1 now contains alpha = 10.0
            dadd r2, r0, r0         #r2 now contains i = 0
            daddi r4, r1, -1        #r4 = n-1
    loop:   dsll r3, r2, 3          #r3 now contains 8*i or 8*r2
            l.d f2, a(r3)           #f2 contains a[i]
            l.d f3, b(r3)           #f3 contains b[i]
            l.d f5, d(r3)           #f5 contains d[i]
            mul.d f6, f2, f3        #f6 has a[i]*b[i]
            s.d f6, c(r3)           #c[i] = a[i]*b[i]
            mul.d f7, f6, f1        #f7 has c[i]*alpha
            add.d f8, f5, f7        #f8 now has d[i] + c[i]*alpha
            s.d f8, d(r3)           #successfully stored f8 to d[i]
            daddi r2, r2, 1         #i++
            dsll r3, r2, 3          #r3 now contains 8*i or 8*r2
            l.d f2, a(r3)           #f2 contains a[i]
            l.d f3, b(r3)           #f3 contains b[i]
            l.d f5, d(r3)           #f5 contains d[i]
            mul.d f6, f2, f3        #f6 has a[i]*b[i]
            s.d f6, c(r3)           #c[i] = a[i]*b[i]
            mul.d f7, f6, f1        #f7 has c[i]*alpha
            add.d f8, f5, f7        #f8 now has d[i] + c[i]*alpha
            s.d f8, d(r3)           #successfully stored f8 to d[i]
            daddi r2, r2, 1         #i++
            slt r5, r4, r2          #r5 = r4<r2 (n-1<i)
            beq r5, r0, loop        #if r5 = 0, loop
    exit:   halt

    # lab3_4_unrolled_and_scheduled.s
            .data
    a:      .double 1.0, 2.0, 3.0, 4.0, 5.0, 6.0
    b:      .double 2.0, 3.0, 4.0, 5.0, 6.0, 7.0
    c:      .double 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
    d:      .double 0.0, 0.0, 0.0, 0.0, 0.0, 0.0

            .text
    main:   daddi r1, r0, 6         #r1 now contains n = 6
            daddi r2, r0, 10        #r2 now contains 10
            mtc1 r2, f1             #f1 now contains alpha = 10.0
            dadd r2, r0, r0         #r2 now contains i = 0
            daddi r4, r1, -1        #r4 = n-1
    loop:   dsll r3, r2, 3          #r3 now contains 8*i or 8*r2
            l.d f2, a(r3)           #f2 contains a[i]
            l.d f3, b(r3)           #f3 contains b[i]
            l.d f5, d(r3)           #f5 contains d[i]
            mul.d f6, f2, f3        #f6 has a[i]*b[i]
            #loop meta
            daddi r2, r2, 1         #i++
            dsll r3, r2, 3          #r3 now contains 8*i or 8*r2
            #end of loop meta
            mul.d f7, f6, f1        #f7 has c[i]*alpha
            l.d f2, a(r3)           #f2 contains a[i]
            l.d f3, b(r3)           #f3 contains b[i]
            add.d f8, f5, f7        #f8 now has d[i] + c[i]*alpha
            #This is the part between muli and addi that needs a stall and cannot be optimised by scheduling
            #NOTE: r3 was already advanced to the next element above, so the two
            #stores below write that slot; a separate index register would avoid this
            s.d f6, c(r3)           #stored f6 to c[i]
            mul.d f6, f2, f3        #f6 has a[i]*b[i]
            s.d f8, d(r3)           #successfully stored f8 to d[i]
            l.d f5, d(r3)           #f5 contains d[i]
            mul.d f7, f6, f1        #f7 has c[i]*alpha
            #loop meta
            daddi r2, r2, 1         #i++
            slt r5, r4, r2          #r5 = r4<r2 (n-1<i)
            #end of loop meta
            add.d f8, f5, f7        #f8 now has d[i] + c[i]*alpha
            #This is the part between muli and addi that needs a stall and cannot be optimised by scheduling
            s.d f6, c(r3)           #stored f6 to c[i]
            s.d f8, d(r3)           #successfully stored f8 to d[i]
            beq r5, r0, loop        #if r5 = 0, loop
    exit:   halt

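    5 Reflections

      This lab was carried out with WinMIPS64.

      Through it I learned to use the WinMIPS64 simulator, deepened my understanding of basic pipelining concepts, and became more familiar with the function and basic operation of each stage of the basic MIPS pipeline. I also gained a better understanding of data hazards and structural hazards and of their effect on CPU performance, learned ways to resolve data hazards, and learned how forwarding reduces the stalls they cause. In addition, I deepened my understanding of loop-level parallelism, instruction scheduling, and loop unrolling: unrolling and scheduling can be used to resolve pipeline hazards, and I saw how much they improve CPU performance.

      While using instruction scheduling to eliminate hazards, I also came to understand the impact of instruction dependences better and got a rough grasp of the ideas behind the various hazard-elimination techniques.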

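    Lab 2: Branch Prediction

    1 Lab Tasks

      This lab uses the branch prediction simulator sim-bpred to run test programs under 4 predictor types with different parameter settings, then compares and analyzes the results. The goal is a deeper understanding of dynamic branch prediction and of the strengths and weaknesses of the different implementations.

    2 Method

      How SimpleScalar implements branch prediction: it first predicts the branch direction, that is, whether the branch is taken (jump instructions and call/return instructions skip this step). It then generates the branch target address: call/return instructions operate directly on the RAS, while ordinary branches look up the BTB and produce an address on a hit. The two steps are then combined: if the address hits and the branch is predicted taken, the branch target address is returned; if the address misses and the branch is predicted taken, 1 is returned; whenever the branch is predicted not taken, 0 is returned.

      The focus here is direction prediction for conditional branches. There are six methods, three static (taken, nottaken, perfect) and three dynamic (bimod, 2-level, combined). The static ones are self-explanatory. The exception is perfect, which by design does no prediction at all: it writes the instruction that actually executes next directly into npc, and it never needs to call the predictor.

      The dynamic methods work as follows.

      bimod is the most basic one. It uses a table of 2-bit direction predictors indexed by branch address; predicting with and updating a 2-bit predictor works exactly as in the textbook. Its only parameter is the size of the prediction table.

      2-level is more complex and uses two tables. The first level is the branch history table, which holds the values of the branch history registers. The second level is the global/local pattern table (global versus local is presumably determined by the table size relative to the width of the history register), which holds a 2-bit predictor for each history pattern. To predict, the history register value for the current branch indexes the second-level table to obtain the predictor. To update, the outcome of the current branch is shifted left into the history register and the 2-bit predictor that was used is updated. It has four parameters: the first three are the first-level table size, the second-level table size, and the history register width; the last is an XOR flag. If the flag is 1, the history register value is XORed with the address of the current branch and the result is used to index the second-level pattern table.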
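      The two dynamic schemes described above can be sketched in a few lines of Python. This is a simplified illustration, not SimpleScalar's implementation: the class names, the `(pc >> 2)` index hash, and the initial counter value of 2 (weakly taken) are assumptions made for this example. The constructor arguments of `TwoLevel` follow the four parameters listed in the text (first-level size, second-level size, history width, XOR flag).

```python
class Bimod:
    """Table of 2-bit saturating counters indexed by branch address."""

    def __init__(self, size):
        self.size = size
        self.counters = [2] * size  # assumed start state: weakly taken

    def _index(self, pc):
        return (pc >> 2) % self.size  # assumed hash, for illustration only

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2  # 2 or 3 means "taken"

    def update(self, pc, taken):
        i = self._index(pc)
        step = 1 if taken else -1
        self.counters[i] = max(0, min(3, self.counters[i] + step))


class TwoLevel:
    """History registers (level 1) selecting 2-bit counters (level 2)."""

    def __init__(self, l1_size, l2_size, hist_width, xor):
        self.l1_size = l1_size
        self.l2_size = l2_size
        self.hist_mask = (1 << hist_width) - 1
        self.xor = bool(xor)
        self.history = [0] * l1_size
        self.counters = [2] * l2_size

    def _l1_index(self, pc):
        return (pc >> 2) % self.l1_size

    def _l2_index(self, pc):
        pattern = self.history[self._l1_index(pc)]
        if self.xor:
            pattern ^= pc >> 2  # mix the branch address into the history
        return pattern % self.l2_size

    def predict(self, pc):
        return self.counters[self._l2_index(pc)] >= 2

    def update(self, pc, taken):
        j = self._l2_index(pc)  # counter that was used for the prediction
        step = 1 if taken else -1
        self.counters[j] = max(0, min(3, self.counters[j] + step))
        i = self._l1_index(pc)  # then shift the outcome into the history
        self.history[i] = ((self.history[i] << 1) | int(taken)) & self.hist_mask


def accuracy(predictor, trace):
    """Fraction of (pc, taken) pairs whose direction was predicted correctly."""
    hits = 0
    for pc, taken in trace:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(trace)


# Synthetic trace: one loop branch, taken three times and then not taken.
trace = [(0x400, t) for _ in range(50) for t in (True, True, True, False)]
print("bimod   :", accuracy(Bimod(512), trace))
print("2-level :", accuracy(TwoLevel(1, 1024, 8, 0), trace))
```

      On this repeating pattern the bimod counter mispredicts every loop exit, whereas the two-level predictor learns the pattern from its history register after a short warm-up. Real benchmark traces are far less regular, so the gap seen in practice depends heavily on the program.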

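    3 Results and Analysis

      I did not run this in a virtual machine; I worked on an Alibaba Cloud server instead.

    Figure 22. OS kernel version and server hardware configuration

      Shown below are the server's Linux kernel version, its hardware configuration, and the operating system type (Ubuntu 18.04.4).

    I could not be bothered to mask the sensitive information, so please be kind and do not attack my server...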

    root@iZwz9bj4ryzwisxega006tZ:/home# clear
    root@iZwz9bj4ryzwisxega006tZ:/home# uname -a
    Linux iZwz9bj4ryzwisxega006tZ 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
    root@iZwz9bj4ryzwisxega006tZ:/home# dmidecode |more
    # dmidecode 3.1
    Getting SMBIOS data from sysfs.
    SMBIOS 2.8 present.
    9 structures occupying 429 bytes.
    Table at 0x000F5850.

    Handle 0x0000, DMI type 0, 24 bytes
    BIOS Information
        Vendor: SeaBIOS
        Version: 8c24b4c
        Release Date: 04/01/2014
        Address: 0xE8000
        Runtime Size: 96 kB
        ROM Size: 64 kB
        Characteristics:
            BIOS characteristics not supported
            Targeted content distribution is supported
        BIOS Revision: 0.0

    Handle 0x0100, DMI type 1, 27 bytes
    System Information
        Manufacturer: Alibaba Cloud
        Product Name: Alibaba Cloud ECS
        Version: pc-i440fx-2.1
        Serial Number: aaa22528-980a-4893-9b5f-a692b99af30b
        UUID: AAA22528-980A-4893-9B5F-A692B99AF30B
        Wake-up Type: Power Switch
        SKU Number: Not Specified
        Family: Not Specified

    Handle 0x0300, DMI type 3, 21 bytes
    Chassis Information
        Manufacturer: Alibaba Cloud
        Type: Other
        Lock: Not Present
        Version: pc-i440fx-2.1
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Boot-up State: Safe
        Power Supply State: Safe
        Thermal State: Safe
        Security Status: Unknown
        OEM Information: 0x00000000
        Height: Unspecified
        Number Of Power Cords: Unspecified
        Contained Elements: 0

    Handle 0x0400, DMI type 4, 42 bytes
    Processor Information
        Socket Designation: CPU 0
        Type: Central Processor
        Family: Other
        Manufacturer: Alibaba Cloud
        ID: 54 06 05 00 FF FB 8B 0F
        Version: pc-i440fx-2.1
        Voltage: Unknown
        External Clock: Unknown
        Max Speed: Unknown
        Current Speed: Unknown
        Status: Populated, Enabled
        Upgrade: Other
        L1 Cache Handle: Not Provided
        L2 Cache Handle: Not Provided
        L3 Cache Handle: Not Provided
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: Not Specified
        Core Count: 1
        Core Enabled: 1
        Thread Count: 2
        Characteristics: None

    Handle 0x1000, DMI type 16, 23 bytes
    Physical Memory Array
        Location: Other
        Use: System Memory
        Error Correction Type: Multi-bit ECC
        Maximum Capacity: 2 GB
        Error Information Handle: Not Provided
        Number Of Devices: 1

    Handle 0x1100, DMI type 17, 40 bytes
    Memory Device
        Array Handle: 0x1000
        Error Information Handle: Not Provided
        Total Width: Unknown
        Data Width: Unknown
        Size: 2048 MB
        Form Factor: DIMM
        Set: None
        Locator: DIMM 0
        Bank Locator: Not Specified
        Type: RAM
        Type Detail: Other
        Speed: Unknown
        Manufacturer: Alibaba Cloud
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: Not Specified
        Rank: Unknown
        Configured Clock Speed: Unknown
        Minimum Voltage: Unknown
        Maximum Voltage: Unknown
        Configured Voltage: Unknown

    Handle 0x1300, DMI type 19, 31 bytes
    Memory Array Mapped Address
        Starting Address: 0x00000000000
        Ending Address: 0x0007FFFFFFF
        Range Size: 2 GB
        Physical Array Handle: 0x1000
        Partition Width: 1

    Handle 0x2000, DMI type 32, 11 bytes
    System Boot Information
        Status: No errors detected

    Handle 0x7F00, DMI type 127, 4 bytes
    End Of Table

    root@iZwz9bj4ryzwisxega006tZ:/home# head -n 1 /etc/issue
    Ubuntu 18.04.4 LTS \n \l

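      Benchmarks selected: gcc, mcf, vortex, vpr.

      Figure 23 shows an example run of the gcc benchmark.

    Figure 23. Example run of the gcc benchmark

      Table 1 lists the data obtained from running the gcc benchmark.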

    Table 1. Data obtained from running the gcc benchmark

    | Metric | Always taken | Always not taken | Bimod (512) | Bimod (1024) | Two level (1,1024,8,0) | Two level (1,64,6,1) |
    |---|---|---|---|---|---|---|
    | sim_total_insn | 200000000 | 200000000 | 200000000 | 200000000 | 200000000 | 200000000 |
    | sim_total_refs | 82299557 | 82299557 | 82299557 | 82299557 | 82299557 | 82299557 |
    | sim_elapsed_time | 9 | 9 | 12 | 12 | 12 | 12 |
    | sim_inst_rate | 22222222.2222 | 22222222.2222 | 16666666.6667 | 16666666.6667 | 16666666.6667 | 16666666.6667 |
    | sim_num_branches | 38932651 | 38932651 | 38932651 | 38932651 | 38932651 | 38932651 |
    | sim_IPB | 5.1371 | 5.1371 | 5.1371 | 5.1371 | 5.1371 | 5.1371 |
    | bpred_bimod.lookups | 38932651 | 38932651 | 38932651 | 38932651 | 38932651 | 38932651 |
    | bpred_bimod.updates | 38932651 | 38932651 | 38932651 | 38932651 | 38932651 | 38932651 |
    | bpred_bimod.addr_hits | 24896713 | 22021899 | 33168606 | 33826488 | 34264216 | 34264216 |
    | bpred_bimod.dir_hits | 24896713 | 22021899 | 33796352 | 34459270 | 34903457 | 34903457 |
    | bpred_bimod.misses | 14035938 | 16910752 | 5136299 | 4473381 | 4029194 | 4029194 |
    | bpred_bimod.jr_hits | 3392575 | 3392575 | 2722268 | 2722268 | 2722268 | 2722268 |
    | bpred_bimod.jr_seen | 3392575 | 3392575 | 3392575 | 3392575 | 3392575 | 3392575 |
    | bpred_bimod.jr_non_ras_hits.PP | 3392575 | 3392575 | 338933 | 338933 | 338933 | 338933 |
    | bpred_bimod.jr_non_ras_seen.PP | 3392575 | 3392575 | 999722 | 999722 | 999722 | 999722 |
    | bpred_bimod.bpred_addr_rate | 0.6395 | 0.5656 | 0.8519 | 0.8688 | 0.8801 | 0.8801 |
    | bpred_bimod.bpred_dir_rate | 0.6395 | 0.5656 | 0.8681 | 0.8851 | 0.8965 | 0.8965 |
    | bpred_bimod.bpred_jr_rate | 1.0000 | 1.0000 | 0.8024 | 0.8024 | 0.8024 | 0.8024 |
    | bpred_bimod.bpred_jr_non_ras_rate.PP | 1.0000 | 1.0000 | 0.3390 | 0.3390 | 0.3390 | 0.3390 |
    | bpred_bimod.retstack_pushes | 0 | 0 | 2392862 | 2392862 | 2392862 | 2392862 |
    | bpred_bimod.retstack_pops | 0 | 0 | 2392853 | 2392853 | 2392853 | 2392853 |
    | bpred_bimod.used_ras.PP | 0 | 0 | 2392853 | 2392853 | 2392853 | 2392853 |
    | bpred_bimod.ras_hits.PP | 0 | 0 | 2383335 | 2383335 | 2383335 | 2383335 |
    | bpred_bimod.ras_rate.PP | | | 0.9960 | 0.9960 | 0.9960 | 0.9960 |

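      Figure 24 compares the data obtained from the gcc benchmark.

    Figure 24. Comparison of results for the gcc benchmark

      Figure 25 shows an example run of the mcf benchmark.

    Figure 25. Example run of the mcf benchmark

      Table 2 lists the data obtained from running the mcf benchmark.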

    Table 2. Data obtained from running the mcf benchmark

    | Metric | Always taken | Always not taken | Bimod (512) | Bimod (1024) | Two level (1,1024,8,0) | Two level (1,64,6,1) |
    |---|---|---|---|---|---|---|
    | sim_total_insn | 277504665 | 277504665 | 200000000 | 200000000 | 200000000 | 200000000 |
    | sim_total_refs | 100392784 | 100392784 | 100392784 | 74877495 | 74877495 | 74877495 |
    | sim_elapsed_time | 11 | 12 | 10 | 11 | 11 | 11 |
    | sim_inst_rate | 25227696.8182 | 23125388.7500 | 20000000.0000 | 18181818.1818 | 18181818.1818 | 18181818.1818 |
    | sim_num_branches | 56703872 | 56703872 | 37825248 | 37825248 | 37825248 | 37825248 |
    | sim_IPB | 4.8939 | 4.8939 | 5.2875 | 5.2875 | 5.2875 | 5.2875 |
    | bpred_bimod.lookups | 56703872 | 56703872 | 37825248 | 37825248 | 37825248 | 37825248 |
    | bpred_bimod.updates | 56703872 | 56703872 | 37825248 | 37825248 | 37825248 | 37825248 |
    | bpred_bimod.addr_hits | 39760075 | 41303028 | 31592697 | 31627566 | 31627870 | 31627870 |
    | bpred_bimod.dir_hits | 39760075 | 41303028 | 31603605 | 31638461 | 31638767 | 31638767 |
    | bpred_bimod.misses | 16943797 | 15400844 | 6221643 | 6186787 | 6186481 | 6186481 |
    | bpred_bimod.jr_hits | 5313258 | 5313258 | 3085664 | 3085664 | 3085664 | 3085664 |
    | bpred_bimod.jr_seen | 5313258 | 5313258 | 3100112 | 3100112 | 3100112 | 3100112 |
    | bpred_bimod.jr_non_ras_hits.PP | 5313258 | 5313258 | 19993 | 19993 | 19993 | 19993 |
    | bpred_bimod.jr_non_ras_seen.PP | 5313258 | 5313258 | 24071 | 24071 | 24071 | 24071 |
    | bpred_bimod.bpred_addr_rate | 0.7012 | 0.7284 | 0.8352 | 0.8361 | 0.8362 | 0.8362 |
    | bpred_bimod.bpred_dir_rate | 0.7012 | 0.7284 | 0.8355 | 0.8364 | 0.8364 | 0.8364 |
    | bpred_bimod.bpred_jr_rate | 1.0000 | 1.0000 | 0.9953 | 0.9953 | 0.9953 | 0.9953 |
    | bpred_bimod.bpred_jr_non_ras_rate.PP | 1.0000 | 1.0000 | 0.8306 | 0.8306 | 0.8306 | 0.8306 |
    | bpred_bimod.retstack_pushes | 0 | 0 | 3076045 | 3076045 | 3076045 | 3076045 |
    | bpred_bimod.retstack_pops | 0 | 0 | 3076041 | 3076041 | 3076041 | 3076041 |
    | bpred_bimod.used_ras.PP | 0 | 0 | 3076041 | 3076041 | 3076041 | 3076041 |
    | bpred_bimod.ras_hits.PP | 0 | 0 | 3065671 | 3065671 | 3065671 | 3065671 |
    | bpred_bimod.ras_rate.PP | <error: divide by zero> | <error: divide by zero> | 0.9966 | 0.9966 | 0.9966 | 0.9966 |

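      Figure 26 compares the data obtained from the mcf benchmark.

    Figure 26. Comparison of results for the mcf benchmark

      Figure 27 shows an example run of the vortex benchmark.

    Figure 27. Example run of the vortex benchmark

      Table 3 lists the data obtained from running the vortex benchmark.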

    Table 3. Data obtained from running the vortex benchmark

    | Metric | Always taken | Always not taken | Bimod (512) | Bimod (1024) | Two level (1,1024,8,0) | Two level (1,64,6,1) |
    |---|---|---|---|---|---|---|
    | sim_total_insn | 437835 | 437835 | 437835 | 437835 | 437835 | 437835 |
    | sim_total_refs | 159898 | 159898 | 159898 | 159898 | 159898 | 159898 |
    | sim_elapsed_time | 1 | 1 | 1 | 1 | 1 | 1 |
    | sim_inst_rate | 437835.0000 | 437835.0000 | 437835.0000 | 437835.0000 | 437835.0000 | 437835.0000 |
    | sim_num_branches | 91210 | 91210 | 91210 | 91210 | 91210 | 91210 |
    | sim_IPB | 4.8003 | 4.8003 | 4.8003 | 4.8003 | 4.8003 | 4.8003 |
    | bpred_bimod.lookups | 91210 | 91210 | 91210 | 91210 | 91210 | 91210 |
    | bpred_bimod.updates | 91210 | 91210 | 91210 | 91210 | 91210 | 91210 |
    | bpred_bimod.addr_hits | 64600 | 41900 | 83463 | 84030 | 84476 | 84476 |
    | bpred_bimod.dir_hits | 64600 | 41900 | 84487 | 85055 | 85500 | 85500 |
    | bpred_bimod.misses | 26610 | 49310 | 6723 | 6155 | 5710 | 5710 |
    | bpred_bimod.jr_hits | 4799 | 4799 | 4540 | 4540 | 4540 | 4540 |
    | bpred_bimod.jr_seen | 4799 | 4799 | 4799 | 4799 | 4799 | 4799 |
    | bpred_bimod.jr_non_ras_hits.PP | 4799 | 4799 | 289 | 289 | 289 | 289 |
    | bpred_bimod.jr_non_ras_seen.PP | 4799 | 4799 | 520 | 520 | 520 | 520 |
    | bpred_bimod.bpred_addr_rate | 0.7083 | 0.4594 | 0.9151 | 0.9213 | 0.9262 | 0.9262 |
    | bpred_bimod.bpred_dir_rate | 0.7083 | 0.4594 | 0.9263 | 0.9325 | 0.9374 | 0.9374 |
    | bpred_bimod.bpred_jr_rate | 1.0000 | 1.0000 | 0.9460 | 0.9460 | 0.9460 | 0.9460 |
    | bpred_bimod.bpred_jr_non_ras_rate.PP | 1.0000 | 1.0000 | 0.5558 | 0.5558 | 0.5558 | 0.5558 |
    | bpred_bimod.retstack_pushes | 0 | 0 | 4281 | 4281 | 4281 | 4281 |
    | bpred_bimod.retstack_pops | 0 | 0 | 4279 | 4279 | 4279 | 4279 |
    | bpred_bimod.used_ras.PP | 0 | 0 | 4279 | 4279 | 4279 | 4279 |
    | bpred_bimod.ras_hits.PP | 0 | 0 | 4251 | 4251 | 4251 | 4251 |
    | bpred_bimod.ras_rate.PP | <error: divide by zero> | <error: divide by zero> | 0.9935 | 0.9935 | 0.9935 | 0.9935 |

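      Figure 28 compares the data obtained from the vortex benchmark.

    Figure 28. Comparison of results for the vortex benchmark

      Figure 29 shows an example run of the vpr benchmark.

    Figure 29. Example run of the vpr benchmark

      Table 4 lists the data obtained from running the vpr benchmark.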

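    Table 4. Data obtained from running the vpr benchmark

    | Metric | Always taken | Always not taken | Bimod (512) | Bimod (1024) | Two level (1,1024,8,0) | Two level (1,64,6,1) |
    |---|---|---|---|---|---|---|
    | sim_total_insn | 27991 | 27991 | 27991 | 27991 | 27991 | 27991 |
    | sim_total_refs | 9772 | 9772 | 9772 | 9772 | 9772 | 9772 |
    | sim_elapsed_time | 1 | 1 | 1 | 1 | 1 | 1 |
    | sim_inst_rate | 27991.0000 | 27991.0000 | 27991.0000 | 27991.0000 | 27991.0000 | 27991.0000 |
    | sim_num_branches | 5286 | 5286 | 5286 | 5286 | 5286 | 5286 |
    | sim_IPB | 5.2953 | 5.2953 | 5.2953 | 5.2953 | 5.2953 | 5.2953 |
    | bpred_bimod.lookups | 5286 | 5286 | 5286 | 5286 | 5286 | 5286 |
    | bpred_bimod.updates | 5286 | 5286 | 5286 | 5286 | 5286 | 5286 |
    | bpred_bimod.addr_hits | 3361 | 2794 | 4489 | 4549 | 4590 | 4590 |
    | bpred_bimod.dir_hits | 3361 | 2794 | 4710 | 4769 | 4806 | 4806 |
    | bpred_bimod.misses | 1925 | 2492 | 576 | 517 | 480 | 480 |
    | bpred_bimod.jr_hits | 359 | 359 | 342 | 342 | 342 | 342 |
    | bpred_bimod.jr_seen | 359 | 359 | 359 | 359 | 359 | 359 |
    | bpred_bimod.jr_non_ras_hits.PP | 359 | 359 | 0 | 0 | 0 | 0 |
    | bpred_bimod.jr_non_ras_seen.PP | 359 | 359 | 14 | 14 | 14 | 14 |
    | bpred_bimod.bpred_addr_rate | 0.6358 | 0.5286 | 0.8492 | 0.8606 | 0.8683 | 0.8683 |
    | bpred_bimod.bpred_dir_rate | 0.6358 | 0.5286 | 0.8910 | 0.9022 | 0.9092 | 0.9092 |
    | bpred_bimod.bpred_jr_rate | 1.0000 | 1.0000 | 0.9526 | 0.9526 | 0.9526 | 0.9526 |
    | bpred_bimod.bpred_jr_non_ras_rate.PP | 1.0000 | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
    | bpred_bimod.retstack_pushes | 0 | 0 | 349 | 349 | 349 | 349 |
    | bpred_bimod.retstack_pops | 0 | 0 | 345 | 345 | 345 | 345 |
    | bpred_bimod.used_ras.PP | 0 | 0 | 345 | 345 | 345 | 345 |
    | bpred_bimod.ras_hits.PP | 0 | 0 | 342 | 342 | 342 | 342 |
    | bpred_bimod.ras_rate.PP | <error: divide by zero> | <error: divide by zero> | 0.9913 | 0.9913 | 0.9913 | 0.9913 |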

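      Figure 30 compares the data obtained from the vpr benchmark.

    Figure 30. Comparison of results for the vpr benchmark

      Figures 24, 26, 28, and 30 show that, in general, always not taken has the highest misprediction rate and always taken comes next. mcf is the exception: there always not taken reaches a direction hit rate of 0.7284 against 0.7012 for always taken. The bimod and two-level adaptive configurations perform about the same regardless of their parameters; the two two-level columns are in fact identical in every table.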
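      As a cross-check of this ranking, the sketch below recomputes the direction hit rate for the gcc run as direction hits divided by lookups, using the counts from Table 1. The function name `direction_rates` is made up for this example, and treating the rate as hits over lookups is an assumption that happens to reproduce the bpred_dir_rate row.

```python
lookups = 38932651  # bpred_bimod.lookups for gcc, identical in every column

dir_hits = {
    "always taken": 24896713,
    "always not taken": 22021899,
    "bimod (512)": 33796352,
    "bimod (1024)": 34459270,
    "two level (1,1024,8,0)": 34903457,
    "two level (1,64,6,1)": 34903457,
}


def direction_rates(dir_hits, lookups):
    """Map each predictor to its fraction of correctly predicted directions."""
    return {name: hits / lookups for name, hits in dir_hits.items()}


rates = direction_rates(dir_hits, lookups)
# Print from worst to best predictor.
for name, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{name:24s} hit rate {rate:.4f}   misprediction rate {1 - rate:.4f}")
```

      Sorting by hit rate puts always not taken at the bottom for gcc, with the dynamic predictors clustered within about three percentage points of one another.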

    4 Key Program Code

      The commands used in this lab are listed below.

    # gcc taken
    sim-bpred -bpred taken -max:inst 200000000 ../../../cc1.ss -O2 ./cccp.i
    # gcc no taken
    sim-bpred -bpred nottaken -max:inst 200000000 ../../../cc1.ss -O2 ./cccp.i
    # gcc bimod(512)
    sim-bpred -bpred:bimod 512 -max:inst 200000000 ../../../cc1.ss -O2 ./cccp.i
    # gcc Bimod(1024)
    sim-bpred -bpred:bimod 1024 -max:inst 200000000 ../../../cc1.ss -O2 ./cccp.i
    # gcc Two level (1,1024,8,0)
    sim-bpred -bpred:2lev 1 1024 8 0 -max:inst 200000000 ../../../cc1.ss -O2 ./cccp.i
    # gcc Two level (1,64,6,1)
    sim-bpred -bpred:2lev 1 64 6 1 -max:inst 200000000 ../../../cc1.ss -O2 ./cccp.i

    # mcf taken
    sim-bpred -bpred taken -max:inst 2000000000 ./mcf.ss ./inp.in
    # mcf no taken
    sim-bpred -bpred nottaken -max:inst 2000000000 ./mcf.ss ./inp.in
    # mcf bimod(512)
    sim-bpred -bpred:bimod 512 -max:inst 200000000 ./mcf.ss ./inp.in
    # mcf Bimod(1024)
    sim-bpred -bpred:bimod 1024 -max:inst 200000000 ./mcf.ss ./inp.in
    # mcf Two level (1,1024,8,0)
    sim-bpred -bpred:2lev 1 1024 8 0 -max:inst 200000000 ./mcf.ss ./inp.in
    # mcf Two level (1,64,6,1)
    sim-bpred -bpred:2lev 1 64 6 1 -max:inst 200000000 ./mcf.ss ./inp.in

    # vortex taken
    sim-bpred -bpred taken -max:inst 2000000000 ../../../../../Simplescalar-master/spec95-little/vortex.ss ./bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    # vortex no taken
    sim-bpred -bpred nottaken -max:inst 2000000000 ../../../../../Simplescalar-master/spec95-little/vortex.ss ./bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    # vortex bimod(512)
    sim-bpred -bpred:bimod 512 -max:inst 200000000 ../../../../../Simplescalar-master/spec95-little/vortex.ss ./bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    # vortex Bimod(1024)
    sim-bpred -bpred:bimod 1024 -max:inst 200000000 ../../../../../Simplescalar-master/spec95-little/vortex.ss ./bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    # vortex Two level (1,1024,8,0)
    sim-bpred -bpred:2lev 1 1024 8 0 -max:inst 200000000 ./vortex.ss ./bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    # vortex Two level (1,64,6,1)
    sim-bpred -bpred:2lev 1 64 6 1 -max:inst 200000000 ../../../../../Simplescalar-master/spec95-little/vortex.ss ./bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k

    # vpr taken
    sim-bpred -bpred taken -max:inst 2000000000 ./vpr.ss ./arch.in ./net.in ./place.in
    # vpr no taken
    sim-bpred -bpred nottaken -max:inst 2000000000 -max:inst 200000000 ./vpr.ss ./arch.in ./net.in ./place.in
    # vpr bimod(512)
    sim-bpred -bpred:bimod 512 -max:inst 200000000 ./vpr.ss ./arch.in ./net.in ./place.in
    # vpr Bimod(1024)
    sim-bpred -bpred:bimod 1024 -max:inst 200000000 ./vpr.ss ./arch.in ./net.in ./place.in
    # vpr Two level (1,1024,8,0)
    sim-bpred -bpred:2lev 1 1024 8 0 -max:inst 200000000 ./vpr.ss ./arch.in ./net.in ./place.in
    # vpr Two level (1,64,6,1)
    sim-bpred -bpred:2lev 1 64 6 1 -max:inst 200000000 ./vpr.ss ./arch.in ./net.in ./place.in

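    5 Reflections

      In this lab I used the branch prediction simulator sim-bpred to run test programs under 4 predictor types with different parameter settings, then compared and analyzed the results. This deepened my understanding of dynamic branch prediction and showed me the strengths and weaknesses of the different implementations.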

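    Lab 3: Cache Performance Analysis

    1 Lab Tasks

      Through experiments and analysis of the results, understand how the various cache parameters affect cache performance.

    2 Method

      (1) Install and test the SimpleScalar simulator (using the test programs bundled with it).
      (2) With the baseline configuration, run 4 programs from the SPEC2000 benchmark suite (state which programs you chose), count the cache misses, and count the L2 cache misses (note: configure a two-level cache hierarchy in which instructions and data share the L2).
      (3) Change the cache capacity (×2, ×4, ×8, ×64), run the same programs, count the L2 misses, compute the miss rate, and summarize the results.
      (4) Change the cache associativity (2-way, 4-way, 8-way, 16-way, 64-way), run the programs chosen earlier, count the L2 misses, compute the miss rate, and analyze the results.
      (5) Change the cache block size (×2, ×4, ×8, ×64), run the programs chosen earlier, count the L2 misses, compute the miss rate, and analyze the results.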
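      To see why the three parameters in steps (3) to (5) matter, here is a small set-associative LRU cache model in Python. It is a teaching sketch and not SimpleScalar's cache model: the `Cache` class, the `run` helper, and both address traces are invented for this example, and only hits and misses are counted (no writebacks). The parameters `n_sets`, `block_size`, and `assoc` loosely mirror the sets, block size, and associativity fields of a SimpleScalar cache configuration.

```python
from collections import OrderedDict


class Cache:
    """Set-associative cache with LRU replacement; counts accesses and misses."""

    def __init__(self, n_sets, block_size, assoc):
        self.n_sets = n_sets
        self.block_size = block_size
        self.assoc = assoc
        self.sets = [OrderedDict() for _ in range(n_sets)]
        self.accesses = 0
        self.misses = 0

    def access(self, addr):
        self.accesses += 1
        block = addr // self.block_size
        index = block % self.n_sets
        tag = block // self.n_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            return True
        self.misses += 1
        if len(lines) >= self.assoc:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = True
        return False


def run(trace, n_sets, block_size, assoc):
    """Replay an address trace and return the miss rate."""
    cache = Cache(n_sets, block_size, assoc)
    for addr in trace:
        cache.access(addr)
    return cache.misses / cache.accesses


# Two sequential passes over a 4 KiB array of 8-byte elements.
sweep = [i * 8 for i in range(512)] * 2
# Two addresses that map to the same set of a small direct-mapped cache.
conflict = [0, 256] * 100

print("baseline 256 B       :", run(sweep, n_sets=16, block_size=16, assoc=1))
print("block size x4        :", run(sweep, n_sets=16, block_size=64, assoc=1))
print("capacity x16         :", run(sweep, n_sets=256, block_size=16, assoc=1))
print("conflict, 1-way      :", run(conflict, n_sets=16, block_size=16, assoc=1))
print("conflict, 2-way      :", run(conflict, n_sets=8, block_size=16, assoc=2))
```

      Larger blocks help the sequential sweep through spatial locality, a larger capacity lets the second pass hit, and higher associativity removes the conflict misses in the ping-pong trace. Real programs mix these effects, which is why the measured tables do not improve monotonically in every case.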

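    3 Results and Analysis

      The 4 benchmarks I chose are bzip2, mcf, vortex, and vpr.

      Figure 31 shows an example run of the bzip2 benchmark.

    Figure 31. Example run of the bzip2 benchmark

      Table 5 lists the data obtained by testing the bzip2 benchmark with different cache associativities. (Markdown here does not fully support HTML, so the diagonal line in the table header could not be drawn.)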

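    Table 5. Data obtained by testing the bzip2 benchmark with different cache associativities

    | Metric \ Associativity | ×2 | ×4 | ×8 | ×16 | ×64 |
    |---|---|---|---|---|---|
    | accesses | 50780 | 50780 | 50780 | 50780 | 50780 |
    | hits | 47440 | 47391 | 47232 | 46993 | 46993 |
    | misses | 3340 | 3389 | 3548 | 3787 | 3787 |
    | replacements | 2316 | 2365 | 2524 | 2763 | 2763 |
    | writebacks | 1958 | 1956 | 2025 | 2032 | 2041 |
    | invalidations | 0 | 0 | 0 | 0 | 0 |
    | miss_rate | 0.0658 | 0.0667 | 0.0699 | 0.0746 | 0.0746 |
    | repl_rate | 0.0456 | 0.0466 | 0.0497 | 0.0544 | 0.0544 |
    | wb_rate | 0.0386 | 0.0385 | 0.0399 | 0.0400 | 0.0402 |
    | inv_rate | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |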
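      The rate rows in these cache tables are the corresponding counts divided by the number of accesses. The following sketch recomputes them for the first column of Table 5; the function name `cache_rates` is made up for this example.

```python
def cache_rates(accesses, misses, replacements, writebacks):
    """Derive the per-access rates from the raw counters."""
    return {
        "miss_rate": misses / accesses,
        "repl_rate": replacements / accesses,
        "wb_rate": writebacks / accesses,
    }


# First column of Table 5 (bzip2)
accesses, hits, misses = 50780, 47440, 3340
rates = cache_rates(accesses, misses, replacements=2316, writebacks=1958)

print("hits + misses == accesses:", hits + misses == accesses)
for name, value in rates.items():
    print(f"{name} = {value:.4f}")
```

      The same identity, hits plus misses equal to accesses, is a quick way to spot transcription errors in any of the tables that follow.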

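      Figure 32 compares the data obtained by testing the bzip2 benchmark with different cache associativities.

    Figure 32. Comparison of results for the bzip2 benchmark with different cache associativities

      Table 6 lists the data obtained by testing the bzip2 benchmark with different cache block sizes.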

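    Table 6. Data obtained by testing the bzip2 benchmark with different cache block sizes

    | Metric \ Block size | ×2 | ×4 | ×8 | ×64 |
    |---|---|---|---|---|
    | accesses | 50780 | 50780 | 50780 | 50780 |
    | hits | 37826 | 44230 | 47440 | 49067 |
    | misses | 12954 | 6550 | 3340 | 1713 |
    | replacements | 8858 | 4502 | 2316 | 1201 |
    | writebacks | 7617 | 3851 | 1958 | 1003 |
    | invalidations | 0 | 0 | 0 | 0 |
    | miss_rate | 0.2551 | 0.1290 | 0.0658 | 0.0337 |
    | repl_rate | 0.1744 | 0.0887 | 0.0456 | 0.0237 |
    | wb_rate | 0.1500 | 0.0758 | 0.0386 | 0.0198 |
    | inv_rate | 0.0000 | 0.0000 | 0.0000 | 0.0000 |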

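      Figure 33 compares the data obtained by testing the bzip2 benchmark with different cache block sizes.

    Figure 33. Comparison of results for the bzip2 benchmark with different cache block sizes

      Table 7 lists the data obtained by testing the bzip2 benchmark with different cache capacities.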

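    Table 7. Data obtained by testing the bzip2 benchmark with different cache capacities

    | Metric \ Capacity | ×2 | ×4 | ×8 | ×64 |
    |---|---|---|---|---|
    | accesses | 50780 | 50780 | 50780 | 50780 |
    | hits | 46487 | 46533 | 46560 | 47786 |
    | misses | 4293 | 4247 | 4220 | 2994 |
    | replacements | 4229 | 4119 | 3964 | 946 |
    | writebacks | 2990 | 2910 | 2766 | 917 |
    | invalidations | 0 | 0 | 0 | 0 |
    | miss_rate | 0.0845 | 0.0836 | 0.0831 | 0.0590 |
    | repl_rate | 0.0833 | 0.0811 | 0.0781 | 0.0186 |
    | wb_rate | 0.0589 | 0.0573 | 0.0545 | 0.0181 |
    | inv_rate | 0.0000 | 0.0000 | 0.0000 | 0.0000 |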

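      Figure 34 compares the data obtained by testing the bzip2 benchmark with different cache capacities.

    Figure 34. Comparison of results for the bzip2 benchmark with different cache capacities

      Table 8 lists the data obtained by testing the mcf benchmark with different cache associativities.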

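    Table 8. Data obtained by testing the mcf benchmark with different cache associativities

    | Metric \ Associativity | ×2 | ×4 | ×8 | ×16 | ×64 |
    |---|---|---|---|---|---|
    | accesses | 100435002 | 100435002 | 100435002 | 100435002 | 100435002 |
    | hits | 91282103 | 91974221 | 92047776 | 92101882 | 92022155 |
    | misses | 9152899 | 8460781 | 8387226 | 8333120 | 8412847 |
    | replacements | 9151875 | 8459757 | 8386202 | 8332096 | 8411823 |
    | writebacks | 4234344 | 4122739 | 4091243 | 4078597 | 4084706 |
    | invalidations | 0 | 0 | 0 | 0 | 0 |
    | miss_rate | 0.0911 | 0.0842 | 0.0835 | 0.0830 | 0.0838 |
    | repl_rate | 0.0911 | 0.0842 | 0.0835 | 0.0830 | 0.0838 |
    | wb_rate | 0.0422 | 0.0410 | 0.0407 | 0.0406 | 0.0407 |
    | inv_rate | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |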

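      Figure 35 compares the data obtained by testing the mcf benchmark with different cache associativities.

    Figure 35. Comparison of results for the mcf benchmark with different cache associativities

      Table 9 lists the data obtained by testing the mcf benchmark with different cache block sizes.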

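    Table 9. Data obtained by testing the mcf benchmark with different cache block sizes

    | Metric \ Block size | ×2 | ×4 | ×8 | ×64 |
    |---|---|---|---|---|
    | accesses | 100435002 | 100435002 | 100435002 | 100435002 |
    | hits | 78211037 | 85942231 | 91282103 | 92680384 |
    | misses | 22223965 | 14492771 | 9152899 | 7754618 |
    | replacements | 22219869 | 14490723 | 9151875 | 7754106 |
    | writebacks | 13542641 | 7396647 | 4234344 | 2670910 |
    | invalidations | 0 | 0 | 0 | 0 |
    | miss_rate | 0.2213 | 0.1443 | 0.0911 | 0.0772 |
    | repl_rate | 0.2212 | 0.1443 | 0.0911 | 0.0772 |
    | wb_rate | 0.1348 | 0.0736 | 0.0422 | 0.0266 |
    | inv_rate | 0.0000 | 0.0000 | 0.0000 | 0.0000 |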

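      Figure 36 compares the data obtained by testing the mcf benchmark with different cache block sizes.

    Figure 36. Comparison of results for the mcf benchmark with different cache block sizes

      Table 10 lists the data obtained by testing the mcf benchmark with different cache capacities.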

    Table 10  Results for the mcf benchmark with different cache capacities

    Metric \ Capacity     ×2          ×4          ×8          ×64
    accesses              100435002   100435002   100435002   100435002
    hits                  84950662    86770503    88204005    93203363
    misses                15484340    13664499    12230997    7231639
    replacements          15484276    13664371    12230741    7229591
    writebacks            5473601     5245317     4947506     3926368
    invalidations         0           0           0           0
    miss_rate             0.1542      0.1361      0.1218      0.0720
    repl_rate             0.1542      0.1361      0.1218      0.0720
    wb_rate               0.0545      0.0522      0.0493      0.0391
    inv_rate              0.0000      0.0000      0.0000      0.0000

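      Figure 37 compares the mcf results across cache capacities.

    Figure 37  Comparison of mcf results for different cache capacities

      Table 11 lists the results for the vortex benchmark with different cache associativities.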

    Table 11  Results for the vortex benchmark with different cache associativities

    Metric \ Associativity   ×2       ×4       ×8       ×16      ×64
    accesses                 179276   179276   179276   179276   179276
    hits                     172023   172065   172007   171994   171981
    misses                   7253     7211     7269     7282     7295
    replacements             6229     6187     6245     6258     6271
    writebacks               5973     5965     6017     6030     6047
    invalidations            0        0        0        0        0
    miss_rate                0.0405   0.0402   0.0405   0.0406   0.0407
    repl_rate                0.0347   0.0345   0.0348   0.0349   0.0350
    wb_rate                  0.0333   0.0333   0.0336   0.0336   0.0337
    inv_rate                 0.0000   0.0000   0.0000   0.0000   0.0000

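      Figure 38 compares the vortex results across cache associativities.

    Figure 38  Comparison of vortex results for different cache associativities

      Table 12 lists the results for the vortex benchmark with different cache block sizes.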

    Table 12  Results for the vortex benchmark with different cache block sizes

    Metric \ Block size   ×2       ×4       ×8       ×64
    accesses              179276   179276   179276   179276
    hits                  151381   165145   172023   175493
    misses                27895    14131    7253     3783
    replacements          23799    12083    6229     3271
    writebacks            23036    11665    5973     3110
    invalidations         0        0        0        0
    miss_rate             0.1556   0.0788   0.0405   0.0211
    repl_rate             0.1328   0.0674   0.0347   0.0182
    wb_rate               0.1285   0.0651   0.0333   0.0173
    inv_rate              0.0000   0.0000   0.0000   0.0000

      Figure 39 compares the vortex results across cache block sizes.

    Figure 39  Comparison of vortex results for different cache block sizes

      Table 13 lists the results for the vortex benchmark with different cache capacities.

    Table 13  Results for the vortex benchmark with different cache capacities

    Metric \ Capacity     ×2       ×4       ×8       ×64
    accesses              179276   179276   179276   179276
    hits                  169301   170926   171530   172144
    misses                9975     8350     7746     7132
    replacements          9911     8222     7490     5084
    writebacks            7927     7362     6998     4869
    invalidations         0        0        0        0
    miss_rate             0.0556   0.0466   0.0432   0.0398
    repl_rate             0.0553   0.0459   0.0418   0.0284
    wb_rate               0.0442   0.0411   0.0390   0.0272
    inv_rate              0.0000   0.0000   0.0000   0.0000

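      Figure 40 compares the vortex results across cache capacities.

    Figure 40  Comparison of vortex results for different cache capacities

      Table 14 lists the results for the vpr benchmark with different cache associativities.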

    Table 14  Results for the vpr benchmark with different cache associativities

    Metric \ Associativity   ×2       ×4       ×8       ×16      ×64
    accesses                 10789    10789    10789    10789    10789
    hits                     10261    10261    10261    10261    10261
    misses                   528      528      528      528      528
    replacements             1        0        0        0        0
    writebacks               0        0        0        0        0
    invalidations            0        0        0        0        0
    miss_rate                0.0489   0.0489   0.0489   0.0489   0.0489
    repl_rate                0.0001   0.0000   0.0000   0.0000   0.0000
    wb_rate                  0.0000   0.0000   0.0000   0.0000   0.0000
    inv_rate                 0.0000   0.0000   0.0000   0.0000   0.0000

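      Figure 41 compares the vpr results across cache associativities.

    Figure 41  Comparison of vpr results for different cache associativities

      Figure 42 shows an example run that sets the cache block size for the vpr benchmark.

    Figure 42  Example of a test run setting the cache block size with the vpr benchmark

      Table 15 lists the results for the vpr benchmark with different cache block sizes.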

    Table 15  Results for the vpr benchmark with different cache block sizes

    Metric \ Block size   ×2       ×4       ×8       ×64
    accesses              10789    10789    10789    10789
    hits                  8877     9800     10261    10497
    misses                1912     989      528      292
    replacements          1        1        1        1
    writebacks            0        0        0        0
    invalidations         0        0        0        0
    miss_rate             0.1772   0.0917   0.0489   0.0271
    repl_rate             0.0001   0.0001   0.0001   0.0001
    wb_rate               0.0000   0.0000   0.0000   0.0000
    inv_rate              0.0000   0.0000   0.0000   0.0000

      Figure 43 compares the vpr results across cache block sizes.

    Figure 43  Comparison of vpr results for different cache block sizes

      Table 16 lists the results for the vpr benchmark with different cache capacities.

    Table 16  Results for the vpr benchmark with different cache capacities

    Metric \ Capacity     ×2       ×4       ×8       ×64
    accesses              10789    10789    10789    10789
    hits                  10139    10207    10216    10261
    misses                650      582      573      528
    replacements          586      454      317      0
    writebacks            508      414      292      0
    invalidations         0        0        0        0
    miss_rate             0.0602   0.0539   0.0531   0.0489
    repl_rate             0.0543   0.0421   0.0294   0.0000
    wb_rate               0.0471   0.0384   0.0271   0.0000
    inv_rate              0.0000   0.0000   0.0000   0.0000

      Figure 44 compares the vpr results across cache capacities.

    Figure 44  Comparison of vpr results for different cache capacities

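      Conclusions:
      The comparison figures above suggest the following.

      Cache capacity:
      As the cache capacity grows, the number of misses and the miss rate both fall, because a larger cache suffers fewer capacity misses. In these runs the miss rate was still falling at the largest capacity tested for all four benchmarks. In general, once the capacity passes a certain value, the miss rate stops decreasing.

      Cache associativity:
      Higher associativity reduces conflict misses, but at a fixed total capacity the effect here was small and not uniform. The mcf miss rate dropped from 0.0911 (2-way) to 0.0830 (16-way), vortex and vpr stayed essentially flat, and bzip2 rose slightly (0.0658 to 0.0746 in the plotted data). Associativity does nothing for capacity or compulsory misses, so the gain levels off once conflict misses are no longer the main cost.

      Cache block size:
      Within a certain range, a larger block size lowers the miss rate effectively, because larger blocks reduce compulsory misses. In these tests the miss rate fell at every step of the tested range for every benchmark. Beyond some block size the miss rate is expected to rise again: at a fixed capacity, larger blocks mean fewer blocks, which increases conflict misses.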
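
      To put a number on the capacity trend, the sketch below computes how much the miss rate drops between the smallest (×2) and largest (×64) capacity for each benchmark, using the miss_rate rows of Tables 7, 10, 13 and 16. The helper name `reduction` is made up for this illustration.

```shell
#!/bin/sh
# reduction OLD NEW: percentage by which the miss rate drops when going
# from OLD to NEW. (Hypothetical helper for illustration only.)
reduction() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

# Columns: benchmark, miss_rate at capacity x2, miss_rate at capacity x64
# (values copied from Tables 7, 10, 13 and 16).
while read -r name small large; do
    echo "$name: miss rate falls by $(reduction "$small" "$large")%"
done <<'EOF'
bzip2  0.0845 0.0590
mcf    0.1542 0.0720
vortex 0.0556 0.0398
vpr    0.0602 0.0489
EOF
```

      The same helper can be pointed at the block-size or associativity rows to compare how strongly each parameter affects a given benchmark.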

    4 Key program code

      The commands used in this experiment are listed below.

    # Example: -cache:dl1 dl1:2048:64:4:r configures the L1 data cache; 2048 is the number of sets, 64 is the block size in bytes, 4 is the associativity, and r selects the RANDOM replacement policy.
    # With this configuration the L1 data cache capacity is 2048*64*4 bytes = 512 KB.
    #vpr capacity 2
    sim-cache -cache:dl1 dl1:32:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vpr.ss ./arch.in ./net.in ./place.in
    #vpr capacity 4
    sim-cache -cache:dl1 dl1:64:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vpr.ss ./arch.in ./net.in ./place.in
    #vpr capacity 8
    sim-cache -cache:dl1 dl1:128:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vpr.ss ./arch.in ./net.in ./place.in
    #vpr capacity 64
    sim-cache -cache:dl1 dl1:1024:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vpr.ss ./arch.in ./net.in ./place.in
    #vpr association 2
    sim-cache -cache:dl1 dl1:512:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vpr.ss ./arch.in ./net.in ./place.in
    #vpr association 4
    sim-cache -cache:dl1 dl1:256:32:4:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vpr.ss ./arch.in ./net.in ./place.in
    #vpr association 8
    sim-cache -cache:dl1 dl1:128:32:8:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vpr.ss ./arch.in ./net.in ./place.in
    #vpr association 16
    sim-cache -cache:dl1 dl1:64:32:16:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vpr.ss ./arch.in ./net.in ./place.in
    #vpr association 64
    sim-cache -cache:dl1 dl1:16:32:64:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vpr.ss ./arch.in ./net.in ./place.in
    #vpr Block Size 2
    sim-cache -cache:dl1 dl1:2048:8:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vpr.ss ./arch.in ./net.in ./place.in
    #vpr Block Size 4
    sim-cache -cache:dl1 dl1:1024:16:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vpr.ss ./arch.in ./net.in ./place.in
    #vpr Block Size 8
    sim-cache -cache:dl1 dl1:512:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vpr.ss ./arch.in ./net.in ./place.in
    #vpr Block Size 64
    sim-cache -cache:dl1 dl1:256:64:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vpr.ss ./arch.in ./net.in ./place.in
    #####################################################################################
    #vortex capacity 2
    sim-cache -cache:dl1 dl1:32:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vortex.ss bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    #vortex capacity 4
    sim-cache -cache:dl1 dl1:64:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vortex.ss bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    #vortex capacity 8
    sim-cache -cache:dl1 dl1:128:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vortex.ss bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    #vortex capacity 64
    sim-cache -cache:dl1 dl1:1024:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vortex.ss bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    #vortex association 2
    sim-cache -cache:dl1 dl1:512:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vortex.ss bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    #vortex association 4
    sim-cache -cache:dl1 dl1:256:32:4:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vortex.ss bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    #vortex association 8
    sim-cache -cache:dl1 dl1:128:32:8:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vortex.ss bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    #vortex association 16
    sim-cache -cache:dl1 dl1:64:32:16:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vortex.ss bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    #vortex association 64
    sim-cache -cache:dl1 dl1:16:32:64:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vortex.ss bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    #vortex Block Size 2
    sim-cache -cache:dl1 dl1:2048:8:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vortex.ss bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    #vortex Block Size 4
    sim-cache -cache:dl1 dl1:1024:16:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vortex.ss bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    #vortex Block Size 8
    sim-cache -cache:dl1 dl1:512:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vortex.ss bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    #vortex Block Size 64
    sim-cache -cache:dl1 dl1:256:64:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./vortex.ss bendian.raw ./bendian.rnv ./bendian.wnv ./lendian.raw ./lendian.rnv ./lendian.wnv ./persons.1k
    #####################################################################################
    #mcf capacity 2
    sim-cache -cache:dl1 dl1:32:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./mcf.ss ./inp.in
    #mcf capacity 4
    sim-cache -cache:dl1 dl1:64:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./mcf.ss ./inp.in
    #mcf capacity 8
    sim-cache -cache:dl1 dl1:128:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./mcf.ss ./inp.in
    #mcf capacity 64
    sim-cache -cache:dl1 dl1:1024:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./mcf.ss ./inp.in
    #mcf association 2
    sim-cache -cache:dl1 dl1:512:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./mcf.ss ./inp.in
    #mcf association 4
    sim-cache -cache:dl1 dl1:256:32:4:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./mcf.ss ./inp.in
    #mcf association 8
    sim-cache -cache:dl1 dl1:128:32:8:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./mcf.ss ./inp.in
    #mcf association 16
    sim-cache -cache:dl1 dl1:64:32:16:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./mcf.ss ./inp.in
    #mcf association 64
    sim-cache -cache:dl1 dl1:16:32:64:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./mcf.ss ./inp.in
    #mcf Block Size 2
    sim-cache -cache:dl1 dl1:2048:8:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./mcf.ss ./inp.in
    #mcf Block Size 4
    sim-cache -cache:dl1 dl1:1024:16:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./mcf.ss ./inp.in
    #mcf Block Size 8
    sim-cache -cache:dl1 dl1:512:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./mcf.ss ./inp.in
    #mcf Block Size 64
    sim-cache -cache:dl1 dl1:256:64:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./mcf.ss ./inp.in
    #####################################################################################
    #bzip2 capacity 2
    sim-cache -cache:dl1 dl1:32:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./bzip2.ss ./control ./input.random
    #bzip2 capacity 4
    sim-cache -cache:dl1 dl1:64:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./bzip2.ss ./control ./input.random
    #bzip2 capacity 8
    sim-cache -cache:dl1 dl1:128:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./bzip2.ss ./control ./input.random
    #bzip2 capacity 64
    sim-cache -cache:dl1 dl1:1024:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./bzip2.ss ./control ./input.random
    #bzip2 association 2
    sim-cache -cache:dl1 dl1:512:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./bzip2.ss ./control ./input.random
    #bzip2 association 4
    sim-cache -cache:dl1 dl1:256:32:4:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./bzip2.ss ./control ./input.random
    #bzip2 association 8
    sim-cache -cache:dl1 dl1:128:32:8:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./bzip2.ss ./control ./input.random
    #bzip2 association 16
    sim-cache -cache:dl1 dl1:64:32:16:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./bzip2.ss ./control ./input.random
    #bzip2 association 64
    sim-cache -cache:dl1 dl1:16:32:64:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./bzip2.ss ./control ./input.random
    #bzip2 Block Size 2
    sim-cache -cache:dl1 dl1:2048:8:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./bzip2.ss ./control ./input.random
    #bzip2 Block Size 4
    sim-cache -cache:dl1 dl1:1024:16:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./bzip2.ss ./control ./input.random
    #bzip2 Block Size 8
    sim-cache -cache:dl1 dl1:512:32:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./bzip2.ss ./control ./input.random
    #bzip2 Block Size 64
    sim-cache -cache:dl1 dl1:256:64:2:l -cache:dl2 none -cache:il2 none -max:inst 2000000000 ./bzip2.ss ./control ./input.random
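
      The capacity formula in the first comment (sets × block size × associativity) can be applied to the sweeps themselves. The sketch below does this for the five associativity configurations and suggests that they all keep the total capacity constant, so only the associativity changes between runs. It does not invoke sim-cache, and the helper name `capacity_kb` is made up for this illustration.

```shell
#!/bin/sh
# capacity_kb NSETS BSIZE ASSOC: total capacity in KB of a cache given as
# <nsets>:<bsize>:<assoc>. (Hypothetical helper for illustration only.)
capacity_kb() {
    echo $(( $1 * $2 * $3 / 1024 ))
}

# The five associativity configurations used in the commands above.
for cfg in 512:32:2 256:32:4 128:32:8 64:32:16 16:32:64; do
    nsets=${cfg%%:*}      # text before the first colon
    rest=${cfg#*:}        # text after the first colon
    bsize=${rest%%:*}
    assoc=${rest#*:}
    echo "dl1:$cfg:l -> $(capacity_kb "$nsets" "$bsize" "$assoc") KB"
done
```

      Running the same helper on the capacity configurations (for example 32:32:2 or 1024:32:2) shows that the labels 2, 4, 8 and 64 in those commands correspond to the capacity in KB.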

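    5 Lessons learned

      This experiment used the SimpleScalar simulator. It deepened my understanding of the basic concepts, organization, and operation of a cache, and showed me how capacity, associativity, and block size affect cache performance. I learned the methods for lowering the cache miss rate and what each of them gains in performance. I also came to understand why cache misses occur and the three kinds of miss (compulsory, capacity, and conflict), which gave me a better grasp of the cache as a structure.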


    MATLAB plotting code

      The first line of each script is its file name, which should make it clear which figure the script draws.
      Experiment 2:

    %gcc_Branch_Predictor.m
    y = [14035938 16910752 5136299 4473381 4029194 4029194];
    bar(y);
    grid on;
    set(gca, 'XTickLabel', {'taken', 'Not-taken', 'bimod(512)', 'Bimod(1024)', '2-level-1024', '2-level-64'}, 'FontSize', 16);
    legend('Misses', 'FontSize', 16);
    xlabel('Prediction method', 'FontSize', 16);
    ylabel('Total misses', 'FontSize', 16);
    title('Branch mispredictions by prediction method (gcc benchmark)', 'FontSize', 16);

    %mcf_Branch_Predictor.m
    y = [16943797 15400844 6221643 6186787 6186481 6186481];
    bar(y);
    grid on;
    set(gca, 'XTickLabel', {'taken', 'Not-taken', 'bimod(512)', 'Bimod(1024)', '2-level-1024', '2-level-64'}, 'FontSize', 16);
    legend('Misses', 'FontSize', 16);
    xlabel('Prediction method', 'FontSize', 16);
    ylabel('Total misses', 'FontSize', 16);
    title('Branch mispredictions by prediction method (mcf benchmark)', 'FontSize', 16);

    %vortex_Branch_Predictor.m
    y = [26610 49310 6723 6155 5710 5710];
    bar(y);
    grid on;
    set(gca, 'XTickLabel', {'taken', 'Not-taken', 'bimod(512)', 'Bimod(1024)', '2-level-1024', '2-level-64'}, 'FontSize', 16);
    legend('Misses', 'FontSize', 16);
    xlabel('Prediction method', 'FontSize', 16);
    ylabel('Total misses', 'FontSize', 16);
    title('Branch mispredictions by prediction method (vortex benchmark)', 'FontSize', 16);

    %vpr_Branch_Predictor.m
    y = [1925 2492 576 517 480 480];
    bar(y);
    grid on;
    set(gca, 'XTickLabel', {'taken', 'Not-taken', 'bimod(512)', 'Bimod(1024)', '2-level-1024', '2-level-64'}, 'FontSize', 16);
    legend('Misses', 'FontSize', 16);
    xlabel('Prediction method', 'FontSize', 16);
    ylabel('Total misses', 'FontSize', 16);
    title('Branch mispredictions by prediction method (vpr benchmark)', 'FontSize', 16);

      Experiment 3:

    %bzip2_association.m
    y = [0.0658; 0.0667; 0.0699; 0.0746; 0.0746];
    bar(y);
    grid on;
    set(gca, 'XTickLabel', {'2-way', '4-way', '8-way', '16-way', '64-way'}, 'FontSize', 16);
    legend('Miss rate', 'FontSize', 16);
    xlabel('Number of ways', 'FontSize', 16);
    ylabel('Miss rate', 'FontSize', 16);
    title('L1 data cache miss rate vs. associativity (bzip2 benchmark)', 'FontSize', 16);

    %bzip2_Block_size.m
    y = [0.2551; 0.1290; 0.0658; 0.0337];
    bar(y);
    grid on;
    set(gca, 'XTickLabel', {'*2', '*4', '*8', '*64'}, 'FontSize', 16);
    legend('Miss rate', 'FontSize', 16);
    xlabel('Cache block size', 'FontSize', 16);
    ylabel('Miss rate', 'FontSize', 16);
    title('L1 data cache miss rate vs. block size (bzip2 benchmark)', 'FontSize', 16);

    %bzip2_capacity.m
    y = [0.0845; 0.0836; 0.0831; 0.0590];
    bar(y);
    grid on;
    set(gca, 'XTickLabel', {'*2', '*4', '*8', '*64'}, 'FontSize', 16);
    legend('Miss rate', 'FontSize', 16);
    xlabel('Cache capacity', 'FontSize', 16);
    ylabel('Miss rate', 'FontSize', 16);
    title('L1 data cache miss rate vs. capacity (bzip2 benchmark)', 'FontSize', 16);

    %mcf_association.m
    y = [0.0911 0.0842 0.0835 0.0830 0.0838];
    bar(y);
    grid on;
    set(gca, 'XTickLabel', {'2-way', '4-way', '8-way', '16-way', '64-way'}, 'FontSize', 16);
    legend('Miss rate', 'FontSize', 16);
    xlabel('Number of ways', 'FontSize', 16);
    ylabel('Miss rate', 'FontSize', 16);
    title('L1 data cache miss rate vs. associativity (mcf benchmark)', 'FontSize', 16);

    %mcf_Block_Size.m
    y = [0.2213 0.1443 0.0911 0.0772];
    bar(y);
    grid on;
    set(gca, 'XTickLabel', {'*2', '*4', '*8', '*64'}, 'FontSize', 16);
    legend('Miss rate', 'FontSize', 16);
    xlabel('Cache block size', 'FontSize', 16);
    ylabel('Miss rate', 'FontSize', 16);
    title('L1 data cache miss rate vs. block size (mcf benchmark)', 'FontSize', 16);

    %mcf_capacity.m
    y = [0.1542 0.1361 0.1218 0.0720];
    bar(y);
    grid on;
    set(gca, 'XTickLabel', {'*2', '*4', '*8', '*64'}, 'FontSize', 16);
    legend('Miss rate', 'FontSize', 16);
    xlabel('Cache capacity', 'FontSize', 16);
    ylabel('Miss rate', 'FontSize', 16);
    title('L1 data cache miss rate vs. capacity (mcf benchmark)', 'FontSize', 16);

    %vortex_association.m
    y = [0.0405 0.0402 0.0405 0.0406 0.0407];
    bar(y);
    grid on;
    set(gca, 'XTickLabel', {'2-way', '4-way', '8-way', '16-way', '64-way'}, 'FontSize', 16);
    legend('Miss rate', 'FontSize', 16);
    xlabel('Number of ways', 'FontSize', 16);
    ylabel('Miss rate', 'FontSize', 16);
    title('L1 data cache miss rate vs. associativity (vortex benchmark)', 'FontSize', 16);

    %vortex_Block_size.m
    y = [0.1556 0.0788 0.0405 0.0211];
    bar(y);
    grid on;
    set(gca, 'XTickLabel', {'*2', '*4', '*8', '*64'}, 'FontSize', 16);
    legend('Miss rate', 'FontSize', 16);
    xlabel('Cache block size', 'FontSize', 16);
    ylabel('Miss rate', 'FontSize', 16);
    title('L1 data cache miss rate vs. block size (vortex benchmark)', 'FontSize', 16);

    %vortex_capacity.m
    y = [0.0556 0.0466 0.0432 0.0398];
    bar(y);
    grid on;
    set(gca, 'XTickLabel', {'*2', '*4', '*8', '*64'}, 'FontSize', 16);
    legend('Miss rate', 'FontSize', 16);
    xlabel('Cache capacity', 'FontSize', 16);
    ylabel('Miss rate', 'FontSize', 16);
    title('L1 data cache miss rate vs. capacity (vortex benchmark)', 'FontSize', 16);

    %vpr_association.m
    y = [0.0489 0.0489 0.0489 0.0489 0.0489];
    bar(y);
    grid on;
    set(gca, 'XTickLabel', {'2-way', '4-way', '8-way', '16-way', '64-way'}, 'FontSize', 16);
    legend('Miss rate', 'FontSize', 16);
    xlabel('Number of ways', 'FontSize', 16);
    ylabel('Miss rate', 'FontSize', 16);
    title('L1 data cache miss rate vs. associativity (vpr benchmark)', 'FontSize', 16);

    %vpr_Block_size.m
    y = [0.1772 0.0917 0.0489 0.0271];
    bar(y);
    grid on;
    set(gca, 'XTickLabel', {'*2', '*4', '*8', '*64'}, 'FontSize', 16);
    legend('Miss rate', 'FontSize', 16);
    xlabel('Cache block size', 'FontSize', 16);
    ylabel('Miss rate', 'FontSize', 16);
    title('L1 data cache miss rate vs. block size (vpr benchmark)', 'FontSize', 16);

    %vpr_capacity.m
    y = [0.0602 0.0539 0.0531 0.0489];
    bar(y);
    grid on;
    set(gca, 'XTickLabel', {'*2', '*4', '*8', '*64'}, 'FontSize', 16);
    legend('Miss rate', 'FontSize', 16);
    xlabel('Cache capacity', 'FontSize', 16);
    ylabel('Miss rate', 'FontSize', 16);
    title('L1 data cache miss rate vs. capacity (vpr benchmark)', 'FontSize', 16);