An Approach to Dependability Enhancement of Cache Memories

Document Type : Original Article

Authors

1 Associate Professor, Shahid Bahonar University of Kerman, Kerman, Iran

2 Master's degree, Shahid Bahonar University of Kerman, Kerman, Iran

Abstract

Modern processors with large caches are highly vulnerable to transient errors. Because of the importance of this problem, error-coding methods are commonly adopted for protection. The cost of reliability, however, should be kept low so that energy and area are used efficiently. This paper proposes an approach to enhancing the dependability of unprotected processor caches. To this end, a low-cost tag error mitigation mechanism is adopted to analyze the reliability of caches in modern processors. By increasing the Hamming distance between tags, the proposed approach incurs a much lower overhead and reduces the false hit rate to zero.
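
The abstract does not spell out the mechanism, so the following is only a minimal sketch of the two terms it relies on, not the authors' implementation. It assumes tags are plain integers and that a "false hit" means a single bit flip turns a non-matching stored tag into the lookup tag; the function names are hypothetical and chosen for illustration.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two tags differ."""
    return bin(a ^ b).count("1")


def is_false_hit(stored_tag: int, lookup_tag: int, flipped_bit: int) -> bool:
    """True if flipping one bit of a non-matching stored tag makes it match.

    Assumption: a transient error is modeled as a single bit flip in the
    stored tag.
    """
    corrupted = stored_tag ^ (1 << flipped_bit)
    return stored_tag != lookup_tag and corrupted == lookup_tag


def min_set_distance(tags: list[int]) -> int:
    """Smallest pairwise Hamming distance among the tags of one cache set."""
    return min(
        hamming_distance(x, y)
        for i, x in enumerate(tags)
        for y in tags[i + 1:]
    )


# Two tags at distance 1: one flipped bit is enough to cause a false hit.
close_hit = is_false_hit(stored_tag=0b1011, lookup_tag=0b1010, flipped_bit=0)

# Two tags at distance 4: no single bit flip can make them match.
far_hit = any(is_false_hit(0b0101, 0b1010, bit) for bit in range(4))
```

Under this single-bit-flip model, a false hit is possible only when two tags in the same set are at Hamming distance 1, which is why keeping the minimum distance within a set at 2 or more rules such hits out.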

Keywords


Volume 10, Issue 1 - Serial Number 37
Serial No. 37, Spring Quarterly
May 2022
Pages 1-10
  • Receive Date: 30 January 2021
  • Revise Date: 01 August 2021
  • Accept Date: 11 December 2021
  • Publish Date: 22 May 2022