Micro-Architectural features as soft-error induced fault executions markers in embedded safety-critical systems: a preliminary study
Radiation-induced soft errors are one of the most challenging issues in Safety Critical Real-Time Embedded System (SACRES) reliability, usually handled using different flavors of Double Modular Redundancy (DMR) techniques. This solution is becoming unaffordable due to the complexity of modern micro-processors in all domains. This paper addresses the promising field of using Artificial Intelligence (AI) based hardware detectors for soft errors. To create such cores and make them general enough to work with different software applications, microarchitectural attributes are a fascinating option as candidate fault detection features. Several processors already track these features through dedicated Performance Monitoring Unit (PMU). However, there is an open question to understand to what extent they are enough to detect faulty executions. Exploiting the capability of gem5 to simulate real computing systems, perform fault injection experiments and profile microarchitectural attributes (i.e., gem5 Stats), this paper presents the results of a comprehensive analysis regarding the potential attributes to detect soft error and the associated models that can be trained with these features.
READ FULL TEXT