Towards Robust Detection of Adversarial Infection Vectors: Lessons Learned in PDF Malware
Malware remains a major threat in the cybersecurity landscape, in part because of the widespread use of infection vectors such as documents and other media formats. These infection vectors hide embedded malicious code from victim users, which makes it easier to infect their machines through social engineering. Over the last decade, machine-learning algorithms have provided an effective defense against such threats, as they can detect malware embedded in various infection vectors. However, the arms race inherent in an adversarial setting such as malware detection has recently called their suitability for this task into question. In this work, we focus on malware embedded in PDF files as a representative case of how such an arms race can evolve. We first provide a comprehensive taxonomy of PDF malware attacks and of the learning-based systems that have been proposed to detect them. Then, we discuss more sophisticated attack algorithms that craft evasive PDF malware designed to bypass such systems. We describe state-of-the-art mitigation techniques, highlighting that designing robust machine-learning algorithms remains a challenging open problem. We conclude the paper with a set of guidelines for designing systems that are more secure against adversarial malicious PDF files.