Extending Detection with Forensic Information
For over a quarter century, security-relevant detection has been driven by models learned from input features collected from real or simulated environments. An artifact (e.g., network event, potential malware sample, suspicious email) is deemed malicious or non-malicious based on its similarity to the learned model at run-time. However, the training of the models has been historically limited to only those features available at run time. In this paper, we consider an alternate model construction approach that trains models using forensic "privileged" information--features available at training time but not at runtime--to improve the accuracy and resilience of detection systems. In particular, we adapt and extend recent advances in knowledge transfer, model influence, and distillation to enable the use of forensic data in a range of security domains. Our empirical study shows that privileged information increases detection precision and recall over a system with no privileged information: we observe up to 7.7 error for fast-flux bot detection, 8.6 malware classification, and 16.9 limitations and applications of different privileged information techniques in detection systems. Such techniques open the door to systems that can integrate forensic data directly into detection models, and therein provide a means to fully exploit the information available about past security-relevant events.
READ FULL TEXT