DADA: A Large-scale Benchmark and Model for Driver Attention Prediction in Accidental Scenarios
Driver attention prediction has recently absorbed increasing attention in traffic scene understanding and is prone to be an essential problem in vision-centered and human-like driving systems. This work, different from other attempts, makes an attempt to predict the driver attention in accidental scenarios containing normal, critical and accidental situations simultaneously. However, challenges tread on the heels of that because of the dynamic traffic scene, intricate and imbalanced accident categories. With the hypothesis that driver attention can provide a selective role of crash-object for assisting driving accident detection or prediction, this paper designs a multi-path semantic-guided attentive fusion network (MSAFNet) that learns the spatio-temporal semantic and scene variation in prediction. For fulfilling this, a large-scale benchmark with 2000 video sequences (named as DADA-2000) is contributed with laborious annotation for driver attention (fixation, saccade, focusing time), accident objects/intervals, as well as the accident categories, and superior performance to state-of-the-arts are provided by thorough evaluations. As far as we know, this is the first comprehensive and quantitative study for the human-eye sensing exploration in accidental scenarios. DADA-2000 is available at https://github.com/JWFangit/LOTVS-DADA.
READ FULL TEXT