Facial Action Unit Detection Using Attention and Relation Learning
The attention mechanism has recently attracted increasing interest in the area of facial action unit (AU) detection. By locating the region of interest (ROI) of each AU with an attention mechanism, AU-related local features can be captured. Most existing attention-based AU detection works use prior knowledge to generate fixed attentions or refine predefined attentions within a small range, which limits their capacity to model diverse AUs. In this paper, we propose a novel end-to-end weakly-supervised attention and relation learning framework for AU detection that requires only AU labels, a setting that has not been explored before. In particular, multi-scale features shared across all AUs are learned first, and then both channel-wise and spatial attentions are learned to select and extract AU-related local features. Moreover, pixel-level relations among AUs are captured to refine the spatial attentions so that more relevant local features can be extracted. Extensive experiments on the BP4D and DISFA benchmarks demonstrate that our framework (i) outperforms state-of-the-art methods for AU detection, and (ii) adaptively finds the ROI of each AU and captures the relations among AUs.
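The abstract does not specify the exact layers, but the pipeline it describes (shared multi-scale features, a channel-wise attention, then a spatial attention producing AU-specific local features) can be sketched as follows. This is a minimal PyTorch illustration under assumed shapes and layer choices: a squeeze-and-excitation-style channel gate and a 1x1 convolutional spatial map. The class name `AUAttentionBranch` and all hyperparameters are hypothetical, not the authors' implementation, and the pixel-level relation refinement step is omitted.

```python
import torch
import torch.nn as nn


class AUAttentionBranch(nn.Module):
    """Illustrative per-AU branch: a channel-wise attention gate followed by
    a spatial attention map, applied to shared multi-scale features.
    This is a sketch under assumed design choices, not the paper's architecture."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel-wise attention: global average pooling -> bottleneck MLP -> sigmoid gate
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        # Spatial attention: 1x1 conv producing one attention map for this AU branch
        self.spatial_conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, shared_feat: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = shared_feat.shape
        # Channel gate re-weights the feature channels relevant to this AU
        gate = self.channel_fc(shared_feat.mean(dim=(2, 3))).view(b, c, 1, 1)
        feat = shared_feat * gate
        # Spatial map highlights the AU's region of interest (ROI)
        attn = torch.sigmoid(self.spatial_conv(feat))  # shape (b, 1, h, w)
        return feat * attn  # AU-specific local features


# Hypothetical usage: one branch per AU over a shared feature map
branch = AUAttentionBranch(channels=256)
x = torch.randn(2, 256, 28, 28)  # assumed shared multi-scale feature map
au_feat = branch(x)              # (2, 256, 28, 28)
```

In the paper's framework the spatial attentions are additionally refined by learned pixel-level relations among AUs before the local features are extracted; that refinement is beyond the scope of this sketch.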