TVNet: Temporal Voting Network for Action Localization
We propose a Temporal Voting Network (TVNet) for action localization in untrimmed videos. TVNet incorporates a novel Voting Evidence Module that locates temporal boundaries more accurately by accumulating temporal contextual evidence to predict frame-level probabilities of start and end action boundaries. This action-independent evidence module is incorporated within a pipeline that calculates confidence scores and action classes. We achieve an average mAP of 34.6% on ActivityNet-1.3, particularly outperforming previous methods at the highest IoU threshold of 0.95. TVNet also achieves an mAP of 56.0% when combined with PGCN and 59.1% with MUSES at 0.5 IoU on THUMOS14, and outperforms prior work at all thresholds. Our code is available at https://github.com/hanielwang/TVNet.
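For readers unfamiliar with the IoU thresholds quoted above, the following is a minimal sketch of temporal IoU between a predicted and a ground-truth segment. It is not taken from the TVNet code base; the function name `temporal_iou` and the `(start, end)` tuple representation in seconds are assumptions made for illustration only.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, e.g. in seconds.

    Hypothetical helper for illustration; not part of the TVNet repository.
    """
    (pred_start, pred_end), (gt_start, gt_end) = pred, gt
    # Length of the overlap, clamped at zero for disjoint segments.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    # Union is the sum of both lengths minus the shared part.
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # A 10 s prediction that misses the true start by 0.5 s.
    print(temporal_iou((10.0, 20.0), (10.5, 20.0)))
```

Under a threshold of 0.95, a prediction counts as correct only if its temporal IoU with a ground-truth segment of the same class is at least 0.95, which is why small boundary errors matter at that setting.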