Fast Hand Detection in Collaborative Learning Environments
Long-term object detection requires the integration of frame-based results over several seconds. For non-deformable objects, long-term detection is often addressed using object detection followed by video tracking. Unfortunately, tracking is inapplicable to objects that undergo dramatic changes in appearance from frame to frame. As a related example, we study hand detection over long video recordings in collaborative learning environments. More specifically, we develop long-term hand detection methods that can deal with partial occlusions and dramatic changes in appearance. Our approach integrates object detection, followed by time projections, clustering, and small region removal, to provide effective hand detection over long videos. The hand detector achieved an average precision (AP) of 72% at 0.5 intersection over union (IoU). Our optimized approach for data augmentation improved the AP to 81%. The method runs at 4.7x real-time with an AP of 81% at 0.5 IoU, and it reduced the number of false-positive hand detections by 80% by improving IoU ratios from 0.2 to 0.5. The overall hand detection system runs at 4x real-time.
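To make the IoU figures and the time-projection step concrete, the following is a minimal sketch, not the authors' implementation. It assumes axis-aligned boxes given as integer `(x1, y1, x2, y2)` pixel coordinates. The function names `iou` and `time_projection` and the `min_votes` parameter are hypothetical and chosen only for illustration. The projection shown here is a simple per-pixel vote across frames; the paper's actual projection, clustering, and small region removal may differ.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def time_projection(frames, width, height, min_votes):
    """Project per-frame detections onto one binary mask (assumed scheme).

    frames: list of per-frame box lists, each box (x1, y1, x2, y2).
    A pixel is kept when boxes cover it in at least min_votes frames,
    which suppresses detections that appear only briefly.
    """
    votes = [[0] * width for _ in range(height)]
    for boxes in frames:
        covered = set()  # count each pixel at most once per frame
        for x1, y1, x2, y2 in boxes:
            for y in range(max(0, y1), min(height, y2)):
                for x in range(max(0, x1), min(width, x2)):
                    covered.add((x, y))
        for x, y in covered:
            votes[y][x] += 1
    return [[1 if v >= min_votes else 0 for v in row] for row in votes]
```

Under this sketch, a detection counts as correct at the 0.5 IoU threshold when `iou(predicted, ground_truth) >= 0.5`. Raising `min_votes` trades recall for fewer short-lived false positives.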