Learning to Search on Manifolds for 3D Pose Estimation of Articulated Objects
This paper focuses on the challenging problem of 3D pose estimation of a diverse spectrum of articulated objects from single depth images. A novel structured prediction approach is considered, where 3D poses are represented as skeletal models that naturally operate on manifolds. Given an input depth image, the problem of predicting the most proper articulation of underlying skeletal model is thus formulated as sequentially searching for the optimal skeletal configuration. This is subsequently addressed by convolutional neural nets trained end-to-end to render sequential prediction of the joint locations as regressing a set of tangent vectors of the underlying manifolds. Our approach is examined on various articulated objects including human hand, mouse, and fish benchmark datasets. Empirically it is shown to deliver highly competitive performance with respect to the state-of-the-arts, while operating in real-time (over 30 FPS).
READ FULL TEXT