Visual Tracking via Shallow and Deep Collaborative Model
In this paper, we propose a robust tracking method based on the collaboration of a generative model and a discriminative classifier, where features are learned by shallow and deep architectures, respectively. For the generative model, we introduce a block-based incremental learning scheme, in which a local binary mask is constructed to deal with occlusion. The similarity degrees between the local patches and their corresponding subspace are integrated to formulate a more accurate global appearance model. In the discriminative model, we exploit the advances of deep learning architectures to learn generic features which are robust to both background clutters and foreground appearance variations. To this end, we first construct a discriminative training set from auxiliary video sequences. A deep classification neural network is then trained offline on this training set. Through online fine-tuning, both the hierarchical feature extractor and the classifier can be adapted to the appearance change of the target for effective online tracking. The collaboration of these two models achieves a good balance in handling occlusion and target appearance change, which are two contradictory challenging factors in visual tracking. Both quantitative and qualitative evaluations against several state-of-the-art algorithms on challenging image sequences demonstrate the accuracy and the robustness of the proposed tracker.
READ FULL TEXT