Late Fusion of Local Indexing and Deep Feature Scores for Fast Image-to-Video Search on Large-Scale Databases
Low-cost visual representation and fast query-by-example content search are two challenging objectives that a web-scale visual retrieval system running on moderate hardware must meet. In this paper, we introduce a fast yet robust method that meets both objectives while obtaining state-of-the-art results for the image-to-video search scenario. To this end, we present critical improvements to commonly used indexing and visual representation techniques that make retrieval faster and more accurate while keeping resource requirements modest. We also make the method more robust to visual distortions by exploiting the individual decision scores of local and global descriptors at query time. In this way, local content descriptors effectively capture copy/duplicate scenes with large geometric deformations, while global descriptors are more practical for near-duplicate and semantic search. Experiments are conducted on the large-scale Stanford I2V dataset. The experimental results show that the method is efficient in terms of complexity and query processing time for large-scale visual retrieval scenarios, even when local and global representations are used together. Moreover, the proposed method is highly accurate and achieves state-of-the-art mAP on the dataset. Lastly, we report additional mAP scores after updating the ground-truth annotations using the retrieval results of the proposed method, which reflect its actual performance more clearly.
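The abstract does not specify how the local and global decision scores are combined. As an illustration only, here is a minimal late-fusion sketch under a common assumption: each pipeline returns a per-video score, the two score sets are min-max normalized to [0, 1], and the fused score is a weighted sum. The function names `min_max_normalize` and `late_fuse`, the parameter `weight_local`, and the weighted-sum rule are hypothetical and are not necessarily the authors' fusion scheme.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one pipeline's scores to [0, 1] so they are comparable.

    Assumption: a constant score set is mapped to 1.0 for every video.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {vid: 1.0 for vid in scores}
    return {vid: (s - lo) / (hi - lo) for vid, s in scores.items()}


def late_fuse(
    local_scores: Dict[str, float],
    global_scores: Dict[str, float],
    weight_local: float = 0.5,
) -> List[Tuple[str, float]]:
    """Hypothetical late fusion: weighted sum of normalized per-video scores.

    A video retrieved by only one pipeline gets 0.0 from the other one.
    Returns (video_id, fused_score) pairs, best first; ties break by id.
    """
    local_n = min_max_normalize(local_scores)
    global_n = min_max_normalize(global_scores)
    candidates = set(local_n) | set(global_n)
    fused = {
        vid: weight_local * local_n.get(vid, 0.0)
        + (1.0 - weight_local) * global_n.get(vid, 0.0)
        for vid in candidates
    }
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Toy scores: e.g. match counts from a local index, cosine similarities
    # from a global descriptor. Both scales are illustrative.
    local = {"v1": 120.0, "v2": 30.0, "v3": 10.0}
    glob = {"v2": 0.9, "v3": 0.8, "v4": 0.5}
    for vid, score in late_fuse(local, glob, weight_local=0.5):
        print(vid, round(score, 3))
```

With equal weights, `v2` ranks first because it scores well in both pipelines, even though `v1` has the strongest local match. Normalizing before fusing keeps a large-range score, such as a raw match count, from dominating a bounded similarity score.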