Deep Learning Approaches for Image Retrieval and Pattern Spotting in Ancient Documents
This paper describes two approaches for content-based image retrieval and pattern spotting in document images using deep learning. The first approach uses a pre-trained CNN model to cope with the lack of training data, which is fine-tuned to achieve a compact yet discriminant representation of queries and image candidates. The second approach uses a Siamese Convolution Neural Network trained on a previously prepared subset of image pairs from the ImageNet dataset to provide the similarity-based feature maps. In both methods, the learned representation scheme considers feature maps of different sizes which are evaluated in terms of retrieval performance. A robust experimental protocol using two public datasets (Tobacoo-800 and DocExplore) has shown that the proposed methods compare favorably against state-of-the-art document image retrieval and pattern spotting methods.
READ FULL TEXT