MSR: Multi-Scale Shape Regression for Scene Text Detection
State-of-the-art scene text detection techniques predict quadrilateral boxes, which are prone to localization errors when dealing with long or curved text lines in scenes. This paper presents a novel multi-scale shape regression network (MSR) that is capable of accurately locating scene texts of arbitrary orientations, shapes and lengths. MSR detects scene texts by predicting dense text boundary points instead of sparse quadrilateral vertices, which often suffer from regression errors on long text lines. Detection by linking dense boundary points also enables accurate localization of scene texts of arbitrary orientations and shapes, whereas most existing techniques that use quadrilaterals often pass undesired background on to the subsequent text recognition step. Additionally, the multi-scale network extracts and fuses features at different scales concurrently, which gives it strong tolerance to variation in text scale. Extensive experiments over several public datasets show that MSR obtains superior detection performance for both curved and arbitrarily oriented text lines of different lengths, e.g. an f-score of 80.7 on CTW1500 and 81.7 on MSRA-TD500.
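The claim that quadrilaterals include undesired background for curved text can be illustrated with a small geometric sketch. It is not the MSR network: the boundary points are sampled analytically from a hypothetical arc-shaped text band rather than predicted, and an axis-aligned bounding box stands in for the quadrilateral (a rotated quadrilateral would be somewhat tighter but still cannot follow the curve). All function names and the band geometry are assumptions made for illustration.

```python
import math


def arc_band_boundary(r_inner, r_outer, start_deg, end_deg, n):
    """Sample n points on each curved edge of a hypothetical arc-shaped text band."""
    angles = [
        math.radians(start_deg + (end_deg - start_deg) * i / (n - 1))
        for i in range(n)
    ]
    outer = [(r_outer * math.cos(a), r_outer * math.sin(a)) for a in angles]
    inner = [(r_inner * math.cos(a), r_inner * math.sin(a)) for a in angles]
    return outer, inner


def link_boundary(outer, inner):
    """Link dense boundary points into one closed polygon:
    walk along one edge, then back along the other."""
    return outer + inner[::-1]


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_box_area(points):
    """Area of the axis-aligned box enclosing the points
    (a simplified stand-in for a quadrilateral detection)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Toy curved text line: a 20-pixel-thick band spanning 120 degrees of arc.
outer, inner = arc_band_boundary(80.0, 100.0, 30.0, 150.0, 50)
polygon = link_boundary(outer, inner)

poly_area = polygon_area(polygon)
box_area = bounding_box_area(polygon)

# Fraction of the box that is background rather than text.
background_fraction = 1.0 - poly_area / box_area
print(f"polygon area: {poly_area:.1f}")
print(f"box area: {box_area:.1f}")
print(f"background fraction inside box: {background_fraction:.2f}")
```

In this toy geometry more than half of the enclosing box is background, while the linked polygon follows the band closely. How MSR actually regresses and links its boundary points is described in the full paper, not in this sketch.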