Incorporating Visual Layout Structures for Scientific Text Classification

06/01/2021
by Zejiang Shen, et al.

Classifying the core textual components of a scientific paper (title, author, body text, etc.) is a critical first step in automated scientific document understanding. Previous work has shown that using elementary layout information, i.e., each token's 2D position on the page, leads to more accurate classification. We introduce new methods for incorporating VIsual LAyout structures (VILA), e.g., the grouping of page texts into text lines or text blocks, into language models to further improve performance. We show that the I-VILA approach, which simply adds special tokens denoting boundaries between layout structures into model inputs, can lead to +1 to 4.5 F1 score improvements in token classification tasks. Moreover, we design a hierarchical model, H-VILA, that encodes these layout structures and achieves an up-to-70% efficiency gain without hurting prediction accuracy. The experiments are conducted on a newly curated evaluation suite, S2-VLUE, with a novel metric measuring VILA awareness and a new dataset covering 19 scientific disciplines with gold annotations. Pre-trained weights, benchmark datasets, and source code will be available at https://github.com/allenai/VILA.
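As a rough illustration of the I-VILA idea described in the abstract (not the authors' exact implementation), the sketch below interleaves a hypothetical `[BLK]` special token at text-block boundaries before the token sequence is handed to a language model; the token name and the per-block input format are assumptions for demonstration only.

```python
from typing import List


def insert_vila_boundary_tokens(
    blocks: List[List[str]], boundary_token: str = "[BLK]"
) -> List[str]:
    """Flatten per-block token lists, adding a boundary token between
    consecutive layout groups (text blocks or text lines)."""
    flattened: List[str] = []
    for i, block in enumerate(blocks):
        if i > 0:
            # Mark the start of a new visual layout group for the model.
            flattened.append(boundary_token)
        flattened.extend(block)
    return flattened


# Example: three layout groups on one page (title, author line, body text).
page_blocks = [
    ["Incorporating", "Visual", "Layout", "Structures"],
    ["Zejiang", "Shen", "et", "al."],
    ["Classifying", "the", "core", "textual", "components", "..."],
]
print(insert_vila_boundary_tokens(page_blocks))
# ['Incorporating', ..., '[BLK]', 'Zejiang', ..., '[BLK]', 'Classifying', ...]
```

In this reading, the downstream token classifier sees the same word sequence as before, plus explicit boundary markers it can attend to, which is how the abstract describes I-VILA improving token classification without changing the model architecture.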
