EARL: Speedup Transformer-based Rankers with Pre-computed Representation

04/28/2020
by   Luyu Gao, et al.

Recent innovations in Transformer-based ranking models have advanced the state of the art in information retrieval. However, their performance gains come at a steep computational cost. This paper presents a novel Embed Ahead Rank Later (EARL) framework, which speeds up Transformer-based rankers by pre-computing representations and keeping online computation shallow. EARL disentangles the attention in a typical Transformer-based ranker into three asynchronous tasks and assigns each to a dedicated Transformer: query understanding, document understanding, and relevance judging. With such a ranking framework, query and document token representations can be computed offline and reused. We also propose a new judger Transformer block that keeps online relevance judging light and shallow. Our experiments demonstrate that EARL can be as effective as previous state-of-the-art BERT rankers in accuracy while being substantially faster in evaluation time.
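
The embed-ahead, rank-later split described above can be illustrated with a small sketch. This is not the EARL model: `toy_encode` is a hypothetical hash-based stand-in for the deep query and document Transformers, `PrecomputedRanker` and its method names are invented for illustration, and `judge` is a single toy attention pass rather than the judger block proposed in the paper. The only point it shows is the control flow: documents are encoded once offline and cached, and each query pays only for encoding itself plus one shallow interaction per candidate.

```python
import hashlib
import math

DIM = 8  # toy embedding width (assumption, chosen only for illustration)


def toy_encode(text):
    """Hypothetical stand-in for a deep encoder: one deterministic vector per token."""
    vectors = []
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        # Map the first DIM bytes of the hash into roughly [-1, 1].
        vectors.append([(b - 127.5) / 127.5 for b in digest[:DIM]])
    return vectors


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class PrecomputedRanker:
    """Illustrative only: caches document token vectors, scores online in one pass."""

    def __init__(self):
        self.doc_cache = {}    # doc_id -> list of token vectors
        self.encode_calls = 0  # counts how often the "expensive" encoder runs

    def _encode(self, text):
        self.encode_calls += 1
        return toy_encode(text)

    def index(self, docs):
        """Offline step: encode every document once and store the result."""
        for doc_id, text in docs.items():
            self.doc_cache[doc_id] = self._encode(text)

    def judge(self, q_vecs, d_vecs):
        """Shallow online step: one attention pass from query tokens to document tokens."""
        if not q_vecs or not d_vecs:
            return 0.0
        total = 0.0
        for q in q_vecs:
            logits = [dot(q, d) / math.sqrt(DIM) for d in d_vecs]
            peak = max(logits)  # subtract the max for numerical stability
            weights = [math.exp(l - peak) for l in logits]
            norm = sum(weights)
            # Attention-weighted average of the document token vectors.
            context = [
                sum(w * d[i] for w, d in zip(weights, d_vecs)) / norm
                for i in range(DIM)
            ]
            total += dot(q, context)
        return total / len(q_vecs)

    def rank(self, query):
        """Online step: encode the query once, then reuse the cached document vectors."""
        q_vecs = self._encode(query)
        scores = {
            doc_id: self.judge(q_vecs, d_vecs)
            for doc_id, d_vecs in self.doc_cache.items()
        }
        return sorted(scores, key=scores.get, reverse=True)


docs = {
    "d1": "transformer rankers for information retrieval",
    "d2": "recipes for sourdough bread",
}
ranker = PrecomputedRanker()
ranker.index(docs)                      # offline: two encoder calls
ranking = ranker.rank("neural ranker")  # online: one more encoder call
```

In this sketch the encoder runs once per document at indexing time and once per query at ranking time, so the online cost per candidate is only the `judge` pass. Query representations could be cached in the same way for repeated queries.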
