EARL: Speedup Transformer-based Rankers with Pre-computed Representation
Recent innovations in Transformer-based ranking models have advanced the state-of-the-art in information retrieval. However, their performance gains come at a steep computational cost. This paper presents a novel Embed Ahead Rank Later (EARL) framework, which speeds up Transformer-based rankers by pre-computing representations and keeping online computation shallow. EARL disentangles the attention in a typical Transformer-based ranker into three asynchronous tasks and assigns each to a dedicated Transformer: query understanding, document understanding, and relevance judging. With this ranking framework, query and document token representations can be computed offline and reused. We also propose a new judger Transformer block that keeps online relevance judging light and shallow. Our experiments demonstrate that EARL matches the accuracy of previous state-of-the-art BERT rankers while being substantially faster at evaluation time.
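To make the decomposition concrete, below is a minimal PyTorch-style sketch of the idea described in the abstract: deep query and document encoders run offline to produce cached token representations, and only a shallow judger runs online. All class names, layer counts, dimensions, and the cross-attention layout of the judger block are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class JudgerBlock(nn.Module):
    """One shallow online block: query tokens cross-attend to cached
    document token representations (hypothetical layout)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, q_repr: torch.Tensor, d_repr: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.cross_attn(q_repr, d_repr, d_repr)
        q_repr = self.norm1(q_repr + attn_out)
        q_repr = self.norm2(q_repr + self.ffn(q_repr))
        return q_repr


class EARLSketch(nn.Module):
    """Three asynchronous components: deep query and document encoders run
    offline; only the shallow judger blocks and a scoring head run online."""

    def __init__(self, dim: int = 768, encoder_layers: int = 12, judger_layers: int = 2):
        super().__init__()
        self.query_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=encoder_layers)
        self.doc_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=encoder_layers)
        self.judger = nn.ModuleList([JudgerBlock(dim) for _ in range(judger_layers)])
        self.score_head = nn.Linear(dim, 1)

    @torch.no_grad()
    def precompute(self, token_embeddings: torch.Tensor, is_query: bool) -> torch.Tensor:
        # Offline step: run the deep encoder once and cache the token representations.
        encoder = self.query_encoder if is_query else self.doc_encoder
        return encoder(token_embeddings)

    def rank(self, q_repr: torch.Tensor, d_repr: torch.Tensor) -> torch.Tensor:
        # Online step: only a few judger blocks plus a linear head are evaluated.
        for block in self.judger:
            q_repr = block(q_repr, d_repr)
        # Score from the first query token (a [CLS]-like position, assumed here).
        return self.score_head(q_repr[:, 0]).squeeze(-1)


if __name__ == "__main__":
    model = EARLSketch(dim=768)
    q_emb = torch.randn(1, 16, 768)    # query token embeddings
    d_emb = torch.randn(1, 256, 768)   # document token embeddings
    q_repr = model.precompute(q_emb, is_query=True)    # cached offline
    d_repr = model.precompute(d_emb, is_query=False)   # cached offline
    print(model.rank(q_repr, d_repr))                  # cheap online scoring
```

The intended trade-off is that the expensive, deep encoders are amortized across many queries and documents, while per-query latency is dominated by the small number of judger blocks.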