CoRT: Complementary Rankings from Transformers
Recent approaches to passage retrieval have successfully employed representations from pretrained language models (LMs) with large effectiveness gains. However, due to their high computational cost, these approaches are usually limited to re-ranking scenarios, in which the candidates are typically retrieved by scalable bag-of-words retrieval models such as BM25. Although BM25 has proven to be a decent first-stage ranker, it tends to miss relevant passages. In this context we propose CoRT, a framework and neural first-stage ranking model that leverages contextual representations from transformer-based language models to complement candidates from term-based ranking functions while causing no significant delay. Using the MS MARCO dataset, we show that CoRT significantly increases first-stage ranking quality and recall by complementing BM25 with missing candidates. Consequently, we find that subsequent re-rankers achieve superior results while requiring fewer candidates to saturate ranking quality. Finally, we demonstrate that with CoRT, representation-focused retrieval at web scale can be realized with latencies as low as those of BM25.
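To illustrate the abstract's core idea of complementing BM25 candidates with candidates from a representation-focused retriever, here is a minimal sketch of one possible merging step. The function name and the simple interleaving strategy are illustrative assumptions for exposition only, not the paper's actual method.

```python
# Hypothetical sketch: merge a BM25 candidate list with a dense-retrieval
# candidate list into one first-stage ranking for a downstream re-ranker.
# Function names and the interleaving heuristic are assumptions, not CoRT's
# exact procedure.

def complement_candidates(bm25_candidates, dense_candidates, k):
    """Merge two ranked lists of passage ids (best first) into k candidates."""
    merged, seen = [], set()
    # Interleave the rankings so the dense retriever can contribute relevant
    # passages that BM25 misses, without discarding BM25's own hits.
    for bm25_id, dense_id in zip(bm25_candidates, dense_candidates):
        for pid in (bm25_id, dense_id):
            if pid not in seen:
                seen.add(pid)
                merged.append(pid)
            if len(merged) == k:
                return merged
    return merged


# Toy usage with made-up passage ids:
bm25 = ["p3", "p7", "p1", "p9"]
dense = ["p5", "p3", "p8", "p2"]
print(complement_candidates(bm25, dense, k=6))
# -> ['p3', 'p5', 'p7', 'p1', 'p8', 'p9']
```

The sketch only shows why a complementary candidate set can raise recall: passages surfaced by the dense representations (e.g. "p5", "p8") enter the pool even though BM25 never ranked them, while BM25's candidates are preserved.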