MMEAD, or MS MARCO Entity Annotations and Disambiguations, is a resource...
In Natural Language Processing (NLP), predicting linguistic structures, ...
We provide a reproducible, end-to-end demonstration of vector search wit...
In this work, we conceptualize the learning process as information
compr...
The rise of large language models (LLMs) had a transformative impact on
...
Traditionally, sparse retrieval systems relied on lexical representation...
BEIR is a benchmark dataset for zero-shot evaluation of information retr...
Noticing the urgent need to provide tools for fast and user-friendly
qua...
Popularized by the Differentiable Search Index, the emerging paradigm of...
Market research surveys are a powerful methodology for understanding con...
The ever-increasing size of language models curtails their widespread ac...
Supervised ranking methods based on bi-encoder or cross-encoder architec...
Anserini is a Lucene-based toolkit for reproducible information retrieva...
This paper presents the AToMiC (Authoring Tools for Multimedia Content)
...
The advent of multilingual language models has generated a resurgence of...
We present Spacerini, a modular framework for seamless building and
depl...
Various techniques have been developed in recent years to improve dense
...
Recent progress in information retrieval finds that embedding query and
...
This paper introduces a method called Sparsified Late Interaction for
Mu...
Industry practitioners always face the problem of choosing the appropria...
Reproducibility is an ideal that no researcher would dispute "in the
abs...
While dense retrieval has been shown effective and efficient across task...
Deep neural networks (DNNs) are often used for text classification tasks...
The application of natural language processing (NLP) to cancer pathology...
End-to-end automatic speech recognition systems represent the state of t...
Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) ...
While differential privacy and gradient compression are separately
well-...
Real-time 3D mapping is a critical component in many important applicati...
Query expansion is an effective approach for mitigating vocabulary misma...
Tokenization is a crucial step in information retrieval, especially for
...
Large-scale diffusion neural networks represent a substantial milestone ...
Pre-trained transformers have proven successful in many NLP tasks. One...
There exists a wide variety of efficiency methods for natural language
p...
Most real-world problems that machine learning algorithms are expected t...
Lexical and semantic matching capture different successful approaches to...
Dense retrievers encode documents into fixed dimensional embeddings. How...
Current pre-trained language model approaches to information retrieval c...
Dense retrieval models using a transformer-based bi-encoder design have
...
With the recent success of dense retrieval methods based on bi-encoders,...
Recent rapid advancements in deep pre-trained language models and the
in...
Neural retrieval models are generally regarded as fundamentally differen...
Sparse lexical representation learning has demonstrated much progress in...
Pseudo-Relevance Feedback (PRF) utilises the relevance signals from the ...
Learned sparse and dense representations capture different successful
ap...
Recent advances in retrieval models based on learned sparse representati...
One key feature of dense passage retrievers (DPR) is the use of separate...
This paper outlines a conceptual framework for understanding recent
deve...
We present Mr. TyDi, a multi-lingual benchmark dataset for mono-lingual
...
Recent developments in representational learning for information retriev...
Evaluation efforts such as TREC, CLEF, NTCIR and FIRE, alongside public
...