Image captioning is conventionally formulated as the task of generating ...
The rapid identification and accurate diagnosis of breast cancer, known ...
Vision-language pre-training (VLP) models have been demonstrated to be e...
While large language models (LMs) have shown remarkable capabilities acr...
Absolute Pose Regression (APR) methods use deep neural networks to direc...
Training a Neural Radiance Field (NeRF) without pre-computed camera pose...
In this paper, we propose an end-to-end Retrieval-Augmented Visual Langu...
Swarm learning (SL) is an emerging and promising decentralized machine learn...
Reinforcement Learning (RL) algorithms can solve challenging control pro...
Recently, introspective models like IntroVAE and S-IntroVAE have excelle...
We present the Pathways Autoregressive Text-to-Image (Parti) model, whic...
Exploring large-scale pretrained foundation models is of significant int...
We introduce a camera relocalization pipeline that combines absolute pos...
Multiple medical institutions collaboratively training a model using fed...
This paper explores zero-label learning in Natural Language Processing (...
With recent progress in joint modeling of visual and textual representat...
We propose Ray-ONet to reconstruct detailed 3D models from monocular ima...
We present a relocalization pipeline, which combines an absolute pose re...
This paper tackles the problem of novel view synthesis (NVS) from 2D ima...
Massively multilingual models subsuming tens or even hundreds of languag...
Modern multilingual models are trained on concatenated text from multipl...
Current natural language processing models work well on a single task, y...
We introduce a novel self-attention-based normal estimation network that...
A reasonable evaluation standard underlies construction of effective dee...
We present FlowNet3D++, a deep scene flow estimation network. Inspired b...
Learning multilingual representations of text has proven a successful me...
When labeled data is scarce for a specific target task, transfer learnin...
Transfer learning has been proven effective when within-target labeled d...
Multi-source transfer learning has been proven effective when within-tar...
Speech emotion recognition is a challenging task for three main reasons:...