3D pose estimation is an invaluable task in computer vision with various...
AI alignment refers to models acting towards human-intended goals,
prefe...
Generating high quality music that complements the visual content of a v...
We propose a self-supervised learning method for long text documents bas...
The recent advances in representation learning inspire us to take on the...
X-ray computed tomography (CT) is one of the most common imaging techniq...
Multimodal learning can benefit from the representation power of pretrai...
Neural Radiance Fields (NeRF) achieves photo-realistic image rendering f...
Music tagging and content-based retrieval systems have traditionally bee...
Implicit feedback has been widely used to build commercial recommender
s...
The task of predicting future actions from a video is crucial for a
real...
Self-supervised learning has drawn attention through its effectiveness i...
In order to perform unconditional video generation, we must learn the
di...
Recently, vector-quantized image modeling has demonstrated impressive
pe...
Session-based recommendation aims at predicting the next item given a
se...
Top-N recommendation is a challenging problem because complex and sparse...
Identifying a short segment in a long video that semantically matches a ...
Video generation models often operate under the assumption of fixed fram...
Although essential to revealing biased performance, well intentioned att...
Graph Convolutional Networks (GCNs) have shown significant improvements ...
Many recent advancements in Computer Vision are attributed to large data...
Matrix approximation is a common tool in machine learning for building
a...
Collaborative filtering is a rapidly advancing research area. Every year...