The ability to accurately recognize, localize and separate sound sources...
Audio-visual source localization is a challenging task that aims to pred...
Unsupervised audio-visual source localization aims at localizing visible...
We present a self-supervised learning method to learn audio and video
re...
We introduce a novel self-supervised pretext task for learning
represent...
Image hash codes are produced by binarizing the embeddings of convolutio...
Long-tail recognition tackles the natural non-uniformly distributed data...
We present a self-supervised learning approach to learn audio-visual
rep...
Real-world applications of object recognition often require the solution...
We introduce an approach to convert mono audio recorded by a 360 video c...
The role of semantics in zero-shot learning is considered. The effective...