A holistic understanding of object properties across diverse sensory
mod...
Multi-modal contrastive learning techniques in the audio-text domain hav...
Localizing visual sounds consists on locating the position of objects th...
Audio applications involving environmental sound analysis increasingly u...
Music information is often conveyed or recorded across multiple data
mod...
Deep learning is very data hungry, and supervised learning especially
re...
We present SONYC-UST-V2, a dataset for urban sound tagging with
spatiote...
Semantic code search is the task of retrieving relevant code given a nat...