LLMs have demonstrated remarkable abilities at interacting with humans t...
Product Retrieval (PR) and Grounding (PG), aiming to seek image and obje...
Diffusion Probabilistic Models (DPMs) have shown a powerful capacity of ...
Denoising Diffusion Probabilistic Models (DDPMs) can generate high-quali...
With the rapid development of deep learning technology and improvement i...
Lip reading, aiming to recognize spoken sentences according to the given...
Existing reasoning tasks often have an important assumption that the inp...
Video moment retrieval aims to localize the target moment in a video ac...
Spatio-temporal video grounding aims to retrieve the spatio-temporal tub...
Video moment retrieval aims to find the moment that is most relevant to ...
Action localization in untrimmed videos is an important topic in the fie...
Open-ended video question answering aims to automatically generate the n...
Query-based moment retrieval aims to localize the most relevant moment i...