Recent progress in diffusion models has revolutionized the popular techn...
The visual dialog task requires an AI agent to interact with humans in m...
Knowing the reasoning chains from knowledge to the predicted answers can...
Resolving pronouns to their referents has long been studied as a fundame...
Grounding a pronoun to a visual object it refers to requires complex rea...
We propose a foreground segmentation algorithm that does foreground extr...