The field of generative models has recently witnessed significant progre...
Lip-to-speech involves generating natural-sounding speech synchronized...
Referring Expressions Generation (REG) aims to produce textual descripti...
Capturing the interesting components of an image is a key aspect of imag...