The generative AI revolution has recently expanded to videos. Neverthele...
Current perceptual similarity metrics operate at the level of pixels and...
Large vision-language models (VLMs), such as CLIP, learn rich joint
imag...
Recent advances in text-to-image generation with diffusion models presen...
We present Neural Congealing – a zero-shot self-supervised framework for...
We propose a method for text-driven perpetual view generation – synthesi...
Large-scale text-to-image generative models have been a revolutionary
br...
Text-conditioned image editing has recently attracted considerable inter...
GANs are able to perform generation and manipulation tasks, trained on a...
We present a method for zero-shot, text-driven appearance manipulation i...
We present a method for semantically transferring the visual appearance ...
We leverage deep features extracted from a pre-trained Vision Transforme...
We present a method that decomposes, or "unwraps", an input video into a...
Most advanced video generation and manipulation methods train on a large...
We present a method to estimate depth of a dynamic scene, containing
arb...
Computer vision is increasingly effective at segmenting objects in image...
We present a method for retiming people in an ordinary, natural
video—ma...
We wish to automatically predict the "speediness" of moving objects in
v...
We present a novel GAN-based model that utilizes the space of deep featu...
How much can we infer about a person's looks from the way they speak? In...
We introduce SinGAN, an unconditional generative model that can be learn...
We present a method for predicting dense depth in scenarios where both a...
We present a system that allows users to visualize complex human motion ...
We present a joint audio-visual model for isolating a single speech sign...
We study the problem of reconstructing an image from information stored ...
We propose a novel method for template matching in unconstrained
environ...