A recent trend in deep learning algorithms has been towards training lar...
We propose a new paradigm to automatically generate training data with
a...
Due to the increase in computational resources and accessibility of data...
Current text-to-image generation models often struggle to follow textual...
A rich representation is key to general robotic manipulation, but existi...
We propose EM-PASTE: an Expectation Maximization(EM) guided Cut-Paste
co...
Simulating realistic sensors is a challenging part in data generation fo...
Training computer vision models usually requires collecting and labeling...
Weakly supervised object detection (WSOD) enables object detectors to be...
3D scene graphs (3DSGs) are an emerging description; unifying symbolic,
...
Joint visual and language modeling on large-scale datasets has recently ...
We have seen a great progress in video action recognition in recent year...
Object cut-and-paste has become a promising approach to efficiently gene...
The ability to integrate context, including perceptual and temporal cues...
Contrastive learning relies on an assumption that positive pairs contain...
State-of-the-art methods for self-supervised sequential action alignment...
Acoustic scattering is strongly influenced by boundary geometry of objec...
Although 3D Convolutional Neural Networks (CNNs) are essential for most
...
Simulation is increasingly being used for generating large labelled data...
It was recently shown that the structure of convolutional neural network...
Diffracted scattering and occlusion are important acoustic effects in
in...
Machines are a long way from robustly solving open-world perception-cont...
Standard 3D reconstruction pipelines assume stationary world, therefore
...
The risk of unauthorized remote access of streaming video from networked...
We present an approach to synthesize highly photorealistic images of 3D
...
Recent progress in computer vision has been driven by high-capacity mode...
We present an open-source, real-time implementation of SemanticPaint, a
...
Pixel-level labelling tasks, such as semantic segmentation, play a centr...
Humans describe images in terms of nouns and adjectives while algorithms...