The development of technologies for easily and automatically falsifying ...
Large-scale pre-trained Vision Language (VL) models have shown remar...
Deepfakes pose a serious threat to our digital society by fueling the sp...
Deep convolutional networks have recently achieved great success in vide...
The self-attention-based model, the transformer, has recently become the le...
Recent advances in representation learning have demonstrated an ability ...
Multi-modal learning, which focuses on utilizing various modalities to i...
When people observe events, they are able to abstract key information an...
The pixels in an image, and the objects, scenes, and actions that they c...
We investigate the problem of zero-shot semantic image painting. Instead...
Network quantization has rapidly become one of the most widely used meth...
Performing inference on deep learning models for videos remains a challe...
Temporal modelling is the key for efficient video action recognition. Wh...
In recent years, a number of approaches based on 2D CNNs and 3D CNNs hav...
A key capability of an intelligent system is deciding when events from p...
Identifying common patterns among events is a key ability in human and m...
Action recognition is an open and challenging problem in computer vision...
An event happening in the world is often made of different activities an...
Objects are entities we act upon, where the functionality of an object i...
We introduce a framework that uses Generative Adversarial Networks (GANs...
Sensing surroundings is ubiquitous and effortless to humans: It takes a ...
In the last decade, artificial intelligence (AI) models inspired by the ...
Widely used in news, business, and educational media, infographics are h...
We present the Moments in Time Dataset, a large-scale human-annotated co...
The success of recent deep convolutional neural networks (CNNs) depends ...
We introduce the problem of visual hashtag discovery for infographics: e...
We propose a general framework called Network Dissection for quantifying...
The rise of multi-million-item dataset initiatives has enabled data-hung...
How best to evaluate a saliency model's ability to predict where humans ...
The complex multi-stage architecture of cortical visual pathways provide...
In this work, we revisit the global average pooling layer proposed in [1...
With the success of new computational architectures for visual processin...
Although the human visual system can recognize many concepts under chall...