In this paper, we propose a Vision-Audio-Language Omni-peRception pretra...
As a combination of visual and audio signals, video is inherently multi-...
Motion, scene and object are three primary visual components of a video....
We propose a new inference method for multiple change-point detection in...
Although face swapping has attracted much attention in recent years, it ...
This paper presents a new approach to estimation and inference in panel ...
Compared with image scene parsing, video scene parsing introduces tempor...
In this paper, we propose an Omni-perception Pre-Trainer (OPT) for cross...
Face sketch synthesis has made significant progress with the development...
This paper studies the estimation of network connectedness with focally ...
Video semantic segmentation requires utilizing the complex temporal rel...
Image captioning transforms complex visual information into abstract nat...
Estimating spot covariance is an important issue to study, especially wi...
We consider the estimation and inference in a system of high-dimensional...