Personalized text generation is an emerging research area that has attra...
One-shot neural architecture search (One-shot NAS) has been proposed as ...
The goal of Automatic Voice Over (AVO) is to generate speech in sync wit...
Large pre-trained models (LPMs), such as LLaMA and ViT-G, have shown
exc...
3D interacting hand pose estimation from a single RGB image is a challen...
In recent years, research on hyperspectral image (HSI) classification ha...
The empirical studies of Graph Neural Networks (GNNs) broadly take the
o...
In recent years, neural architecture search (NAS) has shown great
compet...
The pre-trained language model (e.g., BERT) based deep retrieval models
ac...
Conventional vocoders are commonly used as analysis tools to provide
int...
Adequate labeled data and expensive compute resources are the prerequisi...
In this paper, we formulate a novel task to synthesize speech in sync wi...
A novel multi-scale temporal convolutional network (TCN) and long short-...
In the past years, significant improvements in the field of neural
archi...
The content on the web is in a constant state of flux. New entities, iss...
Document layout comprises both structural and visual (e.g., font sizes)
in...
When trying to apply the recent advance of Natural Language Understandin...
Auto-ML pruning methods aim at searching a pruning strategy automaticall...
Search engines often follow a two-phase paradigm where in the first stag...
This paper presents a novel framework to build a voice conversion (VC) s...
Emotional voice conversion aims to convert the emotion of the speech fro...
Many information retrieval and natural language processing problems can ...
Urban anomalies may result in loss of life or property if not handled
pr...
Previous AutoML pruning works utilized individual layer features to
auto...
We describe our submitted system for the ZeroSpeech Challenge 2019. The
...
We investigated the training of a shared model for both text-to-speech (...