Yi Ren

research

∙ 08/29/2023

C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model

Co-speech gesture generation is crucial for automatic digital avatar ani...

0 Longbin Ji, et al. ∙

research

∙ 08/24/2023

Sparks of Large Audio Models: A Survey and Outlook

This survey paper provides a comprehensive overview of the recent advanc...

0 Siddique Latif, et al. ∙

research

∙ 08/05/2023

Disentangled Counterfactual Reasoning for Unbiased Sequential Recommendation

Sequential recommender systems have achieved state-of-the-art recommenda...

0 Yi Ren, et al. ∙

research

∙ 07/14/2023

Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts

Zero-shot text-to-speech aims at synthesizing voices with unseen speech ...

0 Ziyue Jiang, et al. ∙

research

∙ 06/27/2023

GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech

Cross-lingual timbre and style generalizable text-to-speech (TTS) aims t...

0 Yahuan Cong, et al. ∙

research

∙ 06/06/2023

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias

Scaling text-to-speech to a large and wild dataset has been proven to be...

0 Ziyue Jiang, et al. ∙

research

∙ 06/06/2023

Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis

We are interested in a novel task, namely low-resource text-to-talking a...

0 Zhenhui Ye, et al. ∙

research

∙ 06/04/2023

Detector Guidance for Multi-Object Text-to-Image Generation

Diffusion models have demonstrated impressive performance in text-to-ima...

0 Luping Liu, et al. ∙

research

∙ 05/29/2023

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation

Large diffusion models have been successful in text-to-audio (T2A) synth...

0 Jiawei Huang, et al. ∙

research

∙ 05/28/2023

StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation

Direct speech-to-speech translation (S2ST) has gradually become popular ...

0 Kun Song, et al. ∙

research

∙ 05/24/2023

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation

Direct speech-to-speech translation (S2ST) aims to convert speech from o...

0 Rongjie Huang, et al. ∙

research

∙ 05/23/2023

FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models

Stutter removal is an essential scenario in the field of speech editing....

0 Ziyue Jiang, et al. ∙

research

∙ 05/18/2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training

Improving text representation has attracted much attention to achieve ex...

0 Zhenhui Ye, et al. ∙

research

∙ 05/16/2023

AMD: Autoregressive Motion Diffusion

Human motion generation aims to produce plausible human motion sequences...

0 Bo Han, et al. ∙

research

∙ 05/01/2023

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

Generating talking person portraits with arbitrary speech audio is a cru...

8 Zhenhui Ye, et al. ∙

research

∙ 04/25/2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Large language models (LLMs) have exhibited remarkable capabilities acro...

7 Rongjie Huang, et al. ∙

research

∙ 04/17/2023

Attributing Image Generative Models using Latent Fingerprints

Generative models have enabled the creation of contents that are indisti...

5 Guangyu Nie, et al. ∙

research

∙ 04/11/2023

Unbiased Pairwise Learning from Implicit Feedback for Recommender Systems without Biased Variance Control

Generally speaking, the model training for recommender systems can be ba...

0 Yi Ren, et al. ∙

research

∙ 03/24/2023

MUG: A General Meeting Understanding and Generation Benchmark

Listening to long video/audio recordings from video conferencing and onl...

5 Qinglin Zhang, et al. ∙

research

∙ 03/24/2023

Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG)

ICASSP2023 General Meeting Understanding and Generation Challenge (MUG) ...

0 Qinglin Zhang, et al. ∙

research

∙ 03/08/2023

Unbiased Learning to Rank with Biased Continuous Feedback

It is a well-known challenge to learn an unbiased ranker with biased fee...

0 Yi Ren, et al. ∙

research

∙ 02/28/2023

Item Cold Start Recommendation via Adversarial Variational Auto-encoder Warm-up

The gap between the randomly initialized item ID embedding and the well-...

0 Shenzheng Zhang, et al. ∙

research

∙ 02/24/2023

Slate-Aware Ranking for Recommendation

We see widespread adoption of slate recommender systems, where an ordere...

0 Yi Ren, et al. ∙

research

∙ 02/11/2023

How to prepare your task head for finetuning

In deep learning, transferring information from a pretrained network to ...

18 Yi Ren, et al. ∙

research

∙ 01/31/2023

GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis

Generating photo-realistic video portrait with arbitrary speech audio is...

3 Zhenhui Ye, et al. ∙

research

∙ 01/30/2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models

Large-scale multimodal generative modeling has created milestones in tex...

1 Rongjie Huang, et al. ∙

research

∙ 01/27/2023

Learning 6-DoF Fine-grained Grasp Detection Based on Part Affordance Grounding

Robotic grasping is a fundamental ability for a robot to interact with t...

6 Yaoxian Song, et al. ∙

research

∙ 01/25/2023

XNLI: Explaining and Diagnosing NLI-based Visual Data Analysis

Natural language interfaces (NLIs) enable users to flexibly specify anal...

5 Yingchaojie Feng, et al. ∙

research

∙ 01/22/2023

Dance2MIDI: Dance-driven multi-instruments music generation

Dance-driven music generation aims to generate musical pieces conditione...

0 Bo Han, et al. ∙

research

∙ 11/28/2022

Toward Global Sensing Quality Maximization: A Configuration Optimization Scheme for Camera Networks

The performance of a camera network monitoring a set of targets depends ...

0 Xuechao Zhang, et al. ∙

research

∙ 11/21/2022

Diffusion Denoising Process for Perceptron Bias in Out-of-distribution Detection

Out-of-distribution (OOD) detection is an important task to ensure the r...

0 Luping Liu, et al. ∙

research

∙ 11/19/2022

VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement

Video to sound generation aims to generate realistic and natural sound g...

0 Chenye Cui, et al. ∙

research

∙ 11/09/2022

Safety-Critical Optimal Control for Robotic Manipulators in A Cluttered Environment

Designing safety-critical control for robotic manipulators is challengin...

0 Xuda Ding, et al. ∙

research

∙ 11/01/2022

SDMuse: Stochastic Differential Music Editing and Generation via Hybrid Representation

While deep generative models have empowered music generation, it remains...

0 Chen Zhang, et al. ∙

research

∙ 09/01/2022

Video-Guided Curriculum Learning for Spoken Video Grounding

In this paper, we introduce a new task, spoken video grounding (SVG), wh...

0 Yan Xia, et al. ∙

research

∙ 07/31/2022

DA^2 Dataset: Toward Dexterity-Aware Dual-Arm Grasping

In this paper, we introduce DA^2, the first large-scale dual-arm dexteri...

8 Guangyao Zhai, et al. ∙

research

∙ 07/13/2022

ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech

Denoising diffusion probabilistic models (DDPMs) have recently achieved ...

0 Rongjie Huang, et al. ∙

research

∙ 06/05/2022

Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech

Polyphone disambiguation aims to capture accurate pronunciation knowledg...

0 Ziyue Jiang, et al. ∙

research

∙ 05/27/2022

Improving Item Cold-start Recommendation via Model-agnostic Conditional Variational Autoencoder

Embedding MLP has become a paradigm for modern large-scale recommend...

0 Xu Zhao, et al. ∙

research

∙ 05/25/2022

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation

Direct speech-to-speech translation (S2ST) systems leverage recent progr...

0 Rongjie Huang, et al. ∙

research

∙ 05/15/2022

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis

Style transfer for out-of-domain (OOD) speech synthesis aims to generate...

0 Rongjie Huang, et al. ∙

research

∙ 04/27/2022

SSR-GNNs: Stroke-based Sketch Representation with Graph Neural Networks

This paper follows cognitive studies to investigate a graph representati...

0 Sheng Cheng, et al. ∙

research

∙ 04/25/2022

SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech

The recent progress in non-autoregressive text-to-speech (NAR-TTS) has m...

0 Zhenhui Ye, et al. ∙

research

∙ 04/18/2022

Configuration-Aware Safe Control for Mobile Robotic Arm with Control Barrier Functions

Collision avoidance is a widely investigated topic in robotic applicatio...

0 Fan Ding, et al. ∙

research

∙ 03/04/2022

Better Supervisory Signals by Observing Learning Paths

Better-supervised models might have better performance. In this paper, w...

0 Yi Ren, et al. ∙

research

∙ 02/27/2022

Learning the Beauty in Songs: Neural Singing Voice Beautifier

We are interested in a novel task, singing voice beautifying (SVB). Give...

18 Jinglin Liu, et al. ∙

research

∙ 02/26/2022

Revisiting Over-Smoothness in Text to Speech

Non-autoregressive text to speech (NAR-TTS) models have attracted much a...

0 Yi Ren, et al. ∙

research

∙ 02/20/2022

Pseudo Numerical Methods for Diffusion Models on Manifolds

Denoising Diffusion Probabilistic Models (DDPMs) can generate high-quali...

0 Luping Liu, et al. ∙

research

∙ 02/17/2022

Attributable-Watermarking of Speech Generative Models

Generative models are now capable of synthesizing images, speeches, and ...

0 Yongbaek Cho, et al. ∙

research

∙ 02/16/2022

ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech

Expressive text-to-speech (TTS) has become a hot research topic recently...

0 Yi Ren, et al. ∙
