While the performance of cross-lingual TTS based on monolingual corpora ...
The Organic Rankine Cycle (ORC) is widely used in industrial waste heat ...
Maneuvering target tracking will be an important service of future wirel...
This paper describes the NPU-MSXF system for the IWSLT 2023 speech-to-sp...
Tractography traces the peak directions extracted from fiber orientation...
To let mobile robots travel flexibly through complicated environmen...
Automatic classification of electrocardiogram (ECG) signals plays a cruc...
Direct speech-to-speech translation (S2ST) has gradually become popular ...
The recently proposed serialized output training (SOT) simplifies multi-...
Voice conversion is an increasingly popular technology, and the growing ...
Real-world complex acoustic environments, especially the ones with a low ...
In the ICASSP 2023 Speech Signal Improvement Challenge, we developed a dual-...
It is difficult for an end-to-end (E2E) ASR system to recognize words su...
Text-to-speech (TTS) and singing voice synthesis (SVS) aim at generating...
This report describes the NPU-HC speaker verification system submitted t...
Self-supervised speech pre-training empowers the model with the contextu...
This paper aims to synthesize the target speaker's speech with desired speak...
Speech data on the Internet are proliferating exponentially because of t...
Background sound is an informative form of art that is helpful in provid...
The end-to-end singing voice synthesis (SVS) model VISinger can achieve bett...
This paper summarizes the outcomes from the ISCSLP 2022 Intelligent Cock...
Recent development of neural vocoders based on the generative adversaria...
In the current two-stage neural text-to-speech (TTS) paradigm, it is ideal t...
Keyword spotting (KWS) enables speech-based user interaction and gradual...
This paper describes the TSUP team's submission to the ISCSLP 2022 conve...
Recently, multi-channel speech enhancement has drawn much interest due t...
Recently, cross-channel attention, which better leverages multi-channel s...
This paper presents the NWPU-ASLP speaker anonymization system for Voice...
Recent advancements in neural end-to-end TTS models have shown high-qual...
Boosting the runtime performance of deep neural networks (DNNs) is criti...
The zero-shot scenario for speech generation aims at synthesizing a nove...
This paper describes the NPU system submitted to the Spoofing Aware Speaker ...
Customized keyword spotting (KWS) has great potential to be deployed on ...
Speech command recognition (SCR) has been commonly used on resource cons...
Leveraging context information is an intuitive idea to improve performan...
Building a voice conversion system for noisy target speakers, such as us...
The ideal goal of voice conversion is to convert the source speaker's sp...
Speaker adaptation in text-to-speech synthesis (TTS) is to finetune a pr...
With the development of innovative applications that demand accurate env...
Deep neural networks (DNNs) have shown promising results for acoustic ec...
Sensing will be an important service for future wireless networks to ass...
General accent recognition (AR) models tend to directly extract low-leve...
In this paper, we conduct a comparative study on speaker-attributed auto...
Building a high-quality singing corpus for a person who is not good at s...
Recently, we made available WeNet, a production-oriented end-to-end spee...
DeepFake-based digital facial forgery is threatening the public media se...
Talking face generation with great practical significance has attracted ...
Multi-modal speech separation has exhibited a specific advantage o...
Active speaker detection and speech enhancement have become two increasi...
Conversational automatic speech recognition (ASR) is a task to recognize...