Keisuke Kinoshita

research

∙ 01/31/2023

Neural Target Speech Extraction: An Overview

Humans can listen to a target speaker even in challenging acoustic condi...

Katerina Zmolikova, et al.

research

∙ 11/29/2022

On Word Error Rate Definitions and their Efficient Computation for Multi-Speaker Speech Recognition Systems

We present a general framework to compute the word error rate (WER) of A...

Thilo von Neumann, et al.

research

∙ 07/28/2022

Utterance-by-utterance overlap-aware neural diarization with Graph-PIT

Recent speaker diarization studies showed that integration of end-to-end...

Keisuke Kinoshita, et al.

research

∙ 06/16/2022

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations

Target speech extraction is a technique to extract the target speaker's ...

Hiroshi Sato, et al.

research

∙ 04/11/2022

Listen only to me! How well can target speech extraction handle false alarms?

Target speech extraction (TSE) extracts the speech of a target speaker i...

Marc Delcroix, et al.

research

∙ 04/08/2022

SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning

In many situations, we would like to hear desired sound events (SEs) whi...

Marc Delcroix, et al.

research

∙ 03/31/2022

Subjective intelligibility of speech sounds enhanced by ideal ratio mask via crowdsourced remote experiments with effective data screening

It is essential to perform speech intelligibility (SI) experiments with ...

Ayako Yamamoto, et al.

research

∙ 02/14/2022

Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model

Speaker diarization has been investigated extensively as an important ce...

Keisuke Kinoshita, et al.

research

∙ 01/11/2022

Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition

The combination of a deep neural network (DNN)-based speech enhancement...

Hiroshi Sato, et al.

research

∙ 11/20/2021

Switching Independent Vector Analysis and Its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithm

This paper develops a framework that can perform denoising, dereverberat...

Tomohiro Nakatani, et al.

research

∙ 10/29/2021

SA-SDR: A novel loss function for separation of meeting style data

Many state-of-the-art neural network-based source separation systems use...

Thilo von Neumann, et al.

research

∙ 08/04/2021

Blind and neural network-guided convolutional beamformer for joint denoising, dereverberation, and source separation

This paper proposes an approach for optimizing a Convolutional BeamForme...

Tomohiro Nakatani, et al.

research

∙ 07/30/2021

Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers

Automatic transcription of meetings requires handling of overlapped spee...

Thilo von Neumann, et al.

research

∙ 07/30/2021

Speeding Up Permutation Invariant Training for Source Separation

Permutation invariant training (PIT) is a widely used training criterion...

Thilo von Neumann, et al.

research

∙ 06/14/2021

Few-shot learning of new sound classes for target sound extraction

Target sound extraction consists of extracting the sound of a target aco...

Marc Delcroix, et al.

research

∙ 06/07/2021

PILOT: Introducing Transformers for Probabilistic Sound Event Localization

Sound event localization aims at estimating the positions of sound sourc...

Christopher Schymura, et al.

research

∙ 06/02/2021

Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition

Although recent advances in deep learning technology improved automatic ...

Hiroshi Sato, et al.

research

∙ 05/19/2021

Advances in integration of end-to-end neural and clustering-based diarization for real conversational speech

Recently, we proposed a novel speaker diarization method called End-to-E...

Keisuke Kinoshita, et al.

research

∙ 04/17/2021

Comparison of remote experiments using crowdsourcing and laboratory experiments on speech intelligibility

Many subjective experiments have been performed to develop objective spe...

Ayako Yamamoto, et al.

research

∙ 02/28/2021

Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization

Sound event localization frameworks based on deep neural networks have s...

Christopher Schymura, et al.

research

∙ 02/23/2021

Dual-Path Modeling for Long Recording Speech Separation in Meetings

The continuous speech separation (CSS) is a task to separate the speech ...

Chenda Li, et al.

research

∙ 02/23/2021

Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain

Estimating the positions of multiple speakers can be helpful for tasks l...

Julio Wissing, et al.

research

∙ 02/23/2021

End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend

Recently, the end-to-end approach has been successfully applied to multi...

Wangyou Zhang, et al.

research

∙ 02/02/2021

Multimodal Attention Fusion for Target Speaker Extraction

Target speaker extraction, which aims at extracting a target speaker's v...

Hiroshi Sato, et al.

research

∙ 01/14/2021

Speaker activity driven neural speech extraction

Target speech extraction, which extracts the speech of a target speaker ...

Marc Delcroix, et al.

research

∙ 01/12/2021

Neural Network-based Virtual Microphone Estimator

Developing microphone array technologies for a small number of microphon...

Tsubasa Ochiai, et al.

research

∙ 12/17/2020

Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording

Leveraging additional speaker information to facilitate speech separatio...

Cong Han, et al.

research

∙ 11/30/2020

Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation

Time-domain training criteria have proven to be very effective for the s...

Christoph Boeddeker, et al.

research

∙ 10/26/2020

Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds

Recent diarization technologies can be categorized into two approaches, ...

Keisuke Kinoshita, et al.

research

∙ 06/24/2020

Multi-path RNN for hierarchical modeling of long sequential data and its application to speaker stream separation

Recently, the source separation performance was greatly improved by time...

Keisuke Kinoshita, et al.

research

∙ 06/10/2020

Listen to What You Want: Neural Network-based Universal Sound Selector

Being able to control the acoustic events (AEs) to which we want to list...

Tsubasa Ochiai, et al.

research

∙ 06/04/2020

Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR

Most approaches to multi-talker overlapped speech separation and recogni...

Thilo von Neumann, et al.

research

∙ 05/20/2020

Jointly optimal denoising, dereverberation, and source separation

This paper proposes methods that can optimize a Convolutional BeamFormer...

Tomohiro Nakatani, et al.

research

∙ 05/10/2020

Cognitive-driven convolutional beamforming using EEG-based auditory attention decoding

The performance of speech enhancement algorithms in a multi-speaker scen...

Ali Aroudi, et al.

research

∙ 03/09/2020

Improving noise robust automatic speech recognition with single-channel time-domain enhancement network

With the advent of deep learning, research on noise-robust automatic spe...

Keisuke Kinoshita, et al.

research

∙ 03/09/2020

Tackling real noisy reverberant meetings with all-neural source separation, counting, and diarization system

Automatic meeting analysis is an essential fundamental technology requir...

Keisuke Kinoshita, et al.

research

∙ 01/23/2020

Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam

Target speech extraction, which extracts a single target source in a mix...

Marc Delcroix, et al.

research

∙ 12/18/2019

End-to-end training of time domain audio separation and recognition

The rising interest in single-channel multi-speaker speech separation sp...

Thilo von Neumann, et al.

research

∙ 10/30/2019

Jointly optimal dereverberation and beamforming

We previously proposed an optimal (in the maximum likelihood sense) conv...

Christoph Boeddeker, et al.

research

∙ 08/06/2019

Maximum likelihood convolutional beamformer for simultaneous denoising and dereverberation

This article describes a probabilistic formulation of a Weighted Power m...

Tomohiro Nakatani, et al.

research

∙ 04/03/2019

GEDI: Gammachirp Envelope Distortion Index for Predicting Intelligibility of Enhanced Speech

In this study, we proposed a new concept, gammachirp envelope distortion...

Katsuhiko Yamamoto, et al.

research

∙ 02/21/2019

All-neural online source separation, counting, and diarization for meeting analysis

Automatic meeting analysis comprises the tasks of speaker counting, spea...

Thilo von Neumann, et al.

research

∙ 12/20/2018

A unified convolutional beamformer for simultaneous denoising and dereverberation

This paper proposes a method for estimating a convolutional beamformer t...

Tomohiro Nakatani, et al.
