Ryo Masumura

research

∙ 08/31/2023

Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff

This paper addresses the tradeoff between standard accuracy on clean exa...

0 Satoshi Suzuki, et al. ∙

research

∙ 06/04/2023

End-to-End Joint Target and Non-Target Speakers ASR

This paper proposes a novel automatic speech recognition (ASR) system th...

0 Ryo Masumura, et al. ∙

research

∙ 05/24/2023

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss

Self-supervised learning (SSL) is the latest breakthrough in speech proc...

0 Hiroshi Sato, et al. ∙

research

∙ 03/02/2023

Leveraging Large Text Corpora for End-to-End Speech Summarization

End-to-end speech summarization (E2E SSum) is a technique to directly ge...

0 Kohei Matsuura, et al. ∙

research

∙ 10/28/2022

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis

This paper investigates the effectiveness and implementation of modality...

0 Atsushi Ando, et al. ∙

research

∙ 07/11/2022

Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data

In this paper, we investigate the semi-supervised joint training of text...

0 Naoki Makishima, et al. ∙

research

∙ 06/16/2022

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations

Target speech extraction is a technique to extract the target speaker's ...

0 Hiroshi Sato, et al. ∙

research

∙ 02/21/2022

Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations

There have been many attempts to build multimodal dialog systems that ca...

0 Yoshihiro Yamazaki, et al. ∙

research

∙ 11/24/2021

Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages

This paper presents a novel training method for end-to-end scene text re...

0 Shota Orihashi, et al. ∙

research

∙ 11/22/2021

Hierarchical Knowledge Distillation for Dialogue Sequence Labeling

This paper presents a novel knowledge distillation method for dialogue s...

0 Shota Orihashi, et al. ∙

research

∙ 07/07/2021

End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning

We propose a semi-supervised learning method for building end-to-end ric...

0 Tomohiro Tanaka, et al. ∙

research

∙ 07/04/2021

Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition

We propose cross-modal transformer-based neural correction models that...

0 Tomohiro Tanaka, et al. ∙

research

∙ 07/04/2021

Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation

In this paper, we present a novel modeling method for single-channel mul...

0 Ryo Masumura, et al. ∙

research

∙ 06/23/2021

Enrollment-less training for personalized voice activity detection

We present a novel personalized voice activity detection (PVAD) learning...

0 Naoki Makishima, et al. ∙

research

∙ 06/23/2021

Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks using Switching Tokens

In this paper, we propose a novel spoken-text-style conversion method th...

0 Mana Ihori, et al. ∙

research

∙ 03/02/2021

Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss

We present an audio-visual speech separation learning method that consid...

0 Naoki Makishima, et al. ∙

research

∙ 02/16/2021

End-to-End Automatic Speech Recognition with Deep Mutual Learning

This paper is the first study to apply deep mutual learning (DML) to end...

0 Ryo Masumura, et al. ∙

research

∙ 02/16/2021

Large-Context Conversational Representation Learning: Self-Supervised Learning for Conversational Documents

This paper presents a novel self-supervised learning method for handling...

0 Ryo Masumura, et al. ∙

research

∙ 02/16/2021

Hierarchical Transformer-based Large-Context End-to-end ASR with Large-Context Knowledge Distillation

We present a novel large-context end-to-end automatic speech recognition...

0 Ryo Masumura, et al. ∙

research

∙ 02/15/2021

MAPGN: MAsked Pointer-Generator Network for sequence-to-sequence pre-training

This paper presents a self-supervised learning method for pointer-genera...

0 Mana Ihori, et al. ∙

research

∙ 10/29/2020

Memory Attentive Fusion: External Language Model Integration for Transformer-based Sequence-to-Sequence Model

This paper presents a novel fusion method for integrating an external la...

0 Mana Ihori, et al. ∙

research

∙ 07/01/2020

A Transformer-based Audio Captioning Model with Keyword Estimation

One of the problems with automated audio captioning (AAC) is the indeter...

0 Yuma Koizumi, et al. ∙

