Tomohiro Tanaka

09/04/2023

Attention as Annotation: Generating Images and Pseudo-masks for Weakly Supervised Semantic Segmentation with Diffusion

Although recent advancements in diffusion models enabled high-fidelity a...

Ryota Yoshihashi, et al.

06/14/2023

SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?

Self-supervised learning (SSL) for speech representation has been succes...

Takanori Ashihara, et al.

06/07/2023

Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization

End-to-end speech summarization (E2E SSum) directly summarizes input spe...

Kohei Matsuura, et al.

06/04/2023

End-to-End Joint Target and Non-Target Speakers ASR

This paper proposes a novel automatic speech recognition (ASR) system th...

Ryo Masumura, et al.

05/24/2023

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss

Self-supervised learning (SSL) is the latest breakthrough in speech proc...

Hiroshi Sato, et al.

05/09/2023

Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models

Self-supervised learning (SSL) has been dramatically successful not only...

Takanori Ashihara, et al.

03/02/2023

Leveraging Large Text Corpora for End-to-End Speech Summarization

End-to-end speech summarization (E2E SSum) is a technique to directly ge...

Kohei Matsuura, et al.

11/25/2022

Ladder Siamese Network: a Method and Insights for Multi-level Self-Supervised Learning

Siamese-network-based self-supervised learning (SSL) suffers from slow c...

Ryota Yoshihashi, et al.

07/14/2022

Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models

Self-supervised learning (SSL) is seen as a very promising approach with...

Takanori Ashihara, et al.

06/16/2022

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations

Target speech extraction is a technique to extract the target speaker's ...

Hiroshi Sato, et al.

11/24/2021

Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages

This paper presents a novel training method for end-to-end scene text re...

Shota Orihashi, et al.

11/22/2021

Hierarchical Knowledge Distillation for Dialogue Sequence Labeling

This paper presents a novel knowledge distillation method for dialogue s...

Shota Orihashi, et al.

07/07/2021

End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning

We propose a semi-supervised learning method for building end-to-end ric...

Tomohiro Tanaka, et al.

07/04/2021

Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition

We propose a cross-modal transformer-based neural correction model that...

Tomohiro Tanaka, et al.

07/04/2021

Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation

In this paper, we present a novel modeling method for single-channel mul...

Ryo Masumura, et al.

06/23/2021

Enrollment-less training for personalized voice activity detection

We present a novel personalized voice activity detection (PVAD) learning...

Naoki Makishima, et al.

06/23/2021

Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks using Switching Tokens

In this paper, we propose a novel spoken-text-style conversion method th...

Mana Ihori, et al.

06/10/2021

Context-Free TextSpotter for Real-Time and Mobile End-to-End Text Detection and Recognition

In the deployment of scene-text spotting systems on mobile platforms, li...

Ryota Yoshihashi, et al.

03/02/2021

Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss

We present an audio-visual speech separation learning method that consid...

Naoki Makishima, et al.

02/16/2021

End-to-End Automatic Speech Recognition with Deep Mutual Learning

This paper is the first study to apply deep mutual learning (DML) to end...

Ryo Masumura, et al.

02/16/2021

Large-Context Conversational Representation Learning: Self-Supervised Learning for Conversational Documents

This paper presents a novel self-supervised learning method for handling...

Ryo Masumura, et al.

02/16/2021

Hierarchical Transformer-based Large-Context End-to-end ASR with Large-Context Knowledge Distillation

We present a novel large-context end-to-end automatic speech recognition...

Ryo Masumura, et al.

02/15/2021

MAPGN: MAsked Pointer-Generator Network for sequence-to-sequence pre-training

This paper presents a self-supervised learning method for pointer-genera...

Mana Ihori, et al.

10/29/2020

Memory Attentive Fusion: External Language Model Integration for Transformer-based Sequence-to-Sequence Model

This paper presents a novel fusion method for integrating an external la...

Mana Ihori, et al.