Stavros Petridis

research

∙ 07/10/2023

SparseVSR: Lightweight and Noise Robust Visual Speech Recognition

Recent advances in deep neural networks have achieved unprecedented succ...

0 Adriana Fernandez-Lopez, et al. ∙

research

∙ 05/15/2023

Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models

Speech-driven animation has gained significant traction in recent years,...

0 Antoni Bigata Casademunt, et al. ∙

research

∙ 05/05/2023

Is dataset condensation a silver bullet for healthcare data sharing?

Safeguarding personal information is paramount for healthcare data shari...

13 Yujiang Wang, et al. ∙

research

∙ 03/30/2023

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

Recently reported state-of-the-art results in visual speech recognition ...

0 Xubo Liu, et al. ∙

research

∙ 03/25/2023

Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels

Audio-visual speech recognition has received a lot of attention due to i...

0 Pingchuan Ma, et al. ∙

research

∙ 03/14/2023

Learning Cross-lingual Visual Speech Representations

Cross-lingual self-supervised learning has been a growing research topic...

0 Andreas Zinonos, et al. ∙

research

∙ 01/06/2023

Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation

Talking face generation has historically struggled to produce head movem...

0 Michał Stypułkowski, et al. ∙

research

∙ 12/12/2022

Jointly Learning Visual and Auditory Speech Representations from Raw Data

We present RAVEn, a self-supervised multi-modal approach to jointly lear...

0 Alexandros Haliassos, et al. ∙

research

∙ 11/20/2022

LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders

Audio-visual speech enhancement aims to extract clean speech from a nois...

0 Rodrigo Mira, et al. ∙

research

∙ 11/03/2022

Streaming Audio-Visual Speech Recognition with Alignment Regularization

Recognizing a word shortly after it is spoken is an important requiremen...

0 Pingchuan Ma, et al. ∙

research

∙ 10/20/2022

SS-VAERR: Self-Supervised Apparent Emotional Reaction Recognition from Video

This work focuses on the apparent emotional reaction recognition (AERR) ...

0 Marija Jegorova, et al. ∙

research

∙ 09/03/2022

Training Strategies for Improved Lip-reading

Several training strategies and temporal models have been recently propo...

10 Pingchuan Ma, et al. ∙

research

∙ 05/04/2022

SVTS: Scalable Video-to-Speech Synthesis

Video-to-speech synthesis (also known as lip-to-speech) refers to the tr...

11 Rodrigo Mira, et al. ∙

research

∙ 03/24/2022

Self-supervised Video-centralised Transformer for Video Face Clustering

This paper presents a novel method for face clustering in videos using a...

15 Yujiang Wang, et al. ∙

research

∙ 02/26/2022

Visual Speech Recognition for Multiple Languages in the Wild

Visual speech recognition (VSR) aims to recognise the content of speech ...

5 Pingchuan Ma, et al. ∙

research

∙ 01/18/2022

Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection

One of the most pressing challenges for the detection of face-manipulate...

3 Alexandros Haliassos, et al. ∙

research

∙ 10/18/2021

Domain Generalisation for Apparent Emotional Facial Expression Recognition across Age-Groups

Apparent emotional facial expression recognition has attracted a lot of ...

4 Rafael Poyiadzi, et al. ∙

research

∙ 06/16/2021

LiRA: Learning Visual Speech Representations from Audio through Self-supervision

The large amount of audiovisual content being shared online today has dr...

5 Pingchuan Ma, et al. ∙

research

∙ 04/27/2021

End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks

Video-to-speech is the process of reconstructing the audio speech from a...

12 Rodrigo Mira, et al. ∙

research

∙ 02/18/2021

DINO: A Conditional Energy-Based GAN for Domain Translation

Domain translation is the process of transforming data from one domain t...

22 Konstantinos Vougioukas, et al. ∙

research

∙ 02/12/2021

End-to-end Audio-visual Speech Recognition with Conformers

In this work, we present a hybrid CTC/Attention model based on a ResNet-...

12 Pingchuan Ma, et al. ∙

research

∙ 12/14/2020

Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection

Although current deep learning-based face forgery detectors achieve impr...

5 Alexandros Haliassos, et al. ∙

research

∙ 10/07/2020

Domain Adversarial Neural Networks for Dysarthric Speech Recognition

Speech recognition systems have improved dramatically over the last few ...

0 Dominika Woszczyk, et al. ∙

research

∙ 09/29/2020

Lip-reading with Densely Connected Temporal Convolutional Networks

In this work, we present the Densely Connected Temporal Convolutional Ne...

36 Pingchuan Ma, et al. ∙

research

∙ 07/13/2020

Towards practical lipreading with distilled and efficient models

Lipreading has witnessed a lot of progress due to the resurgence of neur...

0 Pingchuan Ma, et al. ∙

research

∙ 07/08/2020

Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision

The intuitive interaction between the audio and visual modalities is val...

45 Abhinav Shukla, et al. ∙

research

∙ 05/04/2020

Does Visual Self-Supervision Improve Learning of Speech Representations?

Self-supervised learning has attracted plenty of recent research interes...

17 Abhinav Shukla, et al. ∙

research

∙ 01/23/2020

Lipreading using Temporal Convolutional Networks

Lip-reading has attracted a lot of research attention lately thanks to a...

13 Brais Martinez, et al. ∙

research

∙ 01/13/2020

Visually Guided Self Supervised Learning of Speech Representations

Self supervised representation learning has recently attracted a lot of ...

32 Abhinav Shukla, et al. ∙

research

∙ 12/18/2019

Detecting Adversarial Attacks On Audio-Visual Speech Recognition

Adversarial attacks pose a threat to deep learning models. However, rese...

22 Pingchuan Ma, et al. ∙

research

∙ 12/12/2019

Speech-driven facial animation using polynomial fusion of features

Speech-driven facial animation involves using a speech signal to generat...

15 Triantafyllos Kefalas, et al. ∙

research

∙ 11/14/2019

Towards Pose-invariant Lip-Reading

Lip-reading models have been significantly improved recently thanks to p...

19 Shiyang Cheng, et al. ∙

research

∙ 06/14/2019

Video-Driven Speech Reconstruction using Generative Adversarial Networks

Speech is a means of communication which relies on both audio and visual...

2 Konstantinos Vougioukas, et al. ∙

research

∙ 06/14/2019

Realistic Speech-Driven Facial Animation with GANs

Speech-driven facial animation is the process that automatically synthes...

4 Konstantinos Vougioukas, et al. ∙

research

∙ 06/05/2019

Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition

Several audio-visual speech recognition models have been recently propos...

3 Pingchuan Ma, et al. ∙

research

∙ 04/02/2019

End-to-End Visual Speech Recognition for Small-Scale Datasets

Traditional visual speech recognition systems consist of two stages, fea...

6 Stavros Petridis, et al. ∙

research

∙ 09/28/2018

Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture

Recent works in speech recognition rely either on connectionist temporal...

0 Stavros Petridis, et al. ∙

research

∙ 07/19/2018

Transfer Learning for Action Unit Recognition

This paper presents a classifier ensemble for Facial Expression Recognit...

2 Yen Khye Lim, et al. ∙

research

∙ 05/23/2018

End-to-End Speech-Driven Facial Animation with Temporal GANs

Speech-driven facial animation is the process which uses speech signals ...

0 Konstantinos Vougioukas, et al. ∙

research

∙ 04/10/2018

A real-time and unsupervised face Re-Identification system for Human-Robot Interaction

In the context of Human-Robot Interaction (HRI), face Re-Identification ...

0 Yujiang Wang, et al. ∙

research

∙ 02/18/2018

End-to-end Audiovisual Speech Recognition

Several end-to-end deep learning approaches have been recently presented...

0 Stavros Petridis, et al. ∙

research

∙ 02/18/2018

Visual-Only Recognition of Normal, Whispered and Silent Speech

Silent speech interfaces have been recently proposed as a way to enable ...

0 Stavros Petridis, et al. ∙

research

∙ 09/12/2017

End-to-End Audiovisual Fusion with LSTMs

Several end-to-end deep learning approaches have been recently presented...

0 Stavros Petridis, et al. ∙

research

∙ 03/24/2017

Local Deep Neural Networks for Age and Gender Classification

Local deep neural networks have been recently introduced for gender reco...

0 Zukang Liao, et al. ∙

research

∙ 01/20/2017

End-To-End Visual Speech Recognition With LSTMs

Traditional visual speech recognition systems consist of two stages, fea...

0 Stavros Petridis, et al. ∙

