PhD Studentship - Improve automatic speech recognition using emerging 3-D facial mapping technology

University of Southampton




Position type: Full time

Posted: 04 Feb 2020
Closing date: 31 May 2020

Full details:

Reference number: SOU-1237920DA

The facial recognition system based on the TrueDepth camera, recently introduced by Apple Inc. into iPhones and iPads, provides ground-breaking new technology for user identification, but it also has potential applications far beyond that.

In this PhD, we will develop new technology based on 3-D facial mapping to enhance both acoustic speech perception and automatic speech recognition.

As every smartphone user knows, it is often difficult to follow a phone conversation against a noisy background, and this is even more so for the roughly 250 million people worldwide who have a hearing loss. The objective of this PhD is to integrate the infrared and proximity sensors that the latest iPhones use for real-time face mapping, in order to improve audio-visual speech recognition and to enhance speech quality and intelligibility in phone conversations in busy environments.

The research will apply machine learning, specifically deep learning, to develop an effective end-to-end solution: an integrated speech and facial recognition algorithm. Through a precise feedback engine, this algorithm will then be used to improve recognition accuracy and communication in these settings.

The potential applications for such an improved, integrated audio-visual speech recognition system are manifold. They include better human-computer interaction for AI systems, support for autonomous speech therapy, and apps to support language learning.

This work builds on ongoing research in our ISVR (Institute of Sound and Vibration Research) labs, where we have 15 years' experience developing signal processing algorithms that reduce noise and enhance speech perception in noise. For the first time, this project widens the scope of our work into the audio-visual domain, combining two sensor modalities to make previously unused information accessible.

To our knowledge, this project is the first in the world to attempt this connection and to use 3-D visual data for speech enhancement. Known competitors (Oxford, UCL) use standard 2-D camera technology. By including the depth dimension, we can recover important information about the vocal tract and about lip and tongue movement that is lost with flat cameras.

Because this research is inherently interdisciplinary, spanning computer science, signal processing and hearing research, the project will draw both on current deep-learning methods for audio-visual speech recognition and on longer-established lipreading and natural language processing techniques. We will engage with researchers from ECS (deep learning), Psychology (multi-sensory research group) and Linguistics. The project fits the ISVR strategy of helping people communicate in difficult environments and connects the work of several research groups, so this PhD also has the potential to strengthen links between those groups.


Entry Requirements

A very good undergraduate degree (at least a UK 2:1 honours degree, or its international equivalent).


Closing date: applications should be received no later than 31 May 2020.


Funding: full tuition fees for EU/UK students and, for UK students, an enhanced tax-free stipend of £15,009 per annum for up to 3.5 years.


How To Apply

Applications should be made online, selecting “PhD Eng & Env (Full time)” as the programme. Please enter Stefan Bleeck as the proposed supervisor.

We aim to be an equal opportunities employer and welcome applications from all sections of the community.
