PTE Speech Grading

#Education

Automated ML solution for English speech transcription and grading.

Data Analysis
Machine Learning

Impact

The solution grades spoken English automatically, quickly, and reliably, helping learners study English and improve their skills.
It is built to scale, so it can serve a growing number of users.

Services we provided

An automated ML solution for transcribing and grading English audio, utilizing MFCC features and Whisper’s architecture to enable efficient and scalable speech assessment.

Tech Stack

Python

TensorFlow

Hugging Face

Pandas

NumPy

Flask

Challenges and Solutions

🧐 Challenges

Procuring data for training the model to accurately grade input audio
Developing and training the models for transcribing audio and grading speech
Creating a pipeline for processing audio of varying length
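The case study does not say how audio of varying length was handled. One common approach, shown here as a minimal sketch under stated assumptions, is to split each recording into fixed 30-second windows at 16 kHz (the input size Whisper models are built around), zero-pad the last window, grade each window, and combine the per-window grades weighted by how much real audio each window holds. The function names and the length-weighted averaging are illustrative choices, not the project's documented method.

```python
import numpy as np

SAMPLE_RATE = 16_000                 # Whisper models expect 16 kHz mono audio
CHUNK_SAMPLES = 30 * SAMPLE_RATE     # one 30-second window


def split_into_chunks(audio, chunk_samples=CHUNK_SAMPLES):
    """Split a 1-D waveform into fixed-length windows, zero-padding the last one.

    Returns (chunks, lengths): chunks has shape (n_chunks, chunk_samples) and
    lengths[i] is the number of real (non-padding) samples in chunk i.
    (Hypothetical helper, not taken from the project.)
    """
    audio = np.asarray(audio, dtype=np.float32).ravel()
    if audio.size == 0:
        raise ValueError("audio is empty")
    n_chunks = -(-audio.size // chunk_samples)  # ceiling division
    padded = np.zeros(n_chunks * chunk_samples, dtype=np.float32)
    padded[: audio.size] = audio
    lengths = [min(chunk_samples, audio.size - i * chunk_samples) for i in range(n_chunks)]
    return padded.reshape(n_chunks, chunk_samples), lengths


def aggregate_grades(chunk_grades, lengths):
    """Average per-chunk grades, weighting each chunk by its amount of real audio."""
    grades = np.asarray(chunk_grades, dtype=np.float64)
    weights = np.asarray(lengths, dtype=np.float64)
    return float(np.sum(grades * weights) / np.sum(weights))
```

For a 70-second recording this yields three windows holding 30 s, 30 s, and 10 s of speech, so the grade of the short final window counts for a third as much as each full window and its padding does not drag the overall score down.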

💡 Solutions

Scraped diverse English-learning data and trained rubric-specific models. The architecture consisted of:

Extracting MFCC features from the audio.
Encoding them into an embedding with the encoder from OpenAI’s Whisper.
Decoding the embedding through several convolutional layers with residual connections, followed by several dense layers that output the final grade.
Producing the speech transcription with OpenAI’s base Whisper model.
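The MFCC extraction step can be illustrated in plain NumPy. This is a minimal sketch, not the project's code: the 25 ms frames, 10 ms hop, 40 mel filters, and 13 coefficients are common textbook defaults rather than values from the case study, and the source does not say how the coefficients were shaped to match the input the Whisper encoder expects (Whisper's own feature extractor produces log-Mel spectrograms, which correspond to the step just before the DCT below). In practice a library routine such as librosa.feature.mfcc would usually replace this hand-rolled version.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    bin_freqs = np.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2), n_mels + 2))
    fb = np.zeros((n_mels, bin_freqs.size))
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def dct_matrix(n_in, n_out):
    """Orthonormal DCT-II basis, shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.cos(np.pi / n_in * (n + 0.5) * k)
    basis[0] *= np.sqrt(1.0 / n_in)
    basis[1:] *= np.sqrt(2.0 / n_in)
    return basis


def mfcc(audio, sample_rate=16_000, n_fft=400, hop=160, n_mels=40, n_mfcc=13):
    """Return MFCCs of shape (n_frames, n_mfcc) for a 1-D waveform."""
    x = np.asarray(audio, dtype=np.float64).ravel()
    if x.size < n_fft:  # guarantee at least one full frame
        x = np.pad(x, (0, n_fft - x.size))
    n_frames = 1 + (x.size - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(n_fft)                       # Hann-windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft  # power spectrum per frame
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)                      # log-mel spectrogram
    return log_mel @ dct_matrix(n_mels, n_mfcc).T             # decorrelate with a DCT
```

One second of 16 kHz audio gives 98 frames of 13 coefficients with these settings; changing hop or n_fft changes the frame count accordingly.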
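The decoding head (convolutional layers with residual connections, then dense layers) can be illustrated with a framework-free NumPy forward pass that makes the data flow visible. It is only a sketch: the project built its models in TensorFlow, and the number of blocks, kernel size, hidden width, ReLU activations, the averaging over time between the convolutional and dense layers, and the single scalar output per rubric model are all assumptions. All function names are hypothetical, and the weights are random, so the returned number is meaningless until the head is trained.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def conv1d_same(x, w, b):
    """1-D convolution over time with 'same' zero padding.

    x: (time, c_in), w: (kernel, c_in, c_out), b: (c_out,) -> (time, c_out)
    """
    k = w.shape[0]
    left = k // 2
    xp = np.pad(x, ((left, k - 1 - left), (0, 0)))
    windows = np.stack([xp[i : i + x.shape[0]] for i in range(k)], axis=1)  # (time, k, c_in)
    return np.einsum("tkc,kco->to", windows, w) + b


def init_params(d_model, n_blocks=2, kernel=3, hidden=64, seed=0):
    """Random (untrained) weights for a hypothetical grading head."""
    rng = np.random.default_rng(seed)
    blocks = [
        {
            "w1": rng.normal(0.0, 0.02, (kernel, d_model, d_model)), "b1": np.zeros(d_model),
            "w2": rng.normal(0.0, 0.02, (kernel, d_model, d_model)), "b2": np.zeros(d_model),
        }
        for _ in range(n_blocks)
    ]
    dense = [
        (rng.normal(0.0, 0.02, (d_model, hidden)), np.zeros(hidden)),
        (rng.normal(0.0, 0.02, (hidden, 1)), np.zeros(1)),
    ]
    return {"blocks": blocks, "dense": dense}


def grade_from_embedding(embedding, params):
    """Map an encoder embedding of shape (time, d_model) to one scalar grade."""
    x = np.asarray(embedding, dtype=np.float64)
    for blk in params["blocks"]:
        h = relu(conv1d_same(x, blk["w1"], blk["b1"]))
        h = conv1d_same(h, blk["w2"], blk["b2"])
        x = relu(x + h)                    # residual connection around two convs
    pooled = x.mean(axis=0)                # average over time -> (d_model,)
    (w_h, b_h), (w_o, b_o) = params["dense"]
    hidden = relu(pooled @ w_h + b_h)      # dense layer 1
    return float((hidden @ w_o + b_o)[0])  # dense layer 2: one grade
```

For the Whisper base encoder, a 30-second window produces an embedding of shape (1500, 512), so the head would be created with init_params(512). The residual additions keep each block's input and output the same width, which is why every convolution here maps d_model channels to d_model channels.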

User flow

1. The user is presented with a text or listens to a recording of a text.

2. The user reads the text aloud or repeats the recording.

3. Our solution processes the user's speech and produces a transcription and grades.

4. The user is presented with the results and receives advice on how to improve.