VoiceAI – AI Portfolio

Impact

This platform provides speech-to-speech translation by chaining speech recognition, text translation, and speech synthesis into a single pipeline. It is optimized for real-time processing and supports conversations between two users. Recognition uses Whisper, translation uses Helsinki-NLP models, and synthesis uses Coqui TTS and PaddleSpeech, all behind a Streamlit interface. Users record their speech and receive it translated into another language, which makes conversation across language barriers straightforward.

Services we provided

Speech-to-Speech translation solution.

Tech Stack

PyTorch

Hugging Face

PaddleSpeech

Whisper

Streamlit

EasyNMT

Coqui TTS

Challenges and Solutions

🧐 Challenges

To develop a pipeline that incorporates three different technologies: speech recognition, text-to-text translation, and speech synthesis.
To optimize the models so that processing runs at real-time speed.
To create an environment that can handle two users at the same time.
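
One common way to make "real-time speed" measurable is the real-time factor (RTF): processing time divided by the duration of the processed audio, where a value below 1.0 means a stage keeps up with live speech. The Python sketch below is an illustration under that definition, not the project's benchmarking code; the function names are hypothetical, and the timed call is a stand-in for a model invocation.

```python
import time
from typing import Callable


def real_time_factor(process: Callable[[], object], audio_seconds: float) -> float:
    """Return processing time divided by audio duration (the real-time factor)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    process()  # hypothetical stand-in for a transcription, translation or synthesis call
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


def is_real_time(rtf: float) -> bool:
    """A stage is real-time if it finishes faster than the audio lasts."""
    return rtf < 1.0


if __name__ == "__main__":
    # Simulated model call: 0.2 s of work on 2 s of audio gives an RTF of about 0.1.
    rtf = real_time_factor(lambda: time.sleep(0.2), audio_seconds=2.0)
    print(f"RTF = {rtf:.2f}, real-time: {is_real_time(rtf)}")
```

Measuring RTF per stage shows which of the three models dominates latency and therefore where optimization effort pays off.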

💡 Solutions

Our solution implements the complete processing pipeline and an environment in which two users can communicate.

For speech recognition, we used the Whisper model and an optimized version of it.
For text-to-text translation, we used models created by the University of Helsinki (Helsinki-NLP).
For speech synthesis, we used Coqui TTS and PaddleSpeech.
For the interface, we used Streamlit.
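
The three stages can be treated as interchangeable components chained in a fixed order. This is a minimal sketch assuming simple `transcribe` / `translate` / `synthesize` interfaces; these are hypothetical wrappers, not the actual Whisper, Hugging Face, Coqui TTS, or PaddleSpeech APIs, and the `Fake*` classes exist only to exercise the wiring without downloading models. The `opus_mt_model_id` helper follows the public `Helsinki-NLP/opus-mt-{src}-{tgt}` naming on the Hugging Face Hub, although not every language pair is published.

```python
from dataclasses import dataclass
from typing import Protocol


class Recognizer(Protocol):
    def transcribe(self, audio: bytes, language: str) -> str: ...


class Translator(Protocol):
    def translate(self, text: str, source: str, target: str) -> str: ...


class Synthesizer(Protocol):
    def synthesize(self, text: str, language: str, gender: str) -> bytes: ...


def opus_mt_model_id(source: str, target: str) -> str:
    """Build a Hub id in the Helsinki-NLP pattern, e.g. "Helsinki-NLP/opus-mt-en-de"."""
    return f"Helsinki-NLP/opus-mt-{source}-{target}"


@dataclass
class PipelineResult:
    transcript: str   # text in the speaker's language
    translation: str  # text in the target language
    audio: bytes      # synthesized speech in the target language


@dataclass
class SpeechToSpeechPipeline:
    recognizer: Recognizer
    translator: Translator
    synthesizer: Synthesizer

    def run(self, audio: bytes, source: str, target: str, gender: str) -> PipelineResult:
        # Stage 1: speech recognition (Whisper in the project).
        transcript = self.recognizer.transcribe(audio, source)
        # Stage 2: text-to-text translation (Helsinki-NLP models in the project).
        translation = self.translator.translate(transcript, source, target)
        # Stage 3: speech synthesis (Coqui TTS or PaddleSpeech in the project).
        speech = self.synthesizer.synthesize(translation, target, gender)
        return PipelineResult(transcript, translation, speech)


# Toy stand-ins (hypothetical) so the pipeline can run without any model.
class FakeRecognizer:
    def transcribe(self, audio: bytes, language: str) -> str:
        return audio.decode("utf-8")


class FakeTranslator:
    def translate(self, text: str, source: str, target: str) -> str:
        return f"[{source}->{target}] {text}"


class FakeSynthesizer:
    def synthesize(self, text: str, language: str, gender: str) -> bytes:
        return f"{gender}:{text}".encode("utf-8")


if __name__ == "__main__":
    pipeline = SpeechToSpeechPipeline(FakeRecognizer(), FakeTranslator(), FakeSynthesizer())
    result = pipeline.run(b"hello", source="en", target="de", gender="female")
    print(result.translation)            # [en->de] hello
    print(opus_mt_model_id("en", "de"))  # Helsinki-NLP/opus-mt-en-de
```

Keeping each stage behind a small interface makes it easy to swap one model, for example Whisper for an optimized variant, without touching the other stages.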

User flow

1. The user selects the input and output languages, along with the gender of the voice.

2. The user records the audio.

3. The recorded voice is transcribed into text in the original language and then translated into the chosen output language.

4. The system generates audio in the selected output language, spoken in a voice of the chosen gender.
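
For two users, the settings from step 1 can be mirrored: one user's input language is the other user's output language. The sketch below assumes that each participant speaks and listens in a single language and that the chosen voice gender belongs to the speaker. The `Participant` and `ConversationSession` names and the two gender options are hypothetical, not the project's actual session handling. Each recording would then go through steps 3 and 4 with the settings returned by `route`.

```python
from dataclasses import dataclass
from typing import Tuple

VOICE_GENDERS = ("female", "male")  # assumed options offered by the voice selector


@dataclass(frozen=True)
class Participant:
    name: str
    language: str      # language this user speaks and wants to hear
    voice_gender: str  # voice used when this user's speech is synthesized

    def __post_init__(self) -> None:
        if self.voice_gender not in VOICE_GENDERS:
            raise ValueError(f"unsupported voice gender: {self.voice_gender!r}")


@dataclass
class ConversationSession:
    first: Participant
    second: Participant

    def route(self, speaker_name: str) -> Tuple[str, str, str]:
        """Return (input_language, output_language, voice_gender) for one recording."""
        if speaker_name == self.first.name:
            speaker, listener = self.first, self.second
        elif speaker_name == self.second.name:
            speaker, listener = self.second, self.first
        else:
            raise KeyError(f"unknown participant: {speaker_name}")
        # The speaker's language is the input; the listener's language is the output.
        return speaker.language, listener.language, speaker.voice_gender


if __name__ == "__main__":
    session = ConversationSession(
        Participant("Anna", "en", "female"),
        Participant("Jonas", "de", "male"),
    )
    print(session.route("Anna"))   # ('en', 'de', 'female')
    print(session.route("Jonas"))  # ('de', 'en', 'male')
```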