VoiceAI
- #Voicecloning
Real-time voice-to-voice translation system.
- Natural Language Processing

Impact
This platform combines speech recognition, text translation, and speech synthesis in a single pipeline, optimized for real-time speed and built for communication between two users. It uses Whisper, Helsinki translation models, Coqui TTS, and PaddleSpeech, with a Streamlit interface. A user records speech, and the system converts it into speech in the other user's language, enabling conversation across languages.
Services we provided
Speech-to-Speech translation solution.
Tech Stack
PyTorch
Hugging Face
PaddleSpeech
Whisper
Streamlit
EasyNMT
Coqui TTS
Challenges and Solutions
🧐 Challenges
- To develop a pipeline that incorporates three different technologies: speech recognition, text-to-text translation, and speech synthesis.
- To optimize the models so that processing runs at real-time speed.
- To create an environment that can handle two users at the same time.
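One common way to quantify the real-time goal above is the real-time factor (RTF): processing time divided by the duration of the audio, where a value below 1.0 means the system keeps up with the speaker. The sketch below is only an illustration under that assumption. The source does not say which metric the project used, and `real_time_factor` and `process` are hypothetical names.

```python
import time
from typing import Callable


def real_time_factor(
    process: Callable[[bytes], object],  # hypothetical: any pipeline stage or the whole pipeline
    audio: bytes,
    audio_seconds: float,
) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    process(audio)
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


# Stand-in for a real model call, used only so the sketch runs without any model.
def fake_stage(audio: bytes) -> str:
    time.sleep(0.05)
    return "recognized text"


rtf = real_time_factor(fake_stage, b"\x00" * 32000, audio_seconds=2.0)
print(f"RTF = {rtf:.3f} ({'real time' if rtf < 1.0 else 'slower than real time'})")
```

Measuring each stage separately as well as the whole pipeline shows where optimization effort pays off most.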
💡 Solutions
Our solution successfully implements the processing pipeline and environment for communication between two users.
- For speech recognition, we have used the Whisper model and its optimized version.
- For text-to-text translation, we have used models created at the University of Helsinki.
- For speech synthesis, we have used Coqui TTS and PaddleSpeech.
- For the interface, we have used Streamlit.
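The three-stage pipeline and the two-user setup described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `SpeechTranslator`, `relay`, the user names, and the stub stages are all hypothetical, and the stubs stand in for the real Whisper, Helsinki, and Coqui TTS or PaddleSpeech calls so that the example runs without downloading any model.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SpeechTranslator:
    """One direction of the conversation: source speech in, target speech out."""
    recognize: Callable[[bytes], str]    # speech recognition stage (e.g. Whisper)
    translate: Callable[[str], str]      # text translation stage (e.g. a Helsinki model)
    synthesize: Callable[[str], bytes]   # speech synthesis stage (e.g. Coqui TTS)

    def run(self, audio: bytes) -> bytes:
        text = self.recognize(audio)
        translated = self.translate(text)
        return self.synthesize(translated)


# --- Stub stages (assumptions): they treat "audio" as UTF-8 text. ---
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def dictionary_translator(table: Dict[str, str]) -> Callable[[str], str]:
    def translate(text: str) -> str:
        # Word-by-word lookup; unknown words pass through unchanged.
        return " ".join(table.get(word, word) for word in text.split())
    return translate


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


# Two users, one translator per speaking direction.
session = {
    "user_a": SpeechTranslator(
        fake_recognize, dictionary_translator({"hello": "hallo"}), fake_synthesize
    ),
    "user_b": SpeechTranslator(
        fake_recognize, dictionary_translator({"hallo": "hello"}), fake_synthesize
    ),
}


def relay(speaker: str, audio: bytes) -> bytes:
    """Translate what `speaker` said into audio for the other user."""
    return session[speaker].run(audio)


print(relay("user_a", b"hello"))
print(relay("user_b", b"hallo"))
```

Keeping each stage behind a plain callable means a stub can be swapped for a real model, or one model for an optimized variant, without touching the rest of the pipeline.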
User flow