Artificial Intelligence in Speech Processing

Prof. Dr. Eng. Mircea Giurgiu
Team Leader

Eng. Oscar Gal
Junior expert
Resources and technologies for automatic recognition and text-to-speech synthesis (T1)
Objectives: Automatic speech recognition, text-to-speech synthesis, and automatic identification of DeepFake audio documents
Research challenges / Novelty / Innovation
• Automatic speech recognition
• Text-to-speech synthesis
• Automatic identification of AI-generated audio documents (audio DeepFake)
Research results:
• Speech corpus for ASR extended by an additional 50 hours of speech
• Speech recognition system with improved performance
• Voice transcription system and application (demonstrator)
• High-quality text-to-speech synthesis system
• Voice synthesis system integrated into a real application
• Models for automatic detection of audio DeepFakes
Innovation:
• Contributions to the expansion of speech signal datasets for training speech recognition systems (estimated size: more than 100 hours of text-annotated speech; current systems have been trained on up to 60 hours of speech)
• Development of robust models based on deep neural network architectures that integrate both acoustic recognition and natural language modeling in Romanian
• Development of high-quality text-to-speech synthesis models in Romanian
• Original contributions to language-independent automatic detection of audio DeepFakes