Teams: Artificial Intelligence in Speech Processing – HUB Român de Inteligență Artificială

Artificial Intelligence in Speech Processing

Prof. Dr. Eng. Mircea Giurgiu

Team Leader

Eng. Oscar Gal

Junior expert

Resources and technologies for automatic recognition and text-to-speech synthesis (T1)

Objectives: Automatic speech recognition, text-to-speech synthesis, and automatic identification of DeepFake audio documents

Research challenges / Novelty / Innovation
• Automatic speech recognition
• Text-to-speech synthesis
• Automatic identification of AI-generated audio documents (audio DeepFake)

Research results:
• Audio corpus for ASR extended by an additional 50 hours of speech
• Speech recognition system with improved performance
• Voice transcription system and application (demonstrator)
• High-quality text-to-speech synthesis system
• Voice synthesis system integrated into a real application
• Models for automatic detection of Audio DeepFake

Innovation:
• Contributions to the expansion of speech signal datasets for training speech recognition systems (estimated size: more than 100 hours of speech annotated with text; current systems have been trained on up to 60 hours of speech)
• Development of robust models based on deep neural network architectures that integrate both acoustic recognition and natural language modeling in Romanian
• Development of high-quality text-to-speech synthesis models in Romanian
• Original contributions to language-independent automatic audio DeepFake detection