Kenyan Swahili Speech Emotion Recognition System
A deep learning system that recognizes emotions in Kenyan Swahili speech, addressing the underrepresentation of African languages in speech technology.
This project develops a speech emotion recognition system tailored to Kenyan Swahili, addressing the significant gap in speech technology for African languages. From an audio recording, the system classifies the speaker's emotion into one of six categories: happiness, sadness, anger, fear, surprise, or neutral.
Key Features
- Specialized African Language Support: Focused on the Kenyan Swahili dialect and on culturally specific ways of expressing emotion
- Multi-Modal Analysis: Combines acoustic features, linguistic content, and cultural context
- Custom Dataset: Created a first-of-its-kind labeled dataset of Swahili emotional speech
- Transfer Learning: Adapted pre-trained models from high-resource languages to perform well on Swahili
- Lightweight Deployment: Optimized for deployment on devices with limited computational resources
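The acoustic side of the analysis can be illustrated with two classic prosodic cues for emotion: loudness (frame RMS energy) and noisiness (zero-crossing rate), each summarized per utterance by its mean and standard deviation. This is a minimal sketch, not the project's documented feature set; it assumes 16 kHz mono audio, 25 ms frames with a 10 ms hop, and uses the hypothetical helper names `frame_signal` and `prosodic_features`. In a Librosa-based pipeline, `librosa.feature.rms` and `librosa.feature.zero_crossing_rate` compute the framewise versions (with centering and padding by default).

```python
import numpy as np


def frame_signal(y: np.ndarray, frame_length: int = 400, hop_length: int = 160) -> np.ndarray:
    """Split a 1-D signal into overlapping frames (shape: n_frames x frame_length).

    Assumed defaults: 25 ms windows with a 10 ms hop at 16 kHz.
    Trailing samples that do not fill a complete frame are dropped.
    """
    if len(y) < frame_length:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(y) - frame_length) // hop_length
    # Row i holds the sample indices for frame i.
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    return y[idx]


def prosodic_features(y: np.ndarray, frame_length: int = 400, hop_length: int = 160) -> np.ndarray:
    """Return a fixed-length utterance vector: [rms_mean, rms_std, zcr_mean, zcr_std]."""
    frames = frame_signal(y, frame_length, hop_length)
    # Root-mean-square energy of each frame (a loudness proxy).
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Fraction of adjacent sample pairs in each frame whose sign differs.
    signs = np.signbit(frames).astype(np.int8)
    zcr = np.mean(np.abs(np.diff(signs, axis=1)), axis=1)
    return np.array([rms.mean(), rms.std(), zcr.mean(), zcr.std()])


# Usage: a synthetic 1-second, 200 Hz tone standing in for a real recording.
sr = 16_000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 200 * t)
feats = prosodic_features(tone)
```

Summary statistics turn a variable-length recording into a fixed-size vector that a lightweight classifier can consume; richer representations such as MFCCs or pretrained speech embeddings would typically be used alongside or instead of these.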
Tech Stack
- Python
- TensorFlow/Keras
- Librosa (audio processing)
- PyTorch
- Transformers
- Flask API
- WebRTC
Model Performance
The system achieved 78% accuracy across six emotion categories, with particularly strong performance on happiness (86%) and anger (83%). This represents a significant improvement over baseline models not specifically adapted for Swahili speech patterns.
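The figures above mix two kinds of metric: an overall accuracy and per-class accuracies (the share of each emotion's utterances labeled correctly, i.e. per-class recall). The sketch below shows how both can be computed from predictions, along with the unweighted (macro) average of the per-class values. It is an illustration under assumptions: the label strings and the toy predictions are hypothetical and are not the project's evaluation data. Because overall accuracy weights each class by its number of test utterances, it matches the macro average only when the test set is balanced.

```python
from collections import Counter

# Hypothetical label strings for the six categories.
EMOTIONS = ["happiness", "sadness", "anger", "fear", "surprise", "neutral"]


def accuracy_report(y_true, y_pred):
    """Return (overall accuracy, macro-averaged accuracy, per-class accuracy dict)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    overall = sum(correct.values()) / len(y_true)
    # Per-class accuracy is computed only for classes present in y_true.
    per_class = {e: correct[e] / totals[e] for e in EMOTIONS if totals[e]}
    macro = sum(per_class.values()) / len(per_class)
    return overall, macro, per_class


# Toy example: an imbalanced test set where the two averages diverge.
y_true = ["anger", "anger", "anger", "happiness"]
y_pred = ["anger", "anger", "sadness", "sadness"]
overall, macro, per_class = accuracy_report(y_true, y_pred)
```

Here the overall accuracy is 0.5, while the macro average is 1/3 because the single misclassified happiness utterance counts as much as all three anger utterances together.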
Cultural Significance
The project addresses important technological equity issues by:
- Creating resources for an underrepresented language spoken by 100+ million people
- Accounting for cultural differences in emotion expression
- Providing open-source tools that can be adapted to other African languages
- Enabling localized applications without requiring Western language proficiency
Applications
- Customer Service: Emotion detection for call centers serving Swahili-speaking regions
- Mental Health: Supportive tools for detecting emotional distress in clinical settings
- Education: Language learning applications with emotional feedback
- Accessibility: Emotional context in speech-to-text output for hearing-impaired users
- Entertainment: Enhanced interactive media experiences
Dataset Contribution
The project includes a carefully curated dataset of Kenyan Swahili emotional speech recordings from 80+ speakers of various ages, genders, and regional backgrounds, with over 5,000 labeled utterances. This dataset has been made available to researchers to foster further development of African language technologies.