Kenyan Swahili Speech Emotion Recognition System

This project develops a speech emotion recognition system tailored to Kenyan Swahili, addressing a significant gap in speech technology for African languages. The system classifies audio recordings into six emotion categories: happiness, sadness, anger, fear, surprise, and neutral.

Key Features

Specialized African Language Support: Focuses on the Kenyan Swahili dialect and culturally specific expressions of emotion
Multi-Modal Analysis: Combines acoustic features, linguistic content, and cultural context
Custom Dataset: Introduces a first-of-its-kind labeled dataset of Swahili emotional speech
Transfer Learning: Adapts pre-trained models from high-resource languages to perform well on Swahili
Lightweight Deployment: Optimized for devices with limited computational resources
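
As a rough illustration of what "acoustic features" can mean in practice, the sketch below computes two classical frame-level descriptors, RMS energy and zero-crossing rate, in plain Python. The project does not state which features it uses, so the function name frame_features, the frame and hop sizes, and the synthetic sine tone are all assumptions made for this example; a real pipeline would more likely rely on a library such as Librosa from the tech stack.

```python
import math


def frame_features(samples, frame_len=400, hop=160):
    """Return one (rms_energy, zero_crossing_rate) tuple per frame.

    frame_len=400 and hop=160 correspond to 25 ms frames with a 10 ms hop
    at 16 kHz; both values are illustrative assumptions.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Root-mean-square energy of the frame.
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Fraction of adjacent sample pairs whose sign differs.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        zcr = crossings / (frame_len - 1)
        features.append((rms, zcr))
    return features


# Synthetic stand-in for a recording: 1 second of a 220 Hz tone at 16 kHz.
sample_rate = 16000
tone = [
    0.5 * math.sin(2 * math.pi * 220 * n / sample_rate)
    for n in range(sample_rate)
]
feats = frame_features(tone)
print(len(feats), feats[0])
```

Frame-level values like these are usually summarized per utterance (for example by mean and variance) or passed as a sequence to a neural model; which approach this system takes is not specified here.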

Tech Stack

Python
TensorFlow/Keras
Librosa (audio processing)
PyTorch
Transformers
Flask API
WebRTC

Model Performance

The system achieves 78% overall accuracy across the six emotion categories, with its strongest per-class results on happiness (86%) and anger (83%). This is a significant improvement over baseline models that were not adapted to Swahili speech patterns.
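
For readers who want to reproduce figures of this kind on their own predictions, here is a minimal sketch of how overall and per-class accuracy are commonly computed. It assumes that the per-class figure means the share of utterances of a given true emotion that were predicted correctly (per-class recall); the project does not say which definition it used. The labels below are toy values invented for the example and are not the project's results.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of utterances whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """For each true label, the fraction of its utterances predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy labels, invented for illustration only.
y_true = ["happiness", "happiness", "anger", "anger", "fear", "neutral"]
y_pred = ["happiness", "happiness", "anger", "fear", "fear", "sadness"]

print(overall_accuracy(y_true, y_pred))
print(per_class_accuracy(y_true, y_pred))
```

Reporting both numbers matters when classes are imbalanced, because a high overall accuracy can hide weak performance on less frequent emotions.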

Cultural Significance

The project addresses important technological equity issues by:

Creating resources for an underrepresented language spoken by 100+ million people
Accounting for cultural differences in emotion expression
Providing open-source tools that can be adapted to other African languages
Enabling localized applications that do not require proficiency in a Western language

Applications

Customer Service: Emotion detection for call centers serving Swahili-speaking regions
Mental Health: Supportive tools for detecting emotional distress in clinical settings
Education: Language learning applications with emotional feedback
Accessibility: Emotional context in speech-to-text output for hearing-impaired users
Entertainment: Enhanced interactive media experiences

Dataset Contribution

The project includes a curated dataset of more than 5,000 labeled utterances of Kenyan Swahili emotional speech, recorded by 80+ speakers of various ages, genders, and regional backgrounds. The dataset is available to researchers to foster further development of African language technologies.
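
When working with a multi-speaker dataset like this one, a common precaution is to keep each speaker entirely in either the training set or the test set, so that reported accuracy reflects generalization to unseen voices. The sketch below shows one way to do that. It is not a description of how this project split its data: the function name speaker_independent_split, the 20% test fraction, and the speaker and utterance IDs are all hypothetical.

```python
import random


def speaker_independent_split(utterances, test_fraction=0.2, seed=0):
    """Split (speaker_id, utterance_id) pairs so no speaker is in both sets."""
    # Shuffle the unique speakers reproducibly, then hold out a fraction.
    speakers = sorted({speaker_id for speaker_id, _ in utterances})
    random.Random(seed).shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])

    train = [(s, u) for s, u in utterances if s not in test_speakers]
    test = [(s, u) for s, u in utterances if s in test_speakers]
    return train, test


# Hypothetical IDs: 10 speakers with 5 utterances each.
utterances = [
    (f"spk{s:02d}", f"utt{s:02d}_{i}") for s in range(10) for i in range(5)
]
train, test = speaker_independent_split(utterances)
print(len(train), len(test))
```

Splitting by speaker rather than by utterance avoids the situation where a model scores well simply because it has already heard the same voice during training.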