AdilShamim8/BUET-CSE-Fest-2026


Bengali long-form speech recognition and speaker diarization using fine-tuned Whisper models and pyannote.audio.


Defensibility

2.0/10

3 stars

Platform Domination: high

Market Consolidation: high

Displacement Horizon: 6 months

REASONING

This project is a competition entry for the BUET CSE Fest 2026. It successfully integrates state-of-the-art components, OpenAI's Whisper and pyannote.audio, for a specific linguistic context (Bengali), but it does not introduce a novel architecture or a proprietary moat. Defensibility is low (2/10) because the workflow, fine-tuning Whisper and pairing it with a diarization library, is the standard industry pattern for ASR tasks today. With only 3 stars and no forks, it lacks the community momentum and data gravity required to resist displacement. Frontier labs and commercial providers such as Google Cloud Speech-to-Text, AssemblyAI, and Deepgram are rapidly improving their multilingual performance; newer iterations of Whisper and commercial APIs often outperform niche fine-tuned models on long-form audio because of better noise handling and far larger training sets. The primary value here is as an academic reference or a local benchmark for Bengali, but the project faces a high risk of platform domination as cloud providers simplify the deployment of localized ASR.
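The "standard pattern" referred to above usually ends with a merge step: timestamped transcript chunks from the ASR model are matched to speaker turns from the diarizer. The sketch below illustrates that step only, in plain Python with hard-coded sample data. It is not taken from the repository; the names `Chunk`, `Turn`, `overlap`, and `assign_speakers` are hypothetical, and the maximum-overlap rule is an assumed heuristic rather than the project's confirmed method.

```python
from typing import List, NamedTuple, Optional, Tuple


class Chunk(NamedTuple):
    """A timestamped transcript chunk (hypothetical shape of ASR output)."""
    start: float
    end: float
    text: str


class Turn(NamedTuple):
    """A speaker turn (hypothetical shape of diarization output)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    chunks: List[Chunk], turns: List[Turn]
) -> List[Tuple[Optional[str], str]]:
    """Label each chunk with the speaker whose turn overlaps it the most.

    Chunks that overlap no turn are labeled None (assumed fallback).
    """
    labeled = []
    for chunk in chunks:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for turn in turns:
            shared = overlap(chunk.start, chunk.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, chunk.text))
    return labeled


# Illustrative sample data, not real model output.
chunks = [
    Chunk(0.0, 2.5, "first utterance"),
    Chunk(2.5, 6.0, "second utterance"),
    Chunk(9.0, 10.0, "third utterance"),
]
turns = [
    Turn(0.0, 2.8, "SPEAKER_00"),
    Turn(2.8, 6.5, "SPEAKER_01"),
]
result = assign_speakers(chunks, turns)
```

In a real pipeline the chunks and turns would come from the fine-tuned Whisper model and pyannote.audio respectively. Because this merge logic is generic and short, it supports the point that the workflow itself offers little defensibility.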

COMPOSABILITY

TECH STACK

Python, PyTorch, Whisper (OpenAI), pyannote.audio, Hugging Face Transformers, Gradio

INTEGRATION

reference_implementation

speech_recognition, speaker_diarization, bengali_nlp, fine_tuned_llm, audio_processing

READINESS

Composability: application

Depth