Speech Data Scientist

Bengaluru, Karnataka, India

1 month ago

Artificial Intelligence, Design, Whisper, Denoising, Testing

Salary Not Disclosed

Job Description

Industry & Sector: Artificial Intelligence & Speech Technology, developing production-grade voice AI, automatic speech recognition (ASR), and conversational AI used across enterprise products, contact centres, and consumer voice experiences.

Location & Role: Bangalore, Karnataka, India. Full-time.

Primary job title: Senior Speech Scientist (ASR & Speech ML).

About The Opportunity

Join a high-velocity engineering team building robust, low-latency speech and voice solutions for large-scale deployments. You will design and ship state-of-the-art ASR models and production pipelines, bridging classical signal-processing foundations with modern transformer-based speech models to drive measurable product impact.

Role & Responsibilities

Lead design, training and optimisation of ASR systems (end-to-end and hybrid) using transformer and sequence modeling (Wav2Vec 2.0, Whisper, CTC, attention-based encoders/decoders).
Develop and evaluate speech pre-processing and DSP pipelines (feature extraction, augmentation, denoising, VAD) to improve robustness across noisy, multilingual inputs.
Prototype and productionise model-serving solutions: containerised inference, latency optimisation, batching, and autoscaling for cloud and edge deployments.
Collaborate with data engineers and linguists to curate datasets, define annotation guidelines, and run rigorous evaluation (WER, CER, streaming metrics) and error-analysis cycles.
Implement reproducible training workflows, CI/CD for models, monitoring for drift and performance, and automation for retraining and A/B evaluation.
Mentor peers, author engineering-excellence patterns (testing, observability), and present technical results to product and stakeholder teams.

Skills & Qualifications

Must-Have

5+ years in speech recognition or related audio ML roles with proven production impact.
Strong DSP and audio analysis fundamentals (feature engineering, spectrograms, filtering, VAD).
Hands-on experience with PyTorch and/or TensorFlow for building and training ASR models.
Practical knowledge of transformer-based speech models (Wav2Vec 2.0, Whisper) and sequence losses (CTC), plus RNN/CNN architectures.
Proficient in Python; experience with C++/Java for production deployments is highly desirable.
Experience deploying models in cloud environments (AWS/GCP) and container orchestration (Docker/Kubernetes); familiar with MLOps tooling and CI/CD.

Preferred

Background in multilingual ASR, low-resource languages, or on-device/edge inference optimisation.
Experience with large-scale data pipelines, annotation platforms, and semi-supervised / self-supervised learning workflows.
Familiarity with production monitoring (Prometheus/Grafana), model explainability, and privacy-preserving ML techniques.

Benefits & Culture Highlights

High-autonomy engineering culture with strong emphasis on ownership, mentorship, and career growth.
Opportunity to influence product direction and work on state-of-the-art speech models at scale.
Competitive compensation, flexible hybrid work, and learning budget for conferences and training.

We are seeking a results-oriented Speech Scientist who thrives on technical ownership and delivering dependable voice AI in real-world settings. Apply if you want to push ASR boundaries and build production-grade speech systems that scale.

Skills: automatic speech recognition, TensorFlow, Docker, transformers for ASR, LSTM, language modeling, MLOps, digital signal processing, Python, CTC loss, attention mechanisms, Kubernetes, RNN, PyTorch