Senior Speech AI Engineer – On-Device ASR & Real-Time Pronunciation Intelligence

Actively Reviewing Applications

Capital Numbers

Gurugram, Haryana, India • Full-Time • On-site

Posted 11 hours ago • Apply by June 16, 2026

Job Description

We are looking for a Senior Speech AI Engineer to build production-grade, on-device Automatic Speech Recognition (ASR) and real-time speech intelligence systems.

In this role, you’ll work across the full speech AI lifecycle, from audio data pipelines and model development to low-latency streaming inference and edge deployment. You’ll help deliver accurate transcription, phoneme-level alignment, and real-time pronunciation feedback optimized for mobile and edge devices.

Key Skills & Experience

✔ 5–8+ years of experience in Speech AI / Audio ML

✔ Strong Python & PyTorch expertise

✔ Experience with ASR models such as Whisper, Conformer, RNN-T, wav2vec 2.0, HuBERT

✔ Knowledge of speech processing & phoneme alignment

✔ Experience optimizing models for edge / mobile deployment (TensorFlow Lite, ONNX, PyTorch Mobile, Core ML)

✔ Familiarity with libraries such as NVIDIA NeMo, ESPnet, SpeechBrain, torchaudio

Nice to Have

• Experience with multilingual or low-resource ASR

• Experience with pronunciation assessment or speech-learning tools

• Experience with datasets such as Common Voice or LibriSpeech

If you’re passionate about building fast, accurate, and privacy-first speech AI systems, we’d love to connect.