Job Description

Total Experience: 5 to 10 Years

Location: Bangalore


Role: Lead development and optimization of speech‑based machine learning models (STT/TTS) using modern open‑source frameworks.

Responsibilities:

  • Drive end‑to‑end development of speech‑related ML components and model pipelines.
  • Implement fine‑tuning, optimization, and evaluation workflows for AI models.
  • Ensure model efficiency, scalability, and readiness for production environments.
  • Work closely with data, engineering, and platform teams to align training and deployment workflows.
  • Provide technical leadership on model performance, experimentation, and best practices.

Required Skills:

  • Deep expertise in PyTorch: training loops, fine‑tuning, distributed training (DDP, Accelerate).
  • Experience with ONNX Runtime and TensorRT for model conversion, optimization, and accelerated inference.
  • Understanding of speech/audio processing: spectrograms, MFCCs, VAD, noise reduction.
  • Hands‑on experience with at least one speech framework: Whisper, ESPnet, Coqui TTS, NVIDIA NeMo, or similar.
  • Experience training ML models on GPU clusters (A100, V100) and managing long‑duration jobs.
  • Familiarity with Hugging Face ecosystem: datasets, tokenizers, training utilities.

Preferred Skills:

  • Experience with multilingual or multi‑accent datasets.
  • Basic proficiency in C++ for inference optimization or runtime improvements.
  • Exposure to deploying real‑time STT/TTS or low‑latency ML systems.
