Machine Learning Engineer (Audio & Video Models)

📍 Mumbai, Maharashtra, India

Full-time | Other-General | Posted January 28, 2026

Job Description

Key Responsibilities
● Design, train, and optimize audio and video ML models, including classification, detection, segmentation, generative models, speech processing, and multimodal architectures.
● Develop and maintain data pipelines for large-scale audio/video datasets, ensuring quality, labeling consistency, and efficient ingestion.
● Implement model evaluation frameworks that measure robustness, latency, accuracy, and overall performance across real-world conditions.
● Work with product teams to transform research prototypes into production-ready models with reliable inference performance.
● Optimize models for scalability, low latency, and edge/cloud deployment, including quantization, pruning, and hardware-aware tuning.
● Collaborate with cross-functional teams to define technical requirements and experiment roadmaps.
● Monitor and troubleshoot production models, ensuring reliability and continuous improvement.
● Stay current with trends in deep learning, computer vision, speech processing, and multimodal AI.

 
Required Qualifications
● Bachelor's or Master's degree in Computer Science, Electrical Engineering, Machine Learning, or a related field (PhD a plus).
● Strong experience with deep learning frameworks such as PyTorch or TensorFlow.
● Proven experience training and deploying audio or video models, such as:
○ Speech recognition, speech enhancement, speaker identification
○ Audio classification, event detection
○ Video classification, action recognition, tracking
○ Video-to-text, lip reading, multimodal fusion models
● Solid understanding of neural network architectures (CNNs, RNNs, Transformers, diffusion models, etc.).
● Proficiency in Python, along with ML tooling for experimentation and production (e.g., NumPy, OpenCV, FFmpeg, PyTorch Lightning).
● Experience working with GPU/TPU environments, distributed training, and model optimization.
● Ability to write clean, maintainable production-quality code.  

 
Preferred Qualifications
● Experience with foundation models or multimodal transformers (e.g., audio-language, video-language).
● Background in signal processing, feature extraction (MFCCs, spectrograms), or codec-level audio/video understanding.
● Experience with MLOps tools (e.g., MLflow, Weights & Biases, Kubeflow, Airflow).
● Knowledge of cloud platforms (AWS, GCP, Azure) and scalable model serving frameworks.
● Experience with real-time audio/video processing for streaming applications.
● Publications, open-source contributions, or competitive ML achievements are a plus.

Experience: Minimum 2 years
