Job Description

Key Responsibilities

● Design, train, and optimize audio and video ML models, including classification, detection, segmentation, generative models, speech processing, and multimodal architectures.

● Develop and maintain data pipelines for large-scale audio/video datasets, ensuring quality, labeling consistency, and efficient ingestion.

● Implement model evaluation frameworks that measure robustness, latency, accuracy, and overall performance across real-world conditions.

● Work with product teams to transform research prototypes into production-ready models with reliable inference performance.

● Optimize models for scalability, low latency, and edge/cloud deployment, including quantization, pruning, and hardware-aware tuning.

● Collaborate with cross-functional teams to define technical requirements and experiment roadmaps.

● Monitor and troubleshoot production models, ensuring reliability and continuous improvement.

● Stay current with trends in deep learning, computer vision, speech processing, and multimodal AI.


Required Qualifications

● Bachelor's or Master's degree in Computer Science, Electrical Engineering, Machine Learning, or a related field (PhD a plus).

● Strong experience with deep learning frameworks such as PyTorch or TensorFlow.

● Proven experience training and deploying audio or video models, such as:

○ Speech recognition, speech enhancement, speaker identification

○ Audio classification, event detection

○ Video classification, action recognition, tracking

○ Video-to-text, lip reading, multimodal fusion models

● Solid understanding of neural network architectures (CNNs, RNNs, Transformers, diffusion models, etc.).

● Proficiency in Python, along with ML tooling for experimentation and production (e.g., NumPy, OpenCV, FFmpeg, PyTorch Lightning).

● Experience working with GPU/TPU environments, distributed training, and model optimization.

● Ability to write clean, maintainable, production-quality code.


Preferred Qualifications

● Experience with foundation models or multimodal transformers (e.g., audio-language, video-language).

● Background in signal processing, feature extraction (MFCCs, spectrograms), or codec-level audio/video understanding.

● Experience with MLOps tools (e.g., MLflow, Weights & Biases, Kubeflow, Airflow).

● Knowledge of cloud platforms (AWS, GCP, Azure) and scalable model serving frameworks.

● Experience with real-time audio/video processing for streaming applications.

● Publications, open-source contributions, or competitive ML achievements are a plus.

Experience

● Minimum 2 years
