Job Description
Role 1: ML Lead
Lead Responsibilities:
- Lead design and implementation of scalable CI/CD pipelines and deployment workflows
- Drive operational excellence across DevOps, MLOps, automation, and observability
- Manage the end‑to‑end model lifecycle: registration, inference, and endpoint optimization
- Oversee incident management (ICMs), root cause analysis (RCA), and DRI responsibilities
- Mentor engineers on debugging, scripting, DevOps, and reliability best practices
- Guide automation initiatives to improve resilience, reduce manual work, and increase delivery velocity
- Collaborate with cross-functional teams on architecture, release planning, and operational governance
- Establish and maintain governance, compliance, and production-readiness standards
- Serve as the single point of contact (SPOC) for all stakeholder interactions
Technical Skills required:
- Programming: Python or C# (at least one is mandatory)
- Ops & Reliability: DRI workflows, Incident Management (ICMs), production triaging
- ML Operations: Model registration, inference, endpoint configuration & optimization
- DevOps & Containerization: Docker, Git, CI/CD pipelines, Azure DevOps (ADO)
- Cloud & Infrastructure: Microsoft Azure, Azure Blob Storage, AML Studio
Good to have:
- Data & Analytics: Kusto Query Language (KQL), ADX dashboards
- Inference Optimization: vLLM for high‑performance LLM serving
- Automation & Monitoring: Ops monitoring, deployment automation, pipeline orchestration