Job Description

Lead Machine Learning Operations Engineer

Job Summary:

This is a senior-level role on the Innovation Analytics & AI team, leading the design and implementation of enterprise-grade MLOps infrastructure to manage the end-to-end ML lifecycle with a focus on automation, scalability, and governance. The role partners with Solution Architects to align business and technical requirements with enterprise architecture, ensuring seamless integration of diverse data sources. The Lead MLOps Engineer drives the adoption of modern MLOps frameworks, using tools such as Databricks, MLflow, Azure ML, and CI/CD pipelines to deliver scalable and reliable AI/ML systems. The position leads design reviews, defines standards for maintainability and performance, and contributes to innovation by integrating emerging technologies. The Lead MLOps Engineer also mentors cross-functional teams, champions best practices, and influences enterprise-wide initiatives through technical leadership, fostering a culture of excellence, collaboration, and continuous improvement.

Essential Functions & Responsibilities:

Description:

End-to-End ML Lifecycle Leadership & MLOps Infrastructure Design:

  • Lead the design and implementation of enterprise-grade MLOps infrastructure, managing the full ML lifecycle including model versioning, deployment, monitoring, and performance optimization.
  • Develop scalable MLOps processes using Databricks, MLflow, Azure ML, and CI/CD pipelines to automate experiment tracking, testing, and model rollout.
  • Write high-quality, testable code in Python and R using libraries like TensorFlow, PyTorch, and scikit-learn, applying NLP techniques and transformer-based architectures like GPT to real-world applications.
  • Ensure infrastructure readiness, observability, and governance to support robust AI deployments in enterprise environments.

Operational Leadership & System Optimization:

  • Proactively identify recurring operational issues, perform root cause analysis, and implement long-term solutions to reduce incident frequency and improve system reliability.
  • Act as a liaison with DBAs and Infrastructure teams to ensure proper configuration and support for scalable and resilient Data Analytics solutions.
  • Continuously monitor system performance and execute optimization strategies to maintain high availability, data integrity, and operational efficiency.
  • Mentor developers, data analysts, and data scientists in effective data interaction and troubleshooting practices.

Innovation & Strategic Technology Adoption:

  • Lead innovation efforts by staying at the forefront of trends in MLOps, LLMOps, and AI, driving the adoption of cutting-edge technologies across teams.
  • Experiment with emerging tools and integrate innovative solutions into enterprise AI/ML strategies, supporting initiatives like application security, architecture modernization, and test automation.
  • Act as a subject matter expert in modern platforms, lead internal learning forums, and influence broader technology adoption across the organization.

Technical Leadership & Mentorship:

  • Foster a collaborative team environment, encouraging active contributions during design and implementation phases, and serving as a role model in delivery accountability and process discipline.
  • Mentor and coach team members, providing meaningful feedback to promote a culture of excellence, innovation, and continuous improvement.
  • Facilitate clear communication, proactively remove impediments, and drive alignment across teams to ensure successful enterprise-wide outcomes.

Cross-Functional Collaboration & Enterprise Impact:

  • Partner with Solution Architects to assess business and technical requirements, ensuring seamless integration of diverse data sources and alignment with enterprise architecture.
  • Lead design reviews, define standards for maintainability and performance, and influence enterprise-wide initiatives through technical leadership.
  • Collaborate with data scientists to transition prototype models into production, ensuring scalability, reliability, and governance of AI/ML systems.

Skills:

  • Expert-level proficiency in Python and R programming (TensorFlow, PyTorch, scikit-learn)
  • Advanced expertise with MLOps tools (MLflow, Azure ML, Databricks) and CI/CD pipelines
  • Deep knowledge of NLP techniques and transformer-based architectures (e.g., GPT)
  • Strong understanding of infrastructure readiness, observability, and governance for AI/ML systems
  • Proven ability to design and implement scalable, enterprise-grade MLOps processes
  • Exceptional leadership, mentorship, and communication skills
  • Experience with enterprise initiatives (application security, architecture modernization, test automation)
  • Strategic mindset for driving innovation and adopting emerging technologies

Minimum Qualifications:

Except where required by licensure or regulation, a combination of comparable education and experience may be used to satisfy qualification requirements.

Education:

  • Minimum education required to perform essential job functions: 4-year / bachelor's degree
  • Specific degree, if required: Bachelor's degree, preferably in Computer Science, Data Science, Machine Learning, or equivalent experience

Experience:

  • Minimum years of experience required to perform essential job functions: 5

Additional Experience Qualifier (optional):

  • Extensive experience in designing and managing MLOps infrastructure, model deployment, and performance optimization in production environments.
