Job Description
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting‑edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
Join our team as a Senior AI Platform Engineer, where you will design, deploy, and maintain next‑generation Databricks platforms on AWS to support advanced analytics and machine learning workflows.
You will collaborate closely with data scientists and ML engineers to deliver a seamless developer experience on the Lakehouse. Apply now to contribute to cutting‑edge AI infrastructure development.
Responsibilities
- Design and implement scalable Databricks platform solutions for analytics, ML, and GenAI workflows across development, testing, and production environments
- Administer and optimize Databricks workspaces including cluster policies, pools, job clusters, autoscaling, and GPU/accelerated compute
- Implement and manage Unity Catalog governance including metastores, catalogs, schemas, data sharing, masking, lineage, and access controls
- Build and maintain Infrastructure as Code using Terraform for reproducible platform provisioning and configuration
- Implement CI/CD pipelines for notebooks, libraries, DLT pipelines, and ML assets using GitHub Actions and Databricks APIs
- Standardize experiment tracking and model registry workflows, and deploy model serving endpoints with monitoring and rollback capabilities
- Develop and optimize Delta Lake batch and streaming pipelines using Auto Loader, Structured Streaming, and DLT, enforcing data quality and SLAs
- Collaborate with cross‑functional teams to integrate platform capabilities and ensure best‑in‑class developer experience
- Monitor platform performance, troubleshoot issues, and implement improvements to ensure reliability and scalability
- Maintain documentation and automation runbooks for platform operations and governance
- Coordinate with security teams to enforce data governance, encryption, and compliance policies
- Promote best practices for coding, testing, and deployment within the platform engineering team
- Drive continuous improvement in platform automation and operational efficiency
- Engage with stakeholders to gather requirements and provide technical guidance
- Mentor junior engineers and share knowledge of platform technologies
Requirements
- Proven hands‑on experience administering Databricks on AWS including Unity Catalog governance and enterprise integrations, with 3+ years in platform engineering
- Strong foundation in AWS services such as VPC, IAM, KMS, S3, CloudWatch, and network architecture
- Proficiency with Terraform, including the Databricks provider, and experience with Infrastructure as Code for cloud resources
- Advanced Python and SQL skills with experience packaging libraries and managing notebooks and repos
- Experience with MLflow for experiment tracking and model registry, and familiarity with model serving endpoints
- Knowledge of Delta Lake, Auto Loader, Structured Streaming, and DLT
- Experience implementing DevOps automation, CI/CD pipelines, and using GitHub Actions or similar tools
- Strong Git and GitHub proficiency including code review and branching strategies
- Familiarity with REST APIs, Databricks CLI, and scripting for automation
- Excellent communication and stakeholder management skills
- Ability to work independently and within a distributed team environment
- Detail‑oriented with strong problem‑solving and organizational skills
- English proficiency at B2 (Upper‑Intermediate) level or higher
Nice to have
- Experience with AWS EKS and Kubernetes
- Familiarity with MLOps practices and pipeline automation
- Knowledge of attribute‑based access control and advanced data governance concepts
- Experience with secrets management and SSO/SCIM provisioning
- Certification in AWS or Databricks platform engineering
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award‑winning culture recognized by Glassdoor, Newsweek and LinkedIn
Seniority level
- Mid‑Senior level
Employment type
- Full‑time
Job function
- Information Technology, Engineering, and Business Development
Industries
- Software Development, IT Services and IT Consulting, and Pharmaceutical Manufacturing