Job Description

Overview

The DevOps & ML Ops Engineer will be responsible for developing and maintaining scalable, stable services that deliver machine learning models to end users with guaranteed uptime. The primary focus will be on the infrastructure, deployment, and continuous integration/continuous delivery (CI/CD) processes for our ML services.

Responsibilities

Manage resource allocation and workload scheduling for multiple ML services, ensuring efficient utilization of CPU/GPU resources and creating reliable queues based on service priorities.

Maintain VM environments, manage OS updates, and keep the VM inventory up to date.

Work alongside the Dev and QA teams to detect hot spots in our applications and put preventative measures in place before they become live issues.

Troubleshoot system configurations and provide solutions.

Plan, execute, and test disaster recovery.

Monitor and examine all application, performance, event, and system logs to assist in troubles...

Apply for this Position

Ready to join TransPerfect? Click the button below to submit your application.
