Job Description



  • Own end-to-end support for Domino Data Lab, GCP Dataproc, Galileo, and adjacent ML platforms.

  • Perform installation, upgrades, configuration, patching, and environment maintenance.

  • Monitor cluster health, resource utilization, job execution, performance, and alerts.

  • Troubleshoot ML workloads involving Spark, Python, R, GPUs, containers, and orchestrators, working from JIRA tickets and resolving them within the applicable SLAs.

  • Manage access, security policies, service accounts, and platform governance.

  • Ensure high availability, optimal performance, and adherence to operational SLAs.



Apply for this Position

Ready to join TechDigital Corporation? Submit your application.