Job Description
Key responsibilities
- Design, build, and operate OCI-based Kubernetes platforms for AI/ML/LLM services with strong security, observability, and reliability.
- Implement and manage IaC/GitOps for repeatable environments, model/inference deployments, and traffic policies.
- Enable progressive delivery (blue-green, canary, A/B) with metric-gated rollouts and fast rollback.
- Stand up and optimize LLM serving stacks, vector search, and RAG pipelines; enforce guardrails and monitor quality/cost SLOs.
- Integrate Oracle Databases and OCI services securely; manage secrets, credentials, and network segmentation.
- Establish SLOs, dashboards, runbooks, and incident/DR procedures; lead operational readiness reviews and postmortems.