Job Description
Drive reliability and operational maturity for Kubernetes workloads on GKE through safe rollout patterns, high-signal observability, resilient IaC, and effective incident response. Collaborate with developers to harden CI/CD pipelines and address infrastructure concerns within application code.
Key responsibilities:
- Design and maintain resilient deployment patterns (blue-green, canary, GitOps syncs) across services.
- Instrument and optimize logs, metrics, traces, and alerts to reduce noise and improve signal.
- Review backend code (e.g., Django, Node.js, Go, Java) with a focus on infrastructure touchpoints such as database usage, timeouts, error handling, and memory consumption.
- Tune and troubleshoot GKE workloads, Horizontal Pod Autoscaler (HPA) configurations, network policies, and node pool strategies.
- Improve or author Terraform modules for infrastructure resources (e.g., VPC, Cloud SQL, Secrets, Pub/Sub).
- Diagnose production issues from logs, tra...
Apply for this Position
Ready to join Orion Innovation? Submit your application for this position.