Job Description

Overview

Senior Site Reliability Engineer (SRE) with Kubernetes and Rancher experience. This is a full-time role focused on building and maintaining highly resilient, secure systems, including in air-gapped environments.

Responsibilities

  • System Architecture & Management: Design, architect, and maintain highly reliable, multi-tenant systems using Kubernetes and related tooling such as RKE2. This includes components such as Ingress, Kong, Artifactory, and Sonar.
  • Observability & Monitoring: Implement and manage observability solutions with Prometheus, Grafana, Splunk, and Elastic to ensure deep visibility into system health and performance, including in air-gapped settings.
  • Compliance & Optimization: Ensure deployments meet stringent compliance standards and are optimized for performance and security.
  • Code Quality & Security: Perform regular code quality analysis and security assessments using Sonar to identify and mitigate vulnerabilities.
  • Incident Response: Collaborate with leads and specialized teams to resolve incidents quickly and improve resilience and recovery procedures.
  • Documentation: Create and maintain documentation of system configurations, runbooks, and disaster recovery plans for systems operated in sensitive environments.

Required Skills and Qualifications

  • 8+ years of Site Reliability Engineering experience.
  • Experience with Kubernetes and Rancher.
  • Technical Expertise: Proficiency with RKE2, Kubernetes, Ingress, Kong, Artifactory, Prometheus, Grafana, Splunk, Elastic, and Sonar.
  • SRE & Observability: Strong background in Site Reliability Engineering and implementing comprehensive observability strategies.
  • Secure Environments: Experience working in air-gapped or zero-connectivity environments and protecting classified data.
  • Troubleshooting: Ability to troubleshoot and optimize complex, multi-tenant infrastructures under pressure.

Preferred Qualifications

  • Relevant SRE or DevOps certifications (e.g., CKAD, CKA).
  • Experience in government or defense-related SRE roles.
  • Experience with Rancher and its ecosystem.

Seniority level

  • Mid-Senior level

Employment type

  • Full-time

Job function

  • Engineering and Information Technology

Industries

  • IT Services and IT Consulting

