Job Description
We are seeking a Principal Site Reliability Developer (IC4) to join Oracle Cloud Infrastructure (OCI). This role blends software engineering expertise with site reliability engineering (SRE) principles, ensuring our large-scale distributed systems are reliable, observable, and efficient. As a senior technical leader, you will design and implement solutions that improve service availability, scalability, and performance, while mentoring others and driving best practices across teams.
Responsibilities
Design, develop, and deploy software to improve the reliability, scalability, and efficiency of Oracle Cloud Infrastructure (OCI).
Build automation frameworks to eliminate manual toil and prevent recurring issues.
Implement observability practices including metrics, logging, and tracing to ensure system health and proactive monitoring.
Lead deployments, capacity planning, and demand forecasting to support large‑scale distributed systems.
Conduct performance analysis, system tuning, and incident response to maintain service excellence.
Influence architecture and standards for distributed systems, driving best practices across teams.
Provide technical leadership and mentorship to engineers, fostering a culture of reliability and innovation.
Career Level - IC4