Job Description
Responsibilities
- Design, build, and maintain highly available, scalable, and reliable production systems.
- Ensure system uptime, performance, and reliability by proactively monitoring, troubleshooting, and resolving incidents.
- Implement and manage monitoring, alerting, and observability solutions (metrics, logs, traces).
- Automate operational tasks to reduce manual effort and improve system reliability.
- Lead incident response, root cause analysis (RCA), and post-incident reviews.
- Collaborate with development teams to define SLIs, SLOs, and error budgets.
- Improve CI/CD pipelines to enable safe, fast, and reliable deployments.
- Manage capacity planning, performance tuning, and cost optimization.
- Ensure security best practices across infrastructure and application layers.
- Participate in on-call rotations and provide production...
Apply for this Position
Ready to join Net2Source (N2S)? Click the button below to submit your application.