Job Description

Responsibilities

  • Design, build, and maintain highly available, scalable, and reliable production systems.
  • Ensure system uptime, performance, and reliability by proactively monitoring, troubleshooting, and resolving incidents.
  • Implement and manage monitoring, alerting, and observability solutions (metrics, logs, traces).
  • Automate operational tasks to reduce manual effort and improve system reliability.
  • Lead incident response, root cause analysis (RCA), and post-incident reviews.
  • Collaborate with development teams to define SLIs, SLOs, and error budgets.
  • Improve CI/CD pipelines to enable safe, fast, and reliable deployments.
  • Manage capacity planning, performance tuning, and cost optimization.
  • Ensure security best practices across infrastructure and application layers.
  • Participate in on-call rotations and provide production...

Apply for this Position

Ready to join Net2Source (N2S)? Click the button below to submit your application.