Job Description
About ProArch:
At ProArch, we partner with businesses around the world to turn big ideas into better outcomes through IT services that span cybersecurity, cloud, data, AI, and app development. We’re 400+ team members strong across 3 countries (we call ourselves ProArchians), and here’s what connects us all:
- A love for solving real business problems
- A belief in doing what’s right
What’s it like to work here?
- You’ll keep growing. You’ll work alongside domain experts who love to share what they know.
- You’ll be supported, heard, and trusted to make an impact.
- You’ll take on projects that touch industries, communities, and lives.
- You’ll have the time to focus on what matters most in your life outside of work.
At ProArch, you’ll be part of teams that design and deliver technology solutions solving real business challenges for our clients. With services spanning AI, Data, Application Development, Cybersecurity, Cloud & Infrastructure, and Industry Solutions, your work may involve building intelligent applications, securing business‑critical systems, or supporting cloud migrations and infrastructure modernization.
Every role here contributes to shaping outcomes for global clients and driving meaningful impact. You’ll collaborate with experts across data, AI, engineering, cloud, cybersecurity, and infrastructure, solving complex problems with creativity, precision, and purpose. You’ll join a culture rooted in technology, curiosity, and continuous learning: a place where we move fast, trust you to make an impact, encourage innovation, and support your growth.
ProArch is looking for a passionate and skilled Site Reliability Engineer (SRE) to join our team. As an SRE, you will be responsible for ensuring the reliability, availability, and performance of our systems and services. You will collaborate with various teams to optimize production environments, troubleshoot performance issues, and implement best practices for service reliability. Your contributions will be critical to improving system uptime and enhancing user satisfaction.
Key Responsibilities:
- Monitor system performance and reliability, ensuring uptime meets organizational SLAs.
- Implement and maintain observability tools to gather metrics and logs for proactive issue detection.
- Troubleshoot and resolve complex production issues across various components of our infrastructure.
- Collaborate with software engineering teams to design and implement scalable, fault-tolerant architectures.
- Develop and maintain automation scripts for deployment, monitoring, and system management.
- Participate in the on-call rotation to respond to production incidents and perform root cause analysis.
- Contribute to capacity planning and performance tuning to ensure optimal resource utilization.
- Document infrastructure, processes, and incident responses to promote knowledge sharing.
Requirements
Required Qualifications:
- 8+ years of experience as a Site Reliability Engineer, DevOps Engineer, or related role.
- Strong experience with cloud providers such as AWS, Azure, or GCP.
- Proficiency in scripting languages such as Python, Bash, or Go.
- Experience with container orchestration tools like Kubernetes.
- Familiarity with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI).
- Solid understanding of networking and security principles.
- Experience with monitoring and logging tools such as Prometheus, Grafana, or the ELK stack.
- Excellent problem-solving skills and a proactive attitude.
- Strong communication and teamwork skills, with an emphasis on collaboration.
Preferred Qualifications:
- Experience with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation.
- Knowledge of service mesh architectures and modern microservices patterns.
- Background in software development and familiarity with Agile methodologies.