Job Description
Position Overview
As a Systems & Infrastructure Engineer, you will support and maintain our Linux-based data analytics platform. You will be responsible for system lifecycle management, platform reliability, containerized workloads, and operational compliance in a regulated environment. The ideal candidate has hands-on experience with Ubuntu Linux, understands modern containerization and orchestration technologies (Docker/Kubernetes), and thrives in a distributed, technically complex, data-centric environment.
Essential Duties & Responsibilities:
Platform Operations & System Administration
- Install, configure, upgrade, and decommission:
  - BIOS and firmware
  - Ubuntu/Linux operating systems
  - System-level packages, software applications, modules, and dependencies
- Manage and maintain virtualization or container environments, including Docker and Kubernetes workloads
- Monitor system resource utilization, scalability, and performance of compute nodes and platform services
- Perform routine system health checks, vulnerability assessments, and patch management
- Troubleshoot and resolve Linux OS issues, compute environment problems, network connectivity concerns, storage issues, and node-level failures
Platform Management & User Operations
- Handle daily operational requests including:
  - User management, access provisioning, and permissions updates
  - Data access requests and entitlement adjustments
  - Break-fix support and incident response
  - Ticket queue management, documenting work in accordance with SLAs
- Collaborate with engineering, analytics, and DevOps teams to support environment stability and improvements
- Ensure high availability of critical platform services used by computation, data analysis, and ETL workflows
Security, Compliance & Audit Support
- Maintain environment compliance with SOC 2, HIPAA, and PCI requirements through year-round operational discipline
- Implement and validate security controls such as:
  - Patch management
  - Access controls and logging
  - Vulnerability remediation
  - Configuration management and change tracking
- Document platform changes, architecture, and controls to support compliance
- Provide audit support annually through evidence collection, system reports, configuration exports, and control demonstrations
Automation & Reliability Engineering
- Develop automation scripts using Bash, Python, or similar languages to streamline operational processes
- Enhance system reliability through:
  - Infrastructure-as-Code templates (e.g., Terraform, Ansible)
  - Automated deployments and environment builds
  - Monitoring and alerting improvements
- Participate in capacity planning, performance tuning, and architectural enhancements for high-volume compute and analytics workloads
Systems Engineering in a Computational Analytics Environment
- Manage compute clusters supporting data science, analytics, and batch workloads
- Oversee job scheduling environments (Kubernetes Jobs, cron, workflow schedulers)
- Support distributed file systems, object storage, or high-throughput data pipelines as needed
- Maintain security and operational continuity across multi-node environments
Required Skills:
Required
- 3–6 years of hands-on experience with Ubuntu/Linux system administration
- Working knowledge of Docker and Kubernetes in a production environment
- Experience with system patching, kernel upgrades, firmware/BIOS updates, and environment hardening
- Familiarity with security best practices, access control, and compliance-driven operations
- Strong troubleshooting skills across systems, networking, and application layers
- Scripting experience (Bash, Python, or similar)
- Experience working in remote, distributed teams
Preferred
- Experience supporting a high-performance computing (HPC), large-scale analytics, or distributed compute environment
- Exposure to CI/CD pipelines, GitOps, or automated infrastructure provisioning
- Understanding of SOC 2/HIPAA/PCI controls, audits, or regulated computing environments
- Experience with monitoring tools (Prometheus, Grafana, Zabbix, etc.)
Soft Skills
- Strong communication skills and ability to document clearly
- Attention to detail, especially regarding compliance requirements
- Ability to work independently, manage priorities, and meet operational SLAs
- Proactive mindset with a drive to automate and improve the platform
What's in it for You?
- Opportunity to work in the booming field of cloud, data management, and analytics alongside some of the brightest minds in the industry
- Opportunity to work with cutting-edge technology
- Chance to work with a rapidly expanding US tech company
- Flexible schedule and paid time off
- Competitive salary and benefits package
Apply for this Position
Ready to join? Click the button below to submit your application.