Job Description

Position Overview

As a Systems & Infrastructure Engineer, you will support and maintain our Linux-based data analytics platform. You will be responsible for system lifecycle management, platform reliability, containerized workloads, and operational compliance in a regulated environment. The ideal candidate has hands-on experience with Ubuntu Linux, understands modern containerization and orchestration technologies (Docker/Kubernetes), and thrives in a distributed, technically complex, data-centric environment.

Essential Duties & Responsibilities:

Platform Operations & System Administration  

  • Install, configure, upgrade, and decommission: 
    • BIOS and firmware 
    • Ubuntu/Linux operating systems 
    • System-level packages, software applications, modules, and dependencies 
  • Manage and maintain virtualization or container environments, including Docker and Kubernetes workloads
  • Monitor system resource utilization, scalability, and performance of compute nodes and platform services
  • Perform routine system health checks, vulnerability assessments, and patch management
  • Troubleshoot and resolve Linux OS issues, compute environment problems, network connectivity concerns, storage issues, and node-level failures

Platform Management & User Operations  

  • Handle daily operational requests including: 
    • User management, access provisioning, and permissions updates 
    • Data access requests and entitlement adjustments 
    • Break-fix support and incident response 
    • Ticket queue management and work documentation in accordance with SLAs 
  • Collaborate with engineering, analytics, and DevOps teams to support environment stability and improvements
  • Ensure high availability of critical platform services used by computation, data analysis, and ETL workflows

Security, Compliance & Audit Support  

  • Maintain environment compliance with SOC 2, HIPAA, and PCI requirements through year-round operational discipline
  • Implement and validate security controls such as: 
    • Patch management 
    • Access controls and logging 
    • Vulnerability remediation 
    • Configuration management and change tracking 
  • Document platform changes, architecture, and controls to support compliance
  • Provide audit support annually through evidence collection, system reports, configuration exports, and control demonstrations

Automation & Reliability Engineering  

  • Develop automation scripts using Bash, Python, or similar languages to streamline operational processes
  • Enhance system reliability through: 
    • Infrastructure-as-Code templates (e.g., Terraform, Ansible) 
    • Automated deployments and environment builds 
    • Monitoring and alerting improvements 
  • Participate in capacity planning, performance tuning, and architectural enhancements for high-volume compute and analytics workloads

Systems Engineering in a Computational Analytics Environment  

  • Manage compute clusters supporting data science, analytics, and batch workloads
  • Oversee job scheduling environments (Kubernetes jobs, Cron, workflow schedulers)
  • Support distributed file systems, object storage, or high-throughput data pipelines as needed
  • Maintain security and operational continuity across multi-node environments

Required Skills:

Required  

  • 3–6 years of hands-on experience with Ubuntu/Linux system administration
  • Working knowledge of Docker and Kubernetes in a production environment
  • Experience with system patching, kernel upgrades, firmware/BIOS updates, and environment hardening
  • Familiarity with security best practices, access control, and compliance-driven operations
  • Strong troubleshooting skills across systems, networking, and application layers
  • Scripting experience (Bash, Python, or similar)
  • Experience working in remote, distributed teams

Preferred  

  • Experience supporting a high-performance computing (HPC), large-scale analytics, or distributed compute environment 
  • Exposure to CI/CD pipelines, GitOps, or automated infrastructure provisioning
  • Understanding of SOC 2/HIPAA/PCI controls, audits, or regulated computing environments
  • Experience with monitoring tools (Prometheus, Grafana, Zabbix, etc.)

Soft Skills  

  • Strong communication skills and ability to document clearly
  • Attention to detail, especially regarding compliance requirements
  • Ability to work independently, manage priorities, and meet operational SLAs
  • Proactive mindset with a drive to automate and improve the platform

What's in it for You?

  • Opportunity to work in the booming field of cloud, data management, and analytics alongside some of the brightest minds in the industry
  • Opportunity to work with cutting-edge technology
  • Chance to work with a rapidly expanding US tech company
  • Flexible schedule and paid time off
  • Competitive salary and benefits package
