Job Description

Role: Sr. HPC Administrator
Desired Experience Range: 7-12 years
Notice Period: Immediate to 60 Days only
Location of Requirement: Bangalore
JOB DESCRIPTION
● Strong experience in providing support for Linux HPC clusters.
● Strong working knowledge of the following:
o IBM Platform LSF 9 and 10 administration.
o Red Hat Enterprise Linux administration.
o Lustre parallel file system.
o Mellanox InfiniBand connectivity.
o Cluster manager administration (HPCM or xCAT).
o SSSD & NIS Authentication mechanisms.
o Bash & Python scripting.
o Ansible playbooks.
● Experience with Abaqus and CFD applications (Fluent, Star CCM+, etc.).
● Strong knowledge of application installations and version management on shared file systems.
● IT infrastructure technical operations management under the ITIL framework.
● Security compliance and remediation management.
Intermediate Level
● DevOps, ITIL, Agile, SAFe (certifications are desirable)
Responsibilities
● Installation, configuration, troubleshooting and administration of Linux HPC clusters (compute,
storage, and network) and applications in support of CAE environments.
● Monitor and analyze LSF job queues and resource utilization to optimize workload management.
● Troubleshoot and resolve any issues with LSF and its components, including master servers, compute
nodes, and resource managers.
● Collaborate with users to understand their HPC requirements and design LSF job workflows to meet
their needs.
● Develop and maintain LSF documentation, including standard operating procedures, installation
guides, and troubleshooting procedures.
● Develop and maintain LSF scripts for automation and task scheduling.
● Diagnose and troubleshoot complex RHEL OS, application and HPC cluster technical problems.
● Interact with hardware and software vendors for external support.
● Develop and maintain technical solution documents (TSDs) and standard operating procedures (SOPs).
● Keep all HPC infrastructure systems, servers, and devices up to date and in working condition to
support business continuity.
● Design and implement HPC network topology, including Mellanox connectivity.
● Create and maintain HPC capacity plans and periodic cluster utilization reports.
● Troubleshoot Abaqus, Star CCM+ and Fluent applications, and resolve any issues in a timely manner.
● Develop and maintain scripts for automation and task scheduling using Python and Bash scripting.
