Job Description

About the Job

We are looking for an experienced L2 IT Infrastructure Operations Engineer to provide advanced technical support for our enterprise server and network infrastructure. This mid-level position bridges the gap between frontline support and expert-level engineering, handling escalated incidents, performing complex troubleshooting, and contributing to operational excellence. The ideal candidate will possess hands-on experience with Dell PowerEdge servers, Cisco networking equipment, and enterprise monitoring solutions. You will mentor L1 engineers, participate in change management activities, and collaborate with cross-functional teams to ensure high availability and performance of critical infrastructure in a 24x7 global environment.

Key Responsibilities

  • Provide advanced troubleshooting and fault isolation for escalated server and network incidents, utilizing iDRAC, Redfish, and Cisco CLI tools to diagnose and resolve complex issues.

  • Execute firmware, BIOS, and driver updates on Dell PowerEdge servers following standardized procedures, ensuring minimal service disruption and maintaining system stability.

  • Perform IOS/NX-OS firmware and software updates on Cisco routers and switches, adhering to change management protocols and conducting post-update validation.

  • Manage hardware break/fix procedures for server infrastructure, coordinating with Dell support for warranty claims, parts ordering, and scheduling on-site technician dispatch.

  • Conduct regular network health audits and performance analysis, identifying potential bottlenecks and recommending optimization measures to prevent service degradation.

  • Collaborate with the SRE team to enhance monitoring dashboards and refine alerting thresholds, ensuring proactive detection of infrastructure instability or security events.

  • Mentor and provide technical guidance to L1 engineers, conducting knowledge transfer sessions and assisting with complex ticket resolution to build team capability.

  • Participate in blameless post-mortems following major incidents, contributing to root cause analysis and implementing preventative actions to improve system reliability.

  • Maintain and update operational runbooks, network diagrams, and technical documentation to reflect current configurations and best practices.

  • Support hardware lifecycle management activities, including equipment provisioning, asset tracking, and coordination with vendors for hardware returns and repairs.

  • Provide 24x7 on-call support for critical escalations, ensuring rapid response to high-priority incidents affecting production systems.

  • Collaborate with the FTE IT Team Lead on capacity planning activities, providing data-driven insights on infrastructure utilization trends and growth projections.

Required Skills

  • 5+ years of hands-on experience in enterprise IT infrastructure operations or a related field.

  • Strong proficiency with Dell PowerEdge server administration, including hardware troubleshooting, iDRAC/Redfish management, and firmware lifecycle management.

  • Solid experience with Cisco networking equipment (routers, switches), including IOS/NX-OS configuration, troubleshooting, and upgrade procedures.

  • Working knowledge of monitoring and logging tools, with the ability to create dashboards, configure alerts, and analyze performance metrics for proactive issue detection.

  • Excellent problem-solving abilities with demonstrated experience in incident management, root cause analysis, and implementing corrective actions in production environments.

  • Industry certifications such as Dell Server certifications or ITIL Foundation.

  • Ability to work rotating shifts in a 24x7 global support model.

Tools Required

  • Server & Hardware Tools: Dell iDRAC, Lifecycle Controller, OpenManage, and RAID/PERC utilities for server provisioning, firmware baselining, and remote management.

  • OS Deployment Tools: PXE boot infrastructure, iDRAC Virtual Media, Windows Server & Linux ISOs with hardening and automation scripts.

  • Network Tools: Cisco IOS CLI, PoE management, VLAN/QoS configuration tools, network monitoring, and bandwidth/latency testing utilities.

  • Automation & Operations Tools: Ansible, Python, CMDB systems, configuration backup tools, and documentation/diagramming platforms for global 24x7 operations.
