Job Description
About the Job
We are looking for an experienced L2 IT Infrastructure Operations Engineer to provide advanced technical support for our enterprise server and network infrastructure. This mid-level position bridges the gap between frontline support and expert-level engineering, handling escalated incidents, performing complex troubleshooting, and contributing to operational excellence. The ideal candidate will possess hands-on experience with Dell PowerEdge servers, Cisco networking equipment, and enterprise monitoring solutions. You will mentor L1 engineers, participate in change management activities, and collaborate with cross-functional teams to ensure high availability and performance of critical infrastructure in a 24x7 global environment.
Key Responsibilities:
Provide advanced troubleshooting and fault isolation for escalated server and network incidents, utilizing iDRAC, Redfish, and Cisco CLI tools to diagnose and resolve complex issues.
Execute firmware, BIOS, and driver updates on Dell PowerEdge servers following standardized procedures, ensuring minimal service disruption and maintaining system stability.
Perform IOS and NX-OS software upgrades on Cisco routers and switches, adhering to change management protocols and conducting post-upgrade validation.
Manage hardware break/fix procedures for server infrastructure, coordinating with Dell support for warranty claims, parts ordering, and scheduling on-site technician dispatch.
Conduct regular network health audits and performance analysis, identifying potential bottlenecks and recommending optimization measures to prevent service degradation.
Collaborate with the SRE team to enhance monitoring dashboards and refine alerting thresholds, ensuring proactive detection of infrastructure instability or security events.
Mentor and provide technical guidance to L1 engineers, conducting knowledge transfer sessions and assisting with complex ticket resolution to build team capability.
Participate in blameless post-mortems following major incidents, contributing to root cause analysis and implementing preventative actions to improve system reliability.
Maintain and update operational runbooks, network diagrams, and technical documentation to reflect current configurations and best practices.
Support hardware lifecycle management activities, including equipment provisioning, asset tracking, and coordination with vendors for hardware returns and repairs.
Provide 24x7 on-call support for critical escalations, ensuring rapid response to high-priority incidents affecting production systems.
Collaborate with the FTE IT Team Lead on capacity planning activities, providing data-driven insights on infrastructure utilization trends and growth projections.
Required Skills:
5+ years of hands-on experience in enterprise IT infrastructure operations or a related field.
Strong proficiency with Dell PowerEdge server administration, including hardware troubleshooting, iDRAC/Redfish management, and firmware lifecycle management.
Solid experience with Cisco networking equipment (routers, switches), including IOS/NX-OS configuration, troubleshooting, and upgrade procedures.
Working knowledge of monitoring and logging tools, with the ability to create dashboards, configure alerts, and analyze performance metrics for proactive issue detection.
Excellent problem-solving abilities with demonstrated experience in incident management, root cause analysis, and implementing corrective actions in production environments.
Industry certifications such as Dell server certifications or ITIL Foundation.
Ability to work rotating shifts in a 24x7 global support model.
Tools Required:
Server & Hardware Tools: Dell iDRAC, Lifecycle Controller, OpenManage, and RAID/PERC utilities for server provisioning, firmware baselining, and remote management.
OS Deployment Tools: PXE boot infrastructure, iDRAC Virtual Media, and Windows Server and Linux ISOs with hardening and automation scripts.
Network Tools: Cisco IOS CLI, PoE management, VLAN/QoS configuration tools, network monitoring, and bandwidth/latency testing utilities.
Automation & Operations Tools: Ansible, Python, CMDB systems, configuration backup tools, and documentation/diagramming platforms.