Job Description

ROLE AND RESPONSIBILITIES:


The candidate is expected to own the operational stability and performance of hybrid cloud infrastructure (AWS, GCP and Nutanix). This involves leading automation efforts, architecting for reliability, and acting as the final escalation point for critical incidents to ensure the platform is scalable and efficient.



Cloud Platform Engineering


Architect and deploy enterprise-scale, highly available multi-cloud solutions across AWS and GCP with multi-region/multi-account strategies

Expert-level proficiency with AWS CLI, Google Cloud CLI (gcloud), cloud SDKs, boto3, and Python for advanced automation and infrastructure orchestration

Design AWS Organizations and GCP Organization hierarchies with consolidated billing, IAM policies, and centralized governance

Configure and manage AWS Systems Manager (SSM) including Session Manager, Run Command, State Manager, and Automation for centralized fleet operations

Implement centralized logging using CloudWatch/CloudTrail and GCP Cloud Logging with S3/Cloud Storage aggregation

Integrate AWS and GCP with Splunk using HEC, CloudWatch subscriptions, Pub/Sub, Dataflow, and cloud-specific add-ons for SIEM correlation

Design and deploy advanced load balancing solutions with AWS ALB/NLB/ELB and GCP Cloud Load Balancing including SSL termination and auto-scaling

Develop infrastructure-as-code using Terraform, CloudFormation, and CDK for repeatable multi-cloud deployments and CI/CD pipelines

Configure AWS IAM Identity Center (formerly AWS SSO), cross-account IAM roles, GCP Workload Identity, and federated access for centralized identity management

Design VPC architectures with AWS Transit Gateway/PrivateLink and GCP Shared VPC/VPC peering for hybrid connectivity

Manage containerized workloads using EKS, GKE, ECS, and Cloud Run with service mesh, observability, and security best practices

Implement disaster recovery using AWS Backup, Cross-Region Replication, GCP snapshots, and multi-region failover strategies

Troubleshoot issues using CloudWatch Insights, GCP Cloud Trace, VPC Flow Logs, and X-Ray, escalating to vendor support when needed

Perform cost optimization through Reserved Instances, Committed Use Discounts, rightsizing, and automated resource lifecycle management



System Administration


Administer and support Windows Server and Unix/Linux environments in production and non-production settings

Perform OS-level hardening, patch management, and security compliance across heterogeneous systems

Automate routine administrative tasks using PowerShell, Bash, Python, or similar scripting languages

Manage GitHub organization settings, user permissions, repository access controls, and monitor GitHub Actions workflows and repository health across multiple teams

Configure Splunk forwarders, heavy forwarders and other integrations for data ingestion from cloud and on-premises sources


Nutanix Platform Management


Design, deploy, and maintain enterprise-scale Nutanix AHV clusters and Prism Central for multi-cluster management

Expert-level proficiency with Nutanix CLI (nCLI and acli) for advanced operations, troubleshooting, and automation

Develop automation scripts using Nutanix REST APIs, Python SDK, PowerShell, and Terraform for infrastructure-as-code

Create and manage VM templates, golden images, and standardized deployment catalogs for consistent provisioning

Design disaster recovery solutions using Leap, Protection Domains, cross-cluster replication, and metro clustering

Implement network micro-segmentation using Nutanix Flow and configure RBAC, encryption, and security hardening

Lead L3 troubleshooting using advanced diagnostics, log analysis (CVM, Genesis), NCC health checks, and cluster service resolution

Configure high availability, VM affinity rules, QoS policies, and optimize performance for mission-critical workloads

Manage AHV networking with OVS bridges, VLANs, bonds, and LACP, and implement resource reservations and workload balancing

Design, deploy, and maintain hybrid cloud infrastructure across Nutanix HCI, AWS, and GCP platforms

Architect and implement multi-cloud solutions ensuring high availability, scalability, and disaster recovery



PERSONAL AND PROFESSIONAL QUALIFICATIONS:


10+ years of infrastructure experience in enterprise cloud environments (AWS and GCP)

Nutanix HCI or other virtualization (VMware) experience is a plus

Expert-level skills in Python, PowerShell, Bash scripting, infrastructure-as-code (Terraform/CloudFormation), and container orchestration (Kubernetes, EKS/GKE)

Proven experience managing enterprise-scale environments, hybrid cloud migrations, disaster recovery, and L3 critical incident management

Good knowledge of networking (TCP/IP, VLANs, routing, VPN), security hardening, and compliance and service management frameworks (e.g., ITIL)

Self-motivated continuous learner committed to staying current with evolving cloud technologies and automation opportunities

Available for on-call rotations; strong documentation skills and customer service orientation

Certifications (Mandatory): AWS Certified Solutions Architect, AWS Certified SysOps Administrator

Certifications (Plus): AWS Solutions Architect Professional, AWS DevOps Professional, GCP Professional Cloud Architect, Terraform, Nutanix NCP/NCAP, Red Hat Certified Engineer (RHCE), Windows Server Hybrid Administrator Associate



EDUCATION:


Bachelor’s or master’s degree in computer science/IT
