Lead Solutions Architect AI Infrastructure & Private Cloud

📍 Bengaluru, Karnataka, India
Full-time Other-General Posted January 28, 2026
Job Description

Location: Bengaluru
  PS- Global Competency Center
 Hewlett Packard Enterprise
 Job Title – Lead Solutions Architect – AI Infrastructure & Private Cloud
 Job Description:
 We are seeking an experienced Lead Solutions Architect with deep expertise in AI/ML
 infrastructure, High Performance Computing (HPC), and container platforms to join our
 dynamic team focused on delivering HPE Private Cloud AI and Enterprise AI Factory
 Solutions. This role is instrumental in architecting, deploying, and optimizing private cloud
 environments that leverage HPE's co-developed solutions with NVIDIA, as well as validated
 HPE reference architectures, to support enterprise-grade AI workloads at scale.
 The ideal candidate will bring strong technical expertise in AI infrastructure, container
 orchestration platforms, and hybrid cloud environments, and will play a key role in
 delivering scalable, secure, and high-performance AI platform solutions powered by HPE
 GreenLake and NVIDIA AI Enterprise technologies.
 Key Responsibilities:
 1. Leadership and Strategy:
   Provide delivery assurance and serve as the lead design authority to ensure
 seamless execution of enterprise-grade container platforms (including Red
 Hat OpenShift and SUSE Rancher), HPE Private Cloud AI, and HPC/AI
 solutions, fully aligned with customer AI/ML strategies and business
 objectives.
   Align solution architecture with NVIDIA Enterprise AI Factory design
 principles, including modular scalability, GPU optimization, and hybrid cloud
 orchestration.
   Oversee planning, risk management, and stakeholder alignment throughout
 the project lifecycle to ensure successful outcomes.
 
 2. Solution Planning and Design:
   Architect and optimize end-to-end solutions across container orchestration
 and HPC workload management domains, leveraging platforms such as Red
 Hat OpenShift, SUSE Rancher, and/or workload schedulers like Slurm and
 Altair PBS Pro.
   Ensure seamless integration of container and AI platforms with the broader
 software ecosystem, including NVIDIA AI Enterprise, as well as open-source
 DevOps, AI/ML tools, and frameworks.
 
 3. Opportunity Assessment:
   Lead technical responses to RFPs, RFIs, and customer inquiries, ensuring
 alignment with business and technical requirements.
   Conduct proof-of-concept (PoC) engagements to validate solution feasibility,
 performance, and integration within customer environments.
   Assess customer infrastructure and workloads to recommend optimal
 configurations using validated reference architectures from HPE and strategic
 partners such as Red Hat, NVIDIA, SUSE, along with components from the
 open-source ecosystem.
 4. Innovation and Research:
   Stay current with emerging technologies, industry trends, and best practices
 across HPC, Kubernetes, container platforms, hybrid cloud, and security to
 inform solution design and innovation.
 
 5. Customer-Centric Mindset:
   Act as a trusted advisor to enterprise customers, ensuring alignment of AI
 solutions with business goals.
   Translate complex technical concepts into value propositions for stakeholders.
 
 6. Team Collaboration:
   Collaborate with cross-functional teams, including subject matter experts in
 infrastructure components (such as HPE servers, storage, and networking) and
 data science teams, to ensure cohesive and integrated solution delivery.
   Mentor technical consultants and contribute to internal knowledge sharing
 through tech talks and innovation forums.
 
 Required Skills:
 1. HPC & AI Infrastructure
   Extensive knowledge of HPC technologies and workload schedulers such as
 Slurm and/or Altair PBS Pro.
   Proficiency and hands-on experience with HPC cluster management tools, including
 HPE Performance Cluster Manager (HPCM) and/or NVIDIA Base Command Manager.
   Good understanding of high-speed networking stacks (InfiniBand, Mellanox) and
 performance tuning of HPC components.
   Solid grasp of high-speed networking technologies, such as InfiniBand and Ethernet.
 2. Containerization & Orchestration
   Extensive hands-on experience with containerization technologies such as Docker,
 Podman, and Singularity.
   Proficiency with at least two container orchestration platforms: CNCF Kubernetes,
 Red Hat OpenShift, SUSE Rancher (RKE/K3s), and Canonical Charmed Kubernetes.
   Strong understanding of GPU technologies, including the NVIDIA GPU Operator for
 Kubernetes-based environments and DCGM (Data Center GPU Manager) for GPU
 health and performance monitoring.
 3. Operating Systems & Virtualization
   Extensive experience in Linux system administration, including package
 management, boot process troubleshooting, performance tuning, and network
 configuration.
   Proficient with multiple Linux distributions, with hands-on expertise in at least two of
 the following: RHEL, SLES, and Ubuntu.
   Experience with virtualization technologies, including KVM and OpenShift
 Virtualization, for deploying and managing virtualized workloads in hybrid cloud
 environments.
 4. Cloud, DevOps & MLOps
   Solid understanding of hybrid cloud architectures and experience working with major
 cloud platforms in conjunction with on-premises infrastructure.
   Familiarity with DevOps practices, including CI/CD pipelines, infrastructure as code
 (IaC), and microservices-based application delivery.
   Experience integrating and operationalizing open-source AI/ML tools and
 frameworks, supporting the full model lifecycle from development to deployment.
   Good understanding of cloud-native security, observability, and compliance
 frameworks, ensuring secure and reliable AI/ML operations at scale.
 5. Networking & Protocols
   Strong understanding of core networking principles, including DNS, TCP/IP, routing,
 and load balancing, essential for designing resilient and scalable infrastructure.
   Working knowledge of key storage and data access protocols, such as S3, NFS, and
 SMB/CIFS, for data access, transfer, and integration across hybrid environments.
 6. Programming & Automation
   Proficiency in scripting or programming languages such as Python and Bash.
   Experience automating infrastructure and AI workflows.
 7. Soft Skills & Leadership
   Excellent problem-solving, analytical thinking, and communication skills for engaging
 both technical and non-technical stakeholders.
   Proven ability to lead complex technical projects from requirements gathering
 through architecture, design, and delivery.
   Strong business acumen with the ability to align technical solutions with client
 challenges and objectives.
 Qualifications:
   Bachelor's/Master's degree in Computer Science, Information Technology, or a
 related field.
   Professional certifications in AI infrastructure, containers, and Kubernetes are highly
 desirable, such as RHCSA, RHCE, CNCF certifications (CKA, CKAD, CKS), and
 NVIDIA-Certified Associate: AI Infrastructure and Operations.
   Typically, 8–10 years of hands-on experience in architecting and implementing HPC,
 AI/ML, and container platform solutions within hybrid or private cloud environments,
 with a strong focus on scalability, performance, and enterprise integration.   