Job Description
Design and implement high-performance Python and C++ code for vLLM-based inference systems, GPU kernels, and numerical methods.
Telecommuting permitted: work may be performed within normal commuting distance from the Red Hat, Inc. office in Boston, MA.
What You Will Do:
Develop, test, and optimize LLM inference algorithms, including quantization and sparsification techniques, to improve latency, throughput, and memory use.
Conduct performance profiling and modeling on NVIDIA GPUs using tools such as Nsight, and tune CUDA, Triton, or CUTLASS kernels for deep neural networks.
Participate in technical design reviews and propose innovative HPC solutions for large-scale model serving.
Review peer code promptly and leverage AI-assisted development tools to uphold code quality standards.
Collaborate with cross-functional AI, product, and research teams to deliver features for the Red Hat AI Inference Platform.