Job Description

Principal AI Engineer – Agentic AI, System Architecture & Data Science

Experience: 13+ years

About the Team

The AI Center of Excellence team includes Data Scientists and AI Engineers who work together to conduct research, build prototypes, design features, and deliver production-grade AI systems at scale. Our mission is to leverage the best available technology—including advanced ML, LLMs, and agentic AI systems—to protect our customers’ attack surfaces.

We partner deeply with Detection and Response teams, including our MDR service, to embed AI into real-world security workflows. Our work builds on more than 20 years of threat intelligence, deep domain expertise, and a growing patent portfolio. We operate in ambiguous problem spaces and value technical rigor, strong ownership, and principled decision-making.

As a Principal Engineer, you will define and own the system architecture for AI platforms and services across the organization.

The technologies we use include:

  • Python for large-scale data science, modeling, and experimentation
  • Jupyter notebooks (local & remote)
  • Classical ML using scikit-learn
  • Deep learning for NLP and sequence-based security problems
  • Anomaly detection and behavioral modeling
  • LLM / GenAI toolchains: Hugging Face Transformers, LangChain, LangGraph
  • Agentic AI systems: multi-agent orchestration, tool-calling, reasoning, memory, evaluation
  • RAG pipelines using vector databases
  • AWS cloud ecosystem: Bedrock, SageMaker, Lambda, EKS, S3, DynamoDB, Redshift, Kinesis
  • CI/CD for ML & LLM systems (GitHub Actions, Jenkins)
  • Model registry, versioning, drift detection, and retraining frameworks
  • Observability & operations: CloudWatch, Prometheus, Grafana, PagerDuty
  • Infrastructure as Code using Terraform


About the Role

Rapid7 is seeking a Principal AI Engineer – Data Science who brings deep system design and architectural leadership to our AI organization.

This role sits at the intersection of data science, large-scale distributed systems, agentic AI, and cloud-native architecture. You will be responsible not just for building models but also for designing the end-to-end systems that make AI reliable, scalable, secure, and operable in production.

This role is ideal for someone who:

  • has designed complex, distributed AI systems end to end,
  • understands how data, models, infrastructure, and services interact, and
  • can make long-term architectural decisions under real-world constraints.


In this role, you will:

  • Own the system architecture for AI, ML, LLM, and agentic AI platforms across multiple teams
  • Design end-to-end AI system architectures, including:
      • data ingestion and streaming pipelines
      • feature stores and offline/online data paths
      • model training, fine-tuning, and evaluation
      • inference services, APIs, and microservices
      • monitoring, alerting, and incident response workflows
  • Define reference architectures and design patterns for:
      • LLM orchestration and agentic workflows
      • RAG systems and vector retrieval
      • secure and scalable inference
  • Lead architectural reviews and act as the final technical authority on AI system design decisions
  • Make trade-offs across accuracy, latency, cost, scalability, security, and reliability
  • Establish architectural standards for:
      • model registry and lifecycle management
      • drift detection and retraining
      • LLM evaluation, guardrails, and governance
  • Ensure AI systems comply with cloud security best practices (IAM, KMS, VPC, secrets)
  • Serve as the escalation point for complex production incidents
  • Mentor Staff and Senior engineers on system design and architectural thinking
  • Influence product roadmaps and long-term AI platform investments


The skills you’ll bring include:

Core (Required)

  • 13+ years of experience in Data Science, ML Engineering, or Applied AI
  • Proven experience designing and architecting large-scale AI systems
  • Strong background in:
      • data acquisition, cleaning, enrichment, and transformation
      • feature engineering for structured and unstructured data
      • supervised and unsupervised ML
      • deep learning (NLP, CNNs, RNNs, sequence models)
  • Experience with model explainability (SHAP, LIME)
  • Hands-on experience with security-focused ML models:
      • malware detection
      • malware behavior-based models
      • user behavioral analytics
  • Exceptional ability to reason at the system and architecture level


Agentic AI & LLM Systems (Strongly Required)

  • Deep hands-on experience with:
      • LLM orchestration (LangChain, LangGraph)
      • agentic and multi-agent architectures
      • RAG pipelines and vector databases
      • prompt engineering at scale
      • LLM evaluation frameworks (Promptfoo, HELM)
      • parameter-efficient fine-tuning (PEFT) methods such as LoRA
  • Experience designing robust guardrails, governance, and evaluation frameworks for LLM systems
  • Understanding of failure modes and risks in autonomous and agentic AI systems


System Design, Cloud & MLOps (Strongly Required)

  • Strong experience designing distributed systems and microservice architectures
  • Expertise in:
      • model registry and versioning (MLflow, SageMaker Model Registry)
      • drift detection and automated retraining
      • monitoring and observability (CloudWatch, Prometheus, Grafana)
      • incident management and on-call leadership (PagerDuty)
  • Deep AWS experience:
      • Bedrock, SageMaker, Lambda, EKS
      • data storage and streaming systems (S3, DynamoDB, Redshift, Kinesis)
      • cloud security (IAM, KMS, Secrets Manager, VPCs)
  • Working knowledge of:
      • Docker and Kubernetes
      • CI/CD pipelines for ML/LLM workloads
      • Infrastructure as Code using Terraform


Experience with the following would be advantageous:

  • Architecting internal AI platforms used by multiple product teams
  • Defining AI governance, risk, and compliance frameworks
  • Operating AI systems in high-scale or regulated environments


Important clarification

A Principal AI Engineer at Rapid7:

  • owns architecture, not just code
  • sets standards others follow
  • is accountable for the long-term technical health of AI systems across teams

If a candidate has not designed and defended system architectures for production AI platforms, they are not a fit for this role, regardless of individual modeling strength.
