Job Description

Machine Learning Engineer III – LLM Training (RL + PEFT)


On-site, Bangalore


LatentForce


About the Role

We are building specialized LLMs that understand and reason over massive enterprise codebases. This is real model training — RL loops, PEFT, verifiable rewards, long-context modeling — not API integration. You’ll own end-to-end experimentation and work directly with founders.


Responsibilities

  • Train LLMs using RL (PPO/GRPO/RLHF/RLVR) and PEFT (LoRA, QLoRA, DoRA, IA3).
  • Build custom training loops with PyTorch, HuggingFace, TRL, and Unsloth (a minimal sketch appears after this list).
  • Design reward functions and verifiers for code-understanding tasks.
  • Run full-stack ML experiments: data → training → eval → iteration.
  • Develop scalable training infra (FSDP/DeepSpeed, distributed training).
  • Build evaluation suites for reasoning and code comprehension.
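
To make the first three items concrete, here is a minimal sketch of an RL training loop combining TRL's GRPOTrainer with a LoRA adapter and a toy verifiable reward. The base model, dataset, reward logic, and output directory are placeholders, and the sketch assumes a recent TRL release that ships GRPOTrainer and supports peft adapters.

    # Minimal sketch: GRPO + LoRA via TRL. Model, data, and reward are
    # placeholders; assumes a recent TRL release that includes GRPOTrainer.
    from datasets import Dataset
    from peft import LoraConfig
    from trl import GRPOConfig, GRPOTrainer

    # Toy prompt set; in practice these would be code-understanding tasks
    # drawn from enterprise repositories.
    train_dataset = Dataset.from_dict(
        {"prompt": ["Explain what this function returns: def f(x): return x * 2"]}
    )

    # Verifiable reward: score each completion with a deterministic check
    # (a trivial keyword test stands in for a real verifier here).
    def reward_verifier(completions, **kwargs):
        return [1.0 if "doubles" in c.lower() else 0.0 for c in completions]

    # LoRA adapter so only a small set of weights is updated (PEFT).
    peft_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )

    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder base model
        reward_funcs=reward_verifier,
        args=GRPOConfig(output_dir="grpo-lora-run", num_generations=4),
        train_dataset=train_dataset,
        peft_config=peft_config,
    )
    trainer.train()

A PPO- or RLHF-style pipeline follows the same shape, swapping in a different trainer and reward source.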

Minimum Qualifications

  • 3+ years of hands-on deep learning experience (actual model training).
  • Strong fundamentals: linear algebra, probability, optimization, statistics.
  • Proven experience training transformers or large DNNs from scratch or from checkpoints.
  • Proficiency in PyTorch, HuggingFace, TRL, and Unsloth.
  • Experience implementing RL algorithms or custom training pipelines.
  • Research exposure (publications/preprints) or strong open-source work.
  • Ability to debug training issues such as NaNs, KL drift, and reward hacking (a minimal gradient check appears after this list).
  • Startup mindset; comfortable with fast, on-site, high-performance execution.
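
As a small illustration of the debugging expectation above, the following sketch shows a post-backward gradient sanity check that catches NaN/Inf values before the optimizer step. The helper name and logging format are illustrative, not part of any specific library.

    # Sketch: post-backward gradient check for catching NaN/Inf early.
    import torch

    def grads_are_finite(model: torch.nn.Module, step: int) -> bool:
        """Return False (and log the offending parameter) if any gradient
        contains NaN or Inf. Call after loss.backward(), before optimizer.step()."""
        for name, param in model.named_parameters():
            if param.grad is not None and not torch.isfinite(param.grad).all():
                print(f"step {step}: non-finite gradient in {name}")
                return False
        return True

    # Typical use inside a training loop:
    #   loss.backward()
    #   if grads_are_finite(model, step):
    #       optimizer.step()
    #   optimizer.zero_grad()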

Nice to Have

  • DeepSpeed/FSDP, model parallelism, vLLM.
  • Program analysis / AST tooling.
  • Long-context modeling experience.

Why Join Us

  • Build specialized LLMs at a well-funded early-stage company.
  • Direct work with founders; high ownership and technical depth.
  • High-impact role shaping core training architecture.


How to Apply

Ready to join? Submit your application for this position.