Job Description

Requirements

  • You enjoy pushing the limits of what LLMs are capable of, and you have built high-quality evaluation resources to measure those capabilities (datasets, simulators, environments, etc.).
  • You have a track record of developing new methods and/or data to evaluate LLMs, e.g. publications at top-tier conferences or widely used benchmarks.
  • You have deep experience building with and around LLMs, and you have built tools for analyzing and understanding their performance.
  • You have strong software engineering skills.

What the job involves

  • Evaluation is critical to making progress in scaling intelligence. As models continue to become superhuman in many real-world use cases, we must keep developing new techniques to accurately measure our models’ performance on frontier capabilities.
  • In this role, you are responsible for creating next-generation evaluation methods and scalable infrastructure to ...

Apply for this Position

Ready to join Deepstreamtech? Click the button below to submit your application.