Job Description
Requirements
- You enjoy pushing the limits of what LLMs are capable of, and you have built high-quality evaluation resources to measure those capabilities (datasets, simulators, environments, etc.)
- You have a track record of developing new methods and/or data to evaluate LLMs, e.g. publications at top-tier conferences, popular benchmarks, etc.
- You have deep experience building with and around LLMs, and you have built tools for analyzing and understanding their performance.
- You have robust software engineering skills.
What the job involves
- Evaluation is critical to making progress in scaling intelligence. As models become superhuman in many real-world use cases, we must keep developing new techniques to accurately measure our models’ performance on frontier capabilities.
- In this role, you are responsible for creating next-generation evaluation methods and scalable infrastructure to ...