Job Description

Join a specialized project focused on evaluating large language models (LLMs) within professional UX and product design workflows. This role centers on identifying and mitigating contextual misuse risks in AI-generated outputs, such as privacy violations, manipulative patterns, and unsafe advice. You will assess LLM responses to prompts related to onboarding flows, consent mechanisms, healthcare and finance UX patterns, and accessibility. The position requires developing or refining scoring rubrics that address privacy risk, dark patterns, policy evasion, user harm, and clarity. You will deliver concise evaluations, including scores, rationales, and actionable recommendations for safer outputs, and design red-team prompts that simulate realistic misuse scenarios. This opportunity is ideal for experienced UX professionals with a strong background in regulated domains and a passion for advancing AI safety.

Responsibilities

  • Review and evaluate LLM outputs for a range...
