Job Description

This project will investigate attacks on large language models (LLMs), a major recent development in artificial intelligence that has already been integrated widely into public life. If these LLMs can be triggered into producing malicious output, the consequences may be disastrous: generation of harmful content, execution of malicious code on connected devices, or abuse of limited resources. The aim is to assess the resistance of these models against new attacks, using techniques from AI and optimisation, and to develop methods to defend against such harms by leveraging cryptographic approaches. To this end, you will:

1. Investigate how to adapt existing adversarial attacks from image classification and other domains to open-source LLMs (e.g., Llama 3, Phi-3), as well as develop new kinds of attacks, for example based on evolutionary algorithms.
2. Investigate to what extent data ...
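To give a flavour of the evolutionary-algorithm direction mentioned in the first objective, the sketch below evolves a population of candidate prompt suffixes against a fitness function. This is a minimal, purely illustrative toy: the fitness function, vocabulary, and trigger tokens are all stand-in assumptions, whereas a real attack would query the target LLM and score how close its response comes to some malicious target behaviour.

```python
import random

# Stand-in objective (assumption): reward suffixes containing "trigger" tokens.
# In a real attack, this would score the target LLM's response to the suffix.
TRIGGER_TOKENS = {"alpha", "beta", "gamma"}
VOCAB = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]

def fitness(suffix):
    """Count how many positions hold a trigger token."""
    return sum(tok in TRIGGER_TOKENS for tok in suffix)

def mutate(suffix, rng):
    """Replace one randomly chosen position with a random vocabulary token."""
    child = list(suffix)
    child[rng.randrange(len(child))] = rng.choice(VOCAB)
    return child

def evolve(pop_size=20, suffix_len=5, generations=30, seed=0):
    """Simple (mu + lambda)-style evolution: keep the fittest half as
    parents each generation and refill the population with mutated copies."""
    rng = random.Random(seed)
    population = [[rng.choice(VOCAB) for _ in range(suffix_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]  # truncation selection
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

Because the fittest parents are carried over unchanged each generation, the best fitness in the population never decreases; swapping in an LLM-based objective would turn this skeleton into a black-box attack that needs no gradient access to the model.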
