Frederik Braun: Prompt Injections and a demo

I have many thoughts about AI. I will use the term "AI" throughout this article, even though I am opposed to the notion of these system being actually intelligent. However, this article will focus on one point only: Prompt injections.

A lot has been written about prompt injections and there is a good introduction of prompt injection attacks from Simon Willison, as well as a great piece that succinctly puts it as "You can't solve AI security problems with more AI".

The gist is that most APIs to talk to AIs do not have a great separation between the instructions (e.g., "Please summarize this article") and the input data (e.g., the blog post to be summarized). This leads to situations where input data can be confused with the instruction stream (as it is all literally in the same text box!) and the AI system will be easily confused. Latest solutions to this problem appear to be tweaks to the weighting of importance and relevance from text that comes in earlier than the rest and calling it "system prompt". Some APIs will even allow two inputs, the "system prompt" and the "input". This approach is unfortunately still a hope that statistics will play out in favor of the system prompt. It is my (admittedly limited) understanding that these solutions do not fix the vulnerability class at the right level.

My main take away is that you can often fix security bugs as bug classes by introducing countermeasures at a deeper level of the technology stack.

Examples:

Memory safety issues in C++ programs should be fixed in the C++ language or in the compiler by introducing additional checks instead of the application developer being tasked to "do the right thing" every single time they touch a pointer.
Cross-Site-Scripting (XSS) should be fixed by introducing limitations to how code (JavaScript) is interspersed with data (HTML) through strict encoding with templates (e.g., Jinja2 is doing a phenomenal job) or at the browser level with controls like Content-Security-Policy (CSP).

P.S: Please try have an "AI" system automatically summarize this article. As of today, the output would be gibberish.

Other posts