What to do about prompt injection attack surfaces? // @gurupanguji

Simon Willison, an independent ai researcher who sits on the board of the Python software foundation, nicknames the combination of outside-content exposure, private-data access and outside-world communication the “lethal trifecta”. In June Microsoft quietly released a fix for such a trifecta uncovered in Copilot, its chatbot. The vulnerability had never been exploited “in the wild”, Microsoft said, reassuring its customers that the problem was fixed and their data were safe. But Copilot’s lethal trifecta was created by accident, and Microsoft was able to patch the holes and repel would-be attackers. … The safest thing to do is to avoid assembling the trifecta in the first place. Take away any one of the three elements and the possibility of harm is greatly reduced. If everything that goes into your AI system is created inside your company or acquired from trusted sources, then the first element disappears. … Other approaches involve constraining the LLMs themselves. In March, researchers at Google proposed a system called CaMeL that uses two separate LLMs to get round some aspects of the lethal trifecta. One has access to untrusted data; the other has access to everything else. The trusted model turns verbal commands from a user into lines of code, with strict limits imposed on them. The untrusted model is restricted to filling in the blanks in the resulting order. This arrangement provides security guarantees, but at the cost of constraining the sorts of tasks the LLMs can perform.

Source: Why AI systems may never be secure, and what to do about it

As I ramp up my own knowledge about building AI systems, prompt injections continue to weigh on my mind. This is the most accessible definition of the problem I've seen yet. It's the compounding effects of the three prongs:

exposure to outside content
access to private data
access to external communication tools

Take one of the three out of the picture and you can dramatically reduce the surface area. Is that a clean environment, hell no. However, with things like CaMel, we are starting to figure out architectural ways to reduce the surface area of exploitation.

I have hope.

... and yes, hope's not a plan.