What to do about prompt injection attack surfaces?

Simon Willison, an independent AI researcher who sits on the board of the Python Software Foundation, nicknames the combination of outside-content exposure, private-data access and outside-world communication the “lethal trifecta”. In June Microsoft quietly released a fix for such a trifecta uncovered in Copilot, its chatbot. The vulnerability had never been exploited “in the wild”, Microsoft said, reassuring its customers that the problem was fixed and their data were safe. But Copilot’s lethal trifecta was created by accident, and Microsoft was able to patch the holes and repel would-be attackers.

…

The safest thing to do is to avoid assembling the trifecta in the first place. Take away any one of the three elements and the possibility of harm is greatly reduced. If everything that goes into your AI system is created inside your company or acquired from trusted sources, then the first element disappears.

…

Other approaches involve constraining the LLMs themselves. In March, researchers at Google proposed a system called CaMeL that uses two separate LLMs to get round some aspects of the lethal trifecta. One has access to untrusted data; the other has access to everything else. The trusted model turns verbal commands from a user into lines of code, with strict limits imposed on them. The untrusted model is restricted to filling in the blanks in the resulting order. This arrangement provides security guarantees, but at the cost of constraining the sorts of tasks the LLMs can perform.
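The dual-LLM split behind CaMeL can be sketched in miniature. This is a hypothetical illustration, not CaMeL's actual implementation: `trusted_plan` and `untrusted_fill` are stand-ins for real model calls, and `ALLOWED_OPS` is an assumed whitelist. The point is the control-flow guarantee: injected text in untrusted data can corrupt a slot's contents, but it can never add steps or invoke operations the trusted plan didn't authorize.

```python
# Hypothetical sketch of the dual-LLM pattern described above.
# trusted_plan / untrusted_fill stand in for real LLM calls.

ALLOWED_OPS = {"summarize", "translate"}  # strict whitelist on the plan


def trusted_plan(user_command: str) -> list[tuple[str, str]]:
    """Trusted model: sees only the user's words, never untrusted data.
    Emits a fixed plan of (operation, slot_name) steps."""
    # A real system would have an LLM produce this; hard-coded here.
    return [("summarize", "email_body")]


def untrusted_fill(slot_name: str, untrusted_data: dict) -> str:
    """Untrusted model: may read outside content, but its output only
    fills a blank -- it cannot alter the plan itself."""
    return untrusted_data[slot_name]


def run(user_command: str, untrusted_data: dict) -> list[str]:
    results = []
    for op, slot in trusted_plan(user_command):
        if op not in ALLOWED_OPS:  # the "strict limits" on the code
            raise PermissionError(f"operation {op!r} blocked")
        filled = untrusted_fill(slot, untrusted_data)
        # Injection in `filled` can garble this step's text, but it
        # cannot trigger any operation outside ALLOWED_OPS.
        results.append(f"{op}({filled!r})")
    return results


print(run("summarize my email", {"email_body": "Lunch at noon?"}))
```

The cost the article mentions shows up directly: every task must be expressible as a whitelisted plan over fillable slots, which rules out open-ended behaviour.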

Original post: gurupanguji.com