NIST · NIST AI
NIST Mathematical Proof Supports Transition to a Continuous-Monitor-and-Update Security Model for AI Systems
Compiled by KHAO Editorial, aggregated from 1 source.
Try as AI developers might, they can never render AI completely unassailable using conventional security models.
Key facts
- In the peer-reviewed journal IEEE Security and Privacy, Apostol Vassilev, a senior scientist at the National Institute of Standards and Technology (NIST), has published a mathematical proof
- Paper: Apostol Vassilev, “Robust AI Security and Alignment: A Sisyphean Endeavor”
- In AI’s case, the “finite set of statements” is the group of guardrails an AI’s designer creates to keep the AI from doing something undesired
- You can’t escape Gödel in math, and in AI you likely can’t patch an AI system like an LLM and then expect to be OK forever
Summary
Can AI developers make artificial intelligence impervious to adversaries who want to twist the technology to nefarious ends? The guardrails that govern an AI’s behavior amount to a finite set of statements, and one of the proof’s implications is that there will always be a way to prompt an AI system to disregard its rules; it’s only a matter of finding it. “One of the pillars of responsible AI is that you want the technology to be secure,” said Vassilev, the proof’s author and an expert in adversarial machine learning. Companies that develop AI often acknowledge that the tools they are creating have the potential to cause harm in the physical world, so they build in constraints intended to stop AI from generating prohibited content such as deepfakes, malware, or instructions for making biological weapons or illicit drugs.