Safety · MIT Technology Review
Key facts
- As a research staff member at the MIT–IBM Watson AI Lab from 2018 to 2020, Liu focused on AI safety and model debugging, which involves identifying errors in a machine-learning model’s behavior
- Safety has long concerned AI companies, but the stakes amped up last year when Matthew Livelsberger, a US Army Green Beret, blew up a Cybertruck in front of the Trump International Hotel in Las Vegas
- Casper began collaborating with Liu in 2024 after posting about robust unlearning, and what it might mean for AI risk management, on the AI Alignment Forum, an online community for researchers
- The solution lies in the developing field known as “machine unlearning,” says Sijia Liu, a computer science professor at Michigan State University and a principal investigator affiliated with the MIT–IBM Watson AI Lab
Summary
Ask ChatGPT a straightforward question about how to build a bomb and it will most likely decline to answer. Developers can retrain large language models (LLMs) after removing undesirable training data such as unsafe or sensitive samples, but retraining a model from scratch is slow and expensive. The solution lies in the developing field known as “machine unlearning,” says Sijia Liu, a computer science professor at Michigan State University and a principal investigator affiliated with the MIT–IBM Watson AI Lab. As a research staff member at the MIT–IBM Watson AI Lab from 2018 to 2020, Liu focused on AI safety and model debugging, which involves identifying errors in a machine-learning model’s behavior and then correcting them.
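Machine unlearning covers a family of techniques; one common baseline pairs gradient ascent on a “forget” set (the data whose influence should be removed) with ordinary gradient descent on a “retain” set (everything the model should keep). The sketch below illustrates that idea on a toy PyTorch classifier; the model, the synthetic data, and the 0.5 weighting on the forget term are illustrative assumptions, not details from the article or Liu’s work.

```python
# A minimal sketch of gradient-ascent unlearning, assuming a toy classifier
# in place of an LLM. All data here is synthetic and for illustration only.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model: a small classifier rather than a language model.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

# "Forget" set: behavior to remove. "Retain" set: capability to preserve.
forget_x, forget_y = torch.randn(64, 16), torch.randint(0, 4, (64,))
retain_x, retain_y = torch.randn(256, 16), torch.randint(0, 4, (256,))

for step in range(100):
    opt.zero_grad()
    forget_loss = loss_fn(model(forget_x), forget_y)
    retain_loss = loss_fn(model(retain_x), retain_y)
    # Descend on the retain loss while ascending on the forget loss (note
    # the minus sign): the model gets worse on the forget set but keeps
    # fitting the retain set.
    (retain_loss - 0.5 * forget_loss).backward()
    opt.step()

print(f"forget loss: {forget_loss.item():.3f}, "
      f"retain loss: {retain_loss.item():.3f}")
```

In practice the hard part is the balance: ascent alone tends to degrade the whole model, which is why robust-unlearning research of the kind Liu and Casper pursue focuses on removing a capability without leaving it easily recoverable or harming everything else the model can do.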