Safety · MIT Technology Review
Key facts
- As a research staff member at the MIT–IBM Watson AI Lab from 2018 to 2020, Liu focused on AI safety and model debugging, which involves identifying errors in a machine-learning model’s behavior
- Safety has long concerned AI companies, but the stakes amped up last year when Matthew Livelsberger, a US Army Green Beret, blew up a Cybertruck in front of the Trump International Hotel in Las Vegas
- Casper began collaborating with Liu in 2024 after posting about robust unlearning, and what it might mean for AI risk management, on the AI Alignment Forum, an online community for researchers
- The solution lies in the developing field known as “machine unlearning,” says Sijia Liu, a computer science professor at Michigan State University and a principal investigator affiliated with the MIT–IBM Watson AI Lab
Summary
Ask ChatGPT a straightforward question about how to build a bomb and it will most likely decline to answer. Developers can retrain large language models (LLMs) after removing undesirable training data such as unsafe or sensitive samples, but retraining a model from scratch is slow and expensive. The solution lies in the developing field known as “machine unlearning,” says Sijia Liu, a computer science professor at Michigan State University and a principal investigator affiliated with the MIT–IBM Watson AI Lab. As a research staff member at the MIT–IBM Watson AI Lab from 2018 to 2020, Liu focused on AI safety and model debugging, which involves identifying errors in a machine-learning model’s behavior and then correcting them.
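Machine unlearning covers a family of techniques; one common baseline pairs gradient ascent on a “forget” set (the data whose influence should be removed) with ordinary gradient descent on a “retain” set (everything the model should keep). The sketch below illustrates that idea on a toy PyTorch classifier; the model, the synthetic data, and the 0.5 weighting on the forget term are illustrative assumptions, not details from the article or Liu’s work.

```python
# A minimal sketch of gradient-ascent unlearning, assuming a toy classifier
# in place of an LLM. All data here is synthetic and for illustration only.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model: a small classifier rather than a language model.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

# "Forget" set: behavior to remove. "Retain" set: capability to preserve.
forget_x, forget_y = torch.randn(64, 16), torch.randint(0, 4, (64,))
retain_x, retain_y = torch.randn(256, 16), torch.randint(0, 4, (256,))

for step in range(100):
    opt.zero_grad()
    forget_loss = loss_fn(model(forget_x), forget_y)
    retain_loss = loss_fn(model(retain_x), retain_y)
    # Descend on the retain loss while ascending on the forget loss (note
    # the minus sign): the model gets worse on the forget set but keeps
    # fitting the retain set.
    (retain_loss - 0.5 * forget_loss).backward()
    opt.step()

print(f"forget loss: {forget_loss.item():.3f}, "
      f"retain loss: {retain_loss.item():.3f}")
```

In practice the hard part is the balance: ascent alone tends to degrade the whole model, which is why robust-unlearning research of the kind Liu and Casper pursue focuses on removing a capability without leaving it easily recoverable or harming everything else the model can do.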