
Safety

Ask ChatGPT a straightforward question about how to build a bomb and it will most likely decline to answer

2 min read

Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.


Grad student Stephen Casper is researching AI safeguards in Dylan Hadfield-Menell’s lab at CSAIL.

Safety has long concerned AI companies, but the stakes rose sharply last year when Matthew Livelsberger, a US Army Green Beret, blew up a Cybertruck in front of the Trump International Hotel in Las Vegas.

Summary

Ask ChatGPT a straightforward question about how to build a bomb and it will most likely decline to answer. Developers could enforce such limits by retraining large language models (LLMs) from scratch after removing undesirable training data, such as unsafe or sensitive samples, but full retraining is costly. A solution lies in the developing field known as “machine unlearning,” says Sijia Liu, a computer science professor at Michigan State University and a PI affiliated with the MIT–IBM Watson AI Lab. As a research staff member at the lab from 2018 to 2020, Liu focused on AI safety and model debugging, which involves identifying errors in a machine-learning model’s behavior and then correcting them.
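To make the idea concrete, here is a toy sketch of one common unlearning approach, gradient ascent on a "forget" set. This is a hypothetical illustration, not Liu's actual method or anything from the article: a tiny logistic-regression model is trained on all data, then its loss is pushed up on the samples to be forgotten while it keeps fitting the retained data.

```python
# Toy gradient-ascent unlearning sketch (hypothetical illustration only).
# All names (retain, forget, step, loss) are invented for this example.
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, data):
    # Mean binary cross-entropy over (x, y) pairs.
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(data)

def step(w, b, data, lr):
    # One gradient step; a negative lr turns descent into ascent.
    gw = gb = 0.0
    for x, y in data:
        err = sigmoid(w * x + b) - y
        gw += err * x
        gb += err
    n = len(data)
    return w - lr * gw / n, b - lr * gb / n

retain = [(x / 10, 1 if x > 5 else 0) for x in range(10)]
forget = [(0.2, 1), (0.3, 1)]  # "undesirable" samples to unlearn

w = b = 0.0
for _ in range(2000):  # initial training on ALL data, retained and not
    w, b = step(w, b, retain + forget, 0.5)
before = loss(w, b, forget)

for _ in range(50):
    w, b = step(w, b, forget, -0.5)  # ascend: raise loss on forget set
    w, b = step(w, b, retain, 0.5)   # descend: preserve retained behavior
after = loss(w, b, forget)

print(f"forget-set loss before: {before:.3f}, after: {after:.3f}")
```

The appeal of this family of methods, and the reason the field exists, is that the unlearning phase is far cheaper than retraining from scratch on the scrubbed dataset; the hard research question is verifying that the unwanted behavior is genuinely gone rather than merely suppressed.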

Read full article at MIT Technology Review →

#safety #trump