
AGI ·

To his credit, he raises the issue that human desires are manipulable in the section “Potential Cartesian Objections”.

2 min read

Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.

◌ Single Source

Alignment Forum commenters think Jacob is suggesting that the AGI will autonomously develop a robust notion of self-empowerment, including “what it means for me (the AGI) to not get manipulated”, and that it can then (somehow?) transfer that notion to humans.


Summary

Alignment is often conceptualized as AIs helping humans achieve their goals: AIs that increase people’s agency and empowerment; AIs that are helpful, corrigible, and/or obedient; AIs that avoid manipulating people. The manipulability of human desires is hardly a new observation in the alignment literature, but it remains unsolved (see the lit review in §3 of the post). In the post, the author proposes an explanation of how humans intuitively conceptualize the distinction between guidance (good) vs. manipulation (bad), in case it helps readers brainstorm how they might put that distinction into AI. …But (spoiler alert) it turns out not to help, because the author argues that humans think about it in a deeply incoherent way, intimately tied to their scientifically inaccurate intuitions around free will. From there, the author jumps into a broader review of every approach he can think of for writing a “True Name” for manipulation or things related to it (empowerment, agency, corrigibility, culpability, etc.).

Read full article at Alignment Forum →

#AGI