To his credit, Jacob raises the issue that human desires are manipulable in the section “Potential Cartesian Objections”.
The author thinks Jacob is suggesting that the AGI will autonomously develop a robust notion of self-empowerment, including “what it means for me (the AGI) to not get manipulated”, and that it can then (somehow?) transfer that notion to humans.
Key facts
- Here’s a modified excerpt from the author’s Intuitive Self-Models (ISM) series, summarizing a few key points from ISM Post 3: The Active Self.
- …Needless to say, this whole intuitive ontology is pretty messed up, in the sense that nothing in it is a veridical, observer-independent accounting of what is happening in the real world (ISM).
- Another example is in Empowerment is (almost) All We Need (2022); see the formula sketch after this list.
- And indeed, it’s somewhat specific to mainstream Western culture (ISM §3.2).
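For reference, since empowerment comes up as a candidate objective: the definition below is from the broader empowerment literature (e.g. Klyubin et al. 2005), not from the post itself. The n-step empowerment of a state is standardly defined as the channel capacity from an agent’s next n actions to its resulting state, i.e. the maximum mutual information achievable between action sequences and outcomes:

$$
\mathfrak{E}^{(n)}(s) \;=\; \max_{p(a_t^{\,n})} \, I\!\big(A_t^{\,n};\, S_{t+n} \,\big|\, S_t = s\big)
$$

Intuitively, a state is highly empowered to the extent that the agent’s choices there can reliably steer which future state obtains.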
Summary
Alignment is often conceptualized as AIs helping humans achieve their goals: AIs that increase people’s agency and empowerment; AIs that are helpful, corrigible, and/or obedient; AIs that avoid manipulating people. The manipulability of human desires is hardly a new observation in the alignment literature, but it remains unsolved (see lit review in §3 below). In this post, the author proposes an explanation of how humans intuitively conceptualize the distinction between guidance (good) vs. manipulation (bad), in case it helps in brainstorming how one might put that distinction into an AI. …But (spoiler alert) it turns out not to help, because the author argues that humans think about it in a deeply incoherent way, intimately tied to our scientifically inaccurate intuitions around free will. From there, the author jumps into a broader review of every approach they can think of for writing a “True Name” for manipulation or things related to it (empowerment, agency, corrigibility, culpability, etc.).
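As a concrete illustration (hypothetical, not from the post) of what an empowerment-style objective actually computes, here is a minimal Python sketch for a toy deterministic gridworld. With deterministic dynamics, the channel capacity above reduces to the log of the number of distinct reachable states, which keeps the example short; the grid size, action set, and function names are all made up for this sketch.

```python
import math

# Toy deterministic gridworld: states are (x, y) cells on a 3x3 grid.
# For deterministic dynamics, n-step empowerment reduces to
# log2(number of distinct states reachable in n steps), since the
# capacity of a noiseless channel is the log of its distinct outputs.

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # up/down/right/left/stay
SIZE = 3

def step(state, action):
    """Apply an action, clamping to the grid boundary."""
    x, y = state
    dx, dy = action
    return (min(max(x + dx, 0), SIZE - 1), min(max(y + dy, 0), SIZE - 1))

def empowerment(state, horizon=1):
    """log2 |reachable states| after `horizon` actions (deterministic case)."""
    frontier = {state}
    for _ in range(horizon):
        frontier = {step(s, a) for s in frontier for a in ACTIONS}
    return math.log2(len(frontier))

# A corner cell has fewer distinct successors than the center,
# so the center is the more "empowered" state:
print(empowerment((0, 0)))  # corner: 3 reachable states -> log2(3) ≈ 1.58
print(empowerment((1, 1)))  # center: 5 reachable states -> log2(5) ≈ 2.32
```

The sketch only shows that empowerment has a crisp formal reading in toy settings; whether anything like it can be extended into a workable “True Name” that rules out manipulating humans is exactly what the post’s §3 review interrogates.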