AI Agent · Anthropic · Claude · The Register
Deploying AI to click around on a website burns 45x as many tokens as just deploying APIs
Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.
◌ Single Source
For AI agents, seeing is expensive.
Key facts
- Anthropic estimates that processing a 1000×1000-pixel image with Claude Sonnet 4.6 uses about 1,334 tokens
- The vision agent expended around 500,000 input tokens and around 38,000 output tokens to complete its task
- Even after the prompt was revised to help the vision model perform better, the vision agent took ~17 minutes to finish, versus ~20 seconds for the API agent
- An API agent here refers to Claude Sonnet interacting with a web app via tools and APIs
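The figures above can be sanity-checked with a little arithmetic. The sketch below assumes the rule of thumb from Anthropic's vision documentation (tokens ≈ width × height / 750) and rounds up to match the 1,334 figure; the rounding, the function names, and the "every input token is a screenshot" simplification are illustrative assumptions, not part of the Reflex comparison.

```python
import math

# Rule of thumb from Anthropic's vision docs: tokens ≈ (width * height) / 750.
# Rounding up is an assumption made here so the result matches the quoted 1,334.
PIXELS_PER_TOKEN = 750


def estimate_image_tokens(width_px: int, height_px: int) -> int:
    """Rough token cost of one image, ignoring any server-side resizing."""
    return math.ceil(width_px * height_px / PIXELS_PER_TOKEN)


def time_ratio(slow_seconds: float, fast_seconds: float) -> float:
    """How many times longer the slow run took than the fast run."""
    return slow_seconds / fast_seconds


# One 1000x1000 screenshot, as in the first key fact.
screenshot_tokens = estimate_image_tokens(1000, 1000)

# Upper bound on full-size screenshots that fit in the vision agent's
# ~500,000 input tokens, ASSUMING (hypothetically) all of them were images.
vision_input_tokens = 500_000
max_screenshots = vision_input_tokens // screenshot_tokens

# ~17 minutes for the vision agent versus ~20 seconds for the API agent.
slowdown = time_ratio(17 * 60, 20)

print(f"tokens per 1000x1000 screenshot: {screenshot_tokens}")
print(f"screenshots that fit in 500k input tokens (upper bound): {max_screenshots}")
print(f"vision agent took about {slowdown:.0f}x as long as the API agent")
```

Under these assumptions, a few hundred screenshots would account for the vision agent's entire input budget, and the revised-prompt run was roughly 51 times slower in wall-clock terms than the API agent.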
Summary
Businesses deploying AI agents to automate computer usage may be spending far more money than necessary if those agents try to emulate human visual interaction. Reflex, an enterprise application platform, recently set out to compare vision agents with API agents. A vision agent in this context refers to an AI agent that mimics human interaction by relying on image processing and optical character recognition to operate an application.