

Speeding up agentic workflows with WebSockets in the Responses API


Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.


Diagram titled “A Codex agent loop in practice” showing an iterative flow between Codex and the Responses API, with tool calls (rg, sed, apply_patch, pytest) and results exchanged until the final message: “The bug has been fixed.”

By Brian Yu and Ashwin Nathan, Members of the Technical Staff.


Summary

When you ask Codex to fix a bug, it scans your codebase for relevant files, reads them to build context, makes edits, and runs tests to verify the fix. All of these requests can add up to minutes of waiting while Codex works through a complex task. In the past, running LLM inference on GPUs was the slowest part of the agentic loop, so API service overhead was easy to hide: in the Responses API, previous flagship models like GPT‑5 and GPT‑5.2 ran at roughly 65 tokens per second (TPS). In the post, the authors explain how they made agent loops using the API 40% faster end-to-end, so that users actually experience the jump in inference speed from roughly 65 to nearly 1,000 tokens per second.
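To make that per-request overhead concrete, here is a minimal sketch of the kind of agentic tool-call loop the post describes, written against the openai Python SDK's Responses API. The `run_shell` tool, its schema, the dispatch logic, and the model name are illustrative assumptions, not Codex's actual implementation; each loop iteration is a separate API round trip, and that round-trip overhead is what the WebSocket transport described in the post is meant to reduce.

```python
# Minimal sketch of an agentic tool-call loop over the Responses API.
# Assumptions: the standard openai Python SDK; a hypothetical "run_shell"
# tool and model name used purely for illustration.
import json
import subprocess

from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "name": "run_shell",
    "description": "Run a shell command in the repository and return its output.",
    "parameters": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}]

def run_tool(call):
    # Hypothetical dispatcher: execute the requested command (rg, sed, pytest, ...)
    # and return combined stdout/stderr to the model.
    args = json.loads(call.arguments)
    result = subprocess.run(args["command"], shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

# Each iteration below is one full HTTP round trip to the API; the WebSocket
# transport in the post targets exactly this per-request overhead.
input_items = [{"role": "user", "content": "Fix the failing test in tests/test_parser.py"}]
previous_id = None
while True:
    response = client.responses.create(
        model="gpt-5",  # assumed model name for illustration
        input=input_items,
        tools=tools,
        previous_response_id=previous_id,
    )
    previous_id = response.id

    calls = [item for item in response.output if item.type == "function_call"]
    if not calls:
        print(response.output_text)  # e.g. "The bug has been fixed."
        break

    # Feed tool results back as the next request's input.
    input_items = [
        {"type": "function_call_output", "call_id": c.call_id, "output": run_tool(c)}
        for c in calls
    ]
```

In a loop like this, model inference is only part of the latency; connection setup, TLS handshakes, and request processing are paid on every iteration, which is why shaving that overhead matters once inference itself runs at nearly 1,000 TPS.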

Read full article at OpenAI →

#agentic