

Large language model developer Anthropic PBC today rolled out its newest Claude 4 frontier models, starting with Opus 4 and Sonnet 4, which the company said set new standards for coding, advanced reasoning and AI agents.
Opus is the company’s most powerful model yet, designed to sustain performance on complex, long-running tasks that might take thousands of steps.
Anthropic said it is designed to power AI agents that can operate for multiple hours at a time. AI agents are a type of AI software that acts autonomously, with little or no human input. They can process information, make decisions and take action based on their own internal logic, understanding of the environment and a set goal.
“Opus 4 offers truly advanced reasoning for coding,” said Yusuke Kaji, general manager of AI at Rakuten Group Inc. “When our team deployed Opus 4 on a complex open-source project, it coded autonomously for nearly seven hours — a huge leap in AI capabilities that left the team amazed.”
Alex Albert, head of developer relations at Anthropic, told SiliconANGLE in an interview that the new version of Opus has set significant benchmarks for how long it can sustain a task.
“When you’re doing the tasks that Rakuten was doing, you can get the models to stretch that long, which is absolutely unbelievable,” Albert said. “When compared to the previous models, you could eke out maybe 30 minutes to an hour of coherent performance.”
With the new build, Albert said, Anthropic has seen the model run even longer in internal testing.
Much of this is because, under the hood, both models received substantial improvements to memory training so that they rely less heavily on their context windows. The context window is the total number of tokens, or units of data, that a large language model can consider when preparing a response.
“It’s able to write out to an external scratch pad, summarize its results and make sure it doesn’t get stuck,” Albert said. “So that when its memory has to be wiped again, it has some guides and sticky notes, basically, that it can refer back to.”
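The scratch-pad pattern Albert describes can be sketched in a few lines: the agent periodically summarizes its progress to external storage, and after its context is cleared, a fresh session reloads those notes instead of redoing the work. This is a minimal illustration of the general technique, not Anthropic's actual implementation; all the names here are invented for the example.

```python
import json
import tempfile
from pathlib import Path

class ScratchPad:
    """Illustrative external memory: summaries survive a context wipe."""

    def __init__(self, path: Path):
        self.path = path

    def save_note(self, step: str, summary: str) -> None:
        # Merge the new summary into whatever notes already exist on disk.
        notes = self.load_notes()
        notes[step] = summary
        self.path.write_text(json.dumps(notes))

    def load_notes(self) -> dict:
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {}

# Simulate a long-running agent whose context is wiped between phases.
pad = ScratchPad(Path(tempfile.gettempdir()) / "agent_scratchpad.json")
pad.save_note("phase_1", "Refactored the parser; 3 tests still failing.")

# ... context window wiped here ...

# A fresh "session" reloads the sticky notes and picks up where it left off.
notes = ScratchPad(pad.path).load_notes()
print(notes["phase_1"])
```

The key design point is that the summaries, not the full transcript, are persisted, which is what lets the agent stay coherent far past the size of any single context window.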
Sonnet 4 acts as a direct upgrade to Sonnet 3.7, providing a model designed for strict instruction adherence while maintaining high performance in coding and reasoning.
Albert said Anthropic spent time training Claude Sonnet 4 so that it would be less likely than its predecessor, which he described as “a little bit over-eager,” to go off the beaten path. The company made it a major focus to train Sonnet 4 to be more steerable and controllable, especially in coding settings.
“So, we’ve cut down on this behavior that we’ve called reward hacking by about 80% and reward hacking is this tendency to take shortcuts,” Albert said. “So maybe that’s like producing extra code to, like, satisfy all the tests when really it shouldn’t have.”
Both models are “hybrid models,” capable of either step-by-step reasoning or near-instant responses, depending on what the user wants.
In addition to the new frontier models, Anthropic also announced new tools to accompany them, including the general availability of Claude Code, a tool specifically focused on agentic coding tasks that was previously available only in beta preview. Claude Code lives in a terminal or a code editor and is even available through a software development kit. It understands developer codebases and can accelerate coding tasks through natural language prompts.
The company launched four new application programming interface capabilities through Anthropic API that will allow developers to build more powerful AI agents. These include a code execution tool, a connector for the Model Context Protocol, the Files API and the ability to cache prompts for up to one hour.
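To show how the prompt-caching capability surfaces to developers, here is a sketch of what a Messages API request body might look like with a cacheable system prompt. The block structure follows Anthropic's published `cache_control` format, but the model identifier and the `ttl` field used for the one-hour cache are assumptions in this example, so check the current API reference before relying on them.

```python
# Hypothetical request body for the Anthropic Messages API with prompt
# caching. The model name and the "ttl" value are assumptions for
# illustration, not confirmed identifiers.
request_body = {
    "model": "claude-sonnet-4",  # hypothetical model identifier
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a code-review assistant. <large style guide>",
            # Mark the large, reusable prefix as cacheable so repeated
            # requests do not re-process it each time.
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    "messages": [
        {"role": "user", "content": "Review this diff for style issues."}
    ],
}

print(request_body["system"][0]["cache_control"]["type"])
```

The payoff is cost and latency: a long, stable prefix such as a style guide or codebase summary is processed once and then reused across calls for the lifetime of the cache entry.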
Both models have improved and extended tool use, such as web search, during extended thinking, allowing Claude to alternate between reasoning and tool usage.
Previous models, Albert said, would do all their reasoning up front and then call on tools. With the ability to alternate, the model can reason, call a tool and then return to reasoning, which opens up a new horizon for LLM capabilities.
Instead of providing raw thinking processes, Claude will now share user-friendly summaries. Anthropic said this will preserve visibility for users while better securing the models against potential adversarial attacks.