GPT-5.6 gets better at cybersecurity

OpenAI has started rolling out the GPT-5.6 model series in a limited preview, available to a small group of trusted partners through the API and Codex. The series includes Sol as the flagship model, Terra as a balanced option, and Luna as the fastest and most cost-efficient model. The rollout is being coordinated with the U.S. government and is set to expand more broadly to ChatGPT, Codex, and API users in the coming weeks.

“GPT-5.6 Sol launches with our most robust safety stack to date. We strengthened protections for higher-risk activity, sensitive cyber requests, and repeated misuse, and spent multiple weeks finding weaknesses, pressure-testing our system, and hardening it against real-world attacks,” the company said.

Key capabilities

Sol introduces improved agentic capabilities for coding, biology, and cybersecurity. OpenAI also published a system card, a technical report that explains what the model can do, how it was tested, the risks identified, the safeguards added, and its known limitations.

GPT-5.6 introduces a max reasoning effort setting and an ultra mode; the latter uses subagents to speed up complex tasks. In coding, Sol tops the Terminal-Bench 2.1 benchmark, which evaluates command-line workflows requiring tool coordination, planning, and iteration. The model also uses fewer tokens for biology workflows.

In cybersecurity, GPT-5.6 advances the performance-efficiency frontier on long-horizon security tasks, including vulnerability research and exploitation.

Safety and safeguards

OpenAI says it developed safeguards tailored to each model’s capabilities. The goal is to make prohibited offensive activity more difficult, uncertain, and detectable while preserving legitimate uses.

Sol can identify security flaws and components of an exploit, but in OpenAI’s tests it could not carry out a complete cyberattack on its own. The company notes that no evaluation can cover every real-world scenario.

GPT-5.6 uses multiple layers of safety instead of relying on a single safeguard. The model is trained to refuse prohibited cyber and biology assistance, even when users attempt to disguise their intent. Responses are screened for potentially harmful content during generation, and responses to high-risk requests may be held for review by a more capable model before they are delivered.
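As a rough illustration of the layered approach described above, the following Python sketch chains three independent checks: a refusal step, screening during generation, and a review hold for high-risk requests. It is not OpenAI's implementation. Every name here (layered_check, Decision, and the callback parameters) is hypothetical, and the real checks are classifiers and models rather than the simple callables assumed in this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Decision:
    delivered: bool  # whether the response reaches the user
    stage: str       # which layer made the final call (hypothetical labels)


def layered_check(
    request: str,
    model_refuses: Callable[[str], bool],
    generate_chunks: Callable[[str], Iterable[str]],
    chunk_is_harmful: Callable[[str], bool],
    is_high_risk: Callable[[str], bool],
    reviewer_approves: Callable[[str, str], bool],
) -> Decision:
    """Run a request through three independent layers (illustrative only)."""
    # Layer 1: the model itself declines prohibited requests.
    if model_refuses(request):
        return Decision(False, "model_refusal")

    # Layer 2: screen the output piece by piece while it is generated.
    chunks = []
    for chunk in generate_chunks(request):
        if chunk_is_harmful(chunk):
            return Decision(False, "generation_screen")
        chunks.append(chunk)

    # Layer 3: hold high-risk responses for a second reviewer before delivery.
    if is_high_risk(request):
        approved = reviewer_approves(request, "".join(chunks))
        return Decision(approved, "review")

    return Decision(True, "no_hold")
```

The point of the design is that each layer can block a response on its own, so a request that slips past one check can still be stopped by a later one.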

OpenAI monitors patterns of misuse across accounts to distinguish malicious activity from legitimate security research. During the preview, some legitimate requests may be blocked or delayed while these safeguards are tested and refined.

“We are also working with enterprise customers on longer-term approaches—including privacy-preserving detection, customer-operated safety controls, and access calibrated to the risk of a customer, user, or workload—to advance safety while supporting enterprise privacy requirements,” OpenAI continued.

Red teaming and security testing

To test the models’ safeguards, OpenAI conducted automated red teaming to find universal jailbreaks that work across many prompts and contexts. The testing explored attack patterns beyond what human testing alone could cover, helped identify failure patterns earlier, and shortened the time needed to address newly discovered weaknesses.

The company worked with third-party experts to conduct human red teaming, testing the models with creative attack techniques that automated systems might not anticipate.

AI security lab Irregular evaluated GPT-5.6 Sol on real-world offensive security benchmarks and found that it performs slightly better than GPT-5.5, particularly on longer, more complex hacking tasks. The model discovered previously unknown vulnerabilities in widely used software and mobile devices, while continuing to struggle with well-defended targets and complete end-to-end attacks.
