Can AI Agents Boost Ethereum Security? OpenAI and Paradigm Created a Testing Ground

In brief

  • EVMbench tests AI agents on 120 real-world Ethereum smart contract vulnerabilities.
  • Tool evaluates detection, patching, and exploitation across three distinct modes.
  • GPT-5.3-Codex achieved 72.2% success rate in exploit mode testing.

ChatGPT maker OpenAI and crypto-focused investment firm Paradigm have introduced EVMbench, a benchmark designed to evaluate AI agents’ ability to detect, patch, and exploit high-severity vulnerabilities in Ethereum Virtual Machine (EVM) smart contracts, with the aim of improving smart contract security.

Smart contracts are the heart of the Ethereum network, holding the code that powers everything from decentralized finance protocols to token launches. The weekly number of smart contracts deployed on Ethereum reached an all-time high of 1.7 million in November 2025, with 669,500 deployed last week alone, according to Token Terminal.

EVMbench draws on 120 curated vulnerabilities from 40 audits, most sourced from open audit competitions such as Code4rena, according to an OpenAI blog post. It also includes scenarios from the security auditing process for Tempo, Stripe’s purpose-built layer-1 blockchain focused on high-throughput, low-cost stablecoin payments.

Payments giant Stripe launched the public testnet for Tempo in December, saying at the time that it was being built with input from Visa, Shopify, and OpenAI, among others. The goal is to ground testing in economically meaningful, real-world code, particularly as AI-driven stablecoin payments expand, OpenAI added.

Introducing EVMbench—a new benchmark that measures how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities. https://t.co/op5zufgAGH

— OpenAI (@OpenAI) February 18, 2026

EVMbench evaluates AI models across three modes: detect, patch, and exploit. In “detect,” agents audit repositories and are scored on their recall of ground-truth vulnerabilities. In “patch,” agents must eliminate vulnerabilities without breaking intended functionality. Finally, in “exploit,” agents attempt end-to-end fund-draining attacks in a sandboxed blockchain environment, with grading performed via deterministic transaction replay.

In exploit mode, GPT-5.3-Codex running via OpenAI’s Codex CLI achieved a score of 72.2%, compared to 31.9% for GPT-5, which was released six months earlier. Performance was weaker in the detect and patch tasks, where agents sometimes failed to audit exhaustively or struggled to preserve full contract functionality.

The ChatGPT maker’s researchers cautioned that EVMbench does not fully capture real-world security complexity. Still, they added that measuring AI performance in economically relevant environments is critical as models become powerful tools for both attackers and defenders.

Sam Altman’s OpenAI and Ethereum co-founder Vitalik Buterin have previously been at odds over the pace of AI development. In January 2025, Altman said that his firm was “confident we know how to build AGI as we have traditionally understood it.” Buterin, by contrast, has argued that AI systems should include a “soft pause” capability that could temporarily restrict industrial-scale AI operations if warning signs emerge.
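For readers who want a concrete picture of the detect-mode scoring mentioned above, recall against ground-truth vulnerabilities, here is a minimal Python sketch. It is an illustration only, not EVMbench's grader: the article does not describe how reported findings are matched to ground truth, so exact matching on a hypothetical `Finding` identifier is an assumption, and every contract name and label below is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical vulnerability identifier (not EVMbench's real schema)."""
    contract: str
    label: str


def detect_recall(ground_truth: set[Finding], reported: set[Finding]) -> float:
    """Return the fraction of ground-truth vulnerabilities the agent reported.

    Assumes exact-match identifiers; a real grader would likely need a
    fuzzier comparison of free-text audit findings.
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    return len(ground_truth & reported) / len(ground_truth)


# Invented example data: four known vulnerabilities in one audited repository.
ground_truth = {
    Finding("Vault.sol", "reentrancy-in-withdraw"),
    Finding("Vault.sol", "missing-access-control"),
    Finding("Oracle.sol", "stale-price"),
    Finding("Token.sol", "unchecked-transfer"),
}

# The agent reports two true findings and one that is not in the ground truth.
reported = {
    Finding("Vault.sol", "reentrancy-in-withdraw"),
    Finding("Oracle.sol", "stale-price"),
    Finding("Router.sol", "gas-griefing"),  # extra finding; recall ignores it
}

score = detect_recall(ground_truth, reported)
print(f"detect recall: {score:.1%}")
```

Recall counts only missed vulnerabilities and does not penalize extra reports, which is consistent with the article's note that agents lost points in detect mode when they failed to audit exhaustively.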
