Security firm BlockSec has re-evaluated EVMBench, a benchmark developed by OpenAI and Paradigm for evaluating AI agents on smart contract security auditing. Its results show that AI agents are significantly less effective when facing real-world exploit scenarios.

The research team expanded the test setup with more model configurations and added recent security incidents that could not have appeared in the AI models' training data.

The report concludes that AI still cannot replace security experts, but it can complement human code review.

Initial EVMBench results may be overly optimistic

EVMBench evaluates AI agents on three smart contract security tasks: detecting, patching, and exploiting vulnerabilities. Its initial results were impressive. According to the report, AI agents exploited 72% and detected about 45% of the vulnerabilities in 120 samples selected from Code4rena audits.

However, BlockSec believes the original test conditions may have skewed these results. BlockSec co-founder Yajin Zhou said that when the team retested with more configurations and 22 real attack incidents, the AI agents' exploitation success rate dropped to 0%.

Expanded configurations and removal of “data contamination”

The study increased the number of model configurations from 14 to 26 by pairing models with different agent scaffolds (the tooling and orchestration code that wraps a model), rather than limiting each model to its own provider's ecosystem. According to the research team, the original approach made it difficult to tell whether performance came from the model's own capability or from the advantages of its scaffold.

BlockSec also raised concerns about "data contamination": EVMBench uses vulnerabilities that were publicly disclosed earlier and may therefore be included in the models' training data. To address this, the team tested 22 security incidents that occurred after February 2026, outside the models' "knowledge window."

AI completely fails in real-world exploitation

The most notable result: across 110 agent-incident pairs (5 agents tested on 22 incidents), not a single complete exploit succeeded. This suggests that even the most advanced AI agents tested are still far from capable of executing real attacks.

Vulnerability detection results were more encouraging. Claude Opus 4.6 performed best, detecting 13 of 20 real vulnerabilities.

AI usually detected common, well-known vulnerability types, but missed almost all of the more complex cases.

The future is collaboration between AI and humans

The study concludes that AI cannot yet replace humans in security audits, and that the more important question is how the two can collaborate effectively.

AI excels at broad coverage and scanning large systems at scale, while humans excel at deep analysis, understanding protocol design, and adversarial reasoning. These strengths are complementary.

According to BlockSec, the right approach is not to replace human auditors with AI but to build collaborative workflows that combine the two for more comprehensive audits.

Sanh Sanh

Disclaimer: The information on this page may come from third parties and does not represent the views or opinions of Gate. The content displayed on this page is for reference only and does not constitute any financial, investment, or legal advice. Gate does not guarantee the accuracy or completeness of the information and shall not be liable for any losses arising from the use of this information. Virtual asset investments carry high risks and are subject to significant price volatility. You may lose all of your invested principal. Please fully understand the relevant risks and make prudent decisions based on your own financial situation and risk tolerance. For details, please refer to Gate's full disclaimer.
