2026-02-19
new collab from @paradigm and @OpenAI: evmbench is a benchmark and agent harness for exploiting smart contract bugs. a few months ago, the best models found <20% of critical, fund-draining @Code4rena bugs in our benchmark. today they find >70%. [video]
OpenAI
OpenAI and Paradigm announce EVMbench, a benchmark that measures how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities
Making smart contracts safer by evaluating AI agents' ability to detect, patch, and exploit vulnerabilities in blockchain environments.
2026-02-18
new collab from @paradigm and @OpenAI: evmbench is a benchmark and agent harness for exploiting smart contract bugs. a few months ago, the best models found <20% of critical, fund-draining @Code4rena bugs in our benchmark. today they find >70%. [video]
OpenAI
OpenAI and Paradigm announce EVMbench, a benchmark that measures how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities
Making smart contracts safer by evaluating AI agents' ability to detect, patch, and exploit vulnerabilities in blockchain environments.