๐๐๐-๐.๐ ๐๐ฅ๐ฆ๐จ๐ฌ๐ญ ๐ญ๐จ๐ฉ๐ฌ ๐๐ฅ๐๐ฎ๐๐ ๐.๐ ๐จ๐ง ๐๐จ๐๐ข๐ง๐ ?! New eval dropping using our #1 SWE-bench coding agent! - GPT-4.1 beats Gemini 2.5 Pro and almost tops Claude 3.7 Sonnet! - Even GPT-4.1 mini matches Claude 3.5 Sonnet V2 performance. It was the top model just 2mo ago!

The evaluation is done through our proprietary codebase understanding benchmark AugmentQA. You can learn more at: Try our agent yourself at:
