GPT-4.1: SWE improvements!
April 14, 2025 @ 12 PM
OpenAI’s GPT-4.1 posts standout results on SWE-bench Verified and Aider’s polyglot diff benchmark, and top AI coding IDEs like Windsurf and Cursor adopted it immediately.
SWE-bench: Real-World Coding, Real Progress
- GPT-4.1 scores 54.6% on SWE-bench Verified, up from 33.2% for GPT-4o and 38% for GPT-4.5, a gain of 21.4 percentage points over GPT-4o (OpenAI).
- SWE-bench tests the model’s ability to solve real software engineering tasks in open-source Python repos, including bug fixes and feature additions.
- This jump means GPT-4.1 is much better at exploring codebases, generating code that runs, and passing tests—crucial for devs building agents or automation.
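To make the "absolute" framing concrete, here is a small sketch that works out the gains from the SWE-bench Verified scores quoted above. The function name `gains` is made up for this illustration, and the relative figures are derived here rather than taken from OpenAI.

```python
# SWE-bench Verified scores as quoted above (percent of tasks resolved).
scores = {"GPT-4o": 33.2, "GPT-4.5": 38.0, "GPT-4.1": 54.6}


def gains(new: float, old: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new - old
    relative = absolute / old * 100
    return round(absolute, 1), round(relative, 1)


for baseline in ("GPT-4o", "GPT-4.5"):
    points, percent = gains(scores["GPT-4.1"], scores[baseline])
    print(f"GPT-4.1 vs {baseline}: +{points} points, +{percent}% relative")
```

The 21.4 figure is a difference in percentage points. Measured relative to GPT-4o's own score, the same jump works out to roughly 64%.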
Aider Polyglot: Multi-Language, Diff-Based Coding
- Aider’s polyglot diff benchmark measures how well models handle code changes across multiple languages and output only the necessary diffs.
- GPT-4.1 more than doubles GPT-4o’s score and beats GPT-4.5 by 8 percentage points (OpenAI).
- The model is specifically trained to follow diff formats more reliably, saving time and reducing merge conflicts for devs who rely on precise, minimal code changes.
- For those who prefer full file rewrites, GPT-4.1’s output token limit is now 32,768—double that of GPT-4o.
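For a feel of why reliable diff output matters, here is a minimal sketch of applying a search/replace style edit to a file's contents. It is a generic illustration, not Aider's actual implementation or edit format; the function name `apply_search_replace` and the sample code are assumptions made up for this example.

```python
def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Apply one search/replace edit, requiring exactly one match in source."""
    if not search:
        raise ValueError("search text must not be empty")
    matches = source.count(search)
    if matches != 1:
        # Zero matches means the model misquoted the file; more than one
        # means the edit is ambiguous. Either way, refuse to guess.
        raise ValueError(f"expected exactly 1 match, found {matches}")
    return source.replace(search, replace, 1)


original = (
    "def add(a, b):\n"
    "    return a - b\n"
    "\n"
    "def sub(a, b):\n"
    "    return a - b\n"
)

# The edit quotes enough context (the def line) to be unambiguous.
patched = apply_search_replace(
    original,
    search="def add(a, b):\n    return a - b\n",
    replace="def add(a, b):\n    return a + b\n",
)
print(patched)
```

An edit that quoted only `    return a - b\n` would match twice and be rejected, which is the kind of failure a model that follows diff formats more reliably should produce less often. The trade-off against full file rewrites is output size: a diff stays small, while a rewrite has to fit within the output token limit.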
Windsurf: Real-World IDE Impact
- Windsurf now supports GPT-4.1, free for all users from April 14–21 (Windsurf Changelog, Reddit).
- Performance gains:
  - 60% higher coding accuracy vs. GPT-4o
  - 30% more efficient tool calling
  - 50% fewer repeated/unnecessary edits
  - 40% fewer unnecessary file reads
  - 70% fewer incorrect file modifications
  - 50% less verbosity (details).
- These improvements mean faster iteration, smoother workflows, and more accepted code changes on first review for engineering teams.
Cursor: GPT-4.1 Now Live
- Cursor IDE has added GPT-4.1 as a selectable model—just enable it in Settings → Models (Cursor Forum, Reddit).
- It’s currently free to try, letting users experience the new coding and tool-calling capabilities firsthand.
- Cursor’s integration means developers can leverage GPT-4.1’s improved multi-language, diff, and instruction-following skills in a familiar, VS Code-like environment (Dr. Lee's Blog).