
OpenAI might have just killed Claude
AI Generated Summary
Airdroplet AI v0.2

Alright, so OpenAI just dropped some major news right when the presenter was trying to take a break for a conference – typical! They've rolled out two new models, O3 and O4 Mini, and O4 Mini is already stealing the spotlight as potentially the new best all-around model, especially considering its price. This whole release feels like a direct shot at Anthropic, aiming to win back the title of the most developer-friendly AI company.
Here’s the breakdown of what went down:
New Models: O3 and O4 Mini
- O4 Mini: This is the star of the show. It's basically an upgrade to the already excellent O3 Mini and is immediately the presenter's "new favorite model ever." It's incredibly capable for its price.
- Performance: In tests like solving Advent of Code, O4 Mini performs as well as, if not better than, O3 Mini. Benchmarks show it even outperforming the larger O3 model on some tasks, especially when allowed to use tools like Python.
- O3: This is the bigger, more powerful model. However, it's significantly more expensive and feels slower and more verbose, reminiscent of the older O1 models. The presenter isn't convinced it's worth the extra cost for most tasks compared to O4 Mini, questioning its practical advantages unless you have very complex problems.
- API Quirks: Interestingly, OpenAI made some minor breaking changes to the API (like how temperature is handled), but it still doesn't expose the model's internal reasoning steps, which makes the models feel slower than they are because you can't see them working while they think. There's speculation these API changes were paving the way for other features that didn't launch.
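To make the API point concrete, here's a minimal sketch of calling O4 Mini through the official `openai` Node SDK (my own illustration, not from the video; parameter support can vary by model and SDK version). The `reasoning_effort` knob and the absence of sampling parameters like `temperature` are the kinds of request-shape details at play, and the reasoning tokens never show up in the response.

```typescript
// Minimal sketch: calling a reasoning model with the openai Node SDK.
// Assumes OPENAI_API_KEY is set in the environment.
import OpenAI from "openai";

const client = new OpenAI();

async function main() {
  const completion = await client.chat.completions.create({
    model: "o4-mini",
    reasoning_effort: "medium", // "low" | "medium" | "high"
    messages: [
      { role: "user", content: "Compare O3 and O4 Mini in two sentences." },
    ],
    // temperature: 0.7, // reasoning models typically reject sampling params like this
  });

  // Only the final answer comes back; the chain-of-thought tokens are billed but hidden.
  console.log(completion.choices[0].message.content);
}

main().catch(console.error);
```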
Pricing & Benchmarks
- O4 Mini Price: It costs the exact same as O3 Mini ($1.10/M input, $4.40/M output tokens). This makes it insanely cheap for its intelligence level – less than half the output price of Gemini 2.5 Pro and cheaper than older OpenAI models like GPT-4.1.
- O3 Price: It's pricier ($10/M input, $40/M output), sitting closer to O1 but still representing a price drop compared to previous high-end models.
- Price War: OpenAI is clearly competing aggressively on price now, focusing on reducing the "cost per intelligence unit." This is a welcome change (see the quick cost sketch after this list).
- Benchmark Highlights: While O4 Mini holds its own or wins in many areas, O3 does significantly better on certain specific, complex benchmarks like SWE-Lancer (a freelancing dev task benchmark where Claude previously looked good) and Aider Polyglot (code editing via diffs). Both models show big improvements in math.
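To put the listed prices in perspective, here's a quick back-of-the-envelope sketch (the token counts are made-up example numbers, not benchmark data) of what the same request would cost on each model.

```typescript
// Rough per-request cost at the prices quoted above (USD per 1M tokens).
const pricing = {
  "o4-mini": { input: 1.1, output: 4.4 },
  "o3": { input: 10.0, output: 40.0 },
};

function costUSD(model: keyof typeof pricing, inputTokens: number, outputTokens: number): number {
  const p = pricing[model];
  return (inputTokens / 1_000_000) * p.input + (outputTokens / 1_000_000) * p.output;
}

// Hypothetical request: 20k tokens in, 8k tokens out (reasoning tokens count as output).
console.log(costUSD("o4-mini", 20_000, 8_000).toFixed(4)); // ~$0.0572
console.log(costUSD("o3", 20_000, 8_000).toFixed(4));      // ~$0.5200
```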
Thinking with Images & Enhanced Tools
- Visual Reasoning: This is wild – O3 and O4 Mini can now perform image manipulation during their internal thought process. They use built-in tools to crop, zoom, rotate, and enhance images to better understand them, without needing separate specialized models. This helps with tasks like reading text from awkwardly angled photos (OCR) or even solving visual puzzles like mazes (see the API sketch after this list).
- How it Works: This seems to confirm earlier theories that OpenAI models handle complex tasks by drawing on a suite of internal 'tools' (like image processing, web search, Python execution). The models now 'think' for longer and use these tools extensively in their chain of thought before giving an answer.
- Tool Integration: OpenAI is doubling down on making models better at using tools. Tools like image generation are also becoming available across more models, not just the top-tier ones.
- Potential Downside: These models can make hundreds of tool calls for a single query to ensure accuracy, which could get expensive if the tools themselves cost money (though the current models are cheaper overall).
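From the API side, exercising this capability is just a normal image-input request; the cropping and zooming happen inside the model's own reasoning loop. Here's a minimal sketch (my own illustration with a placeholder image URL, not anything from the video).

```typescript
// Minimal sketch: asking O4 Mini to read text from a photo taken at an angle.
import OpenAI from "openai";

const client = new OpenAI();

async function readSign() {
  const completion = await client.chat.completions.create({
    model: "o4-mini",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "What does the sign in this photo say? It's at an awkward angle." },
          // Placeholder URL for illustration only.
          { type: "image_url", image_url: { url: "https://example.com/angled-sign.jpg" } },
        ],
      },
    ],
  });

  console.log(completion.choices[0].message.content);
}

readSign().catch(console.error);
```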
Codex: The Open-Source Claude Code Killer
- What it is: OpenAI launched 'Codex', a command-line interface (CLI) tool for helping with coding tasks (like suggesting code changes as diffs). It runs right in your terminal.
- The Big Deal: Unlike Anthropic's 'Claude Code' (which is just a name and an issue tracker with no public code), OpenAI's Codex is fully open-source under the permissive Apache 2.0 license. This is huge.
- Tech Stack: It's built with TypeScript and uses React with Ink for the terminal user interface (a toy example of that pattern follows this list).
- Why it Matters: This is a direct attack on one of Anthropic's key developer-focused products. By making it open-source, OpenAI encourages community contribution and adoption, potentially making it much more popular and powerful than the closed-source alternative.
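For a feel of what that stack looks like, here's a toy Ink component in the same spirit (my own illustration, not actual Codex source): Ink lets you write the terminal UI as ordinary React components.

```tsx
// Toy example of the React + Ink pattern: render a diff summary box in the terminal.
import React from "react";
import { render, Box, Text } from "ink";

function DiffPreview({ file, added, removed }: { file: string; added: number; removed: number }) {
  return (
    <Box flexDirection="column" borderStyle="round" paddingX={1}>
      <Text bold>{file}</Text>
      <Text color="green">+{added} lines</Text>
      <Text color="red">-{removed} lines</Text>
    </Box>
  );
}

render(<DiffPreview file="src/index.ts" added={12} removed={4} />);
```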
The Strategy: Winning Back Developers
- Targeting Anthropic: The whole announcement, from the models and pricing to Codex, seems strategically designed to undermine Anthropic's position as the preferred choice for developers.
- Anthropic's Edge: Developers tended to like Anthropic partly because Claude models were good at coding tasks and tool usage.
- OpenAI's Countermoves:
  - Releasing cheaper, highly capable models (O4 Mini).
  - Focusing heavily on improving tool use.
  - Launching a better, open-source alternative to Claude Code (Codex).
  - Potentially acquiring AI coding startup Windsurf for $3 billion (giving them an IDE integration story similar to Cursor/Anthropic).
  - Planning to release a powerful open-source model soon.
- Shifting Tides: The presenter feels these moves are effective. OpenAI is lowering prices (Anthropic isn't), releasing useful open-source tools, and catching up fast on tool performance. It feels like OpenAI is now working more in developers' interests than Anthropic, which might have been 'coasting' on its reputation.
- The Hope: This intense competition might force Anthropic to respond by lowering prices and embracing open source more themselves, which is good for everyone.
Overall, this is a massive update from OpenAI, significantly improving model capabilities, drastically lowering costs for high performance, and making aggressive strategic plays to regain favor with the developer community, primarily by targeting Anthropic's strengths.