OpenAI releases o3 and o4-mini, its strongest reasoning models yet: they can think with images, pick tools on their own, and post breakthrough math and coding scores

OpenAI today officially announced its o3 and o4-mini reasoning models, delivering image reasoning and multi-tool integration for the first time, and the community is optimistic about their potential to advance "agentic AI." (Synopsis: OpenAI is quietly building its own social platform, taking aim at Musk's X) (Background: GPT-5 postponed! OpenAI ships o3 and o4-mini first, and Sam Altman reveals integration is harder than expected)

In the early hours of the 17th, AI giant OpenAI released two new-generation reasoning models, o3 and o4-mini, highlighting their "image reasoning" capability and their ability to use all of ChatGPT's tools autonomously. The launch set the global AI developer community buzzing and marks another key step in the company's push toward agentic AI.

Breakthrough performance in mathematics and coding

o3 is positioned as OpenAI's strongest reasoning model to date, built for complex mathematics, science, code writing, and visual-logic tasks. It sets a new state of the art on SWE-bench Verified (a software-engineering benchmark) with a score of 69.1%, ahead of Claude 3.7 Sonnet's 62.3%. o4-mini retains strong reasoning while optimizing for cost and speed, making it the lightweight first choice for developers. According to OpenAI's test data, o4-mini scores 93.4% and 92.7% on AIME (the American Invitational Mathematics Examination) 2024 and 2025 respectively, surpassing even the full o3 to become the most accurate model on that benchmark to date; on Codeforces it reaches a rating of about 2,700, roughly equivalent to a top-200 competitive programmer worldwide.

o3 and o4-mini continue the reasoning-first training approach of the o-series: both are deliberately designed to "think longer before responding," so the models do not just answer quickly but can work through complex, multi-step problems. The design also shows OpenAI staying on its technical trajectory of "more inference time = higher performance," a hypothesis it keeps testing through reinforcement learning.

Image reasoning for the first time: AI can "understand diagrams, sketches, and PDFs"

The most striking update is that both models gain image reasoning capabilities for the first time. o3 and o4-mini can understand and analyze images even at low quality, such as handwritten whiteboards, blurry PDFs, sketches, and statistical charts, and fold them into multi-step reasoning. That means the AI does not merely read and answer text instructions; it can "think through" the logic and relationships behind an image, moving toward a genuinely multimodal agent system. Beyond better visual comprehension, the models can also operate on images, rotating, zooming, or transforming them, so that images become part of the reasoning chain and unlock new approaches to cross-modal problems.
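For developers, this image reasoning is reachable through the multimodal message format the OpenAI API already exposes. Below is a minimal sketch, not an official example from the announcement; the model name and image URL are placeholders, and it assumes the Chat Completions image_url content type:

```python
# Minimal sketch: asking o3 to reason over a chart image.
# Assumes the OpenAI Python SDK's multimodal Chat Completions format;
# the model name and URL below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "This is a hand-drawn sketch of a trading chart. "
                     "What pattern does it show, and what follows from it?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/whiteboard-sketch.jpg"}},
        ],
    }],
)

print(response.choices[0].message.content)
```

The model receives the image alongside the text, so its multi-step reasoning can be grounded in both.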
Multi-tool integration: from "chat" to "task solving"

Both models can autonomously invoke all of the tools ChatGPT provides, including web search, code execution, and DALL·E image generation and analysis, covering the whole flow from receiving an instruction through gathering information to visual reasoning. Unlike the earlier, passive style of tool use, o3 and o4-mini make their own decisions: they choose whether to engage search, code execution, or image generation based on the nature of the problem, in a workflow close to that of a human expert. This flexible application of policy also lets the models dynamically adjust the order and content of their processing based on the input, an important milestone on the road to agentic AI.
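Inside ChatGPT those tools are built in; over the API, developers declare their own tools and the model decides whether and when to call them. A minimal sketch of that autonomous tool choice, assuming the standard function-calling interface of the OpenAI Python SDK (the lookup_price function and its schema are invented for illustration):

```python
# Minimal sketch of autonomous tool use over the API.
# The tool `lookup_price` is an illustrative assumption, not part of
# OpenAI's announcement; the model decides whether to call it.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_price",
        "description": "Look up the current price of an asset.",
        "parameters": {
            "type": "object",
            "properties": {"symbol": {"type": "string"}},
            "required": ["symbol"],
        },
    },
}]

response = client.chat.completions.create(
    model="o3",
    messages=[{"role": "user", "content": "What is BTC trading at right now?"}],
    tools=tools,
    tool_choice="auto",  # the model, not the caller, decides if the tool is needed
)

# If the model chose to call the tool, the call appears here instead of text.
message = response.choices[0].message
print(message.tool_calls or message.content)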

OpenAI also launched Codex CLI, an open-source tool that lets developers bring these models into a local terminal to help write and debug code. Codex CLI is open-sourced from day one, and a US$1 million developer grant program has opened alongside it.

Pricing and availability: o4-mini wins on cost-performance

The o3 API is priced at $10 per million input tokens and $40 per million output tokens; by comparison, o4-mini costs just $1.10 and $4.40. It trails o3 slightly in capability but holds an overwhelming cost advantage. ChatGPT Plus ($20/month), Pro ($200/month), and Team users get both models immediately, with enterprise and education customers following within a week.
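To make that price gap concrete, here is a back-of-the-envelope comparison using the listed prices; the token volumes are illustrative:

```python
# Cost comparison from the listed API prices, in USD per 1M tokens.
PRICES = {"o3": (10.00, 40.00), "o4-mini": (1.10, 4.40)}  # (input, output)

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """API cost in USD for a request of the given size."""
    per_input, per_output = PRICES[model]
    return (per_input * input_tokens + per_output * output_tokens) / 1_000_000

# Example workload: 1M tokens in, 1M tokens out.
print(cost_usd("o3", 1_000_000, 1_000_000))       # 50.0
print(cost_usd("o4-mini", 1_000_000, 1_000_000))  # 5.5
```

At equal volume, o4-mini comes in at roughly one-ninth of o3's cost ($5.50 versus $50 for a million tokens each way), which is the cost-performance argument in a single number.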
Through o3 and o4-mini, OpenAI clearly demonstrates where it is taking "reasoning AI": not just stronger language ability, but image understanding and tool operation integrated for the first time. The two models are more than a point update; they mark a significant transition from ChatGPT toward agentic AI. If the upcoming o3-pro (which will reach Pro users in the coming weeks) and GPT-5 can consolidate this round of breakthroughs, they will have the chance to define the standard for the next generation of AI products.

Related reports
OpenAI strengthens GPT-4o, which climbs to second place in the rankings! Sam Altman: it understands people and writes code better, with more creativity
OpenAI announces its open Agents SDK supports MCP, connecting everything in series for another key step
OpenAI launches its strongest image model: accurate infographics, multimodal input, hard-to-distinguish realism, built into GPT-4o

This article, "OpenAI releases o3 and o4-mini, its strongest reasoning models: they can think with images, select tools autonomously, and break through in math and coding," was first published in BlockTempo's "Dynamic Trend - The Most Influential Blockchain News Media."