Rahsha

What is OpenAI’s New GPT OSS: Free, Open-Source & Powerful – But What’s the Catch?

GPT OSS
Image Source: OpenAI

When it comes to AI models, OpenAI is the first name that comes to mind. Now, in 2025, OpenAI has surprised everyone by launching two new open-weight models: GPT-OSS-120B and GPT-OSS-20B. Both ship under the Apache 2.0 license, which is a big gift for developers. But are these models really that good? Let's look in detail at the performance, features, and usability of OpenAI's new GPT-OSS models and see what they are best suited for.

OpenAI’s New GPT OSS:

These are OpenAI's first open-weight releases since GPT-2 in 2019, and they are designed for reasoning and agentic tasks. GPT-OSS-120B is a large model with 117 billion parameters, while GPT-OSS-20B is compact at 21 billion parameters. Both models use a Mixture-of-Experts (MoE) architecture, which makes them efficient and lets them run with fewer resources.

The 120B model activates only 5.1B parameters per token and runs on a single 80GB GPU (such as an Nvidia H100), while the 20B model activates 3.6B parameters per token and can run on a laptop with 16GB of memory. Both support a context length of up to 128,000 tokens, which is quite impressive.
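The MoE efficiency is easy to see with quick arithmetic on the figures above (parameter counts taken from this article; treat them as approximate):

```python
# Fraction of weights actually used per token, from the figures quoted above.
total_120b, active_120b = 117e9, 5.1e9
total_20b, active_20b = 21e9, 3.6e9

frac_120b = active_120b / total_120b
frac_20b = active_20b / total_20b

print(f"120B: {frac_120b:.1%} of parameters active per token")
print(f" 20B: {frac_20b:.1%} of parameters active per token")
```

So despite its size, the 120B model touches under 5% of its weights for any given token, which is part of why it fits on a single 80GB GPU.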

Training and Capabilities

These models are trained mostly on English text, with a focus on STEM, coding, and general knowledge. OpenAI has also open-sourced the o200k_harmony tokenizer used for them, a superset of the tokenizer used by GPT-4o and o4-mini. The models feature chain-of-thought (CoT) reasoning and tool use, which makes them powerful for web search, Python code execution, and complex queries.

This makes it easy to build agentic workflows without a complicated setup. NVIDIA H100 GPUs were used for training, and reinforcement-learning (RL) techniques were used in post-training, which gives the models a ChatGPT-like personality and reasoning ability.

Tool Use and Reasoning

The biggest highlight of the GPT-OSS models is their tool use combined with chain-of-thought reasoning. You can use them for web search, code execution, or calculations, and all of this happens within the model's thinking process. For example, if you ask how many experts per layer a given model has, it can first perform a web search, analyze the results, and then answer, all within its CoT.
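The loop described above, where the model's reasoning requests a tool, the host executes it, and the result is fed back before the final answer, can be sketched in a few lines. This is a toy illustration only: the `web_search` stand-in, the `fake_model`, and the dict-based message format are invented for the example and do not match the exact harmony wire format.

```python
# Toy sketch of the agentic loop: the model emits a tool call mid-reasoning,
# the host runs the tool, and the result is appended before the answer.
def web_search(query: str) -> str:
    # Stand-in for a real search backend (hypothetical).
    return f"(search results for {query!r})"

TOOLS = {"web_search": web_search}

def run_agent(model, user_msg: str) -> str:
    messages = [{"role": "user", "content": user_msg}]
    while True:
        step = model(messages)            # one model forward pass
        if "tool" in step:                # the CoT requested a tool
            result = TOOLS[step["tool"]](step["args"])
            messages.append({"role": "tool", "content": result})
        else:
            return step["content"]        # final answer

# A scripted stand-in model so the loop can run without a GPU:
def fake_model(messages):
    if messages[-1]["role"] == "user":
        return {"tool": "web_search", "args": "MoE experts per layer"}
    return {"content": "Answer based on " + messages[-1]["content"]}

print(run_agent(fake_model, "How many experts per layer?"))
```

The point is that the loop lives in the host, but the decision to call a tool lives inside the model's chain of thought.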

This feature is very useful for developers, as you don't need to build a separate agent architecture. The reasoning level is also adjustable (low, medium, or high) and can be set according to your task and latency needs.
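A minimal sketch of selecting that reasoning level: per OpenAI's documentation, gpt-oss reads its effort from the system prompt, though you should verify the exact phrasing against your serving stack, which may expose a dedicated parameter instead.

```python
# Sketch: choose the reasoning effort via the system prompt, as documented
# for gpt-oss. Check your serving stack for the exact convention.
def build_messages(question: str, effort: str = "medium") -> list:
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unknown reasoning effort: {effort}")
    return [
        {"role": "system", "content": f"Reasoning: {effort}"},
        {"role": "user", "content": question},
    ]

msgs = build_messages("Explain MXFP4 quantization briefly.", effort="high")
```

Low effort trades reasoning depth for latency; high effort does the reverse.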

Benchmark Performance

According to OpenAI's benchmarks, GPT-OSS-120B and 20B are very competitive. The 120B model performs on par with or better than OpenAI's proprietary o4-mini, especially in competition math (AIME 2024/2025) and health-related queries (HealthBench). The 20B model likewise competes with o3-mini, and scores well in competition coding (Codeforces) and general problem-solving (MMLU, HLE).

The 120B model scores 90% on MMLU and the 20B scores 85.3%. On GPQA Diamond (PhD-level science questions), 120B scores 80.1% and 20B scores 71.5%. However, the hallucination rate is on the high side: on PersonQA, the 120B model hallucinates 49% of the time and the 20B 53%, significantly more than o1 (16%).

Getting Started with GPT-OSS

Both models are available for free download on Hugging Face and ship with native MXFP4 quantization, which makes them very efficient. The 120B model runs on a single H100 GPU, while the 20B model runs on consumer devices with 16GB of RAM. You can use Ollama, vLLM, or LM Studio to run them locally. Setup with Ollama is simple: just run ollama pull gpt-oss:20b or ollama pull gpt-oss:120b.
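Once the model is pulled, Ollama exposes an OpenAI-compatible chat endpoint, so a local query needs nothing beyond the Python standard library. A minimal sketch, assuming Ollama is running on its default port with gpt-oss:20b already pulled:

```python
import json
import urllib.request

# Ollama's OpenAI-compatible endpoint on the default local port.
URL = "http://localhost:11434/v1/chat/completions"

def ask(prompt: str, model: str = "gpt-oss:20b") -> str:
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(
        URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the endpoint mirrors OpenAI's chat completions API, existing client code can usually be pointed at the local URL with no other changes.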

Platforms like Azure, AWS, Fireworks, and OpenRouter offer cloud hosting. On OpenRouter, the 120B model costs 10 cents per million input tokens and 50 cents per million output tokens, while the 20B costs 5 cents and 20 cents respectively. Their low latency and customization options make them ideal for developers.
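Those per-token rates make back-of-the-envelope cost estimates trivial (rates as quoted above; check OpenRouter for current pricing):

```python
# OpenRouter rates quoted above, in USD per million tokens.
RATES = {
    "gpt-oss-120b": {"input": 0.10, "output": 0.50},
    "gpt-oss-20b":  {"input": 0.05, "output": 0.20},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1e6

# e.g. 2M input + 0.5M output tokens on each model:
print(round(cost_usd("gpt-oss-120b", 2_000_000, 500_000), 2))  # 0.45
print(round(cost_usd("gpt-oss-20b", 2_000_000, 500_000), 2))   # 0.2
```

At these prices, even heavy experimentation with the 20B model costs pennies.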

Models: GPT-OSS-120B, GPT-OSS-20B
Parameters: 117B (5.1B active), 21B (3.6B active)
Architecture: Mixture-of-Experts (MoE)
Quantization: Native MXFP4
Context Length: 128,000 tokens
License: Apache 2.0
Reasoning Level: Low, Medium, High (adjustable)
Memory Requirement: 120B: 80GB GPU; 20B: 16GB RAM
Supported Platforms: Hugging Face, Ollama, OpenRouter, Azure, vLLM
Tool Usage: Web search, Python execution, function calling

Conclusion

GPT-OSS-120B and 20B are OpenAI's first open-weight models since GPT-2. They are quite powerful in reasoning and agentic tasks. Their Apache 2.0 license and low resource needs make them a great option for developers, whether you're using a local or cloud setup.

But the high hallucination rate and the lack of multimodal features could be a drawback for some users. If you're looking for a customizable, efficient, and offline-capable AI model, these models are well worth trying.
