OpenAI recently released GPT-5.2, which posts superior benchmark results. However, online chatter suggests that OpenAI may have used more tokens and compute for the benchmark runs than standard users get, which some consider "cheating" the tests. If everything is equal, is GPT-5.2 actually on par with Gemini 3 Pro? Here we try to find out.

The "Cheating" Controversy: Compute & Tokens

The core of the controversy lies in inference-time compute. "Cheating" in this context refers to OpenAI using a configuration for benchmarks that is significantly more powerful (and expensive) than what is available to standard users or what is typical for a "fair" comparison.
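To make that knob concrete, here is a minimal sketch using the reasoning-effort parameter in the current OpenAI Python SDK's Responses API. The model name "gpt-5.2" is a stand-in, and the prompt is a placeholder; the reported "xhigh" tier used for benchmarks may not be exposed to regular API users at all.

```python
from openai import OpenAI

client = OpenAI()

# A typical user's request: default/medium reasoning effort.
everyday = client.responses.create(
    model="gpt-5.2",  # stand-in model name; adjust to whatever is actually exposed
    reasoning={"effort": "medium"},
    input="Solve this ARC-style grid puzzle: ...",
)

# A benchmark-style configuration: maximum reasoning effort, which lets the
# model spend far more hidden "thinking" tokens (and money) on the same prompt.
# The reported "xhigh" tier may sit above anything selectable here.
benchmark = client.responses.create(
    model="gpt-5.2",
    reasoning={"effort": "high"},
    input="Solve this ARC-style grid puzzle: ...",
)

# usage.output_tokens includes reasoning tokens, so comparing the two
# responses makes the compute gap between configurations visible.
print(everyday.usage.output_tokens, benchmark.usage.output_tokens)
```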

Benchmark Comparison (GPT-5.2 vs. Gemini 3 Pro)

With its massive compute boost factored in, GPT-5.2 does post higher scores, but the gap narrows or even reverses once the test conditions are scrutinized.

| Benchmark | GPT-5.2 (Thinking/Pro) | Gemini 3 Pro | Context |
| --- | --- | --- | --- |
| ARC-AGI-2 | 52.9% | ~31.1% | Measures abstract reasoning. GPT-5.2's score relies heavily on the "Thinking" process. |
| GPQA Diamond | 92.4% | 91.9% | Graduate-level science. Effectively tied, within margin of error (see the sketch below). |
| SWE-Bench Pro | 55.6% | N/A | Real-world software engineering. GPT-5.2 sets a new SOTA here. |
| SWE-Bench Verified | 80.0% | 76.2% | A more established coding benchmark. The models are roughly comparable here. |
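The "margin of error" claim for GPQA Diamond can be checked with a back-of-envelope binomial standard-error calculation. The only input beyond the scores in the table is the assumption that the test set has 198 questions, the published size of GPQA Diamond.

```python
import math

# GPQA Diamond has 198 questions (assumption based on the public dataset size).
n = 198
p_gpt, p_gem = 0.924, 0.919  # scores from the table above

# Standard error of a binomial proportion: sqrt(p * (1 - p) / n)
se = math.sqrt(p_gpt * (1 - p_gpt) / n)
gap = p_gpt - p_gem

print(f"standard error ~= {se * 100:.1f} percentage points")
print(f"observed gap    = {gap * 100:.1f} percentage points")
# The ~0.5 pp gap is well inside one standard error (~1.9 pp),
# so the two scores are statistically indistinguishable.
```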


Are They "On Par"?

Yes, and Gemini 3 Pro may even be superior in "base" capability.

If "everything is equal"—meaning both models are restricted to the same amount of inference compute (thinking time)—the general consensus implies they are highly comparable, with different strengths:

Conclusions

The claim that they are "on par" is accurate. If you strip away OpenAI's "xhigh" compute advantage used in benchmarks, Gemini 3 Pro is likely equal to or slightly ahead of GPT-5.2 in raw model intelligence. GPT-5.2's "superiority" in benchmarks largely comes from its ability to spend significantly more time and compute processing a single prompt.

Below is a compiled list of sources covering the GPT-5.2 release, the Gemini 3 Pro comparison, and the associated benchmarking controversy.

References

1. Official Release Announcements

OpenAI – System Card Update

Google – The Gemini 3 Era

2. Benchmark Performance & Technical Analysis

R&D World – Comparative Analysis

Vellum AI – Deep Dive

Simon Willison’s Weblog

3. The "Cheating" & Compute Controversy

Reddit (r/LocalLLaMA & r/Singularity)

InfoQ News