Model Detail
GPT-5.4
Dominates terminal-style external benchmarks, but our structured internal operator benchmark data still has a gap for this model.
Benchmark score
88/100
Source
External benchmark canon
Role
Terminal/coding specialist
Strengths
- #1 on Terminal-Bench 2.0
- Very strong coding/execution profile
- Competitive frontier model
Weaknesses
- Provider path was blocked, which prevented earlier internal benchmarking
- Less direct operator-suite evidence in our canon
Source artifacts
Raw machine-readable files for anyone who wants to dig deeper or run their own analysis.