GPTNT
A two-agent benchmark: one model sees the bomb, one reads the manual. They must talk to defuse it.
Can two AI agents talk each other through defusing a bomb?
GPTNT is a benchmark built on the cooperative game Keep Talking and Nobody Explodes, where two agents must coordinate to defuse procedurally generated bombs against a live countdown. One sees the bomb but not the manual; the other holds the manual but can't see the bomb. Neither can succeed alone — only effective, real-time, asynchronous communication defuses the bomb.
Not one of the state-of-the-art models we test defuses a single bomb in real time — a bar that human players clear.
Amit Parekh*, Sabrina McCallum*, Kareem Al-Hasan*, Malvina Nikandrou, Alessandro Suglia, Ioannis Konstas
Heriot-Watt University · University of Edinburgh
* Equal contribution

@misc{gptnt2026,
title = {GPTNT: Keep Talking and Nobody Explodes, for AI},
author = {GPTNT Team},
year = {2026},
note = {A two-agent benchmark for real-time multimodal collaboration},
url = {https://gptnt.ai}
}