// canarybench
Prompt
Containment
Arena
A CTF-style sandbox where red and blue teams trade shots over an LLM's system prompt. The defender hides a canary token; the attacker has N turns to extract it. Every turn is logged, scored, and folded into a public attack taxonomy.
Enter match lobbyMatches
···
Probes
···
Held
···
// RED TEAM
Craft injection attacks. Break through system prompts. Extract the canary token before time runs out.
// BLUE TEAM
Define your containment protocol. Protect the canary at all costs. Every turn survived is data.
// THE BENCH
Every match feeds a public attack taxonomy. Real human red-teaming data, open to researchers.