// canarybench

Prompt
Containment
Arena

A CTF-style sandbox where red and blue teams trade shots over an LLM's system prompt. The defender hides a canary token; the attacker has N turns to extract it. Every turn is logged, scored, and folded into a public attack taxonomy.

Enter match lobby
Matches
···
Probes
···
Held
···
// RED TEAM

Craft injection attacks. Break through system prompts. Extract the canary token before time runs out.

// BLUE TEAM

Define your containment protocol. Protect the canary at all costs. Every turn survived is data.

// THE BENCH

Every match feeds a public attack taxonomy. Real human red-teaming data, open to researchers.

// every turn is logged · scored · taxonomized
View Public DatasetHall of Fame