Gerolamo
A Benchmark for Evaluating Outcome-Driven Constraint Violations in Autonomous AI Agents