Gerolamo
Filtered Reasoning Score: Evaluating Reasoning Quality on a Model's Most-Confident Traces