Collected molecules will appear here. Add from search or explore.
Benchmark and evaluation framework (CAUSALT3) for assessing LLM sycophancy and skepticism specifically within causal reasoning tasks across Pearl's hierarchy (association, intervention, counterfactuals).
Defensibility
citations
0
co_authors
1
The project addresses a sophisticated failure mode in LLMs: the tendency to abandon sound causal logic when prompted with authoritative or biased social hints (sycophancy). While sycophancy is a known issue explored by labs like Anthropic, applying it specifically to Pearl's 'Ladder of Causation' is a novel and valuable combination. The defensibility is currently low (3) because the project is a nascent research artifact (9 days old, 0 stars) with a relatively small dataset (454 instances). In the competitive landscape of LLM benchmarks, scale and community adoption are the primary moats. It competes with broader causal benchmarks like CLADDER and general safety/alignment benchmarks. The 'control failure vs. knowledge failure' distinction is a high-value insight for frontier labs, who are likely to integrate such testing into their internal RLHF and red-teaming pipelines. The small instance count makes it easy to replicate or expand upon, but the specific expert-curated nature of the causal traces provides a temporary qualitative advantage over automated synthetic benchmarks.
TECH STACK
INTEGRATION
reference_implementation
READINESS