Blog posts

2025

Probabilistic Soundness Guarantees in LLM Reasoning Chains

Large language models (LLMs) often make reasoning errors, yet current LLM-based error-detection methods frequently miss propagated errors, since an early mistake can corrupt every downstream judgment. To address this, we introduce Autoregressive Reasoning Entailment Stability (ARES), an algorithmic framework that measures reasoning soundness with statistical guarantees. ARES reliably detects errors in long reasoning chains, especially the propagated errors that other methods fail to catch.
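
The excerpt does not spell out the algorithm, so the following is only an illustrative sketch of one way per-step soundness scores might be aggregated over a chain. The function name, the threshold, the multiplicative aggregation rule, and the stubbed-out entailment judge are all assumptions for this example, not the paper's method.

```python
import numpy as np

def chain_soundness(step_entailment_probs, threshold=0.9):
    """Aggregate per-step entailment probabilities into a chain-level
    soundness estimate and flag low-confidence steps.

    Each probability is assumed to come from an external entailment
    judge scoring step i against the premises plus the verified prefix
    (the judge itself is not implemented here).
    """
    probs = np.asarray(step_entailment_probs, dtype=float)
    # A chain is sound only if every step is sound; treating the
    # judge's scores as independent gives a multiplicative estimate.
    chain_prob = float(np.prod(probs))
    flagged = [i for i, p in enumerate(probs) if p < threshold]
    return chain_prob, flagged

# A late step that inherits an early mistake gets a low conditional
# entailment score even when its local wording looks plausible.
prob, suspects = chain_soundness([0.99, 0.98, 0.55, 0.97])
print(f"chain soundness ~ {prob:.2f}, suspect steps: {suspects}")
```

Under the (strong) assumption that the judge's per-step scores are calibrated and independent, the product gives a conservative chain-level estimate; a real framework with statistical guarantees would need considerably more care than this toy aggregation.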

2023

Sum-of-Parts Models: Faithful Attributions for Groups of Features

We identify a fundamental barrier that feature attributions face in faithfulness tests. To overcome this limitation, we construct faithful attributions for groups of features. The groups produced by our approach help cosmologists discover knowledge about dark matter and galaxy formation.
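
The excerpt names the approach without describing its mechanics. As a loose illustration of what a sum-of-parts decomposition can mean, here is a toy sketch in which a prediction is an exact sum of scores over disjoint feature groups, so the group scores serve directly as attributions; the linear scorer, the fixed grouping, and all names are assumptions for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 6 features partitioned into disjoint groups, with a linear
# scorer standing in for a real model. Grouping and weights are made up.
d = 6
groups = [[0, 1], [2, 3], [4, 5]]
W = rng.normal(size=d)

def group_scores(x):
    """Score each group on a copy of the input with all other features
    masked to zero, so every score depends only on that group."""
    scores = []
    for g in groups:
        masked = np.zeros_like(x)
        masked[g] = x[g]
        scores.append(float(W @ masked))
    return scores

x = rng.normal(size=d)
scores = group_scores(x)
prediction = sum(scores)  # exactly equals W @ x for disjoint groups
print([round(s, 3) for s in scores], round(prediction, 3))
```

Because the groups are disjoint and the toy scorer is linear, the group scores sum exactly to the full prediction, which is the sense in which such group attributions can be faithful by construction.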