HiddenState -> List<TokenProposal>
Attach auxiliary prediction heads to the model's final hidden state so that, during inference, several subsequent tokens are forecast in parallel from a single forward pass. These forecasts form a draft sequence that the base model then verifies in one step.
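A minimal sketch of the draft-then-verify loop, under stated assumptions. Each head is simplified to a single linear projection from the hidden state to vocabulary logits (real implementations may use deeper heads). Head k guesses the token k positions ahead, and verification uses greedy (argmax) acceptance of a single linear draft. `TokenProposal`, `propose`, `verify_greedy`, and their fields are hypothetical names that mirror the `HiddenState -> List<TokenProposal>` signature. They are not any specific project's API.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical types mirroring HiddenState -> List<TokenProposal>.
HiddenState = List[float]
Matrix = List[List[float]]  # one row of weights per vocabulary entry


@dataclass(frozen=True)
class TokenProposal:
    offset: int    # head index k: this head guesses the token k positions ahead
    token_id: int  # argmax of that head's logits
    logit: float   # raw score of the chosen token


def matvec(w: Matrix, h: HiddenState) -> List[float]:
    # Plain matrix-vector product: one logit per vocabulary row.
    return [sum(wi * hi for wi, hi in zip(row, h)) for row in w]


def propose(hidden: HiddenState, heads: Sequence[Matrix]) -> List[TokenProposal]:
    # Every head reads the same hidden state, so the whole draft comes from
    # one base-model forward pass plus a few cheap extra projections.
    proposals: List[TokenProposal] = []
    for k, w in enumerate(heads, start=1):
        logits = matvec(w, hidden)
        best = max(range(len(logits)), key=logits.__getitem__)
        proposals.append(TokenProposal(offset=k, token_id=best, logit=logits[best]))
    return proposals


def verify_greedy(draft: Sequence[int], target_preds: Sequence[int]) -> List[int]:
    # target_preds[i] is the base model's greedy token after prefix + draft[:i];
    # all len(draft) + 1 predictions come out of ONE forward pass over the draft.
    if len(target_preds) != len(draft) + 1:
        raise ValueError("need one target prediction per draft position, plus one")
    accepted: List[int] = []
    for tok, pred in zip(draft, target_preds):
        if tok != pred:
            break  # first disagreement: discard the rest of the draft
        accepted.append(tok)
    # Always emit the base model's own token at the first unverified position,
    # so each step yields at least one token, exactly as plain greedy decoding would.
    accepted.append(target_preds[len(accepted)])
    return accepted


# Usage on a toy 3-token vocabulary with two heads.
heads = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],  # head 1
    [[0.0, 1.0], [1.0, 0.0], [0.2, 0.2]],  # head 2
]
draft = [p.token_id for p in propose([0.9, 0.1], heads)]
print(draft, verify_greedy(draft, [0, 1, 2]), verify_greedy(draft, [0, 4, 2]))
```

Greedy acceptance keeps the output identical to ordinary greedy decoding of the base model. The heads only change how many tokens each verification pass can commit.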
Problem it solves
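Autoregressive generation is memory-bandwidth bound: each decoding step reads the full set of model weights from memory to produce just one token. Traditional speculative decoding amortizes that cost with a separate draft model, which must be trained and served alongside the target model and whose output distribution may not match the target's, lowering the acceptance rate.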
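The bandwidth argument can be made concrete with back-of-the-envelope arithmetic. All figures below are hypothetical placeholders, not measurements: a 7B-parameter model with 16-bit weights, 2 TB/s of memory bandwidth, and 2.5 tokens accepted per verification step. The sketch also assumes that a verification pass costs about as much as one ordinary decode step, because weight reads rather than arithmetic dominate.

```python
# Back-of-the-envelope decode throughput for a memory-bandwidth-bound model.
# All numbers are hypothetical placeholders, not measurements.


def step_time_s(num_params: float, bytes_per_param: float, bandwidth_bytes_per_s: float) -> float:
    # Lower bound on one decode step at batch size 1: every weight is read once.
    return num_params * bytes_per_param / bandwidth_bytes_per_s


def tokens_per_s(step_s: float, tokens_per_step: float) -> float:
    # Throughput when each bandwidth-bound step commits tokens_per_step tokens.
    return tokens_per_step / step_s


step = step_time_s(num_params=7e9, bytes_per_param=2, bandwidth_bytes_per_s=2e12)
baseline = tokens_per_s(step, tokens_per_step=1.0)  # plain decoding: one token per step
drafted = tokens_per_s(step, tokens_per_step=2.5)   # assumed mean tokens accepted per verify step

print(f"step = {step * 1e3:.1f} ms, baseline = {baseline:.0f} tok/s, with drafts = {drafted:.0f} tok/s")
```

Under these assumptions, the speedup is simply the mean number of tokens committed per step. The prediction heads and the longer verification input add overhead that this estimate ignores.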
Consumes
HiddenState
Emits
List<TokenProposal>