n-gram-speculative-decoding

transform

TokenSequence -> DraftTokenProposals

Predict candidate draft tokens by searching the preceding token context for an earlier occurrence of the most recent token sub-sequence (n-gram) and proposing the tokens that followed it.
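A minimal sketch of the lookup, assuming token IDs are plain integers. The function name `propose_draft_tokens` and the parameters `max_ngram`, `min_ngram`, and `num_draft` are hypothetical, chosen for illustration, and are not taken from any particular project. It tries the longest suffix n-gram first and prefers the most recent earlier match.

```python
from typing import List, Sequence


def propose_draft_tokens(
    tokens: Sequence[int],
    max_ngram: int = 3,   # hypothetical: longest suffix length to try
    min_ngram: int = 1,   # hypothetical: shortest suffix length to try
    num_draft: int = 4,   # hypothetical: maximum number of tokens to propose
) -> List[int]:
    """Return up to num_draft tokens that followed an earlier occurrence of
    the current suffix n-gram, or an empty list if nothing matches."""
    n_tokens = len(tokens)
    # Try longer n-grams first: a longer match is a stronger signal.
    for n in range(min(max_ngram, n_tokens - 1), min_ngram - 1, -1):
        suffix = list(tokens[n_tokens - n:])
        # Scan right to left so the most recent earlier match wins.
        # Starting at n_tokens - n - 1 excludes the suffix itself and
        # guarantees at least one token follows the match.
        for start in range(n_tokens - n - 1, -1, -1):
            if list(tokens[start:start + n]) == suffix:
                follow = start + n
                return list(tokens[follow:follow + num_draft])
    return []


if __name__ == "__main__":
    history = [1, 2, 3, 4, 5, 1, 2, 3]
    print(propose_draft_tokens(history))  # [4, 5, 1, 2]
```

The proposals are only guesses: the target model still has to verify them in a single forward pass and keep the longest accepted prefix, as in any speculative decoding scheme. An empty result simply means the step falls back to ordinary decoding.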

Problem it solves

Traditional speculative decoding requires a secondary draft model, which consumes extra GPU memory and adds runtime overhead.

Consumes

TokenSequence

Emits

DraftTokenProposals

Distilled from 1 source

The real projects this mechanism was found in. Attribution is the point — this is how the best teams actually do it.