Collected sources and patterns will appear here. Add from search, explore, or the patterns library.
PromptTensor -> Sequence<Token>
Run the token generation loop within persistent GPU kernels to avoid per-token host CPU launch overhead.
Problem it solves
CPU-to-GPU kernel launch latency limits token generation speed and introduces jitter.
Consumes
Emits
The real projects this mechanism was found in. Attribution is the point — this is how the best teams actually do it.