Prompt -> GenerationStream
Route prompt processing (prefill) and token generation (decode) to separate, specialized hardware node pools.
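A minimal sketch of the routing idea, assuming a router that runs each request's prefill on one pool and then hands the resulting KV cache to a node in a separate decode pool. `KVHandle`, `Pool`, `DisaggregatedRouter` and the least-loaded placement policy are hypothetical names chosen for illustration. The forward passes and the KV-cache transfer are stubbed out, and `GenerationStream` is modeled as a plain iterator of token strings.

```python
from dataclasses import dataclass
from typing import Iterator, List

# Assumption: a generation stream is modeled as an iterator of token strings.
GenerationStream = Iterator[str]


@dataclass
class KVHandle:
    """Hypothetical reference to a KV cache written by a prefill node."""
    request_id: int
    prompt_len: int
    node: str


@dataclass
class Node:
    name: str
    active: int = 0  # number of requests currently placed on this node


class Pool:
    """A pool of nodes of one kind, with least-loaded placement (an assumed policy)."""

    def __init__(self, names: List[str]):
        self.nodes = [Node(n) for n in names]

    def pick(self) -> Node:
        node = min(self.nodes, key=lambda n: n.active)
        node.active += 1
        return node

    def release(self, node: Node) -> None:
        node.active -= 1


class DisaggregatedRouter:
    def __init__(self, prefill: Pool, decode: Pool):
        self.prefill = prefill
        self.decode = decode
        self.next_id = 0

    def _prefill(self, prompt: List[int]) -> KVHandle:
        node = self.prefill.pick()
        try:
            # Stub: a real prefill node runs one forward pass over the whole
            # prompt and materializes its KV cache.
            handle = KVHandle(self.next_id, len(prompt), node.name)
            self.next_id += 1
            return handle
        finally:
            # The prefill node is freed as soon as the prompt pass is done.
            self.prefill.release(node)

    def generate(self, prompt: List[int], max_new_tokens: int) -> GenerationStream:
        handle = self._prefill(prompt)
        node = self.decode.pick()
        try:
            # Stub: the KV cache would be transferred from handle.node to node.name here.
            for step in range(max_new_tokens):
                yield f"req{handle.request_id}:tok{step}@{node.name}"
        finally:
            # The decode node is held for the lifetime of the stream.
            self.decode.release(node)


router = DisaggregatedRouter(Pool(["prefill-0", "prefill-1"]), Pool(["decode-0", "decode-1"]))
print(list(router.generate([1, 2, 3], max_new_tokens=2)))
```

The asymmetry in how long each node is held is the point of the split: a prefill node is busy for one short, compute-heavy pass per request, while a decode node is occupied for the whole stream, so each pool can be sized and configured for its own workload.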
Problem it solves
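Co-locating compute-bound prefill and memory-bandwidth-bound decode steps in the same batch causes hardware underutilization and latency spikes: a long prompt's prefill pass stalls the decode steps of every other request sharing the batch, inflating inter-token latency, and neither phase can be configured for its own bottleneck.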
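The compute-bound versus memory-bandwidth-bound distinction can be checked with a back-of-the-envelope arithmetic-intensity estimate. This sketch only models one dense d×d projection (fp16, 2 bytes per element) and ignores attention over the KV cache; the accelerator peak figures are assumed, illustrative values, not a specific product. A phase is compute-bound when its FLOPs per byte exceed the hardware's ridge point (peak FLOP/s divided by peak bytes/s).

```python
def matmul_intensity(n_tokens: int, d_model: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for an (n_tokens x d) @ (d x d) projection.

    Counts a multiply-add as 2 FLOPs; bytes = weight read + input read + output write.
    Ignores attention over the KV cache and any on-chip caching.
    """
    flops = 2 * n_tokens * d_model * d_model
    bytes_moved = bytes_per_elem * (d_model * d_model + 2 * n_tokens * d_model)
    return flops / bytes_moved


# Assumed, illustrative accelerator figures (not a real product's spec sheet).
PEAK_FLOPS = 1000e12      # FLOP/s
PEAK_BANDWIDTH = 3e12     # bytes/s
RIDGE = PEAK_FLOPS / PEAK_BANDWIDTH  # roughly 333 FLOP/byte


def bound(n_tokens: int, d_model: int = 4096) -> str:
    if matmul_intensity(n_tokens, d_model) >= RIDGE:
        return "compute-bound"
    return "memory-bandwidth-bound"


# Prefill processes every prompt token in one pass; decode processes one
# token per sequence per step, so n_tokens equals the decode batch size.
for label, n in [("prefill, 2048-token prompt", 2048),
                 ("decode, batch of 1", 1),
                 ("decode, batch of 64", 64)]:
    print(f"{label}: {matmul_intensity(n, 4096):.1f} FLOP/byte -> {bound(n)}")
```

Under these assumptions the 2048-token prefill reaches about 1024 FLOP/byte, while decode sits near 1 FLOP/byte per sequence and stays well below the ridge even at a batch of 64, which is why the two phases favor differently provisioned hardware.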
Consumes: Prompt
Emits: GenerationStream