Collected sources and patterns will appear here. Add from search, explore, or the patterns library.
SequenceTensor -> PartitionedSequenceTensors
Shard input token sequences across arbitrary GPU cluster sizes by combining Ulysses head-wise partitioning with Ring-Attention token-wise ring communication.
Problem it solves
Long-sequence training is memory-bound and constrained by attention head counts when using Ulysses alone.
Consumes
Emits
The real projects this mechanism was found in. Attribution is the point — this is how the best teams actually do it.