Collected molecules will appear here. Add from search or explore.
SequenceTensor -> PartitionedAttentionTensor
Partition sequence tensors along the time/sequence dimension across multiple GPUs, using an All-to-All collective to gather query, key, and value vectors for attention calculations.
Problem it solves
Extremely long context sequences exceed the memory capacity of a single GPU's attention head allocation.
Consumes
Emits
The real projects this mechanism was found in. Attribution is the point — this is how the best teams actually do it.