DistributedSparseFeatures -> AggregatedNodeEmbeddings
Schedule intra-node embedding reduction and aggregation to run concurrently inside the execution window of inter-node all-to-all communication.
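A minimal sketch of this scheduling, assuming a chunked pipeline: while the exchange for chunk k+1 is in flight, the node reduces chunk k, which has already arrived. Every name here (`inter_node_all_to_all`, `intra_node_reduce`, `overlapped_pipeline`, `latency_s`) is hypothetical. The all-to-all is simulated by a sleep on a single worker thread standing in for a communication stream, so this illustrates the overlap structure only, not a real collective library.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def inter_node_all_to_all(chunk, latency_s=0.05):
    """Hypothetical stand-in for an inter-node all-to-all.

    It only sleeps to model network latency and returns the chunk unchanged.
    """
    time.sleep(latency_s)
    return chunk


def intra_node_reduce(chunk):
    """Element-wise sum of the embedding vectors held by the local devices."""
    return [sum(values) for values in zip(*chunk)]


def overlapped_pipeline(chunks, latency_s=0.05):
    """Reduce chunk k while the exchange for chunk k+1 is in flight."""
    if not chunks:
        return []
    results = []
    # One worker thread plays the role of a dedicated communication stream.
    with ThreadPoolExecutor(max_workers=1) as comm_stream:
        in_flight = comm_stream.submit(inter_node_all_to_all, chunks[0], latency_s)
        for next_chunk in chunks[1:]:
            received = in_flight.result()
            # Launch the next exchange first, then reduce inside its window.
            in_flight = comm_stream.submit(inter_node_all_to_all, next_chunk, latency_s)
            results.append(intra_node_reduce(received))
        # The final chunk has no later exchange to hide behind.
        results.append(intra_node_reduce(in_flight.result()))
    return results


def serialized_pipeline(chunks, latency_s=0.05):
    """Baseline: each exchange finishes before its reduction starts."""
    return [intra_node_reduce(inter_node_all_to_all(c, latency_s)) for c in chunks]
```

Both pipelines return the same aggregated embeddings; only the schedule differs. In the overlapped version, the reduction of every chunk except the last runs while another exchange is pending, so its cost is hidden whenever the reduction is shorter than the exchange.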
Problem it solves
When the two levels of hierarchical communication (intra-node and inter-node) are not coordinated, they run back to back, and the resulting serialization overhead on accelerator clusters is severe.
Consumes
DistributedSparseFeatures
Emits
AggregatedNodeEmbeddings