one-sided-moe-routing

write

DistributedTokenTensor -> RoutedTokenTensor

Route token activations directly into the memory of the nodes hosting their target experts, using one-sided RDMA or NVLink writes instead of a bulk collective operation.

Problem it solves

A standard All-to-All collective blocks GPU execution until every participant has joined the exchange, and its coordination overhead grows as Mixture of Experts (MoE) models scale.

Consumes

DistributedTokenTensor

Emits

RoutedTokenTensor
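
As a rough illustration of the DistributedTokenTensor -> RoutedTokenTensor step, the sketch below simulates one-sided routing in a single Python process. It is not taken from any source project. The names `ExpertNode`, `put`, and `route_one_sided` are hypothetical, and the offset scheme (per-expert token counts followed by an exclusive prefix sum, so each sender owns a disjoint slot range) is an assumption about how writers could avoid colliding. In a real system `put` would be an RDMA or NVLink write into a registered remote buffer, and the counts would have to be exchanged between ranks first.

```python
class ExpertNode:
    """Hypothetical stand-in for the receive buffer on an expert's node."""

    def __init__(self, capacity):
        self.buffer = [None] * capacity  # pre-allocated slots for incoming tokens
        self.arrived = 0                 # completion counter bumped by writers


def put(node, offset, rows):
    """Simulated one-sided write: only the sender runs this code."""
    node.buffer[offset:offset + len(rows)] = rows
    node.arrived += len(rows)


def route_one_sided(tokens_per_rank, experts_per_rank, num_experts):
    """Write every rank's tokens straight into its target expert's buffer."""
    num_ranks = len(tokens_per_rank)

    # counts[r][e]: how many tokens rank r sends to expert e (small metadata).
    counts = [[0] * num_experts for _ in range(num_ranks)]
    for r, experts in enumerate(experts_per_rank):
        for e in experts:
            counts[r][e] += 1

    # Exclusive prefix sum over ranks: base[r][e] is where rank r's slot
    # range starts inside expert e's buffer, so writers never overlap.
    base = [[0] * num_experts for _ in range(num_ranks)]
    totals = [0] * num_experts
    for e in range(num_experts):
        for r in range(num_ranks):
            base[r][e] = totals[e]
            totals[e] += counts[r][e]

    nodes = [ExpertNode(totals[e]) for e in range(num_experts)]

    for r in range(num_ranks):
        # Group this rank's tokens by destination expert, preserving order.
        grouped = {}
        for token, e in zip(tokens_per_rank[r], experts_per_rank[r]):
            grouped.setdefault(e, []).append(token)
        # One write per destination; the target takes no part in the transfer.
        for e, rows in grouped.items():
            put(nodes[e], base[r][e], rows)

    return nodes


tokens = [["a0", "a1", "a2"], ["b0", "b1"]]   # activations held by rank 0 and rank 1
experts = [[0, 1, 0], [1, 0]]                 # expert chosen for each token
nodes = route_one_sided(tokens, experts, num_experts=2)
```

Under these assumptions, a receiver would wait only on its own `arrived` counter reaching the expected total before running its expert. It would not need to enter a collective call alongside every other rank.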

Distilled from 1 source

The real projects in which this mechanism was found. Attribution is the point: this is how the best teams actually do it.