Collected sources and patterns will appear here. Add from search, explore, or the patterns library.
Tensor<Batch, Seq, Dim> -> Tensor<Batch, Seq, Dim>
Project input representations into a lower-dimensional latent space before applying Mixture-of-Experts routing and token-to-expert multiplication.
Problem it solves
Standard Mixture-of-Experts layers suffer from high parameter and memory bandwidth overhead relative to their FLOP usage.
Consumes
Emits
The real projects this mechanism was found in. Attribution is the point — this is how the best teams actually do it.