Tensor<VisualFeature> -> Tensor<Embedding>
Project spatial visual features from a Vision Transformer into the LLM's input embedding dimension via a learned affine transformation (a linear map plus a bias).
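A minimal sketch of the affine projection described above, written with NumPy. The function name `project_visual_features`, the tensor sizes, and the random initialization are assumptions made for illustration; in a real model the weight and bias would be learned, and the dimensions come from the specific vision encoder and LLM in use.

```python
import numpy as np

def project_visual_features(features, weight, bias):
    """Affine map from the vision feature width to the LLM embedding width.

    features: (num_patches, d_vis) patch features from the vision encoder
    weight:   (d_vis, d_llm)
    bias:     (d_llm,)
    returns:  (num_patches, d_llm)
    """
    return features @ weight + bias

# Hypothetical sizes, chosen only to keep the example small.
num_patches, d_vis, d_llm = 4, 8, 16

rng = np.random.default_rng(0)
features = rng.standard_normal((num_patches, d_vis))
weight = rng.standard_normal((d_vis, d_llm)) * 0.02  # learned in practice
bias = np.zeros(d_llm)                                # learned in practice

embeddings = project_visual_features(features, weight, bias)
print(embeddings.shape)  # (4, 16)
```

Each patch keeps its position in the sequence; only the last axis changes width, so the output can be concatenated with text token embeddings of width `d_llm`.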
Problem it solves
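The Vision Transformer's feature dimension does not match the LLM's token embedding dimension, so visual features cannot be fed to the LLM as input tokens without a projection.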
Consumes
Tensor<VisualFeature>
Emits
Tensor<Embedding>
The real projects this mechanism was found in. Attribution is the point — this is how the best teams actually do it.