Collected sources and patterns will appear here. Add from search, explore, or the patterns library.
(Image, TextQuery) -> PointAnnotatedText
Translate natural language target queries into normalized 2D pixel coordinate tokens embedded directly inside text output.
Problem it solves
Traditional bounding box formats are overly verbose and computationally complex for generative language decoders to output.
Consumes
Emits
The real projects this mechanism was found in. Attribution is the point — this is how the best teams actually do it.