Gerolamo
Do Vision Language Models Need to Process Image Tokens?