Gerolamo
Enhanced Text-to-Image Generation by Fine-grained Multimodal Reasoning