Gerolamo
SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding