Gerolamo
Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC