Optimization heuristics (Greedy and Adaptive Greedy) for allocating mixed-scale LLMs across heterogeneous GPU clusters to satisfy Service Level Objectives (SLOs) and budget constraints.
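The greedy allocation described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the tier names, per-model latency estimates, and the `greedy_allocate` helper are all hypothetical, and a real scheduler would also model throughput, replication, and request routing.

```python
from dataclasses import dataclass, field

@dataclass
class GpuTier:
    name: str
    hourly_cost: float
    # Hypothetical per-model latency estimates (ms) on this tier.
    latency_ms: dict = field(default_factory=dict)

def greedy_allocate(model_names, tiers, slo_ms, budget):
    """Greedy heuristic sketch: assign each model to the cheapest GPU tier
    that meets the latency SLO, skipping models that would exceed the budget."""
    allocation, spend = {}, 0.0
    for model in model_names:
        feasible = [t for t in tiers
                    if t.latency_ms.get(model, float("inf")) <= slo_ms]
        if not feasible:
            continue  # no tier satisfies the SLO for this model
        best = min(feasible, key=lambda t: t.hourly_cost)
        if spend + best.hourly_cost > budget:
            continue  # cheapest feasible tier still breaks the budget
        allocation[model] = best.name
        spend += best.hourly_cost
    return allocation, spend

# Example: a large model only meets the SLO on the expensive tier,
# while the small model can be served cheaply.
tiers = [
    GpuTier("A100", 4.0, {"llama-70b": 80, "llama-8b": 20}),
    GpuTier("T4", 0.5, {"llama-8b": 90, "llama-70b": 400}),
]
allocation, spend = greedy_allocate(["llama-70b", "llama-8b"],
                                    tiers, slo_ms=100, budget=5.0)
```

An adaptive variant would re-run this loop as load or prices change, re-ranking tiers by observed rather than estimated latency.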
Defensibility
citations: 0
co_authors: 2
This project is a research artifact (9 days old, 0 stars) providing a mathematical approach to the 'packing and routing' problem for LLM inference. While the optimization heuristics (GH/AGH) solve a critical problem—balancing cost, latency, and model accuracy across varied GPU tiers—the code lacks the infrastructure to be a standalone product. It is highly vulnerable to 'feature absorption' by existing orchestration and serving frameworks. Specific competitors include SkyPilot (for cloud orchestration), Ray Serve (for inference scaling), and vLLM's internal scheduling logic. Frontier labs and hyperscalers (AWS, Azure, Google) already utilize similar internal MILP-based or heuristic schedulers for their managed LLM services (Bedrock, Vertex AI). The primary value of this work is as a reference implementation for engineers building in-house inference platforms rather than a defensible open-source project.
TECH STACK
INTEGRATION: algorithm_implementable
READINESS