A Kubernetes-native inference gateway designed to optimize LLM serving through KV-cache-aware routing, prompt caching, and request prioritization.
Defensibility: 1 star
Wave targets a highly sophisticated problem in LLM infrastructure: maximizing throughput by routing requests with shared prefixes to the same physical GPU instances to leverage KV-cache reuse. Technically, this is a 'hot' area of research (e.g., RadixAttention in SGLang and automatic prefix caching in vLLM). However, as a project with 1 star and no forks, it is currently a personal experiment rather than a viable tool. The defensibility is extremely low because established players in the LLM gateway space (like LiteLLM, Portkey, or Martian) and orchestration layers (KServe, Ray Serve, and vLLM's own native router) are rapidly implementing these same features. Furthermore, frontier labs (OpenAI/Anthropic) have moved prompt caching to the API level, reducing the need for mid-stack routing for many users. The project's 'Kubernetes-native' approach is the correct architectural choice for enterprise scale, but without a significant community or unique algorithmic advantage in cache prediction, it faces immediate displacement by more mature ecosystems.
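To make the core idea concrete, here is a minimal sketch of prefix-affinity routing: hash the leading portion of the prompt and pin all requests sharing that prefix to one backend, so its KV cache for the shared prefix can be reused. The class and parameter names (`PrefixAffinityRouter`, `prefix_chars`) are illustrative assumptions, not Wave's actual API, and a real gateway would weigh cache hits against load balancing.

```python
import hashlib

class PrefixAffinityRouter:
    """Hypothetical sketch of KV-cache-aware routing: requests whose
    prompts share a leading prefix hash to the same backend, so the
    server can reuse the KV cache it built for that prefix."""

    def __init__(self, backends, prefix_chars=256):
        self.backends = backends          # e.g. pod addresses behind the gateway
        self.prefix_chars = prefix_chars  # how much of the prompt keys affinity

    def route(self, prompt: str) -> str:
        # Hash only the leading prefix (e.g. a long shared system prompt),
        # so requests differing only in their suffix map to the same backend.
        prefix = prompt[: self.prefix_chars]
        digest = hashlib.sha256(prefix.encode()).digest()
        idx = int.from_bytes(digest[:8], "big") % len(self.backends)
        return self.backends[idx]

router = PrefixAffinityRouter(["pod-a:8000", "pod-b:8000", "pod-c:8000"])
shared = "System: You are a helpful assistant.\n" * 10  # > 256 chars shared
# Both requests share the first 256 characters, so they hit the same pod.
assert router.route(shared + "User: hi") == router.route(shared + "User: bye")
```

A naive hash like this ignores backend load; production routers typically combine prefix affinity with load- and cache-occupancy signals, which is where the "unique algorithmic advantage in cache prediction" mentioned above would have to live.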
Integration: api_endpoint