NVIDIA/TensorRT-LLM

GitHub

10.0/10

Platform Domination Risk: N/A

Market Consolidation Risk: N/A

Displacement Horizon: N/A

CORE FUNCTION

Official NVIDIA high-performance inference optimization library for Large Language Models on NVIDIA hardware, providing advanced kernels, quantization, and orchestration.

TRACTION

Stars: 13,319 (↑2.0 velocity)

Forks: 2,263 (0.0 velocity)

REASONING

TensorRT-LLM is the gold standard for LLM inference on NVIDIA hardware. It is defensible because it is maintained by the hardware manufacturer, whose deep architectural access third parties cannot easily replicate. Frontier labs are strategic partners and users rather than competitors, since they rely on this stack to maximize the ROI of their H100/B200 clusters.

COMPOSABILITY

TECH STACK

CUDA, C++, Python, TensorRT, NCCL, PyTorch, cuBLAS, cuDNN

INTEGRATION

library_import

inference_optimization, gpu_acceleration, quantization, distributed_inference, memory_management

READINESS

Composability