direct-nvme-state-paging

write

GPUMemoryTensor -> NVMeStorageTensor

Asynchronously page model tensor data directly between NVMe storage and GPU memory, bypassing intermediate host memory buffers.
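The asynchronous page-out and page-in pattern described above can be sketched with the Python standard library alone. This is a minimal illustration, not the implementation used by any source project: `AsyncPager`, `page_out`, `page_in`, and `roundtrip` are hypothetical names, a `bytes` object stands in for the GPU tensor, and an ordinary temporary file stands in for the NVMe device. It does not bypass host memory; a real direct path would need a GPU-to-storage DMA facility that is not shown here. The sketch only demonstrates the shape of the interface: submit the transfer, keep computing, then wait on the future.

```python
import os
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor


class AsyncPager:
    """Toy pager (hypothetical): moves tensor bytes to and from files on background threads."""

    def __init__(self, directory: str, workers: int = 2) -> None:
        self.directory = directory
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def _path(self, name: str) -> str:
        return os.path.join(self.directory, name + ".bin")

    def page_out(self, name: str, data: bytes) -> Future:
        """Start writing `data` to storage; returns a future with the byte count."""
        def _write() -> int:
            with open(self._path(name), "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # make sure the bytes reach the device
            return len(data)
        return self.pool.submit(_write)

    def page_in(self, name: str) -> Future:
        """Start reading a paged-out tensor; returns a future with its bytes."""
        def _read() -> bytes:
            with open(self._path(name), "rb") as f:
                return f.read()
        return self.pool.submit(_read)

    def close(self) -> None:
        self.pool.shutdown(wait=True)


def roundtrip(payload: bytes) -> bytes:
    """Page a payload out, then back in, and return what was read."""
    with tempfile.TemporaryDirectory() as directory:
        pager = AsyncPager(directory)
        try:
            pending = pager.page_out("layer0.weight", payload)
            # Other work (e.g. the next layer's compute) could overlap here.
            assert pending.result() == len(payload)
            return pager.page_in("layer0.weight").result()
        finally:
            pager.close()
```

Returning futures rather than blocking is the design choice that matters: it lets the caller overlap storage transfers with compute and synchronize only when the tensor is actually needed.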

Problem it solves

System DRAM capacity limits the maximum trainable parameter count on a given node.
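A rough calculation shows why the storage tier sets the ceiling. The sketch below assumes mixed-precision training with Adam, which is commonly accounted as about 16 bytes of model state per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 weights, momentum, and variance), and it assumes all of that state lives in a single tier. The 512 GiB DRAM and 8 TiB NVMe capacities are hypothetical figures chosen for illustration, and `max_params` is a hypothetical helper.

```python
# Assumed per-parameter footprint for mixed-precision Adam:
# fp16 weights (2) + fp16 gradients (2) + fp32 weights, momentum, variance (12).
BYTES_PER_PARAM = 2 + 2 + 12


def max_params(capacity_bytes: int, bytes_per_param: int = BYTES_PER_PARAM) -> int:
    """Upper bound on parameters whose full training state fits in one tier."""
    return capacity_bytes // bytes_per_param


dram_bytes = 512 * 2**30  # hypothetical node with 512 GiB of DRAM
nvme_bytes = 8 * 2**40    # hypothetical node with 8 TiB of NVMe

print(f"DRAM-bound: {max_params(dram_bytes) / 1e9:.1f}B parameters")
print(f"NVMe-bound: {max_params(nvme_bytes) / 1e9:.1f}B parameters")
```

Under these assumptions the NVMe tier raises the bound by the ratio of the two capacities, here a factor of 16. Activations, buffers, and fragmentation are ignored, so real limits are lower.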

Consumes

GPUMemoryTensor

Emits

NVMeStorageTensor

Distilled from 1 source

These are the real projects this mechanism was found in. Attribution is the point: this is how the best teams actually do it.