A 7B-parameter Vision-Language-Action (VLA) foundation model designed for zero-shot, cross-embodiment robotic control across diverse hardware platforms, trained on a 10,000-hour robotic dataset.
Defensibility
citations
0
co_authors
8
RDT2 represents a significant push into the 'robotic foundation model' space. It leverages a 10,000-hour dataset, roughly an order of magnitude larger than many academic datasets (such as BridgeData V2). Its defensibility stems from this 'data gravity': the UMI data format allows cheaper, handheld data collection, which creates a scalable data flywheel that is hard to replicate without significant physical operational effort. At 7B parameters it sits in the same size class as OpenVLA and is far larger than Octo (under 100M parameters), but with a specific focus on zero-shot cross-embodiment control (the ability to run on a new robot without fine-tuning). Despite having 0 stars currently (likely due to a very recent paper release), the 8 forks suggest early researcher interest. The primary risk is that frontier labs such as Google DeepMind (RT-X) or Physical Intelligence (pi0) have access to even larger private datasets and more compute, and could release weights that generalize better. Furthermore, NVIDIA (Isaac) or AWS could provide hosted infrastructure that makes such models easier to deploy, potentially commoditizing the underlying model architecture.
TECH STACK
INTEGRATION
reference_implementation
READINESS