A reproduction pipeline of the DeepSeek-R1 training methodology (SFT, GRPO, and rejection sampling) applied to open-weight models such as LLaMA-3.1 and Mistral.
stars: 0
forks: 0
The project is a personal experiment or educational reproduction of the DeepSeek-R1 paper. With zero stars and zero forks, it has no measurable adoption. Its methodology (GRPO, SFT) is quickly becoming standardized in established libraries such as Hugging Face TRL and Axolotl, which leaves this specific implementation highly vulnerable to obsolescence.
TECH STACK
INTEGRATION: reference_implementation
READINESS