A research repository for evaluating how various mixtures of training data (cell types, tissues, and sequencing technologies) impact the performance and generalization of single-cell foundation models (scFMs).
Defensibility
stars: 4
forks: 1
scFM-datamix is a Microsoft Research artifact designed to accompany a scientific paper. With only 4 stars and 1 fork after more than a year, it shows no community momentum and has little utility as a standalone software product. Its primary value is as a set of scripts for replicating a specific ablation study on data composition for models such as Geneformer or scGPT. In the rapidly evolving 'AI for Science' domain, such benchmarking repositories are quickly superseded by newer, more comprehensive evaluation frameworks (e.g., scib-metrics or the benchmarks provided by the HEAL foundation). There is no moat: the code is a standard implementation of PyTorch training loops and data loaders for single-cell data. The risk of displacement is high because the state of the art in single-cell foundation models shifts every few months, which may render specific data-mixing insights from 2023 obsolete for the next generation of architectures.
TECH STACK
INTEGRATION: reference_implementation
READINESS