Synthetic tabular data generation library using statistical distributions and JSON-based schema definitions.
Defensibility
stars
0
Pygmalion is a nascent project (1 day old, 0 stars) that provides a programmatic wrapper for generating synthetic tabular data from explicit statistical distributions. The feature set (bootstrap resampling, auto-fit with AIC, conditional dependencies) is solid for a utility library, but the project enters a crowded, mature market. Established tools such as SDV (Synthetic Data Vault), Gretel.ai, and YData-synthetic offer significantly more advanced capabilities, including GAN-based and LLM-based generation, which handle complex correlations better than explicit JSON distribution specs. The project currently has no moat: its functionality is a clean implementation of standard SciPy/NumPy patterns. Competitively, platform providers (AWS, Azure, Google) are increasingly integrating data synthesis into their ML pipelines (e.g., SageMaker Data Wrangler), which leaves standalone statistical generators vulnerable. Without a unique algorithmic breakthrough or massive community adoption, it remains a commodity tool.
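To make the "commodity" point concrete, here is a minimal sketch of the three named features (a JSON distribution spec with a conditional dependency, bootstrap resampling, and AIC-based model selection) using only the Python standard library. The schema layout and every function name below are hypothetical and invented for illustration; they are not Pygmalion's actual schema format or API, and a real implementation would more likely lean on SciPy/NumPy.

```python
import json
import math
import random

# Hypothetical schema format, invented for this sketch (not Pygmalion's spec).
SCHEMA_JSON = """
{
  "columns": [
    {"name": "segment", "dist": "categorical",
     "values": ["basic", "premium"], "weights": [0.7, 0.3]},
    {"name": "spend", "dist": "normal", "depends_on": "segment",
     "params": {"basic": {"mean": 20.0, "std": 5.0},
                "premium": {"mean": 80.0, "std": 10.0}}}
  ]
}
"""


def sample_rows(schema, n, seed=0):
    """Draw n rows, filling columns in schema order so parents come first."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for col in schema["columns"]:
            if col["dist"] == "categorical":
                row[col["name"]] = rng.choices(col["values"], weights=col["weights"])[0]
            elif col["dist"] == "normal":
                params = col["params"]
                # Conditional dependency: parameters are keyed by the parent's value.
                if "depends_on" in col:
                    params = params[row[col["depends_on"]]]
                row[col["name"]] = rng.gauss(params["mean"], params["std"])
            else:
                raise ValueError(f"unsupported distribution: {col['dist']}")
        rows.append(row)
    return rows


def bootstrap_resample(data, seed=0):
    """Resample with replacement, keeping the original sample size."""
    rng = random.Random(seed)
    return rng.choices(data, k=len(data))


def aic_normal(data):
    """AIC = 2k - 2 ln L for a maximum-likelihood normal fit (k = 2)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    log_lik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return 2 * 2 - 2 * log_lik


def aic_exponential(data):
    """AIC for a maximum-likelihood exponential fit (k = 1); data must be positive."""
    n = len(data)
    rate = n / sum(data)
    log_lik = n * math.log(rate) - rate * sum(data)
    return 2 * 1 - 2 * log_lik


schema = json.loads(SCHEMA_JSON)
rows = sample_rows(schema, 1000)
premium_spend = [r["spend"] for r in rows if r["segment"] == "premium"]
resampled = bootstrap_resample(premium_spend)

# "Auto-fit": keep whichever candidate distribution has the lower AIC.
candidates = {"normal": aic_normal(premium_spend),
              "exponential": aic_exponential(premium_spend)}
best_fit = min(candidates, key=candidates.get)
```

All of this fits in well under a hundred lines of ordinary code, which is consistent with the assessment that explicit distribution specs alone offer little defensibility against learned generators.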
TECH STACK
INTEGRATION
library_import
READINESS