EduardoMVA/pygmalion

Synthetic tabular data generation library using statistical distributions and JSON-based schema definitions.

Defensibility

2.0/10

Stars: 0

Platform Domination: high

Market Consolidation: high

Displacement Horizon: 6 months

REASONING

Pygmalion is a nascent project (1 day old, 0 stars) that provides a programmatic wrapper for generating synthetic tabular data from explicit statistical distributions. The feature set (bootstrap resampling, auto-fit with AIC, conditional dependencies) is solid for a utility library, but it enters a crowded and mature market. Established tools such as SDV (Synthetic Data Vault), Gretel.ai, and YData-synthetic offer significantly more advanced capabilities, including GAN-based and LLM-based generation, which handle complex correlations better than explicit JSON distribution specs. The project currently has no moat: its functionality is a clean implementation of standard SciPy/NumPy patterns. Competitively, platform providers (AWS, Azure, Google) are increasingly integrating data synthesis into their ML pipelines (e.g., SageMaker Data Wrangler), which leaves standalone statistical generators vulnerable. Without a unique algorithmic breakthrough or massive community adoption, it remains a commodity tool.
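To make the "auto-fit with AIC" feature concrete, here is a minimal sketch of how AIC-based distribution selection can work in general. It is not Pygmalion's code or API, which this summary does not show. The function names (`normal_loglik`, `expon_loglik`, `select_by_aic`) are hypothetical, and the sketch uses closed-form maximum-likelihood estimates from the standard library rather than SciPy so that it stays dependency-free.

```python
import math
import random


def normal_loglik(xs):
    """Max log-likelihood of a normal fit and its parameter count (hypothetical helper)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # MLE variance divides by n
    if var <= 0:
        return float("-inf"), 2
    return -0.5 * n * (math.log(2 * math.pi * var) + 1), 2


def expon_loglik(xs):
    """Max log-likelihood of an exponential fit and its parameter count (hypothetical helper)."""
    n = len(xs)
    mean = sum(xs) / n
    if min(xs) < 0 or mean <= 0:
        # The exponential distribution has no support for negative values.
        return float("-inf"), 1
    rate = 1.0 / mean  # MLE of the rate parameter
    return n * math.log(rate) - rate * sum(xs), 1


def select_by_aic(xs):
    """Return (best_name, scores), where a lower AIC = 2k - 2*loglik is better."""
    candidates = {"norm": normal_loglik, "expon": expon_loglik}
    scores = {}
    for name, fit in candidates.items():
        loglik, k = fit(xs)
        scores[name] = 2 * k - 2 * loglik
    best = min(scores, key=scores.get)
    return best, scores


# Illustrative data only: a right-skewed column drawn from an exponential.
rng = random.Random(0)
column = [rng.expovariate(1.0) for _ in range(500)]
best, scores = select_by_aic(column)
print(best, scores)
```

A library in this space would typically compare many more candidate families, for instance by fitting each with `scipy.stats`, but the selection rule is the same: fit each candidate, penalize its parameter count, and keep the lowest score.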

COMPOSABILITY

TECH STACK

Python, NumPy, SciPy, Pandas, JSON

INTEGRATION

library_import

synthetic_data_generation, statistical_modeling, data_augmentation, tabular_data

READINESS

Composability: component

Depth: beta

Novelty: reimplementation