gnina/gnina

Deep learning–augmented molecular docking framework (gnina) for pose prediction/scoring of small molecules bound to proteins.

by gnina

Published Nov 4, 2015

Utility: 7.0/10

Stars: 928

Forks: 197

Platform Domination: high

Market Consolidation: medium

Displacement Horizon: 1-2 years

REASONING

Quantitative signals suggest meaningful adoption but not de facto standard lock-in. With ~928 stars and ~197 forks over ~3863 days, gnina has sustained interest across years (a classic indicator of a research tool that turned into a widely cited baseline). The fork count indicates active derivative usage by other groups rather than passive read-only popularity. However, the reported velocity (~0.104/hr ≈ 2.5/day, aggregated) is moderate rather than explosive, which suggests steady maintenance and ongoing use but not a rapid growth phase that would imply strong emergent network effects.

Defensibility (7/10): gnina’s defensibility comes from practical engineering + learned docking heuristics that many labs incorporate into pipelines (data/model familiarity, parameter conventions, preprocessing quirks, and GPU-accelerated throughput). In this niche, switching costs can arise because docking workflows are brittle: preprocessing (atom types, protonation, grid/box generation), feature computation, batching, and evaluation protocols all matter. That creates a usability moat that is larger than that of typical academic prototypes.

Moat limits: the core idea (deep learning–based rescoring/pose prediction for docking) is not singularly novel, so there is no algorithmic “category definition” moat. Docking frameworks are also naturally replaceable at the integration layer because they often interoperate with standard preprocessing and binding-site definitions. Therefore, while gnina is fairly robust and widely used, it is not clearly irreplaceable in the way a dataset/model with exclusive access or a proprietary benchmark would be.

Novelty assessment (incremental): gnina is best viewed as a significant improvement on established docking/scoring workflows (traditional docking + ML rescoring/pose evaluation), but it does not appear to introduce a fundamentally unprecedented paradigm with no direct precedent. This reduces the theoretical moat.

Frontier risk (medium): Frontier labs could incorporate “deep docking” as part of broader drug-discovery platforms, but gnina’s specific implementation and docking workflow specialization make it less likely that frontier model labs build or maintain gnina as a standalone product. More likely they would: (1) ship an internal docking/scoring module, (2) integrate docking-like scoring into their multimodal protein–ligand stacks, or (3) provide an adjacent feature (structure-based scoring) rather than adopt gnina unchanged. Hence medium, not low.

Three-axis threat profile:

- Platform domination risk: HIGH. Major platforms (Google DeepMind, OpenAI, AWS Marketplace ecosystems, Microsoft/NDI partners, plus big biotech software vendors) could absorb the capability by embedding a learned docking/scoring component into their existing protein/chemistry tooling. The technical complexity (GPU inference + standard geometry features) is well within platform capability. If a frontier platform product already exists for protein–ligand modeling, adding docking-like rescoring is an “adjacent build,” not a research moonshot. That makes gnina highly susceptible.
- Market consolidation risk: MEDIUM. The docking/scoring market tends to consolidate around a few ecosystems (commercial suites, a small number of widely used open tools, and increasingly integrated ML modules). But complete consolidation is less certain because researchers have different ligand/protein types, benchmarking preferences, and regulatory/compliance constraints, and multiple docking engines remain useful. Thus not low, but also not inevitably high.
- Displacement horizon: 1-2 years. If frontier labs (or dominant ML drug discovery vendors) ship higher-accuracy end-to-end protein–ligand modeling with docking-ready outputs, gnina’s role could shift from “baseline docking ML rescoring” to “one option among many.” Given the current pace of ML in structure-based discovery, a 1–2 year window for meaningful displacement of *default* usage is plausible, even if gnina remains in legacy pipelines.

Key opportunities:

1) Pipeline entrenchment: Many groups will keep gnina because updating docking workflows is costly. If gnina continues to support reproducibility and GPU performance, it retains long-tail defensibility.
2) Benchmark leverage: If gnina remains competitive on widely used docking benchmarks and continues to attract new adopters via citations, it can maintain relevance.
3) Integration as a component: Even if end-to-end models emerge, gnina can persist as a fast rescoring/post-processing step where stability and determinism matter.

Key risks:

1) Platform embedding: A dominant platform’s integrated docking/scoring module could render gnina primarily a reference tool.
2) Algorithmic convergence: End-to-end structure/complex prediction models may outperform docking-rescoring hybrids, reducing gnina’s relative advantage.
3) Maintenance and reproducibility risk: If CUDA/model dependencies lag behind platform updates, usability switching costs can invert (people move away).

Overall rationale for score 7: strong practical adoption signals (stars/forks over many years) + engineering utility (GPU-accelerated deep docking workflows) create meaningful switching costs, but the absence of a unique algorithmic moat and the high likelihood of platform-level integration keep frontier obsolescence risk at medium, with displacement possible on a 1–2 year horizon.

COMPOSABILITY

TECH STACK

C++
CUDA
Python
PyTorch (implied by common gnina ecosystem usage)
OpenBabel (commonly used in molecule preprocessing; implied integration surface)
cuDNN (via CUDA workflows; implied)
Linux/Unix toolchain

INTEGRATION

reference_implementation

deep_learning_docking, pose_scoring, ligand_protein_binding_prediction, gpu_acceleration

READINESS

Composability

PATTERNS

The reusable building blocks distilled from this project — each a mechanism you could lift into your own.

CNN-guided pose refinement

other · transform

(List<LigandPose>, Receptor) -> List<OptimizedLigandPose>

Optimize and score candidate ligand-receptor docking poses using a 3D convolutional neural network scoring function.
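
The signature above can be illustrated with a minimal Python sketch. It is not gnina's implementation: the scorer is a toy contact-counting placeholder standing in for the 3D CNN, the "optimization" is a simple rigid-translation hill climb, and every name here (`Pose`, `toy_score`, `refine_and_rank`) is hypothetical. It only shows the shape of the pattern: poses and a receptor go in, locally improved and ranked poses come out.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Scorer = Callable[[Sequence[Coord], Sequence[Coord]], float]


@dataclass(frozen=True)
class Pose:
    """Hypothetical container: ligand atom positions plus a score."""
    coords: Tuple[Coord, ...]
    score: float = 0.0


def toy_score(ligand: Sequence[Coord], receptor: Sequence[Coord]) -> float:
    """Placeholder for a learned scorer (higher is better in this sketch).

    Rewards ligand-receptor atom pairs 2-4 Å apart, penalizes clashes < 2 Å.
    """
    total = 0.0
    for l in ligand:
        for r in receptor:
            d = math.dist(l, r)
            if d < 2.0:
                total -= 5.0
            elif d <= 4.0:
                total += 1.0
    return total


def translate(coords: Sequence[Coord], delta: Coord) -> Tuple[Coord, ...]:
    """Rigidly shift every atom by delta."""
    dx, dy, dz = delta
    return tuple((x + dx, y + dy, z + dz) for x, y, z in coords)


def refine_and_rank(poses: Sequence[Pose], receptor: Sequence[Coord],
                    scorer: Scorer = toy_score, step: float = 0.5,
                    max_iters: int = 20) -> List[Pose]:
    """Hill-climb each pose over axis-aligned translations, then sort by score."""
    refined = []
    for pose in poses:
        coords, best = pose.coords, scorer(pose.coords, receptor)
        for _ in range(max_iters):
            improved = False
            for axis in range(3):
                for sign in (1.0, -1.0):
                    delta = [0.0, 0.0, 0.0]
                    delta[axis] = sign * step
                    candidate = translate(coords, tuple(delta))
                    s = scorer(candidate, receptor)
                    if s > best:  # accept only strict improvements
                        coords, best, improved = candidate, s, True
            if not improved:
                break
        refined.append(Pose(coords, best))
    return sorted(refined, key=lambda p: p.score, reverse=True)
```

Passing the scorer as a callable keeps the refinement loop independent of how scores are produced, so a real model-backed scorer could be swapped in without changing the ranking logic.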

distance-based flexible residue selection

other · transform

(Receptor, ReferenceLigand, DistanceThreshold) -> List<FlexibleResidue>
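
A minimal sketch of this signature, assuming a receptor is just a list of atoms tagged with residue identity and the reference ligand is a list of coordinates. The `Atom` type and `select_flexible_residues` function are hypothetical names for illustration. gnina's actual selection may apply further rules (for example, which atoms count or which residue types are excluded) that this sketch ignores.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


@dataclass(frozen=True)
class Atom:
    """Hypothetical receptor atom record."""
    chain: str
    resnum: int
    resname: str
    xyz: Coord


def select_flexible_residues(receptor_atoms: Sequence[Atom],
                             ligand_xyz: Sequence[Coord],
                             threshold: float) -> List[Tuple[str, int, str]]:
    """Return residues with at least one atom within `threshold` Å of any ligand atom."""
    selected = set()
    for atom in receptor_atoms:
        if any(math.dist(atom.xyz, l) <= threshold for l in ligand_xyz):
            selected.add((atom.chain, atom.resnum, atom.resname))
    # Sort for a deterministic, reproducible residue list.
    return sorted(selected)
```

For example, with a ligand atom at `(0, 0, 3)`, a residue atom at the origin is selected at a 3.5 Å threshold, while one at `(3, 0, 0)` (about 4.24 Å away) is picked up only once the threshold is raised past that distance.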

reference-ligand autoboxing

SMARTS-targeted covalent matching
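
The reference-ligand autoboxing pattern named above can be sketched as follows, assuming it means deriving the docking search box from the bounding box of a known ligand's atoms plus a padding margin. The `autobox` function and the 4 Å padding value are illustrative assumptions, not taken from the project's source.

```python
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def autobox(ligand_xyz: Sequence[Coord],
            padding: float = 4.0) -> Tuple[Coord, Coord]:
    """Return (center, size) of an axis-aligned box around the ligand.

    The box is the ligand's bounding box grown by `padding` Å on every side.
    """
    if not ligand_xyz:
        raise ValueError("reference ligand has no atoms")
    lows = [min(c[i] for c in ligand_xyz) - padding for i in range(3)]
    highs = [max(c[i] for c in ligand_xyz) + padding for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(lows, highs))
    size = tuple(hi - lo for lo, hi in zip(lows, highs))
    return center, size
```

A ligand spanning `(0, 0, 0)` to `(2, 4, 6)` with 4 Å padding yields a box centered at `(1, 2, 3)` with edge lengths `(10, 12, 14)`.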