Anonpsy: A Graph-Based Framework for Structure-Preserving De-identification of Psychiatric Narratives

arXiv

Anonpsy reframes de-identification of psychiatric narrative text as graph-guided semantic rewriting, aiming to preserve clinically relevant structure while reducing identity leakage beyond explicit PHI.


Defensibility

2.0/10

citations

Platform Domination: high

Market Consolidation: medium

Displacement Horizon: 6 months

REASONING

Quantitative signals indicate essentially no open-source adoption yet: 0 stars, 2 forks, and ~0.0/hr velocity with a repo age of 1 day. That combination strongly suggests an early-release, pre-adoption status rather than an established ecosystem or user base.

From the README context (and the arXiv paper reference), the project claims a specific niche: psychiatric de-identification that models narrative structure as a graph and uses that graph to guide semantic rewriting, aiming to preserve clinically relevant “structure” while altering identity-bearing elements that standard PHI masking or generic LLM rewriting might miss. This is plausibly a novel combination (graph-guided rewriting applied specifically to psychiatric narrative structure), but the defensibility hinges on (a) code maturity, (b) empirical benchmarks, and (c) whether it becomes a de facto standard workflow.

Why the defensibility score is low (2/10):
- No adoption/moat signals: 0 stars and no measurable commit velocity.
- Likely easy to replicate at the algorithmic level for frontier labs: graph-guided semantic rewriting can be implemented with standard tooling, LLM prompting/conditioning, and graph extraction. Even if the psychiatric-specific target is narrower, the core engineering pattern (structure extraction -> constraints -> rewrite -> de-ID evaluation) is not inherently protected.
- No apparent ecosystem lock-in: with no packaging details, no API/CLI/Docker surfaces described, and no community traction, switching costs are effectively zero.

Frontier risk assessment (medium):
- Frontier labs are unlikely to market a dedicated psychiatric-only de-identification product, but they can incorporate the underlying mechanism (structure/constraint-guided rewriting for de-ID) into broader compliance or healthcare text processing systems.
- Because the problem is adjacent to widely pursued safety/compliance pipelines and synthetic rewriting, the techniques could be absorbed as an internal feature.

Threat profile rationale:

1) Platform domination risk: HIGH
- Who could displace it: OpenAI, Anthropic, and Google (or their enterprise compliance stacks) could add structured de-identification modes that combine (i) PHI detection, (ii) entity/event abstraction, and (iii) constraint-guided rewriting. They can also reproduce graph conditioning logic as part of their orchestration layers.
- Timeline: likely fast because, given the lack of code maturity and adoption, this looks like a research-to-implementation path rather than entrenched infrastructure.

2) Market consolidation risk: MEDIUM
- The de-identification/deployment tooling market tends to consolidate around a few “platform” providers (clinical NLP vendors, HIPAA-compliance solution suites, and general LLM vendors with healthcare offerings).
- However, psychiatric de-identification might retain some niche value for specialized integrators, keeping consolidation from being fully inevitable.

3) Displacement horizon: 6 months
- With a 1-day age and no velocity, any competitor with similar research capability can implement a comparable constrained rewriting system quickly, especially using mature LLM tooling.
- If the paper’s approach is validated, frontier labs or well-funded competitors could incorporate it into their compliance workflows on roughly a one-year horizon; hence 6 months is a conservative but plausible displacement window for the specific repo/tooling.

Key opportunities:
- If the project publishes strong clinical de-ID benchmarks (identity leakage metrics, utility preservation, and adversarial re-identification tests) plus an easy-to-use implementation, adoption could grow rapidly.
- A compelling differentiator would be reproducible end-to-end pipelines, standardized graph extraction/constraints, and measurable gains over baseline PHI masking and generic LLM rewriting.

Key risks:
- Unclear maturity: no evidence yet of production-quality evaluation harnesses, privacy threat modeling, or robust graph extraction for psychiatric narrative styles.
- Competitive substitutability: constrained/structured rewriting plus improved PHI/event masking is a pattern likely to be implemented across competitors.

Net: the idea may be interesting (graph-guided, structure-preserving psychiatric de-identification), but with current repo signals (0 stars, 1-day age, no velocity), there is no observable moat or ecosystem inertia; displacement by platform-level compliance features appears plausible on a short horizon.
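To make the replicability argument concrete, the generic pattern named in the reasoning (structure extraction -> constraints -> rewrite -> de-ID evaluation) can be sketched in a few dozen lines of standard-library Python. This is a toy illustration only, not the Anonpsy implementation: the `NarrativeGraph` type, the function names, the sentence-chain "graph", the string-substitution "rewrite", and the invented sample note are all assumptions made for this sketch. A real system would presumably use a trained identity detector, a richer event graph, and an LLM rewriter.

```python
import re
from dataclasses import dataclass, field


@dataclass
class NarrativeGraph:
    """Toy narrative graph: one node per sentence, edges in narrative order."""
    nodes: list = field(default_factory=list)  # sentence texts
    edges: list = field(default_factory=list)  # (src, dst, relation) triples


def extract_graph(text):
    """Step 1, structure extraction: split into sentences and chain them."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    edges = [(i, i + 1, "then") for i in range(len(sentences) - 1)]
    return NarrativeGraph(nodes=sentences, edges=edges)


def rewrite(graph, identifiers):
    """Steps 2-3, constrained rewrite: swap identity-bearing spans for generic
    ones while keeping the node count and edge list fixed (the constraint)."""
    new_nodes = []
    for sentence in graph.nodes:
        for surface, generic in identifiers.items():
            sentence = sentence.replace(surface, generic)
        new_nodes.append(sentence)
    return NarrativeGraph(nodes=new_nodes, edges=list(graph.edges))


def evaluate(original, rewritten, identifiers, clinical_terms):
    """Step 4, de-ID evaluation: crude leakage, utility, and structure checks."""
    output = " ".join(rewritten.nodes)
    leaked = [s for s in identifiers if s in output]
    kept = [t for t in clinical_terms if t in output]
    return {
        "leakage": len(leaked) / len(identifiers) if identifiers else 0.0,
        "utility": len(kept) / len(clinical_terms) if clinical_terms else 1.0,
        "structure_preserved": (
            original.edges == rewritten.edges
            and len(original.nodes) == len(rewritten.nodes)
        ),
    }


def run_pipeline(text, identifiers, clinical_terms):
    """Run all four steps and return the rewritten graph plus its report."""
    graph = extract_graph(text)
    rewritten = rewrite(graph, identifiers)
    return rewritten, evaluate(graph, rewritten, identifiers, clinical_terms)


if __name__ == "__main__":
    # Invented sample note; the identifier map stands in for a detector's output.
    note = (
        "Maria Lopez was admitted to Springfield General on 2021-03-14. "
        "She reported insomnia after losing her job. "
        "Symptoms improved after starting sertraline."
    )
    identifiers = {
        "Maria Lopez": "The patient",
        "Springfield General": "a hospital",
        "2021-03-14": "an unspecified date",
    }
    rewritten, report = run_pipeline(note, identifiers, ["insomnia", "sertraline"])
    print("\n".join(rewritten.nodes))
    print(report)
```

The substring-based leakage check only catches explicit identifiers; the reasoning above stresses leakage beyond explicit PHI, which is why adversarial re-identification tests are listed among the benchmarks the project would need to publish.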

COMPOSABILITY

TECH STACK

unknown (paper-only; not provided in prompt)
likely Python
likely NLP/graph tooling (e.g., spaCy/NLTK + graph library), but not verifiable from supplied data

INTEGRATION

reference_implementation

structure_preserving_rewriting
graph_guided_deidentification
psychiatric_narrative_semantic_control
phi_leakage_mitigation

READINESS

Composability: framework

Depth: prototype

Novelty: novel_combination