ACMG Variant Classification Assistant

@Bilmem2 Bilmem2 released this 03 Dec 17:51
· 8 commits to main since this release
b0263b3

ACMG Assistant v4.0.0 – Release Notes

Release Date: December 2025


Highlights

  • Multi-source predictor pipeline: Predictor scores (REVEL, CADD, AlphaMissense, SIFT, PolyPhen-2, etc.) are now fetched from external APIs (myvariant.info/dbNSFP, dedicated AlphaMissense/CADD endpoints) via PredictorAPIClient. No predictor values are hardcoded.
  • Population AF from gnomAD GraphQL: PopulationAPIClient queries gnomAD v4 directly; BA1/BS1/PM2 derived from live data, not static tables.
  • Validated caching layer: New ResultCache with TTL-based expiration and strict validation. Invalid entries (e.g., REVEL > 1.0, AF > 1.0) are rejected and auto-invalidated.
  • PM1 via external APIs: Hardcoded hotspot/domain regions removed. DomainAPIClient queries CancerHotspots.org and UniProt for functional annotations.
  • PhenotypeMatcher wired to CLI: PP4/BP5 now derived from weighted Jaccard similarity over gene–phenotype associations and HPO synonyms. Users can input HPO IDs or free-text phenotypes interactively.
  • InteractiveEvidenceCollector: Structured prompts for PS3/BS3, PS4, PP1/BS4, PS1/PM5, and deprecated PP5/BP6. Responses encoded as ManualEvidence and merged with automatic evidence.
  • CLI transparency: Revised welcome banner explains automatic vs. interactive evidence, data sources, and research-only disclaimer.
  • ~198 tests passing: Comprehensive coverage of caching, API clients, PM1, phenotype matching, interactive evidence, and classification logic.

Detailed Changes

Multi-source Predictors & PP3/BP4

PredictorAPIClient fetches scores from multiple backends with configurable priority (PREDICTOR_SOURCE_PRIORITY). Each score is wrapped in a PredictorScore dataclass that tracks source, version, and whether the predictor is inverted (e.g., SIFT: lower = more damaging).

  • Primary source: myvariant.info (aggregates dbNSFP 4.x).
  • Fallback sources: AlphaMissense API, CADD API.
  • MissenseEvaluator is now a pure interpreter: it consumes pre-fetched PredictorScore objects, computes a weighted composite score, and maps to PP3/BP4 based on thresholds.
  • Weight renormalization handles partial data (e.g., if only REVEL and CADD are available).
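The renormalization step can be sketched as follows. This is a minimal illustration, not the project's actual code: the `PredictorScore` fields, the weight values, and the function name `composite_score` are assumptions, and scores are assumed to be normalized to [0, 1] already.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PredictorScore:
    """Hypothetical stand-in for the PredictorScore dataclass."""
    name: str
    value: float            # assumed already normalized to [0, 1]
    source: str
    inverted: bool = False  # True when lower values mean more damaging


# Illustrative weights only; the real weights are not given in these notes.
WEIGHTS: Dict[str, float] = {
    "REVEL": 0.4,
    "CADD": 0.3,
    "AlphaMissense": 0.2,
    "SIFT": 0.1,
}


def composite_score(scores: List[PredictorScore]) -> Optional[float]:
    """Weighted mean over the predictors that are actually present."""
    weighted_sum = 0.0
    total_weight = 0.0
    for score in scores:
        weight = WEIGHTS.get(score.name)
        if weight is None:
            continue  # unknown predictor: ignore rather than guess a weight
        # Flip inverted predictors so that higher always means more damaging.
        damaging = 1.0 - score.value if score.inverted else score.value
        weighted_sum += weight * damaging
        total_weight += weight
    if total_weight == 0.0:
        return None  # no usable data, so no evidence is derived
    # Dividing by the weight actually used renormalizes partial data.
    return weighted_sum / total_weight
```

With only REVEL and CADD available, the two weights (0.4 and 0.3 here) are rescaled to sum to 1, so a missing predictor does not drag the composite toward zero.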

Population AF & BA1/BS1/PM2

PopulationAPIClient queries gnomAD v4 via GraphQL. Responses are parsed into a PopulationStats dataclass (AF, AC, AN, homozygote count, subpopulation breakdown, popmax).

  • PopulationAnalyzer and EthnicityAwarePopulationAnalyzer consume typed population data.
  • BA1 (stand-alone benign), BS1 (strong benign), PM2 (absent from controls) derived from live AF values.
  • No static frequency tables remain in the codebase.
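One way the frequency-based criteria could be derived from typed population data is sketched below. The cutoffs are placeholders (0.05 for BA1 follows the commonly cited ACMG 2015 value; the BS1 and PM2 cutoffs are assumptions, since real thresholds are disease- and gene-specific), and this `PopulationStats` is a reduced, hypothetical version of the dataclass described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PopulationStats:
    """Reduced, hypothetical version of the PopulationStats dataclass."""
    ac: int                            # allele count
    an: int                            # allele number
    popmax_af: Optional[float] = None  # highest subpopulation AF, if known

    @property
    def af(self) -> Optional[float]:
        return self.ac / self.an if self.an > 0 else None


# Placeholder cutoffs for illustration only.
BA1_AF = 0.05
BS1_AF = 0.01
PM2_AF = 0.0001


def population_criterion(stats: Optional[PopulationStats]) -> Optional[str]:
    """Return BA1, BS1, PM2, or None when no criterion applies."""
    if stats is None:
        return None  # nothing retrieved: assign no evidence
    af = stats.popmax_af if stats.popmax_af is not None else stats.af
    if af is None:
        return None
    if af > BA1_AF:
        return "BA1"
    if af > BS1_AF:
        return "BS1"
    if af < PM2_AF:
        return "PM2"
    return None
```

Returning `None` when no data was retrieved keeps a failed lookup distinct from a variant that was looked up and found to be rare.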

Validated Caching Layer

ResultCache (in cache.py) provides file-based, thread-safe caching with:

  • CacheKey: category + source + normalized variant ID.
  • CacheEntry: value + timestamp + TTL.
  • Default TTLs: 7 days (predictors), 30 days (population).

Validation rules enforced on cache read:

| Data Type  | Validation |
|------------|------------|
| REVEL      | ∈ [0, 1]   |
| CADD_PHRED | ∈ [0, 60]  |
| AF         | ∈ [0, 1]   |
| AC         | ≤ AN       |

Invalid entries → cache miss → re-fetch from API.
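The read path described above (expire by TTL, validate ranges, treat failures as a miss) might look roughly like this. It is a simplified in-memory sketch: the real ResultCache is file-based and thread-safe, and the field names, the `read` function, and the dict-backed store are assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Range rules taken from the validation table; key names are assumed.
RANGES = {"REVEL": (0.0, 1.0), "CADD_PHRED": (0.0, 60.0), "AF": (0.0, 1.0)}


@dataclass
class CacheEntry:
    value: Dict[str, Any]
    timestamp: float  # seconds since epoch when the entry was stored
    ttl: float        # lifetime in seconds


def is_valid(value: Dict[str, Any]) -> bool:
    """Check every known field against its allowed range."""
    for name, (low, high) in RANGES.items():
        if name in value and not (low <= value[name] <= high):
            return False
    if "AC" in value and "AN" in value and value["AC"] > value["AN"]:
        return False
    return True


def read(store: Dict[str, CacheEntry], key: str,
         now: Optional[float] = None) -> Optional[Dict[str, Any]]:
    """Return the cached value, or None (a miss) if absent, expired, or invalid."""
    entry = store.get(key)
    if entry is None:
        return None
    now = time.time() if now is None else now
    expired = now - entry.timestamp > entry.ttl
    if expired or not is_valid(entry.value):
        del store[key]  # auto-invalidate so the caller re-fetches from the API
        return None
    return entry.value
```

Validating on read rather than only on write means entries written by an older version are also caught.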

PM1 via External Hotspots/Domains

DomainAPIClient queries:

  • CancerHotspots.org: Returns tumor count and mutation frequency at a given position.
  • UniProt REST API: Returns functional domains (DNA-binding, kinase, etc.) overlapping the variant position.

GeneSpecificRules.evaluate_pm1() maps annotation confidence to PM1 or PM1_supporting via thresholds. No hardcoded KNOWN_HOTSPOTS or hotspot_regions remain.
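As a rough illustration of the threshold mapping, the sketch below takes an annotation confidence in [0, 1] and returns a PM1 strength. The function name `map_pm1`, the confidence scale, and both default cutoffs are assumptions; the notes do not state the values used by evaluate_pm1().

```python
from typing import Optional


def map_pm1(confidence: Optional[float],
            moderate_at: float = 0.8,
            supporting_at: float = 0.5) -> Optional[str]:
    """Map an annotation confidence to PM1, PM1_supporting, or None.

    The cutoffs are illustrative defaults, not the tool's real values.
    """
    if confidence is None:
        return None  # APIs unavailable: PM1 is not assigned, no fallback list
    if confidence >= moderate_at:
        return "PM1"
    if confidence >= supporting_at:
        return "PM1_supporting"
    return None
```

Passing `None` models the case where neither CancerHotspots.org nor UniProt could be reached.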

PhenotypeMatcher & PP4/BP5

PhenotypeMatcher is now fully integrated into the interactive CLI:

  1. User provides phenotypes (HPO IDs or free-text).
  2. HPOClient normalizes text to HPO terms via hpo_synonyms.json.
  3. Weighted Jaccard similarity computed against gene-specific phenotype profiles (gene_phenotypes.json).
  4. Evidence assignment:
    • PP4: similarity ≥ 0.8
    • PP4_supporting: 0.5 ≤ similarity < 0.8
    • BP5: similarity ≤ 0.2
    • Neutral (no evidence assigned): 0.2 < similarity < 0.5
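Steps 3 and 4 can be sketched as below. The similarity cutoffs come from the list above; everything else is an assumption for illustration, including the function names, the shape of the gene profile (HPO ID mapped to weight), and the default weight of 1.0 for patient terms missing from the profile.

```python
from typing import Dict, Optional, Set


def weighted_jaccard(patient: Set[str], gene_profile: Dict[str, float],
                     default_weight: float = 1.0) -> float:
    """Weight of shared terms divided by weight of all terms in the union."""
    union = patient | set(gene_profile)
    if not union:
        return 0.0
    shared = patient & set(gene_profile)
    shared_weight = sum(gene_profile[t] for t in shared)
    # Terms seen only in the patient get an assumed default weight.
    union_weight = sum(gene_profile.get(t, default_weight) for t in union)
    return shared_weight / union_weight


def phenotype_evidence(patient: Set[str],
                       gene_profile: Dict[str, float]) -> Optional[str]:
    """Map similarity to PP4, PP4_supporting, BP5, or None (neutral)."""
    if not patient:
        return None  # no phenotypes provided: PP4/BP5 are not evaluated
    similarity = weighted_jaccard(patient, gene_profile)
    if similarity >= 0.8:
        return "PP4"
    if similarity >= 0.5:
        return "PP4_supporting"
    if similarity <= 0.2:
        return "BP5"
    return None
```

For a patient with two of three profile terms, where the profile weights are 2.0, 1.0, and 1.0 and the unmatched term has weight 1.0, the similarity is 3.0 / 4.0 = 0.75, which maps to PP4_supporting.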

The old phenotype_match yes/no flag has been removed from the interactive flow.

Interactive Evidence (PS3/PS4/PP1/PS1/PP5)

InteractiveEvidenceCollector provides structured prompts for criteria that require literature review:

| Criterion | User Input                                  |
|-----------|---------------------------------------------|
| PS3/BS3   | Functional study count, quality, direction  |
| PS4       | Case-control counts → OR computed           |
| PP1/BS4   | Segregation pattern, affected carriers      |
| PS1/PM5   | Same codon pathogenic variant presence      |
| PP5/BP6   | Reputable source assertion (deprecated)     |

Responses are encoded as ManualEvidence and merged into the final evidence set. PP5/BP6 are labeled deprecated per ClinGen guidance.
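For PS4, the odds ratio (OR) computed from case-control counts can be illustrated with a standard 2×2 calculation. This is a generic sketch: the function name and the 0.5 continuity correction for zero cells (Haldane-Anscombe) are assumptions, and the notes do not say how the tool handles zero counts or which OR cutoff triggers PS4.

```python
def odds_ratio(case_carriers: int, case_total: int,
               control_carriers: int, control_total: int) -> float:
    """Odds ratio for carrying the variant in cases versus controls."""
    if not (0 <= case_carriers <= case_total
            and 0 <= control_carriers <= control_total):
        raise ValueError("carrier counts must lie between 0 and the group total")
    a = float(case_carriers)                     # cases with the variant
    b = float(case_total - case_carriers)        # cases without it
    c = float(control_carriers)                  # controls with the variant
    d = float(control_total - control_carriers)  # controls without it
    if 0.0 in (a, b, c, d):
        # Assumed handling: add 0.5 to every cell to avoid division by zero.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)
```

For example, 20 carriers among 100 cases and 5 carriers among 100 controls gives (20 × 95) / (80 × 5) = 4.75.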

CLI & Documentation

  • Welcome banner displays version, guideline set (2015 or 2023), mode, and date.
  • Explicit explanation of automatic evidence (API-derived) vs. interactive evidence (user-provided).
  • Research/educational disclaimer shown at startup.
  • README rewritten to document:
    • Data flow architecture (ASCII diagram).
    • Caching model.
    • Evidence types.
    • Limitations.

Upgrade Notes

  1. PP4/BP5 derivation changed: Previously, users could set a manual phenotype_match flag. This is replaced by the PhenotypeMatcher pipeline. If no phenotypes are provided, PP4/BP5 will not be evaluated.

  2. PM1 requires API availability: Without access to CancerHotspots.org or UniProt, PM1 will not be assigned. The tool does not fall back to hardcoded hotspot lists.

  3. Stricter cache validation: Cached entries with out-of-range values (e.g., REVEL = 5.0) will be rejected. If you have old cache files, they may be auto-invalidated on first access.

  4. Evidence may not trigger if data is missing: By design, the tool assigns evidence only when factual data is available from external sources or user input, because absence of data ≠ absence of pathogenicity.


Breaking Changes / Incompatibilities

| Change                                      | Impact                                                                               |
|---------------------------------------------|--------------------------------------------------------------------------------------|
| phenotype_match flag removed                | Replace with HPO ID / free-text phenotype input via collect_patient_phenotypes().   |
| Hardcoded PM1 regions removed               | PM1 now requires live API access to CancerHotspots.org / UniProt.                   |
| VariantData.patient_phenotypes type changed | Now Optional[List[str]] instead of Optional[str].                                   |
| Old cache format                            | Legacy api_cache.json entries may be invalidated if they fail validation.           |

The public CLI (command-line flags and interactive prompts) remains backward-compatible apart from the removed phenotype_match prompt. Internal behavior is stricter.
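Code that still builds VariantData with a single phenotype string could adapt to the new Optional[List[str]] type with a small helper like the one below. The helper name and the assumption that legacy strings separate terms with commas or semicolons are hypothetical; adjust the splitting to match how your data was actually stored.

```python
import re
from typing import List, Optional


def migrate_patient_phenotypes(legacy: Optional[str]) -> Optional[List[str]]:
    """Convert a legacy phenotype string into a list of terms.

    Assumes terms were separated by commas or semicolons.
    """
    if legacy is None:
        return None
    parts = [part.strip() for part in re.split(r"[;,]", legacy)]
    terms = [part for part in parts if part]  # drop empty fragments
    return terms or None  # an empty string becomes "no phenotypes provided"
```

Mapping an empty string to `None` keeps it consistent with the rule that PP4/BP5 are not evaluated when no phenotypes are provided.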


Known Limitations

  • API dependency: Live classification requires internet access to gnomAD, myvariant.info, CancerHotspots.org, and UniProt. Offline mode relies on cached data only.
  • Phenotype similarity is approximate: The weighted Jaccard approach is educational, not a production-grade semantic HPO engine.
  • Missense composite score is heuristic: The weighted average of normalized predictor scores is a reasonable proxy but not a validated clinical metric.
  • Not a clinical decision tool: All results require validation by qualified professionals. The tool is intended for research, education, and workflow augmentation.

Testing

  • ~198 tests passing across:
    • test_cache_and_validation.py: Cache TTL, validation, corruption recovery.
    • test_predictor_population_api.py: Multi-source API clients with mocked responses.
    • test_gene_specific_pm1.py: PM1 evaluation via mocked DomainAPIClient.
    • test_acmg_classifier.py: End-to-end classification scenarios (pathogenic, benign, VUS, conflicting).
    • test_interactive_evidence.py: ManualEvidence mapping and merging.
    • tests/conftest.py: Fixtures for offline testing with mocked APIClient.

No regressions detected in prior classification scenarios.