monarch-initiative
diff --git a/.claude/skills/create-definitions-from-ohdsi/SKILL.md b/.claude/skills/create-definitions-from-ohdsi/SKILL.md
Lines changed: 41 additions & 0 deletions
diff --git a/.claude/skills/create-definitions-from-ohdsi/references/model-mapping.md b/.claude/skills/create-definitions-from-ohdsi/references/model-mapping.md
Lines changed: 49 additions & 0 deletions
diff --git a/.claude/skills/create-definitions-from-ohdsi/scripts/ohdsi_cohort_to_definition.py b/.claude/skills/create-definitions-from-ohdsi/scripts/ohdsi_cohort_to_definition.py
Lines changed: 147 additions & 0 deletions
diff --git a/.claude/skills/disease-classification/SKILL.md b/.claude/skills/disease-classification/SKILL.md
Lines changed: 9 additions & 0 deletions
diff --git a/.claude/skills/disease-trajectories/SKILL.md b/.claude/skills/disease-trajectories/SKILL.md
Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,41 @@
+---
+name: create-definitions-from-ohdsi
+description: Generate dismech definitions from OHDSI/ATLAS cohort definitions or other computable phenotype logic. Use when converting OMOP cohort JSON, drafting PheKB-style phenotype algorithms, or mapping FHIR/CQL/OMOP rules into dismech `definitions` blocks.
+---
+
+# Create Definitions From OHDSI
+
+Use this skill to convert OHDSI/ATLAS cohort definitions into dismech `definitions` blocks and to map FHIR/CQL logic into the same structure.
+
+## Quick start
+
+1. Export an ATLAS/WebAPI cohort definition JSON.
+2. Generate a YAML fragment:
+
+```bash
+uv run python .claude/skills/create-definitions-from-ohdsi/scripts/ohdsi_cohort_to_definition.py /path/to/cohort.json --wrap
+```
+
+3. Paste the fragment into the target disorder file under `definitions`.
+4. Normalize to dismech norms (add evidence snippets, scope, criteria set names, and any available term objects).
+5. Validate:
+
+```bash
+just validate kb/disorders/<Disease>.yaml
+```
+
+## Workflow guardrails
+
+- Keep logic concise: express cohort entry, inclusion rules, and exit criteria in `criteria_sets`.
+- Use `minimum_required` for numeric logic; put temporal logic in `description`.
+- Add evidence snippets from abstracts when the algorithm is derived from a publication.
+- Only add `term` objects when the CURIE is in a configured prefix (ICD10CM, NCIT, HP, etc.).
+
+## References
+
+- Mapping guide: `references/model-mapping.md` (FHIR/OHDSI/CQL to dismech)
+
+## Scripts
+
+- `scripts/ohdsi_cohort_to_definition.py`: Convert ATLAS/WebAPI cohort JSON to a dismech definition fragment.
+  - Use `--wrap` to emit a top-level `definitions` key.
@@ -0,0 +1,49 @@
+# Model Mapping Guide (FHIR / OHDSI / CQL -> dismech)
+
+Use this guide to translate computable phenotype logic into dismech `definitions`.
+
+## dismech target structure
+
+- `definitions[]`: Top-level phenotype definition or diagnostic criteria set.
+- `criteria_sets[]`: Named sub-blocks (primary criteria, inclusion rules, confirmation).
+- `inclusion_criteria[]`, `exclusion_criteria[]`, `core_clinical_characteristics[]`,
+  `imaging_requirements[]`, `laboratory_requirements[]`, `additional_requirements[]`.
+- `minimum_required`: Use for count thresholds (>=2 events, >=1 lab, etc.).
+- `notes`: Use for cohort exit and temporal logic if no dedicated slot.
+
+## OHDSI / OMOP cohort definition
+
+- **ConceptSet** -> `CriteriaItem` under `inclusion_criteria`.
+  - `preferred_term`: "Concept set: <name>".
+  - `description`: concept count, domain, and any constraints.
+- **PrimaryCriteria** (entry events) -> `criteria_sets` entry named "Primary criteria".
+- **InclusionRule** -> separate `criteria_sets` entry (one per rule).
+- **CorrelatedCriteria** -> `additional_requirements` or `imaging_requirements`.
+- **Cohort exit** -> `notes` or a `CriteriaItem` in `additional_requirements`.
+- **Temporal windows** (e.g., 31-365 days) -> `description` on the relevant `CriteriaItem`.
+
+Concept IDs:
+- If a concept maps to a configured prefix (ICD10CM, NCIT, HP), add a `term` object.
+- Otherwise keep the code in `description` and the label in `preferred_term`.
+
+## FHIR phenotype logic
+
+- **Condition** -> `inclusion_criteria` item.
+  - Use `term` if ICD-10-CM or other configured prefix.
+- **Observation** (lab/serology) -> `laboratory_requirements`.
+- **Procedure / Imaging** -> `imaging_requirements` or `additional_requirements`.
+- **MedicationRequest / MedicationAdministration** -> `additional_requirements`.
+- **Encounter** (setting) -> `scope` or `notes` on the definition.
+- **ValueSet** -> treat as a concept set (same mapping as OHDSI).
+
+## CQL phenotype logic
+
+- **define** statements -> `criteria_sets` or `CriteriaItem` entries.
+- **exists / count** -> `minimum_required` or describe in the criterion.
+- **temporal operators** (during, overlaps, before/after) -> describe in the criterion.
+- **negation** -> `exclusion_criteria`.
+
+## Notes
+
+- Keep the algorithm in plain language; do not copy vendor-specific syntax.
+- Evidence: attach at least one EvidenceItem if the algorithm is drawn from a paper.
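The count-threshold and temporal-window rules in the mapping guide can be sketched in Python. This is a minimal illustration, not part of the bundled script: the function names are hypothetical, the input shape (`Occurrence.Type`, `Occurrence.Count`, `StartWindow.Start/End` with `Days` and `Coeff`) and the occurrence-type numbering are assumptions about how a Circe-style cohort expression is laid out, and placing `minimum_required` on the criteria set is also an assumption about the dismech schema.

```python
from __future__ import annotations

from typing import Any

# Assumed numbering of occurrence types in the cohort JSON.
OCCURRENCE_LABELS = {0: "exactly", 1: "at most", 2: "at least"}


def describe_window(window: dict[str, Any]) -> str:
    """Render a StartWindow-style dict as plain language."""
    if not window:
        return "at any time"

    def endpoint(point: dict[str, Any]) -> str:
        days = point.get("Days")
        # Assumption: a negative Coeff means "before index", otherwise "after".
        side = "before" if point.get("Coeff", 1) < 0 else "after"
        amount = "all days" if days is None else f"{days} days"
        return f"{amount} {side}"

    start = endpoint(window.get("Start", {}))
    end = endpoint(window.get("End", {}))
    return f"between {start} and {end} the index date"


def correlated_to_criteria_set(name: str, correlated: dict[str, Any]) -> dict[str, Any]:
    """Build a criteria-set dict: count goes to minimum_required, timing to description."""
    occurrence = correlated.get("Occurrence", {})
    count = occurrence.get("Count")
    label = OCCURRENCE_LABELS.get(occurrence.get("Type"), "unspecified")
    window_text = describe_window(correlated.get("StartWindow", {}))

    criteria_set: dict[str, Any] = {
        "name": name,
        "description": f"Requires {label} {count} event(s) {window_text}.",
    }
    # Only an "at least" threshold translates directly into minimum_required.
    if occurrence.get("Type") == 2 and count is not None:
        criteria_set["minimum_required"] = count
    return criteria_set


# Hypothetical example input: at least 2 events in the year before index.
example = {
    "Occurrence": {"Type": 2, "Count": 2},
    "StartWindow": {
        "Start": {"Days": 365, "Coeff": -1},
        "End": {"Days": 0, "Coeff": 1},
    },
}
result = correlated_to_criteria_set("Two prior diagnoses", example)
```

Keeping the window in `description` rather than a structured field follows the guide's rule that temporal logic is described in plain language.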
@@ -0,0 +1,147 @@
+#!/usr/bin/env python3
+"""
+Convert an OHDSI/ATLAS cohort definition JSON into a dismech definitions YAML fragment.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+from typing import Any
+
+import yaml
+
+
+def _first_value(data: dict, keys: tuple[str, ...]) -> Any:
+    for key in keys:
+        if key in data:
+            return data[key]
+    return None
+
+
+def _find_container(data: dict, keys: tuple[str, ...]) -> dict:
+    for container in (data, data.get("expression", {}), data.get("Expression", {})):
+        if isinstance(container, dict) and any(k in container for k in keys):
+            return container
+    return {}
+
+
+def _get_concept_sets(data: dict) -> list[dict]:
+    container = _find_container(data, ("conceptSets", "ConceptSets", "concept_sets"))
+    for key in ("conceptSets", "ConceptSets", "concept_sets"):
+        value = container.get(key)
+        if isinstance(value, list):
+            return value
+    return []
+
+
+def _get_inclusion_rules(data: dict) -> list[dict]:
+    container = _find_container(data, ("inclusionRules", "InclusionRules"))
+    for key in ("inclusionRules", "InclusionRules"):
+        value = container.get(key)
+        if isinstance(value, list):
+            return value
+    return []
+
+
+def _count_concepts(concept_set: dict) -> int | None:
+    expression = concept_set.get("expression")
+    if not isinstance(expression, dict):
+        return None
+    items = expression.get("items")
+    if isinstance(items, list):
+        return len(items)
+    return None
+
+
+def _prune(value: Any) -> Any:
+    if isinstance(value, dict):
+        pruned = {k: _prune(v) for k, v in value.items()}
+        return {k: v for k, v in pruned.items() if v not in (None, [], {})}
+    if isinstance(value, list):
+        pruned_list = [_prune(v) for v in value]
+        return [v for v in pruned_list if v not in (None, [], {})]
+    return value
+
+
+def build_definition(data: dict, args: argparse.Namespace) -> dict:
+    name = args.name or _first_value(data, ("name", "Name")) or "OHDSI cohort definition"
+    description = args.description or _first_value(data, ("description", "Description"))
+    scope = args.scope or "OMOP CDM (OHDSI)"
+
+    concept_sets = _get_concept_sets(data)
+    inclusion_rules = _get_inclusion_rules(data)
+
+    concept_items = []
+    for concept_set in concept_sets:
+        cs_name = concept_set.get("name") or concept_set.get("Name") or "Concept set"
+        # Look the key up explicitly so that an id of 0 is not discarded as falsy.
+        cs_id = _first_value(concept_set, ("id", "Id"))
+        count = _count_concepts(concept_set)
+        details = []
+        if cs_id is not None:
+            details.append(f"id {cs_id}")
+        if count is not None:
+            details.append(f"{count} concept(s)")
+        description_bits = ", ".join(details) if details else None
+        concept_items.append({
+            "preferred_term": f"Concept set: {cs_name}",
+            "description": description_bits,
+        })
+
+    criteria_sets = []
+    if concept_items:
+        criteria_sets.append({
+            "name": "Primary criteria",
+            "description": "Concept sets used to define the cohort entry event.",
+            "inclusion_criteria": concept_items,
+        })
+
+    for rule in inclusion_rules:
+        rule_name = rule.get("name") or rule.get("Name") or "Inclusion rule"
+        criteria_sets.append({
+            "name": rule_name,
+            "description": "Inclusion rule from the cohort definition.",
+        })
+
+    definition = {
+        "name": name,
+        "definition_type": "PHENOTYPE_ALGORITHM",
+        "description": description,
+        "scope": scope,
+        "criteria_sets": criteria_sets,
+    }
+
+    return _prune(definition)
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(
+        description="Convert an OHDSI/ATLAS cohort JSON to a dismech definition fragment."
+    )
+    parser.add_argument("json_path", type=Path, help="Path to cohort JSON exported from ATLAS/WebAPI")
+    parser.add_argument("--name", help="Override definition name")
+    parser.add_argument("--description", help="Override definition description")
+    parser.add_argument("--scope", help="Override scope (default: OMOP CDM (OHDSI))")
+    parser.add_argument(
+        "--wrap",
+        action="store_true",
+        help="Wrap output in a top-level 'definitions' key",
+    )
+
+    args = parser.parse_args()
+    data = json.loads(args.json_path.read_text(encoding="utf-8"))
+    definition = build_definition(data, args)
+
+    if args.wrap:
+        payload = {"definitions": [definition]}
+    else:
+        payload = [definition]
+
+    yaml.safe_dump(payload, sys.stdout, sort_keys=False)
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
@@ -0,0 +1,9 @@
+---
+name: disease-classification
+description: >
+  Skill for populating the `classifications` top level portion of a dismech entry
+---
+
+# Adding classifications
+
+Classifications should be added carefully to 
@@ -0,0 +1,123 @@
+---
+name: disease-trajectories
+description: Mine Disease Trajectories (DT/DisTraj) outputs for comorbidity/trajectory candidates, including parsing DT JSON/TSV, extracting directed pairs, filtering by sex or significance, and mapping signals into dismech comorbidity YAML.
+---
+
+# Disease Trajectories Mining
+
+Use this skill when you need to mine DT (Disease Trajectories / DisTraj) artifacts and convert them into dismech comorbidity entries.
+
+## Quick start
+
+1) Locate a DT JSON file (often includes a `phase_dict` or edge list).
+2) Extract normalized edges with the script below.
+3) Pick candidate pairs and map to comorbidity YAML signals.
+
+Example:
+
+```bash
+python .claude/skills/disease-trajectories/scripts/dt_extract_edges.py path/to/dt.json --format tsv > /tmp/dt_edges.tsv
+```
+
+## Workflow
+
+### 1) Locate DT artifacts
+
+- Search for candidate files:
+  - `rg --files -g "*.json"` and look for names like `phase_dict`, `trajectories`, `edges`.
+- If the DT data is external, download and keep the raw file in a scratch location (do not edit in place).
+
+### 2) Inspect schema quickly
+
+Use a quick introspection to identify top-level keys:
+
+```bash
+python - <<'PY'
+import json
+from pathlib import Path
+p = Path("path/to/dt.json")
+obj = json.loads(p.read_text())
+print(type(obj))
+if isinstance(obj, dict):
+    print(list(obj.keys())[:20])
+PY
+```
+
+If there is a `phase_dict` mapping, it usually encodes pair keys like `ICD_A-ICD_B` and may include sex stratification.
+If there is an `edges`/`pairs` list, inspect the field names for A/B, sex, and directionality.
+
+### 3) Extract normalized edges
+
+Use the bundled script:
+
+```bash
+python .claude/skills/disease-trajectories/scripts/dt_extract_edges.py path/to/dt.json --format tsv > /tmp/dt_edges.tsv
+```
+
+What the script does:
+- Handles `phase_dict` mappings with pair keys like `E12-L28`.
+- Handles edge lists under `edges`, `links`, `pairs`, `data`, or `trajectories`.
+- Normalizes fields to a consistent row format with `disease_a_id`, `disease_b_id`, directionality metrics, sex, p-value, FDR, and source path.
+
+### 4) Filter candidate pairs
+
+Use standard tools on the TSV output (examples):
+
+- Filter for a specific ICD pair:
+  - `rg "^E12\tL28\t" /tmp/dt_edges.tsv`
+- Filter by directionality:
+  - `awk -F '\t' 'NR==1 || $11=="A_BEFORE_B"' /tmp/dt_edges.tsv`
+- Filter by sex:
+  - `awk -F '\t' 'NR==1 || $3=="male"' /tmp/dt_edges.tsv`
+
+### 5) Map to dismech comorbidity YAML
+
+Create or update a comorbidity file under `kb/comorbidities/`.
+
+Minimum signal mapping:
+
+- `source: DISEASE_TRAJECTORIES`
+- `method: EHR_TEMPORAL_COMORBIDITY`
+- `signal_disorder_a_id`: ICD code from DT
+- `signal_disorder_b_id`: ICD code from DT
+- `directionality`: map from DT (A_BEFORE_B / B_BEFORE_A / SAME_TIME / UNKNOWN)
+- `a_before_b`, `b_before_a`, `same_time`: preserve DT proportions if provided
+- `demographics.sex`: set if DT is stratified
+- `mapping_notes`: explain any ICD to dismech mapping or grouping
+
+Example snippet:
+
+```yaml
+association_signals:
+  - source: DISEASE_TRAJECTORIES
+    method: EHR_TEMPORAL_COMORBIDITY
+    signal_disorder_a_id: ICD10:E12
+    signal_disorder_b_id: ICD10:L28
+    demographics:
+      sex: MALE
+    directionality: A_BEFORE_B
+    a_before_b: 1.0
+    b_before_a: 0.0
+    same_time: 0.0
+```
+
+### 6) Validate
+
+Run:
+
+```bash
+just validate-comorbidity kb/comorbidities/<file>.yaml
+```
+
+## Scripts
+
+- `scripts/dt_extract_edges.py`
+  - Input: DT JSON
+  - Output: TSV/CSV/JSONL with normalized edge fields
+  - Use when the DT format is unknown or mixed
+
+## Notes and cautions
+
+- Do not assume DT directionality is causal. Preserve the `a_before_b`, `b_before_a`, and `same_time` metrics as reported.
+- If a DT pair uses grouped ICD codes (e.g., L28), record the grouping in `mapping_notes`.
+- Keep DT signals separate from literature signals; they can coexist under `association_signals`.
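The signal mapping in step 5 can be sketched as a small Python helper. This is an illustration only and not part of `dt_extract_edges.py`: the function names are hypothetical, the `ICD10:` prefix is taken from the example snippet, splitting the pair key on the first hyphen assumes individual codes contain no hyphen, and deriving `directionality` from the largest reported proportion is an assumed rule, not one stated by DT.

```python
from __future__ import annotations

from typing import Any


def infer_directionality(
    a_before_b: float | None,
    b_before_a: float | None,
    same_time: float | None,
) -> str:
    """Pick the label with the largest proportion; ties or no data give UNKNOWN."""
    scores = {
        "A_BEFORE_B": a_before_b,
        "B_BEFORE_A": b_before_a,
        "SAME_TIME": same_time,
    }
    known = {label: value for label, value in scores.items() if value is not None}
    if not known:
        return "UNKNOWN"
    best = max(known.values())
    winners = [label for label, value in known.items() if value == best]
    return winners[0] if len(winners) == 1 else "UNKNOWN"


def pair_key_to_signal(
    pair_key: str,
    a_before_b: float | None = None,
    b_before_a: float | None = None,
    same_time: float | None = None,
    sex: str | None = None,
) -> dict[str, Any]:
    """Turn a phase_dict-style key such as 'E12-L28' into an association signal dict."""
    code_a, code_b = pair_key.split("-", 1)  # raises ValueError if there is no hyphen
    signal: dict[str, Any] = {
        "source": "DISEASE_TRAJECTORIES",
        "method": "EHR_TEMPORAL_COMORBIDITY",
        "signal_disorder_a_id": f"ICD10:{code_a}",
        "signal_disorder_b_id": f"ICD10:{code_b}",
    }
    if sex is not None:
        signal["demographics"] = {"sex": sex.upper()}
    signal["directionality"] = infer_directionality(a_before_b, b_before_a, same_time)
    # Preserve the reported proportions unchanged when they are available.
    for field, value in (
        ("a_before_b", a_before_b),
        ("b_before_a", b_before_a),
        ("same_time", same_time),
    ):
        if value is not None:
            signal[field] = value
    return signal


signal = pair_key_to_signal("E12-L28", 1.0, 0.0, 0.0, sex="male")
```

Returning `UNKNOWN` on ties keeps the helper from asserting an ordering the data does not support, in line with the caution that DT directionality is not causal.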