Commit 1444a1b

Merge pull request #230 from monarch-initiative/classifications-and-comorbidity
classifications and comorbidity
2 parents 9b12f89 + 6e19eb1 commit 1444a1b

555 files changed

+71480
-6054
lines changed

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
---
name: create-definitions-from-ohdsi
description: Generate dismech definitions from OHDSI/ATLAS cohort definitions or other computable phenotype logic. Use when converting OMOP cohort JSON, drafting PheKB-style phenotype algorithms, or mapping FHIR/CQL/OMOP rules into dismech `definitions` blocks.
---

# Create Definitions From OHDSI

Use this skill to convert OHDSI/ATLAS cohort definitions into dismech `definitions` blocks and to map FHIR/CQL logic into the same structure.

## Quick start

1. Export an ATLAS/WebAPI cohort definition JSON.
2. Generate a YAML fragment:

```bash
uv run python .claude/skills/create-definitions-from-ohdsi/scripts/ohdsi_cohort_to_definition.py /path/to/cohort.json --wrap
```

3. Paste the fragment into the target disorder file under `definitions`.
4. Normalize to dismech norms (add evidence snippets, scope, criteria set names, and any available term objects).
5. Validate:

```bash
just validate kb/disorders/<Disease>.yaml
```

## Workflow guardrails

- Keep logic concise: express cohort entry, inclusion rules, and exit criteria in `criteria_sets`.
- Use `minimum_required` for numeric logic; put temporal logic in `description`.
- Add evidence snippets from abstracts when the algorithm is derived from a publication.
- Only add `term` objects when the CURIE is in a configured prefix (ICD10CM, NCIT, HP, etc.).
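
The last guardrail can be checked mechanically before a `term` object is written. The following is a minimal sketch, not part of the skill: `CONFIGURED_PREFIXES` and `has_configured_prefix` are hypothetical names, and the prefix set contains only the three examples named above, whereas the real configured list is probably longer.

```python
from __future__ import annotations

# Assumption: only the three prefixes named in the guardrail are listed here.
# The authoritative list lives in the dismech configuration.
CONFIGURED_PREFIXES = frozenset({"ICD10CM", "NCIT", "HP"})


def has_configured_prefix(
    curie: str, prefixes: frozenset[str] = CONFIGURED_PREFIXES
) -> bool:
    """Return True when `curie` has the form PREFIX:LOCAL_ID with a known prefix."""
    prefix, sep, local_id = curie.partition(":")
    # Require a colon, a non-empty local id, and a configured prefix.
    return bool(sep) and bool(local_id) and prefix in prefixes


print(has_configured_prefix("HP:0001250"))       # configured prefix
print(has_configured_prefix("SNOMED:73211009"))  # prefix not in the assumed set
```

When the check fails, the code stays in `description` as free text rather than in a `term` object.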

## References

- Mapping guide: `references/model-mapping.md` (FHIR/OHDSI/CQL to dismech)

## Scripts

- `scripts/ohdsi_cohort_to_definition.py`: Convert ATLAS/WebAPI cohort JSON to a dismech definition fragment.
  - Use `--wrap` to emit a top-level `definitions` key.
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
# Model Mapping Guide (FHIR / OHDSI / CQL -> dismech)

Use this guide to translate computable phenotype logic into dismech `definitions`.

## dismech target structure

- `definitions[]`: Top-level phenotype definition or diagnostic criteria set.
- `criteria_sets[]`: Named sub-blocks (primary criteria, inclusion rules, confirmation).
- `inclusion_criteria[]`, `exclusion_criteria[]`, `core_clinical_characteristics[]`,
  `imaging_requirements[]`, `laboratory_requirements[]`, `additional_requirements[]`.
- `minimum_required`: Use for count thresholds (>=2 events, >=1 lab, etc.).
- `notes`: Use for cohort exit and temporal logic when no dedicated slot exists.

## OHDSI / OMOP cohort definition

- **ConceptSet** -> `CriteriaItem` under `inclusion_criteria`.
  - `preferred_term`: "Concept set: <name>".
  - `description`: concept count, domain, and any constraints.
- **PrimaryCriteria** (entry events) -> `criteria_sets` entry named "Primary criteria".
- **InclusionRule** -> separate `criteria_sets` entry (one per rule).
- **CorrelatedCriteria** -> `additional_requirements` or `imaging_requirements`.
- **Cohort exit** -> `notes` or a `CriteriaItem` in `additional_requirements`.
- **Temporal windows** (e.g., 31-365 days) -> `description` on the relevant `CriteriaItem`.

Concept IDs:
- If a concept maps to a configured prefix (ICD10CM, NCIT, HP), add a `term` object.
- Otherwise keep the code in `description` and the label in `preferred_term`.
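
The two concept ID rules can be sketched as one small function. This is an illustration under stated assumptions: the `{"id", "label"}` shape of the `term` object, the prefix set, and the name `concept_to_criteria_item` are all hypothetical and should be checked against the dismech schema.

```python
from __future__ import annotations

# Assumption: the configured prefixes are just the three named in this guide.
CONFIGURED_PREFIXES = frozenset({"ICD10CM", "NCIT", "HP"})


def concept_to_criteria_item(curie: str, label: str) -> dict:
    """Map one coded concept to a CriteriaItem-like dict."""
    prefix = curie.partition(":")[0]
    item: dict = {"preferred_term": label}
    if prefix in CONFIGURED_PREFIXES:
        # Assumed term shape; the real schema may use different keys.
        item["term"] = {"id": curie, "label": label}
    else:
        # Unconfigured vocabulary: keep the code as free text only.
        item["description"] = f"Source code: {curie}"
    return item


print(concept_to_criteria_item("ICD10CM:E11.9", "Type 2 diabetes mellitus without complications"))
print(concept_to_criteria_item("SNOMED:44054006", "Type 2 diabetes mellitus"))
```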

## FHIR phenotype logic

- **Condition** -> `inclusion_criteria` item.
  - Use `term` if ICD-10-CM or other configured prefix.
- **Observation** (lab/serology) -> `laboratory_requirements`.
- **Procedure / Imaging** -> `imaging_requirements` or `additional_requirements`.
- **MedicationRequest / MedicationAdministration** -> `additional_requirements`.
- **Encounter** (setting) -> `scope` or `notes` on the definition.
- **ValueSet** -> treat as a concept set (same mapping as OHDSI).
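
The resource-level part of this mapping fits in a lookup table. The sketch below is hypothetical (`FHIR_TO_SLOT` and `slot_for` are not part of dismech) and makes one choice the guide leaves open: `Procedure` defaults to `additional_requirements` and `ImagingStudy` to `imaging_requirements`. `Encounter` and `ValueSet` are omitted because they map to definition-level fields or to concept sets rather than to a criteria slot.

```python
from __future__ import annotations

# Assumed defaults; where the guide offers two slots, one is picked here.
FHIR_TO_SLOT = {
    "Condition": "inclusion_criteria",
    "Observation": "laboratory_requirements",
    "ImagingStudy": "imaging_requirements",
    "Procedure": "additional_requirements",
    "MedicationRequest": "additional_requirements",
    "MedicationAdministration": "additional_requirements",
}


def slot_for(resource_type: str) -> str | None:
    """Return the criteria slot for a FHIR resource type, or None if unmapped."""
    return FHIR_TO_SLOT.get(resource_type)


for resource in ("Condition", "Observation", "Encounter"):
    print(resource, "->", slot_for(resource))
```

Returning `None` for unmapped types forces the caller to decide explicitly, for example by recording an `Encounter` setting in `scope` or `notes`.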

## CQL phenotype logic

- **define** statements -> `criteria_sets` or `CriteriaItem` entries.
- **exists / count** -> `minimum_required` or describe in the criterion.
- **temporal operators** (during, overlaps, before/after) -> describe in the criterion.
- **negation** -> `exclusion_criteria`.

## Notes

- Keep the algorithm in plain language; do not copy vendor-specific syntax.
- Evidence: attach at least one EvidenceItem if the algorithm is drawn from a paper.
Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
#!/usr/bin/env python3
"""
Convert an OHDSI/ATLAS cohort definition JSON into a dismech definitions YAML fragment.
"""

from __future__ import annotations

import argparse
import json
import sys
from pathlib import Path
from typing import Any

import yaml


def _first_value(data: dict, keys: tuple[str, ...]) -> Any:
    """Return the value of the first key in `keys` that is present in `data`."""
    for key in keys:
        if key in data:
            return data[key]
    return None


def _find_container(data: dict, keys: tuple[str, ...]) -> dict:
    """Return the dict (top level or nested expression) that holds any of `keys`."""
    for container in (data, data.get("expression", {}), data.get("Expression", {})):
        if isinstance(container, dict) and any(k in container for k in keys):
            return container
    return {}


def _get_concept_sets(data: dict) -> list[dict]:
    container = _find_container(data, ("conceptSets", "ConceptSets", "concept_sets"))
    for key in ("conceptSets", "ConceptSets", "concept_sets"):
        value = container.get(key)
        if isinstance(value, list):
            return value
    return []


def _get_inclusion_rules(data: dict) -> list[dict]:
    container = _find_container(data, ("inclusionRules", "InclusionRules"))
    for key in ("inclusionRules", "InclusionRules"):
        value = container.get(key)
        if isinstance(value, list):
            return value
    return []


def _count_concepts(concept_set: dict) -> int | None:
    """Count the items in a concept set expression, or None if there is no item list."""
    expression = concept_set.get("expression")
    if not isinstance(expression, dict):
        return None
    items = expression.get("items")
    if isinstance(items, list):
        return len(items)
    return None


def _prune(value: Any) -> Any:
    """Recursively drop None, empty lists, and empty dicts."""
    if isinstance(value, dict):
        pruned = {k: _prune(v) for k, v in value.items()}
        return {k: v for k, v in pruned.items() if v not in (None, [], {})}
    if isinstance(value, list):
        pruned_list = [_prune(v) for v in value]
        return [v for v in pruned_list if v not in (None, [], {})]
    return value


def build_definition(data: dict, args: argparse.Namespace) -> dict:
    name = args.name or _first_value(data, ("name", "Name")) or "OHDSI cohort definition"
    description = args.description or _first_value(data, ("description", "Description"))
    scope = args.scope or "OMOP CDM (OHDSI)"

    concept_sets = _get_concept_sets(data)
    inclusion_rules = _get_inclusion_rules(data)

    concept_items = []
    for concept_set in concept_sets:
        cs_name = concept_set.get("name") or concept_set.get("Name") or "Concept set"
        # Look the id up by key presence: an id of 0 is valid but falsy, so
        # `get("id") or get("Id")` would silently drop it.
        cs_id = _first_value(concept_set, ("id", "Id"))
        count = _count_concepts(concept_set)
        details = []
        if cs_id is not None:
            details.append(f"id {cs_id}")
        if count is not None:
            details.append(f"{count} concept(s)")
        description_bits = ", ".join(details) if details else None
        concept_items.append({
            "preferred_term": f"Concept set: {cs_name}",
            "description": description_bits,
        })

    criteria_sets = []
    if concept_items:
        criteria_sets.append({
            "name": "Primary criteria",
            "description": "Concept sets used to define the cohort entry event.",
            "inclusion_criteria": concept_items,
        })

    # Only the rule names are carried over; rule logic must be added by hand.
    for rule in inclusion_rules:
        rule_name = rule.get("name") or rule.get("Name") or "Inclusion rule"
        criteria_sets.append({
            "name": rule_name,
            "description": "Inclusion rule from the cohort definition.",
        })

    definition = {
        "name": name,
        "definition_type": "PHENOTYPE_ALGORITHM",
        "description": description,
        "scope": scope,
        "criteria_sets": criteria_sets,
    }

    return _prune(definition)


def main() -> int:
    parser = argparse.ArgumentParser(
        description="Convert an OHDSI/ATLAS cohort JSON to a dismech definition fragment."
    )
    parser.add_argument("json_path", type=Path, help="Path to cohort JSON exported from ATLAS/WebAPI")
    parser.add_argument("--name", help="Override definition name")
    parser.add_argument("--description", help="Override definition description")
    parser.add_argument("--scope", help="Override scope (default: OMOP CDM (OHDSI))")
    parser.add_argument(
        "--wrap",
        action="store_true",
        help="Wrap output in a top-level 'definitions' key",
    )

    args = parser.parse_args()
    data = json.loads(args.json_path.read_text(encoding="utf-8"))
    definition = build_definition(data, args)

    if args.wrap:
        payload = {"definitions": [definition]}
    else:
        payload = [definition]

    yaml.safe_dump(payload, sys.stdout, sort_keys=False)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
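
To try the converter without an ATLAS instance, a small input file is enough. The sketch below writes a hypothetical cohort JSON that uses only the keys the script reads (`name`, a nested `expression`, `ConceptSets`, `InclusionRules`); every name, id, and concept entry in it is invented for illustration, and the file name `sample_cohort.json` is an assumption.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical cohort: all names and ids below are made up for illustration.
sample_cohort = {
    "name": "Example cohort",
    "expression": {
        "ConceptSets": [
            {
                "id": 0,
                "name": "Example condition",
                # Two placeholder entries, so the concept count is 2.
                "expression": {"items": [{}, {}]},
            },
        ],
        "InclusionRules": [{"name": "Has a confirming lab result"}],
    },
}

# Write the sample next to other scratch files and report where it went.
path = Path(tempfile.gettempdir()) / "sample_cohort.json"
path.write_text(json.dumps(sample_cohort, indent=2), encoding="utf-8")
print(path)
```

Passing the printed path to the script with `--wrap` should yield a single definition with a "Primary criteria" set holding one concept-set item, plus one criteria set named after the inclusion rule.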
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
---
name: disease-classification
description: >
  Skill for populating the top-level `classifications` section of a dismech entry
---

# Adding classifications

Classifications should be added carefully to
Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
---
name: disease-trajectories
description: Mine Disease Trajectories (DT/DisTraj) outputs for comorbidity/trajectory candidates, including parsing DT JSON/TSV, extracting directed pairs, filtering by sex or significance, and mapping signals into dismech comorbidity YAML.
---

# Disease Trajectories Mining

Use this skill when you need to mine DT (Disease Trajectories / DisTraj) artifacts and convert them into dismech comorbidity entries.

## Quick start

1) Locate a DT JSON file (often includes a `phase_dict` or edge list).
2) Extract normalized edges with the script below.
3) Pick candidate pairs and map to comorbidity YAML signals.

Example:

```bash
python .claude/skills/disease-trajectories/scripts/dt_extract_edges.py path/to/dt.json --format tsv > /tmp/dt_edges.tsv
```

## Workflow

### 1) Locate DT artifacts

- Search for candidate files:
  - `rg --files -g "*.json"` and look for names like `phase_dict`, `trajectories`, `edges`.
- If the DT data is external, download and keep the raw file in a scratch location (do not edit in place).

### 2) Inspect schema quickly

Use a quick introspection to identify top-level keys:

```bash
python - <<'PY'
import json
from pathlib import Path
p = Path("path/to/dt.json")
obj = json.loads(p.read_text())
print(type(obj))
if isinstance(obj, dict):
    print(list(obj.keys())[:20])
PY
```

If there is a `phase_dict` mapping, it usually encodes pair keys like `ICD_A-ICD_B` and may include sex stratification.
If there is an `edges`/`pairs` list, inspect the field names for A/B, sex, and directionality.
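
Splitting a `phase_dict` pair key can be sketched as follows. This is an illustration only: `split_pair_key` is a hypothetical helper, and it assumes a single hyphen separates the two codes and that neither code contains a hyphen itself, which would not hold for range-style keys.

```python
from __future__ import annotations


def split_pair_key(key: str) -> tuple[str, str]:
    """Split a key such as 'E12-L28' into ('E12', 'L28')."""
    a, sep, b = key.partition("-")
    a, b = a.strip(), b.strip()
    # Reject keys with no hyphen or with an empty side.
    if not sep or not a or not b:
        raise ValueError(f"not an A-B pair key: {key!r}")
    return a, b


print(split_pair_key("E12-L28"))
```

Failing loudly on malformed keys makes it easier to spot a `phase_dict` that uses a different key convention.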

### 3) Extract normalized edges

Use the bundled script:

```bash
python .claude/skills/disease-trajectories/scripts/dt_extract_edges.py path/to/dt.json --format tsv > /tmp/dt_edges.tsv
```

What the script does:
- Handles `phase_dict` mappings with pair keys like `E12-L28`.
- Handles edge lists under `edges`, `links`, `pairs`, `data`, or `trajectories`.
- Normalizes fields to a consistent row format with `disease_a_id`, `disease_b_id`, directionality metrics, sex, p-value, FDR, and source path.

### 4) Filter candidate pairs

Use standard tools on the TSV output (examples):

- Filter for a specific ICD pair:
  - `rg "^E12\tL28\t" /tmp/dt_edges.tsv`
- Filter by directionality:
  - `awk -F '\t' 'NR==1 || $11=="A_BEFORE_B"' /tmp/dt_edges.tsv`
- Filter by sex:
  - `awk -F '\t' 'NR==1 || $3=="male"' /tmp/dt_edges.tsv`
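
The same filters can be written against column names rather than positions, which avoids depending on the column order. A minimal sketch, assuming the TSV has a header row; `disease_a_id` and `disease_b_id` come from the row format described in step 3, while the `sex` and `directionality` header names and the `filter_edges` helper are assumptions.

```python
from __future__ import annotations

import csv
import io


def filter_edges(tsv_text: str, **wanted: str) -> list[dict]:
    """Return rows whose named columns equal the requested values."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row for row in reader
        if all(row.get(col) == val for col, val in wanted.items())
    ]


# Tiny in-memory stand-in for /tmp/dt_edges.tsv (invented rows).
rows = [
    ["disease_a_id", "disease_b_id", "sex", "directionality"],
    ["E12", "L28", "male", "A_BEFORE_B"],
    ["E12", "L28", "female", "B_BEFORE_A"],
]
sample_tsv = "\n".join("\t".join(r) for r in rows)

for row in filter_edges(sample_tsv, sex="male", directionality="A_BEFORE_B"):
    print(row["disease_a_id"], row["disease_b_id"])
```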

### 5) Map to dismech comorbidity YAML

Create or update a comorbidity file under `kb/comorbidities/`.

Minimum signal mapping:

- `source: DISEASE_TRAJECTORIES`
- `method: EHR_TEMPORAL_COMORBIDITY`
- `signal_disorder_a_id`: ICD code from DT
- `signal_disorder_b_id`: ICD code from DT
- `directionality`: map from DT (A_BEFORE_B / B_BEFORE_A / SAME_TIME / UNKNOWN)
- `a_before_b`, `b_before_a`, `same_time`: preserve DT proportions if provided
- `demographics.sex`: set if DT is stratified
- `mapping_notes`: explain any ICD to dismech mapping or grouping

Example snippet:

```yaml
association_signals:
  - source: DISEASE_TRAJECTORIES
    method: EHR_TEMPORAL_COMORBIDITY
    signal_disorder_a_id: ICD10:E12
    signal_disorder_b_id: ICD10:L28
    demographics:
      sex: MALE
    directionality: A_BEFORE_B
    a_before_b: 1.0
    b_before_a: 0.0
    same_time: 0.0
```
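
Building one such signal from a normalized edge can be sketched as below. This is not the skill's implementation: `directionality_label` and `build_signal` are hypothetical, and deriving the label from the largest proportion is an assumed rule. If the DT output already reports a direction, copy that value instead.

```python
from __future__ import annotations


def directionality_label(
    a_before_b: float | None, b_before_a: float | None, same_time: float | None
) -> str:
    """Pick the label with the strictly largest share; ties or no data give UNKNOWN."""
    shares = {"A_BEFORE_B": a_before_b, "B_BEFORE_A": b_before_a, "SAME_TIME": same_time}
    known = {k: v for k, v in shares.items() if v is not None}
    if not known:
        return "UNKNOWN"
    best = max(known.values())
    winners = [k for k, v in known.items() if v == best]
    return winners[0] if len(winners) == 1 else "UNKNOWN"


def build_signal(a, b, a_before_b=None, b_before_a=None, same_time=None, sex=None) -> dict:
    """Assemble an association signal dict with the minimum fields listed above."""
    signal = {
        "source": "DISEASE_TRAJECTORIES",
        "method": "EHR_TEMPORAL_COMORBIDITY",
        "signal_disorder_a_id": f"ICD10:{a}",
        "signal_disorder_b_id": f"ICD10:{b}",
        "directionality": directionality_label(a_before_b, b_before_a, same_time),
    }
    if sex:
        signal["demographics"] = {"sex": sex.upper()}
    # Preserve the reported proportions unchanged when they are available.
    for key, value in (("a_before_b", a_before_b), ("b_before_a", b_before_a), ("same_time", same_time)):
        if value is not None:
            signal[key] = value
    return signal


print(build_signal("E12", "L28", 1.0, 0.0, 0.0, sex="male"))
```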

### 6) Validate

Run:

```bash
just validate-comorbidity kb/comorbidities/<file>.yaml
```

## Scripts

- `scripts/dt_extract_edges.py`
  - Input: DT JSON
  - Output: TSV/CSV/JSONL with normalized edge fields
  - Use when the DT format is unknown or mixed

## Notes and cautions

- Do not assume DT directionality is causal. Preserve the `a_before_b`, `b_before_a`, and `same_time` metrics as reported.
- If a DT pair uses grouped ICD codes (e.g., L28), record the grouping in `mapping_notes`.
- Keep DT signals separate from literature signals; they can coexist under `association_signals`.
