METRIN-KG

Pipeline for generating the knowledge graph integrating enriched metabolite data originally used for ENPKG, traits data from TRY, and interaction data from GloBI.

Notes

If you want to build the METRIN-KG triples, skip to Installation.

If you just want to build your own instance of the METRIN-KG SPARQL endpoint, skip to Querying METRIN-KG.

Pipeline Components

Wikidata Data Acquisition: Fetches lineage and taxonomic data for up to 15 taxonomies from Wikidata using SPARQL.
Taxonomy Matching: Matches taxa from the following sources against the fetched Wikidata records:

GloBI (Global Biotic Interactions)
TRY (Plant Trait Database)

Knowledge Graph Generation: Generates RDF triples representing taxonomic alignments and traits for:

GloBI
TRY
EMI-KG (extension of ENPKG)
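The taxonomy matching component can be pictured with a minimal sketch. This is not the code in src; it only assumes that matching boils down to looking up a normalized taxon name in an index built from the fetched Wikidata records. The field names `taxon_name` and `wd_id` and the placeholder identifiers are hypothetical.

```python
from typing import Optional


def normalize_name(name: str) -> str:
    """Lowercase a taxon name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def build_index(wikidata_records: list[dict]) -> dict[str, str]:
    """Map normalized taxon names to Wikidata identifiers.

    The record keys 'taxon_name' and 'wd_id' are hypothetical.
    If a name occurs more than once, the first record wins.
    """
    index: dict[str, str] = {}
    for record in wikidata_records:
        index.setdefault(normalize_name(record["taxon_name"]), record["wd_id"])
    return index


def match_taxon(name: str, index: dict[str, str]) -> Optional[str]:
    """Return the Wikidata identifier for a taxon name, or None if unmatched."""
    return index.get(normalize_name(name))


# Placeholder records; the identifiers are not real Wikidata IDs.
EXAMPLE_RECORDS = [
    {"taxon_name": "Quercus robur", "wd_id": "Q-PLACEHOLDER-1"},
    {"taxon_name": "Fagus sylvatica", "wd_id": "Q-PLACEHOLDER-2"},
]
```

Exact matching on a normalized name is the simplest possible strategy; the actual pipeline may also use lineage or external taxonomy identifiers.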

Installation

Clone the repository

git clone https://github.com/earth-metabolome-initiative/metrin-kg.git

Make sure you have pipenv installed. If not, install it via:

pip install pipenv

The code has only been tested with Python 3.12; it may work with other Python 3 versions.

Once pipenv is installed, install the dependencies:

pipenv install
pipenv shell

Usage

Download the associated accessory data from the METRIN-KG Zenodo repository, verbatim-interactions.tsv.gz (only) from the GloBI Zenodo repository, and TRYdb_40340.txt.gz from the TRY Zenodo record.

cd metrin-kg

# download METRIN-KG data
wget -O metrin-kg.tar.gz "https://zenodo.org/records/15689186/files/metrin-kg.tar.gz?download=1"
tar -xvf metrin-kg.tar.gz
mv metrin-kg-data data

# download GloBI data
wget -O verbatim-interactions.tsv.gz "https://zenodo.org/records/14640564/files/verbatim-interactions.tsv.gz?download=1"
mv verbatim-interactions.tsv.gz data/raw/

# download TRY data
wget -O TRYdb_40340.txt.gz "https://zenodo.org/records/17079465/files/TRYdb_40340.txt.gz?download=1"
mv TRYdb_40340.txt.gz data/raw/
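After the downloads above, it can help to confirm that a file is readable before starting a long run. The following is a minimal sketch, assuming the file is a gzip-compressed, tab-separated text file that decodes as UTF-8; undecodable bytes are replaced rather than raising. The helper name `peek_tsv_gz` is hypothetical and not part of the repository.

```python
import csv
import gzip
from itertools import islice


def peek_tsv_gz(path, n_rows=3):
    """Return the header and the first n_rows data rows of a gzipped TSV file."""
    with gzip.open(path, mode="rt", encoding="utf-8", errors="replace", newline="") as handle:
        # QUOTE_NONE treats quote characters as ordinary data.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader, [])
        rows = list(islice(reader, n_rows))
    return header, rows


if __name__ == "__main__":
    header, rows = peek_tsv_gz("data/raw/verbatim-interactions.tsv.gz")
    print(len(header), "columns")
    for row in rows:
        print(row[:5])
```

Only the first few lines are decompressed, so this stays fast even for very large files.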

For supported arguments, run:

python main.py --help

Run the pipeline via command-line

python main.py [OPTIONS]

Command-Line Options

| Option | Description |
| --- | --- |
| `--config` | Path to config file (default: `config.txt`) |
| `--run-wd-fetcher` | Fetch taxonomy data from Wikidata |
| `--run-ontology-match` | Match ontologies to GloBI or TRY terms |
| `--run-globi-match` | Match GloBI dataset with Wikidata taxonomies |
| `--run-trydb-match` | Match TRY dataset with Wikidata taxonomies |
| `--run-globi-kg` | Generate RDF Knowledge Graph for GloBI |
| `--run-trydb-kg` | Generate RDF Knowledge Graph for TRY |

Run the full pipeline:

python main.py --run-wd-fetcher --run-globi-match --run-trydb-match --run-globi-kg --run-trydb-kg --config config.txt

Note: This might take a while. If you only want to reproduce the KG, skip directly to the "Generate knowledge graph - GloBI/TRY" step below. If you have copied the data from the METRIN-KG Zenodo repository, all accessory files are already available.
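If you prefer to drive the pipeline from a script, the command line can be assembled programmatically. This is a minimal sketch that only uses the flags documented in the options table; the helper names `build_command` and `run_stages` are hypothetical, and the sketch assumes it is run from the repository root where main.py lives.

```python
import subprocess
import sys

# Stage flags documented under Command-Line Options.
STAGE_FLAGS = (
    "--run-wd-fetcher",
    "--run-ontology-match",
    "--run-globi-match",
    "--run-trydb-match",
    "--run-globi-kg",
    "--run-trydb-kg",
)


def build_command(stages, config="config.txt"):
    """Build the argument list for main.py, rejecting undocumented flags."""
    unknown = [stage for stage in stages if stage not in STAGE_FLAGS]
    if unknown:
        raise ValueError(f"Unknown stage flag(s): {unknown}")
    return [sys.executable, "main.py", *stages, "--config", config]


def run_stages(stages, config="config.txt"):
    """Run main.py with the given stage flags and fail loudly on a non-zero exit."""
    subprocess.run(build_command(stages, config), check=True)


if __name__ == "__main__":
    # Reproduce only the knowledge graphs from already matched data.
    run_stages(["--run-globi-kg", "--run-trydb-kg"])
```

Using `sys.executable` keeps the call inside the active pipenv environment.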

Run only Wikidata fetcher:

python main.py --run-wd-fetcher --config config.txt

Note: If you just want to reproduce the KG, you don't need to perform this step because the data directory already has the relevant files (if the METRIN-KG zenodo contents are copied correctly).
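To give an idea of what a Wikidata taxonomy fetch looks like, here is a minimal sketch against the public Wikidata SPARQL endpoint. It is not the query used by the pipeline. It relies on the Wikidata properties P225 (taxon name), P171 (parent taxon) and, as an example of one external taxonomy, P685 (NCBI taxonomy ID); the function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def build_taxon_query(id_property="P685", limit=10):
    """Build a SPARQL query for taxa that carry a given external identifier."""
    return f"""
SELECT ?taxon ?name ?parent ?externalId WHERE {{
  ?taxon wdt:P225 ?name ;
         wdt:{id_property} ?externalId .
  OPTIONAL {{ ?taxon wdt:P171 ?parent . }}
}}
LIMIT {int(limit)}
""".strip()


def fetch_bindings(query):
    """Send the query to Wikidata and return the JSON result bindings."""
    url = WIKIDATA_SPARQL + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical identifier; use one that describes your own client.
            "User-Agent": "metrin-kg-readme-example/0.1",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    for binding in fetch_bindings(build_taxon_query(limit=5)):
        print(binding["name"]["value"], binding["externalId"]["value"])
```

A small `LIMIT` keeps the example quick; fetching a full taxonomy requires paging or larger timeouts.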

Run only GloBI/TRY taxonomy matching:

python main.py --run-globi-match --config config.txt

python main.py --run-trydb-match --config config.txt

Note: If you just want to reproduce the KG, you don't need to perform this step because the data directory already has the relevant files (if the METRIN-KG zenodo contents are copied correctly).

Run only ontology matching:

This can be done for any of the datasets from GloBI (body part, life stages, and biological sex) and TRY (unit names). Specify the input and output files under the [ontology] header in config.txt.

python main.py --run-ontology-match --config config.txt

Note: If you just want to reproduce the KG, you don't need to perform this step because the data directory already has the relevant files (if the METRIN-KG zenodo contents are copied correctly).
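To check what the ontology matching step will read, you can inspect the [ontology] section of the config file. The sketch below assumes config.txt uses INI-style syntax, which the [ontology] header suggests but which is not confirmed here. The key names `input_file` and `output_file` in the example are hypothetical; use the keys that actually appear in config.txt.

```python
import configparser


def read_ontology_section(config_text):
    """Return the key/value pairs found under the [ontology] header."""
    # interpolation=None keeps '%' characters in paths from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("ontology"):
        raise KeyError("No [ontology] section found in the config")
    return dict(parser["ontology"])


# Hypothetical example content; the real key names may differ.
EXAMPLE_CONFIG = """
[ontology]
input_file = data/raw/example_terms.tsv
output_file = data/example_terms_matched.tsv
"""

if __name__ == "__main__":
    for key, value in read_ontology_section(EXAMPLE_CONFIG).items():
        print(f"{key} -> {value}")
```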

Generate knowledge graph - GloBI/TRY:

python main.py --run-globi-kg --config config.txt

python main.py --run-trydb-kg --config config.txt

Notes:

For generating the sub knowledge graph of metabolites, follow the instructions here

If you skip --run-wd-fetcher, make sure that the wd_* paths in config.txt point to valid, existing files. Each part of the pipeline can be run independently.
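The check on the wd_* paths can be automated before launching a long run. This is a minimal sketch, assuming config.txt is INI-style and that every key starting with `wd_` holds a file path; the function name `missing_wd_files` is hypothetical.

```python
import configparser
from pathlib import Path


def missing_wd_files(config_path):
    """Return (section, key, path) for every wd_* entry whose file does not exist."""
    parser = configparser.ConfigParser(interpolation=None)
    # read_file raises if the config itself is missing, unlike parser.read().
    with open(config_path, encoding="utf-8") as handle:
        parser.read_file(handle)
    missing = []
    for section in parser.sections():
        for key, value in parser.items(section):
            if key.startswith("wd_") and not Path(value).is_file():
                missing.append((section, key, value))
    return missing


if __name__ == "__main__":
    problems = missing_wd_files("config.txt")
    for section, key, value in problems:
        print(f"[{section}] {key} points to a missing file: {value}")
    if not problems:
        print("All wd_* paths point to existing files.")
```

Relative paths are resolved against the current working directory, so run the check from the same directory as main.py.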

Outputs:

a) Fetched taxonomy files from Wikidata (*.json)
b) Matched taxa files for GloBI and TRY (*.tsv)
c) RDF files representing the final knowledge graphs (*.ttl, *.rdf, etc.)

Querying METRIN-KG

For querying METRIN-KG, you can use two methods:

a) the Qlever powered end-point hosted on earth-metabolome-initiative.org.

Want to generate your own instance of METRIN-KG SPARQL endpoint?

Follow the instructions on qlever-control and our fork of qlever-ui to install Qlever. The qlever config file used to index METRIN-KG is Qlever.metrin_kg, the file passed to --qleverfile in the commands below. Use these commands to create your own instance of METRIN-KG on localhost.

qlever --qleverfile Qlever.metrin_kg get-data  # download full METRIN-KG graph
qlever --qleverfile Qlever.metrin_kg index --overwrite-existing --parallel-parsing false  # index KG
qlever --qleverfile Qlever.metrin_kg start  # start the server on local host

Once the Qlever index is generated and the server is started, you can query the endpoint using qlever-ui on your localhost. When you are done querying METRIN-KG, don't forget to stop the server:

qlever --qleverfile Qlever.metrin_kg stop
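While the server is running (that is, before the stop command above), the local endpoint can also be queried from a script using the standard SPARQL protocol over HTTP. This is a minimal sketch; the port in `LOCAL_ENDPOINT` is a hypothetical placeholder, so replace it with the port configured in Qlever.metrin_kg. The function names are hypothetical as well.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address; use the port set in Qlever.metrin_kg.
LOCAL_ENDPOINT = "http://localhost:7001"

# A generic query that works on any RDF graph.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_url(endpoint, query):
    """Encode a SPARQL query as a GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_select(endpoint, query):
    """Run a SELECT query and return the JSON result bindings."""
    request = urllib.request.Request(
        build_sparql_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    for binding in run_select(LOCAL_ENDPOINT, SAMPLE_QUERY):
        print(binding["s"]["value"], binding["p"]["value"], binding["o"]["value"])
```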

Notes:

You will need Docker to run qlever. On Linux, Docker runs natively and uses only a small amount of RAM, whereas on macOS Docker runs in a virtual machine and therefore uses significantly more RAM. As a result, qlever index may sometimes fail on macOS unless more memory is made available.

For indexing the METRIN-KG data (qlever index), at least 31 GB of RAM is required. This works on Linux; macOS may require more.

The shell commands for qlever get-data inside the config file have been adapted for Ubuntu's terminal and macOS's iTerm2 default settings.

The qlever get-data command downloads only the triples (ttl.gz or ttl), not the raw data used to generate them. To download the full METRIN-KG dataset, including both the raw data and the triples, see the download step at the start of Usage.
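The memory requirement noted above can be checked before starting qlever index. This is a minimal sketch, assuming a Unix-like host where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`, and reading "31 GB" as 31 GiB. On macOS the relevant figure is the memory allocated to the Docker virtual machine, which this host-level check does not see.

```python
import os

REQUIRED_GIB = 31  # from the note on qlever index, read here as GiB


def total_ram_bytes():
    """Return the physical memory of the host in bytes (Unix-like systems)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def has_enough_ram(total_bytes, required_gib=REQUIRED_GIB):
    """Return True if total_bytes covers the required amount in GiB."""
    return total_bytes >= required_gib * 1024**3


if __name__ == "__main__":
    total = total_ram_bytes()
    print(f"Host RAM: {total / 1024**3:.1f} GiB")
    if not has_enough_ram(total):
        print(f"Less than {REQUIRED_GIB} GiB available; qlever index may fail.")
```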

b) the sparql-editor powered endpoint

This endpoint also provides direct access to class-overview (find the icon at the top-left corner). It also provides a way to suggest example queries to be accepted in the METRIN-KG examples set (find the icon 💾 at the top-left corner).

Note that for some queries, this endpoint might return a "The quota has exceeded" error. We are trying to resolve it. Updates soon...

Class-overview

For visualization of class overview and data schema, visit the sparql-editor powered endpoint and click on the class overview icon at the top-left corner of the page.

You can also open sparql_editor_index_metrin-kg.html in a browser to view the class overview. For instructions on how to generate this file, refer to the following GitHub repos: sparql-editor, sparql-examples, and our fork of void-generator.

Contribute and Contact

Have a look at the METRIN-KG wiki to learn how to use and how to contribute to METRIN-KG.

For bugs, questions, or contributions, please open an issue or submit a pull request.

Citation

If you use METRIN-KG in your work, please cite

METRIN-KG: A knowledge graph integrating plant metabolites, traits and biotic interactions Disha Tandon, Tarcisio Mendes De Farias, Pierre-Marie Allard, Emmanuel Defossez bioRxiv 2025.08.20.671289; doi: https://doi.org/10.1101/2025.08.20.671289
