Hi,
I'm trying to test Spacedust to find the conserved gene clusters between the two example genomes.
- Creating databases
spacedust createsetdb listOfFastaFiles.tsv setDB tmpFolder --gff-dir examples/gff.txt --gff-type CDS
where listOfFastaFiles.tsv contains:
examples/uvig_120081.fna
examples/uvig_255655.fna
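For reference, the list file above does not have to be written by hand. A minimal sketch, assuming the list is plain text with one FASTA path per line (as shown above) and that the genome files end in `.fna`; `make_fasta_list` is a hypothetical helper name and not part of spacedust:

```shell
# Hypothetical helper (not part of spacedust): write one FASTA path per line.
make_fasta_list() {
    fasta_dir="$1"   # directory holding the genome FASTA files
    out_list="$2"    # list file to create, e.g. listOfFastaFiles.tsv
    # Assumption: nucleotide FASTA files use the .fna extension, as in examples/.
    # sort keeps the order of genomes stable between runs.
    find "$fasta_dir" -maxdepth 1 -type f -name '*.fna' | sort > "$out_list"
}

# Example call (commented out so the snippet has no side effects):
# make_fasta_list examples listOfFastaFiles.tsv
```

With more genomes in the directory, the same call would simply produce a longer list.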
- Convert to structure sequence DB (the reference Foldseek DB Alphafold/UniProt has been downloaded to ~/database/FoldSeek/UniProt/ and named afdb)
spacedust aa2foldseek setDB ~/database/FoldSeek/UniProt/afdb tmpFolder
Here I got two databases, setDB_foldseek and setDB_unmapped.
Q: I will analyze some virus genomes later, so is a full Foldseek structure search against precomputed structures probably a better choice than ProstT5?
- Search querySetDB against targetSetDB (using Foldseek and MMseqs)
spacedust clustersearch setDB setDB result.tsv tmpFolder --search-mode 1 --num-iterations 2
I got the result.tsv file here.
I am not sure whether I have run the tool correctly. I am also confused by the results, as I would expect to observe some conserved gene clusters between the two example genomes.
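To make the check concrete, this is roughly how the number of reported clusters could be counted. It is only a sketch under an assumption I have not verified: that each cluster in result.tsv begins with a summary line starting with `#`, followed by one line per pairwise hit. `count_clusters` is a hypothetical helper name, not a spacedust command:

```shell
# Hypothetical helper: count lines that start with '#'.
# Assumption (unverified): each such line is one cluster summary line.
count_clusters() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # '|| true' keeps the function from reporting that as a failure.
    grep -c '^#' "$1" || true
}

# Example call (commented out so the snippet has no side effects):
# count_clusters result.tsv
```

A count of 0 under that assumption would mean no cluster was reported at all, which is different from clusters being reported but filtered out downstream.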
Q: Also, what if I have many genomes and want to identify the conserved gene clusters shared among any of them?
Thanks!
Best wishes!