Description
Dear developer,
Thank you for this amazing work. Due to firewall restrictions, I cannot download the databases directly with the commands provided by the software. I therefore downloaded the archive manually through a VPN (https://foldseek.steineggerlab.workers.dev/pdb100.tar.gz) and extracted it, and the md5sum check passed. However, when I ran clustersearch with my faa_queryDB against it, I got the error "No datafile could be found for /Database/FoldSeekDB/PDB100_member_to_set!", as shown below.
Additionally, it would be a great pity if such an excellent piece of work became unavailable to some users because of network issues. Could you please provide an alternative download link for users in China, for example one hosted on https://zenodo.org/ (this platform offers 50GB of free storage space; data exceeding 50GB may need to be split into smaller parts)? In any case, thank you again for developing such outstanding software. I look forward to your reply.
- PDB database
(foldseek) [yut@io02 FoldSeekDB]$ ls -F PDB100*
PDB100 PDB100_h PDB100_seq_ca.0@ PDB100_seq_h.index PDB100_seq_ss.index
PDB100_ca PDB100_h.dbtype PDB100_seq_ca.1 PDB100_seq.index PDB100_seq_taxonomy@
PDB100_ca.dbtype PDB100_h.index PDB100_seq_ca.dbtype PDB100_seq.lookup@ PDB100.source
PDB100_ca.index PDB100.index PDB100_seq_ca.index PDB100_seq_mapping@ PDB100_ss
PDB100_clu PDB100.lookup PDB100_seq.dbtype PDB100_seq.source@ PDB100_ss.dbtype
PDB100_clu.dbtype PDB100_mapping PDB100_seq_h.0@ PDB100_seq_ss.0@ PDB100_ss.index
PDB100_clu.index PDB100_seq.0@ PDB100_seq_h.1 PDB100_seq_ss.1 PDB100_taxonomy
PDB100.dbtype PDB100_seq.1 PDB100_seq_h.dbtype PDB100_seq_ss.dbtype PDB100.version
- error log
[spacedust]$ spacedust clustersearch queryDB /Database/FoldSeekDB/PDB100 spacedust_result.tsv tmpFolder
MMseqs Version: 2.e56c505
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace true
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 10
Seq. id. threshold 0
Min alignment length 30
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0.8
Coverage mode 2
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 128
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 5.7
k-mer length 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Gap pseudo count 10
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false
Use simple best hit true
Include sub-optimal hits with factor 0
Alpha 1
Aggregation mode 0
Filter self match false
Multihit P-value cutoff 0.01
Clustering and Ordering P-value cutoff 0.01
Maximum gene gaps 3
Minimal cluster size 2
Cluster weighting factor false
Database output true
Cluster search against profiles false
Cluster Search Mode 0
Path to Foldseek /Software/Miniconda3/envs/spacedust/bin/foldseek
besthitbyset queryDB /Database/FoldSeekDB/PDB100 tmpFolder/966050721878520555/result_prefixed tmpFolder/966050721878520555/aggregate --simple-best-hit 1 --suboptimal-hits 0 --threads 128 --compressed 0 -v 3
No datafile could be found for /Database/FoldSeekDB/PDB100_member_to_set!
Error: aggregate best hit failed
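- missing-file check (sketch)
To double-check which files are absent from the extracted directory, a small script along these lines could be used. It is only a sketch under stated assumptions: the `_member_to_set` suffix is taken from the error message above, the helper name `missing_db_files` is hypothetical, and the suffix list in the example is an illustration, not an official list of the files spacedust requires.

```python
import os


def missing_db_files(prefix, suffixes):
    """Return the suffixes for which no data file exists at prefix + suffix.

    A data file counts as present if either the plain file or a first
    split chunk (".0", as seen for PDB100_seq_ca.0 in the listing) exists.
    os.path.exists follows symlinks, so a dangling symlink counts as missing.
    """
    missing = []
    for suffix in suffixes:
        path = prefix + suffix
        if not (os.path.exists(path) or os.path.exists(path + ".0")):
            missing.append(suffix)
    return missing


if __name__ == "__main__":
    # Path from the command above; the suffix list is an assumption.
    prefix = "/Database/FoldSeekDB/PDB100"
    suffixes = ["", "_clu", "_seq", "_member_to_set"]
    for suffix in missing_db_files(prefix, suffixes):
        print("missing data file:", prefix + suffix)
```

Because the check follows symlinks, it would also flag any of the symlinked entries in the listing (those marked with `@`) whose targets did not survive the manual extraction.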