
No datafile could be found for /Database/FoldSeekDB/PDB100_member_to_set! #13

@neptuneyt


Dear developer,
Thank you for this amazing work. Due to firewall restrictions, I cannot download the database directly with the commands provided by the software. I therefore downloaded the archive manually through a VPN (https://foldseek.steineggerlab.workers.dev/pdb100.tar.gz) and extracted it; the md5sum check passed. However, when I run clustersearch with my faa_queryDB against it, I get the error "No datafile could be found for /Database/FoldSeekDB/PDB100_member_to_set!", as shown below.
Additionally, it would be a great pity if such an excellent piece of work became unavailable to users because of network issues. Could you please provide an alternative download link for users in China, for example one hosted on https://zenodo.org/ (the platform offers 50GB of free storage; data exceeding 50GB may need to be split into smaller parts)? In any case, thank you again for developing such outstanding software. I look forward to your reply.

  • PDB database
(foldseek) [yut@io02 FoldSeekDB]$ ls -F PDB100*
PDB100             PDB100_h         PDB100_seq_ca.0@      PDB100_seq_h.index    PDB100_seq_ss.index
PDB100_ca          PDB100_h.dbtype  PDB100_seq_ca.1       PDB100_seq.index      PDB100_seq_taxonomy@
PDB100_ca.dbtype   PDB100_h.index   PDB100_seq_ca.dbtype  PDB100_seq.lookup@    PDB100.source
PDB100_ca.index    PDB100.index     PDB100_seq_ca.index   PDB100_seq_mapping@   PDB100_ss
PDB100_clu         PDB100.lookup    PDB100_seq.dbtype     PDB100_seq.source@    PDB100_ss.dbtype
PDB100_clu.dbtype  PDB100_mapping   PDB100_seq_h.0@       PDB100_seq_ss.0@      PDB100_ss.index
PDB100_clu.index   PDB100_seq.0@    PDB100_seq_h.1        PDB100_seq_ss.1       PDB100_taxonomy
PDB100.dbtype      PDB100_seq.1     PDB100_seq_h.dbtype   PDB100_seq_ss.dbtype  PDB100.version
  • error log
[spacedust]$ spacedust clustersearch queryDB /Database/FoldSeekDB/PDB100 spacedust_result.tsv tmpFolder

MMseqs Version:                         2.e56c505
Substitution matrix                     aa:blosum62.out,nucl:nucleotide.out
Add backtrace                           true
Alignment mode                          2
Alignment mode                          0
Allow wrapped scoring                   false
E-value threshold                       10
Seq. id. threshold                      0
Min alignment length                    30
Seq. id. mode                           0
Alternative alignments                  0
Coverage threshold                      0.8
Coverage mode                           2
Max sequence length                     65535
Compositional bias                      1
Compositional bias                      1
Max reject                              2147483647
Max accept                              2147483647
Include identical seq. id.              false
Preload mode                            0
Pseudo count a                          substitution:1.100,context:1.400
Pseudo count b                          substitution:4.100,context:5.800
Score bias                              0
Realign hits                            false
Realign score bias                      -0.2
Realign max seqs                        2147483647
Correlation score weight                0
Gap open cost                           aa:11,nucl:5
Gap extension cost                      aa:1,nucl:2
Zdrop                                   40
Threads                                 128
Compressed                              0
Verbosity                               3
Seed substitution matrix                aa:VTML80.out,nucl:nucleotide.out
Sensitivity                             5.7
k-mer length                            0
k-score                                 seq:2147483647,prof:2147483647
Alphabet size                           aa:21,nucl:5
Max results per query                   300
Split database                          0
Split mode                              2
Split memory limit                      0
Diagonal scoring                        true
Exact k-mer matching                    0
Mask residues                           1
Mask residues probability               0.9
Mask lower case residues                0
Minimum diagonal score                  15
Selected taxa
Spaced k-mers                           1
Spaced k-mer pattern
Local temporary path
Rescore mode                            0
Remove hits by seq. id. and coverage    false
Sort results                            0
Mask profile                            1
Profile E-value threshold               0.001
Global sequence weighting               false
Allow deletions                         false
Filter MSA                              1
Use filter only at N seqs               0
Maximum seq. id. threshold              0.9
Minimum seq. id.                        0.0
Minimum score per column                -20
Minimum coverage                        0
Select N most diverse seqs              1000
Pseudo count mode                       0
Gap pseudo count                        10
Min codons in orf                       30
Max codons in length                    32734
Max orf gaps                            2147483647
Contig start mode                       2
Contig end mode                         2
Orf start mode                          1
Forward frames                          1,2,3
Reverse frames                          1,2,3
Translation table                       1
Translate orf                           0
Use all table starts                    false
Offset of numeric ids                   0
Create lookup                           0
Add orf stop                            false
Overlap between sequences               0
Sequence split mode                     1
Header split mode                       0
Chain overlapping alignments            0
Merge query                             1
Search type                             0
Search iterations                       1
Start sensitivity                       4
Search steps                            1
Exhaustive search mode                  false
Filter results during exhaustive search 0
Strand selection                        1
LCA search mode                         false
Disk space limit                        0
MPI runner
Force restart with latest tmp           false
Remove temporary files                  false
Use simple best hit                     true
Include sub-optimal hits with factor    0
Alpha                                   1
Aggregation mode                        0
Filter self match                       false
Multihit P-value cutoff                 0.01
Clustering and Ordering P-value cutoff  0.01
Maximum gene gaps                       3
Minimal cluster size                    2
Cluster weighting factor                false
Database output                         true
Cluster search against profiles         false
Cluster Search Mode                     0
Path to Foldseek                        /Software/Miniconda3/envs/spacedust/bin/foldseek

besthitbyset queryDB /Database/FoldSeekDB/PDB100 tmpFolder/966050721878520555/result_prefixed tmpFolder/966050721878520555/aggregate --simple-best-hit 1 --suboptimal-hits 0 --threads 128 --compressed 0 -v 3

No datafile could be found for /Database/FoldSeekDB/PDB100_member_to_set!
Error: aggregate best hit failed
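
For anyone reproducing this, a minimal sketch of a check for the file named in the error might look like the following. The helper name check_db_files is hypothetical, and the suffix list is an assumption: only PDB100_member_to_set is confirmed by the error message, while the .index and .dbtype companions are guessed from the naming pattern in the directory listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which companion files of a database prefix
# exist. Returns 1 if at least one of the requested files is missing.
check_db_files() {
    local prefix="$1"; shift
    local missing=0 suffix
    for suffix in "$@"; do
        if [ -e "${prefix}${suffix}" ]; then
            echo "found    ${prefix}${suffix}"
        else
            echo "missing  ${prefix}${suffix}"
            missing=1
        fi
    done
    return "$missing"
}

# Prefix taken from the failing command. Only _member_to_set is named in
# the error; the .index/.dbtype variants are assumptions.
check_db_files /Database/FoldSeekDB/PDB100 \
    "" .index .dbtype \
    _member_to_set _member_to_set.index _member_to_set.dbtype || true
```

In the listing above no PDB100_member_to_set entry appears, so this check would be expected to report it as missing, which matches the error.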
