Hi!
I am trying to find out the minimum amino acid sequence length needed to run a ColabFold prediction using MMseqs2 for the MSA. I have seen responses saying that MMseqs2 uses a k-mer size of 6, so the minimum length of a hit would be 13 in the gapped case and 12 in the ungapped case. I have also seen responses saying that MMseqs2 struggles with sequences shorter than 20 amino acids.
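To make the arithmetic concrete, here is a minimal sketch of how I understand those numbers. It assumes the k-mer size of 6 quoted in those responses (I have not confirmed it is the default), and it assumes that a hit needs two non-overlapping k-mers, which is only my interpretation. The function names are made up for illustration and are not part of MMseqs2 or ColabFold.

```python
def kmer_count(seq_len: int, k: int = 6) -> int:
    """Number of overlapping k-mers in a sequence of length seq_len."""
    return max(0, seq_len - k + 1)


def min_len_two_kmers(k: int = 6, gap: int = 0) -> int:
    """Shortest span holding two non-overlapping k-mers separated by `gap` residues.

    Assumption: a hit needs two such k-mers; this is my reading of the
    quoted responses, not documented MMseqs2 behavior.
    """
    return 2 * k + gap


if __name__ == "__main__":
    for n in (5, 6, 12, 13, 20):
        print(f"length {n}: {kmer_count(n)} k-mers of size 6")
    print("two adjacent 6-mers span", min_len_two_kmers(), "residues")
    print("two 6-mers with a 1-residue gap span", min_len_two_kmers(gap=1), "residues")
```

If that reading is right, it would explain the 12 and 13, but it would not explain why sequences shorter than 20 residues are reported to be a problem.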
Could you please clear this up for me and let me know which steps/settings/arguments set those limits? For example, when ColabFold uses MMseqs2 to build MSAs, either on the server or locally after constructing the databases, does it use a default k-mer size, and if so, what is it? I am sorry if this question is trivial, but I was not able to find a clear specification.
Are there any tests showing performance as a function of sequence length, or is this based on experience?
Thank you very much!