k-mers and minimum recommended sequence length #806

@Ana-Mihaela-Lupan

Hi!

I am trying to find out the minimum amino acid sequence length needed to run a ColabFold prediction using MMseqs2 for the MSA. I have seen responses saying that MMseqs2 uses a k-mer size of 6, so the minimum length of a hit would be 13 in the gapped case and 12 in the ungapped case. I have also seen responses saying that MMseqs2 struggles with sequences shorter than 20 amino acids.
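
To make the length question concrete, here is a minimal sketch of how many overlapping k-mers a short sequence contains. It is only my own illustration and not MMseqs2 code: the function names `count_kmers` and `kmers` are hypothetical, and k = 6 is simply the value mentioned in the responses I read, not a confirmed default.

```python
from typing import List


def count_kmers(sequence_length: int, k: int = 6) -> int:
    """Return the number of overlapping k-mers in a sequence of this length.

    k = 6 is an assumption taken from forum responses, not a verified default.
    """
    if sequence_length < k:
        return 0  # a sequence shorter than k contains no complete k-mer
    return sequence_length - k + 1


def kmers(sequence: str, k: int = 6) -> List[str]:
    """List every overlapping k-mer of the sequence, left to right."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    # Compare a few short lengths, including the 12, 13 and 20 residue cases.
    for length in (5, 6, 12, 13, 20):
        print(length, count_kmers(length))
```

A sequence of length L has L - k + 1 overlapping k-mers, so very short peptides offer only a handful of k-mers to match against. This does not by itself explain where the 12/13 or 20 residue figures come from, which is what I am hoping to understand.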

Could you please clear this up for me and let me know which steps, settings, or arguments determine those limits? For example, when ColabFold uses MMseqs2 to build MSAs, either on the server or locally after constructing the databases, does it use a default k-mer size, and if so, what is it? I am sorry if this question is trivial, but I was not able to find a clear specification.

Are there any tests showing performance as a function of sequence length, or is this based on experience?

Thank you very much!
