Some reads are almost entirely soft-clipped #477

@marcelm

Strobealign sometimes produces alignments in which only a couple of bases are aligned and the rest of the read is soft-clipped. For example, running the tests/compare-baseline.sh script with option -s (single-end) currently produces the file baseline/bam/acc4cffe5ac2c4db266c58d00b7b6462c6b4189c.se.bam, in which read SRR6055476.83000 is mapped with CIGAR 2M149S and mapping quality 60. That doesn’t make sense: two aligned bases cannot justify such a high mapping quality. This particular case seems to be caused by a false-positive hit.

It would be better to report such extremely short hits as unmapped. The open question is what minimum number of aligned bases to require. It should definitely be at least $k$ but could be higher, perhaps $2k$?

In this dataset, there are 40 alignments shorter than $k=20$ and 664 shorter than $2k=40$.
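To make the proposed threshold concrete, here is a minimal Python sketch of how such short alignments could be identified from their CIGAR strings. It is not strobealign's implementation. The function names (`aligned_bases`, `is_too_short`, `count_short`) and the example CIGAR list are hypothetical, and counting only `M`, `=` and `X` operations as aligned bases is an assumption; counting insertions or measuring the reference span instead would be equally defensible.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_bases(cigar: str) -> int:
    """Count read bases aligned to the reference.

    Assumption: only M, = and X count; soft clips (S), insertions (I)
    and all other operations are ignored. A CIGAR of "*" yields 0.
    """
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=X")


def is_too_short(cigar: str, min_aligned: int) -> bool:
    """True if the alignment has fewer than min_aligned aligned bases."""
    return aligned_bases(cigar) < min_aligned


def count_short(cigars, min_aligned: int) -> int:
    """Number of alignments that would be rejected at this threshold."""
    return sum(1 for c in cigars if is_too_short(c, min_aligned))


k = 20  # value of k quoted in this issue
# Made-up CIGAR strings for illustration; only 2M149S comes from the issue.
cigars = ["2M149S", "151M", "30M121S", "100M2I49M"]

for threshold in (k, 2 * k):
    print(f"threshold {threshold}: {count_short(cigars, threshold)} short alignment(s)")
```

With these example CIGARs, `2M149S` is rejected at both thresholds, while `30M121S` is rejected only at $2k$, which is the kind of alignment the choice between $k$ and $2k$ decides.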
