Further speedup in iVar Trim primer finding

**Is your feature request related to a problem? Please describe.**
I noticed that iVar Trim was recently optimized quite a bit (awesome stuff!), and I noticed that (assuming I'm understanding the release descriptions properly) one of the optimizations is that a faster sorting implementation was incorporated to sort primers, and if I recall correctly, the primers need to be sorted to enable binary search to find the end of a primer that spans a certain position (for primer trimming). However, you can actually avoid binary search entirely, which could dramatically speed things up

**Describe the solution you'd like**
Because viral genomes are so short, at the beginning of iVar Trim's execution, you can precompute an array where index `i` contains the end position of the primer covering position `i` of the reference genome. Then, while trimming a given read that mapped to position `i`, you can perform a constant-time lookup (rather than binary search) to find the primer end position. A general summary of the preprocessing algorithm is as follows:

```
primer_end = array of length genome size
primers = sorted list of (start, end) tuples
primers_ind = 0
curr_primers = empty queue
for i from 0 to genome size:
    while curr_primers.front()[1] < i:
        curr_primers.dequeue()
    while primers[primers_ind][0] >= i:
        curr_primers.enqueue(primers[primers_ind])
        primers_ind = primers_ind + 1
    primer_end[i] = curr_primers.back()[1]
```

This slightly overly complicated approach allows overlapping primers, but if you can assume primers do not overlap:

```
primer_end = array of length genome size
primers = sorted list of (start, end) tuples
for start, end in primers:
    for i from start to end:
        primer_end[i] = end
```

Then, in your code, if a read maps to position `i`, you primer trim until the base that mapped to position `primer_end[i]`

**Describe alternatives you've considered**
N/A

**Additional context**
@kga1978 told me to make a GitHub Issue; happy to chat with you folks about this approach if you're interested 😄 


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Further speedup in iVar Trim primer finding #162

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Further speedup in iVar Trim primer finding #162

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions