-
Notifications
You must be signed in to change notification settings - Fork 42
Description
Is your feature request related to a problem? Please describe.
I noticed that iVar Trim was recently optimized quite a bit (awesome stuff!), and I noticed that (assuming I'm understanding the release descriptions properly) one of the optimizations is that a faster sorting implementation was incorporated to sort primers, and if I recall correctly, the primers need to be sorted to enable binary search to find the end of a primer that spans a certain position (for primer trimming). However, you can actually avoid binary search entirely, which could dramatically speed things up
Describe the solution you'd like
Because viral genomes are so short, at the beginning of iVar Trim's execution, you can precompute an array where index i contains the end position of the primer covering position i of the reference genome. Then, while trimming a given read that mapped to position i, you can perform a constant-time lookup (rather than binary search) to find the primer end position. A general summary of the preprocessing algorithm is as follows:
primer_end = array of length genome size
primers = sorted list of (start, end) tuples
primers_ind = 0
curr_primers = empty queue
for i from 0 to genome size:
while curr_primers.front()[1] < i:
curr_primers.dequeue()
while primers[primers_ind][0] >= i:
curr_primers.enqueue(primers[primers_ind])
primers_ind = primers_ind + 1
primer_end[i] = curr_primers.back()[1]
This slightly overly complicated approach allows overlapping primers, but if you can assume primers do not overlap:
primer_end = array of length genome size
primers = sorted list of (start, end) tuples
for start, end in primers:
for i from start to end:
primer_end[i] = end
Then, in your code, if a read maps to position i, you primer trim until the base that mapped to position primer_end[i]
Describe alternatives you've considered
N/A
Additional context
@kga1978 told me to make a GitHub Issue; happy to chat with you folks about this approach if you're interested 😄