Skip deduplication for libraries with UMI

We are producing DNase Hi-C data with UMIs that identify PCR duplicates. I intend to deduplicate the libraries at the FastQ level and then submit the duplicate-free FastQ files to distiller.

Is there a method you would suggest for processing these data with distiller? There doesn't appear to be a straightforward option, but I looked at the DSL1 Nextflow script and thought the following might work: duplicate the `merge_split` process, have the copy skip the deduplication step, and have it create empty files in place of the expected duplicate-related outputs. The choice of process could then be controlled by a `params.skip_dedup` flag (set with `--skip_dedup` on the command line) checked in a `when` directive.

I gave it a go and it seemed to work, but I am worried that I may have missed something.
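
For reference, the approach described above could be sketched roughly as follows. This is a minimal DSL1 illustration, not distiller's actual code: the process names (`merge_dedup`, `merge_nodedup`), the channel names, the `params.pairsam_glob` input pattern, and the output file names are all hypothetical, and the real script's inputs and outputs will differ.

```nextflow
// Hypothetical DSL1 sketch; names are illustrative, not copied from distiller.
params.skip_dedup   = false                 // override with `--skip_dedup true`
params.pairsam_glob = 'pairsam/*.pairsam.gz' // assumed input layout

// Group sorted pairsam chunks by library name (text before the first dot).
Channel
    .fromPath(params.pairsam_glob)
    .map { f -> [f.simpleName, f] }
    .groupTuple()
    .into { PAIRSAMS_DEDUP; PAIRSAMS_NODEDUP }  // DSL1 channels are single-use

process merge_dedup {
    input:
    set val(library), file(pairsams) from PAIRSAMS_DEDUP

    output:
    set val(library), file("${library}.nodups.pairs.gz"),
        file("${library}.dups.pairs.gz"),
        file("${library}.dedup.stats") into PAIRS_DEDUP

    when:
    !params.skip_dedup

    script:
    """
    pairtools merge ${pairsams} \
        | pairtools dedup \
            --output ${library}.nodups.pairs.gz \
            --output-dups ${library}.dups.pairs.gz \
            --output-stats ${library}.dedup.stats
    """
}

process merge_nodedup {
    input:
    set val(library), file(pairsams) from PAIRSAMS_NODEDUP

    output:
    set val(library), file("${library}.nodups.pairs.gz"),
        file("${library}.dups.pairs.gz"),
        file("${library}.dedup.stats") into PAIRS_NODEDUP

    when:
    params.skip_dedup

    script:
    """
    # Merge only; duplicates were already removed upstream using the UMIs.
    pairtools merge ${pairsams} -o ${library}.nodups.pairs.gz
    # Empty placeholders so downstream steps still find the expected files.
    touch ${library}.dups.pairs.gz ${library}.dedup.stats
    """
}

// Only one branch emits items, so downstream code can read one merged channel.
PAIRS_DEDUP.mix(PAIRS_NODEDUP).set { LIB_PAIRS }
```

Two points may be worth checking in a setup like this. Both branches must emit tuples of the same shape, otherwise downstream processes will not match their inputs. Also, a zero-byte placeholder is not a valid gzip stream, so any later step that actually reads the duplicates file or parses the stats file may need the placeholder to contain minimal valid content instead.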

Skip deduplication for libraries with UMI #188