Skip deduplication for libraries with UMI #188

@ChristopherBarrington

We have some DNase Hi-C data being produced that carries UMI information to identify PCR duplicates. My intention is to deduplicate the libraries at the FastQ level and then submit those duplicate-free FastQ files to distiller.

Is there a method that you would suggest for preprocessing these data with distiller? There doesn't look to be a straightforward option, but I had a look at the DSL1 Nextflow script and thought the following might work: duplicate the merge_split process, have the copy skip the deduplication step, and create empty files in place of the expected duplicate-related outputs. The choice of process could then be controlled by --params.skip_dedup in a when directive.

I gave it a go and it seemed to work, but I am worried that I may have missed something.
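
For illustration, a minimal DSL1 sketch of the idea described above. This is not distiller's actual code: the process names, channel names, file names and the `params.pairs` glob are all placeholders, and the `pairtools dedup` call only stands in for whatever the real deduplication step runs.

```groovy
// Nextflow DSL1 sketch. All names below are hypothetical placeholders.
params.skip_dedup = false
params.pairs      = 'pairs/*.pairs.gz'   // assumed location of the input pairs

// DSL1 channels can be read by one process only, so fork the input in two.
Channel
    .fromPath(params.pairs)
    .map { f -> tuple(f.simpleName, f) }
    .into { PAIRS_FOR_DEDUP; PAIRS_FOR_SKIP }

// Regular path: runs only when deduplication is wanted.
process dedup_library {
    input:
    set val(library), file(pairs) from PAIRS_FOR_DEDUP

    output:
    set val(library), file("${library}.nodups.pairs.gz"), file("${library}.dups.pairs.gz") into DEDUP_OUT

    when:
    !params.skip_dedup

    script:
    """
    pairtools dedup \
        --output ${library}.nodups.pairs.gz \
        --output-dups ${library}.dups.pairs.gz \
        ${pairs}
    """
}

// UMI path: passes the pairs through and writes an empty duplicates file
// so that downstream processes still find every output they expect.
process skip_dedup_library {
    input:
    set val(library), file(pairs) from PAIRS_FOR_SKIP

    output:
    set val(library), file("${library}.nodups.pairs.gz"), file("${library}.dups.pairs.gz") into SKIP_OUT

    when:
    params.skip_dedup

    script:
    """
    cp ${pairs} ${library}.nodups.pairs.gz
    : | gzip -c > ${library}.dups.pairs.gz
    """
}

// Only one of the two processes runs, so mixing gives a single stream.
DEDUP_OUT.mix(SKIP_OUT).set { LIB_PAIRS }
```

Because the two `when` conditions are mutually exclusive, downstream processes can read from `LIB_PAIRS` without knowing which path produced the files. One thing to check in a setup like this is whether any later step, such as a statistics report, assumes the duplicates file contains a header or at least one record.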
