74 commits
35da74a
Merge pull request #25 from nf-core/dev
atrull314 Oct 7, 2024
7e8db60
Merge pull request #39 from nf-core/dev
atrull314 Oct 11, 2024
500954e
Merge pull request #42 from nf-core/dev
atrull314 Mar 18, 2025
05d705a
Merge pull request #50 from nf-core/dev
atrull314 Jun 9, 2025
d9c226d
add "type" in samplesheet
ljwharbers Aug 6, 2025
b0c993b
Add 10X multiome whitelists
ljwharbers Aug 6, 2025
bed76ee
Wrap demultiplexing (blaze and flexiplex) in separate subworkflows
ljwharbers Aug 6, 2025
64e76b8
blaze subworkflow updates
ljwharbers Aug 6, 2025
8f1fbe9
add 10x multiome
ljwharbers Aug 6, 2025
e64e1a5
add flexiplex related modules
ljwharbers Aug 6, 2025
56f40a5
initial refactoring to include flexiplex
ljwharbers Aug 6, 2025
aed6f32
add whitelist_dna param
ljwharbers Aug 6, 2025
9180db1
change blaze ext args
ljwharbers Aug 6, 2025
9cd659d
added copper
ljwharbers Aug 6, 2025
90bd00f
replaced nanofilt with chopper
ljwharbers Aug 6, 2025
424ae35
update flexiplex version
ljwharbers Aug 7, 2025
c8b9368
add chopper and seqkit split2
ljwharbers Aug 7, 2025
27029d7
replacing blaze with flexiplex
ljwharbers Aug 7, 2025
b39352a
fixed flexiplex for cDNA libraries
ljwharbers Aug 7, 2025
99fea25
changed to .txt.gz whitelists
ljwharbers Aug 8, 2025
3b771ce
removing code that was unzipping and splitting fastq files for nanofi…
ljwharbers Aug 8, 2025
49c2de3
reverted flexiplex version to 1.02.3 because of a bug that will corru…
ljwharbers Aug 8, 2025
38eae51
adding modules
ljwharbers Aug 8, 2025
ad24d06
add dna_whitelist and cdna_whitelist params
ljwharbers Aug 8, 2025
399b8c8
adding mergebarcodefile module
ljwharbers Aug 8, 2025
8a1a0d8
adjusting modules.config for possible barcode formats
ljwharbers Aug 8, 2025
3bd6534
working version of demultiplex_flexiplex subworkflow
ljwharbers Aug 8, 2025
f7ad359
blaze and flexiplex adjustments and subworkflows integration
ljwharbers Aug 11, 2025
99fc266
add custom flexiplex barcode format option
ljwharbers Aug 11, 2025
2b09e88
minor changes for blaze to make it compatible with new potentila barc…
ljwharbers Aug 11, 2025
27b6b73
demultiplexing and channel structures for flexiplex and blaze compati…
ljwharbers Aug 11, 2025
216b2bb
adding DNA mapping and dedup subworkflow
ljwharbers Aug 11, 2025
1f42671
flipped split_amount between split BC and fastq to be more intuitive
ljwharbers Aug 12, 2025
3a8df1b
initial add of longread dna alignment subworkflow
ljwharbers Aug 14, 2025
cb6f92b
initial add of long read dna subworkflow in main workflow
ljwharbers Aug 14, 2025
3dc613c
formatting
ljwharbers Aug 14, 2025
029b066
fixed read counts to also work on flexiplex output
ljwharbers Aug 14, 2025
727aa1f
ability to add custom flexiplex barcode extraction
ljwharbers Aug 14, 2025
9381585
fixing some output publishdirs
ljwharbers Aug 14, 2025
29219a2
adding custom barcode options to schema
ljwharbers Aug 14, 2025
c018468
ensure whitelists are file objects
ljwharbers Aug 14, 2025
3d676c7
add flexiplex, seqkit and chopper to citations. remove nanofilt.
ljwharbers Aug 14, 2025
5fd1bf1
ensure whitelist is an empty channel, otherwise it complains
ljwharbers Aug 14, 2025
fc5f56a
update yaml for mergebarcodes
ljwharbers Aug 14, 2025
f537403
[automated] Fix code linting
nf-core-bot Aug 18, 2025
3954410
linting
ljwharbers Aug 18, 2025
9cfb142
Merge branch 'scdnalong' of https://github.com/ljwharbers/scnanoseq i…
ljwharbers Aug 18, 2025
9fc6dd9
linting
ljwharbers Aug 18, 2025
48f9309
linting
ljwharbers Aug 18, 2025
6f114b1
fixed flexiformatter docker url
ljwharbers Aug 18, 2025
e9cf356
fixed barcode outputs and publishdir
ljwharbers Aug 19, 2025
7ec8211
added flexiplex to output.md
ljwharbers Aug 19, 2025
f4e8d37
added usage docs
ljwharbers Aug 19, 2025
3e1b237
updated flexiplex version
ljwharbers Aug 19, 2025
e63fc6f
linting
ljwharbers Aug 19, 2025
213da77
Merge branch 'dev' into scdnalong
ljwharbers Aug 19, 2025
edf2e2f
linting
ljwharbers Aug 19, 2025
cfa7711
fixed mawk versioning
ljwharbers Sep 19, 2025
d52d140
fixed merging output
ljwharbers Sep 19, 2025
f68031c
update flexiformatter version to fix cb matching
ljwharbers Sep 19, 2025
ea939b2
added note to specify DNA is only compatible with flexiplex
ljwharbers Sep 19, 2025
52c042c
added dna and cdna specific demultiplex options
ljwharbers Sep 19, 2025
1d9df4e
added flexiformatter in dna subworkflow
ljwharbers Sep 19, 2025
a36b47f
fix versioning of mergebarcodecounts
ljwharbers Sep 19, 2025
99f1928
output docs updates
ljwharbers Sep 19, 2025
4485b66
Instantiating channels and updating parameter names
atrull314 Sep 24, 2025
34a81a4
fix demux tool references and update flexiformatter
ljwharbers Sep 25, 2025
1605c8c
merge
ljwharbers Sep 25, 2025
27c72f5
fix skip trimming
ljwharbers Oct 8, 2025
ad4142c
citations update
ljwharbers Dec 30, 2025
d87269d
add zenodo release
ljwharbers Dec 30, 2025
d851d33
pipeline lint
ljwharbers Dec 30, 2025
9c1ede6
update citations
ljwharbers Dec 30, 2025
63e01ff
merge dev
ljwharbers Dec 30, 2025
24 changes: 18 additions & 6 deletions CITATIONS.md
@@ -2,20 +2,32 @@

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.
> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

> Langer BE, Amaral A, Baudement MO, Bonath F, Charles M, Chitneedi PK, Clark EL, Di Tommaso P, Djebali S, Ewels PA, Eynard S, Fellows Yates JA, Fischer D, Floden EW, Foissac S, Gabernet G, Garcia MU, Gillard G, Gundappa MK, Guyomar C, Hakkaart C, Hanssen F, Harrison PW, Hörtenhuber M, Kurylo C, Kühn C, Lagarrigue S, Lallias D, Macqueen DJ, Miller E, Mir-Pedrol J, Moreira GCM, Nahnsen S, Patel H, Peltzer A, Pitel F, Ramayo-Caldas Y, Ribeiro-Dantas MDC, Rocha D, Salavati M, Sokolov A, Espinosa-Carrasco J, Notredame C, Community TN. Empowering bioinformatics communities with Nextflow and nf-core. Genome Biol. 2025 Jul 29;26(1):228. doi: 10.1186/s13059-025-03673-9. PMID: 40731283; PMCID: PMC12309086.
> Langer BE, Amaral A, Baudement MO, Bonath F, Charles M, Chitneedi PK, Clark EL, Di Tommaso P, Djebali S, Ewels PA, Eynard S, Fellows Yates JA, Fischer D, Floden EW, Foissac S, Gabernet G, Garcia MU, Gillard G, Gundappa MK, Guyomar C, Hakkaart C, Hanssen F, Harrison PW, Hörtenhuber M, Kurylo C, Kühn C, Lagarrigue S, Lallias D, Macqueen DJ, Miller E, Mir-Pedrol J, Moreira GCM, Nahnsen S, Patel H, Peltzer A, Pitel F, Ramayo-Caldas Y, Ribeiro-Dantas MDC, Rocha D, Salavati M, Sokolov A, Espinosa-Carrasco J, Notredame C, Community TN. Empowering bioinformatics communities with Nextflow and nf-core. Genome Biol. 2025 Jul 29;26(1):228. doi: 10.1186/s13059-025-03673-9. PMID: 40731283; PMCID: PMC12309086.

## [nf-core/scnanoseq](https://doi.org/10.1093/bioinformatics/btaf487)

> Trull A, Worthey EA, Ianov L. scnanoseq: an nf-core pipeline for Oxford Nanopore single-cell RNA-sequencing. Bioinformatics. 2025 Sep 1;41(9):btaf487. doi: 10.1093/bioinformatics/btaf487. PMID: 40905625; PMCID: PMC12449243.
> Trull A, Worthey EA, Ianov L. scnanoseq: an nf-core pipeline for Oxford Nanopore single-cell RNA-sequencing. Bioinformatics. 2025 Sep 1;41(9):btaf487. doi: 10.1093/bioinformatics/btaf487. PMID: 40905625; PMCID: PMC12449243.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

@atrull314 atrull314 Sep 18, 2025

I know you've added your flexiformatter tool, which is a public GitHub repo; it might be worth adding the link to the repo here. We've got a couple of tools like that (e.g. pigz) that don't have a true citation to add here.

## Pipeline tools

- [SeqKit2](https://pubmed.ncbi.nlm.nih.gov/38898985/)

> Shen W, Sipos B, Zhao L. SeqKit2: A Swiss army knife for sequence and alignment processing. Imeta. 2024 Apr 5;3(3):e191. doi: 10.1002/imt2.191. PMID: 38898985; PMCID: PMC11183193.

- [Flexiplex](https://pubmed.ncbi.nlm.nih.gov/38379414/)

> Cheng O, Ling MH, Wang C, Wu S, Ritchie ME, Göke J, Amin N, Davidson NM. Flexiplex: a versatile demultiplexer and search tool for omics data. Bioinformatics. 2024 Mar 4;40(3):btae102. doi: 10.1093/bioinformatics/btae102. PMID: 38379414; PMCID: PMC10914444.

- [Flexiformatter](https://github.com/ljwharbers/flexiformatter)

> Luuk Harbers. (2025). ljwharbers/flexiformatter: 1.0.6 (1.0.6). Zenodo. https://doi.org/10.5281/zenodo.18098066

- [BLAZE](https://pubmed.ncbi.nlm.nih.gov/37024980/)

> You Y, Prawer YDJ, De Paoli-Iseppi R, Hunt CPJ, Parish CL, Shim H, Clark MB. Identification of cell barcodes from long-read single-cell RNA-seq with BLAZE. Genome Biol. 2023 Apr 6;24(1):66. doi: 10.1186/s13059-023-02907-y. PMID: 37024980; PMCID: PMC10077662.
@@ -40,9 +52,9 @@

> De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics 2018 Aug 1; 34(15):2666-9 doi:10.1093/bioinformatics/bty149. PubMed PMID: 29547981; PubMed Central PMCID: PMC6061794.

- [Nanofilt](https://pubmed.ncbi.nlm.nih.gov/29547981/)
- [Chopper](https://pubmed.ncbi.nlm.nih.gov/37171891/)

> De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics 2018 Aug 1; 34(15):2666-9 doi:10.1093/bioinformatics/bty149. PubMed PMID: 29547981; PubMed Central PMCID: PMC6061794.
> De Coster W, Rademakers R. NanoPack2: population-scale evaluation of long-read sequencing data. Bioinformatics. 2023 May 4;39(5):btad311. doi: 10.1093/bioinformatics/btad311. PMID: 37171891; PMCID: PMC10196664.

- [NanoPlot](https://pubmed.ncbi.nlm.nih.gov/29547981/)

6 changes: 6 additions & 0 deletions assets/schema_input.json
@@ -23,6 +23,12 @@
},
"cell_count": {
"type": "integer"
},
"type": {
"type": "string",
"enum": ["dna", "cdna"],
"default": "cdna",
"errorMessage": "Type must be either 'dna' or 'cdna'. Default is 'cdna'."
}
},
"required": ["sample", "fastq", "cell_count"]
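
The new `type` field can be illustrated with a small shell check. This is only a sketch and not part of the pipeline: it assumes a comma-separated samplesheet whose header is `sample,fastq,cell_count,type` (the column order and the helper name `check_sample_types` are assumptions). It mirrors the schema by treating an empty `type` as `cdna` and rejecting anything other than `dna` or `cdna`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate the 4th column ("type") of a samplesheet.
# Assumes the header is sample,fastq,cell_count,type (an assumption, not from the PR).
check_sample_types() {
    local samplesheet=$1
    awk -F',' '
        NR == 1 { next }                        # skip the header line
        {
            type = ($4 == "") ? "cdna" : $4     # the schema default is "cdna"
            if (type != "dna" && type != "cdna") {
                printf "line %d: type must be either dna or cdna, got \"%s\"\n", NR, $4 > "/dev/stderr"
                bad = 1
            } else {
                print $1 "," type               # sample name and resolved type
            }
        }
        END { exit (bad ? 1 : 0) }              # non-zero exit if any row was invalid
    ' "$samplesheet"
}
```

Calling `check_sample_types samplesheet.csv` prints one `sample,type` pair per valid row and returns a non-zero status if any row carries an unsupported type.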
Binary file not shown.
Binary file added assets/whitelist/737K-august-2016.txt.gz
Binary file not shown.
Binary file removed assets/whitelist/737K-august-2016.txt.zip
Binary file not shown.
63 changes: 28 additions & 35 deletions bin/generate_read_counts.sh
@@ -1,19 +1,17 @@

get_fastqc_counts()
{
fastqc_file=$1
counts=$(unzip -p ${fastqc_file} $(basename ${fastqc_file} .zip)/fastqc_data.txt | \
counts=$(unzip -p "${fastqc_file}" "$(basename "${fastqc_file}" .zip)/fastqc_data.txt" | \
grep 'Total Sequences' | \
cut -f2 -d$'\t')
echo $counts

echo "$counts"
}

get_nanoplot_counts()
{
nanoplot_file=$1
counts=$(grep 'Number of reads' $nanoplot_file | awk '{print $NF}' | cut -f1 -d'.' | sed 's/,//g')
echo $counts
counts=$(grep 'Number of reads' "$nanoplot_file" | awk '{print $NF}' | cut -f1 -d'.' | sed 's/,//g')
echo "$counts"
}

output=""
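
To make the parsing in `get_nanoplot_counts` concrete, here is the same pipeline applied to a made-up NanoStats excerpt. The sample text is an assumption about what a `Number of reads` line can look like (a thousands separator and a trailing `.0`); the function body follows the quoted version from the diff.

```shell
#!/usr/bin/env bash
get_nanoplot_counts()
{
    nanoplot_file=$1
    # Take the last field of the matching line, drop the decimals, then drop the commas.
    counts=$(grep 'Number of reads' "$nanoplot_file" | awk '{print $NF}' | cut -f1 -d'.' | sed 's/,//g')
    echo "$counts"
}

# Illustrative input only; real NanoStats files contain many more lines.
stats=$(mktemp)
printf 'Mean read length:   1,050.3\nNumber of reads:    12,345.0\n' > "$stats"
get_nanoplot_counts "$stats"   # prints 12345
rm -f "$stats"
```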
@@ -22,27 +20,23 @@ input=""
while [[ $# -gt 0 ]]
do
flag=$1

case "${flag}" in
--input) input=$2; shift;;
--output) output=$2; shift;;
*) echo "Unknown option $1 ${reset}" && exit 1
*) echo "Unknown option $1" && exit 1
esac
shift
done

header=""
data=""

header="sample,base_fastq_counts,trimmed_read_counts,extracted_read_counts,corrected_read_counts"
echo "$header" > $output
echo "$header" > "$output"

for sample_name in $(for file in $(readlink -f $input)/*.tsv; do basename $file; done | cut -f1 -d'.' | sort -u)
do
###############
# INPUT_FILES #
###############
# Collect all sample names from both barcode file types
sample_names=$(find "$input" -type f -name "*.corrected_bc_umi.tsv" -o -name "*_known_barcodes.txt" | \
sed -E 's|.*/||' | sed -E 's/_known_barcodes\.txt$//; s/\.corrected_bc_umi\.tsv$//' | sort -u)

for sample_name in $sample_names
do
raw_fastqc="${sample_name}.raw_fastqc.zip"
raw_nanoplot="${sample_name}.raw_NanoStats.txt"

@@ -52,18 +46,18 @@ do
extract_fastqc="${sample_name}.extracted_fastqc.zip"
extract_nanoplot="${sample_name}.extracted_NanoStats.txt"

correct_csv="${sample_name}.corrected_bc_umi.tsv"
data="$(basename $sample_name)"
corrected_tsv="${sample_name}.corrected_bc_umi.tsv"
known_barcodes="${sample_name}_known_barcodes.txt"

data="$(basename "$sample_name")"

####################
# RAW FASTQ COUNTS #
####################
if [[ -s "$raw_fastqc" ]]
then
if [[ -s "$raw_fastqc" ]]; then
fastqc_counts=$(get_fastqc_counts "$raw_fastqc")
data="$data,$fastqc_counts"
elif [[ -s "$raw_nanoplot" ]]
then
elif [[ -s "$raw_nanoplot" ]]; then
nanoplot_counts=$(get_nanoplot_counts "$raw_nanoplot")
data="$data,$nanoplot_counts"
else
@@ -73,12 +67,10 @@ do
###############
# TRIM COUNTS #
###############
if [[ -s "$trim_fastqc" ]]
then
if [[ -s "$trim_fastqc" ]]; then
trim_counts=$(get_fastqc_counts "$trim_fastqc")
data="$data,$trim_counts"
elif [[ -s "$trim_nanoplot" ]]
then
elif [[ -s "$trim_nanoplot" ]]; then
nanoplot_counts=$(get_nanoplot_counts "$trim_nanoplot")
data="$data,$nanoplot_counts"
else
@@ -88,12 +80,10 @@ do
#####################
# PREEXTRACT COUNTS #
#####################
if [[ -s "$extract_fastqc" ]]
then
if [[ -s "$extract_fastqc" ]]; then
extract_counts=$(get_fastqc_counts "$extract_fastqc")
data="$data,$extract_counts"
elif [[ -s "$extract_nanoplot" ]]
then
elif [[ -s "$extract_nanoplot" ]]; then
nanoplot_counts=$(get_nanoplot_counts "$extract_nanoplot")
data="$data,$nanoplot_counts"
else
@@ -103,12 +93,15 @@ do
##################
# CORRECT COUNTS #
##################
if [[ -s $correct_csv ]]
then
correct_counts=$(cut -f6 $correct_csv | awk '{if ($0 != "") {print $0}}' | wc -l)
if [[ -s "$known_barcodes" ]]; then
correct_sum=$(awk -F'\t' '{if ($2 != "") sum += $2} END {print sum}' "$known_barcodes")
data="$data,$correct_sum"
elif [[ -s "$corrected_tsv" ]]; then
correct_counts=$(cut -f6 "$corrected_tsv" | awk '{if ($0 != "") print $0}' | wc -l)
data="$data,$correct_counts"
else
data="$data,"
fi
echo "$data" >> $output

echo "$data" >> "$output"
done
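
One detail in the new sample-name collection is worth spelling out: in `find`, the implicit AND binds tighter than `-o`, so `-type f -name A -o -name B` applies `-type f` only to the first pattern. A possible variant, sketched here under the assumption that both patterns should match regular files only, groups the two `-name` tests explicitly. The function names `collect_sample_names` and `sum_known_barcodes` are hypothetical, and the `sum + 0` in the second helper is an addition so that an empty tally prints `0` rather than a blank field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list unique sample names from both barcode file types.
# The \( ... \) grouping makes -type f apply to both -name patterns.
collect_sample_names() {
    local input_dir=$1
    find "$input_dir" -type f \( -name "*.corrected_bc_umi.tsv" -o -name "*_known_barcodes.txt" \) | \
        sed -E 's|.*/||' | \
        sed -E 's/_known_barcodes\.txt$//; s/\.corrected_bc_umi\.tsv$//' | \
        sort -u
}

# Hypothetical helper: sum column 2 of a tab-separated barcode count file.
# "sum + 0" forces a numeric result even when no row contributes (an assumption).
sum_known_barcodes() {
    awk -F'\t' '$2 != "" { sum += $2 } END { print sum + 0 }' "$1"
}
```

A loop such as `while IFS= read -r sample_name; do ...; done < <(collect_sample_names "$input")` would also avoid word splitting on sample names that contain spaces, at the cost of requiring bash.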