Build branch main with version main (1e1ffb3)

Build pipeline: vsh-ci-dev-jsbwk Source commit: 1e1ffb315f Source message: Merge pull request #17 from viash-hub/add_biobox_modules - Migrate a number of components to biobox - Fix tests - Reduce size of test resources - Prepare for Viash Hub
2024-09-13 07:41:13 +00:00
commit 1ebb61f1e8
557 changed files with 430700 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,5 @@
 .nextflow*
 work
 testData
 test_results
 target
--- a/README.md
+++ b/README.md
@@ -0,0 +1,136 @@
 # RNAseq.vsh
 <!-- README.md is generated by running 'quarto render README.qmd' -->
 A version of the [nf-core/rnaseq](https://github.com/nf-core/rnaseq)
 pipeline (version 3.14.0) in the [Viash framework](http://www.viash.io).
 ## Rationale
 We stick to the original nf-core pipeline as much as possible. This also
 means that we create a subworkflow for each of the 5 main stages of the
 pipeline as depicted in the [README](https://github.com/nf-core/rnaseq).
 ## Getting started
 As test data, we can use the small dataset nf-core provided with [their
 `test`
 profile](https://github.com/nf-core/test-datasets/blob/rnaseq3/samplesheet/v3.10/samplesheet_test.csv):
 <https://github.com/nf-core/test-datasets/tree/rnaseq3/testdata/GSE110004>.
 A simple script has been provided to fetch those files from the GitHub
 repository and store them under `testData/minimal_test` (the
 subdirectory is created to support `full_test` later as well):
 `bin/get_minimal_test_data.sh`.
 Additionally, a script has been provided to fetch some additional
 resources for unit testing the components. These will be stored under
 `testData/unit_test_resources`: `bin/get_unit_test_data.sh`.
 To get started, we need to:
 1.  [Install
    `nextflow`](https://www.nextflow.io/docs/latest/getstarted.html)
    system-wide
 2.  Fetch the test data:
 ``` bash
 bin/minimal_test.sh
 bin/get_minimal_test_data.sh
 ```
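 After fetching, it can help to confirm that the expected files actually
 arrived. The following is a minimal sketch and not part of this
 repository: the helper name `report_missing` is hypothetical, and the
 file names in the example call are taken from the download list in
 `bin/get_minimal_test_data.sh`.
 ```bash
 # Hypothetical helper (not part of this repository): print every file from
 # the argument list that is missing or empty under the given directory.
 # Returns 0 when all files are present and 1 otherwise.
 report_missing() {
   local dir="$1"; shift
   local f missing=0
   for f in "$@"; do
     if [ ! -s "$dir/$f" ]; then
       echo "missing: $dir/$f"
       missing=1
     fi
   done
   return "$missing"
 }
 # Example call, assuming the layout created by bin/get_minimal_test_data.sh:
 # report_missing testData/minimal_test/reference genome.fasta genes.gtf.gz transcriptome.fasta
 ```
 The check uses `-s` rather than `-f` so that an empty file left behind by
 an interrupted download is also reported.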
 ## Running the pipeline
 To actually run the pipeline, we first need to build the components and
 pipeline:
 ``` bash
 viash ns build --setup cb --parallel
 ```
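 If the build or the run commands fail early, a quick check that the
 required tools are on the `PATH` can narrow things down. The commands in
 this README use `viash`, `nextflow`, and the `docker` profile. A minimal
 sketch, where the function name `missing_tools` is hypothetical and not
 part of this repository:
 ```bash
 # Hypothetical helper: print each named command that cannot be found on PATH.
 # Prints nothing when every command is available.
 missing_tools() {
   local tool
   for tool in "$@"; do
     command -v "$tool" >/dev/null 2>&1 || echo "$tool"
   done
 }
 # Example call:
 # missing_tools viash nextflow docker
 ```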
 Now we can run the pipeline using the command:
 ``` bash
 nextflow run target/nextflow/workflows/pre_processing/main.nf \
  -profile docker \
  --id test \
 --fastq_1 testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz \
  --publish_dir testData/test_output/
 ```
 Alternatively, we can run the pipeline with a sample sheet using the
 built-in `--param_list` functionality. Read file paths must be
 specified relative to the sample sheet’s path:
 ``` bash
 cat > testData/minimal_test/input_fastq/sample_sheet.csv << HERE
 id,fastq_1,fastq_2,strandedness
 WT_REP1,SRR6357070_1.fastq.gz;SRR6357071_1.fastq.gz,SRR6357070_2.fastq.gz;SRR6357071_2.fastq.gz,reverse
 WT_REP2,SRR6357072_1.fastq.gz,SRR6357072_2.fastq.gz,reverse
 RAP1_UNINDUCED_REP1,SRR6357073_1.fastq.gz,,reverse
 HERE
 nextflow run target/nextflow/workflows/rnaseq/main.nf \
  --param_list testData/minimal_test/input_fastq/sample_sheet.csv \
  --publish_dir "test_results/full_pipeline_test" \
  --fasta testData/minimal_test/reference/genome.fasta \
  --gtf testData/minimal_test/reference/genes.gtf.gz \
  --transcript_fasta testData/minimal_test/reference/transcriptome.fasta \
  -profile docker
 ```
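 Because read paths are resolved relative to the sample sheet, a typo in
 the sheet only surfaces once the pipeline starts. A minimal sketch for
 checking the sheet beforehand, assuming the column layout shown above
 (`id,fastq_1,fastq_2,strandedness`, with `;` separating multiple files in
 one field) and file names without spaces. `check_sample_sheet` is a
 hypothetical helper, not part of this repository or of Viash:
 ```bash
 # Hypothetical helper: report every read file named in the sample sheet
 # that does not exist relative to the sheet's own directory.
 # Returns 0 when all files are found and 1 otherwise.
 check_sample_sheet() {
   local sheet="$1"
   local base header id fq1 fq2 strand f status=0
   base=$(dirname "$sheet")
   {
     read -r header   # skip the header row
     while IFS=, read -r id fq1 fq2 strand || [ -n "$id" ]; do
       [ -n "$id" ] || continue
       # Both read columns may hold several files separated by ';'.
       for f in $(printf '%s;%s' "$fq1" "$fq2" | tr ';' ' '); do
         if [ ! -f "$base/$f" ]; then
           echo "$id: missing $f"
           status=1
         fi
       done
     done
   } < "$sheet"
   return "$status"
 }
 # Example call:
 # check_sample_sheet testData/minimal_test/input_fastq/sample_sheet.csv
 ```
 An empty `fastq_2` field, as in the single-end `RAP1_UNINDUCED_REP1` row,
 simply contributes no files to the check.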
 ## Pipeline sub-workflows and components
 The pipeline has 6 sub-workflows that can be run separately.
 1.  Prepare genome: This is a workflow for preparing all the reference
    data required for downstream analysis, i.e., uncompress provided
    reference data or generate the required index files (for STAR,
    Salmon, Kallisto, RSEM, BBSplit).
 2.  Pre-processing: This is a workflow for performing quality control on
    the input reads. It performs FastQC, extracts UMIs, trims adapters,
    and removes ribosomal RNA reads. Adapters can be trimmed using
    either Trim Galore! or fastp (work in progress).
 3.  Genome alignment and quantification: This is a workflow for
    performing genome alignment using STAR and transcript quantification
    using Salmon or RSEM (using RSEM’s built-in support for STAR) (work
    in progress). Alignment sorting and indexing, as well as computation
    of statistics from the BAM files, are performed using Samtools.
    UMI-based deduplication is also performed.
 4.  Post-processing: This is a workflow for duplicate read marking
    (picard MarkDuplicates), transcript assembly and quantification
    (StringTie), and creation of bigWig coverage files.
 5.  Pseudo alignment and quantification: This is a workflow for
    performing pseudo alignment and transcript quantification using
    Salmon or Kallisto.
 6.  Final QC: This is a workflow for performing extensive quality
    control (RSeQC, dupRadar, Qualimap, Preseq, DESeq2, featureCounts).
    It presents QC for raw reads, alignments, gene biotype, sample
    similarity, and strand specificity (MultiQC).
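 Each sub-workflow is built as its own Nextflow entry point. The run
 commands above use `target/nextflow/workflows/pre_processing/main.nf` and
 `target/nextflow/workflows/rnaseq/main.nf`. Assuming the other
 sub-workflows are built under the same directory (their exact names are
 not listed here), the following sketch lists the available entry points
 after `viash ns build`. `list_workflows` is a hypothetical helper name:
 ```bash
 # Hypothetical helper: list every main.nf below the workflows directory.
 # The default path is taken from the run commands in this README.
 list_workflows() {
   local root="${1:-target/nextflow/workflows}"
   find "$root" -name main.nf | sort
 }
 # Example call:
 # list_workflows
 ```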
 ## Reusing components from biobox
 At the moment, this pipeline makes use of the following components from
 [biobox](https://github.com/viash-hub/biobox):
 - `gffread`
 - `star/star_genome_generate`
 - `star/star_align_reads`
 - `salmon/salmon_index`
 - `salmon/salmon_quant`
 - `featurecounts`
 - `samtools/samtools_sort`
 - `samtools/samtools_index`
 - `samtools/samtools_stats`
 - `samtools/samtools_flagstat`
 - `samtools/samtools_idxstats`
 - `multiqc` (work in progress - updating `assets/multiqc_config.yml`)
 - `fastp` (work in progress)
 - `rsem/rsem_prepare_reference` (work in progress)
 - `rsem/rsem_calculate_expression` (work in progress)
--- a/README.qmd
+++ b/README.qmd
@@ -0,0 +1,107 @@
 ---
 title: RNAseq.vsh
 format: gfm
 ---
 <!-- README.md is generated by running 'quarto render README.qmd' -->
 ```{r, echo = FALSE, message = FALSE, error = FALSE, warning = FALSE}
 library(tidyverse)
 ```
 A version of the [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline (version 3.14.0) in the [Viash framework](http://www.viash.io).
 ## Rationale
 We stick to the original nf-core pipeline as much as possible. This also means that we create a subworkflow for each of the 5 main stages of the pipeline as depicted in the [README](https://github.com/nf-core/rnaseq).
 ## Getting started
 As test data, we can use the small dataset nf-core provided with [their `test` profile](https://github.com/nf-core/test-datasets/blob/rnaseq3/samplesheet/v3.10/samplesheet_test.csv): <https://github.com/nf-core/test-datasets/tree/rnaseq3/testdata/GSE110004>.
 A simple script has been provided to fetch those files from the GitHub repository and store them under `testData/minimal_test` (the subdirectory is created to support `full_test` later as well): `bin/get_minimal_test_data.sh`.
 Additionally, a script has been provided to fetch some additional resources for unit testing the components. These will be stored under `testData/unit_test_resources`: `bin/get_unit_test_data.sh`.
 To get started, we need to:
 1.  [Install `nextflow`](https://www.nextflow.io/docs/latest/getstarted.html) system-wide
 2.  Fetch the test data:
 ``` bash
 bin/minimal_test.sh
 bin/get_minimal_test_data.sh
 ```
 ## Running the pipeline
 To actually run the pipeline, we first need to build the components and pipeline:
 ``` bash
 viash ns build --setup cb --parallel
 ```
 Now we can run the pipeline using the command:
 ``` bash
 nextflow run target/nextflow/workflows/pre_processing/main.nf \
  -profile docker \
  --id test \
  --fastq_1 testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz \
  --publish_dir testData/test_output/
 ```
 Alternatively, we can run the pipeline with a sample sheet using the built-in `--param_list` functionality. Read file paths must be specified relative to the sample sheet's path:
 ``` bash
 cat > testData/minimal_test/input_fastq/sample_sheet.csv << HERE
 id,fastq_1,fastq_2,strandedness
 WT_REP1,SRR6357070_1.fastq.gz;SRR6357071_1.fastq.gz,SRR6357070_2.fastq.gz;SRR6357071_2.fastq.gz,reverse
 WT_REP2,SRR6357072_1.fastq.gz,SRR6357072_2.fastq.gz,reverse
 RAP1_UNINDUCED_REP1,SRR6357073_1.fastq.gz,,reverse
 HERE
 nextflow run target/nextflow/workflows/rnaseq/main.nf \
  --param_list testData/minimal_test/input_fastq/sample_sheet.csv \
  --publish_dir "test_results/full_pipeline_test" \
  --fasta testData/minimal_test/reference/genome.fasta \
  --gtf testData/minimal_test/reference/genes.gtf.gz \
  --transcript_fasta testData/minimal_test/reference/transcriptome.fasta \
  -profile docker
 ```
 ## Pipeline sub-workflows and components
 The pipeline has 6 sub-workflows that can be run separately.
 1. Prepare genome: This is a workflow for preparing all the reference data required for downstream analysis, i.e., uncompress provided reference data or generate the required index files (for STAR, Salmon, Kallisto, RSEM, BBSplit).
 2. Pre-processing: This is a workflow for performing quality control on the input reads. It performs FastQC, extracts UMIs, trims adapters, and removes ribosomal RNA reads. Adapters can be trimmed using either Trim Galore! or fastp (work in progress).
 3. Genome alignment and quantification: This is a workflow for performing genome alignment using STAR and transcript quantification using Salmon or RSEM (using RSEM's built-in support for STAR) (work in progress). Alignment sorting and indexing, as well as computation of statistics from the BAM files, are performed using Samtools. UMI-based deduplication is also performed.
 4. Post-processing: This is a workflow for duplicate read marking (picard MarkDuplicates), transcript assembly and quantification (StringTie), and creation of bigWig coverage files.
 5. Pseudo alignment and quantification: This is a workflow for performing pseudo alignment and transcript quantification using Salmon or Kallisto. 
 6. Final QC: This is a workflow for performing extensive quality control (RSeQC, dupRadar, Qualimap, Preseq, DESeq2, featureCounts). It presents QC for raw reads, alignments, gene biotype, sample similarity, and strand specificity (MultiQC).
 ## Reusing components from biobox
 At the moment, this pipeline makes use of the following components from [biobox](https://github.com/viash-hub/biobox):
 * `gffread`
 * `star/star_genome_generate`
 * `star/star_align_reads`
 * `salmon/salmon_index`
 * `salmon/salmon_quant`
 * `featurecounts`
 * `samtools/samtools_sort`
 * `samtools/samtools_index`
 * `samtools/samtools_stats`
 * `samtools/samtools_flagstat`
 * `samtools/samtools_idxstats`
 * `multiqc` (work in progress - updating `assets/multiqc_config.yml`)
 * `fastp` (work in progress)
 * `rsem/rsem_prepare_reference` (work in progress)
 * `rsem/rsem_calculate_expression` (work in progress)
--- a/_viash.yaml
+++ b/_viash.yaml
@@ -0,0 +1,13 @@
 viash_version: 0.9.0
 source: src
 target: target
 info:
  test_resources:
    - path: gs://viash-hub-test-data/rnaseq/v1
      dest: testData
 config_mods: |
  .requirements.commands := ['ps']
  .runners[.type == 'nextflow'].directives.tag := '$id'
--- a/assets/methods_description_template.yml
+++ b/assets/methods_description_template.yml
@@ -0,0 +1,25 @@
 id: "rnaseq.vsh-methods-description"
 description: "Suggested text and references to use when describing pipeline usage within the methods section of a publication."
 section_name: "nf-core/rnaseq Methods Description"
 section_href: "https://github.com/nf-core/rnaseq"
 plot_type: "html"
 data: |
  <h4>Methods</h4>
  <p>Data was processed using rnaseq.vsh, which is a version of the nf-core/rnaseq (v.3.14.0) workflow written using the Viash framework.</p>
  <p>The pipeline was executed with Nextflow v${workflow.nextflow.version} (<a href="https://doi.org/10.1038/nbt.3820">Di Tommaso <em>et al.</em>, 2017</a>) with the following command:</p>
  <pre><code>${workflow.commandLine}</code></pre>
  <h4>References</h4>
  <ul>
    <li>Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. <a href="https://doi.org/10.1038/nbt.3820">https://doi.org/10.1038/nbt.3820</a></li>
    <li>Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276-278. <a href="https://doi.org/10.1038/s41587-020-0439-x">https://doi.org/10.1038/s41587-020-0439-x</a></li>
    <li>VIASH</li>
  </ul>
  <div class="alert alert-info">
    <h5>Notes:</h5>
    <ul>
      ${nodoi_text}
      <li>The command above does not include parameters contained in any configs or profiles that may have been used. Ensure the config file is also uploaded with your publication!</li>
      <li>You should also cite all software used within this run. Check the "Software Versions" of this report to get version information.</li>
    </ul>
  </div>
--- a/assets/multiqc/biotypes_header.txt
+++ b/assets/multiqc/biotypes_header.txt
@@ -0,0 +1,11 @@
 # id: 'biotype_counts'
 # section_name: 'Biotype Counts'
 # description: "shows reads overlapping genomic features of different biotypes,
 #     counted by <a href='http://bioinf.wehi.edu.au/featureCounts'>featureCounts</a>."
 # plot_type: 'bargraph'
 # anchor: 'featurecounts_biotype'
 # pconfig:
 #     id: "featurecounts_biotype_plot"
 #     title: "featureCounts: Biotypes"
 #     xlab: "# Reads"
 #     cpswitch_counts_label: "Number of Reads"
--- a/assets/multiqc/deseq2_clustering_header.txt
+++ b/assets/multiqc/deseq2_clustering_header.txt
@@ -0,0 +1,12 @@
 #id: 'deseq2_clustering'
 #section_name: 'DESeq2 sample similarity'
 #description: "is generated from clustering by Euclidean distances between
 #	       <a href='https://bioconductor.org/packages/release/bioc/html/DESeq2.html' target='_blank'>DESeq2</a>
 #              rlog values for each sample
 #              in the <a href='https://github.com/nf-core/rnaseq/blob/master/bin/deseq2_qc.r'><code>deseq2_qc.r</code></a> script."
 #plot_type: 'heatmap'
 #anchor: 'deseq2_clustering'
 #pconfig:
 #    title: 'DESeq2: Heatmap of the sample-to-sample distances'
 #    xlab: True
 #    reverseColors: True
--- a/assets/multiqc/deseq2_pca_header.txt
+++ b/assets/multiqc/deseq2_pca_header.txt
@@ -0,0 +1,11 @@
 #id: 'deseq2_pca'
 #section_name: 'DESeq2 PCA plot'
 #description: "PCA plot between samples in the experiment.
 #              These values are calculated using <a href='https://bioconductor.org/packages/release/bioc/html/DESeq2.html'>DESeq2</a>
 #              in the <a href='https://github.com/nf-core/rnaseq/blob/master/bin/deseq2_qc.r'><code>deseq2_qc.r</code></a> script."
 #plot_type: 'scatter'
 #anchor: 'deseq2_pca'
 #pconfig:
 #    title: 'DESeq2: Principal component plot'
 #    xlab: PC1
 #    ylab: PC2
--- a/assets/multiqc_config.yml
+++ b/assets/multiqc_config.yml
@@ -0,0 +1,167 @@
 report_comment: >
  This report has been generated by the <a href="https://github.com/data-intuitive/rnaseq.vsh">rnaseq.vsh</a>
  analysis pipeline.
 report_section_order:
  "rnaseq.vsh-methods-description":
    order: -1000
  software_versions:
    order: -1001
  "rnaseq.vsh-summary":
    order: -1002
 export_plots: true
 # Run only these modules
 run_modules:
  - custom_content
  - fastqc
  - cutadapt
  - fastp
  - sortmerna
  - star
  # - hisat2
  - rsem
  - salmon
  - kallisto
  - samtools
  - picard
  - preseq
  - rseqc
  - qualimap
 # Order of modules
 top_modules:
  - "fail_trimming"
  - "fail_mapping"
  - "fail_strand"
  - "star_rsem_deseq2_pca"
  - "star_rsem_deseq2_clustering"
  - "star_salmon_deseq2_pca"
  - "star_salmon_deseq2_clustering"
  - "salmon_deseq2_pca"
  - "salmon_deseq2_clustering"
  - "kallisto_deseq2_pca"
  - "kallisto_deseq2_clustering"
  - "biotype_counts"
  - "dupradar"
 module_order:
  - fastqc:
      name: "FastQC (raw)"
      info: "This section of the report shows FastQC results before adapter trimming."
      path_filters:
        - "*.read_*.fastqc.zip"
  - cutadapt
  - fastp
  - fastqc:
      name: "FastQC (trimmed)"
      info: "This section of the report shows FastQC results after adapter trimming."
      path_filters:
        - "*.trimgalore.read_*.fastqc.zip"
 # Don't show % Dups in the General Stats table (we have this from Picard)
 table_columns_visible:
  fastqc:
    percent_duplicates: False
 extra_fn_clean_exts:
  - ".salmon_quant"
  - ".mapping_quality"
  - ".genome_sorted"
  - ".MarkDuplicates"
  - ".MarkDuplicates_flagstat"
  - ".MarkDuplicates_stats"
  - ".genome_sorted_MarkDuplicates"
  - ".star_aligned"  
  - ".read_1"  
  - ".read_2"   
 # See https://github.com/ewels/MultiQC_TestData/blob/master/data/custom_content/with_config/table_headerconfig/multiqc_config.yaml
 custom_data:
  fail_trimming:
    section_name: "WARNING: Fail Trimming Check"
    description: "List of samples that failed the minimum trimmed reads threshold specified via the '--min_trimmed_reads' parameter, and hence were ignored for the downstream processing steps."
    plot_type: "table"
    pconfig:
      id: "fail_trimmed_samples_table"
      table_title: "Samples failed trimming threshold"
      namespace: "Samples failed trimming threshold"
      format: "{:.0f}"
  fail_mapping:
    section_name: "WARNING: Fail Alignment Check"
    description: "List of samples that failed the STAR minimum mapped reads threshold specified via the '--min_mapped_reads' parameter, and hence were ignored for the downstream processing steps."
    plot_type: "table"
    pconfig:
      id: "fail_mapped_samples_table"
      table_title: "Samples failed mapping threshold"
      namespace: "Samples failed mapping threshold"
      format: "{:.2f}"
  fail_strand:
    section_name: "WARNING: Fail Strand Check"
    description: "List of samples that failed the strandedness check between that provided in the samplesheet and calculated by the <a href='http://rseqc.sourceforge.net/#infer-experiment-py'>RSeQC infer_experiment.py</a> tool."
    plot_type: "table"
    pconfig:
      id: "fail_strand_check_table"
      table_title: "Samples failed strandedness check"
      namespace: "Samples failed strandedness check"
      format: "{:.2f}"
 # Customise the module search patterns to speed up execution time
 #  - Skip module sub-tools that we are not interested in
 #  - Replace file-content searching with filename pattern searching
 #  - Don't add anything that is the same as the MultiQC default
 # See https://multiqc.info/docs/#optimise-file-search-patterns for details
 sp:
  fastqc/zip:
    fn: "*.fastqc.zip"
  cutadapt:
    fn: "*.trimming_report.txt"
  fastp:
    fn: "*.fastp.json"
  sortmerna:
    fn: "*sortmerna*.log"
  star:
    fn: "*.star_aligned.log.final.out"
  # hisat2:
  #   fn: "*.hisat2.summary.log"
  salmon/meta:
    fn: "*meta_info.json"
  preseq:
    fn: "*.lc_extrap.txt"
  samtools/stats:
    fn: "*.stats"
  samtools/flagstat:
    fn: "*.flagstat"
  samtools/idxstats:
    fn: "*.idxstats*"
  rseqc/bam_stat:
    fn: "*.mapping_quality.txt"
  rseqc/junction_saturation:
    fn: "*.junction_saturation_plot.r"
  rseqc/junction_annotation:
    fn: "*.junction_annotation.log"
  rseqc/read_duplication_pos:
    fn: "*.duplication_rate_mapping.xls"
  rseqc/read_distribution:
    fn: "*.read_distribution.txt"
  rseqc/infer_experiment:
    fn: "*.strandedness.txt"
  rseqc/inner_distance:
    fn: "*.inner_distance_freq.txt"
  rseqc/tin:
    fn: "*.tin_summary.txt"
  picard/markdups:
    fn: "*.MarkDuplicates.metrics.txt"
 skip_versions_section: true
--- a/assets/rrna-db-defaults.txt
+++ b/assets/rrna-db-defaults.txt
@@ -0,0 +1,8 @@
 https://raw.githubusercontent.com/biocore/sortmerna/v4.3.4/data/rRNA_databases/rfam-5.8s-database-id98.fasta
 https://raw.githubusercontent.com/biocore/sortmerna/v4.3.4/data/rRNA_databases/rfam-5s-database-id98.fasta
 https://raw.githubusercontent.com/biocore/sortmerna/v4.3.4/data/rRNA_databases/silva-arc-16s-id95.fasta
 https://raw.githubusercontent.com/biocore/sortmerna/v4.3.4/data/rRNA_databases/silva-arc-23s-id98.fasta
 https://raw.githubusercontent.com/biocore/sortmerna/v4.3.4/data/rRNA_databases/silva-bac-16s-id90.fasta
 https://raw.githubusercontent.com/biocore/sortmerna/v4.3.4/data/rRNA_databases/silva-bac-23s-id98.fasta
 https://raw.githubusercontent.com/biocore/sortmerna/v4.3.4/data/rRNA_databases/silva-euk-18s-id95.fasta
 https://raw.githubusercontent.com/biocore/sortmerna/v4.3.4/data/rRNA_databases/silva-euk-28s-id98.fasta
--- a/bin/get_minimal_test_data.sh
+++ b/bin/get_minimal_test_data.sh
@@ -0,0 +1,105 @@
 #!/bin/bash
 CURR=`pwd`
 ### Get input fastq files for the minimal test
 DEST_FASTQ="testData/minimal_test/input_fastq"
 mkdir -p $DEST_FASTQ
 cd $DEST_FASTQ
 echo "Fetching FastQ files..."
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357070_1.fastq.gz
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357070_2.fastq.gz
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357071_1.fastq.gz
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357071_2.fastq.gz
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357072_1.fastq.gz
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357072_2.fastq.gz
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357073_1.fastq.gz
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357074_1.fastq.gz
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357075_1.fastq.gz
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357076_1.fastq.gz
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357076_2.fastq.gz
 wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357070_1.fastq.gz
 wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357070_2.fastq.gz
 wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357071_1.fastq.gz
 wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357071_2.fastq.gz
 wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357072_1.fastq.gz
 wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357072_2.fastq.gz
 wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357073_1.fastq.gz
 wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357074_1.fastq.gz
 wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357075_1.fastq.gz
 wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357076_1.fastq.gz
 wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357076_2.fastq.gz
 cd $CURR
 ### Get reference files for the minimal test
 DEST_REF="testData/minimal_test/reference"
 mkdir -p $DEST_REF
 cd $DEST_REF
 echo "Fetching reference data..."
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/bbsplit_fasta_list.txt
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/genes.gff.gz
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/genes.gtf.gz
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/genome.fasta
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/gfp.fa.gz
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/hisat2.tar.gz
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/rsem.tar.gz
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/salmon.tar.gz
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/transcriptome.fasta
 wget https://raw.githubusercontent.com/nf-core/rnaseq/3.12.0/assets/rrna-db-defaults.txt
 wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/genome.fasta
 wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/genes.gtf.gz
 wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/genes.gff.gz
 wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/transcriptome.fasta
 wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/gfp.fa.gz
 wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/bbsplit_fasta_list.txt
 # wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/hisat2.tar.gz
 wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/salmon.tar.gz
 wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/rsem.tar.gz
 cd $CURR
 NEWDEST1_REF="$CURR/testData/minimal_test/reference/rRNA"
 mkdir -p $NEWDEST1_REF
 cd $NEWDEST1_REF
 for LINE in `cat ../rrna-db-defaults.txt`
 do
    wget $LINE
 done
 cd $CURR
 find $NEWDEST1_REF -type f > $DEST_REF/rrna-db-defaults.txt
 NEWDEST2_REF="$CURR/testData/minimal_test/reference/bbsplit_fasta"
 mkdir -p $NEWDEST2_REF
 while IFS=, read -r -a line; do
    url="${line[1]}"
    name="$NEWDEST2_REF/${line[0]}.fa"
    wget $url -O "$name"
    line+=("$name")
    IFS=','
    echo "${line[*]}" >> "$NEWDEST2_REF/tmp.txt"
 done < "$DEST_REF/bbsplit_fasta_list.txt"
 cut -d',' -f1,3 "$NEWDEST2_REF/tmp.txt" > "$DEST_REF/bbsplit_fasta_list.txt"
 rm "$NEWDEST2_REF/tmp.txt"
--- a/bin/get_unit_test_data.sh
+++ b/bin/get_unit_test_data.sh
@@ -0,0 +1,50 @@
 #!/bin/bash
 CURR=`pwd`
 DEST="testData/unit_test_resources"
 mkdir -p $DEST
 cd $DEST
 echo "Fetching unit test resources..."
 ## UMI_TOOLS
 # extract
 wget https://github.com/CGATOxford/UMI-tools/raw/master/tests/slim.fastq.gz
 wget https://github.com/CGATOxford/UMI-tools/raw/master/tests/scrb_seq_fastq.1.gz
 wget https://github.com/CGATOxford/UMI-tools/raw/master/tests/scrb_seq_fastq.2.gz
 # dedup
 wget https://github.com/CGATOxford/UMI-tools/raw/master/tests/chr19.bam
 wget https://github.com/CGATOxford/UMI-tools/raw/master/tests/chr19.bam.bai
 # MultiQC
 wget https://multiqc.info/examples/rna-seq/data.zip
 # dupRadar
 wget https://github.com/ssayols/dupRadar/raw/master/inst/extdata/genes.gtf
 wget https://github.com/ssayols/dupRadar/raw/master/inst/extdata/wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2.bam
 wget https://github.com/ssayols/dupRadar/raw/master/inst/extdata/wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2.bam.bai
 ### Resources from https://github.com/snakemake/snakemake-wrappers/tree/master/bio
 # DESeq2
 wget https://github.com/snakemake/snakemake-wrappers/raw/master/bio/deseq2/deseqdataset/test/dataset/counts.tsv
 # preseq lc_extrap
 wget https://github.com/snakemake/snakemake-wrappers/raw/master/bio/preseq/lc_extrap/test/samples/a.sorted.bed
 wget https://github.com/smithlabcode/preseq/raw/master/data/SRR1106616_5M_subset.bam
 ### nf-core test datasets
 # sarscov2
 mkdir -p sarscov2
 wget -O sarscov2/genome.sizes https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/genome.sizes
 wget -O sarscov2/test.bedgraph https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/illumina/bedgraph/test.bedgraph
 wget -O sarscov2/genome.fasta https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/genome.fasta
 wget -O sarscov2/genome.fasta.fai https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/genome.fasta.fai
 wget -O sarscov2/test.paired_end.sorted.bam https://github.com/nf-core/test-datasets/raw/modules/data/genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam
 wget -O sarscov2/test.paired_end.sorted.bam.bai https://github.com/nf-core/test-datasets/raw/modules/data/genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam.bai
 wget -O sarscov2/test.bed https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/bed/test.bed
 wget -O sarscov2/test.bed12 https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/bed/test.bed12
 wget -O sarscov2/genome.gtf https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/genome.gtf
 cd $CURR
--- a/main.nf
+++ b/main.nf
@@ -0,0 +1,3 @@
 workflow {
    print("This is a dummy placeholder for pipeline execution. Please use the corresponding nf files for running pipelines.")
 }
--- a/nextflow.config
+++ b/nextflow.config
@@ -0,0 +1,27 @@
 // template nextflow.config for nested workflows
 manifest {
  nextflowVersion = '!>=20.12.1-edge'
 }
 docker {
  fixOwnership = true
 }
 // TODO 1: unquote and adapt `rootDir` according to relative path within project
 // params {
 //   rootDir = "$projectDir/../.."
 // }
 // 
 // workflowDir = "${params.rootDir}/workflows"
 // targetDir = "${params.rootDir}/target/nextflow"
 // TODO 2: insert custom imports here
 // TODO 3: unquote
 // docker {
 //   runOptions = "-v \$(realpath ${params.rootDir}):\$(realpath ${params.rootDir})"
 // }
--- a/src/bbmap_bbsplit/config.vsh.yaml
+++ b/src/bbmap_bbsplit/config.vsh.yaml
@@ -0,0 +1,89 @@
 name: "bbmap_bbsplit"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/bbmap/bbsplit/main.nf, modules/nf-core/bbmap/bbsplit/meta.yml]
    last_sha: 277bd337739a8b8f753fa7b5eda6743b9b6acb89
 description: |
  Split sequencing reads by mapping them to multiple references simultaneously.
 argument_groups:
 - name: "Input"
  arguments:
  - name: "--id"
    type: string
    description: Sample ID
  - name: "--paired"
    type: boolean 
    default: false
    description: Paired fastq files or not?
  - name: "--input"
    type: file
    multiple: true
    multiple_sep: ","
    description: Input fastq files, either one or two (paired)
    example: sample.fastq
  - name: "--primary_ref"
    type: file
    description: Primary reference FASTA
  - name: "--bbsplit_fasta_list"
    type: file
    description: Path to comma-separated file containing a list of reference genomes to filter reads against with BBSplit.
  - name: "--only_build_index"
    type: boolean
    description: true = only build index; false = mapping
  - name: "--built_bbsplit_index"
    type: file
    description: Directory with index files
 - name: "Output"
  arguments:
  - name: "--fastq_1"
    type: file
    required: false
    description: Output file for read 1.
    direction: output
    must_exist: false
    default: $id.$key.read_1.fastq
  - name: "--fastq_2"
    type: file
    required: false
    must_exist: false
    description: Output file for read 2.
    direction: output
    default: $id.$key.read_2.fastq
  - name: "--bbsplit_index"
    type: file
    description: Directory with index files
    direction: output
    must_exist: false
    default: BBSplit_index
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/genome.fasta
  - path: /testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz
  - path: /testData/minimal_test/input_fastq/SRR6357070_2.fastq.gz
  - path: /testData/minimal_test/reference/bbsplit_fasta/sarscov2.fa
  - path: /testData/minimal_test/reference/bbsplit_fasta/human.fa
 engines:
 - type: docker
  image: ubuntu:22.04
  setup:
    - type: docker
      run: | 
        apt-get update && \
        apt-get install -y build-essential openjdk-17-jdk wget tar && \
        wget --no-check-certificate https://sourceforge.net/projects/bbmap/files/BBMap_39.01.tar.gz && \
        tar xzf BBMap_39.01.tar.gz && \
        cp -r bbmap/* /usr/local/bin
 runners:
  - type: executable
  - type: nextflow
--- a/src/bbmap_bbsplit/script.sh
+++ b/src/bbmap_bbsplit/script.sh
@@ -0,0 +1,65 @@
 #!/bin/bash
 set -eo pipefail
 function clean_up {
    rm -rf "$tmpdir"
 }
 trap clean_up EXIT 
 avail_mem=3072
 if [ ! -d "$par_built_bbsplit_index" ]; then
    other_refs=()
    while IFS="," read -r name path 
    do
        other_refs+=("ref_$name=$path")
    done < "$par_bbsplit_fasta_list"
 fi
 if $par_only_build_index; then
    if [ -f "$par_primary_ref" ] && [ ${#other_refs[@]} -gt 0 ]; then
        bbsplit.sh \
            -Xmx${avail_mem}M \
            ref_primary="$par_primary_ref" ${other_refs[@]} \
            path=$par_bbsplit_index \
            threads=${meta_cpus:-1}
    else
        echo "ERROR: Please specify as input a primary fasta file along with names and paths to non-primary fasta files." >&2
        exit 1
    fi
 else
    IFS="," read -ra input <<< "$par_input"
    tmpdir=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXXXX")
    index_files=''
    if [ -d "$par_built_bbsplit_index" ]; then
        index_files="path=$par_built_bbsplit_index"
    elif [ -f "$par_primary_ref" ] && [ ${#other_refs[@]} -gt 0 ]; then
        index_files="ref_primary=$par_primary_ref ${other_refs[@]}"
    else
        echo "ERROR: Please either specify a BBSplit index as input or a primary fasta file along with names and paths to non-primary fasta files." >&2
        exit 1
    fi
    if $par_paired; then
        bbsplit.sh \
            -Xmx${avail_mem}M \
            $index_files \
            threads=${meta_cpus:-1} \
            in=${input[0]} \
            in2=${input[1]} \
            basename=${tmpdir}/%_#.fastq \
            refstats=bbsplit_stats.txt
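        # Quote the patterns so the shell does not expand them against the working directory
        read1=$(find "$tmpdir/" -iname 'primary_1*')
        read2=$(find "$tmpdir/" -iname 'primary_2*')
        cp "$read1" "$par_fastq_1"
        cp "$read2" "$par_fastq_2"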
    else
        bbsplit.sh \
            -Xmx${avail_mem}M \
            $index_files \
            threads=${meta_cpus:-1} \
            in=${input[0]} \
            basename=${tmpdir}/%.fastq \
            refstats=bbsplit_stats.txt
        # Quote the pattern so the shell does not expand it against the working directory
        read1=$(find "$tmpdir/" -iname 'primary*')
        cp "$read1" "$par_fastq_1"
    fi
 fi
--- a/src/bbmap_bbsplit/test.sh
+++ b/src/bbmap_bbsplit/test.sh
@@ -0,0 +1,86 @@
 #!/bin/bash
 echo ">>> Test $meta_functionality_name"
 cat > bbsplit_fasta_list.txt << HERE
 sarscov2,$meta_resources_dir/sarscov2.fa
 human,$meta_resources_dir/human.fa
 HERE
 echo ">>> Building BBSplit index"
 "$meta_executable" \
  --primary_ref "$meta_resources_dir/genome.fasta" \
  --bbsplit_fasta_list "bbsplit_fasta_list.txt" \
  --only_build_index true \
  --bbsplit_index "BBSplit_index" 
 echo ">>> Check whether output exists"
 [ ! -d "BBSplit_index" ] && echo "BBSplit index does not exist!" && exit 1
 [ -z "$(ls -A 'BBSplit_index')" ] && echo "BBSplit index is empty!" && exit 1
 echo ">>> Splitting reads with BBSplit"
 echo ">>> Testing with single-end reads and primary/non-primary FASTA files"
 "$meta_executable" \
  --paired false \
  --input "$meta_resources_dir/SRR6357070_1.fastq.gz" \
  --only_build_index false \
  --primary_ref "$meta_resources_dir/genome.fasta" \
  --bbsplit_fasta_list "bbsplit_fasta_list.txt" \
  --fastq_1 "filtered_SRR6357070_1.fastq.gz"
 echo ">>> Check whether output exists"
 [ ! -f "filtered_SRR6357070_1.fastq.gz" ] && echo "Filtered reads file does not exist!" && exit 1
 [ ! -s "filtered_SRR6357070_1.fastq.gz" ] && echo "Filtered reads file is empty!" && exit 1
 rm filtered_SRR6357070_1.fastq.gz
 echo ">>> Testing with paired-end reads and primary/non-primary FASTA files"
 "$meta_executable" \
  --paired true \
  --input "$meta_resources_dir/SRR6357070_1.fastq.gz,$meta_resources_dir/SRR6357070_2.fastq.gz" \
  --only_build_index false \
  --primary_ref "$meta_resources_dir/genome.fasta" \
  --bbsplit_fasta_list "bbsplit_fasta_list.txt" \
  --fastq_1 "filtered_SRR6357070_1.fastq.gz" \
  --fastq_2 "filtered_SRR6357070_2.fastq.gz"
 echo ">>> Check whether output exists"
 [ ! -f "filtered_SRR6357070_1.fastq.gz" ] && echo "Filtered read 1 file does not exist!" && exit 1
 [ ! -s "filtered_SRR6357070_1.fastq.gz" ] && echo "Filtered read 1 file is empty!" && exit 1
 [ ! -f "filtered_SRR6357070_2.fastq.gz" ] && echo "Filtered read 2 file does not exist!" && exit 1
 [ ! -s "filtered_SRR6357070_2.fastq.gz" ] && echo "Filtered read 2 file is empty!" && exit 1
 rm filtered_SRR6357070_1.fastq.gz filtered_SRR6357070_2.fastq.gz
 echo ">>> Testing with single-end reads and BBSplit index"
 "$meta_executable" \
  --paired false \
  --input "$meta_resources_dir/SRR6357070_1.fastq.gz" \
  --only_build_index false \
  --built_bbsplit_index "BBSplit_index" \
  --fastq_1 "filtered_SRR6357070_1.fastq.gz"
 echo ">>> Check whether output exists"
 [ ! -f "filtered_SRR6357070_1.fastq.gz" ] && echo "Filtered reads file does not exist!" && exit 1
 [ ! -s "filtered_SRR6357070_1.fastq.gz" ] && echo "Filtered reads file is empty!" && exit 1
 echo ">>> Testing with paired-end reads and BBSplit index"
 "$meta_executable" \
  --paired true \
  --input "$meta_resources_dir/SRR6357070_1.fastq.gz,$meta_resources_dir/SRR6357070_2.fastq.gz" \
  --only_build_index false \
  --built_bbsplit_index "BBSplit_index" \
  --fastq_1 "filtered_SRR6357070_1.fastq.gz" \
  --fastq_2 "filtered_SRR6357070_2.fastq.gz"
 echo ">>> Check whether output exists"
 [ ! -f "filtered_SRR6357070_1.fastq.gz" ] && echo "Filtered read 1 file does not exist!" && exit 1
 [ ! -s "filtered_SRR6357070_1.fastq.gz" ] && echo "Filtered read 1 file is empty!" && exit 1
 [ ! -f "filtered_SRR6357070_2.fastq.gz" ] && echo "Filtered read 2 file does not exist!" && exit 1
 [ ! -s "filtered_SRR6357070_2.fastq.gz" ] && echo "Filtered read 2 file is empty!" && exit 1
 rm filtered_SRR6357070_1.fastq.gz filtered_SRR6357070_2.fastq.gz
 echo "All tests succeeded!"
 exit 0
--- a/src/bedtools_genomecov/config.vsh.yaml
+++ b/src/bedtools_genomecov/config.vsh.yaml
@@ -0,0 +1,56 @@
 name: bedtools_genomecov
 info: 
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/local/bedtools_genomecov.nf]
    last_sha: 0a1bdcfbb498987643b74e9fccab85ccd9f2a17d
 description: Compute BEDGRAPH (-bg) summaries of feature coverage
 argument_groups:
 - name: "Input"
  arguments: 
  - name: "--strandedness"
    type: string
    choices: ["unstranded", "forward", "reverse", "auto"]
    description: Sample strand-specificity. 
  - name: "--bam"
    type: file
    description: Genome BAM file
  - name: "--extra_bedtools_args"
    type: string
    default: ''
 - name: "Output"
  arguments: 
  - name: "--bedgraph_forward"
    type: file
    default: $id.forward.bedgraph
    direction: output
  - name: "--bedgraph_reverse"
    type: file
    default: $id.reverse.bedgraph
    direction: output
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/chr19.bam
 engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: docker
        run: | 
          apt-get update && \
          apt-get install -y build-essential wget && \
          wget --no-check-certificate https://github.com/arq5x/bedtools2/releases/download/v2.31.0/bedtools.static && \
          mv bedtools.static /usr/local/bin/bedtools && \
          chmod a+x /usr/local/bin/bedtools
 runners:
  - type: executable
  - type: nextflow
--- a/src/bedtools_genomecov/script.sh
+++ b/src/bedtools_genomecov/script.sh
@@ -0,0 +1,25 @@
 #!/bin/bash
 set -eo pipefail
 prefix_forward="forward"
 prefix_reverse="reverse"
 if [ "$par_strandedness" == 'reverse' ]; then
    prefix_forward="reverse"
    prefix_reverse="forward"
 fi
 bedtools genomecov \
    -ibam $par_bam \
    -bg \
    -strand + \
    $par_extra_bedtools_args | bedtools sort > $prefix_forward.bedGraph
 bedtools genomecov \
    -ibam $par_bam \
    -bg \
    -strand - \
    $par_extra_bedtools_args | bedtools sort > $prefix_reverse.bedGraph
 mv $prefix_forward.bedGraph $par_bedgraph_forward
 mv $prefix_reverse.bedGraph $par_bedgraph_reverse
--- a/src/bedtools_genomecov/test.sh
+++ b/src/bedtools_genomecov/test.sh
@@ -0,0 +1,22 @@
 #!/bin/bash
 id="SRR6357070"
 echo ">>> Testing $meta_functionality_name"
 "$meta_executable" \
    --strandedness unstranded \
    --bam $meta_resources_dir/chr19.bam \
    --bedgraph_forward chr19_forward.bedgraph \
    --bedgraph_reverse chr19_reverse.bedgraph
 exit_code=$?
 [[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
 # check whether output exists
 [ ! -f "chr19_forward.bedgraph" ] && echo "File 'chr19_forward.bedgraph' does not exist!" && exit 1
 [ ! -s "chr19_forward.bedgraph" ] && echo "File 'chr19_forward.bedgraph' is empty!" && exit 1
 [ ! -f "chr19_reverse.bedgraph" ] && echo "File 'chr19_reverse.bedgraph' does not exist!" && exit 1
 [ ! -s "chr19_reverse.bedgraph" ] && echo "File 'chr19_reverse.bedgraph' is empty!" && exit 1
 echo "All tests succeeded!"
 exit 0
--- a/src/cat_additional_fasta/config.vsh.yaml
+++ b/src/cat_additional_fasta/config.vsh.yaml
@@ -0,0 +1,54 @@
 name: "cat_additional_fasta"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/local/cat_additional_fasta.nf]
    last_sha: 0a1bdcfbb498987643b74e9fccab85ccd9f2a17d
 description: |
  Concatenate an additional FASTA file to the reference FASTA and GTF files.
 argument_groups:
 - name: "Input"
  arguments: 
  - name: "--fasta"
    type: file 
    required: true
    description: Path to FASTA genome file.
  - name: "--gtf"
    type: file
    description: Path to GTF annotation file.
  - name: "--additional_fasta"
    type: file
    description: FASTA file to concatenate to genome FASTA file e.g. containing spike-in sequences.
  - name: "--biotype"
    type: string
    description: Biotype value to use when appending entries to GTF file when additional fasta file is provided.
 - name: "Output"
  arguments: 
  - name: "--fasta_output"
    type: file
    direction: output
    description: Concatenated FASTA file.
  - name: "--gtf_output"
    type: file
    direction: output
    description: Concatenated GTF file.
 resources:
  - type: python_script
    path: script.py
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/genome.fasta
  - path: /testData/minimal_test/reference/genes.gtf.gz
  - path: /testData/minimal_test/reference/gfp.fa.gz
 engines:
  - type: docker
    image: python 
 runners:
  - type: executable
  - type: nextflow
--- a/src/cat_additional_fasta/script.py
+++ b/src/cat_additional_fasta/script.py
@@ -0,0 +1,80 @@
 #!/usr/bin/env python3
 """
 Read a custom fasta file and create a custom GTF containing each entry
 """
 from itertools import groupby
 import logging
 import os
 import sys
 ## VIASH START
 par = {
    "fasta": "testData/minimal_test/reference/genome.fasta",
    "gtf": "testData/minimal_test/reference/genes.gtf",
    "additional_fasta": "testData/minimal_test/reference/gfp.fa.gz",
    "biotype": "gene_biotype", 
    "fasta_output": "genome_gfp.fasta",
    "gtf_output": "genome_gfp.gtf",
 }
 meta = {
    "functionality_name": "cat_additional_fasta"
 }
 ## VIASH END
 def fasta_iter(fasta_name):
    """
    modified from Brent Pedersen
    Correct Way To Parse A Fasta File In Python
    given a fasta file. yield tuples of header, sequence
    Fasta iterator from https://www.biostars.org/p/710/#120760
    """
    with open(fasta_name) as fh:
        # ditch the boolean (x[0]) and just keep the header or sequence since
        # we know they alternate.
        faiter = (x[1] for x in groupby(fh, lambda line: line[0] == ">"))
        for header in faiter:
            # drop the ">"
            headerStr = next(header)[1:].strip()
            # join all sequence lines to one.
            seq = "".join(s.strip() for s in next(faiter))
            yield (headerStr, seq)
 def fasta2gtf(fasta, output, biotype):
    fiter = fasta_iter(fasta)
    # GTF output lines
    lines = []
    attributes = 'exon_id "{name}.1"; exon_number "1";{biotype} gene_id "{name}_gene"; gene_name "{name}_gene"; gene_source "custom"; transcript_id "{name}_gene"; transcript_name "{name}_gene";\n'
    line_template = "{name}\ttransgene\texon\t1\t{length}\t.\t+\t.\t" + attributes
    for ff in fiter:
        name, seq = ff
        # Use first ID as separated by spaces as the "sequence name"
        # (equivalent to "chromosome" in other cases)
        seqname = name.split()[0]
        # Remove all spaces
        name = seqname.replace(" ", "_")
        length = len(seq)
        biotype_attr = ""
        if biotype:
            biotype_attr = f' {biotype} "transgene";'
        line = line_template.format(name=name, length=length, biotype=biotype_attr)
        lines.append(line)
    with open(output, "w") as f:
        f.write("".join(lines))
 add_name = os.path.basename(par['additional_fasta'])
 output = os.path.splitext(add_name)[0] + ".gtf"
 fasta2gtf(par['additional_fasta'], output, par['biotype'])
 with open(par['fasta'], 'r') as f1:
    content1 = f1.read()
 with open(par['additional_fasta'], 'r') as f2:
    content2 = f2.read()
 with open(par['fasta_output'], 'w') as f_out:
    f_out.write(content1 + content2)
 with open(par['gtf'], 'r') as g1:
    g_content1 = g1.read()
 with open(output, 'r') as g2:
    g_content2 = g2.read()
 with open(par['gtf_output'], 'w') as g_out:
    g_out.write(g_content1 + g_content2)
--- a/src/cat_additional_fasta/test.sh
+++ b/src/cat_additional_fasta/test.sh
@@ -0,0 +1,26 @@
 #!/bin/bash
 echo ">>> Testing $meta_functionality_name"
 gunzip "$meta_resources_dir/genes.gtf.gz"
 gunzip "$meta_resources_dir/gfp.fa.gz"
 "$meta_executable" \
  --fasta "$meta_resources_dir/genome.fasta" \
  --gtf "$meta_resources_dir/genes.gtf" \
  --additional_fasta "$meta_resources_dir/gfp.fa" \
  --biotype gene_biotype \
  --fasta_output genome_gfp.fasta \
  --gtf_output genome_gfp.gtf
 exit_code=$?
 [[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
 echo ">>> Checking whether output exists"
 [ ! -f "genome_gfp.fasta" ] && echo "File 'genome_gfp.fasta' does not exist!" && exit 1
 [ ! -s "genome_gfp.fasta" ] && echo "File 'genome_gfp.fasta' is empty!" && exit 1
 [ ! -f "genome_gfp.gtf" ] && echo "File 'genome_gfp.gtf' does not exist!" && exit 1
 [ ! -s "genome_gfp.gtf" ] && echo "File 'genome_gfp.gtf' is empty!" && exit 1
 echo "All tests succeeded!"
 exit 0
--- a/src/cat_fastq/config.vsh.yaml
+++ b/src/cat_fastq/config.vsh.yaml
@@ -0,0 +1,54 @@
 name: "cat_fastq"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/cat/fastq/main.nf, modules/nf-core/cat/fastq/meta.yml]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
 description: Concatenate multiple fastq files 
 argument_groups:
 - name: "Input"
  arguments: 
  - name: "--read_1"
    type: file
    multiple: true
    multiple_sep: ";"
    description: Read 1 fastq files to be concatenated
  - name: "--read_2"
    type: file
    multiple: true
    multiple_sep: ";"
    description: Read 2 fastq files to be concatenated
 - name: "Output"
  arguments:   
  - name: "--fastq_1"
    type: file
    direction: output
    default: $id.read_1.merged.fastq
    description: Concatenated read 1 fastq
  - name: "--fastq_2"
    type: file
    direction: output
    must_exist: false
    default: $id.read_2.merged.fastq
    description: Concatenated read 2 fastq
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz 
  - path: /testData/minimal_test/input_fastq/SRR6357071_1.fastq.gz
  - path: /testData/minimal_test/input_fastq/SRR6357070_2.fastq.gz
  - path: /testData/minimal_test/input_fastq/SRR6357071_2.fastq.gz
 engines:
  - type: docker
    image: ubuntu:22.04
 runners:
  - type: executable
  - type: nextflow
--- a/src/cat_fastq/script.sh
+++ b/src/cat_fastq/script.sh
@@ -0,0 +1,20 @@
 #!/bin/bash
 set -eo pipefail
 IFS=";" read -ra read_1 <<< "$par_read_1"
 IFS=";" read -ra read_2 <<< "$par_read_2"
 filename=$(basename -- "${read_1[0]}")
 if [ "${filename##*.}" == "gz" ]; then
    command="zcat"
 else
    command="cat"
 fi
 if [ ${#read_1[@]} -gt 0 ]; then
    $command "${read_1[@]}" > "$par_fastq_1"
 fi
 if [ ${#read_2[@]} -gt 0 ]; then
    $command "${read_2[@]}" > "$par_fastq_2"
 fi
--- a/src/cat_fastq/test.sh
+++ b/src/cat_fastq/test.sh
@@ -0,0 +1,44 @@
 #!/bin/bash
 echo ">>> Testing $meta_functionality_name"
 echo ">>> Testing paired-end read samples with multiple replicates"
 "$meta_executable" \
  --read_1 $meta_resources_dir/SRR6357070_1.fastq.gz\;$meta_resources_dir/SRR6357071_1.fastq.gz \
  --read_2 $meta_resources_dir/SRR6357070_2.fastq.gz\;$meta_resources_dir/SRR6357071_2.fastq.gz \
  --fastq_1 read_1.merged.fastq \
  --fastq_2 read_2.merged.fastq
 echo ">>> Checking whether output exists"
 [ ! -f "read_1.merged.fastq" ] && echo "Merged read 1 file does not exist!" && exit 1
 [ ! -s "read_1.merged.fastq" ] && echo "Merged read 1 file is empty!" && exit 1
 [ ! -f "read_2.merged.fastq" ] && echo "Merged read 2 file does not exist!" && exit 1
 [ ! -s "read_2.merged.fastq" ] && echo "Merged read 2 file is empty!" && exit 1
 echo ">>> Check number of reads"
 rep1_1=$(zcat $meta_resources_dir/SRR6357070_1.fastq.gz | echo $((`wc -l`/4)))
 rep1_2=$(zcat $meta_resources_dir/SRR6357070_2.fastq.gz | echo $((`wc -l`/4)))
 rep2_1=$(zcat $meta_resources_dir/SRR6357071_1.fastq.gz | echo $((`wc -l`/4)))
 rep2_2=$(zcat $meta_resources_dir/SRR6357071_2.fastq.gz | echo $((`wc -l`/4)))
 merged_1=$(cat read_1.merged.fastq | echo $((`wc -l`/4)))
 merged_2=$(cat read_2.merged.fastq | echo $((`wc -l`/4))) 
 [[ $(( $rep1_1 + $rep2_1 )) !=  $merged_1 ]] || [[ $(( $rep1_2 + $rep2_2 )) !=  $merged_2 ]] && echo "Concatenation unsuccessful!" && exit 1
 rm read_1.merged.fastq read_2.merged.fastq
 echo ">>> Testing single-end read samples with multiple replicates"
 "$meta_executable" \
  --read_1 $meta_resources_dir/SRR6357070_1.fastq.gz\;$meta_resources_dir/SRR6357071_1.fastq.gz \
  --fastq_1 read_1.merged.fastq 
 echo ">>> Checking whether output exists"
 [ ! -f "read_1.merged.fastq" ] && echo "Merged read 1 file does not exist!" && exit 1
 [ ! -s "read_1.merged.fastq" ] && echo "Merged read 1 file is empty!" && exit 1
 echo ">>> Check number of reads"
 rep1_1=$(zcat $meta_resources_dir/SRR6357070_1.fastq.gz | echo $((`wc -l`/4)))
 rep2_1=$(zcat $meta_resources_dir/SRR6357071_1.fastq.gz | echo $((`wc -l`/4)))
 merged_1=$(cat read_1.merged.fastq | echo $((`wc -l`/4)))
 [ $(( $rep1_1 + $rep2_1 )) !=  $merged_1 ] && echo "Concatenation unsuccessful!" && exit 1
 echo "All tests succeeded!"
 exit 0
--- a/src/deseq2_qc/config.vsh.yaml
+++ b/src/deseq2_qc/config.vsh.yaml
@@ -0,0 +1,73 @@
 name: deseq2_qc
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/local/deseq2_qc.nf]
    last_sha: 92b2a7857de1dda9d1c19a088941fc81e2976ff7
 description: | 
  Run DESeq2, perform PCA, generate heatmaps and scatterplots for samples in the counts files
 argument_groups: 
 - name: "Input"
  arguments: 
  - name: "--counts"
    type: file
    description: Count file matrix where rows are genes and columns are samples
  - name: "--pca_header_multiqc"
    type: file
    default: assets/multiqc/deseq2_pca_header.txt
  - name: "--clustering_header_multiqc"
    type: file
    default: assets/multiqc/deseq2_clustering_header.txt
  - name: "--deseq2_vst"
    type: boolean
    default: true
    description: Use vst transformation instead of rlog with DESeq2
  - name: "--extra_args"
    type: string
    default: "--id_col 1 --sample_suffix '' --outprefix deseq2 --count_col 3"
  - name: "--extra_args2"
    type: string
    default: star_salmon
 - name: "Output"
  arguments: 
  - name: "--deseq2_output"
    type: file
    direction: output
    default: deseq2
  - name: "--pca_multiqc"
    type: file
    direction: output
    default: deseq2.pca.vals_mqc.tsv
  - name: "--dists_multiqc"
    type: file
    direction: output
    default: deseq2.sample.dists_mqc.tsv
 resources:
  - type: bash_script
    path: script.sh
  # copied from https://github.com/nf-core/rnaseq/blob/3.12.0/bin/deseq2_qc.r
  - path: deseq2_qc.r
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/counts.tsv
  - path: /assets/multiqc/deseq2_pca_header.txt
  - path: /assets/multiqc/deseq2_clustering_header.txt
 engines: 
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [ r-base , libcurl4-openssl-dev, libssl-dev, libxml2-dev ]
      - type: r
        cran: [ optparse, ggplot2, RColorBrewer, pheatmap ]
        bioc: [ DESeq2 ]
        url: https://cran.r-project.org/src/contrib/Archive/matrixStats/matrixStats_1.1.0.tar.gz
 runners:
  - type: executable
  - type: nextflow
--- a/src/deseq2_qc/deseq2_qc.r
+++ b/src/deseq2_qc/deseq2_qc.r
@@ -0,0 +1,246 @@
 #!/usr/bin/env Rscript
 ################################################
 ################################################
 ## REQUIREMENTS                               ##
 ################################################
 ################################################
 ## PCA, HEATMAP AND SCATTERPLOTS FOR SAMPLES IN COUNTS FILE
 ## - SAMPLE NAMES HAVE TO END IN e.g. "_R1" REPRESENTING REPLICATE ID. LAST 3 CHARACTERS OF SAMPLE NAME WILL BE TRIMMED TO OBTAIN GROUP ID FOR DESEQ2 COMPARISONS.
 ## - PACKAGES BELOW NEED TO BE AVAILABLE TO LOAD WHEN RUNNING R
 ################################################
 ################################################
 ## LOAD LIBRARIES                             ##
 ################################################
 ################################################
 library(optparse)
 library(DESeq2)
 library(ggplot2)
 library(RColorBrewer)
 library(pheatmap)
 ################################################
 ################################################
 ## PARSE COMMAND-LINE PARAMETERS              ##
 ################################################
 ################################################
 option_list <- list(
    make_option(c("-i", "--count_file"), type="character", default=NULL, metavar="path", help="Count file matrix where rows are genes and columns are samples."),
    make_option(c("-f", "--count_col"), type="integer", default=3, metavar="integer", help="First column containing sample count data."),
    make_option(c("-d", "--id_col"), type="integer", default=1, metavar="integer", help="Column containing identifiers to be used."),
    make_option(c("-r", "--sample_suffix"), type="character", default='', metavar="string", help="Suffix to remove after sample name in columns e.g. '.rmDup.bam' if 'DRUG_R1.rmDup.bam'."),
    make_option(c("-p", "--outprefix"), type="character", default='deseq2', metavar="string" , help="Output prefix."),
    make_option(c("-v", "--vst"), type="logical", default=FALSE, metavar="boolean", help="Run vst transform instead of rlog."),
    make_option(c("-c", "--cores"), type="integer", default=1, metavar="integer", help="Number of cores."), 
    make_option(c("-o", "--outdir"), type="character", default="./", metavar="path", help="Output directory.")
 )
 opt_parser <- OptionParser(option_list=option_list)
 opt        <- parse_args(opt_parser)
 if (is.null(opt$count_file)){
    print_help(opt_parser)
    stop("Please provide a counts file.", call.=FALSE)
 }
 ################################################
 ################################################
 ## READ IN COUNTS FILE                        ##
 ################################################
 ################################################
 count.table           <- read.delim(file=opt$count_file,header=TRUE, row.names=NULL)
 rownames(count.table) <- count.table[,opt$id_col]
 count.table           <- count.table[,opt$count_col:ncol(count.table),drop=FALSE]
 colnames(count.table) <- gsub(opt$sample_suffix,"",colnames(count.table))
 colnames(count.table) <- gsub(pattern='\\.$', replacement='', colnames(count.table))
 ################################################
 ################################################
 ## RUN DESEQ2                                 ##
 ################################################
 ################################################
 if (file.exists(opt$outdir) == FALSE) {
    dir.create(opt$outdir, recursive=TRUE)
 }
 setwd(opt$outdir)
 samples.vec     <- colnames(count.table)
 name_components <- strsplit(samples.vec, "_")
 n_components    <- length(name_components[[1]])
 decompose       <- n_components!=1 && all(sapply(name_components, length)==n_components)
 coldata         <- data.frame(samples.vec, sample=samples.vec, row.names=1)
 if (decompose) {
    groupings        <- as.data.frame(lapply(1:n_components, function(i) sapply(name_components, "[[", i)))
    n_distinct       <- sapply(groupings, function(grp) length(unique(grp)))
    groupings        <- groupings[n_distinct!=1 & n_distinct!=length(samples.vec)]
    if (ncol(groupings)!=0) {
        names(groupings) <- paste0("Group", 1:ncol(groupings))
        coldata <- cbind(coldata, groupings)
    } else {
        decompose <- FALSE
    }
 }
 DDSFile <- paste(opt$outprefix,".dds.RData",sep="")
 counts  <- count.table[,samples.vec,drop=FALSE]
 dds     <- DESeqDataSetFromMatrix(countData=round(counts), colData=coldata, design=~ 1)
 dds     <- estimateSizeFactors(dds)
 if (min(dim(count.table))<=1)  { # No point if only one sample, or one gene
    save(dds,file=DDSFile)
    saveRDS(dds, file=sub("\\.dds\\.RData$", ".rds", DDSFile))
    warning("Not enough samples or genes in counts file for PCA.", call.=FALSE)
    quit(save = "no", status = 0, runLast = FALSE)
 }
 if (!opt$vst) {
    vst_name <- "rlog"
    rld      <- rlog(dds)
 } else {
    vst_name <- "vst"
    rld      <- varianceStabilizingTransformation(dds)
 }
 assay(dds, vst_name) <- assay(rld)
 save(dds,file=DDSFile)
 saveRDS(dds, file=sub("\\.dds\\.RData$", ".rds", DDSFile))
 ################################################
 ################################################
 ## PLOT QC                                    ##
 ################################################
 ################################################
 ##' PCA pre-processor
 ##'
 ##' Generate all the necessary information to plot PCA from a DESeq2 object
 ##' in which an assay containing a variance-stabilised matrix of counts is
 ##' stored. Copied from DESeq2::plotPCA, but with additional ability to
 ##' say which assay to run the PCA on.
 ##'
 ##' @param object The DESeq2DataSet object.
 ##' @param ntop number of top genes to use for principal components, selected by highest row variance.
 ##' @param assay the name or index of the assay that stores the variance-stabilised data.
 ##' @return A data.frame containing the projected data alongside the grouping columns.
 ##' A 'percentVar' attribute is set which includes the percentage of variation each PC explains,
 ##' and additionally how much the variation within that PC is explained by the grouping variable.
 ##' @author Gavin Kelly
 plotPCA_vst <- function (object,  ntop = 500, assay=length(assays(object))) {
    rv         <- rowVars(assay(object, assay))
    select     <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
    pca        <- prcomp(t(assay(object, assay)[select, ]), center=TRUE, scale=FALSE)
    percentVar <- pca$sdev^2/sum(pca$sdev^2)
    df         <- cbind( as.data.frame(colData(object)), pca$x)
    #Order points so extreme samples are more likely to get label
    ord        <- order(abs(rank(df$PC1)-median(df$PC1)), abs(rank(df$PC2)-median(df$PC2)))
    df         <- df[ord,]
    attr(df, "percentVar") <- data.frame(PC=seq(along=percentVar), percentVar=100*percentVar)
    return(df)
 }
 PlotFile <- paste(opt$outprefix,".plots.pdf",sep="")
 pdf(file=PlotFile, onefile=TRUE, width=7, height=7)
 ## PCA
 ntop <- c(500, Inf)
 for (n_top_var in ntop) {
    pca.data      <- plotPCA_vst(dds, assay=vst_name, ntop=n_top_var)
    percentVar    <- round(attr(pca.data, "percentVar")$percentVar)
    plot_subtitle <- ifelse(n_top_var==Inf, "All genes", paste("Top", n_top_var, "genes"))
    pl <- ggplot(pca.data, aes(PC1, PC2, label=paste0(" ", sample, " "))) +
        geom_point() +
        geom_text(check_overlap=TRUE, vjust=0.5, hjust="inward") +
        xlab(paste0("PC1: ",percentVar[1],"% variance")) +
        ylab(paste0("PC2: ",percentVar[2],"% variance")) +
        labs(title = paste0("First PCs on ", vst_name, "-transformed data"), subtitle = plot_subtitle) +
        theme(legend.position="top",
            panel.grid.major = element_blank(),
            panel.grid.minor = element_blank(),
            panel.background = element_blank(),
            panel.border = element_rect(colour = "black", fill=NA, size=1))
    print(pl)
    if (decompose) {
        pc_names <- paste0("PC", attr(pca.data, "percentVar")$PC)
        long_pc <- reshape(pca.data, varying=pc_names, direction="long", sep="", timevar="component", idvar="pcrow")
        long_pc <- subset(long_pc, component<=5)
        long_pc_grp <- reshape(long_pc, varying=names(groupings), direction="long", sep="", timevar="grouper")
        long_pc_grp <- subset(long_pc_grp, grouper<=5)
        long_pc_grp$component <- paste("PC", long_pc_grp$component)
        long_pc_grp$grouper <- paste0(long_pc_grp$grouper, c("st","nd","rd","th","th")[long_pc_grp$grouper], " prefix")
        pl <- ggplot(long_pc_grp, aes(x=Group, y=PC)) +
            geom_point() +
            stat_summary(fun=mean, geom="line", aes(group = 1)) +
            labs(x=NULL, y=NULL, subtitle = plot_subtitle, title="PCs split by sample-name prefixes") +
            facet_grid(component~grouper, scales="free_x") +
            scale_x_discrete(guide = guide_axis(n.dodge = 3))
        print(pl)
    }
 } # at end of loop, we'll be using the user-defined ntop if any, else all genes
 ## WRITE PC1 vs PC2 VALUES TO FILE
 pca.vals           <- pca.data[,c("PC1","PC2")]
 colnames(pca.vals) <- paste0(colnames(pca.vals), ": ", percentVar[1:2], '% variance')
 pca.vals           <- cbind(sample = rownames(pca.vals), pca.vals)
 write.table(pca.vals, file = paste(opt$outprefix, ".pca.vals.txt", sep=""),
            row.names = FALSE, col.names = TRUE, sep = "\t", quote = TRUE)
 ## SAMPLE CORRELATION HEATMAP
 sampleDists      <- dist(t(assay(dds, vst_name)))
 sampleDistMatrix <- as.matrix(sampleDists)
 colors           <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
 pheatmap(
    sampleDistMatrix,
    clustering_distance_rows=sampleDists,
    clustering_distance_cols=sampleDists,
    col=colors,
    main=paste("Euclidean distance between", vst_name, "of samples")
 )
 ## WRITE SAMPLE DISTANCES TO FILE
 write.table(cbind(sample = rownames(sampleDistMatrix), sampleDistMatrix),file=paste(opt$outprefix, ".sample.dists.txt", sep=""),
            row.names=FALSE, col.names=TRUE, sep="\t", quote=FALSE)
 dev.off()
 ################################################
 ################################################
 ## SAVE SIZE FACTORS                          ##
 ################################################
 ################################################
 SizeFactorsDir <- "size_factors/"
 if (file.exists(SizeFactorsDir) == FALSE) {
    dir.create(SizeFactorsDir, recursive=TRUE)
 }
 NormFactorsFile <- paste(SizeFactorsDir,opt$outprefix, ".size_factors.RData", sep="")
 normFactors <- sizeFactors(dds)
 save(normFactors, file=NormFactorsFile)
 for (name in names(sizeFactors(dds))) {
    sizeFactorFile <- paste(SizeFactorsDir,name, ".txt", sep="")
    write(as.numeric(sizeFactors(dds)[name]), file=sizeFactorFile)
 }
 ################################################
 ################################################
 ## R SESSION INFO                             ##
 ################################################
 ################################################
 RLogFile <- "R_sessionInfo.log"
 sink(RLogFile)
 a <- sessionInfo()
 print(a)
 sink()
 ################################################
 ################################################
 ################################################
 ################################################
--- a/src/deseq2_qc/script.sh
+++ b/src/deseq2_qc/script.sh
@@ -0,0 +1,48 @@
 #!/bin/bash
 set -eo pipefail
 if $par_deseq2_vst; then 
    par_extra_args+=" --vst TRUE"
 fi
 tolower() {
    case $1 in
        *[[:upper:]]*)
            printf "%s\n" "$1" | tr '[:upper:]' '[:lower:]'
            ;;
        *)
            printf "%s\n" "$1"
            ;;
    esac
 }
 toupper() {
    case $1 in
        *[[:lower:]]*)
            printf "%s\n" "$1" | tr '[:lower:]' '[:upper:]'
            ;;
        *)
            printf "%s\n" "$1"
            ;;
    esac
 }
 label_lower=$(tolower "$par_extra_args2")
 label_upper=$(toupper "$par_extra_args2")
 Rscript "$meta_resources_dir/deseq2_qc.r" \
    --count_file $par_counts \
    --outdir $par_deseq2_output \
    --cores ${meta_cpus:-1} \
    $par_extra_args
 if [ -f "$par_deseq2_output/R_sessionInfo.log" ]; then
    sed "s/deseq2_pca/${label_lower}_deseq2_pca/g" < $par_pca_header_multiqc > tmp.txt
    sed -i -e "s/DESeq2 PCA/${label_upper} DESeq2 PCA/g" tmp.txt
    cat tmp.txt $par_deseq2_output/*.pca.vals.txt > $par_pca_multiqc
    sed "s/deseq2_clustering/${label_lower}_deseq2_clustering/g" < $par_clustering_header_multiqc > tmp.txt
    sed -i -e "s/DESeq2 sample/${label_upper} DESeq2 sample/g" tmp.txt
    cat tmp.txt $par_deseq2_output/*.sample.dists.txt > $par_dists_multiqc
 fi
--- a/src/deseq2_qc/test.sh
+++ b/src/deseq2_qc/test.sh
@@ -0,0 +1,28 @@
 #!/bin/bash
 # Run executable
 echo "> Running $meta_functionality_name"
 "$meta_executable" \
    --counts $meta_resources_dir/counts.tsv \
    --pca_header_multiqc $meta_resources_dir/deseq2_pca_header.txt \
    --clustering_header_multiqc $meta_resources_dir/deseq2_clustering_header.txt \
    --extra_args "--id_col 1 --sample_suffix '' --outprefix deseq2 --count_col 2" \
    --extra_args2 "test" \
    --deseq2_output "deseq2/" \
    --pca_multiqc pca.vals_mqc.tsv \
    --dists_multiqc sample.dists_mqc.tsv
 exit_code=$?
 [[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
 echo ">> Check whether output exists"
 [ ! -d "deseq2" ] && echo "deseq2 was not created" && exit 1
 [ -z "$(ls -A 'deseq2')" ] && echo "deseq2 is empty" && exit 1
 [ ! -f "pca.vals_mqc.tsv" ] && echo "pca.vals_mqc.tsv was not created" && exit 1
 [ ! -s "pca.vals_mqc.tsv" ] && echo "pca.vals_mqc.tsv is empty" && exit 1
 [ ! -f "sample.dists_mqc.tsv" ] && echo "sample.dists_mqc.tsv was not created" && exit 1
 [ ! -s "sample.dists_mqc.tsv" ] && echo "sample.dists_mqc.tsv is empty" && exit 1
 exit 0
--- a/src/dupradar/config.vsh.yaml
+++ b/src/dupradar/config.vsh.yaml
@@ -0,0 +1,118 @@
 name: "dupradar"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/local/dupradar.nf]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
 description: |
  Assessment of duplication rates in RNA-Seq datasets
 argument_groups:
 - name: "Input"
  arguments: 
  - name: "--id"
    type: string
    description: Sample ID
  - name: "--input"
    type: file 
    required: true
    description: path to input alignment file in BAM format
  - name: "--gtf_annotation"
    type: file 
    required: true
    description: path to GTF annotation file.
  - name: "--paired"
    type: boolean
    description: add flag if input alignment file consists of paired reads
  - name: "--strandedness"
    type: string
    required: false
    choices: ["forward", "reverse", "unstranded"]
    description: strandedness of input bam file reads (forward, reverse or unstranded (default, applicable to paired reads))
 - name: "Output"
  arguments: 
  - name: "--output_dupmatrix"
    type: file
    direction: output
    required: false
    must_exist: true
    default: $id.dup_matrix.txt
    description: path to output file (txt) of duplicate tag counts
  - name: "--output_dup_intercept_mqc"
    type: file
    direction: output
    required: false
    must_exist: true
    default: $id.dup_intercept_mqc.txt
    description: path to output file (txt) of multiqc intercept value DupRadar
  - name: "--output_duprate_exp_boxplot"
    type: file
    direction: output
    required: false
    must_exist: true
    default: $id.duprate_exp_boxplot.pdf
    description: path to output file (pdf) of distribution of expression box plot
  - name: "--output_duprate_exp_densplot"
    type: file
    direction: output
    required: false
    must_exist: true
    default: $id.duprate_exp_densityplot.pdf
    description: path to output file (pdf) of 2D density scatter plot of duplicate tag counts
  - name: "--output_duprate_exp_denscurve_mqc"
    type: file
    direction: output
    required: false
    must_exist: true
    default: $id.duprate_exp_density_curve_mqc.txt
    description: path to output file (txt) of the gene duplication density curve for MultiQC
  - name: "--output_expression_histogram"
    type: file
    direction: output
    required: false
    must_exist: true
    default: $id.expression_hist.pdf
    description: path to output file (pdf) of distribution of RPK values per gene histogram
  - name: "--output_intercept_slope"
    type: file
    direction: output
    required: false
    must_exist: true
    default: $id.intercept_slope.txt
    description: output file (txt) with progression of duplication rate value 
 resources:
  - type: bash_script
    path: script.sh
  # Copied from https://github.com/nf-core/rnaseq/blob/3.12.0/bin/dupradar.r
  - path: dupradar.r
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2.bam
  - path: /testData/unit_test_resources/genes.gtf
 engines:
  - type: docker
    image: ubuntu:22.04
    setup:   
      - type: apt
        packages: [ r-base ]
      - type: r
        bioc: [ dupRadar ]
 runners:
  - type: executable  
  - type: nextflow
--- a/src/dupradar/dupradar.r
+++ b/src/dupradar/dupradar.r
@@ -0,0 +1,154 @@
 #!/usr/bin/env Rscript
 # Command line argument processing
 args = commandArgs(trailingOnly=TRUE)
 if (length(args) < 6) {
    stop("Usage: dupRadar.r <input.bam> <sample_id> <annotation.gtf> <strandDirection:0=unstranded/1=forward/2=reverse> <paired/single> <nbThreads> <R-package-location (optional)>", call.=FALSE)
 }
 message("paired_end is ", args[5])
 message("the type is ", class(args[5]))
 input_bam <- args[1]
 output_prefix <- args[2]
 annotation_gtf <- args[3]
 stranded <- as.numeric(args[4])
 paired_end <- ifelse(args[5] == "true", TRUE, FALSE)
 threads <- as.numeric(args[6])
 bamRegex <- "(.+)\\.bam$"
 if(!(grepl(bamRegex, input_bam) && file.exists(input_bam) &&  (!file.info(input_bam)$isdir))) stop("First argument '<input.bam>' must be an existing file (not a directory) with '.bam' extension...")
 if(!(file.exists(annotation_gtf) &&  (!file.info(annotation_gtf)$isdir))) stop("Third argument '<annotation.gtf>' must be an existing file (and not a directory)...")
 if(is.na(stranded) || (!(stranded %in% (0:2)))) stop("Fourth argument <strandDirection> must be a numeric value in 0(unstranded)/1(forward)/2(reverse)...")
 if(is.na(threads) || (threads<=0)) stop("Sixth argument <nbThreads> must be a strictly positive numeric value...")
 # Debug messages (stderr)
 message("Input bam      (Arg 1): ", input_bam)
 message("Output basename(Arg 2): ", output_prefix)
 message("Input gtf      (Arg 3): ", annotation_gtf)
 message("Strandness     (Arg 4): ", c("unstranded", "forward", "reverse")[stranded+1])
 message("paired_end     (Arg 5): ", paired_end)
 message("Nb threads     (Arg 6): ", threads)
 message("R package loc. (Arg 7): ", ifelse(length(args) > 6, args[7], "Not specified"))
 # Load / install packages
 if (length(args) > 6) { .libPaths( c( args[7], .libPaths() ) ) }
 if (!require("dupRadar")){
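    # biocLite has been retired; BiocManager is the current Bioconductor installer
    if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager", repos="https://cloud.r-project.org/")
    BiocManager::install("dupRadar", update=FALSE, ask=FALSE)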
    library("dupRadar")
 }
 if (!require("parallel")) {
    install.packages("parallel", dependencies=TRUE, repos='http://cloud.r-project.org/')
    library("parallel")
 }
 # Duplicate stats
 dm <- analyzeDuprates(input_bam, annotation_gtf, stranded, paired_end, threads)
 write.table(dm, file=paste(output_prefix, "_dupMatrix.txt", sep=""), quote=FALSE, row.names=FALSE, sep="\t")
 # 2D density scatter plot
 pdf(paste0(output_prefix, "_duprateExpDens.pdf"))
 duprateExpDensPlot(DupMat=dm)
 title("Density scatter plot")
 mtext(output_prefix, side=3)
 dev.off()
 fit <- duprateExpFit(DupMat=dm)
 cat(
    paste("- dupRadar Int (duprate at low read counts):", fit$intercept),
    paste("- dupRadar Sl (progression of the duplication rate):", fit$slope),
    fill=TRUE, labels=output_prefix,
    file=paste0(output_prefix, "_intercept_slope.txt"), append=FALSE
 )
 # Create a multiqc file dupInt
 sample_name <- gsub("Aligned.sortedByCoord.out.markDups", "", output_prefix)
 line="#id: DupInt
 #plot_type: 'generalstats'
 #pconfig:
 #    dupRadar_intercept:
 #        title: 'dupInt'
 #        namespace: 'DupRadar'
 #        description: 'Intercept value from DupRadar'
 #        max: 100
 #        min: 0
 #        scale: 'RdYlGn-rev'
 #        format: '{:.2f}%'
 Sample dupRadar_intercept"
 write(line,file=paste0(output_prefix, "_dup_intercept_mqc.txt"),append=TRUE)
 write(paste(sample_name, fit$intercept),file=paste0(output_prefix, "_dup_intercept_mqc.txt"),append=TRUE)
 # Get numbers from dupRadar GLM
 curve_x <- sort(log10(dm$RPK))
 curve_y = 100*predict(fit$glm, data.frame(x=curve_x), type="response")
 # Remove all of the infinite values (a logical mask also works when there are none)
 finite = is.finite(curve_x)
 curve_x = curve_x[finite]
 curve_y = curve_y[finite]
 # Reduce number of data points
 curve_x <- curve_x[seq(1, length(curve_x), 10)]
 curve_y <- curve_y[seq(1, length(curve_y), 10)]
 # Convert x values back to real counts
 curve_x = 10^curve_x
 # Write to file
 line="#id: dupradar
 #section_name: 'DupRadar'
 #section_href: 'bioconductor.org/packages/release/bioc/html/dupRadar.html'
 #description: \"provides duplication rate quality control for RNA-Seq datasets. Highly expressed genes can be expected to have a lot of duplicate reads, but high numbers of duplicates at low read counts can indicate low library complexity with technical duplication.
 #    This plot shows the general linear models - a summary of the gene duplication distributions. \"
 #pconfig:
 #    title: 'DupRadar General Linear Model'
 #    xLog: True
 #    xlab: 'expression (reads/kbp)'
 #    ylab: '% duplicate reads'
 #    ymax: 100
 #    ymin: 0
 #    tt_label: '<b>{point.x:.1f} reads/kbp</b>: {point.y:,.2f}% duplicates'
 #    xPlotLines:
 #        - color: 'green'
 #          dashStyle: 'LongDash'
 #          label:
 #                style: {color: 'green'}
 #                text: '0.5 RPKM'
 #                verticalAlign: 'bottom'
 #                y: -65
 #          value: 0.5
 #          width: 1
 #        - color: 'red'
 #          dashStyle: 'LongDash'
 #          label:
 #                style: {color: 'red'}
 #                text: '1 read/bp'
 #                verticalAlign: 'bottom'
 #                y: -65
 #          value: 1000
 #          width: 1"
 write(line,file=paste0(output_prefix, "_duprateExpDensCurve_mqc.txt"),append=TRUE)
 write.table(
    cbind(curve_x, curve_y),
    file=paste0(output_prefix, "_duprateExpDensCurve_mqc.txt"),
    quote=FALSE, row.names=FALSE, col.names=FALSE, append=TRUE
 )
 # Distribution of expression box plot
 pdf(paste0(output_prefix, "_duprateExpBoxplot.pdf"))
 duprateExpBoxplot(DupMat=dm)
 title("Percent Duplication by Expression")
 mtext(output_prefix, side=3)
 dev.off()
 # Distribution of RPK values per gene
 pdf(paste0(output_prefix, "_expressionHist.pdf"))
 expressionHist(DupMat=dm)
 title("Distribution of RPK values per gene")
 mtext(output_prefix, side=3)
 dev.off()
 # Print sessioninfo to standard out
 print(output_prefix)
 citation("dupRadar")
 sessionInfo()
--- a/src/dupradar/script.sh
+++ b/src/dupradar/script.sh
@@ -0,0 +1,28 @@
 #!/bin/bash
 set -exo pipefail 
 # Map the strandedness label to dupRadar's numeric code (unstranded when unset).
 function num_strandness {
    case "${par_strandedness:-unstranded}" in
        unstranded) echo 0 ;;
        forward) echo 1 ;;
        reverse) echo 2 ;;
        *) echo "strandedness must be unstranded, forward or reverse." >&2
           return 1 ;;
    esac
 }
 # Resolve the code up front so that an invalid value aborts the script (set -e).
 strandedness=$(num_strandness)
 Rscript "$meta_resources_dir/dupradar.r" \
    "$par_input" \
    "$par_id" \
    "$par_gtf_annotation" \
    "$strandedness" \
    "${par_paired:-false}" \
    "${meta_cpus:-1}"
 mv "$par_id"_dupMatrix.txt $par_output_dupmatrix
 mv "$par_id"_dup_intercept_mqc.txt $par_output_dup_intercept_mqc
 mv "$par_id"_duprateExpBoxplot.pdf $par_output_duprate_exp_boxplot
 mv "$par_id"_duprateExpDens.pdf $par_output_duprate_exp_densplot
 mv "$par_id"_duprateExpDensCurve_mqc.txt $par_output_duprate_exp_denscurve_mqc
 mv "$par_id"_expressionHist.pdf $par_output_expression_histogram
 mv "$par_id"_intercept_slope.txt $par_output_intercept_slope
--- a/src/dupradar/test.sh
+++ b/src/dupradar/test.sh
@@ -0,0 +1,51 @@
 #!/bin/bash
 # define input and output for script
 input_bam="$meta_resources_dir/wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2.bam"
 input_gtf="$meta_resources_dir/genes.gtf"
 output_dupmatrix="dup_matrix.txt"
 output_dup_intercept_mqc="dup_intercept_mqc.txt"
 output_duprate_exp_boxplot="duprate_exp_boxplot.pdf"
 output_duprate_exp_densplot="duprate_exp_densityplot.pdf"
 output_duprate_exp_denscurve_mqc="duprate_exp_density_curve_mqc.txt"
 output_expression_histogram="expression_hist.pdf"
 output_intercept_slope="intercept_slope.txt"
 # Run executable
 echo "> Running $meta_functionality_name for unpaired reads."
 "$meta_executable" \
    --input "$input_bam" \
    --id "test" \
    --gtf_annotation "$input_gtf" \
    --strandedness "forward" \
    --paired false \
    --output_dupmatrix $output_dupmatrix \
    --output_dup_intercept_mqc $output_dup_intercept_mqc \
    --output_duprate_exp_boxplot $output_duprate_exp_boxplot \
    --output_duprate_exp_densplot $output_duprate_exp_densplot \
    --output_duprate_exp_denscurve_mqc $output_duprate_exp_denscurve_mqc \
    --output_expression_histogram $output_expression_histogram \
    --output_intercept_slope $output_intercept_slope
 exit_code=$?
 [[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
 echo ">> asserting output has been created for unpaired read input"
 [ ! -f "$output_dupmatrix" ] && echo "$output_dupmatrix was not created" && exit 1
 [ ! -s "$output_dupmatrix" ] && echo "$output_dupmatrix is empty" && exit 1
 [ ! -f "$output_dup_intercept_mqc" ] && echo "$output_dup_intercept_mqc was not created" && exit 1
 [ ! -s "$output_dup_intercept_mqc" ] && echo "$output_dup_intercept_mqc is empty" && exit 1
 [ ! -f "$output_duprate_exp_boxplot" ] && echo "$output_duprate_exp_boxplot was not created" && exit 1
 [ ! -s "$output_duprate_exp_boxplot" ] && echo "$output_duprate_exp_boxplot is empty" && exit 1
 [ ! -f "$output_duprate_exp_densplot" ] && echo "$output_duprate_exp_densplot was not created" && exit 1
 [ ! -s "$output_duprate_exp_densplot" ] && echo "$output_duprate_exp_densplot is empty" && exit 1
 [ ! -f "$output_duprate_exp_denscurve_mqc" ] && echo "$output_duprate_exp_denscurve_mqc was not created" && exit 1
 [ ! -s "$output_duprate_exp_denscurve_mqc" ] && echo "$output_duprate_exp_denscurve_mqc is empty" && exit 1
 [ ! -f "$output_expression_histogram" ] && echo "$output_expression_histogram was not created" && exit 1
 [ ! -s "$output_expression_histogram" ] && echo "$output_expression_histogram is empty" && exit 1
 [ ! -f "$output_intercept_slope" ] && echo "$output_intercept_slope was not created" && exit 1
 [ ! -s "$output_intercept_slope" ] && echo "$output_intercept_slope is empty" && exit 1
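 # Illustrative extra check (an addition, not part of the original test): a minimal
 # sketch assuming the intercept/slope report contains the two labels that dupradar.r
 # writes with cat(). The helper name has_fit_values is hypothetical.
 has_fit_values() {
   grep -q "dupRadar Int" "$1" && grep -q "dupRadar Sl" "$1"
 }
 if [ -s "$output_intercept_slope" ] && ! has_fit_values "$output_intercept_slope"; then
   echo "$output_intercept_slope lacks the dupRadar intercept or slope" && exit 1
 fi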
 exit 0
--- a/src/fastqc/config.vsh.yaml
+++ b/src/fastqc/config.vsh.yaml
@@ -0,0 +1,71 @@
 name: "fastqc"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/fastqc/main.nf, modules/nf-core/fastqc/meta.yml]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
 description: |
  FastQC component; see https://www.bioinformatics.babraham.ac.uk/projects/fastqc/. This component takes one FASTQ file (single-end) or two comma-separated FASTQ files (paired-end).
 argument_groups:
 - name: "Input"
  arguments: 
  - name: "--paired"
    type: boolean 
    required: false
    default: false
    description: Paired fastq files or not?
  - name: "--input"
    type: file
    required: true
    multiple: true
    multiple_sep: ","
    description: Input fastq files, either one or two (paired)
    example: sample.fastq
 - name: "Output"
  arguments:   
  - name: "--fastqc_html_1"
    type: file
    direction: output
    description: FastQC HTML report for read 1.
    default: $id.read_1.fastqc.html
  - name: "--fastqc_html_2"
    type: file
    direction: output
    description: FastQC HTML report for read 2.
    required: false
    must_exist: false
    default: $id.read_2.fastqc.html
  - name: "--fastqc_zip_1"
    type: file
    direction: output
    description: FastQC report archive for read 1.
    default: $id.read_1.fastqc.zip
  - name: "--fastqc_zip_2"
    type: file
    direction: output
    description: FastQC report archive for read 2.
    required: false
    must_exist: false
    default: $id.read_2.fastqc.zip
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz
  - path: /testData/minimal_test/input_fastq/SRR6357070_2.fastq.gz
 engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [ fastqc ]
 runners:
  - type: executable
  - type: nextflow
--- a/src/fastqc/script.sh
+++ b/src/fastqc/script.sh
@@ -0,0 +1,39 @@
 #!/bin/bash
 set -eo pipefail
 function clean_up {
  rm -rf "$tmpdir"
 }
 trap clean_up EXIT
 tmpdir=$(mktemp -d "$meta_temp_dir/$meta_name-XXXXXXXX")
 IFS="," read -ra input <<< $par_input
 count=${#input[@]}
 if $par_paired; then
  echo "Paired - $count"
  if [ $count -ne 2 ]; then
    echo "Paired end input requires two files"
    exit 1
  fi
 else
  echo "Not Paired - $count"
  if [ $count -ne 1 ]; then
    echo "Single end input requires one file"
    exit 1
  fi
 fi
 fastqc -o "$tmpdir" "${input[@]}"
 file1=$(basename -- "${input[0]}")
 read1="${file1%.fastq*}"
 # Use if-blocks rather than `test && cp`: a failed test on the last line would
 # otherwise become the script's exit status for single-end input.
 if [[ -e "${tmpdir}/${read1}_fastqc.html" ]]; then cp "${tmpdir}/${read1}_fastqc.html" "$par_fastqc_html_1"; fi
 if [[ -e "${tmpdir}/${read1}_fastqc.zip" ]]; then cp "${tmpdir}/${read1}_fastqc.zip" "$par_fastqc_zip_1"; fi
 if $par_paired; then
   file2=$(basename -- "${input[1]}")
   read2="${file2%.fastq*}"
   if [[ -e "${tmpdir}/${read2}_fastqc.html" ]]; then cp "${tmpdir}/${read2}_fastqc.html" "$par_fastqc_html_2"; fi
   if [[ -e "${tmpdir}/${read2}_fastqc.zip" ]]; then cp "${tmpdir}/${read2}_fastqc.zip" "$par_fastqc_zip_2"; fi
 fi
--- a/src/fastqc/test.sh
+++ b/src/fastqc/test.sh
@@ -0,0 +1,35 @@
 #!/bin/bash
 echo ">>> Testing $meta_functionality_name"
 echo ">>> Testing for paired-end reads"
 "$meta_executable" \
    --paired true \
    --input $meta_resources_dir/SRR6357070_1.fastq.gz,$meta_resources_dir/SRR6357070_2.fastq.gz \
    --fastqc_html_1  SRR6357070_1.html \
    --fastqc_html_2  SRR6357070_2.html \
    --fastqc_zip_1  SRR6357070_1.zip \
    --fastqc_zip_2  SRR6357070_2.zip
 echo ">> Checking if the correct files are present"
 [[ ! -f "SRR6357070_1.html" ]] || [[ ! -f "SRR6357070_2.html" ]] && echo "Report file missing" && exit 1
 [[ ! -s "SRR6357070_1.html" ]] || [[ ! -s "SRR6357070_2.html" ]] && echo "Report file empty" && exit 1
 [[ ! -f "SRR6357070_1.zip" ]] || [[ ! -f "SRR6357070_2.zip" ]] && echo "Zip file missing" && exit 1
 rm SRR6357070_1.html SRR6357070_2.html SRR6357070_1.zip SRR6357070_2.zip
 echo ">>> Testing for single-end reads"
 "$meta_executable" \
    --paired false \
    --input $meta_resources_dir/SRR6357070_1.fastq.gz \
    --fastqc_html_1 SRR6357070_1.html \
    --fastqc_zip_1 SRR6357070_1.zip 
 echo ">> Checking if the correct files are present"
 [ ! -f "SRR6357070_1.html" ] && echo "Report file missing" && exit 1
 [ ! -s "SRR6357070_1.html" ] && echo "Report file empty" && exit 1
 [ ! -f "SRR6357070_1.zip" ] && echo "Zip file missing" && exit 1
 echo ">>> Test finished successfully"
 exit 0
--- a/src/fq_subsample/config.vsh.yaml
+++ b/src/fq_subsample/config.vsh.yaml
@@ -0,0 +1,66 @@
 name: "fq_subsample"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/fq/subsample/main.nf, modules/nf-core/fq/subsample/meta.yml]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
 description: | 
  fq subsample outputs a subset of records from single or paired FASTQ files. This requires a seed (--seed) to be set in --extra_args, together with either --probability (-p) or --record-count (-n).
 argument_groups: 
 - name: "Input"
  arguments: 
  - name: "--input"
    type: file
    description: Input fastq files to subsample
    multiple: true
    multiple_sep: ";"
  - name: "--extra_args"
    type: string
    default: ""
    description: Extra arguments to pass to fq subsample
 - name: "Output"
  arguments: 
  - name: "--output_1"
    type: file
    direction: output
    default: $id.read_1.subsampled.fastq
    description: Sampled read 1 fastq files
  - name: "--output_2"
    type: file
    must_exist: false
    direction: output
    default: $id.read_2.subsampled.fastq
    description: Sampled read 2 fastq files
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz
  - path: /testData/minimal_test/input_fastq/SRR6357070_2.fastq.gz
 engines:
  - type: docker
    image: ubuntu:22.04
    setup: 
      - type: docker
        env: 
            - TZ=Europe/Brussels
        run: | 
          ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone && \
          apt-get update && \
          apt-get install -y --no-install-recommends build-essential git-all curl && \
          curl https://sh.rustup.rs -sSf | sh -s -- -y && \
          . "$HOME/.cargo/env" && \
          git clone --depth 1 --branch v0.12.0 https://github.com/stjude-rust-labs/fq.git && \
          mv fq /usr/local/ && cd /usr/local/fq && \
          cargo install --locked --path . && \
          mv /usr/local/fq/target/release/fq /usr/local/bin/
 runners:
  - type: executable
  - type: nextflow
--- a/src/fq_subsample/script.sh
+++ b/src/fq_subsample/script.sh
@@ -0,0 +1,23 @@
 #!/bin/bash
 set -eo pipefail
 IFS=";" read -ra input <<< $par_input
 n_fastq=${#input[@]}
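 # fq subsample needs a sampling size: --probability (-p) or --record-count (-n).
 required_args=("-p" "--probability" "-n" "--record-count")
 has_required_arg=false
 for arg in "${required_args[@]}"; do
    if [[ "$par_extra_args" == *"$arg"* ]]; then
        has_required_arg=true
    fi
 done
 if ! $has_required_arg; then
    echo "FQ/SUBSAMPLE requires either --probability (-p) or --record-count (-n) to be specified with --extra_args"
    exit 1
 fi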
 if [ $n_fastq -eq 1 ]; then
    fq subsample $par_extra_args ${input[*]} --r1-dst $par_output_1
 elif [ $n_fastq -eq 2 ]; then
    fq subsample $par_extra_args ${input[*]} --r1-dst $par_output_1 --r2-dst $par_output_2
 else 
    echo "FQ/SUBSAMPLE only accepts 1 or 2 FASTQ files!"
    exit 1
 fi
--- a/src/fq_subsample/test.sh
+++ b/src/fq_subsample/test.sh
@@ -0,0 +1,32 @@
 #!/bin/bash
 echo ">>> Testing $meta_functionality_name"
 echo ">>> Testing for paired-end reads"
 "$meta_executable" \
    --input "$meta_resources_dir/SRR6357070_1.fastq.gz;$meta_resources_dir/SRR6357070_2.fastq.gz" \
    --extra_args  '--record-count 1000000 --seed 1' \
    --output_1  SRR6357070_1.subsampled.fastq.gz \
    --output_2  SRR6357070_2.subsampled.fastq.gz 
 echo ">> Checking if the correct files are present"
 [ ! -f "SRR6357070_1.subsampled.fastq.gz" ] && echo "Subsampled FASTQ file for read 1 is missing!" && exit 1
 [ ! -s "SRR6357070_1.subsampled.fastq.gz" ] && echo "Subsampled FASTQ file is empty!" && exit 1
 [ ! -f "SRR6357070_2.subsampled.fastq.gz" ] && echo "Subsampled FASTQ file for read 2 is missing" && exit 1
 [ ! -s "SRR6357070_2.subsampled.fastq.gz" ] && echo "Subsampled FASTQ file is empty" && exit 1
 rm SRR6357070_1.subsampled.fastq.gz SRR6357070_2.subsampled.fastq.gz
 echo ">>> Testing for single-end reads"
 "$meta_executable" \
    --input $meta_resources_dir/SRR6357070_1.fastq.gz \
    --extra_args  '--record-count 1000000 --seed 1' \
    --output_1  SRR6357070_1.subsampled.fastq.gz 
 echo ">> Checking if the correct files are present"
 [ ! -f "SRR6357070_1.subsampled.fastq.gz" ] && echo "Subsampled FASTQ file is missing" && exit 1
 [ ! -s "SRR6357070_1.subsampled.fastq.gz" ] && echo "Subsampled FASTQ file is empty" && exit 1
 echo ">>> Tests finished successfully"
 exit 0
--- a/src/getchromsizes/config.vsh.yaml
+++ b/src/getchromsizes/config.vsh.yaml
@@ -0,0 +1,57 @@
 name: "getchromsizes"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/custom/getchromsizes/main.nf, modules/nf-core/custom/getchromsizes/meta.yml]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
 description: | 
  Generates a tab-separated file of chromosome sizes and a FASTA index file.
 argument_groups:
 - name: "Input"
  arguments: 
  - name: "--fasta"
    type: file
    description: Genome fasta files
 - name: "Output"
  arguments: 
  - name: "--sizes"
    type: file
    direction: output
    description: File containing chromosome lengths
  - name: "--fai"
    type: file
    description: FASTA index file
    direction: output
  - name: "--gzi" # optional
    type: file
    description: Optional gzip index file for compressed inputs
    direction: output
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/genome.fasta
 engines:
 - type: docker
  image: ubuntu:22.04
  setup:
    - type: docker
      run: | 
        apt-get update && \
        apt-get install -y autoconf automake make gcc perl zlib1g-dev libbz2-dev liblzma-dev libcurl4-gnutls-dev libssl-dev libncurses5-dev curl bzip2 && \
        curl -fsSL https://github.com/samtools/samtools/releases/download/1.18/samtools-1.18.tar.bz2 -o samtools-1.18.tar.bz2 && \
        tar -xjf samtools-1.18.tar.bz2 && \
        rm samtools-1.18.tar.bz2 && \
        cd samtools-1.18 && \
        ./configure && \
        make && \
        make install
 runners:
  - type: executable
  - type: nextflow
--- a/src/getchromsizes/script.sh
+++ b/src/getchromsizes/script.sh
@@ -0,0 +1,9 @@
 #!/bin/bash
 set -eo pipefail
 filename="$(basename -- $par_fasta)"
 samtools faidx $par_fasta
 cut -f 1,2 "$par_fasta.fai" > $par_sizes
 mv "$par_fasta.fai" $par_fai
--- a/src/getchromsizes/test.sh
+++ b/src/getchromsizes/test.sh
@@ -0,0 +1,16 @@
 #!/bin/bash
 echo "Testing $meta_functionality_name"
 "$meta_executable" \
  --fasta "$meta_resources_dir/genome.fasta" \
  --sizes genome.fasta.sizes \
  --fai genome.fasta.fai
 echo ">>> Checking whether output exists"
 [ ! -f "genome.fasta.sizes" ] && echo "Chromosome lengths file does not exist!" && exit 1
 [ ! -s "genome.fasta.sizes" ] && echo "Chromosome lengths file is empty!" && exit 1
 [ ! -f "genome.fasta.fai" ] && echo "FASTA index file does not exist!" && exit 1
 [ ! -s "genome.fasta.fai" ] && echo "FASTA index file is empty!" && exit 1
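 # Illustrative extra check (an addition, not part of the original test): a minimal
 # sketch assuming the sizes file holds "<name><TAB><length>" per line, which is what
 # `cut -f 1,2` on a samtools .fai index should produce. The helper name
 # sizes_file_is_valid is hypothetical.
 sizes_file_is_valid() {
   awk -F'\t' 'NF != 2 || $2 !~ /^[0-9]+$/ { bad = 1 } END { exit bad + 0 }' "$1"
 }
 if [ -s "genome.fasta.sizes" ] && ! sizes_file_is_valid "genome.fasta.sizes"; then
   echo "Chromosome lengths file is malformed!" && exit 1
 fi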
 echo "All tests succeeded!"
 exit 0
--- a/src/gtf2bed/config.vsh.yaml
+++ b/src/gtf2bed/config.vsh.yaml
@@ -0,0 +1,45 @@
 name: "gtf2bed"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/local/gtf2bed.nf]
    last_sha: 0a1bdcfbb498987643b74e9fccab85ccd9f2a17d
 description: |
  Create BED annotation file from GTF.
 argument_groups:
 - name: "Input"
  arguments:
  - name: "--gtf"
    type: file 
    required: true
    description: A reference file in GTF format.
 - name: "Output"
  arguments:  
  - name: "--bed_output"
    type: file
    direction: output
    required: true
    description: BED file resulting from the conversion of the GTF input file.
 resources:
  - type: bash_script
    path: script.sh
  # Copied from https://github.com/nf-core/rnaseq/blob/3.12.0/bin/gtf2bed
  - path: gtf2bed.pl
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/genes.gtf.gz
 engines:
  - type: docker
    image: ubuntu:22.04
    setup:  
      - type: apt
        packages: [perl]
 runners:
  - type: executable
  - type: nextflow
--- a/src/gtf2bed/gtf2bed.pl
+++ b/src/gtf2bed/gtf2bed.pl
@@ -0,0 +1,122 @@
 #!/usr/bin/env perl
 # Copyright (c) 2011 Erik Aronesty (erik@q32.com)
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the "Software"), to deal
 # in the Software without restriction, including without limitation the rights
 # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 # copies of the Software, and to permit persons to whom the Software is
 # furnished to do so, subject to the following conditions:
 #
 # The above copyright notice and this permission notice shall be included in
 # all copies or substantial portions of the Software.
 #
 # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 # THE SOFTWARE.
 #
 # ALSO, IT WOULD BE NICE IF YOU LET ME KNOW YOU USED IT.
 use Getopt::Long;
 my $extended;
 GetOptions("x"=>\$extended);
 $in = shift @ARGV;
 my $in_cmd = ($in =~ /\.gz$/ ? "gunzip -c $in|" : $in =~ /\.zip$/ ? "unzip -p $in|" : "$in");
 open(IN, $in_cmd) || die "Can't open $in: $!\n";
 while (<IN>) {
    $gff = 2 if /^##gff-version 2/;
    $gff = 3 if /^##gff-version 3/;
    next if /^#/ && $gff;
    s/\s+$//;
    # 0-chr 1-src 2-feat 3-beg 4-end 5-scor 6-dir 7-fram 8-attr
    my @f = split /\t/;
    if ($gff) {
        # most ver 2's stick gene names in the id field
        ($id) = $f[8]=~ /\bID="([^"]+)"/;
        # most ver 3's stick unquoted names in the name field
        ($id) = $f[8]=~ /\bName=([^";]+)/ if !$id && $gff == 3;
    } else {
        ($id) = $f[8]=~ /transcript_id "([^"]+)"/;
    }
    next unless $id && $f[0];
    if ($f[2] eq 'exon') {
        die "no position at exon on line $." if ! $f[3];
        # gff3 puts :\d in exons sometimes
        $id =~ s/:\d+$// if $gff == 3;
        push @{$exons{$id}}, \@f;
        # save lowest start
        $trans{$id} = \@f if !$trans{$id};
    } elsif ($f[2] eq 'start_codon') {
        #optional, output codon start/stop as "thick" region in bed
        $sc{$id}->[0] = $f[3];
    } elsif ($f[2] eq 'stop_codon') {
        $sc{$id}->[1] = $f[4];
    } elsif ($f[2] eq 'miRNA' ) {
        $trans{$id} = \@f if !$trans{$id};
        push @{$exons{$id}}, \@f;
    }
 }
 for $id (
    # sort by chr then pos
    sort {
        $trans{$a}->[0] eq $trans{$b}->[0] ?
        $trans{$a}->[3] <=> $trans{$b}->[3] :
        $trans{$a}->[0] cmp $trans{$b}->[0]
    } (keys(%trans)) ) {
        my ($chr, undef, undef, undef, undef, undef, $dir, undef, $attr, undef, $cds, $cde) = @{$trans{$id}};
        my ($cds, $cde);
        ($cds, $cde) = @{$sc{$id}} if $sc{$id};
        # sort by pos
        my @ex = sort {
            $a->[3] <=> $b->[3]
        } @{$exons{$id}};
        my $beg = $ex[0][3];
        my $end = $ex[-1][4];
        if ($dir eq '-') {
            # swap
            $tmp=$cds;
            $cds=$cde;
            $cde=$tmp;
            $cds -= 2 if $cds;
            $cde += 2 if $cde;
        }
        # not specified, just use exons
        $cds = $beg if !$cds;
        $cde = $end if !$cde;
        # adjust start for bed
        --$beg; --$cds;
        my $exn = @ex;												# exon count
        my $exst = join ",", map {$_->[3]-$beg-1} @ex;				# exon start
        my $exsz = join ",", map {$_->[4]-$_->[3]+1} @ex;			# exon size
        my $gene_id;
        my $extend = "";
        if ($extended) {
            ($gene_id) = $attr =~ /gene_name "([^"]+)"/;
            ($gene_id) = $attr =~ /gene_id "([^"]+)"/ unless $gene_id;
            $extend="\t$gene_id";
        }
        # added an extra comma to make it look exactly like ucsc's beds
        print "$chr\t$beg\t$end\t$id\t0\t$dir\t$cds\t$cde\t0\t$exn\t$exsz,\t$exst,$extend\n";
 }
 close IN;
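
The coordinate arithmetic in gtf2bed.pl can be sketched in Python as follows. This is an illustration only and not part of the component: `exons_to_bed12` is a hypothetical helper name, and the sketch assumes exons arrive as 1-based inclusive (start, end) pairs, as in GTF, with no start/stop codon records (so the "thick" region defaults to the full span).

```python
from typing import List, Tuple


def exons_to_bed12(chrom: str, name: str, strand: str,
                   exons: List[Tuple[int, int]]) -> str:
    """Build one BED12 line from 1-based inclusive exon intervals (GTF style)."""
    exons = sorted(exons)                  # sort exons by start position
    chrom_start = exons[0][0] - 1          # BED starts are 0-based
    chrom_end = exons[-1][1]               # BED ends are exclusive, so unchanged
    sizes = [end - start + 1 for start, end in exons]
    starts = [start - 1 - chrom_start for start, _ in exons]
    fields = [
        chrom, chrom_start, chrom_end, name, 0, strand,
        chrom_start, chrom_end,            # thick region: full span by default
        0, len(exons),
        ",".join(map(str, sizes)) + ",",   # trailing comma, as in UCSC BED files
        ",".join(map(str, starts)) + ",",
    ]
    return "\t".join(map(str, fields))


line = exons_to_bed12("chr1", "tx1", "+", [(201, 300), (101, 150)])
print(line)
```

The only off-by-one adjustments are on start positions: block sizes and the end coordinate carry over unchanged from the GTF intervals.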
--- a/src/gtf2bed/script.sh
+++ b/src/gtf2bed/script.sh
@@ -0,0 +1,5 @@
 #!/bin/bash
 set -eo pipefail 
 perl "$meta_resources_dir/gtf2bed.pl" "$par_gtf" > "$par_bed_output"
--- a/src/gtf2bed/test.sh
+++ b/src/gtf2bed/test.sh
@@ -0,0 +1,15 @@
 #!/bin/bash
 gunzip "$meta_resources_dir/genes.gtf.gz"
 echo ">>> Testing $meta_functionality_name"
 "$meta_executable" \
  --gtf "$meta_resources_dir/genes.gtf" \
  --bed_output genes.bed
 echo ">>> Check whether output exists"
 [ ! -f "genes.bed" ] && echo "BED output file does not exist!" && exit 1
 [ ! -s "genes.bed" ] && echo "BED output file is empty!" && exit 1
 echo "All tests succeeded!"
 exit 0
--- a/src/gtf_filter/config.vsh.yaml
+++ b/src/gtf_filter/config.vsh.yaml
@@ -0,0 +1,45 @@
 name: "gtf_filter"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/local/gtf_filter.nf]
    last_sha: 1c6012ecbb087014ea4b8f0f3d39b874850277a8
 description: | 
  Filters a GTF file based on sequence names in a FASTA file.
 argument_groups:
 - name: "Input"
  arguments:
  - name: "--fasta"
    type: file
    description: Genome fasta file
  - name: "--gtf"
    type: file
    description: GTF file
  - name: "--skip_transcript_id_check"
    type: boolean_true
    description: Skip checking for transcript IDs in the GTF file.
 - name: "Output"
  arguments:
  - name: "--filtered_gtf"
    type: file
    direction: output
    description: Filtered GTF file containing only sequences in the FASTA file
 resources:
  - type: python_script
    path: script.py
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/genome.fasta
  - path: /testData/minimal_test/reference/genes.gtf.gz
 engines:
  - type: docker
    image: python 
 runners:
  - type: executable
  - type: nextflow
--- a/src/gtf_filter/script.py
+++ b/src/gtf_filter/script.py
@@ -0,0 +1,47 @@
 # Adapted from https://github.com/nf-core/rnaseq/blob/3.14.0/bin/filter_gtf.py
 import os
 import sys
 import re
 import statistics
 from typing import Set
 def extract_fasta_seq_names(fasta_name: str) -> Set[str]:
    """Extracts the sequence names from a FASTA file."""
    with open(fasta_name) as fasta:
        return {line[1:].split(None, 1)[0] for line in fasta if line.startswith(">")}
 def tab_delimited(file: str) -> float:
    """Check if file is tab-delimited and return median number of tabs."""
    with open(file, "r") as f:
        data = f.read(102400)
        return statistics.median(line.count("\t") for line in data.split("\n"))
 def filter_gtf(fasta: str, gtf_in: str, filtered_gtf_out: str, skip_transcript_id_check: bool) -> None:
    """Filter GTF file based on FASTA sequence names."""
    if tab_delimited(gtf_in) != 8:
        raise ValueError("Invalid GTF file: Expected 9 tab-separated columns.")
    seq_names_in_genome = extract_fasta_seq_names(fasta)
    print(f"Extracted chromosome sequence names from {fasta}")
    print("All sequence IDs from FASTA: " + ", ".join(sorted(seq_names_in_genome)))
    seq_names_in_gtf = set()
    try:
        with open(gtf_in) as gtf, open(filtered_gtf_out, "w") as out:
            line_count = 0
            for line in gtf:
                seq_name = line.split("\t")[0]
                seq_names_in_gtf.add(seq_name)  # Add sequence name to the set
                if seq_name in seq_names_in_genome:
                    if skip_transcript_id_check or re.search(r'transcript_id "([^"]+)"', line):
                        out.write(line)
                        line_count += 1
            if line_count == 0:
                raise ValueError("All GTF lines removed by filters")
    except IOError as e:
        print(f"File operation failed: {e}")
        return
    print("All sequence IDs from GTF: " + ", ".join(sorted(seq_names_in_gtf)))
    print(f"Extracted {line_count} matching sequences from {gtf_in} into {filtered_gtf_out}")
 filter_gtf(par["fasta"], par["gtf"], par["filtered_gtf"], par["skip_transcript_id_check"])
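
The two small checks the script relies on (taking the first whitespace-delimited token of a FASTA header, and using the median tab count as a format check) can be tried in isolation. The sketch below is illustrative only: `first_header_token` and `median_tab_count` are hypothetical names, and the sample data is made up.

```python
import statistics


def first_header_token(header_line: str) -> str:
    """Return the sequence name: the text after '>' up to the first whitespace."""
    return header_line[1:].split(None, 1)[0]


def median_tab_count(text: str) -> float:
    """Median number of tab characters per line, as used by the format check."""
    return statistics.median(line.count("\t") for line in text.split("\n"))


gtf_line = "\t".join(
    ["chr1", "src", "exon", "1", "10", ".", "+", ".", 'transcript_id "t1";']
)
# One comment line without tabs, followed by three 9-column records.
sample = "\n".join(["#!comment line", gtf_line, gtf_line, gtf_line])

print(first_header_token(">chr1 Homo sapiens chromosome 1\n"))
print(median_tab_count(sample))
```

Using the median rather than checking every line means a handful of header or comment lines without tabs does not cause a valid GTF file to be rejected.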
--- a/src/gtf_filter/test.sh
+++ b/src/gtf_filter/test.sh
@@ -0,0 +1,16 @@
 #!/bin/bash
 gunzip "$meta_resources_dir/genes.gtf.gz"
 echo ">>> Testing $meta_functionality_name"
 "$meta_executable" \
  --fasta "$meta_resources_dir/genome.fasta" \
  --gtf "$meta_resources_dir/genes.gtf" \
  --filtered_gtf filtered_genes.gtf
 echo ">>> Check whether output exists"
 [ ! -f "filtered_genes.gtf" ] && echo "Filtered GTF file does not exist!" && exit 1
 [ ! -s "filtered_genes.gtf" ] && echo "Filtered GTF file is empty!" && exit 1
 echo "All tests succeeded!"
 exit 0
--- a/src/gunzip/config.vsh.yaml
+++ b/src/gunzip/config.vsh.yaml
@@ -0,0 +1,42 @@
 name: "gunzip"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/gunzip/main.nf, modules/nf-core/gunzip/meta.yml]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
 description: |
  Decompress a gzip-compressed file. Files without a .gz extension are copied as-is.
 argument_groups:
 - name: "Input"
  arguments:
  - name: "--input"
    type: file 
    required: true
    description: Path of file to be uncompressed
 - name: "Output"
  arguments:  
  - name: "--output"
    type: file
    direction: output
    required: true
    description: Decompressed file. 
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/genes.gff.gz
 engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [ gzip ]
 runners:
  - type: executable
  - type: nextflow
--- a/src/gunzip/script.sh
+++ b/src/gunzip/script.sh
@@ -0,0 +1,11 @@
 #!/bin/bash
 set -eo pipefail
 filename="$(basename -- "$par_input")"
 if [ "${filename##*.}" == "gz" ]; then
    gunzip -c "$par_input" > "$par_output"
 else
    cat "$par_input" > "$par_output"
 fi
--- a/src/gunzip/test.sh
+++ b/src/gunzip/test.sh
@@ -0,0 +1,22 @@
 #!/bin/bash
 # define input and output for script
 input="$meta_resources_dir/genes.gff.gz"
 output="genes.gff"
 # run executable and tests
 echo "> Running $meta_functionality_name."
 "$meta_executable" \
    --input "$input" \
    --output "$output"
 exit_code=$?
 [[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
 echo ">> Checking whether output can be found and has content"
 [ ! -f "$output" ] && echo "$output file missing" && exit 1
 [ ! -s "$output" ] && echo "$output file is empty" && exit 1
 exit 0
--- a/src/kallisto/kallisto_index/config.vsh.yaml
+++ b/src/kallisto/kallisto_index/config.vsh.yaml
@@ -0,0 +1,49 @@
 name: kallisto_index
 namespace: kallisto
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/kallisto/index/main.nf, modules/nf-core/kallisto/index/meta.yml]
    last_sha: c0816976384d5e7ee6079c29c45958df1ffa0ee4
 description: | 
  Create Kallisto index.
 argument_groups:
 - name: "Input"
  arguments: 
  - name: "--transcriptome_fasta"
    type: file
  - name: "--pseudo_aligner_kmer_size"
    type: integer
    description: Kmer length passed to indexing step of pseudoaligners.
 - name: "Output"
  arguments:
  - name: "--kallisto_index"
    type: file
    direction: output
    default: Kallisto_index
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/transcriptome.fasta
 engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: docker
        run: |
          apt-get update && \
          apt-get install -y --no-install-recommends wget && \
          wget --no-check-certificate https://github.com/pachterlab/kallisto/releases/download/v0.50.1/kallisto_linux-v0.50.1.tar.gz && \
          tar -xzf kallisto_linux-v0.50.1.tar.gz && \
          mv kallisto/kallisto /usr/local/bin/
 runners:
  - type: executable
  - type: nextflow  
--- a/src/kallisto/kallisto_index/script.sh
+++ b/src/kallisto/kallisto_index/script.sh
@@ -0,0 +1,8 @@
 #!/bin/bash
 set -eo pipefail
 kallisto index \
    ${par_pseudo_aligner_kmer_size:+-k $par_pseudo_aligner_kmer_size} \
    -i $par_kallisto_index \
    $par_transcriptome_fasta
--- a/src/kallisto/kallisto_index/test.sh
+++ b/src/kallisto/kallisto_index/test.sh
@@ -0,0 +1,14 @@
 #!/bin/bash
 echo ">>> Testing $meta_functionality_name"
 "$meta_executable" \
  --transcriptome_fasta "$meta_resources_dir/transcriptome.fasta" \
  --kallisto_index Kallisto 
 echo ">>> Checking whether output exists"
 [ ! -f "Kallisto" ] && echo "Kallisto index does not exist!" && exit 1
 [ ! -s "Kallisto" ] && echo "Kallisto index is empty!" && exit 1
 echo "All tests succeeded!"
 exit 0
--- a/src/kallisto/kallisto_quant/config.vsh.yaml
+++ b/src/kallisto/kallisto_quant/config.vsh.yaml
@@ -0,0 +1,88 @@
 name: kallisto_quant
 namespace: kallisto
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/kallisto/quant/main.nf, modules/nf-core/kallisto/quant/meta.yml]
    last_sha: aff1d2e02717247831644769fc3ba84868c3fdde
 description: | 
  Computes equivalence classes for reads and quantifies abundances.
 argument_groups: 
 - name: "Input"
  arguments:
  - name: "--input"
    type: file
    multiple: true
    multiple_sep: ","
    description: List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively.
  - name: "--paired"
    type: boolean
    description: Paired reads or not.
  - name: "--strandedness"
    type: string
    description: Sample strand-specificity.
  - name: "--index"
    type: file
    description: Kallisto genome index.
  - name: "--gtf"
    type: file
    description: Optional gtf file for translation of transcripts into genomic coordinates.
  - name: "--chromosomes"
    type: file
    description: Optional tab separated file with chromosome names and lengths.
  - name: "--fragment_length"
    type: integer
    description: For single-end mode only, the estimated average fragment length.
  - name: "--fragment_length_sd"
    type: integer
    description: For single-end mode only, the estimated standard deviation of the fragment length.
 - name: "Output"
  arguments:
  - name: "--output"
    type: file
    description: Kallisto quant results
    default: "$id.kallisto_quant_results"
    direction: output
  - name: "--log"
    type: file
    description: File containing log information from running kallisto quant
    default: "$id.kallisto_quant.log.txt"
    direction: output
  - name: "--run_info"
    type: file
    description: A json file containing information about the run
    default: "$id.run_info.json"
    direction: output 
  - name: "--quant_results_file"
    type: file
    description: TSV file containing abundance estimates from Kallisto
    direction: output
    default: $id.abundance.tsv
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/transcriptome.fasta
  - path: /testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz
  - path: /testData/minimal_test/input_fastq/SRR6357070_2.fastq.gz
 engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: docker
        run: |
          apt-get update && \
          apt-get install -y --no-install-recommends wget && \
          wget --no-check-certificate https://github.com/pachterlab/kallisto/releases/download/v0.50.1/kallisto_linux-v0.50.1.tar.gz && \
          tar -xzf kallisto_linux-v0.50.1.tar.gz && \
          mv kallisto/kallisto /usr/local/bin/
 runners:
  - type: executable
  - type: nextflow  
--- a/src/kallisto/kallisto_quant/script.sh
+++ b/src/kallisto/kallisto_quant/script.sh
@@ -0,0 +1,49 @@
 #!/bin/bash
 set -eo pipefail
 IFS="," read -ra input <<< $par_input
 single_end_params=''
 if [ "$par_paired" == "false" ]; then
    if [ -z "$par_fragment_length" ] || [ -z "$par_fragment_length_sd" ]; then
        echo "fragment_length and fragment_length_sd must be set for single-end data"
        exit 1
    fi
    single_end_params="--single --fragment-length $par_fragment_length --sd $par_fragment_length_sd"
 fi
 strandedness=''
 if [[ "$par_extra_args" != *"--fr-stranded"* ]] && [[ "$par_extra_args" != *"--rf-stranded"* ]]; then
    if [ "$par_strandedness" == 'forward' ]; then
        strandedness='--fr-stranded'
    elif [ "$par_strandedness" == 'reverse' ]; then
        strandedness='--rf-stranded'
    fi
 fi
 mkdir -p $par_output
 echo "kallisto quant \
    ${meta_cpus:+--threads $meta_cpus} \
    --index $par_index \
    ${par_gtf:+--gtf $par_gtf} \
    ${par_chromosomes:+--chromosomes $par_chromosomes} \
    $single_end_params \
    $strandedness \
    $par_extra_args \
    -o $par_output \
    ${input[*]} 2> >(tee -a ${par_output}/kallisto_quant.log >&2)"
 kallisto quant \
    ${meta_cpus:+--threads $meta_cpus} \
    --index $par_index \
    ${par_gtf:+--gtf $par_gtf} \
    ${par_chromosomes:+--chromosomes $par_chromosomes} \
    $single_end_params \
    $strandedness \
    $par_extra_args \
    -o $par_output \
    ${input[*]} 2> >(tee -a ${par_output}/kallisto_quant.log >&2)
 mv ${par_output}/kallisto_quant.log ${par_log}
 mv ${par_output}/run_info.json ${par_run_info}
 cp ${par_output}/abundance.tsv ${par_quant_results_file}
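
The way the script assembles the `kallisto quant` command line can be sketched as a small Python function. This is an illustration under stated assumptions, not the component's implementation: `kallisto_quant_args` is a hypothetical helper, and it only uses the flags that appear in the script above.

```python
from typing import List, Optional


def kallisto_quant_args(index: str, output: str, reads: List[str], paired: bool,
                        strandedness: Optional[str] = None,
                        fragment_length: Optional[int] = None,
                        fragment_length_sd: Optional[int] = None,
                        threads: Optional[int] = None) -> List[str]:
    """Assemble an argument list mirroring the shell script's logic."""
    args = ["kallisto", "quant"]
    if threads:
        args += ["--threads", str(threads)]
    args += ["--index", index]
    if not paired:
        # Single-end mode needs both fragment length parameters.
        if fragment_length is None or fragment_length_sd is None:
            raise ValueError(
                "fragment_length and fragment_length_sd must be set for single-end data"
            )
        args += ["--single", "--fragment-length", str(fragment_length),
                 "--sd", str(fragment_length_sd)]
    if strandedness == "forward":
        args.append("--fr-stranded")
    elif strandedness == "reverse":
        args.append("--rf-stranded")
    args += ["-o", output, *reads]
    return args


print(" ".join(kallisto_quant_args(
    "index", "single_end_test", ["SRR6357070_1.fastq.gz"], paired=False,
    strandedness="reverse", fragment_length=101, fragment_length_sd=50)))
```

Building the arguments as a list avoids the word-splitting pitfalls of unquoted shell variables, which is the main design difference from the Bash version.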
--- a/src/kallisto/kallisto_quant/test.sh
+++ b/src/kallisto/kallisto_quant/test.sh
@@ -0,0 +1,55 @@
 #!/bin/bash
 echo ">>> Testing $meta_functionality_name"
 echo ">>> Generating Kallisto index"
 kallisto index \
    -i index \
    $meta_resources_dir/transcriptome.fasta
 echo ">>> Testing for paired-end reads"
 "$meta_executable" \
  --index index \
  --paired true \
  --strandedness reverse \
  --output paired_end_test \
  --input "SRR6357070_1.fastq.gz,SRR6357070_2.fastq.gz" \
  --log quant_pe.log \
  --run_info pe_run_info.json 
 echo ">>> Checking whether output exists"
 [ ! -d "paired_end_test" ] && echo "Kallisto results do not exist!" && exit 1
 [ ! -f "quant_pe.log" ] && echo "quant_pe.log does not exist!" && exit 1
 [ ! -s "quant_pe.log" ] && echo "quant_pe.log is empty!" && exit 1
 [ ! -f "pe_run_info.json" ] && echo "pe_run_info.json does not exist!" && exit 1
 [ ! -s "pe_run_info.json" ] && echo "pe_run_info.json is empty!" && exit 1
 [ ! -f "paired_end_test/abundance.tsv" ] && echo "abundance.tsv does not exist!" && exit 1
 [ ! -s "paired_end_test/abundance.tsv" ] && echo "abundance.tsv is empty!" && exit 1
 [ ! -f "paired_end_test/abundance.h5" ] && echo "abundance.h5 does not exist!" && exit 1
 [ ! -s "paired_end_test/abundance.h5" ] && echo "abundance.h5 is empty!" && exit 1
 echo ">>> Testing for single-end reads"
 "$meta_executable" \
  --index index \
  --paired false \
  --strandedness "reverse" \
  --output single_end_test \
  --input "SRR6357070_1.fastq.gz" \
  --log quant_se.log \
  --run_info se_run_info.json \
  --fragment_length 101 \
  --fragment_length_sd 50
 echo ">>> Checking whether output exists"
 [ ! -d "single_end_test" ] && echo "Kallisto results do not exist!" && exit 1
 [ ! -f "quant_se.log" ] && echo "quant_se.log does not exist!" && exit 1
 [ ! -s "quant_se.log" ] && echo "quant_se.log is empty!" && exit 1
 [ ! -f "se_run_info.json" ] && echo "se_run_info.json does not exist!" && exit 1
 [ ! -s "se_run_info.json" ] && echo "se_run_info.json is empty!" && exit 1
 [ ! -f "single_end_test/abundance.tsv" ] && echo "abundance.tsv does not exist!" && exit 1
 [ ! -s "single_end_test/abundance.tsv" ] && echo "abundance.tsv is empty!" && exit 1
 [ ! -f "single_end_test/abundance.h5" ] && echo "abundance.h5 does not exist!" && exit 1
 [ ! -s "single_end_test/abundance.h5" ] && echo "abundance.h5 is empty!" && exit 1
 echo "All tests succeeded!"
 exit 0
--- a/src/multiqc_custom_biotype/config.vsh.yaml
+++ b/src/multiqc_custom_biotype/config.vsh.yaml
@@ -0,0 +1,46 @@
 name: "multiqc_custom_biotype"
 info: 
  migration_info: 
 description: Calculate features percentage for biotype counts
 argument_groups: 
 - name: "Input"
  arguments:
  - name: "--biocounts"
    type: file
    description: File with all biocounts
  - name: "--id"
    type: string
    description: Sample name
    default: $id
  - name: "--biotypes_header"
    type: file
    default: assets/multiqc/biotypes_header.txt
 - name: "Output"
  arguments:
  - name: '--featurecounts_multiqc'
    type: file
    direction: output
    default: $id.biotype_counts_mqc.tsv
  - name: '--featurecounts_rrna_multiqc'
    type: file
    direction: output
    default: $id.biotype_counts_rrna_mqc.tsv
 resources:
  - type: bash_script
    path: script.sh
  # Copied from https://github.com/nf-core/rnaseq/blob/3.12.0/bin/mqc_features_stat.py
  - path: mqc_features_stat.py
 engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [pip]
      - type: python
 runners: 
  - type: executable
  - type: nextflow
--- a/src/multiqc_custom_biotype/mqc_features_stat.py
+++ b/src/multiqc_custom_biotype/mqc_features_stat.py
@@ -0,0 +1,89 @@
 #!/usr/bin/env python3
 import argparse
 import logging
 import os
 # Create a logger
 logging.basicConfig(format="%(name)s - %(asctime)s %(levelname)s: %(message)s")
 logger = logging.getLogger(__file__)
 logger.setLevel(logging.INFO)
 mqc_main = """#id: 'biotype-gs'
 #plot_type: 'generalstats'
 #pconfig:"""
 mqc_pconf = """#    percent_{ft}:
 #        title: '% {ft}'
 #        namespace: 'Biotype Counts'
 #        description: '% reads overlapping {ft} features'
 #        max: 100
 #        min: 0
 #        scale: 'RdYlGn-rev'
 #        format: '{{:.2f}}%'"""
 def mqc_feature_stat(bfile, features, outfile, sname=None):
    # If sample name not given use file name
    if not sname:
        sname = os.path.splitext(os.path.basename(bfile))[0]
    # Try to parse and read biocount file
    fcounts = {}
    try:
        with open(bfile, "r") as bfl:
            for ln in bfl:
                if ln.startswith("#"):
                    continue
                ft, cn = ln.strip().split("\t")
                fcounts[ft] = float(cn)
    except Exception:
        logger.error("Trouble reading the biocount file {}".format(bfile))
        return
    total_count = sum(fcounts.values())
    if total_count == 0:
        logger.error("No biocounts found, exiting")
        return
    # Calculate percentage for each requested feature
    fpercent = {f: (fcounts[f] / total_count) * 100 if f in fcounts else 0 for f in features}
    if len(fpercent) == 0:
        logger.error("None of the given features '{}' were found in the biocount file {}".format(", ".join(features), bfile))
        return
    # Prepare the output strings
    out_head, out_value, out_mqc = ("Sample", "'{}'".format(sname), mqc_main)
    for ft, pt in fpercent.items():
        out_head = "{}\tpercent_{}".format(out_head, ft)
        out_value = "{}\t{}".format(out_value, pt)
        out_mqc = "{}\n{}".format(out_mqc, mqc_pconf.format(ft=ft))
    # Write the output to a file
    with open(outfile, "w") as ofl:
        out_final = "\n".join([out_mqc, out_head, out_value]).strip()
        ofl.write(out_final + "\n")
 if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="""Calculate features percentage for biotype counts""")
    parser.add_argument("biocount", type=str, help="File with all biocounts")
    parser.add_argument(
        "-f",
        "--features",
        dest="features",
        required=True,
        nargs="+",
        help="Features to count",
    )
    parser.add_argument("-s", "--sample", dest="sample", type=str, help="Sample Name")
    parser.add_argument(
        "-o",
        "--output",
        dest="output",
        default="biocount_percent.tsv",
        type=str,
        help="Output file name",
    )
    args = parser.parse_args()
    mqc_feature_stat(args.biocount, args.features, args.output, args.sample)
--- a/src/multiqc_custom_biotype/script.sh
+++ b/src/multiqc_custom_biotype/script.sh
@@ -0,0 +1,11 @@
 #!/bin/bash
 set -eo pipefail
 cut -f 1,7 "$par_biocounts" | tail -n +3 | cat "$par_biotypes_header" - >> "$par_featurecounts_multiqc"
 python3 "$meta_resources_dir/mqc_features_stat.py" \
    $par_featurecounts_multiqc \
    -s $par_id \
    -f rRNA \
    -o $par_featurecounts_rrna_multiqc
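
The percentage that `mqc_features_stat.py` reports for each requested feature is that feature's share of the total biotype counts. A minimal sketch of the calculation, with a hypothetical helper name (`feature_percentages`) and made-up counts:

```python
from typing import Dict, List


def feature_percentages(counts: Dict[str, float], features: List[str]) -> Dict[str, float]:
    """Percentage of the total count attributed to each requested feature."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("No biocounts found")
    # Features absent from the counts table contribute 0%.
    return {f: counts.get(f, 0.0) / total * 100 for f in features}


counts = {"protein_coding": 750.0, "rRNA": 250.0}
percentages = feature_percentages(counts, ["rRNA", "snRNA"])
print(percentages)
```

Unlike the original script, which logs an error and returns when the total is zero, this sketch raises an exception so the failure is visible to the caller.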
--- a/src/picard_markduplicates/config.vsh.yaml
+++ b/src/picard_markduplicates/config.vsh.yaml
@@ -0,0 +1,69 @@
 name: "picard_markduplicates"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/picard/markduplicates/main.nf, modules/nf-core/picard/markduplicates/meta.yml]
    last_sha: 55398de6ab7577acfe9b1180016a93d7af7eb859
 description: | 
  Locate and tag duplicate reads in a BAM file
 argument_groups: 
 - name: "Input"
  arguments:
  - name: "--bam"
    type: file
    description: Input BAM file
  - name: "--fasta"
    type: file
    description: Reference genome FASTA file
  - name: "--fai"
    type: file
    description: Reference genome FASTA index
  - name: "--extra_picard_args"
    type: string
    description: Additional arguments to be passed to Picard MarkDuplicates
    default: '--ASSUME_SORTED true --REMOVE_DUPLICATES false --VALIDATION_STRINGENCY LENIENT --TMP_DIR tmp'
 - name: "Output"
  arguments:
  - name: "--output_bam"
    type: file
    direction: output
    description: BAM file with duplicate reads marked/removed
    default: $id.MarkDuplicates.bam
  - name: "--bai"
    type: file
    direction: output
    description: An optional BAM index file. If desired, --CREATE_INDEX must be passed as a flag
    default: $id.MarkDuplicates.bam.bai
    must_exist: false
  - name: "--metrics"
    type: file
    direction: output
    description: Duplicate metrics file generated by picard 
    default: $id.MarkDuplicates.metrics.txt 
 resources: 
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/sarscov2/test.paired_end.sorted.bam
  - path: /testData/unit_test_resources/sarscov2/genome.fasta
 engines: 
  - type: docker
    image: ubuntu:22.04
    setup: 
      - type: docker
        run: | 
          apt-get update && \
          apt-get install -y build-essential openjdk-17-jdk wget && \
          wget --no-check-certificate https://github.com/broadinstitute/picard/releases/download/3.1.1/picard.jar && \
          mv picard.jar /usr/local/bin 
        env: [ PICARD=/usr/local/bin/picard.jar ]
 runners:
  - type: executable
  - type: nextflow
--- a/src/picard_markduplicates/script.sh
+++ b/src/picard_markduplicates/script.sh
@@ -0,0 +1,17 @@
 #!/bin/bash
 set -eo pipefail
 avail_mem=3072
 if [ ! $meta_memory_mb ]; then
    echo '[Picard MarkDuplicates] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this.'
 else
    # Bash arithmetic is integer-only, so compute 80% as *8/10
    avail_mem=$(( meta_memory_mb * 8 / 10 ))
 fi
 java -Xmx${avail_mem}M -jar $PICARD MarkDuplicates \
    $par_extra_picard_args \
    --INPUT $par_bam \
    --OUTPUT $par_output_bam \
    --REFERENCE_SEQUENCE $par_fasta \
    --METRICS_FILE $par_metrics
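
The heap-size rule in this script (use 80% of the memory assigned to the process, or 3 GB when the amount is unknown) can be sketched with integer arithmetic only. The function name `picard_heap_mb` is hypothetical and the snippet is an illustration, not part of the component.

```python
from typing import Optional


def picard_heap_mb(meta_memory_mb: Optional[int], default_mb: int = 3072) -> int:
    """Return the JVM heap size in MB: 80% of the available memory, or a default."""
    if not meta_memory_mb:
        # Memory not known: fall back to 3 GB, as the script does.
        return default_mb
    # Integer arithmetic keeps the result usable in a -Xmx<N>M flag.
    return meta_memory_mb * 8 // 10


print(f"-Xmx{picard_heap_mb(10000)}M")
print(f"-Xmx{picard_heap_mb(None)}M")
```

Leaving roughly 20% headroom accounts for JVM memory used outside the heap, so the process is less likely to exceed its allocation.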
--- a/src/picard_markduplicates/test.sh
+++ b/src/picard_markduplicates/test.sh
@@ -0,0 +1,19 @@
 #!/bin/bash
 echo ">>> Testing $meta_functionality_name"
 "$meta_executable" \
  --bam "$meta_resources_dir/test.paired_end.sorted.bam" \
  --fasta "$meta_resources_dir/genome.fasta" \
  --extra_picard_args "--REMOVE_DUPLICATES false" \
  --output_bam "test.MarkDuplicates.genome.bam" \
  --metrics "test.MarkDuplicates.metrics.txt"
 echo ">>> Check whether output exists"
 [ ! -f "test.MarkDuplicates.genome.bam" ] && echo "MarkDuplicates output BAM file does not exist!" && exit 1
 [ ! -s "test.MarkDuplicates.genome.bam" ] && echo "MarkDuplicates output BAM file is empty!" && exit 1
 [ ! -f "test.MarkDuplicates.metrics.txt" ] && echo "MarkDuplicates output metrics file does not exist!" && exit 1
 [ ! -s "test.MarkDuplicates.metrics.txt" ] && echo "MarkDuplicates output metrics file is empty!" && exit 1
 echo "All tests succeeded!"
 exit 0
--- a/src/prepare_multiqc_input/config.vsh.yaml
+++ b/src/prepare_multiqc_input/config.vsh.yaml
@@ -0,0 +1,146 @@
 name: "prepare_multiqc_input"
 description: |
  Prepare directory with all the input files for MultiQC.
 argument_groups: 
  - name: "Input"
    arguments:
      - name: "--fail_trimming_multiqc"
        type: string
      - name: "--fail_mapping_multiqc"
        type: string
      - name: "--fail_strand_multiqc"
        type: string
      - name: "--fastqc_raw_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--fastqc_trim_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--trim_log_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--sortmerna_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--star_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      # - name: "--hisat2_multiqc"
      #   type: file
      # - name: "--rsem_multiqc"
      #   type: file
      - name: "--salmon_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--samtools_stats"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--samtools_flagstat"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--samtools_idxstats"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--markduplicates_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--pseudo_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--featurecounts_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--featurecounts_rrna_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--aligner_pca_multiqc"
        type: file
      - name: "--aligner_clustering_multiqc"
        type: file
      - name: "--pseudo_aligner_pca_multiqc"
        type: file
      - name: "--pseudo_aligner_clustering_multiqc"
        type: file
      - name: "--preseq_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--qualimap_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--dupradar_output_dup_intercept_mqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--dupradar_output_duprate_exp_denscurve_mqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--bamstat_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--inferexperiment_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--innerdistance_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--junctionannotation_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--junctionsaturation_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--readdistribution_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--readduplication_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--tin_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--multiqc_config"
        type: file
  - name: "Output"
    arguments:
      - name: "--output"
        type: file
        direction: output
        default: multiqc_input
 resources:
  - type: bash_script
    path: script.sh
 engines:
  - type: docker
    image: ubuntu:22.04
 runners:
  - type: executable
  - type: nextflow
--- a/src/prepare_multiqc_input/script.sh
+++ b/src/prepare_multiqc_input/script.sh
@@ -0,0 +1,74 @@
 #!/bin/bash
 set -eo pipefail
 mkdir -p $par_output
 echo "$par_fail_trimming_multiqc" > "$par_output/fail_trimming_mqc.tsv"
 echo "$par_fail_mapping_multiqc" > "$par_output/fail_mapping_mqc.tsv"
 echo "$par_fail_strand_multiqc" > "$par_output/fail_strand_mqc.tsv"
 IFS="," read -ra fastqc_raw_multiqc <<< $par_fastqc_raw_multiqc && for file in "${fastqc_raw_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra fastqc_trim_multiqc <<< $par_fastqc_trim_multiqc && for file in "${fastqc_trim_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra trim_log_multiqc <<< $par_trim_log_multiqc && for file in "${trim_log_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra sortmerna_multiqc <<< $par_sortmerna_multiqc && for file in "${sortmerna_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra star_multiqc <<< $par_star_multiqc && for file in "${star_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 # IFS="," read -ra hisat2_multiqc <<< $par_hisat2_multiqc && for file in "${hisat2_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 # IFS="," read -ra rsem_multiqc <<< $par_rsem_multiqc && for file in "${rsem_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra salmon_multiqc <<< $par_salmon_multiqc && for file in "${salmon_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra samtools_stats <<< $par_samtools_stats && for file in "${samtools_stats[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra samtools_flagstat <<< $par_samtools_flagstat && for file in "${samtools_flagstat[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra samtools_idxstats <<< $par_samtools_idxstats && for file in "${samtools_idxstats[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra markduplicates_multiqc <<< $par_markduplicates_multiqc && for file in "${markduplicates_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra pseudo_multiqc <<< $par_pseudo_multiqc && for file in "${pseudo_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra featurecounts_multiqc <<< $par_featurecounts_multiqc && for file in "${featurecounts_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra featurecounts_rrna_multiqc <<< $par_featurecounts_rrna_multiqc && for file in "${featurecounts_rrna_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 [ -e "$par_aligner_pca_multiqc" ] && cp -r "$par_aligner_pca_multiqc" "$par_output/"
 [ -e "$par_aligner_clustering_multiqc" ] && cp -r "$par_aligner_clustering_multiqc" "$par_output/"
 [ -e "$par_pseudo_aligner_pca_multiqc" ] && cp -r "$par_pseudo_aligner_pca_multiqc" "$par_output/"
 [ -e "$par_pseudo_aligner_clustering_multiqc" ] && cp -r "$par_pseudo_aligner_clustering_multiqc" "$par_output/"
 IFS="," read -ra preseq_multiqc <<< $par_preseq_multiqc && for file in "${preseq_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra qualimap_multiqc <<< $par_qualimap_multiqc && for file in "${qualimap_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra dupradar_output_dup_intercept_mqc <<< $par_dupradar_output_dup_intercept_mqc && for file in "${dupradar_output_dup_intercept_mqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra dupradar_output_duprate_exp_denscurve_mqc <<< $par_dupradar_output_duprate_exp_denscurve_mqc && for file in "${dupradar_output_duprate_exp_denscurve_mqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra bamstat_multiqc <<< $par_bamstat_multiqc && for file in "${bamstat_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra inferexperiment_multiqc <<< $par_inferexperiment_multiqc && for file in "${inferexperiment_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra innerdistance_multiqc <<< $par_innerdistance_multiqc && for file in "${innerdistance_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra junctionannotation_multiqc <<< $par_junctionannotation_multiqc && for file in "${junctionannotation_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra junctionsaturation_multiqc <<< $par_junctionsaturation_multiqc && for file in "${junctionsaturation_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra readdistribution_multiqc <<< $par_readdistribution_multiqc && for file in "${readdistribution_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra readduplication_multiqc <<< $par_readduplication_multiqc && for file in "${readduplication_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 IFS="," read -ra tin_multiqc <<< $par_tin_multiqc && for file in "${tin_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done
 [ -e "$par_multiqc_config" ] && cp -r "$par_multiqc_config" "$par_output/"
--- a/src/preprocess_transcripts_fasta/config.vsh.yaml
+++ b/src/preprocess_transcripts_fasta/config.vsh.yaml
@@ -0,0 +1,40 @@
 name: "preprocess_transcripts_fasta"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/local/preprocess_transcripts_fasta_gencode.nf]
    last_sha: 0a1bdcfbb498987643b74e9fccab85ccd9f2a17d
 description: |
  Process transcripts FASTA if GTF file is in GENCODE format
 argument_groups:
 - name: "Input"
  arguments:
  - name: "--transcript_fasta"
    type: file 
    required: true
    description: Path of transcripts FASTA file
 - name: "Output"
  arguments:    
  - name: "--output"
    type: file
    direction: output
    required: true
    description: Path of processed output FASTA file.
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/transcriptome.fasta
 engines:
  - type: docker
    image: ubuntu:22.04
 runners:
  - type: executable
  - type: nextflow
--- a/src/preprocess_transcripts_fasta/script.sh
+++ b/src/preprocess_transcripts_fasta/script.sh
@@ -0,0 +1,11 @@
 #!/bin/bash
 set -eo pipefail
 filename="$(basename -- "$par_transcript_fasta")"
 if [ "${filename##*.}" == "gz" ]; then
    zcat "$par_transcript_fasta" | cut -d "|" -f1 > "$par_output"
 else 
    cut -d "|" -f1 "$par_transcript_fasta" > "$par_output"
 fi
--- a/src/preprocess_transcripts_fasta/test.sh
+++ b/src/preprocess_transcripts_fasta/test.sh
@@ -0,0 +1,14 @@
 #!/bin/bash
 echo ">>> Testing $meta_functionality_name"
 "$meta_executable" \
  --transcript_fasta "$meta_resources_dir/transcriptome.fasta" \
  --output "processed_transcriptome.fasta" 
 echo ">>> Check whether output exists"
 [ ! -f "processed_transcriptome.fasta" ] && echo "Processed FASTA file does not exist!" && exit 1
 [ ! -s "processed_transcriptome.fasta" ] && echo "Processed FASTA file is empty!" && exit 1
 echo "All tests succeeded!"
 exit 0
--- a/src/preseq_lcextrap/config.vsh.yaml
+++ b/src/preseq_lcextrap/config.vsh.yaml
@@ -0,0 +1,68 @@
 name: "preseq_lcextrap"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/preseq/lcextrap/main.nf, modules/nf-core/preseq/lcextrap/meta.yml]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
 description: Computing the expected future yield of distinct reads and bounds on the number of total distinct reads in the library and the associated confidence intervals.
 argument_groups:
 - name: "Input"
  arguments: 
  - name: "--input"
    type: file
    description: Input genome BAM/BED file
  - name: "--extra_preseq_args"
    type: string
  - name: "--paired"
    type: boolean
    description: Paired-end reads?
 - name: "Output"
  arguments:
  - name: "--output"
    type: file
    direction: output
    default: $id.lc_extrap.txt
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/a.sorted.bed
  - path: /testData/unit_test_resources/SRR1106616_5M_subset.bam
 engines:
 - type: docker
  image: ubuntu:22.04
  setup:
    - type: apt
      packages: [ curl, bzip2, build-essential, wget, gcc, autoconf, automake, make, libz-dev, libbz2-dev, zlib1g-dev, libncurses5-dev, libncursesw5-dev, liblzma-dev, pip ]
    - type: docker
      run: | 
        cd /usr/bin && \
        wget --no-check-certificate https://github.com/smithlabcode/preseq/releases/download/v3.2.0/preseq-3.2.0.tar.gz && \
        wget https://github.com/samtools/htslib/releases/download/1.9/htslib-1.9.tar.bz2 && \
        wget --no-check-certificate https://github.com/arq5x/bedtools2/releases/download/v2.31.0/bedtools.static && \
        curl -fsSL https://github.com/samtools/samtools/releases/download/1.18/samtools-1.18.tar.bz2 -o samtools-1.18.tar.bz2 && \
        tar -xjf samtools-1.18.tar.bz2 && rm samtools-1.18.tar.bz2 && \
        tar -xzf preseq-3.2.0.tar.gz && rm preseq-3.2.0.tar.gz && \
        tar -vxjf htslib-1.9.tar.bz2 && rm htslib-1.9.tar.bz2 && \
        mv bedtools.static /usr/local/bin/bedtools && \
        chmod a+x /usr/local/bin/bedtools && \
        cd samtools-1.18 && \
        ./configure && \
        make && \
        make install && \
        cd /usr/bin && cd htslib-1.9 && \
        make && \
        cd /usr/bin && cd preseq-3.2.0 && \
        mkdir build && cd build && \
        ../configure && \
        make && make install && make HAVE_HTSLIB=1 all 
 runners:
 - type: executable
 - type: nextflow
--- a/src/preseq_lcextrap/script.sh
+++ b/src/preseq_lcextrap/script.sh
@@ -0,0 +1,29 @@
 #!/bin/bash
 set -eo pipefail
 file=$(basename -- "$par_input")
 filename="${file%.*}"
 if [ "${file##*.}" == "bam" ]; then 
    samtools sort -o "sorted_$filename.bam" -n "$par_input"
    bedtools bamtobed -i "sorted_$filename.bam" > "$filename.bed"
    bedtools sort -i "$filename.bed" > "sorted_$filename.bed"
 elif [ "${file##*.}" == "bed" ]; then
    bedtools sort -i "$par_input" > "sorted_$filename.bed"
 else 
    echo "Invalid input file format!"
    exit 1
 fi
 if [ "$par_paired" == "true" ]; then
    paired="-pe"
 else
    paired=""
 fi
 preseq lc_extrap \
    sorted_$filename.bed \
    $paired \
    $par_extra_preseq_args \
    -o $par_output
--- a/src/preseq_lcextrap/test.sh
+++ b/src/preseq_lcextrap/test.sh
@@ -0,0 +1,28 @@
 #!/bin/bash
 echo ">>> Testing $meta_functionality_name"
 echo ">>> Testing with BAM input"
 "$meta_executable" \
  --paired false \
  --input "$meta_resources_dir/SRR1106616_5M_subset.bam" \
  --output lc_extrap.txt 
 echo ">>> Check whether output exists"
 [ ! -f "lc_extrap.txt" ] && echo "Output file does not exist!" && exit 1
 [ ! -s "lc_extrap.txt" ] && echo "Output file is empty!" && exit 1
 rm lc_extrap.txt
 echo ">>> Testing with BED input"
 "$meta_executable" \
  --paired false \
  --input "$meta_resources_dir/a.sorted.bed" \
  --output lc_extrap.txt 
 echo ">>> Check whether output exists"
 [ ! -f "lc_extrap.txt" ] && echo "Output file does not exist!" && exit 1
 [ ! -s "lc_extrap.txt" ] && echo "Output file is empty!" && exit 1
 echo "All tests succeeded!"
 exit 0
--- a/src/qualimap/config.vsh.yaml
+++ b/src/qualimap/config.vsh.yaml
@@ -0,0 +1,118 @@
 name: "qualimap"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/qualimap/rnaseq/main.nf]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
 description: |
  RNA-seq QC analysis using Qualimap.
 argument_groups:
 - name: "Input"
  arguments: 
  - name: "--input"
    type: file
    required: true
    description: path to input mapping file in BAM format.
  - name: "--gtf"
    type: file
    required: true
    description: path to annotations file in Ensembl GTF format.
 - name: "Output"
  arguments: 
  - name: "--output_dir"
    direction: output
    type: file
    required: false
    default: $id.qualimap_output
    description: path to output directory for raw data and report.
  - name: "--output_pdf"
    type: file
    direction: output
    required: false
    must_exist: false
    default: $id.report.pdf
    description: path to output file for pdf report.
  - name: "--output_format"
    type: string
    required: false
    default: html
    description: Format of the output report (PDF or HTML, default is HTML)
 - name: "Optional"
  arguments: 
  - name: "--pr_bases"
    type: integer
    required: false
    default: 100
    min: 1
    description: Number of upstream/downstream nucleotide bases to compute 5'-3' bias (default = 100).
  - name: "--tr_bias"
    type: integer
    required: false
    default: 1000
    min: 1
    description: Number of top highly expressed transcripts to compute 5'-3' bias (default = 1000).
  - name: "--algorithm"
    type: string
    required: false
    default: uniquely-mapped-reads
    description: Counting algorithm (uniquely-mapped-reads (default) or proportional).
  - name: "--sequencing_protocol"
    type: string
    required: false
    choices: ["non-strand-specific", "strand-specific-reverse", "strand-specific-forward"]
    default: non-strand-specific
    description: Sequencing library protocol (strand-specific-forward, strand-specific-reverse or non-strand-specific (default)).
  - name: "--paired"
    type: boolean_true
    description: Setting this flag for paired-end experiments will result in counting fragments instead of reads.
  - name: "--sorted"
    type: boolean_true
    description: Setting this flag indicates that the input file is already sorted by name. If the flag is not set, additional sorting by name will be performed. Only required for paired-end analysis.
  - name: "--java_memory_size"
    type: string
    required: false
    default: 4G 
    description: maximum Java heap memory size, default = 4G.
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2.bam
  - path: /testData/unit_test_resources/wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2.bam.bai
  - path: /testData/unit_test_resources/genes.gtf
 engines:
 - type: docker
  image: ubuntu:22.04
  setup:   
    - type: apt
      packages: [ r-base, unzip, wget, openjdk-8-jdk, libxml2-dev, libcurl4-openssl-dev ]
    - type: docker
      run: |
        wget https://bitbucket.org/kokonech/qualimap/downloads/qualimap_v2.3.zip && \
        unzip qualimap_v2.3.zip && \
        cp -a qualimap_v2.3/. usr/bin && \
        unset DISPLAY && \
        mkdir -p tmp && \
        export _JAVA_OPTIONS=-Djava.io.tmpdir=./tmp
    - type: r
      bioc: [ NOISeq ]
      cran: [ optparse ]
 runners: 
 - type: executable
 - type: nextflow
--- a/src/qualimap/script.sh
+++ b/src/qualimap/script.sh
@@ -0,0 +1,19 @@
 #!/bin/bash
 set -eo pipefail
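 mkdir -p "$par_output_dir"
 # boolean_true flags that are not set may arrive as the string "false";
 # unset them so that ${var:+...} below only expands when the flag was given
 [[ "$par_paired" == "false" ]] && unset par_paired
 [[ "$par_sorted" == "false" ]] && unset par_sorted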
 qualimap rnaseq \
    --java-mem-size=$par_java_memory_size \
    --algorithm $par_algorithm \
    --num-pr-bases $par_pr_bases \
    --num-tr-bias $par_tr_bias \
    --sequencing-protocol $par_sequencing_protocol \
    -bam $par_input \
    -gtf $par_gtf \
    ${par_paired:+-pe} \
    ${par_sorted:+-s} \
    -outdir $par_output_dir \
    -outformat $par_output_format 
--- a/src/qualimap/test.sh
+++ b/src/qualimap/test.sh
@@ -0,0 +1,24 @@
 echo "> Running $meta_functionality_name."
 # define input and output for script
 input_bam="$meta_resources_dir/wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2.bam"
 input_gtf="$meta_resources_dir/genes.gtf"
 output_dir="qualimap_output"
 "$meta_executable" \
    --input "$input_bam" \
    --gtf "$input_gtf" \
    --output_dir "$output_dir"
 exit_code=$?
 [[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
 echo ">> Checking whether output dir and files exist"
 [ ! -d "$output_dir" ] && echo "Output dir could not be found!" && exit 1
 [ ! -d "$output_dir/raw_data_qualimapReport" ] && echo "Raw data folder could not be found!" && exit 1
 [ -z "$(ls -A "$output_dir/raw_data_qualimapReport")" ] && echo "Raw data folder is missing output files" && exit 1
 [ ! -f "$output_dir/qualimapReport.html" ] && echo "Qualimap report was not found" && exit 1
 [ ! -s "$output_dir/qualimapReport.html" ] && echo "Qualimap report is empty" && exit 1
 exit 0
--- a/src/rsem/rsem_calculate_expression/config.vsh.yaml
+++ b/src/rsem/rsem_calculate_expression/config.vsh.yaml
@@ -0,0 +1,138 @@
 name: "rsem_calculate_expression"
 namespace: "rsem"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/rsem/calculateexpression/main.nf, modules/nf-core/rsem/calculateexpression/meta.yml]
    last_sha: 92b2a7857de1dda9d1c19a088941fc81e2976ff7
 description: | 
  Calculate expression with RSEM.
 argument_groups:
 - name: "Input"
  arguments:
  - name: "--id"
    type: string
    description: Sample ID.
  - name: "--strandedness"
    type: string
    description: Sample strand-specificity. Must be one of unstranded, forward, reverse
    choices: [forward, reverse, unstranded]
  - name: "--paired"
    type: boolean
    description: Paired-end reads or not?
  - name: "--input"
    type: file
    description: Input reads for quantification.
    multiple: true
    multiple_sep: ","
  - name: "--index"
    type: file
    description: RSEM index.  
  - name: "--extra_args"
    type: string
    description: Extra rsem-calculate-expression arguments in addition to the defaults.
  - name: "--versions"
    type: file
    must_exist: false
 - name: "Output"
  arguments:
  - name: "--counts_gene"
    type: file
    description: Expression counts on gene level
    example: sample.genes.results
    direction: output
  - name: "--counts_transcripts"
    type: file
    description: Expression counts on transcript level
    example: sample.isoforms.results
    direction: output
  - name: "--stat"
    type: file
    description: RSEM statistics
    example: sample.stat
    direction: output
  - name: "--logs"
    type: file
    description: RSEM logs
    example: sample.log
    direction: output
  - name: "--bam_star"
    type: file
    description: BAM file generated by STAR (optional)
    example: sample.STAR.genome.bam
    direction: output
  - name: "--bam_genome"
    type: file
    description: Genome BAM file (optional)
    example: sample.genome.bam
    direction: output
  - name: "--bam_transcript"
    type: file
    description: Transcript BAM file (optional)
    example: sample.transcript.bam
    direction: output
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz
  - path: /testData/minimal_test/input_fastq/SRR6357070_2.fastq.gz
  - path: /testData/minimal_test/reference/rsem.tar.gz
 # TODO: Install bowtie/bowtie2 
 engines:
  - type: docker
    image: ubuntu:22.04
    setup:
    - type: apt
      packages: 
        - build-essential 
        - gcc 
        - g++ 
        - make 
        - wget 
        - zlib1g-dev 
        - unzip 
        - xxd 
        - perl 
        - r-base
        - bowtie2
        - python3-pip 
        - git
    - type: docker
      env: 
        - STAR_VERSION=2.7.11b
        - RSEM_VERSION=1.3.3
        - TZ=Europe/Brussels
      run: |
        ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone && \
        cd /tmp && \
        wget --no-check-certificate https://github.com/alexdobin/STAR/archive/refs/tags/${STAR_VERSION}.zip && \
        unzip ${STAR_VERSION}.zip && \
        cd STAR-${STAR_VERSION}/source && \
        make STARstatic CXXFLAGS_SIMD=-std=c++11 && \
        cp STAR /usr/local/bin && \
        cd /tmp && \
        wget --no-check-certificate https://github.com/deweylab/RSEM/archive/refs/tags/v${RSEM_VERSION}.zip && \
        unzip v${RSEM_VERSION}.zip && \
        cd RSEM-${RSEM_VERSION} && \
        make && \
        make install && \
        rm -rf /tmp/STAR-${STAR_VERSION} /tmp/${STAR_VERSION}.zip && \
        rm -rf /tmp/RSEM-${RSEM_VERSION} /tmp/v${RSEM_VERSION}.zip && \
        cd && \
        apt-get clean && \
        echo 'export PATH=$PATH:/usr/local/bin' >> /etc/profile && \
        echo 'export PATH=$PATH:/usr/local/bin' >> ~/.bashrc && \
        /bin/bash -c "source /etc/profile && source ~/.bashrc && echo $PATH && which STAR"
 runners:
  - type: executable
  - type: nextflow
--- a/src/rsem/rsem_calculate_expression/script.sh
+++ b/src/rsem/rsem_calculate_expression/script.sh
@@ -0,0 +1,32 @@
 #!/bin/bash
 set -eo pipefail
 function clean_up {
    rm -rf "$tmpdir"
 }
 trap clean_up EXIT
 tmpdir=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXXXX")
 if [ "$par_strandedness" == 'forward' ]; then
    strandedness='--strandedness forward'
 elif [ "$par_strandedness" == 'reverse' ]; then
    strandedness='--strandedness reverse'
 else
    strandedness=''
 fi
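 IFS="," read -ra input <<< "$par_input"
 # --paired is a boolean ("true"/"false"); unset it unless true so that
 # ${par_paired:+--paired-end} below only expands for paired-end input
 [ "$par_paired" == "true" ] || unset par_paired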
 INDEX=`find -L "$par_index/" -name "*.grp" | sed 's/\.grp$//'`
 rsem-calculate-expression \
    ${meta_cpus:+--num-threads $meta_cpus} \
    $strandedness \
    ${par_paired:+--paired-end} \
    $par_extra_args \
    ${input[*]} \
    $INDEX \
    $par_id
--- a/src/rsem/rsem_calculate_expression/test.sh
+++ b/src/rsem/rsem_calculate_expression/test.sh
@@ -0,0 +1,26 @@
 #!/bin/bash
 echo ">>> Testing $meta_functionality_name"
 tar -xavf "$meta_resources_dir/rsem.tar.gz"
 echo ">>> Calculating expression"
 "$meta_executable" \
  --id WT_REP1 \
  --strandedness reverse \
  --paired true \
  --input "$meta_resources_dir/SRR6357070_1.fastq.gz,$meta_resources_dir/SRR6357070_2.fastq.gz" \
  --index rsem \
  --extra_args "--star --star-output-genome-bam --star-gzipped-read-file --estimate-rspd --seed 1" \
  --counts_gene WT_REP1.genes.results \
  --counts_transcripts WT_REP1.isoforms.results \
  --logs WT_REP1.log 
 echo ">>> Checking whether output exists"
 [ ! -f "WT_REP1.genes.results" ] && echo "Gene level expression counts file does not exist!" && exit 1
 [ ! -s "WT_REP1.genes.results" ] && echo "Gene level expression counts file is empty!" && exit 1
 [ ! -f "WT_REP1.log" ] && echo "Log file does not exist!" && exit 1
 [ ! -s "WT_REP1.log" ] && echo "Log file is empty!" && exit 1
 echo "All tests succeeded!"
 exit 0
--- a/src/rsem/rsem_merge_counts/config.vsh.yaml
+++ b/src/rsem/rsem_merge_counts/config.vsh.yaml
@@ -0,0 +1,68 @@
 name: "rsem_merge_counts"
 namespace: "rsem"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/local/rsem_merge_counts/main.nf]
    last_sha: 311279532694ce7520164ce4d65a388c0cd11f60
 description: | 
  Merge the transcript quantification results obtained from rsem calculate-expression across all samples.
 argument_groups:
 - name: "Input"
  arguments:
  - name: "--counts_gene"
    type: file
    description: Expression counts on gene level (genes)
  - name: "--counts_transcripts"
    type: file
    description: Expression counts on transcript level (isoforms)
  - name: "--versions"
    type: file
    must_exist: false
 - name: "Output"
  arguments: 
  - name: "--merged_gene_counts"
    type: file
    description: File containing gene counts across all samples.
    default: rsem.merged.gene_counts.tsv
    direction: output
  - name: "--merged_gene_tpm"
    type: file
    description: File containing gene TPM across all samples.
    default: rsem.merged.gene_tpm.tsv
    direction: output
  - name: "--merged_transcript_counts"
    type: file
    description: File containing transcript counts across all samples.
    default: rsem.merged.transcript_counts.tsv
    direction: output
  - name: "--merged_transcript_tpm"
    type: file
    description: File containing transcript TPM across all samples.
    default: rsem.merged.transcript_tpm.tsv
    direction: output
  - name: "--updated_versions"
    type: file
    default: versions.yml
    direction: output
 resources:
  - type: bash_script
    path: script.sh
 # test_resources:
 #   - type: bash_script
 #     path: test.sh
  # - path: /testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz
  # - path: /testData/minimal_test/input_fastq/SRR6357070_2.fastq.gz
 engines:
  - type: docker
    image: ubuntu:22.04
 runners:
  - type: executable
  - type: nextflow
--- a/src/rsem/rsem_merge_counts/script.sh
+++ b/src/rsem/rsem_merge_counts/script.sh
@@ -0,0 +1,28 @@
 #!/bin/bash
 set -eo pipefail
 mkdir -p tmp/genes
 # cut -f 1,2 `ls $par_counts_gene/* | head -n 1` > gene_ids.txt
 for file_id in ${par_counts_gene[*]}; do
    samplename=`basename $file_id | sed s/\\.genes.results\$//g`
    echo $samplename > tmp/genes/${samplename}.counts.txt
    cut -f 5 ${file_id} | tail -n+2 >> tmp/genes/${samplename}.counts.txt
    echo $samplename > tmp/genes/${samplename}.tpm.txt
    cut -f 6 ${file_id} | tail -n+2 >> tmp/genes/${samplename}.tpm.txt
 done
 mkdir -p tmp/isoforms
 # cut -f 1,2 `ls $par_counts_transcripts/*` | head -n 1` > transcript_ids.txt
 for file_id in ${par_counts_transcripts[*]}; do
    samplename=`basename $file_id | sed s/\\.isoforms.results\$//g`
    echo $samplename > tmp/isoforms/${samplename}.counts.txt
    cut -f 5 ${file_id} | tail -n+2 >> tmp/isoforms/${samplename}.counts.txt
    echo $samplename > tmp/isoforms/${samplename}.tpm.txt
    cut -f 6 ${file_id} | tail -n+2 >> tmp/isoforms/${samplename}.tpm.txt
 done
 paste gene_ids.txt tmp/genes/*.counts.txt > $par_merged_gene_counts
 paste gene_ids.txt tmp/genes/*.tpm.txt > $par_merged_gene_tpm
 paste transcript_ids.txt tmp/isoforms/*.counts.txt > $par_merged_transcript_counts
 paste transcript_ids.txt tmp/isoforms/*.tpm.txt > $par_merged_transcript_tpm
--- a/src/rseqc/rseqc_bamstat/config.vsh.yaml
+++ b/src/rseqc/rseqc_bamstat/config.vsh.yaml
@@ -0,0 +1,53 @@
 name: "rseqc_bamstat"
 namespace: "rseqc"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/rseqc/bamstat/main.nf]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
 description: |
  Generate statistics from a bam file.
 argument_groups:
 - name: "Input"
  arguments: 
  - name: "--input"
    type: file 
    required: true
    description: input alignment file in BAM or SAM format
  - name: "--map_qual"
    type: integer
    required: false
    default: 30 
    description: Minimum mapping quality (phred scaled) to determine uniquely mapped reads, default=30.
    min: 0
 - name: "Output"
  arguments: 
  - name: "--output"
    type: file
    direction: output
    required: false
    default: $id.mapping_quality.txt
    description: output file (txt) with mapping quality statistics
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/sarscov2/test.paired_end.sorted.bam
 engines:
 - type: docker
  image: ubuntu:22.04
  setup:   
    - type: apt
      packages: [ python3-pip ]
    - type: python
      packages: [ RSeQC ]
 runners: 
 - type: executable
 - type: nextflow
--- a/src/rseqc/rseqc_bamstat/script.sh
+++ b/src/rseqc/rseqc_bamstat/script.sh
@@ -0,0 +1,8 @@
 #!/bin/bash
 set -eo pipefail 
 bam_stat.py \
    -i "$par_input" \
    --mapq $par_map_qual \
 > $par_output
--- a/src/rseqc/rseqc_bamstat/test.sh
+++ b/src/rseqc/rseqc_bamstat/test.sh
@@ -0,0 +1,23 @@
 #!/bin/bash
 # define input and output for script
 input_bam="test.paired_end.sorted.bam"
 output_summary="mapping_quality.txt"
 # run executable and tests
 echo "> Running $meta_functionality_name."
 "$meta_executable" \
    --input "$meta_resources_dir/$input_bam" \
    --output "$output_summary"
 exit_code=$?
 [[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
 echo ">> Checking whether output can be found and has content"
 [ ! -f "$output_summary" ] && echo "$output_summary file missing" && exit 1
 [ ! -s "$output_summary" ] && echo "$output_summary file is empty" && exit 1
 exit 0
--- a/src/rseqc/rseqc_inferexperiment/config.vsh.yaml
+++ b/src/rseqc/rseqc_inferexperiment/config.vsh.yaml
@@ -0,0 +1,67 @@
 name: "rseqc_inferexperiment"
 namespace: "rseqc"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/rseqc/inferexperiment/main.nf]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
 description: |
  Infer strandedness from sequencing reads
 argument_groups:
 - name: "Input"
  arguments: 
  - name: "--input"
    type: file 
    required: true
    description: input alignment file in BAM or SAM format
  - name: "--refgene"
    type: file 
    required: true
    description: Reference gene model in bed format
  - name: "--sample_size"
    type: integer
    required: false
    default: 200000
    min: 1
    description: Number of reads sampled from SAM/BAM file, default = 200000.
  - name: "--map_qual"
    type: integer
    required: false
    default: 30 
    description: Minimum mapping quality (phred scaled) to determine uniquely mapped reads, default=30.
    min: 0
 - name: "Output"
  arguments: 
  - name: "--output"
    type: file
    direction: output
    required: false
    default: $id.strandedness.txt
    description: output file (txt) of strandedness report
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/sarscov2/test.paired_end.sorted.bam
  - path: /testData/unit_test_resources/sarscov2/test.bed12
 engines:
 - type: docker
  image: ubuntu:22.04
  setup:   
    - type: apt
      packages: [ python3-pip ]
    - type: python
      packages: [ RSeQC ]
 runners: 
 - type: executable
 - type: nextflow
--- a/src/rseqc/rseqc_inferexperiment/script.sh
+++ b/src/rseqc/rseqc_inferexperiment/script.sh
@@ -0,0 +1,10 @@
 #!/bin/bash
 set -eo pipefail 
 infer_experiment.py \
    -i $par_input \
    -r $par_refgene \
    -s $par_sample_size \
    -q $par_map_qual \
 > $par_output
--- a/src/rseqc/rseqc_inferexperiment/test.sh
+++ b/src/rseqc/rseqc_inferexperiment/test.sh
@@ -0,0 +1,24 @@
 #!/bin/bash
 # define input and output for script
 input_bam="$meta_resources_dir/test.paired_end.sorted.bam"
 input_bed="$meta_resources_dir/test.bed12"
 output="strandedness.txt"
 # run executable and tests
 echo "> Running $meta_functionality_name."
 "$meta_executable" \
    --input "$input_bam" \
    --refgene "$input_bed" \
    --output "$output"
 exit_code=$?
 [[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
 echo ">> Checking whether output can be found and has content"
 [ ! -f "$output" ] && echo "$output is missing" && exit 1
 [ ! -s "$output" ] && echo "$output is empty" && exit 1
 exit 0
--- a/src/rseqc/rseqc_innerdistance/config.vsh.yaml
+++ b/src/rseqc/rseqc_innerdistance/config.vsh.yaml
@@ -0,0 +1,117 @@
 name: "rseqc_innerdistance"
 namespace: "rseqc"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/rseqc/innerdistance/main.nf]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
 description: |
  Calculate inner distance between read pairs. 
 argument_groups:
 - name: "Input"
  arguments: 
  - name: "--input"
    type: file 
    required: true
    description: input alignment file in BAM or SAM format
  - name: "--refgene"
    type: file 
    required: true
    description: Reference gene model in bed format
  - name: "--sample_size"
    type: integer
    required: false
    default: 200000
    min: 1
    description: Number of reads sampled from SAM/BAM file, default = 200000.
  - name: "--map_qual"
    type: integer
    required: false
    default: 30 
    description: Minimum mapping quality (phred scaled) to determine uniquely mapped reads, default=30.
    min: 0
  - name: "--lower_bound_size"
    type: integer
    required: false
    default: -250 
    description: Lower bound of inner distance (bp). This option is used for plotting the histogram, default=-250.
  - name: "--upper_bound_size"
    type: integer
    required: false
    default: 250 
    description: Upper bound of inner distance (bp). This option is used for plotting the histogram, default=250.
  - name: "--step_size"
    type: integer
    required: false
    default: 5 
    description: Step size (bp) of the histogram. This option is used for plotting the histogram, default=5.
 - name: "Output"
  arguments: 
  - name: "--output_stats"
    type: file
    direction: output
    required: false
    must_exist: false
    default: $id.inner_distance.stats
    description: output file (txt) with summary statistics of inner distances of paired reads
  - name: "--output_dist"
    type: file
    direction: output
    required: false
    must_exist: false
    default: $id.inner_distance.txt
    description: output file (txt) with inner distances of all paired reads
  - name: "--output_freq"
    type: file
    direction: output
    required: false
    must_exist: false
    default: $id.inner_distance_freq.txt
    description: output file (txt) with frequencies of inner distances of all paired reads
  - name: "--output_plot"
    type: file
    direction: output
    required: false
    must_exist: false
    default: $id.inner_distance_plot.pdf
    description: output file (pdf) with histogram plot of inner distances of all paired reads
  - name: "--output_plot_r"
    type: file
    direction: output
    required: false
    must_exist: false
    default: $id.inner_distance_plot.r
    description: output file (R) with script for the histogram plot of inner distances of all paired reads
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/sarscov2/test.paired_end.sorted.bam
  - path: /testData/unit_test_resources/sarscov2/test.bed12
 engines:
 - type: docker
  image: ubuntu:22.04
  setup:   
    - type: apt
      packages: [python3-pip, r-base]
    - type: python
      packages: [ RSeQC ]
 runners: 
 - type: executable
 - type: nextflow
--- a/src/rseqc/rseqc_innerdistance/script.sh
+++ b/src/rseqc/rseqc_innerdistance/script.sh
@@ -0,0 +1,23 @@
 #!/bin/bash
 set -exo pipefail 
 prefix=$(openssl rand -hex 8)
 inner_distance.py \
    -i $par_input \
    -r $par_refgene \
    -o $prefix \
    -k $par_sample_size \
    -l $par_lower_bound_size \
    -u $par_upper_bound_size \
    -s $par_step_size \
    -q $par_map_qual \
 > stdout.txt
 head -n 2 stdout.txt > $par_output_stats
 [[ -f "$prefix.inner_distance.txt" ]] && mv $prefix.inner_distance.txt $par_output_dist
 [[ -f "$prefix.inner_distance_plot.pdf" ]] && mv $prefix.inner_distance_plot.pdf $par_output_plot
 [[ -f "$prefix.inner_distance_plot.r" ]] && mv $prefix.inner_distance_plot.r $par_output_plot_r
 [[ -f "$prefix.inner_distance_freq.txt" ]] && mv $prefix.inner_distance_freq.txt $par_output_freq
--- a/src/rseqc/rseqc_innerdistance/test.sh
+++ b/src/rseqc/rseqc_innerdistance/test.sh
@@ -0,0 +1,43 @@
 #!/bin/bash
 # define input and output for script
 input_bam="$meta_resources_dir/test.paired_end.sorted.bam"
 input_bed="$meta_resources_dir/test.bed12"
 output_stats="inner_distance_stats.txt"
 output_dist="inner_distance.txt"
 output_plot="inner_distance_plot.pdf"
 output_plot_r="inner_distance_plot.r"
 output_freq="inner_distance_freq.txt"
 # Run executable
 echo "> Running $meta_functionality_name"
 "$meta_executable" \
    --input $input_bam \
    --refgene $input_bed \
    --output_stats $output_stats \
    --output_dist $output_dist \
    --output_plot $output_plot \
    --output_plot_r $output_plot_r \
    --output_freq $output_freq
 exit_code=$?
 [[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
 echo ">> asserting output has been created for paired read input"
 [ ! -f "$output_stats" ] && echo "$output_stats was not created" && exit 1
 [ ! -s "$output_stats" ] && echo "$output_stats is empty" && exit 1
 [ ! -f "$output_dist" ] && echo "$output_dist was not created" && exit 1
 [ ! -s "$output_dist" ] && echo "$output_dist is empty" && exit 1
 [ ! -f "$output_plot" ] && echo "$output_plot was not created" && exit 1
 [ ! -s "$output_plot" ] && echo "$output_plot is empty" && exit 1
 [ ! -f "$output_plot_r" ] && echo "$output_plot_r was not created" && exit 1
 [ ! -s "$output_plot_r" ] && echo "$output_plot_r is empty" && exit 1
 [ ! -f "$output_freq" ] && echo "$output_freq was not created" && exit 1
 [ ! -s "$output_freq" ] && echo "$output_freq is empty" && exit 1
 exit 0
--- a/src/rseqc/rseqc_junctionannotation/config.vsh.yaml
+++ b/src/rseqc/rseqc_junctionannotation/config.vsh.yaml
@@ -0,0 +1,108 @@
 name: "rseqc_junctionannotation"
 namespace: "rseqc"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/rseqc/junctionannotation/main.nf]
    last_sha: 
 description: |
  Compare detected splice junctions to reference gene model.
 argument_groups:
 - name: "Input"
  arguments: 
  - name: "--input"
    type: file 
    required: true
    description: input alignment file in BAM or SAM format
  - name: "--refgene"
    type: file 
    required: true
    description: Reference gene model in bed format
  - name: "--map_qual"
    type: integer
    required: false
    default: 30 
    description: Minimum mapping quality (phred scaled) to determine uniquely mapped reads, default=30.
    min: 0
  - name: "--min_intron"
    type: integer
    required: false
    default: 50
    min: 1 
    description: Minimum intron length (bp), default = 50.
 - name: "Output"
  arguments: 
  - name: "--output_log"
    type: file
    direction: output
    required: false
    default: $id.junction_annotation.log
    description: output log of junction annotation script
  - name: "--output_plot_r"
    type: file
    direction: output
    required: false
    default: $id.junction_annotation_plot.r
    description: R script to generate the splice_junction and splice_events plots
  - name: "--output_junction_bed"
    type: file
    direction: output
    required: false
    default: $id.junction_annotation.bed
    description: junction annotation file (bed format)
  - name: "--output_junction_interact"
    type: file
    direction: output
    required: false
    default: $id.junction_annotation.Interact.bed
    description: interact file (bed format) of junctions. Can be uploaded to UCSC genome browser or converted to bigInteract (using bedToBigBed program) for visualization.
  - name: "--output_junction_sheet"
    type: file
    direction: output
    required: false
    default: $id.junction_annotation.xls
    description: junction annotation file (xls format)
  - name: "--output_splice_events_plot"
    type: file
    direction: output
    required: false
    default: $id.splice_events.pdf
    description: plot of splice events (pdf)
  - name: "--output_splice_junctions_plot"
    type: file
    direction: output
    required: false
    default: $id.splice_junctions_plot.pdf
    description: plot of junctions (pdf)
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/sarscov2/test.paired_end.sorted.bam
  - path: /testData/unit_test_resources/sarscov2/test.bed12
 engines:
 - type: docker
  image: ubuntu:22.04
  setup:   
    - type: apt
      packages: [ python3-pip, r-base]
    - type: python
      packages: [ RSeQC ]
 runners: 
 - type: executable
 - type: nextflow
--- a/src/rseqc/rseqc_junctionannotation/script.sh
+++ b/src/rseqc/rseqc_junctionannotation/script.sh
@@ -0,0 +1,20 @@
 #!/bin/bash
 set -eo pipefail 
 prefix=$(openssl rand -hex 8)
 junction_annotation.py \
    -i $par_input \
    -r $par_refgene \
    -o $prefix \
    -m $par_min_intron \
    -q $par_map_qual > $par_output_log
 # if-blocks keep a missing optional file from becoming the script's exit status
 if [[ -f "$prefix.junction.bed" ]]; then mv "$prefix.junction.bed" "$par_output_junction_bed"; fi
 if [[ -f "$prefix.junction.Interact.bed" ]]; then mv "$prefix.junction.Interact.bed" "$par_output_junction_interact"; fi
 if [[ -f "$prefix.junction.xls" ]]; then mv "$prefix.junction.xls" "$par_output_junction_sheet"; fi
 if [[ -f "$prefix.junction_plot.r" ]]; then mv "$prefix.junction_plot.r" "$par_output_plot_r"; fi
 if [[ -f "$prefix.splice_events.pdf" ]]; then mv "$prefix.splice_events.pdf" "$par_output_splice_events_plot"; fi
 if [[ -f "$prefix.splice_junction.pdf" ]]; then mv "$prefix.splice_junction.pdf" "$par_output_splice_junctions_plot"; fi
--- a/src/rseqc/rseqc_junctionannotation/test.sh
+++ b/src/rseqc/rseqc_junctionannotation/test.sh
@@ -0,0 +1,48 @@
 #!/bin/bash
 # define input and output for script
 input_bam="$meta_resources_dir/test.paired_end.sorted.bam"
 input_bed="$meta_resources_dir/test.bed12"
 output_junction_bed="junction_annotation.bed"
 output_junction_interact="junction_annotation.Interact.bed"
 output_junction_sheet="junction_annotation.xls"
 output_plot_r="junction_annotation_plot.r"
 output_splice_events_plot="splice_events.pdf"
 output_splice_junctions_plot="splice_junctions_plot.pdf"
 output_log="junction_annotation.log"
 # run executable and test
 echo "> Running $meta_functionality_name"
 "$meta_executable" \
    --input "$input_bam" \
    --refgene "$input_bed" \
    --output_log "$output_log" \
    --output_plot_r "$output_plot_r" \
    --output_junction_bed "$output_junction_bed" \
    --output_junction_interact "$output_junction_interact" \
    --output_junction_sheet "$output_junction_sheet" \
    --output_splice_events_plot "$output_splice_events_plot" \
    --output_splice_junctions_plot "$output_splice_junctions_plot" 
 # exit_code=$?
 # [[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
 echo ">> Check if all output files were created"
 [ ! -f "$output_log" ] && echo "$output_log was not created" && exit 1
 [ ! -f "$output_junction_sheet" ] && echo "$output_junction_sheet was not created" && exit 1
 [ -s "$output_junction_sheet" ] && echo "$output_junction_sheet is not empty but should be" && exit 1
 [ ! -f "$output_plot_r" ] && echo "$output_plot_r was not created" && exit 1
 [ -s "$output_plot_r" ] && echo "$output_plot_r is not empty but should be" && exit 1
 # [ ! -f "$output_junction_bed" ] && echo "$output_junction_bed was not created" && exit 1
 # [ ! -s "$output_junction_bed" ] && echo "$output_junction_bed is empty" && exit 1
 # [ ! -f "$output_junction_interact" ] && echo "$output_junction_interact was not created" && exit 1
 # [ ! -s "$output_junction_interact" ] && echo "$output_junction_interact is empty" && exit 1
 # [ ! -f "$output_splice_events_plot" ] && echo "$output_splice_events_plot was not created" && exit 1
 # [ ! -s "$output_splice_events_plot" ] && echo "$output_splice_events_plot is empty" && exit 1
 # [ ! -f "$output_splice_junctions_plot" ] && echo "$output_splice_junctions_plot was not created" && exit 1
 # [ ! -s "$output_splice_junctions_plot" ] && echo "$output_splice_junctions_plot is empty" && exit 1
 exit 0
--- a/src/rseqc/rseqc_junctionsaturation/config.vsh.yaml
+++ b/src/rseqc/rseqc_junctionsaturation/config.vsh.yaml
@@ -0,0 +1,105 @@
 name: "rseqc_junctionsaturation"
 namespace: "rseqc"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/rseqc/junctionsaturation/main.nf]
    last_sha: 
 description: |
  Check whether the sequencing depth is sufficient for splice junction detection by resampling the mapped reads.
 argument_groups:
 - name: "Input"
  arguments: 
  - name: "--input"
    type: file 
    required: true
    description: input alignment file in BAM or SAM format
  - name: "--refgene"
    type: file 
    required: true
    description: Reference gene model in bed format
  - name: "--sampling_percentile_lower_bound"
    type: integer
    required: false
    default: 5 
    description: Sampling starts from this percentile, must be an integer between 0 and 100, default = 5.
    min: 0
    max: 100
  - name: "--sampling_percentile_upper_bound"
    type: integer
    required: false
    default: 100
    description: Sampling ends at this percentile, must be an integer between 0 and 100, default = 100.
    min: 0
    max: 100
  - name: "--sampling_percentile_step"
    type: integer
    required: false
    default: 5
    description: Sampling frequency in %. Smaller value means more sampling times. Must be an integer between 0 and 100, default = 5.
    min: 0
    max: 100
  - name: "--min_intron"
    type: integer
    required: false
    default: 50
    min: 1 
    description: Minimum intron length (bp), default = 50.
  - name: "--min_splice_read"
    type: integer
    required: false
    default: 1
    min: 1 
    description: Minimum number of supporting reads to call a junction, default = 1.
  - name: "--map_qual"
    type: integer
    required: false
    default: 30 
    description: Minimum mapping quality (phred scaled) to determine uniquely mapped reads, default=30.
    min: 0
 - name: "Output"
  arguments: 
  - name: "--output_plot_r"
    type: file
    direction: output
    required: false
    default: $id.junction_saturation_plot.r
    description: R script to generate the junction saturation plot
  - name: "--output_plot"
    type: file
    direction: output
    required: false
    default: $id.junction_saturation_plot.pdf
    description: plot of junction saturation (pdf)
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/sarscov2/test.paired_end.sorted.bam
  - path: /testData/unit_test_resources/sarscov2/test.bed
 engines:
 - type: docker
  image: ubuntu:22.04
  setup:   
    - type: apt
      packages: [ python3-pip, r-base]
    - type: python
      packages: [ RSeQC ]
 runners: 
 - type: executable
 - type: nextflow
--- a/src/rseqc/rseqc_junctionsaturation/script.sh
+++ b/src/rseqc/rseqc_junctionsaturation/script.sh
@@ -0,0 +1,19 @@
 #!/bin/bash
 set -eo pipefail 
 prefix=$(openssl rand -hex 8)
 junction_saturation.py \
    -i $par_input \
    -r $par_refgene \
    -o $prefix \
    -l $par_sampling_percentile_lower_bound \
    -u $par_sampling_percentile_upper_bound \
    -s $par_sampling_percentile_step \
    -m $par_min_intron \
    -v $par_min_splice_read \
    -q $par_map_qual
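 # if-blocks keep a missing optional file from becoming the script's exit status
 if [[ -f "$prefix.junctionSaturation_plot.pdf" ]]; then mv "$prefix.junctionSaturation_plot.pdf" "$par_output_plot"; fi
 if [[ -f "$prefix.junctionSaturation_plot.r" ]]; then mv "$prefix.junctionSaturation_plot.r" "$par_output_plot_r"; fi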
--- a/src/rseqc/rseqc_junctionsaturation/test.sh
+++ b/src/rseqc/rseqc_junctionsaturation/test.sh
@@ -0,0 +1,30 @@
 #!/bin/bash
 # define input and output for script
 input_bam="$meta_resources_dir/test.paired_end.sorted.bam"
 input_bed="$meta_resources_dir/test.bed"
 output_plot="junction_saturation_plot.pdf"
 output_plot_r="junction_saturation_plot.r"
 # run executable and test
 echo "> Running $meta_functionality_name"
 "$meta_executable" \
    --input "$input_bam" \
    --refgene "$input_bed" \
    --output_plot_r "$output_plot_r" \
    --output_plot "$output_plot"
 exit_code=$?
 [[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
 echo ">> asserting all output files were created"
 [ ! -f "$output_plot_r" ] && echo "$output_plot_r was not created" && exit 1
 [ ! -s "$output_plot_r" ] && echo "$output_plot_r is empty" && exit 1
 [ ! -f "$output_plot" ] && echo "$output_plot was not created" && exit 1
 [ ! -s "$output_plot" ] && echo "$output_plot is empty" && exit 1
 exit 0
--- a/src/rseqc/rseqc_readdistribution/config.vsh.yaml
+++ b/src/rseqc/rseqc_readdistribution/config.vsh.yaml
@@ -0,0 +1,52 @@
 name: "rseqc_readdistribution"
 namespace: "rseqc"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/rseqc/readdistribution/main.nf]
    last_sha: 
 description: |
  Calculate how mapped reads are distributed over genomic features.
 argument_groups:
 - name: "Input"
  arguments: 
  - name: "--input"
    type: file 
    required: true
    description: input alignment file in BAM or SAM format
  - name: "--refgene"
    type: file 
    required: true
    description: Reference gene model in bed format
 - name: "Output"
  arguments:   
  - name: "--output"
    type: file
    direction: output
    required: false
    default: $id.read_distribution.txt
    description: output file (txt) of read distribution analysis.
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/sarscov2/test.paired_end.sorted.bam
  - path: /testData/unit_test_resources/sarscov2/test.bed12
 engines:
 - type: docker
  image: ubuntu:22.04
  setup:   
    - type: apt
      packages: [ python3-pip ]
    - type: python
      packages: [ RSeQC ]
 runners: 
 - type: executable
 - type: nextflow
--- a/src/rseqc/rseqc_readdistribution/script.sh
+++ b/src/rseqc/rseqc_readdistribution/script.sh
@@ -0,0 +1,8 @@
 #!/bin/bash
 set -eo pipefail 
 read_distribution.py \
    -i $par_input \
    -r $par_refgene \
 > $par_output
--- a/src/rseqc/rseqc_readdistribution/test.sh
+++ b/src/rseqc/rseqc_readdistribution/test.sh
@@ -0,0 +1,24 @@
 #!/bin/bash
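 # Note (a sketch, not in the original file): assuming Viash is installed and the
 # resources under testData/unit_test_resources have been fetched, this test can
 # presumably be run from the repository root with:
 #   viash test src/rseqc/rseqc_readdistribution/config.vsh.yaml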
 # define input and output for script
 input_bam="$meta_resources_dir/test.paired_end.sorted.bam"
 input_bed="$meta_resources_dir/test.bed12"
 output="read_distribution.txt"
 # run executable and test
 echo "> Running $meta_functionality_name"
 "$meta_executable" \
    --input "$input_bam" \
    --refgene "$input_bed" \
    --output "$output"
 exit_code=$?
 [[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
 echo ">> Asserting output file was created"
 [ ! -f "$output" ] && echo "$output was not created" && exit 1
 [ ! -s "$output" ] && echo "$output is empty" && exit 1
 exit 0
--- a/src/rseqc/rseqc_readduplication/config.vsh.yaml
+++ b/src/rseqc/rseqc_readduplication/config.vsh.yaml
@@ -0,0 +1,82 @@
 name: "rseqc_readduplication"
 namespace: "rseqc"
 info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/rseqc/readduplication/main.nf]
    last_sha: 
 description: |
  Calculate read duplication rate.
 argument_groups:
 - name: "Input"
  arguments: 
  - name: "--input"
    type: file 
    required: true
    description: input alignment file in BAM or SAM format
  - name: "--read_count_upper_limit"
    type: integer
    required: false
    default: 500
    description: Upper limit of reads' occurrence. Only used for plotting, default = 500 (times).
    min: 1
  - name: "--map_qual"
    type: integer
    required: false
    default: 30 
    description: Minimum mapping quality (phred scaled) to determine uniquely mapped reads, default=30.
    min: 0
 - name: "Output"
  arguments: 
  - name: "--output_duplication_rate_plot_r"
    type: file
    direction: output
    required: false
    default: $id.duplication_rate_plot.r
    description: R script for generating duplication rate plot
  - name: "--output_duplication_rate_plot"
    type: file
    direction: output
    required: false
    default: $id.duplication_rate_plot.pdf
    description: duplication rate plot (pdf)
  - name: "--output_duplication_rate_mapping"
    type: file
    direction: output
    required: false
    default: $id.duplication_rate_mapping.xls
    description: Summary of mapping-based read duplication
  - name: "--output_duplication_rate_sequence"
    type: file
    direction: output
    required: false
    default: $id.duplication_rate_sequencing.xls
    description: Summary of sequencing-based read duplication
 resources:
  - type: bash_script
    path: script.sh
 test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/sarscov2/test.paired_end.sorted.bam
 engines:
 - type: docker
  image: ubuntu:22.04
  setup:   
    - type: "apt"
      packages: [python3-pip, r-base]
    - type: python
      packages: [RSeQC]
 runners: 
 - type: executable
 - type: nextflow