Build branch main with version main (da414e7)

Build pipeline: viash-hub.biobox.main-7dwhr Source commit: da414e72c6 Source message: Add star solo component (#62) * add star solo component * change arguments from camelCase to snake_case * get rid of multiple_sep * drop star_solo component and just add arguments to star_align_reads * Update src/star/star_align_reads/script.py Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com> --------- Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>
2024-07-29 08:14:56 +00:00
parent ae3fef8eec
commit 02c1c9ebea
309 changed files with 58822 additions and 4148 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -3,6 +3,7 @@

 # IDE ignores
 .idea/
+.vscode/

 # R specific ignores
 .Rhistory
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,25 +1,60 @@
 # biobox x.x.x

-## BUG FIXES
-
-* `pear`: fix component not exiting with the correct exitcode when PEAR fails.
-
-* `cutadapt`: fix `--par_quality_cutoff_r2` argument.
-
-* `cutadapt`: demultiplexing is now disabled by default. It can be re-enabled by using `demultiplex_mode`.
-
-## MINOR CHANGES
-
-* `busco` components: update BUSCO to `5.7.1`.
-
-# biobox 0.1.0
-
 ## BREAKING CHANGES

-* Change default `multiple_sep` to `;` (PR #25). This aligns with an upcoming breaking change in
-  Viash 0.9.0 in order to avoid issues with the current default separator `:` unintentionally
-  splitting up certain file paths.
+* `star/star_align_reads`: Change all arguments from `--camelCase` to `--snake_case` (PR #62).

+* `star/star_genome_generate`: Change all arguments from `--camelCase` to `--snake_case` (PR #62).
+
+## NEW FUNCTIONALITY
+
+* `star/star_align_reads`: Add star solo related arguments (PR #62).
+
+* `bd_rhapsody/bd_rhapsody_make_reference`: Create a reference for the BD Rhapsody pipeline (PR #75).
+
+* `umitools/umitools_dedup`: Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read (PR #54).
+
+* `seqtk`:
+  - `seqtk/seqtk_sample`: Subsamples sequences from FASTA/Q files (PR #68).
+  - `seqtk/seqtk_subseq`: Extract the sequences (complete or subsequence) from the FASTA/FASTQ files
+                based on a provided sequence IDs or region coordinates file (PR #85).
+
+* `agat/agat_convert_sp_gff2gtf`: convert any GTF/GFF file into a proper GTF file (PR #76).
+
+## MINOR CHANGES
+
+* `busco` components: update BUSCO to `5.7.1` (PR #72).
+
+* Update CI to reusable workflow in `viash-io/viash-actions` (PR #86).
+
+## DOCUMENTATION
+
+* Extend the contributing guidelines (PR #82):
+
+  - Update format to Viash 0.9.
+
+  - Descriptions should be formatted in markdown.
+
+  - Add defaults to descriptions, not as a default of the argument.
+
+  - Explain parameter expansion.
+
+  - Mention that the contents of the output of components in tests should be checked.
+
+* Add authorship to existing components (PR #88).
+
+## BUG FIXES
+
+* `pear`: fix component not exiting with the correct exitcode when PEAR fails (PR #70).
+
+* `cutadapt`: fix `--par_quality_cutoff_r2` argument (PR #69).
+
+* `cutadapt`: demultiplexing is now disabled by default. It can be re-enabled by using `demultiplex_mode` (PR #69).
+
+* `multiqc`: update multiple separator to `;` (PR #81).
+
+
+# biobox 0.1.0

 ## NEW FEATURES

@@ -66,6 +101,10 @@
    - `samtools/samtools_collate`: Shuffles and groups reads in SAM/BAM/CRAM files together by their names (PR #42).
    - `samtools/samtools_view`: Views and converts SAM/BAM/CRAM files (PR #48).
    - `samtools/samtools_fastq`: Converts a SAM/BAM/CRAM file to FASTQ (PR #52).
+    - `samtools/samtools_fastq`: Converts a SAM/BAM/CRAM file to FASTA (PR #53).
+
+* `umi_tools`:
+    -`umi_tools/umi_tools_extract`: Flexible removal of UMI sequences from fastq reads (PR #71).

 * `falco`: A C++ drop-in replacement of FastQC to assess the quality of sequence read data (PR #43).

@@ -93,4 +132,4 @@

 * Add escaping character before leading hashtag in the description field of the config file (PR #50).

-* Format URL in biobase/bcl_convert description (PR #55).
+* Format URL in biobase/bcl_convert description (PR #55).
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -65,22 +65,21 @@ runners:
 Fill in the relevant metadata fields in the config. Here is an example of the metadata of an existing component.

 ```yaml
-functionality:
-  name: arriba
-  description: Detect gene fusions from RNA-Seq data
-  keywords: [Gene fusion, RNA-Seq]
-  links:
-    homepage: https://arriba.readthedocs.io/en/latest/
-    documentation: https://arriba.readthedocs.io/en/latest/
-    repository: https://github.com/suhrig/arriba
-    issue_tracker: https://github.com/suhrig/arriba/issues
-  references:
-    doi: 10.1101/gr.257246.119
-    bibtex: |
-      @article{
-        ... a bibtex entry in case the doi is not available ...
-      }
-  license: MIT
+name: arriba
+description: Detect gene fusions from RNA-Seq data
+keywords: [Gene fusion, RNA-Seq]
+links:
+  homepage: https://arriba.readthedocs.io/en/latest/
+  documentation: https://arriba.readthedocs.io/en/latest/
+  repository: https://github.com/suhrig/arriba
+  issue_tracker: https://github.com/suhrig/arriba/issues
+references:
+  doi: 10.1101/gr.257246.119
+  bibtex: |
+    @article{
+      ... a bibtex entry in case the doi is not available ...
+    }
+license: MIT
 ```

 ### Step 4: Find a suitable container
@@ -162,7 +161,7 @@ argument_groups:
      type: file
      description: |
        File in SAM/BAM/CRAM format with main alignments as generated by STAR
-        (Aligned.out.sam). Arriba extracts candidate reads from this file.
+        (`Aligned.out.sam`). Arriba extracts candidate reads from this file.
      required: true
      example: Aligned.out.bam
 ```
@@ -175,7 +174,7 @@ Several notes:

 * Input arguments can have `multiple: true` to allow the user to specify multiple files.

-
+* The description should be formatted in markdown.

 ### Step 8: Add arguments for the output files

@@ -220,7 +219,7 @@ argument_groups:

 Note: 

-* Preferably, these outputs should not be directores but files. For example, if a tool outputs a directory `foo/` containing files `foo/bar.txt` and `foo/baz.txt`, there should be two output arguments `--bar` and `--baz` (as opposed to one output argument which outputs the whole `foo/` directory).
+* Preferably, these outputs should not be directories but files. For example, if a tool outputs a directory `foo/` containing files `foo/bar.txt` and `foo/baz.txt`, there should be two output arguments `--bar` and `--baz` (as opposed to one output argument which outputs the whole `foo/` directory).

 ### Step 9: Add arguments for the other arguments

@@ -230,6 +229,8 @@ Finally, add all other arguments to the config file. There are a few exceptions:

 * Arguments related to printing the information such as printing the version (`-v`, `--version`) or printing the help (`-h`, `--help`) should not be added to the config file.

+* If the help lists defaults, do not add them as defaults but to the description. Example: `description: <Explanation of parameter>. Default: 10.`
+

 ### Step 10: Add a Docker engine

@@ -275,10 +276,13 @@ Next, we need to write a runner script that runs the tool with the input argumen
 ## VIASH START
 ## VIASH END

+# unset flags
+[[ "$par_option" == "false" ]] && unset par_option
+
 xxx \
  --input "$par_input" \
  --output "$par_output" \
-  $([ "$par_option" = "true" ] && echo "--option")
+  ${par_option:+--option}
 ```

 When building a Viash component, Viash will automatically replace the `## VIASH START` and `## VIASH END` lines (and anything in between) with environment variables based on the arguments specified in the config.
@@ -291,6 +295,11 @@ As an example, this is what the Bash script for the `arriba` component looks lik
 ## VIASH START
 ## VIASH END

+# unset flags
+[[ "$par_skip_duplicate_marking" == "false" ]] && unset par_skip_duplicate_marking
+[[ "$par_extra_information" == "false" ]] && unset par_extra_information
+[[ "$par_fill_gaps" == "false" ]] && unset par_fill_gaps
+
 arriba \
  -x "$par_bam" \
  -a "$par_genome" \
@@ -298,26 +307,30 @@ arriba \
  -o "$par_fusions" \
  ${par_known_fusions:+-k "${par_known_fusions}"} \
  ${par_blacklist:+-b "${par_blacklist}"} \
-  ${par_structural_variants:+-d "${par_structural_variants}"} \
-  $([ "$par_skip_duplicate_marking" = "true" ] && echo "-u") \
-  $([ "$par_extra_information" = "true" ] && echo "-X") \
-  $([ "$par_fill_gaps" = "true" ] && echo "-I")
+  # ...
+  ${par_extra_information:+-X} \
+  ${par_fill_gaps:+-I}
 ```

+Notes:
+
+* If your arguments can contain special variables (e.g. `$`), you can use quoting (need to find a documentation page for this) to make sure you can use the string as input. Example: `-x ${par_bam@Q}`.
+
+* Optional arguments can be passed to the command conditionally using Bash [parameter expansion](https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html). For example: `${par_known_fusions:+-k ${par_known_fusions@Q}}`
+
+* If your tool allows for multiple inputs using a separator other than `;` (which is the default Viash multiple separator), you can substitute these values with a command like: `par_disable_filters=$(echo $par_disable_filters | tr ';' ',')`.
+

 ### Step 12: Create test script

-
 If the unit test requires test resources, these should be provided in the `test_resources` section of the component. 

 ```yaml
-functionality:
-  # ...
-  test_resources:
-    - type: bash_script
-      path: test.sh
-    - type: file
-      path: test_data
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: test_data
 ```

 Create a test script at `src/xxx/test.sh` that runs the component with the test data. This script should run the component (available with `$meta_executable`) with the test data and check if the output is as expected. The script should exit with a non-zero exit code if the output is not as expected. For example:
@@ -325,48 +338,64 @@ Create a test script at `src/xxx/test.sh` that runs the component with the test
 ```bash
 #!/bin/bash

+set -e
+
 ## VIASH START
 ## VIASH END

-echo "> Run xxx with test data"
+#############################################
+# helper functions
+assert_file_exists() {
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
+}
+assert_file_doesnt_exist() {
+  [ ! -f "$1" ] || { echo "File '$1' exists but shouldn't" && exit 1; }
+}
+assert_file_empty() {
+  [ ! -s "$1" ] || { echo "File '$1' is not empty but should be" && exit 1; }
+}
+assert_file_not_empty() {
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
+}
+assert_file_contains() {
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
+}
+assert_file_not_contains() {
+  grep -q "$2" "$1" && { echo "File '$1' contains '$2' but shouldn't" && exit 1; }
+}
+assert_file_contains_regex() {
+  grep -q -E "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
+}
+assert_file_not_contains_regex() {
+  grep -q -E "$2" "$1" && { echo "File '$1' contains '$2' but shouldn't" && exit 1; }
+}
+#############################################
+
+echo "> Run $meta_name with test data"
 "$meta_executable" \
-  --input "$meta_resources_dir/test_data/input.txt" \
+  --input "$meta_resources_dir/test_data/reads_R1.fastq" \
  --output "output.txt" \
  --option

-echo ">> Checking output"
-[ ! -f "output.txt" ] && echo "Output file output.txt does not exist" && exit 1
-```
-
-
-For example, this is what the test script for the `arriba` component looks like:
-
-```bash
-#!/bin/bash
-
-## VIASH START
-## VIASH END
-
-echo "> Run arriba with blacklist"
-"$meta_executable" \
-  --bam "$meta_resources_dir/test_data/A.bam" \
-  --genome "$meta_resources_dir/test_data/genome.fasta" \
-  --gene_annotation "$meta_resources_dir/test_data/annotation.gtf" \
-  --blacklist "$meta_resources_dir/test_data/blacklist.tsv" \
-  --fusions "fusions.tsv" \
-  --fusions_discarded "fusions_discarded.tsv" \
-  --interesting_contigs "1,2"
-
-echo ">> Checking output"
-[ ! -f "fusions.tsv" ] && echo "Output file fusions.tsv does not exist" && exit 1
-[ ! -f "fusions_discarded.tsv" ] && echo "Output file fusions_discarded.tsv does not exist" && exit 1
+echo ">> Check if output exists"
+assert_file_exists "output.txt"

 echo ">> Check if output is empty"
-[ ! -s "fusions.tsv" ] && echo "Output file fusions.tsv is empty" && exit 1
-[ ! -s "fusions_discarded.tsv" ] && echo "Output file fusions_discarded.tsv is empty" && exit 1
+assert_file_not_empty "output.txt"
+
+echo ">> Check if output is correct"
+assert_file_contains "output.txt" "some expected output"
+
+echo "> All tests succeeded!"
 ```

-### Step 12: Create a `/var/software_versions.txt` file
+Notes:
+
+* Do always check the contents of the output file. If the output is not deterministic, you can use regular expressions to check the output.
+
+* If possible, generate your own test data instead of copying it from an external resource.
+
+### Step 13: Create a `/var/software_versions.txt` file

 For the sake of transparency and reproducibility, we require that the versions of the software used in the component are documented.

@@ -378,6 +407,8 @@ engines:
    image: quay.io/biocontainers/xxx:0.1.0--py_0
    setup:
      - type: docker
+        # note: /var/software_versions.txt should contain:
+        #   arriba: "2.4.0"
        run: |
          echo "xxx: \"0.1.0\"" > /var/software_versions.txt
 ```
--- a/src/_authors/angela_o_pisco.yaml
+++ b/src/_authors/angela_o_pisco.yaml
@@ -0,0 +1,14 @@
+name: Angela Oliveira Pisco
+info:
+  role: Contributor
+  links:
+    github: aopisco
+    orcid: "0000-0003-0142-2355"
+    linkedin: aopisco
+  organizations:
+    - name: Insitro
+      href: https://insitro.com
+      role: Director of Computational Biology
+    - name: Open Problems
+      href: https://openproblems.bio
+      role: Core Member
--- a/src/_authors/dorien_roosen.yaml
+++ b/src/_authors/dorien_roosen.yaml
@@ -0,0 +1,10 @@
+name: Dorien Roosen
+info:
+  links:
+    email: dorien@data-intuitive.com
+    github: dorien-er
+    linkedin: dorien-roosen
+  organizations:
+    - name: Data Intuitive
+      href: https://www.data-intuitive.com
+      role: Data Scientist
--- a/src/_authors/dries_schaumont.yaml
+++ b/src/_authors/dries_schaumont.yaml
@@ -0,0 +1,11 @@
+name: Dries Schaumont
+info:
+  links:
+    email: dries@data-intuitive.com
+    github: DriesSchaumont
+    orcid: "0000-0002-4389-0440"
+    linkedin: dries-schaumont
+  organizations:
+    - name: Data Intuitive
+      href: https://www.data-intuitive.com
+      role: Data Scientist
--- a/src/_authors/emma_rousseau.yaml
+++ b/src/_authors/emma_rousseau.yaml
@@ -0,0 +1,10 @@
+name: Emma Rousseau
+info:
+  links:
+    email: emma@data-intuitive.com
+    github: emmarousseau
+    linkedin: emmarousseau1
+  organizations:
+    - name: Data Intuitive
+      href: https://www.data-intuitive.com
+      role: Bioinformatician
--- a/src/_authors/jakub_majercik.yaml
+++ b/src/_authors/jakub_majercik.yaml
@@ -0,0 +1,10 @@
+name: Jakub Majercik
+info:
+  links:
+    email: jakub@data-intuitive.com
+    github: jakubmajercik
+    linkedin: jakubmajercik
+  organizations:
+    - name: Data Intuitive
+      href: https://www.data-intuitive.com
+      role: Bioinformatics Engineer
--- a/src/_authors/kai_waldrant.yaml
+++ b/src/_authors/kai_waldrant.yaml
@@ -0,0 +1,14 @@
+name: Kai Waldrant
+info:
+  links:
+    email: kai@data-intuitive.com
+    github: KaiWaldrant
+    orcid: "0009-0003-8555-1361"
+    linkedin: kaiwaldrant
+  organizations:
+    - name: Data Intuitive
+      href: https://www.data-intuitive.com
+      role: Bioinformatician
+    - name: Open Problems
+      href: https://openproblems.bio
+      role: Contributor
--- a/src/_authors/leila_paquay.yaml
+++ b/src/_authors/leila_paquay.yaml
@@ -0,0 +1,10 @@
+name: Leïla Paquay
+info:
+  links:
+    email: leila@data-intuitive.com
+    github: Leila011
+    linkedin: leilapaquay
+  organizations:
+    - name: Data Intuitive
+      href: https://www.data-intuitive.com
+      role: Software Developer
--- a/src/_authors/robrecht_cannoodt.yaml
+++ b/src/_authors/robrecht_cannoodt.yaml
@@ -0,0 +1,14 @@
+name: Robrecht Cannoodt
+info:
+  links:
+    email: robrecht@data-intuitive.com
+    github: rcannood
+    orcid: "0000-0003-3641-729X"
+    linkedin: robrechtcannoodt
+  organizations:
+    - name: Data Intuitive
+      href: https://www.data-intuitive.com
+      role: Data Science Engineer
+    - name: Open Problems
+      href: https://openproblems.bio
+      role: Core Member
--- a/src/_authors/sai_nirmayi_yasa.yaml
+++ b/src/_authors/sai_nirmayi_yasa.yaml
@@ -0,0 +1,10 @@
+name: Sai Nirmayi Yasa
+info:
+  links:
+    email: nirmayi@data-intuitive.com
+    github: sainirmayi
+    linkedin: sai-nirmayi-yasa
+  organizations:
+    - name: Data Intuitive
+      href: https://www.data-intuitive.com
+      role: Junior Bioinformatics Researcher
--- a/src/_authors/theodoro_gasperin.yaml
+++ b/src/_authors/theodoro_gasperin.yaml
@@ -0,0 +1,10 @@
+name: Theodoro Gasperin Terra Camargo
+info:
+  links:
+    email: theodorogtc@gmail.com
+    github: tgaspe
+    linkedin: theodoro-gasperin-terra-camargo
+  organizations:
+    - name: Data Intuitive
+      href: https://www.data-intuitive.com
+      role: Bioinformatician
--- a/src/_authors/toni_verbeiren.yaml
+++ b/src/_authors/toni_verbeiren.yaml
@@ -0,0 +1,9 @@
+name: Toni Verbeiren
+info:
+  links:
+    github: tverbeiren
+    linkedin: verbeiren
+  organizations:
+  - name: Data Intuitive
+    href: https://www.data-intuitive.com
+    role: Data Scientist and CEO
--- a/src/_authors/weiwei_schultz.yaml
+++ b/src/_authors/weiwei_schultz.yaml
@@ -0,0 +1,5 @@
+name: Weiwei Schultz
+info:
+  organizations:
+    - name: Janssen R&D US
+      role: Associate Director Data Sciences
--- a/src/agat/agat_convert_sp_gff2gtf/config.vsh.yaml
+++ b/src/agat/agat_convert_sp_gff2gtf/config.vsh.yaml
@@ -0,0 +1,94 @@
+name: agat_convert_sp_gff2gtf
+namespace: agat
+description: |
+  The script aims to convert any GTF/GFF file into a proper GTF file. Full
+  information about the format can be found here:
+  https://agat.readthedocs.io/en/latest/gxf.html You can choose among 7
+  different GTF types (1, 2, 2.1, 2.2, 2.5, 3 or relax). Depending the
+  version selected the script will filter out the features that are not
+  accepted. For GTF2.5 and 3, every level1 feature (e.g nc_gene
+  pseudogene) will be converted into gene feature and every level2 feature
+  (e.g mRNA ncRNA) will be converted into transcript feature. Using the
+  "relax" option you will produce a GTF-like output keeping all original
+  feature types (3rd column). No modification will occur e.g. mRNA to
+  transcript.
+
+  To be fully GTF compliant all feature have a gene_id and a transcript_id
+  attribute. The gene_id is unique identifier for the genomic source of
+  the transcript, which is used to group transcripts into genes. The
+  transcript_id is a unique identifier for the predicted transcript, which
+  is used to group features into transcripts.
+keywords: [gene annotations, GTF conversion]
+links:
+  homepage: https://github.com/NBISweden/AGAT
+  documentation: https://agat.readthedocs.io/
+  issue_tracker: https://github.com/NBISweden/AGAT/issues
+  repository: https://github.com/NBISweden/AGAT
+references: 
+  doi: 10.5281/zenodo.3552717
+license: GPL-3.0
+authors:
+  - __merge__: /src/_authors/leila_paquay.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --gff
+        alternatives: [-i]
+        description: Input GFF/GTF file that will be read
+        type: file
+        required: true
+        direction: input
+        example: input.gff
+  - name: Outputs
+    arguments:       
+      - name: --output
+        alternatives: [-o, --out, --outfile, --gtf]
+        description: Output GTF file. If no output file is specified, the output will be written to STDOUT.
+        type: file
+        direction: output
+        required: true
+        example: output.gtf
+  - name: Arguments
+    arguments:
+      - name: --gtf_version
+        description: |
+          Version of the GTF output (1,2,2.1,2.2,2.5,3 or relax). Default value from AGAT config file (relax for the default config). The script option has the higher priority.  
+          
+            * relax: all feature types are accepted.  
+            * GTF3 (9 feature types accepted): gene, transcript, exon, CDS, Selenocysteine, start_codon, stop_codon, three_prime_utr and five_prime_utr.  
+            * GTF2.5 (8 feature types accepted): gene, transcript, exon, CDS, UTR, start_codon, stop_codon, Selenocysteine.  
+            * GTF2.2 (9 feature types accepted): CDS, start_codon, stop_codon, 5UTR, 3UTR, inter, inter_CNS, intron_CNS and exon.  
+            * GTF2.1 (6 feature types accepted): CDS, start_codon, stop_codon, exon, 5UTR, 3UTR.  
+            * GTF2 (4 feature types accepted): CDS, start_codon, stop_codon, exon.  
+            * GTF1 (5 feature types accepted): CDS, start_codon, stop_codon, exon, intron.  
+        type: string
+        choices: [relax, "1", "2", "2.1", "2.2", "2.5", "3"]
+        required: false
+        example: "3"
+      - name: --config
+        alternatives: [-c]
+        description: |
+          Input agat config file. By default AGAT takes as input agat_config.yaml file from the working directory if any, otherwise it takes the orignal agat_config.yaml shipped with AGAT. To get the agat_config.yaml locally type: "agat config --expose". The --config option gives you the possibility to use your own AGAT config file (located elsewhere or named differently).
+        type: file
+        required: false
+        example: custom_agat_config.yaml
+resources:
+  - type: bash_script
+    path: script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: test_data
+engines:
+  - type: docker
+    image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
+    setup:
+      - type: docker
+        run: |
+          agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt
+runners:
+  - type: executable
+  - type: nextflow
--- a/src/agat/agat_convert_sp_gff2gtf/help.txt
+++ b/src/agat/agat_convert_sp_gff2gtf/help.txt
@@ -0,0 +1,102 @@
+```sh
+agat_convert_sp_gff2gtf.pl --help
+```
+ ------------------------------------------------------------------------------
+|   Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0                      |
+|   https://github.com/NBISweden/AGAT                                          |
+|   National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se         |
+ ------------------------------------------------------------------------------
+
+
+Name:
+    agat_convert_sp_gff2gtf.pl
+
+Description:
+    The script aims to convert any GTF/GFF file into a proper GTF file. Full
+    information about the format can be found here:
+    https://agat.readthedocs.io/en/latest/gxf.html You can choose among 7
+    different GTF types (1, 2, 2.1, 2.2, 2.5, 3 or relax). Depending the
+    version selected the script will filter out the features that are not
+    accepted. For GTF2.5 and 3, every level1 feature (e.g nc_gene
+    pseudogene) will be converted into gene feature and every level2 feature
+    (e.g mRNA ncRNA) will be converted into transcript feature. Using the
+    "relax" option you will produce a GTF-like output keeping all original
+    feature types (3rd column). No modification will occur e.g. mRNA to
+    transcript.
+
+    To be fully GTF compliant all feature have a gene_id and a transcript_id
+    attribute. The gene_id is unique identifier for the genomic source of
+    the transcript, which is used to group transcripts into genes. The
+    transcript_id is a unique identifier for the predicted transcript, which
+    is used to group features into transcripts.
+
+Usage:
+        agat_convert_sp_gff2gtf.pl --gff infile.gff [ -o outfile ]
+        agat_convert_sp_gff2gtf -h
+
+Options:
+    --gff, --gtf or -i
+            Input GFF/GTF file that will be read
+
+    --gtf_version version of the GTF output (1,2,2.1,2.2,2.5,3 or relax).
+    Default value from AGAT config file (relax for the default config). The
+    script option has the higher priority.
+            relax: all feature types are accepted.
+
+            GTF3 (9 feature types accepted): gene, transcript, exon, CDS,
+            Selenocysteine, start_codon, stop_codon, three_prime_utr and
+            five_prime_utr
+
+            GTF2.5 (8 feature types accepted): gene, transcript, exon, CDS,
+            UTR, start_codon, stop_codon, Selenocysteine
+
+            GTF2.2 (9 feature types accepted): CDS, start_codon, stop_codon,
+            5UTR, 3UTR, inter, inter_CNS, intron_CNS and exon
+
+            GTF2.1 (6 feature types accepted): CDS, start_codon, stop_codon,
+            exon, 5UTR, 3UTR
+
+            GTF2 (4 feature types accepted): CDS, start_codon, stop_codon,
+            exon
+
+            GTF1 (5 feature types accepted): CDS, start_codon, stop_codon,
+            exon, intron
+
+    -o , --output , --out , --outfile or --gtf
+            Output GTF file. If no output file is specified, the output will
+            be written to STDOUT.
+
+    -c or --config
+            String - Input agat config file. By default AGAT takes as input
+            agat_config.yaml file from the working directory if any,
+            otherwise it takes the orignal agat_config.yaml shipped with
+            AGAT. To get the agat_config.yaml locally type: "agat config
+            --expose". The --config option gives you the possibility to use
+            your own AGAT config file (located elsewhere or named
+            differently).
+
+    -h or --help
+            Display this helpful text.
+
+Feedback:
+  Did you find a bug?:
+    Do not hesitate to report bugs to help us keep track of the bugs and
+    their resolution. Please use the GitHub issue tracking system available
+    at this address:
+
+                https://github.com/NBISweden/AGAT/issues
+
+     Ensure that the bug was not already reported by searching under Issues.
+     If you're unable to find an (open) issue addressing the problem, open a new one.
+     Try as much as possible to include in the issue when relevant:
+     - a clear description,
+     - as much relevant information as possible,
+     - the command used,
+     - a data sample,
+     - an explanation of the expected behaviour that is not occurring.
+
+  Do you want to contribute?:
+    You are very welcome, visit this address for the Contributing
+    guidelines:
+    https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
+
--- a/src/agat/agat_convert_sp_gff2gtf/script.sh
+++ b/src/agat/agat_convert_sp_gff2gtf/script.sh
@@ -0,0 +1,10 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+agat_convert_sp_gff2gtf.pl \
+  -i "$par_gff" \
+  -o "$par_output" \
+  ${par_gtf_version:+--gtf_version "${par_gtf_version}"} \
+  ${par_config:+--config "${par_config}"}
--- a/src/agat/agat_convert_sp_gff2gtf/test.sh
+++ b/src/agat/agat_convert_sp_gff2gtf/test.sh
@@ -0,0 +1,37 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+test_dir="${meta_resources_dir}/test_data"
+
+echo "> Run $meta_name with test data"
+"$meta_executable" \
+  --gff "$test_dir/0_test.gff" \
+  --output "output.gtf" 
+
+echo ">> Checking output"
+[ ! -f "output.gtf" ] && echo "Output file output.gtf does not exist" && exit 1
+
+echo ">> Check if output is empty"
+[ ! -s "output.gtf" ] && echo "Output file output.gtf is empty" && exit 1
+
+echo ">> Check if the conversion resulted in the right GTF format"
+idGFF=$(head -n 2 "$test_dir/0_test.gff" | grep -o 'ID=[^;]*' | cut -d '=' -f 2-)
+expectedGTF="gene_id \"$idGFF\"; ID \"$idGFF\";"
+extractedGTF=$(head -n 3 "output.gtf" | grep -o 'gene_id "[^"]*"; ID "[^"]*";')
+[ "$extractedGTF" != "$expectedGTF" ] && echo "Output file output.gtf does not have the right format" && exit 1
+
+rm output.gtf
+
+echo "> Run $meta_name with test data and GTF version 2.5"
+"$meta_executable" \
+  --gff "$test_dir/0_test.gff" \
+  --output "output.gtf" \
+  --gtf_version "2.5"
+
+echo ">> Check if the output file header display the right GTF version"
+grep -q "##gtf-version 2.5" "output.gtf"
+[ $? -ne 0 ] && echo "Output file output.gtf header does not display the right GTF version" && exit 1
+
+echo "> Test successful"
--- a/src/agat/agat_convert_sp_gff2gtf/test_data/0_test.gff
+++ b/src/agat/agat_convert_sp_gff2gtf/test_data/0_test.gff
@@ -0,0 +1,36 @@
+##gff-version 3
+scaffold625	maker	gene	337818	343277	.	+	.	ID=CLUHARG00000005458;Name=TUBB3_2
+scaffold625	maker	mRNA	337818	343277	.	+	.	ID=CLUHART00000008717;Parent=CLUHARG00000005458
+scaffold625	maker	exon	337818	337971	.	+	.	ID=CLUHART00000008717:exon:1404;Parent=CLUHART00000008717
+scaffold625	maker	exon	340733	340841	.	+	.	ID=CLUHART00000008717:exon:1405;Parent=CLUHART00000008717
+scaffold625	maker	exon	341518	341628	.	+	.	ID=CLUHART00000008717:exon:1406;Parent=CLUHART00000008717
+scaffold625	maker	exon	341964	343277	.	+	.	ID=CLUHART00000008717:exon:1407;Parent=CLUHART00000008717
+scaffold625	maker	CDS	337915	337971	.	+	0	ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
+scaffold625	maker	CDS	340733	340841	.	+	0	ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
+scaffold625	maker	CDS	341518	341628	.	+	2	ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
+scaffold625	maker	CDS	341964	343033	.	+	2	ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
+scaffold625	maker	five_prime_UTR	337818	337914	.	+	.	ID=CLUHART00000008717:five_prime_utr;Parent=CLUHART00000008717
+scaffold625	maker	three_prime_UTR	343034	343277	.	+	.	ID=CLUHART00000008717:three_prime_utr;Parent=CLUHART00000008717
+scaffold789	maker	gene	558184	564780	.	+	.	ID=CLUHARG00000003852;Name=PF11_0240
+scaffold789	maker	mRNA	558184	564780	.	+	.	ID=CLUHART00000006146;Parent=CLUHARG00000003852
+scaffold789	maker	exon	558184	560123	.	+	.	ID=CLUHART00000006146:exon:995;Parent=CLUHART00000006146
+scaffold789	maker	exon	561401	561519	.	+	.	ID=CLUHART00000006146:exon:996;Parent=CLUHART00000006146
+scaffold789	maker	exon	564171	564235	.	+	.	ID=CLUHART00000006146:exon:997;Parent=CLUHART00000006146
+scaffold789	maker	exon	564372	564780	.	+	.	ID=CLUHART00000006146:exon:998;Parent=CLUHART00000006146
+scaffold789	maker	CDS	558191	560123	.	+	0	ID=CLUHART00000006146:cds;Parent=CLUHART00000006146
+scaffold789	maker	CDS	561401	561519	.	+	2	ID=CLUHART00000006146:cds;Parent=CLUHART00000006146
+scaffold789	maker	CDS	564171	564235	.	+	0	ID=CLUHART00000006146:cds;Parent=CLUHART00000006146
+scaffold789	maker	CDS	564372	564588	.	+	1	ID=CLUHART00000006146:cds;Parent=CLUHART00000006146
+scaffold789	maker	five_prime_UTR	558184	558190	.	+	.	ID=CLUHART00000006146:five_prime_utr;Parent=CLUHART00000006146
+scaffold789	maker	three_prime_UTR	564589	564780	.	+	.	ID=CLUHART00000006146:three_prime_utr;Parent=CLUHART00000006146
+scaffold789	maker	mRNA	558184	564780	.	+	.	ID=CLUHART00000006147;Parent=CLUHARG00000003852
+scaffold789	maker	exon	558184	560123	.	+	.	ID=CLUHART00000006147:exon:997;Parent=CLUHART00000006147
+scaffold789	maker	exon	561401	561519	.	+	.	ID=CLUHART00000006147:exon:998;Parent=CLUHART00000006147
+scaffold789	maker	exon	562057	562121	.	+	.	ID=CLUHART00000006147:exon:999;Parent=CLUHART00000006147
+scaffold789	maker	exon	564372	564780	.	+	.	ID=CLUHART00000006147:exon:1000;Parent=CLUHART00000006147
+scaffold789	maker	CDS	558191	560123	.	+	0	ID=CLUHART00000006147:cds;Parent=CLUHART00000006147
+scaffold789	maker	CDS	561401	561519	.	+	2	ID=CLUHART00000006147:cds;Parent=CLUHART00000006147
+scaffold789	maker	CDS	562057	562121	.	+	0	ID=CLUHART00000006147:cds;Parent=CLUHART00000006147
+scaffold789	maker	CDS	564372	564588	.	+	1	ID=CLUHART00000006147:cds;Parent=CLUHART00000006147
+scaffold789	maker	five_prime_UTR	558184	558190	.	+	.	ID=CLUHART00000006147:five_prime_utr;Parent=CLUHART00000006147
+scaffold789	maker	three_prime_UTR	564589	564780	.	+	.	ID=CLUHART00000006147:three_prime_utr;Parent=CLUHART00000006147
--- a/src/agat/agat_convert_sp_gff2gtf/test_data/script.sh
+++ b/src/agat/agat_convert_sp_gff2gtf/test_data/script.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+
+# clone repo
+if [ ! -d /tmp/agat_source ]; then
+  git clone --depth 1 --single-branch --branch master https://github.com/NBISweden/AGAT /tmp/agat_source
+fi
+
+# copy test data
+cp -r /tmp/agat_source/t/gff_syntax/in/0_test.gff src/agat/agat_convert_sp_gff2gtf/test_data
--- a/src/arriba/config.vsh.yaml
+++ b/src/arriba/config.vsh.yaml
@@ -11,6 +11,9 @@ license: MIT
 requirements:
  cpus: 1
  commands: [ arriba ]
+authors:
+  - __merge__: /src/_authors/robrecht_cannoodt.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
--- a/src/bcl_convert/config.vsh.yaml
+++ b/src/bcl_convert/config.vsh.yaml
@@ -4,6 +4,17 @@ description: |
  Information about upgrading from bcl2fastq via
  [Upgrading from bcl2fastq to BCL Convert](https://emea.support.illumina.com/bulletins/2020/10/upgrading-from-bcl2fastq-to-bcl-convert.html)
  and [BCL Convert Compatible Products](https://support.illumina.com/sequencing/sequencing_software/bcl-convert/compatibility.html)
+keywords: [demultiplex, fastq, bcl, illumina]
+links:
+  homepage: https://support.illumina.com/sequencing/sequencing_software/bcl-convert.html
+  documentation: https://support.illumina.com/downloads/bcl-convert-user-guide.html
+license: Proprietary
+authors:
+  - __merge__: /src/_authors/toni_verbeiren.yaml
+    roles: [ author, maintainer ]
+  - __merge__: /src/_authors/dorien_roosen.yaml
+    roles: [ author ]
+
 argument_groups:
  - name: Input arguments
    arguments:
--- a/src/bd_rhapsody/bd_rhapsody_make_reference/config.vsh.yaml
+++ b/src/bd_rhapsody/bd_rhapsody_make_reference/config.vsh.yaml
@@ -0,0 +1,143 @@
+name: bd_rhapsody_make_reference
+namespace: bd_rhapsody
+description: |
+  The Reference Files Generator creates an archive containing Genome Index
+  and Transcriptome annotation files needed for the BD Rhapsody Sequencing
+  Analysis Pipeline. The app takes as input one or more FASTA and GTF files
+  and produces a compressed archive in the form of a tar.gz file. The 
+  archive contains:
+  
+  - STAR index
+  - Filtered GTF file
+keywords: [genome, reference, index, align]
+links:
+  repository: https://bitbucket.org/CRSwDev/cwl/src/master/v2.2.1/Extra_Utilities/
+  documentation: https://bd-rhapsody-bioinfo-docs.genomics.bd.com/resources/extra_utilities.html#make-rhapsody-reference
+license: Unknown
+authors:
+  - __merge__: /src/_authors/robrecht_cannoodt.yaml
+    roles: [ author, maintainer ]
+  - __merge__: /src/_authors/weiwei_schultz.yaml
+    roles: [ contributor ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - type: file
+        name: --genome_fasta
+        required: true
+        description: Reference genome file in FASTA or FASTA.GZ format. The BD Rhapsody Sequencing Analysis Pipeline uses GRCh38 for Human and GRCm39 for Mouse.
+        example: genome_sequence.fa.gz
+        multiple: true
+        info:
+          config_key: Genome_fasta
+      - type: file
+        name: --gtf
+        required: true
+        description: |
+          File path to the transcript annotation files in GTF or GTF.GZ format. The Sequence Analysis Pipeline requires the 'gene_name' or 
+          'gene_id' attribute to be set on each gene and exon feature. Gene and exon feature lines must have the same attribute, and exons
+          must have a corresponding gene with the same value. For TCR/BCR assays, the TCR or BCR gene segments must have the 'gene_type' or
+          'gene_biotype' attribute set, and the value should begin with 'TR' or 'IG', respectively.
+        example: transcriptome_annotation.gtf.gz
+        multiple: true
+        info:
+          config_key: Gtf
+      - type: file
+        name: --extra_sequences
+        description: |
+          File path to additional sequences in FASTA format to use when building the STAR index. (e.g. transgenes or CRISPR guide barcodes).
+          GTF lines for these sequences will be automatically generated and combined with the main GTF.
+        required: false
+        multiple: true
+        info:
+          config_key: Extra_sequences
+  - name: Outputs
+    arguments:
+      - type: file
+        name: --reference_archive
+        direction: output
+        required: true
+        description: |
+          A Compressed archive containing the Reference Genome Index and annotation GTF files. This archive is meant to be used as an
+          input in the BD Rhapsody Sequencing Analysis Pipeline.
+        example: star_index.tar.gz
+  - name: Arguments
+    arguments:
+      - type: string
+        name: --mitochondrial_contigs
+        description: |
+          Names of the Mitochondrial contigs in the provided Reference Genome. Fragments originating from contigs other than these are
+          identified as 'nuclear fragments' in the ATACseq analysis pipeline.
+        required: false
+        multiple: true
+        default: [chrM, chrMT, M, MT]
+        info:
+          config_key: Mitochondrial_contigs
+      - type: boolean_true
+        name: --filtering_off
+        description: |
+          By default the input Transcript Annotation files are filtered based on the gene_type/gene_biotype attribute. Only features 
+          having the following attribute values are kept:
+
+            - protein_coding
+            - lncRNA (lincRNA and antisense for Gencode < v31/M22/Ensembl97)
+            - IG_LV_gene
+            - IG_V_gene
+            - IG_V_pseudogene
+            - IG_D_gene
+            - IG_J_gene
+            - IG_J_pseudogene
+            - IG_C_gene
+            - IG_C_pseudogene
+            - TR_V_gene
+            - TR_V_pseudogene
+            - TR_D_gene
+            - TR_J_gene
+            - TR_J_pseudogene
+            - TR_C_gene
+
+            If you have already pre-filtered the input Annotation files and/or wish to turn-off the filtering, please set this option to True.
+        info:
+          config_key: Filtering_off
+      - type: boolean_true
+        name: --wta_only_index
+        description: Build a WTA only index, otherwise builds a WTA + ATAC index.
+        info:
+          config_key: Wta_Only
+      - type: string
+        name: --extra_star_params
+        description: Additional parameters to pass to STAR when building the genome index. Specify exactly like how you would on the command line.
+        example: --limitGenomeGenerateRAM 48000 --genomeSAindexNbases 11
+        required: false
+        info:
+          config_key: Extra_STAR_params
+
+resources:
+  - type: python_script
+    path: script.py
+  - path: make_rhap_reference_2.2.1_nodocker.cwl
+
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - path: test_data
+
+requirements:
+  commands: [ "cwl-runner" ]
+
+engines:
+  - type: docker
+    image: bdgenomics/rhapsody:2.2.1
+    setup:
+      - type: apt
+        packages: [procps]
+      - type: python
+        packages: [cwlref-runner, cwl-runner]
+      - type: docker
+        run: |
+          echo "bdgenomics/rhapsody: 2.2.1" > /var/software_versions.txt
+
+runners:
+  - type: executable
+  - type: nextflow
--- a/src/bd_rhapsody/bd_rhapsody_make_reference/help.txt
+++ b/src/bd_rhapsody/bd_rhapsody_make_reference/help.txt
@@ -0,0 +1,66 @@
+```bash
+cwl-runner src/bd_rhapsody/bd_rhapsody_make_reference/make_rhap_reference_2.2.1_nodocker.cwl --help
+```
+
+usage: src/bd_rhapsody/bd_rhapsody_make_reference/make_rhap_reference_2.2.1_nodocker.cwl
+       [-h] [--Archive_prefix ARCHIVE_PREFIX]
+       [--Extra_STAR_params EXTRA_STAR_PARAMS]
+       [--Extra_sequences EXTRA_SEQUENCES] [--Filtering_off] --Genome_fasta
+       GENOME_FASTA --Gtf GTF [--Maximum_threads MAXIMUM_THREADS]
+       [--Mitochondrial_Contigs MITOCHONDRIAL_CONTIGS] [--WTA_Only]
+       [job_order]
+
+The Reference Files Generator creates an archive containing Genome Index and
+Transcriptome annotation files needed for the BD Rhapsodyâ„¢ Sequencing
+Analysis Pipeline. The app takes as input one or more FASTA and GTF files and
+produces a compressed archive in the form of a tar.gz file. The archive
+contains:\n - STAR index\n - Filtered GTF file
+
+positional arguments:
+  job_order             Job input json file
+
+options:
+  -h, --help            show this help message and exit
+  --Archive_prefix ARCHIVE_PREFIX
+                        A prefix for naming the compressed archive file
+                        containing the Reference genome index and annotation
+                        files. The default value is constructed based on the
+                        input Reference files.
+  --Extra_STAR_params EXTRA_STAR_PARAMS
+                        Additional parameters to pass to STAR when building
+                        the genome index. Specify exactly like how you would
+                        on the command line. Example: --limitGenomeGenerateRAM
+                        48000 --genomeSAindexNbases 11
+  --Extra_sequences EXTRA_SEQUENCES
+                        Additional sequences in FASTA format to use when
+                        building the STAR index. (E.g. phiX genome)
+  --Filtering_off       By default the input Transcript Annotation files are
+                        filtered based on the gene_type/gene_biotype
+                        attribute. Only features having the following
+                        attribute values are are kept: - protein_coding -
+                        lncRNA (lincRNA and antisense for Gencode <
+                        v31/M22/Ensembl97) - IG_LV_gene - IG_V_gene -
+                        IG_V_pseudogene - IG_D_gene - IG_J_gene -
+                        IG_J_pseudogene - IG_C_gene - IG_C_pseudogene -
+                        TR_V_gene - TR_V_pseudogene - TR_D_gene - TR_J_gene -
+                        TR_J_pseudogene - TR_C_gene If you have already pre-
+                        filtered the input Annotation files and/or wish to
+                        turn-off the filtering, please set this option to
+                        True.
+  --Genome_fasta GENOME_FASTA
+                        Reference genome file in FASTA format. The BD
+                        Rhapsodyâ„¢ Sequencing Analysis Pipeline uses GRCh38
+                        for Human and GRCm39 for Mouse.
+  --Gtf GTF             Transcript annotation files in GTF format. The BD
+                        Rhapsodyâ„¢ Sequencing Analysis Pipeline uses Gencode
+                        v42 for Human and M31 for Mouse.
+  --Maximum_threads MAXIMUM_THREADS
+                        The maximum number of threads to use in the pipeline.
+                        By default, all available cores are used.
+  --Mitochondrial_Contigs MITOCHONDRIAL_CONTIGS
+                        Names of the Mitochondrial contigs in the provided
+                        Reference Genome. Fragments originating from contigs
+                        other than these are identified as 'nuclear fragments'
+                        in the ATACseq analysis pipeline.
+  --WTA_Only            Build a WTA only index, otherwise builds a WTA + ATAC
+                        index.
--- a/src/bd_rhapsody/bd_rhapsody_make_reference/make_rhap_reference_2.2.1_nodocker.cwl
+++ b/src/bd_rhapsody/bd_rhapsody_make_reference/make_rhap_reference_2.2.1_nodocker.cwl
@@ -0,0 +1,115 @@
+requirements:
+  InlineJavascriptRequirement: {}
+class: CommandLineTool
+label: Reference Files Generator for BD Rhapsodyâ„¢ Sequencing Analysis Pipeline
+cwlVersion: v1.2
+doc: >- 
+    The Reference Files Generator creates an archive containing Genome Index and Transcriptome annotation files needed for the BD Rhapsodyâ„¢ Sequencing Analysis Pipeline. The app takes as input one or more FASTA and GTF files and produces a compressed archive in the form of a tar.gz file. The archive contains:\n  - STAR index\n  - Filtered GTF file
+
+
+baseCommand: run_reference_generator.sh 
+inputs: 
+    Genome_fasta:
+        type: File[]
+        label: Reference Genome
+        doc: |-
+            Reference genome file in FASTA format. The BD Rhapsodyâ„¢ Sequencing Analysis Pipeline uses GRCh38 for Human and GRCm39 for Mouse.
+        inputBinding:
+            prefix: --reference-genome
+            shellQuote: false
+    Gtf:
+        type: File[]
+        label: Transcript Annotations
+        doc: |-
+            Transcript annotation files in GTF format. The BD Rhapsodyâ„¢ Sequencing Analysis Pipeline uses Gencode v42 for Human and M31 for Mouse.
+        inputBinding:
+            prefix: --gtf
+            shellQuote: false
+    Extra_sequences:
+        type: File[]?
+        label: Extra Sequences
+        doc: |-
+            Additional sequences in FASTA format to use when building the STAR index. (E.g. phiX genome)
+        inputBinding:
+            prefix: --extra-sequences
+            shellQuote: false
+    Mitochondrial_Contigs:
+        type: string[]?
+        default: ["chrM", "chrMT", "M", "MT"]
+        label: Mitochondrial Contig Names
+        doc: |-
+            Names of the Mitochondrial contigs in the provided Reference Genome. Fragments originating from contigs other than these are identified as 'nuclear fragments' in the ATACseq analysis pipeline.
+        inputBinding:
+            prefix: --mitochondrial-contigs
+            shellQuote: false
+    Filtering_off:
+        type: boolean?
+        label: Turn off filtering
+        doc: |-
+            By default the input Transcript Annotation files are filtered based on the gene_type/gene_biotype attribute. Only features having the following attribute values are are kept:
+            - protein_coding
+            - lncRNA (lincRNA and antisense for Gencode < v31/M22/Ensembl97)
+            - IG_LV_gene
+            - IG_V_gene
+            - IG_V_pseudogene
+            - IG_D_gene
+            - IG_J_gene
+            - IG_J_pseudogene
+            - IG_C_gene
+            - IG_C_pseudogene
+            - TR_V_gene
+            - TR_V_pseudogene
+            - TR_D_gene
+            - TR_J_gene
+            - TR_J_pseudogene
+            - TR_C_gene
+            If you have already pre-filtered the input Annotation files and/or wish to turn-off the filtering, please set this option to True.
+        inputBinding: 
+            prefix: --filtering-off
+            shellQuote: false
+    WTA_Only:
+        type: boolean?
+        label: WTA only index
+        doc: Build a WTA only index, otherwise builds a WTA + ATAC index.
+        inputBinding:
+            prefix: --wta-only-index
+            shellQuote: false
+    Archive_prefix:
+        type: string?
+        label: Archive Prefix
+        doc: |-
+            A prefix for naming the compressed archive file containing the Reference genome index and annotation files. The default value is constructed based on the input Reference files.
+        inputBinding:
+            prefix: --archive-prefix
+            shellQuote: false
+    Extra_STAR_params:
+        type: string?
+        label: Extra STAR Params
+        doc: |-
+            Additional parameters to pass to STAR when building the genome index. Specify exactly like how you would on the command line.
+            Example:
+              --limitGenomeGenerateRAM 48000 --genomeSAindexNbases 11
+        inputBinding:
+            prefix: --extra-star-params 
+            shellQuote: true
+  
+    Maximum_threads:
+        type: int?
+        label: Maximum Number of Threads
+        doc: |-
+            The maximum number of threads to use in the pipeline. By default, all available cores are used.
+        inputBinding:
+            prefix: --maximum-threads
+            shellQuote: false
+
+outputs:
+
+    Archive:
+        type: File
+        doc: |- 
+            A Compressed archive containing the Reference Genome Index and annotation GTF files. This archive is meant to be used as an input in the BD Rhapsodyâ„¢ Sequencing Analysis Pipeline.
+        id: Reference_Archive
+        label: Reference Files Archive
+        outputBinding:
+            glob: '*.tar.gz'
+
--- a/src/bd_rhapsody/bd_rhapsody_make_reference/script.py
+++ b/src/bd_rhapsody/bd_rhapsody_make_reference/script.py
@@ -0,0 +1,161 @@
+import os
+import re
+import subprocess
+import tempfile
+from typing import Any
+import yaml
+import shutil
+
+## VIASH START
+par = {
+    "genome_fasta": [],
+    "gtf": [],
+    "extra_sequences": [],
+    "mitochondrial_contigs": ["chrM", "chrMT", "M", "MT"],
+    "filtering_off": False,
+    "wta_only_index": False,
+    "extra_star_params": None,
+    "reference_archive": "output.tar.gz",
+}
+meta = {
+    "config": "target/nextflow/reference/build_bdrhap_2_reference/.config.vsh.yaml",
+    "resources_dir": os.path.abspath("src/reference/build_bdrhap_2_reference"),
+    "temp_dir": os.getenv("VIASH_TEMP"),
+    "memory_mb": None,
+    "cpus": None
+}
+## VIASH END
+
+def clean_arg(argument):
+    argument["clean_name"] = re.sub("^-*", "", argument["name"])
+    return argument
+
+def read_config(path: str) -> dict[str, Any]:
+    with open(path, "r") as f:
+        config = yaml.safe_load(f)
+    
+    config["all_arguments"] = [
+        clean_arg(arg)
+        for grp in config["argument_groups"]
+        for arg in grp["arguments"]
+    ]
+    
+    return config
+
+def strip_margin(text: str) -> str:
+    return re.sub("(\n?)[ \t]*\|", "\\1", text)
+
+def process_params(par: dict[str, Any], config) -> str:
+    # check input parameters
+    assert par["genome_fasta"], "Pass at least one set of inputs to --genome_fasta."
+    assert par["gtf"], "Pass at least one set of inputs to --gtf."
+    assert par["reference_archive"].endswith(".tar.gz"), "Output reference_archive must end with .tar.gz."
+
+    # make paths absolute
+    for argument in config["all_arguments"]:
+        if par[argument["clean_name"]] and argument["type"] == "file":
+            if isinstance(par[argument["clean_name"]], list):
+                par[argument["clean_name"]] = [ os.path.abspath(f) for f in par[argument["clean_name"]] ]
+            else:
+                par[argument["clean_name"]] = os.path.abspath(par[argument["clean_name"]])
+    
+    return par
+
+def generate_config(par: dict[str, Any], meta, config) -> str:
+    content_list = [strip_margin(f"""\
+        |#!/usr/bin/env cwl-runner
+        |
+        |""")]
+        
+    
+    config_key_value_pairs = []
+    for argument in config["all_arguments"]:
+        config_key = (argument.get("info") or {}).get("config_key")
+        arg_type = argument["type"]
+        par_value = par[argument["clean_name"]]
+        if par_value and config_key:
+            config_key_value_pairs.append((config_key, arg_type, par_value))
+
+    if meta["cpus"]:
+        config_key_value_pairs.append(("Maximum_threads", "integer", meta["cpus"]))
+
+    # print(config_key_value_pairs)
+
+    for config_key, arg_type, par_value in config_key_value_pairs:
+        if arg_type == "file":
+            str = strip_margin(f"""\
+                |{config_key}:
+                |""")
+            if isinstance(par_value, list):
+                for file in par_value:
+                    str += strip_margin(f"""\
+                        | - class: File
+                        |   location: "{file}"
+                        |""")
+            else:
+                str += strip_margin(f"""\
+                    |   class: File
+                    |   location: "{par_value}"
+                    |""")
+            content_list.append(str)
+        else:
+            content_list.append(strip_margin(f"""\
+                |{config_key}: {par_value}
+                |"""))
+            
+    ## Write config to file
+    return "".join(content_list)
+
+def get_cwl_file(meta: dict[str, Any]) -> str:
+    # create cwl file (if need be)
+    cwl_file=os.path.join(meta["resources_dir"], "make_rhap_reference_2.2.1_nodocker.cwl")
+
+    return cwl_file
+
+def main(par: dict[str, Any], meta: dict[str, Any]):
+    config = read_config(meta["config"])
+        
+    # Preprocess params
+    par = process_params(par, config)
+
+    # fetch cwl file
+    cwl_file = get_cwl_file(meta)
+
+    # Create output dir if not exists
+    outdir = os.path.dirname(par["reference_archive"])
+    if not os.path.exists(outdir):
+        os.makedirs(outdir)
+
+    ## Run pipeline
+    with tempfile.TemporaryDirectory(prefix="cwl-bd_rhapsody_wta-", dir=meta["temp_dir"]) as temp_dir:
+        # Create params file
+        config_file = os.path.join(temp_dir, "config.yml")
+        config_content = generate_config(par, meta, config)
+        with open(config_file, "w") as f:
+            f.write(config_content)
+
+
+        cmd = [
+            "cwl-runner",
+            "--no-container",
+            "--preserve-entire-environment",
+            "--outdir",
+            temp_dir,
+            cwl_file,
+            config_file
+        ]
+
+        env = dict(os.environ)
+        env["TMPDIR"] = temp_dir
+
+        print("> " + " ".join(cmd), flush=True)
+        _ = subprocess.check_call(
+            cmd,
+            cwd=os.path.dirname(config_file),
+            env=env
+        )
+
+        shutil.move(os.path.join(temp_dir, "Rhap_reference.tar.gz"), par["reference_archive"])
+
+if __name__ == "__main__":
+    main(par, meta)
--- a/src/bd_rhapsody/bd_rhapsody_make_reference/test.sh
+++ b/src/bd_rhapsody/bd_rhapsody_make_reference/test.sh
@@ -0,0 +1,65 @@
+#!/bin/bash
+
+set -e
+
+#############################################
+# helper functions
+assert_file_exists() {
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
+}
+assert_file_doesnt_exist() {
+  [ ! -f "$1" ] || { echo "File '$1' exists but shouldn't" && exit 1; }
+}
+assert_file_empty() {
+  [ ! -s "$1" ] || { echo "File '$1' is not empty but should be" && exit 1; }
+}
+assert_file_not_empty() {
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
+}
+assert_file_contains() {
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
+}
+assert_file_not_contains() {
+  grep -q "$2" "$1" && { echo "File '$1' contains '$2' but shouldn't" && exit 1; }
+}
+#############################################
+
+in_fa="$meta_resources_dir/test_data/reference_small.fa"
+in_gtf="$meta_resources_dir/test_data/reference_small.gtf"
+
+echo "#############################################"
+echo "> Simple run"
+
+mkdir simple_run
+cd simple_run
+
+out_tar="myreference.tar.gz"
+
+echo "> Running $meta_name."
+$meta_executable \
+  --genome_fasta "$in_fa" \
+  --gtf "$in_gtf" \
+  --reference_archive "$out_tar" \
+  --extra_star_params "--genomeSAindexNbases 6" \
+  ---cpus 2
+
+exit_code=$?
+[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
+
+assert_file_exists "$out_tar"
+assert_file_not_empty "$out_tar"
+
+echo ">> Checking whether output contains the expected files"
+tar -xvf "$out_tar" > /dev/null
+assert_file_exists "BD_Rhapsody_Reference_Files/star_index/genomeParameters.txt"
+assert_file_exists "BD_Rhapsody_Reference_Files/bwa-mem2_index/reference_small.ann"
+assert_file_exists "BD_Rhapsody_Reference_Files/reference_small-processed.gtf"
+assert_file_exists "BD_Rhapsody_Reference_Files/mitochondrial_contigs.txt"
+assert_file_contains "BD_Rhapsody_Reference_Files/reference_small-processed.gtf" "chr1.*HAVANA.*ENSG00000243485"
+assert_file_contains "BD_Rhapsody_Reference_Files/mitochondrial_contigs.txt" 'chrMT'
+
+cd ..
+
+echo "#############################################"
+
+echo "> Tests succeeded!"
--- a/src/bd_rhapsody/bd_rhapsody_make_reference/test_data/reference_small.fa
+++ b/src/bd_rhapsody/bd_rhapsody_make_reference/test_data/reference_small.fa
@@ -0,0 +1,27 @@
+>chr1 1
+TGGGGAAGCAAGGCGGAGTTGGGCAGCTCGTGTTCAATGGGTAGAGTTTCAGGCTGGGGT
+GATGGAAGGGTGCTGGAAATGAGTGGTAGTGATGGCGGCACAACAGTGTGAATCTACTTA
+ATCCCACTGAACTGTATGCTGAAAAATGGTTTAGACGGTGAATTTTAGGTTATGTATGTT
+TTACCACAATTTTTAAAAAGCTAGTGAAAAGCTGGTAAAAAGAAAGAAAAGAGGCTTTTT
+TAAAAAGTTAAATATATAAAAAGAGCATCATCAGTCCAAAGTCCAGCAGTTGTCCCTCCT
+GGAATCCGTTGGCTTGCCTCCGGCATTTTTGGCCCTTGCCTTTTAGGGTTGCCAGATTAA
+AAGACAGGATGCCCAGCTAGTTTGAATTTTAGATAAACAACGAATAATTTCGTAGCATAA
+ATATGTCCCAAGCTTAGTTTGGGACATACTTATGCTAAAAAACATTATTGGTTGTTTATC
+TGAGATTCAGAATTAAGCATTTTATATTTTATTTGCTGCCTCTGGCCACCCTACTCTCTT
+CCTAACACTCTCTCCCTCTCCCAGTTTTGTCCGCCTTCCCTGCCTCCTCTTCTGGGGGAG
+TTAGATCGAGTTGTAACAAGAACATGCCACTGTCTCGCTGGCTGCAGCGTGTGGTCCCCT
+TACCAGAGGTAAAGAAGAGATGGATCTCCACTCATGTTGTAGACAGAATGTTTATGTCCT
+CTCCAAATGCTTATGTTGAAACCCTAACCCCTAATGTGATGGTATGTGGAGATGGGCCTT
+TGGTAGGTAATTACGGTTAGATGAGGTCATGGGGTGGGGCCCTCATTATAGATCTGGTAA
+GAAAAGAGAGCATTGTCTCTGTGTCTCCCTCTCTCTCTCTCTCTCTCTCTCTCATTTCTC
+TCTATCTCATTTCTCTCTCTCTCGCTATCTCATTTTTCTCTCTCTCTCTTTCTCTCCTCT
+GTCTTTTCCCACCAAGTGAGGATGCGAAGAGAAGGTGGCTGTCTGCAAACCAGGAAGAGA
+GCCCTCACCGGGAACCCGTCCAGCTGCCACCTTGAACTTGGACTTCCAAGCCTCCAGAAC
+TGTGAGGGATAAATGTATGATTTTAAAGTCGCCCAGTGTGTGGTATTTTGTTTTGACTAA
+TACAACCTGAAAACATTTTCCCCTCACTCCACCTGAGCAATATCTGAGTGGCTTAAGGTA
+CTCAGGACACAACAAAGGAGAAATGTCCCATGCACAAGGTGCACCCATGCCTGGGTAAAG
+CAGCCTGGCACAGAGGGAAGCACACAGGCTCAGGGATCTGCTATTCATTCTTTGTGTGAC
+CCTGGGCAAGCCATGAATGGAGCTTCAGTCACCCCATTTGTAATGGGATTTAATTGTGCT
+TGCCCTGCCTCCTTTTGAGGGCTGTAGAGAAAAGATGTCAAAGTATTTTGTAATCTGGCT
+GGGCGTGGTGGCTCATGCCTGTAATCCTAGCACTTTGGTAGGCTGACGCGAGAGGACTGC
+T
--- a/src/bd_rhapsody/bd_rhapsody_make_reference/test_data/reference_small.gtf
+++ b/src/bd_rhapsody/bd_rhapsody_make_reference/test_data/reference_small.gtf
@@ -0,0 +1,8 @@
+chr1	HAVANA	exon	565	668	.	+	.	gene_id "ENSG00000243485.5"; transcript_id "ENST00000473358.1"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; transcript_type "lncRNA"; transcript_name "MIR1302-2HG-202"; exon_number 2; exon_id "ENSE00001922571.1"; level 2; transcript_support_level "5"; hgnc_id "HGNC:52482"; tag "not_best_in_genome_evidence"; tag "dotter_confirmed"; tag "basic"; tag "Ensembl_canonical"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002840.1";
+chr1	HAVANA	exon	977	1098	.	+	.	gene_id "ENSG00000243485.5"; transcript_id "ENST00000473358.1"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; transcript_type "lncRNA"; transcript_name "MIR1302-2HG-202"; exon_number 3; exon_id "ENSE00001827679.1"; level 2; transcript_support_level "5"; hgnc_id "HGNC:52482"; tag "not_best_in_genome_evidence"; tag "dotter_confirmed"; tag "basic"; tag "Ensembl_canonical"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002840.1";
+chr1	HAVANA	transcript	268	1110	.	+	.	gene_id "ENSG00000243485.5"; transcript_id "ENST00000469289.1"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; transcript_type "lncRNA"; transcript_name "MIR1302-2HG-201"; level 2; transcript_support_level "5"; hgnc_id "HGNC:52482"; tag "not_best_in_genome_evidence"; tag "basic"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002841.2";
+chr1	HAVANA	exon	268	668	.	+	.	gene_id "ENSG00000243485.5"; transcript_id "ENST00000469289.1"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; transcript_type "lncRNA"; transcript_name "MIR1302-2HG-201"; exon_number 1; exon_id "ENSE00001841699.1"; level 2; transcript_support_level "5"; hgnc_id "HGNC:52482"; tag "not_best_in_genome_evidence"; tag "basic"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002841.2";
+chr1	HAVANA	exon	977	1110	.	+	.	gene_id "ENSG00000243485.5"; transcript_id "ENST00000469289.1"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; transcript_type "lncRNA"; transcript_name "MIR1302-2HG-201"; exon_number 2; exon_id "ENSE00001890064.1"; level 2; transcript_support_level "5"; hgnc_id "HGNC:52482"; tag "not_best_in_genome_evidence"; tag "basic"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002841.2";
+chr1	ENSEMBL	gene	367	504	.	+	.	gene_id "ENSG00000284332.1"; gene_type "miRNA"; gene_name "MIR1302-2"; level 3; hgnc_id "HGNC:35294";
+chr1	ENSEMBL	transcript	367	504	.	+	.	gene_id "ENSG00000284332.1"; transcript_id "ENST00000607096.1"; gene_type "miRNA"; gene_name "MIR1302-2"; transcript_type "miRNA"; transcript_name "MIR1302-2-201"; level 3; transcript_support_level "NA"; hgnc_id "HGNC:35294"; tag "basic"; tag "Ensembl_canonical";
+chr1	ENSEMBL	exon	367	504	.	+	.	gene_id "ENSG00000284332.1"; transcript_id "ENST00000607096.1"; gene_type "miRNA"; gene_name "MIR1302-2"; transcript_type "miRNA"; transcript_name "MIR1302-2-201"; exon_number 1; exon_id "ENSE00003695741.1"; level 3; transcript_support_level "NA"; hgnc_id "HGNC:35294"; tag "basic"; tag "Ensembl_canonical";
--- a/src/bd_rhapsody/bd_rhapsody_make_reference/test_data/script.sh
+++ b/src/bd_rhapsody/bd_rhapsody_make_reference/test_data/script.sh
@@ -0,0 +1,47 @@
+#!/bin/bash
+
+TMP_DIR=/tmp/bd_rhapsody_make_reference
+OUT_DIR=src/bd_rhapsody/bd_rhapsody_make_reference/test_data
+
+# check if seqkit is installed
+if ! command -v seqkit &> /dev/null; then
+  echo "seqkit could not be found"
+  exit 1
+fi
+
+# create temporary directory and clean up on exit
+mkdir -p $TMP_DIR
+function clean_up {
+    rm -rf "$TMP_DIR"
+}
+trap clean_up EXIT
+
+# fetch reference
+ORIG_FA=$TMP_DIR/reference.fa.gz
+if [ ! -f $ORIG_FA ]; then
+  wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/GRCh38.primary_assembly.genome.fa.gz \
+    -O $ORIG_FA
+fi
+
+ORIG_GTF=$TMP_DIR/reference.gtf.gz
+if [ ! -f $ORIG_GTF ]; then
+  wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/gencode.v41.annotation.gtf.gz \
+    -O $ORIG_GTF
+fi
+
+# create small reference
+START=30000
+END=31500
+CHR=chr1
+
+# subset to small region
+seqkit grep -r -p "^$CHR\$" "$ORIG_FA" | \
+  seqkit subseq -r "$START:$END" > $OUT_DIR/reference_small.fa
+
+zcat "$ORIG_GTF" | \
+  awk -v FS='\t' -v OFS='\t' "
+    \$1 == \"$CHR\" && \$4 >= $START && \$5 <= $END {
+      \$4 = \$4 - $START + 1;
+      \$5 = \$5 - $START + 1;
+      print;
+    }" > $OUT_DIR/reference_small.gtf
--- a/src/bedtools/bedtools_getfasta/config.vsh.yaml
+++ b/src/bedtools/bedtools_getfasta/config.vsh.yaml
@@ -10,6 +10,9 @@ references:
 license: GPL-2.0
 requirements:
  commands: [bedtools]
+authors:
+  - __merge__: /src/_authors/dries_schaumont.yaml
+    roles: [ author, maintainer ]

 argument_groups:
  - name: Input arguments
--- a/src/busco/busco_download_datasets/config.vsh.yaml
+++ b/src/busco/busco_download_datasets/config.vsh.yaml
@@ -9,6 +9,9 @@ links:
 references:
  doi: 10.1007/978-1-4939-9173-0_14
 license: MIT
+authors:
+  - __merge__: /src/_authors/dorien_roosen.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
--- a/src/busco/busco_list_datasets/config.vsh.yaml
+++ b/src/busco/busco_list_datasets/config.vsh.yaml
@@ -9,6 +9,9 @@ links:
 references:
  doi: 10.1007/978-1-4939-9173-0_14
 license: MIT
+authors:
+  - __merge__: /src/_authors/dorien_roosen.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Outputs
    arguments:
--- a/src/busco/busco_run/config.vsh.yaml
+++ b/src/busco/busco_run/config.vsh.yaml
@@ -9,6 +9,9 @@ links:
 references:
  doi: 10.1007/978-1-4939-9173-0_14
 license: MIT
+authors:
+  - __merge__: /src/_authors/dorien_roosen.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
--- a/src/cutadapt/config.vsh.yaml
+++ b/src/cutadapt/config.vsh.yaml
@@ -9,6 +9,9 @@ links:
 references:
  doi: 10.14806/ej.17.1.200
 license: MIT
+authors:
+  - __merge__: /src/_authors/toni_verbeiren.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  ####################################################################
  - name: Specify Adapters for R1
--- a/src/cutadapt/test.sh
+++ b/src/cutadapt/test.sh
@@ -6,25 +6,25 @@ set -eo pipefail
 #############################################
 # helper functions
 assert_file_exists() {
-  [ -f "$1" ] || (echo "File '$1' does not exist" && exit 1)
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
 }
 assert_file_doesnt_exist() {
-  [ ! -f "$1" ] || (echo "File '$1' exists but shouldn't" && exit 1)
+  [ ! -f "$1" ] || { echo "File '$1' exists but shouldn't" && exit 1; }
 }
 assert_file_empty() {
-  [ ! -s "$1" ] || (echo "File '$1' is not empty but should be" && exit 1)
+  [ ! -s "$1" ] || { echo "File '$1' is not empty but should be" && exit 1; }
 }
 assert_file_not_empty() {
-  [ -s "$1" ] || (echo "File '$1' is empty but shouldn't be" && exit 1)
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
 }
 assert_file_contains() {
-  grep -q "$2" "$1" || (echo "File '$1' does not contain '$2'" && exit 1)
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
 }
 assert_file_not_contains() {
-  grep -q "$2" "$1" && (echo "File '$1' contains '$2' but shouldn't" && exit 1)
+  grep -q "$2" "$1" && { echo "File '$1' contains '$2' but shouldn't" && exit 1; }
 }
-
 #############################################
+
 mkdir test_multiple_output
 cd test_multiple_output

--- a/src/falco/config.vsh.yaml
+++ b/src/falco/config.vsh.yaml
@@ -9,6 +9,9 @@ references:
 license: GPL-3.0
 requirements:
  commands: [falco]
+authors:
+  - __merge__: /src/_authors/toni_verbeiren.yaml
+    roles: [ author, maintainer ]

 # Notes:
 # - falco as arguments similar to -subsample and we update those to --subsample
--- a/src/fastp/config.vsh.yaml
+++ b/src/fastp/config.vsh.yaml
@@ -26,6 +26,9 @@ links:
 references:
  doi: "10.1093/bioinformatics/bty560"
 license: MIT
+authors:
+  - __merge__: /src/_authors/robrecht_cannoodt.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    description: |
--- a/src/featurecounts/config.vsh.yaml
+++ b/src/featurecounts/config.vsh.yaml
@@ -11,7 +11,9 @@ references:
 license: GPL-3.0 
 requirements:
  commands: [ featureCounts ]
-
+authors:
+  - __merge__: /src/_authors/sai_nirmayi_yasa.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
--- a/src/gffread/config.vsh.yaml
+++ b/src/gffread/config.vsh.yaml
@@ -8,8 +8,9 @@ links:
 references: 
  doi: 10.12688/f1000research.23297.2
 license: MIT
-requirements:
-  commands: [ gffread ]
+authors:
+  - __merge__: /src/_authors/emma_rousseau.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
@@ -52,7 +53,7 @@ argument_groups:
        required: true
        description: |
          Write the output records into <outfile>.
-        default: output.gff
+        example: output.gff
      - name: --force_exons
        type: boolean_true
        description: |
@@ -154,7 +155,6 @@ argument_groups:
      - name: --table
        type: string
        multiple: true
-        multiple_sep: ","
        description: |
          Output a simple tab delimited format instead of GFF, with columns having the values 
          of GFF attributes given in <attrlist>; special pseudo-attributes (prefixed by @) are 
--- a/src/gffread/script.sh
+++ b/src/gffread/script.sh
@@ -50,6 +50,8 @@
 [[ "$par_expose_dups" == "false" ]] && unset par_expose_dups
 [[ "$par_cluster_only" == "false" ]] && unset par_cluster_only

+# if par_table is not empty, replace ";" with ","
+par_table=$(echo "$par_table" | tr ';' ',')

 $(which gffread) \
    "$par_input" \
--- a/src/gffread/test.sh
+++ b/src/gffread/test.sh
@@ -86,7 +86,7 @@ diff "$expected_output_dir/transcripts.fa" "$test_output_dir/transcripts.fa" ||
 echo "> Test 4 - Generate table from GFF annotation file"

 "$meta_executable" \
-  --table @id,@chr,@start,@end,@strand,@exons,Name,gene,product \
+  --table "@id;@chr;@start;@end;@strand;@exons;Name;gene;product" \
  --outfile "$test_output_dir/annotation.tbl" \
  --input "$test_dir/sequence.gff3"

--- a/src/lofreq/call/config.vsh.yaml
+++ b/src/lofreq/call/config.vsh.yaml
@@ -17,6 +17,9 @@ references:
 license: "MIT"
 requirements:
  commands: [ lofreq ]
+authors:
+  - __merge__: /src/_authors/kai_waldrant.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
--- a/src/lofreq/indelqual/config.vsh.yaml
+++ b/src/lofreq/indelqual/config.vsh.yaml
@@ -18,6 +18,9 @@ references:
 license: "MIT"
 requirements:
  commands: [ lofreq ]
+authors:
+  - __merge__: /src/_authors/kai_waldrant.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
--- a/src/multiqc/config.vsh.yaml
+++ b/src/multiqc/config.vsh.yaml
@@ -11,7 +11,9 @@ info:
  references:
    doi: 10.1093/bioinformatics/btw354
  licence: GPL v3 or later
-
+authors:
+  - __merge__: /src/_authors/dorien_roosen.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: "Input"
    arguments:
@@ -54,25 +56,21 @@ argument_groups:
      - name: "--include_modules"
        type: string
        multiple: true
-        multiple_sep: ","
-        example: fastqc,cutadapt
+        example: [fastqc, cutadapt]
        description: Use only these module
      - name: "--exclude_modules"
        type: string
        multiple: true
-        multiple_sep: ","
-        example: fastqc,cutadapt
+        example: [fastqc, cutadapt]
        description: Do not use only these modules
      - name: "--ignore_analysis"
        type: string
        multiple: true
-        multiple_sep: ","
-        example: run_one/*,run_two/*
+        example: [run_one/*, run_two/*]
      - name: "--ignore_samples"
        type: string
        multiple: true
-        multiple_sep: ","
-        example: sample_1*,sample_3*
+        example: [sample_1*, sample_3*]
      - name: "--ignore_symlinks"
        type: boolean_true
        description: Ignore symlinked directories and files
--- a/src/multiqc/script.sh
+++ b/src/multiqc/script.sh
@@ -38,7 +38,7 @@ IFS=";" read -ra inputs <<< $par_input

 if [[ -n "$par_include_modules" ]]; then
    include_modules=""
-    IFS="," read -ra incl_modules <<< $par_include_modules
+    IFS=";" read -ra incl_modules <<< $par_include_modules
    for i in "${incl_modules[@]}"; do
        include_modules+="--include $i "
    done
@@ -47,7 +47,7 @@ fi

 if [[ -n "$par_exclude_modules" ]]; then
    exclude_modules=""
-    IFS="," read -ra excl_modules <<< $par_exclude_modules
+    IFS=";" read -ra excl_modules <<< $par_exclude_modules
    for i in "${excl_modules[@]}"; do
        exclude_modules+="--exclude $i"
    done
@@ -56,7 +56,7 @@ fi

 if [[ -n "$par_ignore_analysis" ]]; then
    ignore=""
-    IFS="," read -ra ignore_analysis <<< $par_ignore_analysis
+    IFS=";" read -ra ignore_analysis <<< $par_ignore_analysis
    for i in "${ignore_analysis[@]}"; do
        ignore+="--ignore $i "
    done
@@ -65,7 +65,7 @@ fi

 if [[ -n "$par_ignore_samples" ]]; then
    ignore_samples=""
-    IFS="," read -ra ign_samples <<< $par_ignore_samples
+    IFS=";" read -ra ign_samples <<< $par_ignore_samples
    for i in "${ign_samples[@]}"; do
        ignore_samples+="--ignore-samples $i"
    done
--- a/src/pear/config.vsh.yaml
+++ b/src/pear/config.vsh.yaml
@@ -12,7 +12,10 @@ references:
  doi: 10.1093/bioinformatics/btt593
 license: "CC-BY-NC-SA-3.0"
 requirements:
-  commands: [ pear , gzip ]
+  commands: [ pear, gzip ]
+authors:
+  - __merge__: /src/_authors/kai_waldrant.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
--- a/src/salmon/salmon_index/config.vsh.yaml
+++ b/src/salmon/salmon_index/config.vsh.yaml
@@ -12,14 +12,16 @@ references:
 license: GPL-3.0 
 requirements:
  commands: [ salmon ]
-
+authors:
+  - __merge__: /src/_authors/sai_nirmayi_yasa.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
      - name: --genome
        type: file
        description: | 
-          Genome of the organism to prepare the set of decoy sequences. Required to build decoy-aware transccriptome.
+          Genome of the organism to prepare the set of decoy sequences. Required to build decoy-aware transcriptome.
        required: false
        example: genome.fasta
      - name: --transcripts
@@ -110,4 +112,4 @@ engines:
          salmon index -v 2>&1 | sed 's/salmon \([0-9.]*\)/salmon: \1/' > /var/software_versions.txt
 runners:
  - type: executable
-  - type: nextflow
+  - type: nextflow
--- a/src/salmon/salmon_quant/config.vsh.yaml
+++ b/src/salmon/salmon_quant/config.vsh.yaml
@@ -12,7 +12,9 @@ references:
 license: GPL-3.0 
 requirements:
  commands: [ salmon ]
-
+authors:
+  - __merge__: /src/_authors/sai_nirmayi_yasa.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Common input options
    arguments:
--- a/src/samtools/samtools_collate/config.vsh.yaml
+++ b/src/samtools/samtools_collate/config.vsh.yaml
@@ -9,7 +9,9 @@ links:
 references: 
  doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
 license: MIT/Expat
-
+authors:
+  - __merge__: /src/_authors/emma_rousseau.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
--- a/src/samtools/samtools_faidx/config.vsh.yaml
+++ b/src/samtools/samtools_faidx/config.vsh.yaml
@@ -9,7 +9,9 @@ links:
 references: 
  doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
 license: MIT/Expat
-
+authors:
+  - __merge__: /src/_authors/emma_rousseau.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
--- a/src/samtools/samtools_fasta/config.vsh.yaml
+++ b/src/samtools/samtools_fasta/config.vsh.yaml
@@ -0,0 +1,193 @@
+name: samtools_fasta
+namespace: samtools
+description: Converts a SAM, BAM or CRAM to FASTA format.
+keywords: [fasta, bam, sam, cram]
+links:
+  homepage: https://www.htslib.org/
+  documentation: https://www.htslib.org/doc/samtools-fasta.html
+  repository: https://github.com/samtools/samtools
+references: 
+  doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
+license: MIT/Expat
+authors:
+  - __merge__: /src/_authors/emma_rousseau.yaml
+    roles: [ author, maintainer ]
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --input
+        type: file
+        description: input SAM/BAM/CRAM file
+        required: true
+  - name: Outputs
+    arguments:
+      - name: --output
+        type: file
+        description: output FASTA file
+        required: true
+        direction: output
+  - name: Options
+    arguments:
+      - name: --no_suffix
+        alternatives: -n
+        type: boolean_true
+        description: |
+          By default, either '/1' or '/2' is added to the end of read names where the corresponding 
+          READ1 or READ2 FLAG bit is set. Using -n causes read names to be left as they are.
+      - name: --suffix
+        alternatives: -N
+        type: boolean_true
+        description: |
+          Always add either '/1' or '/2' to the end of read names even when put into different files.
+      - name: --use_oq
+        alternatives: -O
+        type: boolean_true
+        description: |
+          Use quality values from OQ tags in preference to standard quality string if available.
+      - name: --singleton
+        alternatives: -s
+        type: file
+        description: write singleton reads to FILE.
+      - name: --copy_tags
+        alternatives: -t
+        type: boolean_true
+        description: |
+          Copy RG, BC and QT tags to the FASTA header line, if they exist.
+      - name: --copy_tags_list
+        alternatives: -T
+        type: string
+        description: |
+          Specify a comma-separated list of tags to copy to the FASTA header line, if they exist. 
+          TAGLIST can be blank or `*` to indicate all tags should be copied to the output. If using `*`, 
+          be careful to quote it to avoid unwanted shell expansion.
+      - name: --read1
+        alternatives: -1
+        type: file
+        description: |
+          Write reads with the READ1 FLAG set (and READ2 not set) to FILE instead of outputting them. 
+          If the -s option is used, only paired reads will be written to this file.
+        direction: output
+      - name: --read2
+        alternatives: -2
+        type: file
+        description: |
+          Write reads with the READ2 FLAG set (and READ1 not set) to FILE instead of outputting them. 
+          If the -s option is used, only paired reads will be written to this file.
+        direction: output
+      - name: --output_reads
+        alternatives: -o
+        type: file
+        description: |
+          Write reads with either READ1 FLAG or READ2 flag set to FILE instead of outputting them to stdout. 
+          This is equivalent to -1 FILE -2 FILE.
+        direction: output
+      - name: --output_reads_both
+        alternatives: -0
+        type: file
+        description: |
+          Write reads where the READ1 and READ2 FLAG bits set are either both set or both unset to FILE 
+          instead of outputting them.
+        direction: output
+      - name: --filter_flags
+        alternatives: -f
+        type: integer
+        description: |
+          Only output alignments with all bits set in INT present in the FLAG field. INT can be specified 
+          in hex by beginning with '0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with '0' 
+          (i.e. /^0[0-7]+/). Default: `0`.
+        example: 0
+      - name: --excl_flags
+        alternatives: -F
+        type: string
+        description: |
+          Do not output alignments with any bits set in INT present in the FLAG field. INT can be specified 
+          in hex by beginning with '0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with '0'
+          (i.e. /^0[0-7]+/). This defaults to 0x900 representing filtering of secondary and 
+          supplementary alignments. Default: `0x900`.
+        example: "0x900"
+      - name: --incl_flags
+        alternatives: --rf
+        type: string
+        description: |
+          Only output alignments with any bits set in INT present in the FLAG field. INT can be specified 
+          in hex by beginning with '0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with '0'
+          (i.e. /^0[0-7]+/), as a decimal number not beginning with '0' or as a comma-separated list of 
+          flag names. Default: `0`.
+        example: 0
+      - name: --excl_flags_all
+        alternatives: -G
+        type: integer
+        description: |
+          Only EXCLUDE reads with all of the bits set in INT present in the FLAG field. INT can be specified 
+          in hex by beginning with '0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with '0' (i.e. /^0[0-7]+/).
+          Default: `0`.
+        example: 0
+      - name: --aux_tag
+        alternatives: -d
+        type: string
+        description: |
+          Only output alignments containing an auxiliary tag matching both TAG and VAL. If VAL is omitted 
+          then any value is accepted. The tag types supported are i, f, Z, A and H. "B" arrays are not 
+          supported. This is comparable to the method used in samtools view --tag. The option may be specified 
+          multiple times and is equivalent to using the --aux_tag_file option.
+      - name: --aux_tag_file
+        alternatives: -D
+        type: string
+        description: |
+          Only output alignments containing an auxiliary tag matching TAG and having a value listed in FILE. 
+          The format of the file is one line per value. This is equivalent to specifying --aux_tag multiple times.
+      - name: --casava
+        alternatives: -i
+        type: boolean_true
+        description: add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG)
+      - name: --compression
+        alternatives: -c
+        type: integer
+        description: set compression level when writing gz or bgzf fasta files.
+        example: 0
+      - name: --index1
+        alternatives: --i1
+        type: file
+        description: write first index reads to FILE.
+      - name: --index2
+        alternatives: --i2
+        type: file
+        description: write second index reads to FILE.
+      - name: --barcode_tag
+        type: string
+        description: |
+          Auxiliary tag to find index reads in. Default: `BC`.
+        example: "BC"
+      - name: --quality_tag
+        type: string
+        description: |
+          Auxiliary tag to find index quality in. Default: `QT`.
+        example: "QT"
+      - name: --index_format
+        type: string
+        description: |
+          string to describe how to parse the barcode and quality tags. For example:
+          * `i14i8`: the first 14 characters are index 1, the next 8 characters are index 2.
+          * `n8i14`: ignore the first 8 characters, and use the next 14 characters for index 1.
+          If the tag contains a separator, then the numeric part can be replaced with`*` to mean 
+          'read until the separator or end of tag', for example: `n*i*`.
+
+resources:
+  - type: bash_script
+    path: ../samtools_fastq/script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: test_data
+engines:
+  - type: docker
+    image: quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1
+    setup:
+      - type: docker
+        run: |
+          samtools --version 2>&1 | grep -E '^(samtools|Using htslib)' | \
+          sed 's#Using ##;s# \([0-9\.]*\)$#: \1#' > /var/software_versions.txt
+runners:
+- type: executable
+- type: nextflow
--- a/src/samtools/samtools_fasta/help.txt
+++ b/src/samtools/samtools_fasta/help.txt
@@ -0,0 +1,80 @@
+```
+samtools fastq
+```
+
+Usage: samtools fastq [options...] <in.bam>
+
+Description:
+Converts a SAM, BAM or CRAM to FASTQ format.
+
+Options:
+  -0 FILE      write reads designated READ_OTHER to FILE
+  -1 FILE      write reads designated READ1 to FILE
+  -2 FILE      write reads designated READ2 to FILE
+  -o FILE      write reads designated READ1 or READ2 to FILE
+               note: if a singleton file is specified with -s, only
+               paired reads will be written to the -1 and -2 files.
+  -d, --tag TAG[:VAL]
+               only include reads containing TAG, optionally with value VAL
+  -f, --require-flags INT
+               only include reads with all  of the FLAGs in INT present [0]
+  -F, --excl[ude]-flags INT
+               only include reads with none of the FLAGs in INT present [0x900]
+      --rf, --incl[ude]-flags INT
+               only include reads with any  of the FLAGs in INT present [0]
+  -G INT       only EXCLUDE reads with all  of the FLAGs in INT present [0]
+  -n           don't append /1 and /2 to the read name
+  -N           always append /1 and /2 to the read name
+  -O           output quality in the OQ tag if present
+  -s FILE      write singleton reads designated READ1 or READ2 to FILE
+  -t           copy RG, BC and QT tags to the FASTQ header line
+  -T TAGLIST   copy arbitrary tags to the FASTQ header line, '*' for all
+  -v INT       default quality score if not given in file [1]
+  -i           add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG)
+  -c INT       compression level [0..9] to use when writing bgzf files [1]
+  --i1 FILE    write first index reads to FILE
+  --i2 FILE    write second index reads to FILE
+  --barcode-tag TAG
+               Barcode tag [BC]
+  --quality-tag TAG
+               Quality tag [QT]
+  --index-format STR
+               How to parse barcode and quality tags
+
+      --input-fmt-option OPT[=VAL]
+               Specify a single input file format option in the form
+               of OPTION or OPTION=VALUE
+      --reference FILE
+               Reference sequence FASTA FILE [null]
+  -@, --threads INT
+               Number of additional threads to use [0]
+      --verbosity INT
+               Set level of verbosity
+
+The files will be automatically compressed if the file names have a .gz
+or .bgzf extension.  The input to this program must be collated by name.
+Run 'samtools collate' or 'samtools sort -n' to achieve this.
+
+Reads are designated READ1 if FLAG READ1 is set and READ2 is not set.
+Reads are designated READ2 if FLAG READ1 is not set and READ2 is set.
+Otherwise reads are designated READ_OTHER (both flags set or both flags unset).
+Run 'samtools flags' for more information on flag codes and meanings.
+
+The index-format string describes how to parse the barcode and quality tags.
+It is made up of 'i' or 'n' followed by a length or '*'.  For example:
+   i14i8       The first 14 characters are index 1, the next 8 are index 2
+   n8i14       Ignore the first 8 characters, and use the next 14 for index 1
+
+If the tag contains a separator, then the numeric part can be replaced with
+'*' to mean 'read until the separator or end of tag', for example:
+   i*i*        Break the tag at the separator into index 1 and index 2
+   n*i*        Ignore the left part of the tag until the separator,
+               then use the second part of the tag as index 1
+
+Examples:
+To get just the paired reads in separate files, use:
+   samtools fastq -1 pair1.fq -2 pair2.fq -0 /dev/null -s /dev/null -n in.bam
+
+To get all non-supplementary/secondary reads in a single file, redirect
+the output:
+   samtools fastq in.bam > all_reads.fq
--- a/src/samtools/samtools_fasta/test.sh
+++ b/src/samtools/samtools_fasta/test.sh
@@ -0,0 +1,96 @@
+#!/bin/bash
+
+test_dir="${meta_resources_dir}/test_data"
+out_dir="${meta_resources_dir}/out_data"
+
+############################################################################################
+
+echo ">>> Test 1: Convert all reads from a bam file to fasta format"
+"$meta_executable" \
+  --input "$test_dir/a.bam" \
+  --output "$out_dir/a.fa"
+
+echo ">>> Check if output file exists"
+[ ! -f "$out_dir/a.fa" ] && echo "Output file a.fa does not exist" && exit 1
+
+echo ">>> Check if output is empty"
+[ ! -s "$out_dir/a.fa" ] && echo "Output file a.fa is empty" && exit 1
+
+echo ">>> Check if output matches expected output"
+diff "$out_dir/a.fa" "$test_dir/a.fa" || 
+  (echo "Output file a.fa does not match expected output" && exit 1)
+
+rm "$out_dir/a.fa"
+
+############################################################################################
+
+echo ">>> Test 2: Convert all reads from a sam file to fasta format"
+"$meta_executable" \
+  --input "$test_dir/a.sam" \
+  --output "$out_dir/a.fa"
+
+echo ">>> Check if output file exists"
+[ ! -f "$out_dir/a.fa" ] && echo "Output file a.fa does not exist" && exit 1
+
+echo ">>> Check if output is empty"
+[ ! -s "$out_dir/a.fa" ] && echo "Output file a.fa is empty" && exit 1
+
+echo ">>> Check if output matches expected output"
+diff "$out_dir/a.fa" "$test_dir/a.fa" || 
+  (echo "Output file a.fa does not match expected output" && exit 1)
+
+rm "$out_dir/a.fa"
+
+############################################################################################
+
+echo ">>> Test 3: Output reads from bam file to separate files"
+
+"$meta_executable" \
+  --input "$test_dir/a.bam" \
+  --read1 "$out_dir/a.1.fa" \
+  --read2 "$out_dir/a.2.fa" \
+  --output "$out_dir/a.fa"
+
+echo ">>> Check if output files exist"
+[ ! -f "$out_dir/a.1.fa" ] && echo "Output file a.1.fa does not exist" && exit 1
+[ ! -f "$out_dir/a.2.fa" ] && echo "Output file a.2.fa does not exist" && exit 1
+[ ! -f "$out_dir/a.fa" ] && echo "Output file a.fa does not exist" && exit 1
+
+echo ">>> Check if output files are empty"
+[ ! -s "$out_dir/a.1.fa" ] && echo "Output file a.1.fa is empty" && exit 1
+[ ! -s "$out_dir/a.2.fa" ] && echo "Output file a.2.fa is empty" && exit 1
+# output should be empty since input has no singleton reads
+
+echo ">>> Check if output files match expected output"
+diff "$out_dir/a.1.fa" "$test_dir/a.1.fa" || 
+  (echo "Output file a.1.fa does not match expected output" && exit 1)
+diff "$out_dir/a.2.fa" "$test_dir/a.2.fa" ||
+  (echo "Output file a.2.fa does not match expected output" && exit 1)
+
+rm "$out_dir/a.1.fa" "$out_dir/a.2.fa" "$out_dir/a.fa"
+
+############################################################################################
+
+echo ">>> Test 4: Output only forward reads from bam file to fasta format"
+
+"$meta_executable" \
+  --input "$test_dir/a.sam" \
+  --excl_flags "0x80" \
+  --output "$out_dir/half.fa"
+
+echo ">>> Check if output file exists"
+[ ! -f "$out_dir/half.fa" ] && echo "Output file half.fa does not exist" && exit 1
+
+echo ">>> Check if output is empty"
+[ ! -s "$out_dir/half.fa" ] && echo "Output file half.fa is empty" && exit 1
+
+echo ">>> Check if output matches expected output"
+diff "$out_dir/half.fa" "$test_dir/half.fa" || 
+  (echo "Output file half.fa does not match expected output" && exit 1)
+
+rm "$out_dir/half.fa"
+
+############################################################################################
+
+echo "All tests succeeded!"
+exit 0
--- a/src/samtools/samtools_fasta/test_data/a.1.fa
+++ b/src/samtools/samtools_fasta/test_data/a.1.fa
@@ -0,0 +1,6 @@
+>a1
+AAAAAAAAAA
+>b1
+AAAAAAAAAA
+>c1
+AAAAAAAAAA
--- a/src/samtools/samtools_fasta/test_data/a.2.fa
+++ b/src/samtools/samtools_fasta/test_data/a.2.fa
@@ -0,0 +1,6 @@
+>a1
+AAAAAAAAAA
+>b1
+AAAAAAAAAA
+>c1
+AAAAAAAAAA
--- a/src/samtools/samtools_fasta/test_data/a.bam
+++ b/src/samtools/samtools_fasta/test_data/a.bam
--- a/src/samtools/samtools_fasta/test_data/a.fa
+++ b/src/samtools/samtools_fasta/test_data/a.fa
@@ -0,0 +1,12 @@
+>a1/1
+AAAAAAAAAA
+>b1/1
+AAAAAAAAAA
+>c1/1
+AAAAAAAAAA
+>a1/2
+AAAAAAAAAA
+>b1/2
+AAAAAAAAAA
+>c1/2
+AAAAAAAAAA
--- a/src/samtools/samtools_fasta/test_data/a.sam
+++ b/src/samtools/samtools_fasta/test_data/a.sam
@@ -0,0 +1,7 @@
+@SQ	SN:xx	LN:20
+a1	99	xx	1	1	10M	=	11	20	AAAAAAAAAA	**********
+b1	99	xx	1	1	10M	=	11	20	AAAAAAAAAA	**********
+c1	99	xx	1	1	10M	=	11	20	AAAAAAAAAA	**********
+a1	147	xx	11	1	10M	=	1	-20	TTTTTTTTTT	**********
+b1	147	xx	11	1	10M	=	1	-20	TTTTTTTTTT	**********
+c1	147	xx	11	1	10M	=	1	-20	TTTTTTTTTT	**********
--- a/src/samtools/samtools_fasta/test_data/half.fa
+++ b/src/samtools/samtools_fasta/test_data/half.fa
@@ -0,0 +1,6 @@
+>a1/1
+AAAAAAAAAA
+>b1/1
+AAAAAAAAAA
+>c1/1
+AAAAAAAAAA
--- a/src/samtools/samtools_fasta/test_data/script.sh
+++ b/src/samtools/samtools_fasta/test_data/script.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+# dowload test data from snakemake wrapper
+if [ ! -d /tmp/fastq_source ]; then
+  git clone --depth 1 --single-branch --branch master https://github.com/snakemake/snakemake-wrappers.git /tmp/fastq_source
+fi
+
+cp -r /tmp/fastq_source/bio/samtools/fastx/test/*.sam src/samtools/samtools_fastq/test_data/
+cp -r /tmp/fastq_source/bio/samtools/fastq/interleaved/test/mapped/*.bam src/samtools/samtools_fastq/test_data/
+cp -r /tmp/fastq_source/bio/samtools/fastq/interleaved/test/reads/*.fq src/samtools/samtools_fastq/test_data/
+cp -r /tmp/fastq_source/bio/samtools/fastq/separate/test/reads/*.fq src/samtools/samtools_fastq/test_data/
--- a/src/samtools/samtools_fastq/config.vsh.yaml
+++ b/src/samtools/samtools_fastq/config.vsh.yaml
@@ -9,7 +9,9 @@ links:
 references: 
  doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
 license: MIT/Expat
-
+authors:
+  - __merge__: /src/_authors/emma_rousseau.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
@@ -56,7 +58,7 @@ argument_groups:
        type: string
        description: |
          Specify a comma-separated list of tags to copy to the FASTQ header line, if they exist. 
-          TAGLIST can be blank or * to indicate all tags should be copied to the output. If using *, 
+          TAGLIST can be blank or `*` to indicate all tags should be copied to the output. If using `*`, 
          be careful to quote it to avoid unwanted shell expansion.
      - name: --read1
        alternatives: -1
@@ -91,35 +93,35 @@ argument_groups:
        type: integer
        description: |
          Only output alignments with all bits set in INT present in the FLAG field. INT can be specified 
-          in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with `0' 
-          (i.e. /^0[0-7]+/).
-        default: 0
+          in hex by beginning with '0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with '0'
+          (i.e. /^0[0-7]+/). Default: `0`.
+        example: 0
      - name: --excl_flags
        alternatives: -F
        type: string
        description: |
          Do not output alignments with any bits set in INT present in the FLAG field. INT can be specified 
-          in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with `0' 
+          in hex by beginning with '0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with '0' 
          (i.e. /^0[0-7]+/). This defaults to 0x900 representing filtering of secondary and 
-          supplementary alignments.
-        default: 0x900
+          supplementary alignments. Default: `0x900`.
+        example: "0x900"
      - name: --incl_flags
        alternatives: --rf
        type: string
        description: |
          Only output alignments with any bits set in INT present in the FLAG field. INT can be specified 
-          in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0' 
+          in hex by beginning with '0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with '0'
          (i.e. /^0[0-7]+/), as a decimal number not beginning with '0' or as a comma-separated list of 
-          flag names.
-        default: 0
+          flag names. Default: `0`.
+        example: 0
      - name: --excl_flags_all
        alternatives: -G
        type: integer
        description: |
          Only EXCLUDE reads with all of the bits set in INT present in the FLAG field. INT can be specified 
-          in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with `0' 
-          (i.e. /^0[0-7]+/).
-        default: 0
+          in hex by beginning with '0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with '0' (i.e. /^0[0-7]+/).
+          Default: `0`.
+        example: 0
      - name: --aux_tag
        alternatives: -d
        type: string
@@ -137,12 +139,13 @@ argument_groups:
      - name: --casava
        alternatives: -i
        type: boolean_true
-        description: add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG)
+        description: |
+          Add Illumina Casava 1.8 format entry to header, for example: `1:N:0:ATCACG`.
      - name: --compression
        alternatives: -c
        type: integer
        description: set compression level when writing gz or bgzf fastq files.
-        default: 0
+        example: 0
      - name: --index1
        alternatives: --i1
        type: file
@@ -153,20 +156,22 @@ argument_groups:
        description: write second index reads to FILE.
      - name: --barcode_tag
        type: string
-        description: Auxiliary tag to find index reads in.
-        default: BC
+        description: |
+          Auxiliary tag to find index reads in. Default: `BC`.
+        example: "BC"
      - name: --quality_tag
        type: string
-        description: Auxiliary tag to find index quality in.
-        default: QT
+        description: |
+          Auxiliary tag to find index quality in. Default: `QT`.
+        example: QT
      - name: --index_format
        type: string
        description: |
          string to describe how to parse the barcode and quality tags. For example:
-          [i14i8]: the first 14 characters are index 1, the next 8 characters are index 2.
-          [n8i14]: ignore the first 8 characters, and use the next 14 characters for index 1.
+          * `i14i8`: the first 14 characters are index 1, the next 8 characters are index 2.
+          * `n8i14`: ignore the first 8 characters, and use the next 14 characters for index 1.
          If the tag contains a separator, then the numeric part can be replaced with '*' to mean 
-          'read until the separator or end of tag', for example: [n*i*].
+          'read until the separator or end of tag', for example: `n*i*`.

 resources:
  - type: bash_script
--- a/src/samtools/samtools_fastq/script.sh
+++ b/src/samtools/samtools_fastq/script.sh
@@ -11,7 +11,14 @@ set -e
 [[ "$par_copy_tags" == "false" ]] && unset par_copy_tags
 [[ "$par_casava" == "false" ]] && unset par_casava

-samtools fastq \
+if [[ "$meta_name" == "samtools_fasta" ]]; then
+  subcommand=fasta
+elif [[ "$meta_name" == "samtools_fastq" ]]; then
+  subcommand=fastq
+else
+  echo "Unrecognized component name" && exit 1
+fi
+samtools "$subcommand" \
    ${par_no_suffix:+-n} \
    ${par_suffix:+-N} \
    ${par_use_oq:+-O} \
--- a/src/samtools/samtools_flagstat/config.vsh.yaml
+++ b/src/samtools/samtools_flagstat/config.vsh.yaml
@@ -9,7 +9,9 @@ links:
 references: 
  doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
 license: MIT/Expat
-
+authors:
+  - __merge__: /src/_authors/emma_rousseau.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
--- a/src/samtools/samtools_idxstats/config.vsh.yaml
+++ b/src/samtools/samtools_idxstats/config.vsh.yaml
@@ -9,7 +9,9 @@ links:
 references: 
  doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
 license: MIT/Expat
-
+authors:
+  - __merge__: /src/_authors/emma_rousseau.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
--- a/src/samtools/samtools_index/config.vsh.yaml
+++ b/src/samtools/samtools_index/config.vsh.yaml
@@ -9,7 +9,9 @@ links:
 references: 
  doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
 license: MIT/Expat
-
+authors:
+  - __merge__: /src/_authors/emma_rousseau.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
--- a/src/samtools/samtools_sort/config.vsh.yaml
+++ b/src/samtools/samtools_sort/config.vsh.yaml
@@ -9,7 +9,9 @@ links:
 references: 
  doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
 license: MIT/Expat
-
+authors:
+  - __merge__: /src/_authors/emma_rousseau.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
--- a/src/samtools/samtools_stats/config.vsh.yaml
+++ b/src/samtools/samtools_stats/config.vsh.yaml
@@ -9,7 +9,9 @@ links:
 references: 
  doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
 license: MIT/Expat
-
+authors:
+  - __merge__: /src/_authors/emma_rousseau.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
@@ -30,10 +32,10 @@ argument_groups:
    - name: --coverage
      alternatives: -c
      type: integer
-      description: |
-        Coverage distribution min,max,step [1,1000,1].
      multiple: true
-      multiple_sep: ','
+      description: |
+        Coverage distribution min;max;step. Default: [1, 1000, 1].
+      example: [1, 1000, 1]
    - name: --remove_dups
      alternatives: -d
      type: boolean_true
@@ -48,25 +50,25 @@ argument_groups:
      alternatives: -f
      type: string
      description: |
-        Required flag, 0 for unset. See also `samtools flags`.
-      default: "0"
+        Required flag, 0 for unset. See also `samtools flags`. Default: `"0"`.
+      example: "0"
    - name: --filtering_flag
      alternatives: -F
      type: string
      description: |
-        Filtering flag, 0 for unset. See also `samtools flags`.
-      default: "0"
+        Filtering flag, 0 for unset. See also `samtools flags`. Default: `0`.
+      example: "0"
    - name: --GC_depth
      type: double
      description: |
-        The size of GC-depth bins (decreasing bin size increases memory requirement).
-      default: 20000.0
+        The size of GC-depth bins (decreasing bin size increases memory requirement). Default: `20000`.
+      example: 20000.0
    - name: --insert_size
      alternatives: -i
      type: integer
      description: |
-        Maximum insert size.
-      default: 8000
+        Maximum insert size. Default: `8000`.
+      example: 8000
    - name: --id
      alternatives: -I
      type: string
@@ -76,14 +78,14 @@ argument_groups:
      alternatives: -l
      type: integer
      description: |
-        Include in the statistics only reads with the given read length.
-      default: -1
+        Include in the statistics only reads with the given read length. Default: `-1`.
+      example: -1
    - name: --most_inserts
      alternatives: -m
      type: double
      description: |
-        Report only the main part of inserts.
-      default: 0.99
+        Report only the main part of inserts. Default: `0.99`.
+      example: 0.99
    - name: --split_prefix
      alternatives: -P
      type: string
@@ -93,8 +95,8 @@ argument_groups:
      alternatives: -q
      type: integer
      description: |
-        The BWA trimming parameter.
-      default: 0
+        The BWA trimming parameter. Default: `0`.
+      example: 0
    - name: --ref_seq
      alternatives: -r
      type: file
@@ -124,8 +126,8 @@ argument_groups:
      alternatives: -g
      type: integer
      description: |
-        Only bases with coverage above this value will be included in the target percentage computation.
-      default: 0
+        Only bases with coverage above this value will be included in the target percentage computation. Default: `0`.
+      example: 0
    - name: --input_fmt_option
      type: string
      description: |
@@ -141,7 +143,7 @@ argument_groups:
      type: file
      description: |
        Output file.
-      default: "out.txt"
+      example: "out.txt"
      required: true
      direction: output

--- a/src/samtools/samtools_stats/script.sh
+++ b/src/samtools/samtools_stats/script.sh
@@ -10,6 +10,9 @@ set -e
 [[ "$par_sparse" == "false" ]] && unset par_sparse
 [[ "$par_remove_overlaps" == "false" ]] && unset par_remove_overlaps

+# change the coverage input from X;X;X to X,X,X
+par_coverage=$(echo "$par_coverage" | tr ';' ',')
+
 samtools stats \
    ${par_coverage:+-c "$par_coverage"} \
    ${par_remove_dups:+-d} \
--- a/src/samtools/samtools_stats/test.sh
+++ b/src/samtools/samtools_stats/test.sh
@@ -17,7 +17,7 @@ echo ">>> Checking whether output is non-empty"
 [ ! -s "$test_dir/test.paired_end.sorted.txt" ] && echo "File 'test.paired_end.sorted.txt' is empty!" && exit 1

 echo ">>> Checking whether output is correct"
-# compare using diff,  ignoring the line stating the command that was passed.
+# compare using diff, ignoring the line stating the command that was passed.
 diff <(grep -v "^# The command" "$test_dir/test.paired_end.sorted.txt") \
    <(grep -v "^# The command" "$test_dir/ref.paired_end.sorted.txt") || \
    (echo "Output file ref.paired_end.sorted.txt does not match expected output" && exit 1)
--- a/src/samtools/samtools_view/config.vsh.yaml
+++ b/src/samtools/samtools_view/config.vsh.yaml
@@ -9,7 +9,9 @@ links:
 references: 
  doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
 license: MIT/Expat
-
+authors:
+  - __merge__: /src/_authors/emma_rousseau.yaml
+    roles: [ author, maintainer ]
 argument_groups:
  - name: Inputs
    arguments:
--- a/src/seqtk/seqtk_sample/config.vsh.yaml
+++ b/src/seqtk/seqtk_sample/config.vsh.yaml
@@ -0,0 +1,57 @@
+name: seqtk_sample
+namespace: seqtk
+description: Subsamples sequences from FASTA/Q files.
+keywords: [sample, FASTA, FASTQ]
+links:
+  repository: https://github.com/lh3/seqtk/tree/v1.4
+license: MIT
+authors:
+  - __merge__: /src/_authors/jakub_majercik.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --input
+        type: file
+        description: The input FASTA/Q file.
+        required: true
+
+  - name: Outputs
+    arguments:
+      - name: --output
+        type: file
+        description: The output FASTA/Q file.
+        required: true
+        direction: output
+
+  - name: Options
+    arguments:
+      - name: --seed
+        type: integer
+        description: Seed for random generator.
+        example: 42
+      - name: --fraction_number
+        type: double
+        description: Fraction or number of sequences to sample.
+        required: true
+        example: 0.1
+      - name: --two_pass_mode
+        type: boolean_true
+        description: Twice as slow but with much reduced memory
+
+resources:
+  - type: bash_script
+    path: script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: ../test_data
+
+engines:
+  - type: docker
+    image: quay.io/biocontainers/seqtk:1.4--he4a0461_2
+runners:
+  - type: executable
+  - type: nextflow
--- a/src/seqtk/seqtk_sample/help.txt
+++ b/src/seqtk/seqtk_sample/help.txt
@@ -0,0 +1,9 @@
+```
+seqtk_subseq
+```
+Usage:   seqtk subseq [options] <in.fa> <in.bed>|<name.list>
+Options:
+  -t       TAB delimited output
+  -s       strand aware
+  -l INT   sequence line length [0]
+Note: Use 'samtools faidx' if only a few regions are intended.
--- a/src/seqtk/seqtk_sample/script.sh
+++ b/src/seqtk/seqtk_sample/script.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+seqtk sample \
+    ${par_two_pass_mode:+-2} \
+    ${par_seed:+-s "$par_seed"} \
+    "$par_input" \
+    "$par_fraction_number" \
+    > "$par_output"
--- a/src/seqtk/seqtk_sample/test.sh
+++ b/src/seqtk/seqtk_sample/test.sh
@@ -0,0 +1,104 @@
+#!/bin/bash
+
+set -e
+
+## VIASH START
+meta_executable="target/executable/seqtk/seqtk_sample"
+meta_resources_dir="src/seqtk"
+## VIASH END
+
+#########################################################################################
+mkdir seqtk_sample_se
+cd seqtk_sample_se
+
+echo "> Run seqtk_sample on fastq SE"
+"$meta_executable" \
+  --input "$meta_resources_dir/test_data/reads/a.1.fastq.gz" \
+  --seed 42 \
+  --fraction_number 3 \
+  --output "sampled.fastq"
+
+echo ">> Check if output exists"
+if [ ! -f "sampled.fastq" ]; then
+    echo ">> sampled.fastq does not exist"
+    exit 1
+fi
+
+echo ">> Count number of samples"
+num_samples=$(grep -c '^@' sampled.fastq)
+if [ "$num_samples" -ne 3 ]; then
+    echo ">> sampled.fastq does not contain 3 samples"
+    exit 1
+fi
+
+#########################################################################################
+cd ..
+mkdir seqtk_sample_pe_number
+cd seqtk_sample_pe_number
+
+echo ">> Run seqtk_sample on fastq.gz PE with number of reads"
+"$meta_executable" \
+  --input "$meta_resources_dir/test_data/reads/a.1.fastq.gz" \
+  --seed 42 \
+  --fraction_number 3 \
+  --output "sampled_1.fastq"
+
+"$meta_executable" \
+  --input "$meta_resources_dir/test_data/reads/a.2.fastq.gz" \
+  --seed 42 \
+  --fraction_number 3 \
+  --output "sampled_2.fastq"
+
+echo ">> Check if output exists"
+if [ ! -f "sampled_1.fastq" ] || [ ! -f "sampled_2.fastq" ]; then
+    echo ">> One or both output files do not exist"
+    exit 1
+fi
+
+echo ">> Compare reads"
+# Extract headers
+headers1=$(grep '^@' sampled_1.fastq | sed -e's/ 1$//' | sort)
+headers2=$(grep '^@' sampled_2.fastq | sed -e 's/ 2$//' | sort)
+
+# Compare headers
+diff <(echo "$headers1") <(echo "$headers2") || { echo "Mismatch detected"; exit 1; }
+
+echo ">> Count number of samples"
+num_headers=$(echo "$headers1" | wc -l)
+if [ "$num_headers" -ne 3 ]; then
+    echo ">> sampled_1.fastq does not contain 3 headers"
+    exit 1
+fi
+
+#########################################################################################
+cd ..
+mkdir seqtk_sample_pe_fraction
+cd seqtk_sample_pe_fraction
+
+echo ">> Run seqtk_sample on fastq.gz PE with fraction of reads"
+"$meta_executable" \
+  --input "$meta_resources_dir/test_data/reads/a.1.fastq.gz" \
+  --seed 42 \
+  --fraction_number 0.5 \
+  --output "sampled_1.fastq"
+
+"$meta_executable" \
+  --input "$meta_resources_dir/test_data/reads/a.2.fastq.gz" \
+  --seed 42 \
+  --fraction_number 0.5 \
+  --output "sampled_2.fastq"
+
+echo ">> Check if output exists"
+if [ ! -f "sampled_1.fastq" ] || [ ! -f "sampled_2.fastq" ]; then
+    echo ">> One or both output files do not exist"
+    exit 1
+fi
+
+echo ">> Compare reads"
+# Extract headers
+headers1=$(grep '^@' sampled_1.fastq | sed -e's/ 1$//' | sort)
+headers2=$(grep '^@' sampled_2.fastq | sed -e 's/ 2$//' | sort)
+
+# Compare headers
+diff <(echo "$headers1") <(echo "$headers2") || { echo "Mismatch detected"; exit 1; }
+
--- a/src/seqtk/seqtk_subseq/config.vsh.yaml
+++ b/src/seqtk/seqtk_subseq/config.vsh.yaml
@@ -0,0 +1,78 @@
+name: seqtk_subseq
+namespace: seqtk
+description: | 
+  Extract subsequences from FASTA/Q files. Takes as input a FASTA/Q file and a name.lst (sequence ids file) or a reg.bed (genomic regions file).
+keywords: [subseq, FASTA, FASTQ]
+links:
+  repository: https://github.com/lh3/seqtk/tree/v1.4
+license: MIT
+authors:
+  - __merge__: /src/_authors/theodoro_gasperin.yaml
+    roles: [ author, maintainer ]
+
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: "--input"
+        type: file
+        direction: input
+        description: The input FASTA/Q file.
+        required: true
+        example: input.fa
+        
+      - name: "--name_list"
+        type: file
+        direction: input
+        description: | 
+          List of sequence names (name.lst) or genomic regions (reg.bed) to extract.
+        required: true
+        example: list.lst
+
+  - name: Outputs
+    arguments:
+      - name: "--output"
+        alternatives: -o
+        type: file
+        direction: output
+        description: The output FASTA/Q file.
+        required: true
+        default: output.fa
+
+  - name: Options
+    arguments:
+      - name: "--tab"
+        alternatives: -t
+        type: boolean_true
+        description: TAB delimited output.
+        
+      - name: "--strand_aware"
+        alternatives: -s
+        type: boolean_true
+        description: Strand aware.
+        
+      - name: "--sequence_line_length"
+        alternatives: -l
+        type: integer
+        description: | 
+          Sequence line length of input fasta file. Default: 0.
+        example: 0
+        
+
+resources:
+  - type: bash_script
+    path: script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+
+engines:
+  - type: docker
+    image: quay.io/biocontainers/seqtk:1.4--he4a0461_2
+    setup:
+      - type: docker
+        run: |
+          echo $(echo $(seqtk 2>&1) | sed -n 's/.*\(Version: [^ ]*\).*/\1/p') > /var/software_versions.txt
+
+runners:
+  - type: executable
+  - type: nextflow
--- a/src/seqtk/seqtk_subseq/help.txt
+++ b/src/seqtk/seqtk_subseq/help.txt
@@ -0,0 +1,9 @@
+```bash
+seqtk subseq
+```
+Usage:   seqtk subseq [options] <in.fa> <in.bed>|<name.list>
+Options:
+  -t       TAB delimited output
+  -s       strand aware
+  -l INT   sequence line length [0]
+Note: Use 'samtools faidx' if only a few regions are intended.
--- a/src/seqtk/seqtk_subseq/script.sh
+++ b/src/seqtk/seqtk_subseq/script.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+[[ "$par_tab" == "false" ]] && unset par_tab
+[[ "$par_strand_aware" == "false" ]] && unset par_strand_aware
+
+seqtk subseq \
+    ${par_tab:+-t} \
+    ${par_strand_aware:+-s} \
+    ${par_sequence_line_length:+-l "$par_sequence_line_length"} \
+    "$par_input" \
+    "$par_name_list" \
+    > "$par_output"
--- a/src/seqtk/seqtk_subseq/test.sh
+++ b/src/seqtk/seqtk_subseq/test.sh
@@ -0,0 +1,182 @@
+#!/bin/bash
+
+# exit on error
+set -e
+
+## VIASH START
+meta_executable="target/executable/seqtk/seqtk_subseq"
+meta_resources_dir="src/seqtk"
+## VIASH END
+
+# Create directories for tests
+echo "Creating Test Data..."
+mkdir test_data
+
+# Create and populate input.fasta
+cat > "test_data/input.fasta" <<EOL
+>KU562861.1
+GGAGCAGGAGAGTGTTCGAGTTCAGAGATGTCCATGGCGCCGTACGAGAAGGTGATGGATGACCTGGCCA
+AGGGGCAGCAGTTCGCGACGCAGCTGCAGGGCCTCCTCCGGGACTCCCCCAAGGCCGGCCACATCATGGA
+>GU056837.1
+CTAATTTTATTTTTTTATAATAATTATTGGAGGAACTAAAACATTAATGAAATAATAATTATCATAATTA
+TTAATTACATATTTATTAGGTATAATATTTAAGGAAAAATATATTTTATGTTAATTGTAATAATTAGAAC
+>CP097510.1
+CGATTTAGATCGGTGTAGTCAACACACATCCTCCACTTCCATTAGGCTTCTTGACGAGGACTACATTGAC
+AGCCACCGAGGGAACCGACCTCCTCAATGAAGTCAGACGCCAAGAGCCTATCAACTTCCTTCTGCACAGC
+>JAMFTS010000002.1
+CCTAAACCCTAAACCCTAAACCCCCTACAAACCTTACCCTAAACCCTAAACCCTAAACCCTAAACCCTAA
+ACCCGAAACCCTATACCCTAAACCCTAAACCCTAAACCCTAAACCCTAACCCAAACCTAATCCCTAAACC
+>MH150936.1
+TAGAAGCTAATGAAAACTTTTCCTTTACTAAAAACCGTCAAACACGGTAAGAAACGCTTTTAATCATTTC
+AAAAGCAATCCCAATAGTGGTTACATCCAAACAAAACCCATTTCTTATATTTTCTCAAAAACAGTGAGAG
+EOL
+
+# Update id.list with new entries
+cat > "test_data/id.list" <<EOL
+KU562861.1
+MH150936.1
+EOL
+
+# Create and populate reg.bed
+cat > "test_data/reg.bed" <<EOL
+KU562861.1$(echo -e "\t")10$(echo -e "\t")20$(echo -e "\t")region$(echo -e "\t")0$(echo -e "\t")+$(echo -e "\n")
+MH150936.1$(echo -e "\t")10$(echo -e "\t")20$(echo -e "\t")region$(echo -e "\t")0$(echo -e "\t")-
+EOL
+
+#########################################################################################
+# Run basic test
+mkdir test1
+cd test1
+
+echo "> Run seqtk_subseq on FASTA/Q file"
+"$meta_executable" \
+  --input "../test_data/input.fasta" \
+  --name_list "../test_data/id.list" \
+  --output "sub_sample.fq"
+
+expected_output_basic=">KU562861.1
+GGAGCAGGAGAGTGTTCGAGTTCAGAGATGTCCATGGCGCCGTACGAGAAGGTGATGGATGACCTGGCCAAGGGGCAGCAGTTCGCGACGCAGCTGCAGGGCCTCCTCCGGGACTCCCCCAAGGCCGGCCACATCATGGA
+>MH150936.1
+TAGAAGCTAATGAAAACTTTTCCTTTACTAAAAACCGTCAAACACGGTAAGAAACGCTTTTAATCATTTCAAAAGCAATCCCAATAGTGGTTACATCCAAACAAAACCCATTTCTTATATTTTCTCAAAAACAGTGAGAG"
+output_basic=$(cat sub_sample.fq)
+
+if [ "$output_basic" != "$expected_output_basic" ]; then
+  echo "Test failed"
+  echo "Expected:"
+  echo "$expected_output_basic"
+  echo "Got:"
+  echo "$output_basic"
+  exit 1
+fi
+
+#########################################################################################
+# Run reg.bed as name list input test
+cd ..
+mkdir test2
+cd test2
+
+echo "> Run seqtk_subseq on FASTA/Q file with BED file as name list"
+"$meta_executable" \
+  --input "../test_data/input.fasta" \
+  --name_list "../test_data/reg.bed" \
+  --output "sub_sample.fq"
+
+expected_output_basic=">KU562861.1:11-20
+AGTGTTCGAG
+>MH150936.1:11-20
+TGAAAACTTT"
+output_basic=$(cat sub_sample.fq)
+
+if [ "$output_basic" != "$expected_output_basic" ]; then
+  echo "Test failed"
+  echo "Expected:"
+  echo "$expected_output_basic"
+  echo "Got:"
+  echo "$output_basic"
+  exit 1
+fi
+
+#########################################################################################
+# Run tab option output test
+cd ..
+mkdir test3
+cd test3
+
+echo "> Run seqtk_subseq with TAB option"
+"$meta_executable" \
+  --tab \
+  --input "../test_data/input.fasta" \
+  --name_list "../test_data/reg.bed" \
+  --output "sub_sample.fq"
+
+expected_output_tabular=$'KU562861.1\t11\tAGTGTTCGAG\nMH150936.1\t11\tTGAAAACTTT'
+output_tabular=$(cat sub_sample.fq)
+
+if [ "$output_tabular" != "$expected_output_tabular" ]; then
+  echo "Test failed"
+  echo "Expected:"
+  echo "$expected_output_tabular"
+  echo "Got:"
+  echo "$output_tabular"
+  exit 1
+fi
+
+#########################################################################################
+# Run line option output test
+cd ..
+mkdir test4
+cd test4
+
+echo "> Run seqtk_subseq with line length option"
+"$meta_executable" \
+  --sequence_line_length 5 \
+  --input "../test_data/input.fasta" \
+  --name_list "../test_data/reg.bed" \
+  --output "sub_sample.fq"
+
+expected_output_wrapped=">KU562861.1:11-20
+AGTGT
+TCGAG
+>MH150936.1:11-20
+TGAAA
+ACTTT"
+output_wrapped=$(cat sub_sample.fq)
+
+if [ "$output_wrapped" != "$expected_output_wrapped" ]; then
+  echo "Test failed"
+  echo "Expected:"
+  echo "$expected_output_wrapped"
+  echo "Got:"
+  echo "$output_wrapped"
+  exit 1
+fi
+
+#########################################################################################
+# Run Strand Aware option output test
+cd ..
+mkdir test5
+cd test5
+
+echo "> Run seqtk_subseq with strand aware option"
+"$meta_executable" \
+  --strand_aware \
+  --input "../test_data/input.fasta" \
+  --name_list "../test_data/reg.bed" \
+  --output "sub_sample.fq"
+
+expected_output_wrapped=">KU562861.1:11-20
+AGTGTTCGAG
+>MH150936.1:11-20
+AAAGTTTTCA"
+output_wrapped=$(cat sub_sample.fq)
+
+if [ "$output_wrapped" != "$expected_output_wrapped" ]; then
+  echo "Test failed"
+  echo "Expected:"
+  echo "$expected_output_wrapped"
+  echo "Got:"
+  echo "$output_wrapped"
+  exit 1
+fi
+
+echo "All tests succeeded!"
--- a/src/seqtk/test_data/reads/a.1.fastq.gz
+++ b/src/seqtk/test_data/reads/a.1.fastq.gz
--- a/src/seqtk/test_data/reads/a.2.fastq.gz
+++ b/src/seqtk/test_data/reads/a.2.fastq.gz
--- a/src/seqtk/test_data/reads/a.fastq
+++ b/src/seqtk/test_data/reads/a.fastq
@@ -0,0 +1,4 @@
+@1
+ACGGCAT
+
+!!!!!!!
--- a/src/seqtk/test_data/reads/a.fastq.gz
+++ b/src/seqtk/test_data/reads/a.fastq.gz
--- a/src/seqtk/test_data/reads/id.list
+++ b/src/seqtk/test_data/reads/id.list
@@ -0,0 +1 @@
+1
--- a/src/seqtk/test_data/script.sh
+++ b/src/seqtk/test_data/script.sh
@@ -0,0 +1,9 @@
+# clone repo
+if [ ! -d /tmp/snakemake-wrappers ]; then
+  git clone --depth 1 --single-branch --branch master https://github.com/snakemake/snakemake-wrappers /tmp/snakemake-wrappers
+fi
+
+# copy test data
+cp -r /tmp/snakemake-wrappers/bio/seqtk/test/* src/seqtk/test_data
+
+rm src/seqtk/test_data/Snakefile
--- a/src/star/star_align_reads/argument_groups.yaml
+++ b/src/star/star_align_reads/argument_groups.yaml
--- a/src/star/star_align_reads/config.vsh.yaml
+++ b/src/star/star_align_reads/config.vsh.yaml
@@ -11,6 +11,11 @@ references:
 license: MIT
 requirements:
  commands: [ STAR, python, ps, zcat, bzcat ]
+authors:
+  - __merge__: /src/_authors/angela_o_pisco.yaml
+    roles: [ author ]
+  - __merge__: /src/_authors/robrecht_cannoodt.yaml
+    roles: [ author, maintainer ]
 # manually taking care of the main input and output arguments
 argument_groups:
  - name: Inputs
@@ -113,6 +118,8 @@ engines:
            rm -rf /tmp/STAR-${STAR_VERSION} /tmp/${STAR_VERSION}.zip && \
            apt-get --purge autoremove -y ${PACKAGES} && \
            apt-get clean
+      - type: python
+        packages: [ pyyaml ]
      - type: docker
        run: |
          STAR --version | sed 's#\(.*\)#star: "\1"#' > /var/software_versions.txt
--- a/src/star/star_align_reads/script.py
+++ b/src/star/star_align_reads/script.py
@@ -2,6 +2,7 @@ import tempfile
 import subprocess
 import shutil
 from pathlib import Path
+import yaml

 ## VIASH START
 par = {
@@ -18,10 +19,20 @@ par = {
 }
 meta = {
    "cpus": 8,
-    "temp_dir": "/tmp"
+    "temp_dir": "/tmp",
+    "config": "target/executable/star/star_align_reads/.config.vsh.yaml",
 }
 ## VIASH END

+# read config
+with open(meta["config"], 'r') as stream:
+    config = yaml.safe_load(stream)
+all_arguments = {
+    arg["name"].lstrip('-'): arg
+    for argument_group in config["argument_groups"]
+    for arg in argument_group["arguments"]
+}
+
 ##################################################
 # check and process SE / PE R1 input files
 input_r1 = par["input"]
@@ -87,8 +98,13 @@ with tempfile.TemporaryDirectory(prefix="star-", dir=meta["temp_dir"], ignore_cl
    cmd_args = [ "STAR" ]
    for name, value in par.items():
        if value is not None:
+            if name in all_arguments:
+                arg_info = all_arguments[name].get("info", {})
+                cli_name = arg_info.get("orig_name", f"--{name}")
+            else:
+                cli_name = f"--{name}"
            val_to_add = value if isinstance(value, list) else [value]
-            cmd_args.extend([f"--{name}"] + [str(x) for x in val_to_add])
+            cmd_args.extend([cli_name] + [str(x) for x in val_to_add])
    print("", flush=True)

    # run command
--- a/src/star/star_align_reads/test.sh
+++ b/src/star/star_align_reads/test.sh
@@ -7,35 +7,34 @@ meta_executable="target/docker/star/star_align_reads/star_align_reads"
 meta_resources_dir="src/star/star_align_reads"
 ## VIASH END

-#########################################################################################
-
+#############################################
 # helper functions
 assert_file_exists() {
-  [ -f "$1" ] || (echo "File '$1' does not exist" && exit 1)
+  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
 }
 assert_file_doesnt_exist() {
-  [ ! -f "$1" ] || (echo "File '$1' exists but shouldn't" && exit 1)
+  [ ! -f "$1" ] || { echo "File '$1' exists but shouldn't" && exit 1; }
 }
 assert_file_empty() {
-  [ ! -s "$1" ] || (echo "File '$1' is not empty but should be" && exit 1)
+  [ ! -s "$1" ] || { echo "File '$1' is not empty but should be" && exit 1; }
 }
 assert_file_not_empty() {
-  [ -s "$1" ] || (echo "File '$1' is empty but shouldn't be" && exit 1)
+  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
 }
 assert_file_contains() {
-  grep -q "$2" "$1" || (echo "File '$1' does not contain '$2'" && exit 1)
+  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
 }
 assert_file_not_contains() {
-  grep -q "$2" "$1" && (echo "File '$1' contains '$2' but shouldn't" && exit 1)
+  grep -q "$2" "$1" && { echo "File '$1' contains '$2' but shouldn't" && exit 1; }
 }
 assert_file_contains_regex() {
-  grep -q -E "$2" "$1" || (echo "File '$1' does not contain '$2'" && exit 1)
+  grep -q -E "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
 }
 assert_file_not_contains_regex() {
-  grep -q -E "$2" "$1" && (echo "File '$1' contains '$2' but shouldn't" && exit 1)
+  grep -q -E "$2" "$1" && { echo "File '$1' contains '$2' but shouldn't" && exit 1; }
 }
+#############################################

-#########################################################################################
 echo "> Prepare test data"

 cat > reads_R1.fastq <<'EOF'
@@ -89,14 +88,14 @@ cd star_align_reads_se
 echo "> Run star_align_reads on SE"
 "$meta_executable" \
  --input "../reads_R1.fastq" \
-  --genomeDir "../index/" \
+  --genome_dir "../index/" \
  --aligned_reads "output.sam" \
  --log "log.txt" \
-  --outReadsUnmapped "Fastx" \
+  --out_reads_unmapped "Fastx" \
  --unmapped "unmapped.sam" \
-  --quantMode "TranscriptomeSAM;GeneCounts" \
+  --quant_mode "TranscriptomeSAM;GeneCounts" \
  --reads_per_gene "reads_per_gene.tsv" \
-  --outSJtype Standard \
+  --out_sj_type Standard \
  --splice_junctions "splice_junctions.tsv" \
  --reads_aligned_to_transcriptome "transcriptome_aligned.bam" \
  ${meta_cpus:+---cpus $meta_cpus}
@@ -144,10 +143,10 @@ echo ">> Run star_align_reads on PE"
 "$meta_executable" \
  --input ../reads_R1.fastq \
  --input_r2 ../reads_R2.fastq \
-  --genomeDir ../index/ \
+  --genome_dir ../index/ \
  --aligned_reads output.bam \
  --log log.txt \
-  --outReadsUnmapped Fastx \
+  --out_reads_unmapped Fastx \
  --unmapped unmapped_r1.bam \
  --unmapped_r2 unmapped_r2.bam \
  ${meta_cpus:+---cpus $meta_cpus}
--- a/src/star/star_align_reads/utils/process_params.R
+++ b/src/star/star_align_reads/utils/process_params.R
@@ -14,6 +14,14 @@ param_txt <- iconv(param_txt, "UTF-8", "ASCII//TRANSLIT")
 dev_begin <- grep("#####UnderDevelopment_begin", param_txt)
 dev_end <- grep("#####UnderDevelopment_end", param_txt)

+camel_case_to_snake_case <- function(x) {
+  x %>%
+    str_replace_all("([A-Z][A-Z][A-Z]*)", "_\\1_") %>%
+    str_replace_all("([a-z])([A-Z])", "\\1_\\2") %>%
+    str_to_lower() %>%
+    str_replace_all("_$", "")
+}
+
 # strip development sections
 nondev_ix <- unlist(map2(c(1, dev_end + 1), c(dev_begin - 1, length(param_txt)), function(i, j) {
  if (i >= 1 && i < j) {
@@ -128,9 +136,8 @@ out2 <- out %>%
  # remove arguments that are related to a different runmode
  filter(!grepl("--runMode", description) | grepl("--runMode alignReads", description)) %>%
  filter(!grepl("--runMode", group_name) | grepl("--runMode alignReads", group_name)) %>%
-  filter(!grepl("STARsolo", group_name)) %>%
  mutate(
-    viash_arg = paste0("--", name),
+    viash_arg = paste0("--", camel_case_to_snake_case(name)),
    type_step1 = type %>%
      str_replace_all(".*(int, string|string|int|real|double)\\(?(s?).*", "\\1\\2"),
    viash_type = type_map[gsub("(int, string|string|int|real|double).*", "\\1", type_step1)],
@@ -155,28 +162,41 @@ out2 <- out %>%
    group_name = gsub(" - .*", "", group_name),
    required = ifelse(name %in% required_args, TRUE, NA)
  )
-print(out2, n = 200)
-out2 %>% mutate(i = row_number()) %>% 
-  # filter(is.na(default_step1) != is.na(viash_default)) %>%
+
+# change references to argument names
+out3 <- out2
+for (i in seq_len(nrow(out2))) {
+  orig_name <- paste0("--", out2$name[[i]])
+  new_name <- out2$viash_arg[[i]]
+  out3$description <- str_replace_all(out3$description, orig_name, new_name)
+}
+
+# sanity checks
+out3 %>% select(name, viash_arg) %>% as.data.frame()
+print(out3, n = 200)
+out3 %>%
+  mutate(i = row_number()) %>%
  select(-group_name, -description)
+out3 %>% filter(!grepl("--runMode", description) | grepl("--runMode alignReads", description))

-out2 %>% filter(!grepl("--runMode", description) | grepl("--runMode alignReads", description))
-
-argument_groups <- map(unique(out2$group_name), function(group_name) {
-  args <- out2 %>%
+# create argument groups
+argument_groups <- map(unique(out3$group_name), function(group_name) {
+  args <- out3 %>%
    filter(group_name == !!group_name) %>%
-    pmap(function(viash_arg, viash_type, multiple, viash_default, description, required, ...) {
-      li <- lst(
+    pmap(function(viash_arg, viash_type, multiple, viash_default, description, required, name, ...) {
+      li <- list(
        name = viash_arg,
        type = viash_type,
-        description = description
+        description = description,
+        info = list(
+          orig_name = paste0("--", name)
+        )
      )
      if (all(!is.na(viash_default))) {
        li$example <- viash_default
      }
      if (!is.na(multiple) && multiple) {
        li$multiple <- multiple
-        li$multiple_sep <- ";"
      }
      if (!is.na(required) && required) {
        li$required <- required
@@ -186,4 +206,10 @@ argument_groups <- map(unique(out2$group_name), function(group_name) {
  list(name = group_name, arguments = args)
 })

-yaml::write_yaml(list(argument_groups = argument_groups), yaml_file)
+yaml::write_yaml(
+  list(argument_groups = argument_groups),
+  yaml_file,
+  handlers = list(
+    logical = yaml::verbatim_logical
+  )
+)
--- a/src/star/star_genome_generate/config.vsh.yaml
+++ b/src/star/star_genome_generate/config.vsh.yaml
@@ -11,75 +11,74 @@ references:
 license: MIT
 requirements:
  commands: [ STAR ]
-
+authors:
+  - __merge__: /src/_authors/sai_nirmayi_yasa.yaml
+    roles: [ author, maintainer ]
 argument_groups:
 - name: "Input"
  arguments: 
-  - name: "--genomeFastaFiles"
+  - name: "--genome_fasta_files"
    type: file
    description: |
      Path(s) to the fasta files with the genome sequences, separated by spaces. These files should be plain text FASTA files, they *cannot* be zipped.
    required: true
-    multiple: yes
-    multiple_sep: ;  
-  - name: "--sjdbGTFfile"
+    multiple: true
+  - name: "--sjdb_gtf_file"
    type: file
    description: Path to the GTF file with annotations
-  - name: --sjdbOverhang
+  - name: --sjdb_overhang
    type: integer
    description: Length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1)
    example: 100
-  - name: --sjdbGTFchrPrefix
+  - name: --sjdb_gtf_chr_prefix
    type: string
    description: Prefix for chromosome names in a GTF file (e.g. 'chr' for using ENSMEBL annotations with UCSC genomes)
-  - name: --sjdbGTFfeatureExon
+  - name: --sjdb_gtf_feature_exon
    type: string
    description: Feature type in GTF file to be used as exons for building transcripts
    example: exon
-  - name: --sjdbGTFtagExonParentTranscript
+  - name: --sjdb_gtf_tag_exon_parent_transcript
    type: string
    description: GTF attribute name for parent transcript ID (default "transcript_id" works for GTF files)
    example: transcript_id
-  - name: --sjdbGTFtagExonParentGene
+  - name: --sjdb_gtf_tag_exon_parent_gene
    type: string
    description: GTF attribute name for parent gene ID (default "gene_id" works for GTF files)
    example: gene_id
-  - name: --sjdbGTFtagExonParentGeneName
+  - name: --sjdb_gtf_tag_exon_parent_gene_name
    type: string
    description: GTF attribute name for parent gene name
    example: gene_name
-    multiple: yes
-    multiple_sep: ;
-  - name: --sjdbGTFtagExonParentGeneType
+    multiple: true
+  - name: --sjdb_gtf_tag_exon_parent_gene_type
    type: string
    description: GTF attribute name for parent gene type
    example:
    - gene_type
    - gene_biotype
-    multiple: yes
-    multiple_sep: ;
-  - name: --limitGenomeGenerateRAM
+    multiple: true
+  - name: --limit_genome_generate_ram
    type: long
    description: Maximum available RAM (bytes) for genome generation
-    example: '31000000000'
-  - name: --genomeSAindexNbases
+    example: 31000000000
+  - name: --genome_sa_index_nbases
    type: integer
    description: Length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. For small genomes, this parameter must be scaled down to min(14, log2(GenomeLength)/2 - 1).
    example: 14
-  - name: --genomeChrBinNbits
+  - name: --genome_chr_bin_nbits
    type: integer
    description: Defined as log2(chrBin), where chrBin is the size of the bins for genome storage. Each chromosome will occupy an integer number of bins. For a genome with large number of contigs, it is recommended to scale this parameter as min(18, log2[max(GenomeLength/NumberOfReferences,ReadLength)]).
    example: 18
-  - name: --genomeSAsparseD
+  - name: --genome_sa_sparse_d
    type: integer
    min: 0
    example: 1
    description: Suffux array sparsity, i.e. distance between indices. Use bigger numbers to decrease needed RAM at the cost of mapping speed reduction.
-  - name: --genomeSuffixLengthMax
+  - name: --genome_suffix_length_max
    type: integer
    description: Maximum length of the suffixes, has to be longer than read length. Use -1 for infinite length.
    example: -1
-  - name: --genomeTransformType   
+  - name: --genome_transform_type   
    type: string
    description: |
      Type of genome transformation
@@ -87,7 +86,7 @@ argument_groups:
        Haploid    ... replace reference alleles with alternative alleles from VCF file (e.g. consensus allele)
        Diploid    ... create two haplotypes for each chromosome listed in VCF file, for genotypes 1|2, assumes perfect phasing (e.g. personal genome)
    example: None
-  - name: --genomeTransformVCF
+  - name: --genome_transform_vcf
    type: file
    description: path to VCF file for genome transformation
  
--- a/src/star/star_genome_generate/script.sh
+++ b/src/star/star_genome_generate/script.sh
@@ -10,20 +10,20 @@ mkdir -p $par_index
 STAR \
    --runMode genomeGenerate \
    --genomeDir $par_index \
-    --genomeFastaFiles $par_genomeFastaFiles \
+    --genomeFastaFiles $par_genome_fasta_files \
    ${meta_cpus:+--runThreadN "${meta_cpus}"} \
-    ${par_sjdbGTFfile:+--sjdbGTFfile "${par_sjdbGTFfile}"} \
+    ${par_sjdb_gtf_file:+--sjdbGTFfile "${par_sjdb_gtf_file}"} \
    ${par_sjdbOverhang:+--sjdbOverhang "${par_sjdbOverhang}"} \
-    ${par_genomeSAindexNbases:+--genomeSAindexNbases "${par_genomeSAindexNbases}"} \
-    ${par_sjdbGTFchrPrefix:+--sjdbGTFchrPrefix "${par_sjdbGTFchrPrefix}"} \
-    ${par_sjdbGTFfeatureExon:+--sjdbGTFfeatureExon "${par_sjdbGTFfeatureExon}"} \
-    ${par_sjdbGTFtagExonParentTranscript:+--sjdbGTFtagExonParentTranscript "${par_sjdbGTFtagExonParentTranscript}"} \
-    ${par_sjdbGTFtagExonParentGene:+--sjdbGTFtagExonParentGene "${par_sjdbGTFtagExonParentGene}"} \
-    ${par_sjdbGTFtagExonParentGeneName:+--sjdbGTFtagExonParentGeneName "${par_sjdbGTFtagExonParentGeneName}"} \
-    ${par_sjdbGTFtagExonParentGeneType:+--sjdbGTFtagExonParentGeneType "${sjdbGTFtagExonParentGeneType}"} \
-    ${par_limitGenomeGenerateRAM:+--limitGenomeGenerateRAM "${par_limitGenomeGenerateRAM}"} \
-    ${par_genomeChrBinNbits:+--genomeChrBinNbits "${par_genomeChrBinNbits}"} \
-    ${par_genomeSAsparseD:+--genomeSAsparseD "${par_genomeSAsparseD}"} \
-    ${par_genomeSuffixLengthMax:+--genomeSuffixLengthMax "${par_genomeSuffixLengthMax}"} \
-    ${par_genomeTransformType:+--genomeTransformType "${par_genomeTransformType}"} \
-    ${par_genomeTransformVCF:+--genomeTransformVCF "${par_genomeTransformVCF}"} \
+    ${par_genome_sa_index_nbases:+--genomeSAindexNbases "${par_genome_sa_index_nbases}"} \
+    ${par_sjdb_gtf_chr_prefix:+--sjdbGTFchrPrefix "${par_sjdb_gtf_chr_prefix}"} \
+    ${par_sjdb_gtf_feature_exon:+--sjdbGTFfeatureExon "${par_sjdb_gtf_feature_exon}"} \
+    ${par_sjdb_gtf_tag_exon_parent_transcript:+--sjdbGTFtag_exon_parent_transcript "${par_sjdb_gtf_tag_exon_parent_transcript}"} \
+    ${par_sjdb_gtf_tag_exon_parent_gene:+--sjdbGTFtag_exon_parent_gene "${par_sjdb_gtf_tag_exon_parent_gene}"} \
+    ${par_sjdb_gtf_tag_exon_parent_geneName:+--sjdbGTFtag_exon_parent_geneName "${par_sjdb_gtf_tag_exon_parent_geneName}"} \
+    ${par_sjdb_gtf_tag_exon_parent_geneType:+--sjdbGTFtag_exon_parent_geneType "${sjdbGTFtag_exon_parent_geneType}"} \
+    ${par_limit_genome_generate_ram:+--limitGenomeGenerateRAM "${par_limit_genome_generate_ram}"} \
+    ${par_genome_chr_bin_nbits:+--genomeChrBinNbits "${par_genome_chr_bin_nbits}"} \
+    ${par_genome_sa_sparse_d:+--genomeSAsparseD "${par_genome_sa_sparse_d}"} \
+    ${par_genome_suffix_length_max:+--genomeSuffixLengthMax "${par_genome_suffix_length_max}"} \
+    ${par_genome_transform_type:+--genomeTransformType "${par_genome_transform_type}"} \
+    ${par_genome_transform_vcf:+--genomeTransformVCF "${par_genome_transform_vCF}"} \
--- a/src/star/star_genome_generate/test.sh
+++ b/src/star/star_genome_generate/test.sh
@@ -27,9 +27,9 @@ echo "> Generate index"
 "$meta_executable" \
  ${meta_cpus:+---cpus $meta_cpus} \
  --index "star_index/" \
-  --genomeFastaFiles "genome.fasta" \
-  --sjdbGTFfile "genes.gtf" \
-  --genomeSAindexNbases 2
+  --genome_fasta_files "genome.fasta" \
+  --sjdb_gtf_file "genes.gtf" \
+  --genome_sa_index_nbases 4 

 files=("Genome" "Log.out" "SA" "SAindex" "chrLength.txt" "chrName.txt" "chrNameLength.txt" "chrStart.txt" "exonGeTrInfo.tab" "exonInfo.tab" "geneInfo.tab" "genomeParameters.txt" "sjdbInfo.txt" "sjdbList.fromGTF.out.tab" "sjdbList.out.tab" "transcriptInfo.tab")

--- a/src/umi_tools/umi_tools_dedup/config.vsh.yaml
+++ b/src/umi_tools/umi_tools_dedup/config.vsh.yaml
@@ -0,0 +1,305 @@
+name: umi_tools_dedup
+namespace: umi_tools
+description: |
+  Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read.
+keywords: [umi_tools, deduplication, dedup]
+links:
+  homepage: https://umi-tools.readthedocs.io/en/latest/
+  documentation: https://umi-tools.readthedocs.io/en/latest/reference/dedup.html
+  repository: https://github.com/CGATOxford/UMI-tools
+references: 
+  doi: 10.1101/gr.209601.116
+license: MIT
+authors:
+  - __merge__: /src/_authors/emma_rousseau.yaml
+    roles: [ author, maintainer ]
+argument_groups:
+  - name: Inputs
+    arguments:
+      - name: --input
+        alternatives: --stdin
+        type: file
+        description: Input BAM or SAM file. Use --in_sam to specify SAM format.
+        required: true
+      - name: --in_sam
+        type: boolean_true
+        description: |
+          By default, inputs are assumed to be in BAM format. Use this options to specify the use of SAM
+          format for input.
+      - name: --bai
+        type: file
+        description: BAM index
+      - name: --random_seed
+        type: integer
+        description: Random seed to initialize number generator with.
+
+  - name: Outputs
+    arguments:
+      - name: --output
+        alternatives: --stdout
+        type: file
+        description: Deduplicated BAM file.
+        required: true
+        direction: output
+      - name: --out_sam
+        type: boolean_true
+        description: |
+          By default, outputa are written in BAM format. Use this options to specify the use of SAM format
+          for output.
+      - name: --paired
+        type: boolean_true
+        description: |
+          BAM is paired end - output both read pairs. This will also force the use of the template length
+          to determine reads with the same mapping coordinates.
+      - name: --output_stats
+        type: string
+        description: |
+          Generate files containing UMI based deduplication statistics files with this prefix in the file names.
+      - name: --extract_umi_method
+        type: string
+        choices: [read_id, tag, umis]
+        description: |
+          Specify the method by which the barcodes were encoded in the read.
+          The options are:
+            * read_id (default) 
+            * tag
+            * umis
+        example: "read_id"
+      - name: --umi_tag
+        type: string
+        description: |
+          The tag containing the UMI sequence. This is only required if the extract_umi_method is set to tag.
+      - name: --umi_separator
+        type: string
+        description: |
+          The separator used to separate the UMI from the read sequence. This is only required if the
+          extract_umi_method is set to id_read. Default: `_`.
+        example: '_'
+      - name: --umi_tag_split
+        type: string
+        description: Separate the UMI in tag by <SPLIT> and take the first element.
+      - name: --umi_tag_delimiter
+        type: string
+        description: Separate the UMI in by <DELIMITER> and concatenate the elements.
+      - name: --cell_tag
+        type: string
+        description: |
+          The tag containing the cell barcode sequence. This is only required if the extract_umi_method
+          is set to tag.
+      - name: --cell_tag_split
+        type: string
+        description: Separate the cell barcode in tag by <SPLIT> and take the first element.
+      - name: --cell_tag_delimiter
+        type: string
+        description: Separate the cell barcode in by <DELIMITER> and concatenate the elements.
+  
+  - name: Grouping Options
+    arguments:    
+      - name: --method
+        type: string
+        choices: [unique, percentile, cluster, adjacency, directional]
+        description: |
+          The method to use for grouping reads. 
+          The options are: 
+            * unique
+            * percentile
+            * cluster
+            * adjacency
+            * directional (default)
+        example: "directional"
+      - name: --edit_distance_threshold
+        type: integer
+        description: |
+          For the adjacency and cluster methods the threshold for the edit distance to connect two
+          UMIs in the network can be increased. The default value of 1 works best unless the UMI is
+          very long (>14bp). Default: `1`.
+        example: 1
+      - name: --spliced_is_unique
+        type: boolean_true
+        description: |
+          Causes two reads that start in the same position on the same strand and having the same UMI
+          to be considered unique if one is spliced and the other is not. (Uses the 'N' cigar operation
+          to test for splicing).
+      - name: --soft_clip_threshold
+        type: integer
+        description: |
+          Mappers that soft clip will sometimes do so rather than mapping a spliced read if there is only
+          a small overhang over the exon junction. By setting this option, you can treat reads with at
+          least this many bases soft-clipped at the 3' end as spliced. Default: `4`.
+        example: 4
+      - name: --multimapping_detection_method
+        type: string
+        description: |
+          If the sam/bam contains tags to identify multimapping reads, you can specify for use when selecting
+          the best read at a given loci. Supported tags are `NH`, `X0` and `XT`. If not specified, the read
+          with the highest mapping quality will be selected.
+      - name: --read_length
+        type: boolean_true
+        description: Use the read length as a criteria when deduping, for e.g. sRNA-Seq.
+  
+  - name: Single-cell RNA-Seq Options
+    arguments:
+      - name: --per_gene
+        type: boolean_true
+        description: |
+          Reads will be grouped together if they have the same gene. This is useful if your library prep
+          generates PCR duplicates with non identical alignment positions such as CEL-Seq. Note this option
+          is hardcoded to be on with the count command. I.e. counting is always performed per-gene. Must be
+          combined with either --gene_tag or --per_contig option.
+      - name: --gene_tag
+        type: string
+        description: |
+          Deduplicate per gene. The gene information is encoded in the bam read tag specified.
+      - name: --assigned_status_tag
+        type: string
+        description: |
+          BAM tag which describes whether a read is assigned to a gene. Defaults to the same value as given
+          for --gene_tag.
+      - name: --skip_tags_regex
+        type: string
+        description: |
+          Use in conjunction with the --assigned_status_tag option to skip any reads where the tag matches
+          this regex. Default ("^[__|Unassigned]") matches anything which starts with "__" or "Unassigned".
+      - name: --per_contig
+        type: boolean_true
+        description: |
+          Deduplicate per contig (field 3 in BAM; RNAME). All reads with the sam contig will be considered to
+          have the same alignment position. This is useful if you have aligned to a reference transcriptome
+          with one transcript per gene. If you have aligned to a transcriptome with more than one transcript
+          per gene, you can supply a map between transcripts and gene using the --gene_transcript_map option.
+      - name: --gene_transcript_map
+        type: file
+        description: |
+          A file containing a mapping between gene names and transcript names. The file should be tab
+          separated with the gene name in the first column and the transcript name in the second column.
+      - name: --per_cell
+        type: boolean_true
+        description: |
+          Reads will only be grouped together if they have the same cell barcode. Can be combined with
+          --per_gene.
+  
+  - name: SAM/BAM Options
+    arguments:
+      - name: --mapping_quality
+        type: integer
+        description: |
+          Minimium mapping quality (MAPQ) for a read to be retained. Default: `0`.
+        example: 0
+      - name: --unmapped_reads
+        type: string
+        description: |
+          How unmapped reads should be handled. 
+          The options are:
+            * "discard": Discard all unmapped reads. (default)
+            * "use":     If read2 is unmapped, deduplicate using read1 only. Requires --paired.
+            * "output":  Output unmapped reads/read pairs without UMI grouping/deduplication. Only available in umi_tools group.
+        example: "discard"
+      - name: --chimeric_pairs
+        type: string
+        choices: [discard, use, output]
+        description: |
+          How chimeric pairs should be handled. 
+          The options are:
+            * "discard": Discard all chimeric read pairs.
+            * "use":     Deduplicate using read1 only. (default)
+            * "output":  Output chimeric pairs without UMI grouping/deduplication. Only available in
+                         umi_tools group.
+        example: "use"
+      - name: --unpaired_reads
+        type: string
+        choices: [discard, use, output]
+        description: |
+          How unpaired reads should be handled. 
+          The options are: 
+            * "discard": Discard all unmapped reads.
+            * "use": If read2 is unmapped, deduplicate using read1 only. Requires --paired. (default)
+            * "output":  Output unmapped reads/read pairs without UMI grouping/deduplication. Only available
+                         in umi_tools group.
+        example: "use"
+      - name: --ignore_umi
+        type: boolean_true
+        description: Ignore the UMI and group reads using mapping coordinates only.
+      - name: --subset
+        type: double
+        description: |
+          Only consider a fraction of the reads, chosen at random. This is useful for doing saturation
+          analyses.
+      - name: --chrom
+        type: string
+        description: Only consider a single chromosome. This is useful for debugging/testing purposes.
+  
+  - name: Group/Dedup Options
+    arguments:
+      - name: --no_sort_output
+        type: boolean_true
+        description: |
+          By default, output is sorted. This involves the use of a temporary unsorted file (saved in
+          --temp_dir). Use this option to turn off sorting.
+      - name: --buffer_whole_contig
+        type: boolean_true
+        description: |
+          Forces dedup to parse an entire contig before yielding any reads for deduplication. This is the
+          only way to absolutely guarantee that all reads with the same start position are grouped together
+          for deduplication since dedup uses the start position of the read, not the alignment coordinate on
+          which the reads are sorted. However, by default, dedup reads for another 1000bp before outputting
+          read groups which will avoid any reads being missed with short read sequencing (<1000bp).
+  
+  - name: Common Options
+    arguments:
+      - name: --log
+        alternatives: -L
+        type: file
+        description: File with logging information.
+      - name: --log2stderr
+        type: boolean_true
+        description: Send logging information to stderr.
+      - name: --verbose
+        alternatives: -v
+        type: integer
+        description: |
+          Log level. The higher, the more output. Default: `0`.
+        example: 0
+      - name: --error
+        alternatives: -E
+        type: file
+        description: File with error information.
+      - name: --temp_dir
+        type: string
+        description: |
+          Directory for temporary files. If not set, the bash environmental variable TMPDIR is used.
+      - name: --compresslevel
+        type: integer
+        description: |
+          Level of Gzip compression to use. Default=6 matches GNU gzip rather than python gzip default.
+          Default: `6`.
+        example: 6
+      - name: --timeit
+        type: file
+        description: Store timing information in file.
+      - name: --timeit_name
+        type: string
+        description: |
+          Name in timing file for this class of jobs. Default: `all`.
+        example: "all"
+      - name: --timeit_header
+        type: string
+        description: Add header for timing information.
+
+resources:
+  - type: bash_script
+    path: script.sh
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: test_data
+engines:
+  - type: docker
+    image: quay.io/biocontainers/umi_tools:1.1.5--py39hf95cd2a_1
+    setup:
+      - type: docker
+        run: |
+            umi_tools -v | sed 's/ version//g' > /var/software_versions.txt
+runners:
+- type: executable
+- type: nextflow
--- a/src/umi_tools/umi_tools_dedup/help.txt
+++ b/src/umi_tools/umi_tools_dedup/help.txt
@@ -0,0 +1,113 @@
+'''
+Generated from the following UMI-tools documentation:
+      https://umi-tools.readthedocs.io/en/latest/common_options.html#common-options
+      https://umi-tools.readthedocs.io/en/latest/reference/dedup.html
+'''
+
+
+dedup - Deduplicate reads using UMI and mapping coordinates
+
+Usage: umi_tools dedup [OPTIONS] [--stdin=IN_BAM] [--stdout=OUT_BAM]
+
+       note: If --stdout is ommited, standard out is output. To
+             generate a valid BAM file on standard out, please
+             redirect log with --log=LOGFILE or --log2stderr 
+
+Common UMI-tools Options:
+
+      -S, --stdout                  File where output is to go [default = stdout].
+      -L, --log                     File with logging information [default = stdout].
+      --log2stderr                  Send logging information to stderr [default = False].
+      -v, --verbose                 Log level. The higher, the more output [default = 1].
+      -E, --error                   File with error information [default = stderr].
+      --temp-dir                    Directory for temporary files. If not set, the bash environmental variable TMPDIR is used[default = None].
+      --compresslevel               Level of Gzip compression to use. Default=6 matches GNU gzip rather than python gzip default (which is 9)
+
+      profiling and debugging options:
+      --timeit                      Store timing information in file [default=none].
+      --timeit-name                 Name in timing file for this class of jobs [default=all].
+      --timeit-header               Add header for timing information [default=none].
+      --random-seed                 Random seed to initialize number generator with [default=none].
+
+Dedup Options:
+      --output-stats=<prefix>             One can use the edit distance between UMIs at the same position as an quality control for the 
+                                          deduplication process by comparing with a null expectation of random sampling. For the random
+                                          sampling, the observed frequency of UMIs is used to more reasonably model the null expectation.
+                                          Use this option to generate a stats outfiles called: 
+                                                [PREFIX]_stats_edit_distance.tsv   
+                                                      Reports the (binned) average edit distance between the UMIs at each position.
+                                          In addition, this option will trigger reporting of further summary statistics for the UMIs which
+                                          may be informative for selecting the optimal deduplication method or debugging.
+                                          Each unique UMI sequence may be observed [0-many] times at multiple positions in the BAM. The
+                                          following files report the distribution for the frequencies of each UMI.
+                                                [PREFIX]_stats_per_umi_per_position.tsv
+                                                      Tabulates the counts for unique combinations of UMI and position.
+                                                [PREFIX]_stats_per_umi_per.tsv
+                                                      The _stats_per_umi_per.tsv table provides UMI-level summary statistics. 
+      --extract-umi-method=<method>       How are the barcodes encoded in the read?
+                                          Options are: read_id (default), tag, umis
+      --umi-separator=<separator>         Separator between read id and UMI. See --extract-umi-method above. Default=_
+      --umi-tag=<tag>                     Tag which contains UMI. See --extract-umi-method above
+      --umi-tag-split=<split>             Separate the UMI in tag by SPLIT and take the first element
+      --umi-tag-delimiter=<delimiter>     Separate the UMI in by DELIMITER and concatenate the elements
+      --cell-tag=<tag>                    Tag which contains cell barcode. See --extract-umi-method above
+      --cell-tag-split=<split>            Separate the cell barcode in tag by SPLIT and take the first element
+      --cell-tag-delimiter=<delimiter>    Separate the cell barcode in by DELIMITER and concatenate the elements
+      --method=<method>                   What method to use to identify group of reads with the same (or similar) UMI(s)?
+                                          All methods start by identifying the reads with the same mapping position.
+                                          The simplest methods, unique and percentile, group reads with the exact same UMI.
+                                          The network-based methods, cluster, adjacency and directional, build networks where
+                                          nodes are UMIs and edges connect UMIs with an edit distance <= threshold (usually 1).
+                                          The groups of reads are then defined from the network in a method-specific manner.
+                                          For all the network-based methods, each read group is equivalent to one read count for the gene.
+      --edit-distance-threshold=<threshold>     For the adjacency and cluster methods the threshold for the edit distance to connect
+                                                two UMIs in the network can be increased. The default value of 1 works best unless
+                                                the UMI is very long (>14bp).
+      --spliced-is-unique           Causes two reads that start in the same position on the same strand and having the
+                                    same UMI to be considered unique if one is spliced and the other is not.
+                                    (Uses the 'N' cigar operation to test for splicing).
+      --soft-clip-threshold=<threshold>    Mappers that soft clip will sometimes do so rather than mapping a spliced read if
+                                          there is only a small overhang over the exon junction. By setting this option, you
+                                          can treat reads with at least this many bases soft-clipped at the 3' end as spliced.
+                                          Default=4.
+      --multimapping-detection-method=<method>  If the sam/bam contains tags to identify multimapping reads, you can specify
+                                                for use when selecting the best read at a given loci. Supported tags are "NH",
+                                                "X0" and "XT". If not specified, the read with the highest mapping quality will be selected.
+      --read-length                              Use the read length as a criteria when deduping, for e.g sRNA-Seq.
+      --per-gene                    Reads will be grouped together if they have the same gene. This is useful if your
+                                    library prep generates PCR duplicates with non identical alignment positions such as CEL-Seq.
+                                    Note this option is hardcoded to be on with the count command. I.e counting is always
+                                    performed per-gene. Must be combined with either --gene-tag or --per-contig option.
+      --gene-tag=<tag>              Deduplicate per gene. The gene information is encoded in the bam read tag specified
+      --assigned-status-tag=<tag>   BAM tag which describes whether a read is assigned to a gene. Defaults to the same value
+                                    as given for --gene-tag
+      --skip-tags-regex=<regex>     Use in conjunction with the --assigned-status-tag option to skip any reads where the
+                                    tag matches this regex. Default ("^[__|Unassigned]") matches anything which starts with "__"
+                                    or "Unassigned":
+      --per-contig                  Deduplicate per contig (field 3 in BAM; RNAME). All reads with the same contig will be
+                                    considered to have the same alignment position. This is useful if you have aligned to a
+                                    reference transcriptome with one transcript per gene. If you have aligned to a transcriptome
+                                    with more than one transcript per gene, you can supply a map between transcripts and gene
+                                    using the --gene-transcript-map option
+      --gene-transcript-map=<file>  File mapping genes to transcripts (tab separated)
+      --per-cell                    Reads will only be grouped together if they have the same cell barcode. Can be combined with --per-gene.
+      --mapping-quality=<quality>   Minimium mapping quality (MAPQ) for a read to be retained. Default is 0.
+      --unmapped-reads=<option>     How should unmapped reads be handled.
+      --chimeric-pairs=<option>     How should chimeric read pairs be handled.
+      --unpaired-reads=<option>     How should unpaired reads be handled.
+      --ignore-umi                  Ignore the UMI and group reads using mapping coordinates only
+      --subset=<fraction>           Only consider a fraction of the reads, chosen at random. This is useful for doing saturation analyses.
+      --chrom=<chromosome>          Only consider a single chromosome. This is useful for debugging/testing purposes
+      --in-sam                      Input is in SAM format
+      --out-sam                     Output is in SAM format
+      --paired                      BAM is paired end - output both read pairs. This will also force the use of the template
+                                    length to determine reads with the same mapping coordinates.
+      --no-sort-output              By default, output is sorted. This involves the use of a temporary unsorted file since
+                                    reads are considered in the order of their start position which may not be the same as
+                                    their alignment coordinate due to soft-clipping and reverse alignments. The temp file
+                                    will be saved (in --temp-dir) and deleted when it has been sorted to the outfile. Use
+                                    this option to turn off sorting.
+      --buffer-whole-contig         Forces dedup to parse an entire contig before yielding any reads for deduplication.
+                                    This is the only way to absolutely guarantee that all reads with the same start position
+                                    are grouped together for deduplication since dedup uses the start position of the read,
+                                    not the alignment coordinate on which the reads are
--- a/src/umi_tools/umi_tools_dedup/script.sh
+++ b/src/umi_tools/umi_tools_dedup/script.sh
@@ -0,0 +1,72 @@
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+set -e
+
+test_dir="${metal_executable}/test_data"
+
+[[ "$par_paired" == "false" ]] && unset par_paired
+[[ "$par_in_sam" == "false" ]] && unset par_in_sam
+[[ "$par_out_sam" == "false" ]] && unset par_out_sam
+[[ "$par_spliced_is_unique" == "false" ]] && unset par_spliced_is_unique
+[[ "$par_per_gene" == "false" ]] && unset par_per_gene
+[[ "$par_per_contig" == "false" ]] && unset par_per_contig
+[[ "$par_per_cell" == "false" ]] && unset par_per_cell
+[[ "$par_no_sort_output" == "false" ]] && unset par_no_sort_output
+[[ "$par_buffer_whole_contig" == "false" ]] && unset par_buffer_whole_contig
+[[ "$par_ignore_umi" == "false" ]] && unset par_ignore_umi
+[[ "$par_subset" == "false" ]] && unset par_subset
+[[ "$par_log2stderr" == "false" ]] && unset par_log2stderr
+[[ "$par_read_length" == "false" ]] && unset par_read_length
+
+umi_tools dedup \
+    --stdin "$par_input" \
+    ${par_in_sam:+--in-sam} \
+    --stdout "$par_output" \
+    ${par_out_sam:+--out-sam} \
+    ${par_paired:+--paired} \
+    ${par_output_stats:+--output-stats "$par_output_stats"} \
+    ${par_extract_umi_method:+--extract-umi-method "$par_extract_umi_method"} \
+    ${par_umi_tag:+--umi-tag "$par_umi_tag"} \
+    ${par_umi_separator:+--umi-separator "$par_umi_separator"} \
+    ${par_umi_tag_split:+--umi-tag-split "$par_umi_tag_split"} \
+    ${par_umi_tag_delimiter:+--umi-tag-delimiter "$par_umi_tag_delimiter"} \
+    ${par_cell_tag:+--cell-tag "$par_cell_tag"} \
+    ${par_cell_tag_split:+--cell-tag-split "$par_cell_tag_split"} \
+    ${par_cell_tag_delimiter:+--cell-tag-delimiter "$par_cell_tag_delimiter"} \
+    ${par_method:+--method "$par_method"} \
+    ${par_edit_distance_threshold:+--edit-distance-threshold "$par_edit_distance_threshold"} \
+    ${par_spliced_is_unique:+--spliced-is-unique} \
+    ${par_soft_clip_threshold:+--soft-clip-threshold "$par_soft_clip_threshold"} \
+    ${par_multimapping_detection_method:+--multimapping-detection-method "$par_multimapping_detection_method"} \
+    ${par_read_length:+--read-length} \
+    ${par_per_gene:+--per-gene} \
+    ${par_gene_tag:+--gene-tag "$par_gene_tag"} \
+    ${par_assigned_status_tag:+--assigned-status-tag "$par_assigned_status_tag"} \
+    ${par_skip_tags_regex:+--skip-tags-regex "$par_skip_tags_regex"} \
+    ${par_per_contig:+--per-contig} \
+    ${par_gene_transcript_map:+--gene-transcript-map "$par_gene_transcript_map"} \
+    ${par_per_cell:+--per-cell} \
+    ${par_mapping_quality:+--mapping-quality "$par_mapping_quality"} \
+    ${par_unmapped_reads:+--unmapped-reads "$par_unmapped_reads"} \
+    ${par_chimeric_pairs:+--chimeric-pairs "$par_chimeric_pairs"} \
+    ${par_unpaired_reads:+--unpaired-reads "$par_unpaired_reads"} \
+    ${par_ignore_umi:+--ignore-umi} \
+    ${par_subset:+--subset "$par_subset"} \
+    ${par_chrom:+--chrom "$par_chrom"} \
+    ${par_no_sort_output:+--no-sort-output} \
+    ${par_buffer_whole_contig:+--buffer-whole-contig} \
+    ${par_log:+-L "$par_log"} \
+    ${par_log2stderr:+--log2stderr} \
+    ${par_verbose:+-v "$par_verbose"} \
+    ${par_error:+-E "$par_error"} \
+    ${par_temp_dir:+--temp-dir "$par_temp_dir"} \
+    ${par_compresslevel:+--compresslevel "$par_compresslevel"} \
+    ${par_timeit:+--timeit "$par_timeit"} \
+    ${par_timeit_name:+--timeit-name "$par_timeit_name"} \
+    ${par_timeit_header:+--timeit-header "$par_timeit_header"} \
+    ${par_random_seed:+--random-seed "$par_random_seed"}
+
+exit 0
--- a/src/umi_tools/umi_tools_dedup/test.sh
+++ b/src/umi_tools/umi_tools_dedup/test.sh
@@ -0,0 +1,87 @@
+#!/bin/bash
+
+test_dir="${meta_resources_dir}/test_data"
+out_dir="${meta_resources_dir}/out"
+
+mkdir -p "$out_dir"
+
+############################################################################################
+
+echo ">>> Test 1: Basic usage of $meta_functionality_name with statistics output"
+
+"$meta_executable" \
+  --paired \
+  --input "$test_dir/sample.bam" \
+  --bai "$test_dir/sample.bam.bai" \
+  --output "$out_dir/deduped.sam" \
+  --out_sam \
+  --output_stats "$out_dir/dedup" \
+  --random_seed 1
+
+echo ">>> Checking whether output exists"
+[ ! -f "$out_dir/deduped.sam" ] && echo "File 'deduped.sam' does not exist!" && exit 1
+[ ! -f "$out_dir/dedup_edit_distance.tsv" ] && echo "File 'dedup_edit_distance.tsv' does not exist!" && exit 1
+
+echo ">>> Checking whether output is non-empty"
+[ ! -s "$out_dir/deduped.sam" ] && echo "File 'deduped.sam' is empty!" && exit 1
+[ ! -s "$out_dir/dedup_edit_distance.tsv" ] && echo "File 'dedup_edit_distance.tsv' is empty!" && exit 1
+
+echo ">>> Checking whether output is correct"
+diff "$out_dir/deduped.sam" "$test_dir/deduped.sam" || \
+    (echo "Output file deduped.sam does not match expected output" && exit 1)
+diff "$out_dir/dedup_edit_distance.tsv" "$test_dir/dedup_edit_distance.tsv" || \
+    (echo "Output file dedup_edit_distance.tsv does not match expected output" && exit 1)
+
+############################################################################################
+
+echo ">>> Test 2: $meta_functionality_name with random subset selection"
+
+"$meta_executable" \
+  --paired \
+  --input "$test_dir/sample.bam" \
+  --bai "$test_dir/sample.bam.bai" \
+  --output "$out_dir/deduped_fraction.sam" \
+  --out_sam \
+  --subset 0.5 \
+  --random_seed 1
+
+
+echo ">>> Checking whether output exists"
+[ ! -f "$out_dir/deduped_fraction.sam" ] && echo "File 'deduped_fraction.sam' does not exist!" && exit 1
+
+echo ">>> Checking whether output is non-empty"
+[ ! -s "$out_dir/deduped_fraction.sam" ] && echo "File 'deduped_fraction.sam' is empty!" && exit 1
+
+echo ">>> Checking whether output is correct"
+diff "$out_dir/deduped_fraction.sam" "$test_dir/deduped_fraction.sam" || \
+    (echo "Output file deduped_fraction.sam does not match expected output" && exit 1)
+
+############################################################################################
+
+echo ">>> Test 3: $meta_functionality_name with --method unique"
+
+"$meta_executable" \
+  --paired \
+  --input "$test_dir/sample.bam" \
+  --bai "$test_dir/sample.bam.bai" \
+  --output "$out_dir/deduped_unique.sam" \
+  --out_sam \
+  --method "unique" \
+  --random_seed 1
+
+echo ">>> Checking whether output exists"
+[ ! -f "$out_dir/deduped_unique.sam" ] && echo "File 'deduped_unique.sam' does not exist!" && exit 1
+
+echo ">>> Checking whether output is non-empty"
+[ ! -s "$out_dir/deduped_unique.sam" ] && echo "File 'deduped_unique.sam' is empty!" && exit 1
+
+echo ">>> Checking whether output is correct"
+diff "$out_dir/deduped_unique.sam" "$test_dir/deduped_unique.sam" || \
+    (echo "Output file deduped_unique.sam does not match expected output" && exit 1)
+
+############################################################################################
+
+rm -rf "$out_dir"
+
+echo "All tests succeeded!"
+exit 0
--- a/src/umi_tools/umi_tools_dedup/test_data/dedup_edit_distance.tsv
+++ b/src/umi_tools/umi_tools_dedup/test_data/dedup_edit_distance.tsv
@@ -0,0 +1,5 @@
+unique	unique_null	directional	directional_null	edit_distance
+3	3	4	4	Single_UMI
+0	1	0	0	0
+1	0	0	0	1
+0	0	0	0	2
--- a/src/umi_tools/umi_tools_dedup/test_data/dedup_per_umi.tsv
+++ b/src/umi_tools/umi_tools_dedup/test_data/dedup_per_umi.tsv
@@ -0,0 +1,6 @@
+UMI	median_counts_pre	times_observed_pre	total_counts_pre	median_counts_post	times_observed_post	total_counts_post
+ACCGGTTTA	74	1	74	74	1	74
+ACTGGTTTC	48	1	48	49	1	49
+AGCGGTTAC	1	1	1	1	1	1
+CCAGGTTCT	1	1	1	1	1	1
+TCTGGTTTC	1	1	1	0	0	0
--- a/Show More
+++ b/Show More