# Contributing guidelines

We encourage contributions from the community. To contribute:

1. **Fork the Repository**: Start by forking this repository to your account.
2. **Develop Your Component**: Create your Viash component, ensuring it aligns with our best practices (detailed below).
3. **Submit a Pull Request**: After testing your component, submit a pull request for review.

## Procedure for adding a component

### Step 1: Find a component to contribute

* Find a tool to contribute to this repo.

* Check whether it is already in the [Project board](https://github.com/orgs/viash-hub/projects/1).

* Check whether there is a corresponding [Snakemake wrapper](https://github.com/snakemake/snakemake-wrappers/blob/master/bio) or [nf-core module](https://github.com/nf-core/modules/tree/master/modules/nf-core) which we can use as inspiration.

* Create an issue to show that you are working on this component.

### Step 2: Add config template

Create a file at `src/xxx/config.vsh.yaml` with the contents below, changing all occurrences of `xxx` to the name of the component.

```yaml
name: xxx
description: xxx
keywords: [tag1, tag2]
links:
  homepage: yyy
  documentation: yyy
  issue_tracker: yyy
  repository: yyy
references:
  doi: 12345/12345678.yz
license: MIT/Apache-2.0/GPL-3.0/...
argument_groups:
  - name: Inputs
    arguments: <...>
  - name: Outputs
    arguments: <...>
  - name: Arguments
    arguments: <...>
resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: test_data
engines:
  - <...>
runners:
  - type: executable
  - type: nextflow
```
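
The placeholder substitution can be scripted. The following is a minimal sketch, not part of the repository: `fill_template` is a hypothetical helper, and it assumes the template above was saved to a file and that component names contain only letters, digits, and underscores (other characters could confuse `sed`).

```bash
#!/bin/bash

# Hypothetical helper: copy a config template and replace the `xxx` placeholder.
# Usage: fill_template <component_name> <template_file> <output_file>
fill_template() {
  local name="$1" template="$2" output="$3"
  # Create the component directory (e.g. src/<name>) if it does not exist yet.
  mkdir -p "$(dirname "$output")"
  # Replace every occurrence of the literal placeholder `xxx` with the name.
  sed "s/xxx/${name}/g" "$template" > "$output"
}
```

For example, `fill_template arriba template.vsh.yaml src/arriba/config.vsh.yaml` would write a config in which every `xxx` reads `arriba`. The `yyy` placeholders still need to be filled in by hand.
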

### Step 3: Fill in the metadata

Fill in the relevant metadata fields in the config. Here is an example of the metadata of an existing component.

```yaml
name: arriba
description: Detect gene fusions from RNA-Seq data
keywords: [Gene fusion, RNA-Seq]
links:
  homepage: https://arriba.readthedocs.io/en/latest/
  documentation: https://arriba.readthedocs.io/en/latest/
  repository: https://github.com/suhrig/arriba
  issue_tracker: https://github.com/suhrig/arriba/issues
references:
  doi: 10.1101/gr.257246.119
  bibtex: |
    @article{
      ... a bibtex entry in case the doi is not available ...
    }
license: MIT
```

### Step 4: Find a suitable container

Search for `biocontainer <name of component>` and pick the most suitable container. Typically the link will be `https://quay.io/repository/biocontainers/xxx?tab=tags`.

If no such container is found, you can create a custom container in Step 10.

### Step 5: Create help file

To help develop the component, we store the `--help` output of the tool in a file at `src/xxx/help.txt`.

````bash
# The quoted 'EOF' keeps the shell from treating the backticks as command substitution
cat <<'EOF' > src/xxx/help.txt
```sh
xxx --help
```
EOF

docker run quay.io/biocontainers/xxx:tag xxx --help >> src/xxx/help.txt
````

Notes:

* This help file has no functional purpose, but it is useful for the developer to see the help output of the tool.

* Some tools might not have a `--help` argument but instead have a `-h` argument. For example, for `arriba`, the help message is obtained by running `arriba -h`:

  ```bash
  docker run quay.io/biocontainers/arriba:2.4.0--h0033a41_2 arriba -h
  ```
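
The `--help` versus `-h` fallback can be sketched as a small function. This is an illustration only: `capture_help` is a hypothetical helper, and it assumes the tool exits with code 0 when printing its help. Tools that print help but exit with a non-zero code still need to be handled by hand.

```bash
#!/bin/bash

# Hypothetical helper: print a tool's help text, trying `--help` first and `-h` second.
# All arguments form the command to run, e.g. `capture_help arriba`.
capture_help() {
  local flag output
  for flag in --help -h; do
    # Some tools print their help on stderr, so capture both streams.
    if output="$("$@" "$flag" 2>&1)" && [ -n "$output" ]; then
      printf '%s\n' "$output"
      return 0
    fi
  done
  echo "No help output found for: $*" >&2
  return 1
}
```

With Docker, this could be called as `capture_help docker run quay.io/biocontainers/xxx:tag xxx >> src/xxx/help.txt`.
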

### Step 6: Create or fetch test data

Test data makes it much easier to develop and test the component. In most cases, we can use the test data from the Snakemake wrappers.

To make sure we can reproduce the test data in the future, we store the command to fetch the test data in a file at `src/xxx/test_data/script.sh`.

```bash
# make sure the target directory exists before writing the script
mkdir -p src/xxx/test_data

cat <<'EOF' > src/xxx/test_data/script.sh

# clone repo
if [ ! -d /tmp/snakemake-wrappers ]; then
  git clone --depth 1 --single-branch --branch master https://github.com/snakemake/snakemake-wrappers /tmp/snakemake-wrappers
fi

# copy test data
cp -r /tmp/snakemake-wrappers/bio/xxx/test/* src/xxx/test_data
EOF
```

The test data should be suitable for testing this component. Ensure that the test data is small enough: ideally <1KB, preferably <10KB, if need be <100KB.
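
The size guideline can be checked with a few lines of Bash. The sketch below is an assumption-laden illustration: `check_test_data_size` is a hypothetical helper, and it interprets 1KB as 1024 bytes, which the guideline does not specify.

```bash
#!/bin/bash

# Hypothetical helper: compare the total size of a test data directory
# with the guideline (<1KB ideal, <10KB preferred, <100KB at most).
check_test_data_size() {
  local dir="$1" bytes
  # Sum the sizes of all regular files below the directory.
  bytes="$(find "$dir" -type f -exec cat {} + | wc -c)"
  bytes="$((bytes))"  # normalise: some wc implementations pad with spaces
  if [ "$bytes" -lt 1024 ]; then
    echo "ideal: $bytes bytes"
  elif [ "$bytes" -lt 10240 ]; then
    echo "preferred: $bytes bytes"
  elif [ "$bytes" -lt 102400 ]; then
    echo "acceptable: $bytes bytes"
  else
    echo "too large: $bytes bytes" >&2
    return 1
  fi
}
```

Running `check_test_data_size src/xxx/test_data` prints the category and returns a non-zero exit code when the data exceeds the upper limit.
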

### Step 7: Add arguments for the input files

By looking at the help file, we add the input arguments to the config file. For instance, in the [arriba help file](src/arriba/help.txt), we see the following:

    Usage: arriba [-c Chimeric.out.sam] -x Aligned.out.bam \
                  -g annotation.gtf -a assembly.fa [-b blacklists.tsv] [-k known_fusions.tsv] \
                  [-t tags.tsv] [-p protein_domains.gff3] [-d structural_variants_from_WGS.tsv] \
                  -o fusions.tsv [-O fusions.discarded.tsv] \
                  [OPTIONS]

     -x FILE  File in SAM/BAM/CRAM format with main alignments as generated by STAR
              (Aligned.out.sam). Arriba extracts candidate reads from this file.

Based on this information, we can add the following input arguments to the config file.

```yaml
argument_groups:
  - name: Inputs
    arguments:
      - name: --bam
        alternatives: -x
        type: file
        description: |
          File in SAM/BAM/CRAM format with main alignments as generated by STAR
          (Aligned.out.sam). Arriba extracts candidate reads from this file.
        required: true
        example: Aligned.out.bam
```

Check the [documentation](https://viash.io/reference/config/functionality/arguments) for more information on the format of input arguments.

Several notes:

* Argument names should be formatted in `--snake_case`. This means arguments like `--foo-bar` should be formatted as `--foo_bar`, and short arguments like `-f` should receive a longer name like `--foo`.

* Input arguments can have `multiple: true` to allow the user to specify multiple files.
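
The naming rule can be illustrated with a short function. `to_snake_case_arg` is a hypothetical helper written for this guide; it only converts dashes and deliberately refuses short flags, because a descriptive long name has to be chosen by a person.

```bash
#!/bin/bash

# Hypothetical helper: convert a tool's long flag to the `--snake_case` convention.
to_snake_case_arg() {
  local name="$1"
  # Strip all leading dashes.
  while [ "${name#-}" != "$name" ]; do
    name="${name#-}"
  done
  # A single character means the flag was a short one such as `-f`.
  if [ "${#name}" -le 1 ]; then
    echo "Short flag '-${name}' needs a descriptive long name chosen by hand" >&2
    return 1
  fi
  # Replace the remaining dashes with underscores and restore the `--` prefix.
  printf -- '--%s\n' "$(printf '%s' "$name" | tr '-' '_')"
}
```

For example, `to_snake_case_arg --foo-bar` prints `--foo_bar`, while `to_snake_case_arg -f` fails with a message.
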

### Step 8: Add arguments for the output files

By looking at the help file, we now also add output arguments to the config file.

For example, in the [arriba help file](src/arriba/help.txt), we see the following:

    Usage: arriba [-c Chimeric.out.sam] -x Aligned.out.bam \
                  -g annotation.gtf -a assembly.fa [-b blacklists.tsv] [-k known_fusions.tsv] \
                  [-t tags.tsv] [-p protein_domains.gff3] [-d structural_variants_from_WGS.tsv] \
                  -o fusions.tsv [-O fusions.discarded.tsv] \
                  [OPTIONS]

     -o FILE  Output file with fusions that have passed all filters.

     -O FILE  Output file with fusions that were discarded due to filtering.

Based on this information, we can add the following output arguments to the config file.

```yaml
argument_groups:
  - name: Outputs
    arguments:
      - name: --fusions
        alternatives: -o
        type: file
        direction: output
        description: |
          Output file with fusions that have passed all filters.
        required: true
        example: fusions.tsv
      - name: --fusions_discarded
        alternatives: -O
        type: file
        direction: output
        description: |
          Output file with fusions that were discarded due to filtering.
        required: false
        example: fusions.discarded.tsv
```

Note:

* Preferably, these outputs should not be directories but files. For example, if a tool outputs a directory `foo/` containing files `foo/bar.txt` and `foo/baz.txt`, there should be two output arguments `--bar` and `--baz` (as opposed to one output argument which outputs the whole `foo/` directory).
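
One way to expose a tool's output directory as separate files is to let the tool write into a temporary directory and then move each file to its own output argument. The sketch below assumes hypothetical output arguments `--bar` and `--baz` (available as `$par_bar` and `$par_baz`); `run_tool` is a stand-in for the real tool so that the example runs on its own.

```bash
#!/bin/bash

# Stand-in for a real tool that writes bar.txt and baz.txt into the directory "$1".
run_tool() {
  echo "bar" > "$1/bar.txt"
  echo "baz" > "$1/baz.txt"
}

# Run the tool in a temporary directory and move each file to its own output.
split_directory_output() {
  local tmp_dir status
  tmp_dir="$(mktemp -d)"
  if ! run_tool "$tmp_dir"; then
    rm -rf "$tmp_dir"
    return 1
  fi
  mv "$tmp_dir/bar.txt" "$par_bar" && mv "$tmp_dir/baz.txt" "$par_baz"
  status=$?
  # Clean up the temporary directory whether or not the moves succeeded.
  rm -rf "$tmp_dir"
  return "$status"
}
```

In a real `script.sh`, the body of `split_directory_output` would follow the `## VIASH END` line, with `run_tool` replaced by the actual command.
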

### Step 9: Add the remaining arguments

Finally, add all other arguments to the config file. There are a few exceptions:

* Arguments related to specifying CPU and memory requirements are handled separately and should not be added to the config file.

* Arguments that only print information, such as the version (`-v`, `--version`) or the help (`-h`, `--help`), should not be added to the config file.
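
As a sketch of how CPU and memory settings can be handled separately: Viash exposes the requested resources to the script as meta variables such as `$meta_cpus` and `$meta_memory_gb`, which can be forwarded to the tool when they are set. The flags `--threads` and `--memory` below are hypothetical and depend on the wrapped tool; `xxx_stub` is a stand-in for the tool that only prints its arguments.

```bash
#!/bin/bash

# Stand-in for the real tool: prints the arguments it received on one line.
xxx_stub() { printf '%s\n' "$*"; }

# Forward resource limits only when the corresponding meta variable is set.
run_with_resources() {
  xxx_stub \
    --input "$par_input" \
    ${meta_cpus:+--threads "$meta_cpus"} \
    ${meta_memory_gb:+--memory "${meta_memory_gb}G"}
}
```

Each `${var:+...}` expansion produces nothing when the variable is unset or empty, so the tool falls back to its own defaults in that case.
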

### Step 10: Add a Docker engine

To ensure reproducibility of components, we require that all components are run in a Docker container.

```yaml
engines:
  - type: docker
    image: quay.io/biocontainers/xxx:0.1.0--py_0
```

The container should have your tool installed, as well as `ps`.
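
Whether a container provides the required commands can be checked from a shell inside it. The following is a minimal sketch; `check_commands` is a hypothetical helper, not something Viash provides.

```bash
#!/bin/bash

# Hypothetical helper: verify that each named command is available on the PATH.
check_commands() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "Missing command: $cmd" >&2
      missing=1
    fi
  done
  # Non-zero when at least one command was not found.
  return "$missing"
}
```

Inside the container, `check_commands ps xxx` would report any missing command and return a non-zero exit code.
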

If you didn't find a suitable container in Step 4, you can create a custom container. For example:

```yaml
engines:
  - type: docker
    image: python:3.10
    setup:
      - type: python
        packages: numpy
```

For more information on how to do this, see the [documentation](https://viash.io/guide/component/add-dependencies.html#steps-for-creating-a-custom-docker-platform).

Here is a list of base containers we can recommend:

* Bash: [`bash`](https://hub.docker.com/_/bash), [`ubuntu`](https://hub.docker.com/_/ubuntu)
* C#: [`ghcr.io/data-intuitive/dotnet-script`](https://github.com/data-intuitive/ghcr-dotnet-script/pkgs/container/dotnet-script)
* JavaScript: [`node`](https://hub.docker.com/_/node)
* Python: [`python`](https://hub.docker.com/_/python), [`nvcr.io/nvidia/pytorch`](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)
* R: [`eddelbuettel/r2u`](https://hub.docker.com/r/eddelbuettel/r2u), [`rocker/tidyverse`](https://hub.docker.com/r/rocker/tidyverse)
* Scala: [`sbtscala/scala-sbt`](https://hub.docker.com/r/sbtscala/scala-sbt)

### Step 11: Write a runner script

Next, create a Bash script at `src/xxx/script.sh` that runs the tool with the arguments defined in the config.

```bash
#!/bin/bash

## VIASH START
## VIASH END

xxx \
  --input "$par_input" \
  --output "$par_output" \
  $([ "$par_option" = "true" ] && echo "--option")
```

When building a Viash component, Viash will automatically replace the `## VIASH START` and `## VIASH END` lines (and anything in between) with code that defines variables such as `$par_input`, based on the arguments specified in the config.

As an example, this is what the Bash script for the `arriba` component looks like:

```bash
#!/bin/bash

## VIASH START
## VIASH END

arriba \
  -x "$par_bam" \
  -a "$par_genome" \
  -g "$par_gene_annotation" \
  -o "$par_fusions" \
  ${par_known_fusions:+-k "${par_known_fusions}"} \
  ${par_blacklist:+-b "${par_blacklist}"} \
  ${par_structural_variants:+-d "${par_structural_variants}"} \
  $([ "$par_skip_duplicate_marking" = "true" ] && echo "-u") \
  $([ "$par_extra_information" = "true" ] && echo "-X") \
  $([ "$par_fill_gaps" = "true" ] && echo "-I")
```
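
When a component has many optional arguments, collecting them in a Bash array is one alternative to the inline expansions shown above. This is only a sketch of that alternative, not how the existing components are written: the `par_*` values are hypothetical stand-ins for what Viash would define, `--threshold` is an invented option, and the `xxx` function replaces the real tool so that the example runs on its own.

```bash
#!/bin/bash

# Stand-in for the real tool: prints each argument it received on its own line.
xxx() { printf '%s\n' "$@"; }

# Hypothetical values that Viash would normally define between VIASH START and VIASH END.
par_input="in put.txt"
par_output="out.txt"
par_option="true"
par_threshold=""

# Required arguments are always present.
args=(--input "$par_input" --output "$par_output")

# Boolean flag: add it only when the argument is set to "true".
if [ "$par_option" = "true" ]; then
  args+=(--option)
fi

# Optional value: add the flag and its value only when the value is non-empty.
if [ -n "$par_threshold" ]; then
  args+=(--threshold "$par_threshold")
fi

# Quoting the array expansion keeps values with spaces intact.
xxx "${args[@]}"
```

Because every element is quoted when the array is expanded, a value such as `in put.txt` reaches the tool as a single argument.
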

### Step 12: Create test script

If the unit test requires test resources, these should be provided in the `test_resources` section of the component.

```yaml
# ...
test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: test_data
```

Create a test script at `src/xxx/test.sh`. This script should run the component (available as `$meta_executable`) with the test data and check whether the output is as expected. The script should exit with a non-zero exit code if the output is not as expected. For example:

```bash
#!/bin/bash

## VIASH START
## VIASH END

echo "> Run xxx with test data"
"$meta_executable" \
  --input "$meta_resources_dir/test_data/input.txt" \
  --output "output.txt" \
  --option

echo ">> Checking output"
[ ! -f "output.txt" ] && echo "Output file output.txt does not exist" && exit 1

# End with a command that succeeds; otherwise the failed `[ ... ]` check above
# becomes the script's exit code even when the output file exists.
echo "> Test successful"
```

This is what the test script for the `arriba` component looks like:

```bash
#!/bin/bash

## VIASH START
## VIASH END

echo "> Run arriba with blacklist"
"$meta_executable" \
  --bam "$meta_resources_dir/test_data/A.bam" \
  --genome "$meta_resources_dir/test_data/genome.fasta" \
  --gene_annotation "$meta_resources_dir/test_data/annotation.gtf" \
  --blacklist "$meta_resources_dir/test_data/blacklist.tsv" \
  --fusions "fusions.tsv" \
  --fusions_discarded "fusions_discarded.tsv" \
  --interesting_contigs "1,2"

echo ">> Checking output"
[ ! -f "fusions.tsv" ] && echo "Output file fusions.tsv does not exist" && exit 1
[ ! -f "fusions_discarded.tsv" ] && echo "Output file fusions_discarded.tsv does not exist" && exit 1

echo ">> Check if output is empty"
[ ! -s "fusions.tsv" ] && echo "Output file fusions.tsv is empty" && exit 1
[ ! -s "fusions_discarded.tsv" ] && echo "Output file fusions_discarded.tsv is empty" && exit 1

# End with a command that succeeds so that a passing test exits with code 0.
echo "> Test successful"
```
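
The repeated checks can be wrapped in small functions that return a non-zero code on failure. The helpers below are hypothetical (they are not provided by Viash) and are meant as a sketch of how a test script could be organised.

```bash
#!/bin/bash

# Hypothetical helpers: each prints a message and returns non-zero when a check fails.
assert_file_exists() {
  [ -f "$1" ] || { echo "Output file $1 does not exist" >&2; return 1; }
}

assert_file_not_empty() {
  [ -s "$1" ] || { echo "Output file $1 is empty" >&2; return 1; }
}

# Checks for a literal string (not a regular expression) in the file.
assert_file_contains() {
  grep -qF -- "$2" "$1" || { echo "Output file $1 does not contain '$2'" >&2; return 1; }
}
```

In a test script, each call would be followed by `|| exit 1`, for example `assert_file_not_empty "fusions.tsv" || exit 1`. Because the helpers return 0 on success, a passing check as the last line no longer leaves a non-zero exit code behind.
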

### Step 13: Create a `/var/software_versions.txt` file

For the sake of transparency and reproducibility, we require that the versions of the software used in the component are documented.

For now, this is managed by creating a file `/var/software_versions.txt` in the `setup` section of the Docker engine.

```yaml
engines:
  - type: docker
    image: quay.io/biocontainers/xxx:0.1.0--py_0
    setup:
      - type: docker
        run: |
          echo "xxx: \"0.1.0\"" > /var/software_versions.txt
```
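
Instead of hard-coding the version, it may be possible to derive it from the tool's own output. The sketch below is an assumption, not an established convention of this repository: `extract_version` and `format_version_line` are hypothetical helpers, and the approach only works for tools that print a dotted version number such as `0.1.0`.

```bash
#!/bin/bash

# Hypothetical helper: print the first dotted version number (e.g. 0.1.0) found on stdin.
extract_version() {
  grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Hypothetical helper: format one line in the `name: "version"` style used above.
format_version_line() {
  printf '%s: "%s"\n' "$1" "$2"
}
```

For a tool whose `--version` output contains the version number, `format_version_line xxx "$(xxx --version | extract_version)"` would print a line in the same format as the `echo` command above.
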