Build branch main with version main (781f668)
Build pipeline: viash-hub.craftbox.main-5sjkz
Source commit: 781f668839
Source message: FEAT: add untar (#3)
* FEAT: add untar
* Use meta_executable
This commit is contained in:
383
CONTRIBUTING.md
Normal file
383
CONTRIBUTING.md
Normal file
@@ -0,0 +1,383 @@
|
||||
|
||||
# Contributing guidelines
|
||||
|
||||
We encourage contributions from the community. To contribute:
|
||||
|
||||
1. **Fork the Repository**: Start by forking this repository to your account.
|
||||
2. **Develop Your Component**: Create your Viash component, ensuring it aligns with our best practices (detailed below).
|
||||
3. **Submit a Pull Request**: After testing your component, submit a pull request for review.
|
||||
|
||||
## Procedure of adding a component
|
||||
|
||||
### Step 1: Find a component to contribute
|
||||
|
||||
* Find a tool to contribute to this repo.
|
||||
|
||||
* Check whether it is already in the [Project board](https://github.com/orgs/viash-hub/projects/1).
|
||||
|
||||
* Check whether there is a corresponding [Snakemake wrapper](https://github.com/snakemake/snakemake-wrappers/blob/master/bio) or [nf-core module](https://github.com/nf-core/modules/tree/master/modules/nf-core) which we can use as inspiration.
|
||||
|
||||
* Create an issue to show that you are working on this component.
|
||||
|
||||
|
||||
### Step 2: Add config template
|
||||
|
||||
Change all occurrences of `xxx` to the name of the component.
|
||||
|
||||
Create a file at `src/xxx/config.vsh.yaml` with contents:
|
||||
|
||||
```yaml
|
||||
name: xxx
|
||||
description: xxx
|
||||
keywords: [tag1, tag2]
|
||||
links:
|
||||
homepage: yyy
|
||||
documentation: yyy
|
||||
issue_tracker: yyy
|
||||
repository: yyy
|
||||
references:
|
||||
doi: 12345/12345678.yz
|
||||
license: MIT/Apache-2.0/GPL-3.0/...
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments: <...>
|
||||
- name: Outputs
|
||||
arguments: <...>
|
||||
- name: Arguments
|
||||
arguments: <...>
|
||||
resources:
|
||||
- type: bash_script
|
||||
path: script.sh
|
||||
test_resources:
|
||||
- type: bash_script
|
||||
path: test.sh
|
||||
- type: file
|
||||
path: test_data
|
||||
engines:
|
||||
- <...>
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
```
|
||||
|
||||
### Step 3: Fill in the metadata
|
||||
|
||||
Fill in the relevant metadata fields in the config. Here is an example of the metadata of an existing component.
|
||||
|
||||
```yaml
|
||||
functionality:
|
||||
name: arriba
|
||||
description: Detect gene fusions from RNA-Seq data
|
||||
keywords: [Gene fusion, RNA-Seq]
|
||||
links:
|
||||
homepage: https://arriba.readthedocs.io/en/latest/
|
||||
documentation: https://arriba.readthedocs.io/en/latest/
|
||||
repository: https://github.com/suhrig/arriba
|
||||
issue_tracker: https://github.com/suhrig/arriba/issues
|
||||
references:
|
||||
doi: 10.1101/gr.257246.119
|
||||
bibtex: |
|
||||
@article{
|
||||
... a bibtex entry in case the doi is not available ...
|
||||
}
|
||||
license: MIT
|
||||
```
|
||||
|
||||
### Step 4: Find a suitable container
|
||||
|
||||
Google `biocontainer <name of component>` and find the container that is most suitable. Typically the link will be `https://quay.io/repository/biocontainers/xxx?tab=tags`.
|
||||
|
||||
If no such container is found, you can create a custom container in the next step.
|
||||
|
||||
|
||||
### Step 5: Create help file
|
||||
|
||||
To help develop the component, we store the `--help` output of the tool in a file at `src/xxx/help.txt`.
|
||||
|
||||
````bash
|
||||
cat <<EOF > src/xxx/help.txt
|
||||
```sh
|
||||
xxx --help
|
||||
```
|
||||
EOF
|
||||
|
||||
docker run quay.io/biocontainers/xxx:tag xxx --help >> src/xxx/help.txt
|
||||
````
|
||||
|
||||
Notes:
|
||||
|
||||
* This help file has no functional purpose, but it is useful for the developer to see the help output of the tool.
|
||||
|
||||
* Some tools might not have a `--help` argument but instead have a `-h` argument. For example, for `arriba`, the help message is obtained by running `arriba -h`:
|
||||
|
||||
```bash
|
||||
docker run quay.io/biocontainers/arriba:2.4.0--h0033a41_2 arriba -h
|
||||
```
|
||||
|
||||
|
||||
### Step 6: Create or fetch test data
|
||||
|
||||
To help develop the component, it's interesting to have some test data available. In most cases, we can use the test data from the Snakemake wrappers.
|
||||
|
||||
To make sure we can reproduce the test data in the future, we store the command to fetch the test data in a file at `src/xxx/test_data/script.sh`.
|
||||
|
||||
```bash
|
||||
cat <<EOF > src/xxx/test_data/script.sh
|
||||
|
||||
# clone repo
|
||||
if [ ! -d /tmp/snakemake-wrappers ]; then
|
||||
git clone --depth 1 --single-branch --branch master https://github.com/snakemake/snakemake-wrappers /tmp/snakemake-wrappers
|
||||
fi
|
||||
|
||||
# copy test data
|
||||
cp -r /tmp/snakemake-wrappers/bio/xxx/test/* src/xxx/test_data
|
||||
EOF
|
||||
```
|
||||
|
||||
The test data should be suitable for testing this component. Ensure that the test data is small enough: ideally <1KB, preferably <10KB, if need be <100KB.
|
||||
|
||||
### Step 7: Add arguments for the input files
|
||||
|
||||
By looking at the help file, we add the input arguments to the config file. Here is an example of the input arguments of an existing component.
|
||||
|
||||
For instance, in the [arriba help file](src/arriba/help.txt), we see the following:
|
||||
|
||||
Usage: arriba [-c Chimeric.out.sam] -x Aligned.out.bam \
|
||||
-g annotation.gtf -a assembly.fa [-b blacklists.tsv] [-k known_fusions.tsv] \
|
||||
[-t tags.tsv] [-p protein_domains.gff3] [-d structural_variants_from_WGS.tsv] \
|
||||
-o fusions.tsv [-O fusions.discarded.tsv] \
|
||||
[OPTIONS]
|
||||
|
||||
-x FILE File in SAM/BAM/CRAM format with main alignments as generated by STAR
|
||||
(Aligned.out.sam). Arriba extracts candidate reads from this file.
|
||||
|
||||
Based on this information, we can add the following input arguments to the config file.
|
||||
|
||||
```yaml
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
- name: --bam
|
||||
alternatives: -x
|
||||
type: file
|
||||
description: |
|
||||
File in SAM/BAM/CRAM format with main alignments as generated by STAR
|
||||
(Aligned.out.sam). Arriba extracts candidate reads from this file.
|
||||
required: true
|
||||
example: Aligned.out.bam
|
||||
```
|
||||
|
||||
Check the [documentation](https://viash.io/reference/config/functionality/arguments) for more information on the format of input arguments.
|
||||
|
||||
Several notes:
|
||||
|
||||
* Argument names should be formatted in `--snake_case`. This means arguments like `--foo-bar` should be formatted as `--foo_bar`, and short arguments like `-f` should receive a longer name like `--foo`.
|
||||
|
||||
* Input arguments can have `multiple: true` to allow the user to specify multiple files.
|
||||
|
||||
|
||||
|
||||
### Step 8: Add arguments for the output files
|
||||
|
||||
By looking at the help file, we now also add output arguments to the config file.
|
||||
|
||||
For example, in the [arriba help file](src/arriba/help.txt), we see the following:
|
||||
|
||||
|
||||
Usage: arriba [-c Chimeric.out.sam] -x Aligned.out.bam \
|
||||
-g annotation.gtf -a assembly.fa [-b blacklists.tsv] [-k known_fusions.tsv] \
|
||||
[-t tags.tsv] [-p protein_domains.gff3] [-d structural_variants_from_WGS.tsv] \
|
||||
-o fusions.tsv [-O fusions.discarded.tsv] \
|
||||
[OPTIONS]
|
||||
|
||||
-o FILE Output file with fusions that have passed all filters.
|
||||
|
||||
-O FILE Output file with fusions that were discarded due to filtering.
|
||||
|
||||
Based on this information, we can add the following output arguments to the config file.
|
||||
|
||||
```yaml
|
||||
argument_groups:
|
||||
- name: Outputs
|
||||
arguments:
|
||||
- name: --fusions
|
||||
alternatives: -o
|
||||
type: file
|
||||
direction: output
|
||||
description: |
|
||||
Output file with fusions that have passed all filters.
|
||||
required: true
|
||||
example: fusions.tsv
|
||||
- name: --fusions_discarded
|
||||
alternatives: -O
|
||||
type: file
|
||||
direction: output
|
||||
description: |
|
||||
Output file with fusions that were discarded due to filtering.
|
||||
required: false
|
||||
example: fusions.discarded.tsv
|
||||
```
|
||||
|
||||
Note:
|
||||
|
||||
* Preferably, these outputs should not be directores but files. For example, if a tool outputs a directory `foo/` containing files `foo/bar.txt` and `foo/baz.txt`, there should be two output arguments `--bar` and `--baz` (as opposed to one output argument which outputs the whole `foo/` directory).
|
||||
|
||||
### Step 9: Add arguments for the other arguments
|
||||
|
||||
Finally, add all other arguments to the config file. There are a few exceptions:
|
||||
|
||||
* Arguments related to specifying CPU and memory requirements are handled separately and should not be added to the config file.
|
||||
|
||||
* Arguments related to printing the information such as printing the version (`-v`, `--version`) or printing the help (`-h`, `--help`) should not be added to the config file.
|
||||
|
||||
|
||||
### Step 10: Add a Docker engine
|
||||
|
||||
To ensure reproducibility of components, we require that all components are run in a Docker container.
|
||||
|
||||
```yaml
|
||||
engines:
|
||||
- type: docker
|
||||
image: quay.io/biocontainers/xxx:0.1.0--py_0
|
||||
```
|
||||
|
||||
The container should have your tool installed, as well as `ps`.
|
||||
|
||||
If you didn't find a suitable container in the previous step, you can create a custom container. For example:
|
||||
|
||||
```yaml
|
||||
engines:
|
||||
- type: docker
|
||||
image: python:3.10
|
||||
setup:
|
||||
- type: python
|
||||
packages: numpy
|
||||
```
|
||||
|
||||
For more information on how to do this, see the [documentation](https://viash.io/guide/component/add-dependencies.html#steps-for-creating-a-custom-docker-platform).
|
||||
|
||||
Here is a list of base containers we can recommend:
|
||||
|
||||
* Bash: [`bash`](https://hub.docker.com/_/bash), [`ubuntu`](https://hub.docker.com/_/ubuntu)
|
||||
* C#: [`ghcr.io/data-intuitive/dotnet-script`](https://github.com/data-intuitive/ghcr-dotnet-script/pkgs/container/dotnet-script)
|
||||
* JavaScript: [`node`](https://hub.docker.com/_/node)
|
||||
* Python: [`python`](https://hub.docker.com/_/python), [`nvcr.io/nvidia/pytorch`](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)
|
||||
* R: [`eddelbuettel/r2u`](https://hub.docker.com/r/eddelbuettel/r2u), [`rocker/tidyverse`](https://hub.docker.com/r/rocker/tidyverse)
|
||||
* Scala: [`sbtscala/scala-sbt`](https://hub.docker.com/r/sbtscala/scala-sbt)
|
||||
|
||||
### Step 11: Write a runner script
|
||||
|
||||
Next, we need to write a runner script that runs the tool with the input arguments. Create a Bash script named `src/xxx/script.sh` which runs the tool with the input arguments.
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
|
||||
## VIASH START
|
||||
## VIASH END
|
||||
|
||||
xxx \
|
||||
--input "$par_input" \
|
||||
--output "$par_output" \
|
||||
$([ "$par_option" = "true" ] && echo "--option")
|
||||
```
|
||||
|
||||
When building a Viash component, Viash will automatically replace the `## VIASH START` and `## VIASH END` lines (and anything in between) with environment variables based on the arguments specified in the config.
|
||||
|
||||
As an example, this is what the Bash script for the `arriba` component looks like:
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
|
||||
## VIASH START
|
||||
## VIASH END
|
||||
|
||||
arriba \
|
||||
-x "$par_bam" \
|
||||
-a "$par_genome" \
|
||||
-g "$par_gene_annotation" \
|
||||
-o "$par_fusions" \
|
||||
${par_known_fusions:+-k "${par_known_fusions}"} \
|
||||
${par_blacklist:+-b "${par_blacklist}"} \
|
||||
${par_structural_variants:+-d "${par_structural_variants}"} \
|
||||
$([ "$par_skip_duplicate_marking" = "true" ] && echo "-u") \
|
||||
$([ "$par_extra_information" = "true" ] && echo "-X") \
|
||||
$([ "$par_fill_gaps" = "true" ] && echo "-I")
|
||||
```
|
||||
|
||||
|
||||
### Step 12: Create test script
|
||||
|
||||
|
||||
If the unit test requires test resources, these should be provided in the `test_resources` section of the component.
|
||||
|
||||
```yaml
|
||||
functionality:
|
||||
# ...
|
||||
test_resources:
|
||||
- type: bash_script
|
||||
path: test.sh
|
||||
- type: file
|
||||
path: test_data
|
||||
```
|
||||
|
||||
Create a test script at `src/xxx/test.sh` that runs the component with the test data. This script should run the component (available with `$meta_executable`) with the test data and check if the output is as expected. The script should exit with a non-zero exit code if the output is not as expected. For example:
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
|
||||
## VIASH START
|
||||
## VIASH END
|
||||
|
||||
echo "> Run xxx with test data"
|
||||
"$meta_executable" \
|
||||
--input "$meta_resources_dir/test_data/input.txt" \
|
||||
--output "output.txt" \
|
||||
--option
|
||||
|
||||
echo ">> Checking output"
|
||||
[ ! -f "output.txt" ] && echo "Output file output.txt does not exist" && exit 1
|
||||
```
|
||||
|
||||
|
||||
For example, this is what the test script for the `arriba` component looks like:
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
|
||||
## VIASH START
|
||||
## VIASH END
|
||||
|
||||
echo "> Run arriba with blacklist"
|
||||
"$meta_executable" \
|
||||
--bam "$meta_resources_dir/test_data/A.bam" \
|
||||
--genome "$meta_resources_dir/test_data/genome.fasta" \
|
||||
--gene_annotation "$meta_resources_dir/test_data/annotation.gtf" \
|
||||
--blacklist "$meta_resources_dir/test_data/blacklist.tsv" \
|
||||
--fusions "fusions.tsv" \
|
||||
--fusions_discarded "fusions_discarded.tsv" \
|
||||
--interesting_contigs "1,2"
|
||||
|
||||
echo ">> Checking output"
|
||||
[ ! -f "fusions.tsv" ] && echo "Output file fusions.tsv does not exist" && exit 1
|
||||
[ ! -f "fusions_discarded.tsv" ] && echo "Output file fusions_discarded.tsv does not exist" && exit 1
|
||||
|
||||
echo ">> Check if output is empty"
|
||||
[ ! -s "fusions.tsv" ] && echo "Output file fusions.tsv is empty" && exit 1
|
||||
[ ! -s "fusions_discarded.tsv" ] && echo "Output file fusions_discarded.tsv is empty" && exit 1
|
||||
```
|
||||
|
||||
### Step 12: Create a `/var/software_versions.txt` file
|
||||
|
||||
For the sake of transparency and reproducibility, we require that the versions of the software used in the component are documented.
|
||||
|
||||
For now, this is managed by creating a file `/var/software_versions.txt` in the `setup` section of the Docker engine.
|
||||
|
||||
```yaml
|
||||
engines:
|
||||
- type: docker
|
||||
image: quay.io/biocontainers/xxx:0.1.0--py_0
|
||||
setup:
|
||||
- type: docker
|
||||
run: |
|
||||
echo "xxx: \"0.1.0\"" > /var/software_versions.txt
|
||||
```
|
||||
Reference in New Issue
Block a user