# Component Development Guide

This guide provides detailed step-by-step instructions for creating a new component in biobox.

## Table of Contents

- [Initial Setup](#initial-setup)
- [Configuration](#configuration)
- [Arguments](#arguments)
- [Meta Variables](#meta-variables)
- [Implementation](#implementation)
- [Testing](#testing)
- [Documentation](#documentation)

## Initial Setup

### Step 1: Find a component to contribute

* Find a tool to contribute to this repo.
* Check whether it is already on the [Project board](https://github.com/orgs/viash-hub/projects/1).
* Check whether there is a corresponding [Snakemake wrapper](https://github.com/snakemake/snakemake-wrappers/blob/master/bio) or [nf-core module](https://github.com/nf-core/modules/tree/master/modules/nf-core) that can serve as inspiration.
* Create an issue to show that you are working on this component.

### Step 2: Find a suitable container

Search for `biocontainer <name of component>` and pick the most suitable container. The link will typically look like `https://quay.io/repository/biocontainers/xxx?tab=tags`.

If no such container exists, you can create a custom container in a later step.

### Step 3: Create help file

To help develop the component, we store the `--help` output of the tool in a file at `src/xxx/help.txt`.

```bash
# Each backtick is escaped so the unquoted heredoc writes it literally
# instead of treating it as command substitution.
cat <<EOF > src/xxx/help.txt
\`\`\`sh
xxx --help
\`\`\`
EOF

docker run quay.io/biocontainers/xxx:tag xxx --help >> src/xxx/help.txt
```
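
Some tools do not accept `--help` and only respond to `-h`. A minimal sketch of a fallback, assuming the tool exits with a non-zero status when it does not recognise `--help`; `capture_help` is a hypothetical helper name, not part of biobox or Viash:

```bash
# Hypothetical helper: print a tool's help text, trying `--help` first
# and falling back to `-h` when `--help` exits with a non-zero status.
capture_help() {
  local out
  if out=$("$@" --help 2>&1); then
    printf '%s\n' "$out"
  else
    "$@" -h 2>&1
  fi
}

# Example (same placeholder image and tool names as above):
# capture_help docker run quay.io/biocontainers/xxx:tag xxx >> src/xxx/help.txt
```

Tools that print their help text but still exit with a non-zero status on `--help` would trigger the fallback as well, so check the resulting file by eye.
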

**Notes:**
* This help file has no functional purpose, but it is useful for the developer to see the help output of the tool.
* Some tools might not have a `--help` argument but instead have a `-h` argument.

## Configuration

### Metadata Setup

Fill in the relevant metadata fields in the config:

```yaml
name: bowtie2_build
namespace: bowtie2
description: |
  Build Bowtie2 index files from reference sequences.
keywords: [Alignment, Indexing]
links:
  homepage: https://bowtie-bio.sourceforge.net/bowtie2/index.shtml
  documentation: https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
  repository: https://github.com/BenLangmead/bowtie2
references:
  doi: 10.1038/nmeth.1923
license: GPL-3.0
requirements:
  commands: [bowtie2-build]
authors:
  - __merge__: /src/_authors/robrecht_cannoodt.yaml
    roles: [author, maintainer]
```

### Requirements Specification

The `requirements` section documents the dependencies needed by your component:

```yaml
requirements:
  commands: [bowtie2-build, bowtie2]
```

**Why specify commands:**
- Documents which executables the component expects
- Enables validation that the Docker container has required tools
- Helps users understand dependencies
- Facilitates automated testing and CI/CD
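
The validation idea can be illustrated with a small shell check. This is only a sketch of the concept, not how Viash itself performs the validation; `check_commands` is a hypothetical helper:

```bash
# Hypothetical helper: verify that every listed executable is on the PATH.
# Returns 0 if all are found, 1 otherwise, and names the missing ones on stderr.
check_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: run inside the container with the commands listed in the config.
# check_commands bowtie2-build bowtie2
```
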

## Arguments

### Input Arguments

By looking at the help file, add input arguments to the config file:

```yaml
argument_groups:
  - name: Inputs
    arguments:
      - name: --bam
        alternatives: -x
        type: file
        description: |
          File in SAM/BAM/CRAM format with main alignments as generated by STAR
          (`Aligned.out.sam`). Arriba extracts candidate reads from this file.
        required: true
        example: Aligned.out.bam
```

**Key principles:**
* Argument names should be formatted in `--snake_case`
* Input arguments can have `multiple: true` to allow multiple files
* **Descriptions must be formatted in markdown** - they will be used downstream for rendering documentation
* You can make minor changes to the formatting of arguments to improve clarity and better utilize markdown structure
* Use markdown features like code blocks, lists, emphasis, and links to enhance readability

### Output Arguments

Add output arguments based on the tool's help:

```yaml
argument_groups:
  - name: Outputs
    arguments:
      - name: --fusions
        alternatives: -o
        type: file
        direction: output
        description: |
          Output file with fusions that have passed all filters.
        required: true
        example: fusions.tsv
```

**Note:** Preferably, outputs should be files rather than directories.

### Other Arguments

Add all other arguments, with these exceptions:
* Arguments related to CPU and memory requirements are handled separately (see [Meta Variables](#meta-variables))
* Version (`-v`, `--version`) and help (`-h`, `--help`) arguments should be excluded
* If the help file lists default values, mention them in the description rather than setting them as `default` in the config

**Boolean handling:**
* Prefer `boolean_true` over `boolean_false` to avoid confusion in Nextflow workflows

### Description Formatting Guidelines

Argument descriptions should always be written in **markdown format** as they are used downstream for documentation rendering. Here are best practices:

**Good markdown formatting examples:**

```yaml
description: |
  Input FASTQ file containing reads. Supports compressed files (`.gz`, `.bz2`).

  **Supported formats:**
  - FASTQ (`.fastq`, `.fq`)
  - Compressed FASTQ (`.fastq.gz`, `.fq.gz`)

  See the [FASTQ format specification](https://en.wikipedia.org/wiki/FASTQ_format) for details.
```

```yaml
description: |
  Maximum number of mismatches allowed during alignment.

  **Default behavior:**
  - For reads ≤50bp: 2 mismatches
  - For reads >50bp: 3 mismatches

  Set to `0` for exact matches only.
```

**Formatting improvements you can make:**
- Add code formatting for file extensions, parameters, and values
- Use lists and bullet points for multiple options
- Add emphasis with **bold** or *italic* text
- Include links to external documentation
- Structure complex descriptions with headers
- Use code blocks for examples

**Original tool help vs. improved description:**

```yaml
# Original: "Input file in BAM format"
# Improved:
description: |
  Input file in BAM format containing aligned sequences.

  The file must be coordinate-sorted and indexed. Use `samtools sort`
  and `samtools index` if needed.
```

## Meta Variables

**Important:** Never add `threads`, `cores`, `cpus`, or `memory` as regular parameters. Instead, use Viash's built-in meta variables.

### Available Meta Variables

Viash provides several meta variables that are automatically available in your scripts:

- **`meta_cpus`** (integer): Maximum number of logical CPUs the component can use
- **`meta_memory_*`** (long): Maximum memory allocation in various units:
  - `meta_memory_b`, `meta_memory_kb`, `meta_memory_mb`
  - `meta_memory_gb`, `meta_memory_tb`, `meta_memory_pb`
  - `meta_memory_kib`, `meta_memory_mib`, `meta_memory_gib`, `meta_memory_tib`, `meta_memory_pib`
- **`meta_temp_dir`** (string): Temporary directory for the component
- **`meta_resources_dir`** (string): Path to component resources
- **`meta_name`** (string): Component name (useful for logging)
- **`meta_executable`** (string): Path to the wrapped executable
- **`meta_config`** (string): Path to the processed config YAML

### Usage Example

```bash
# Use meta_cpus instead of a threads parameter (falls back to 1 if unset).
# Variables are quoted so paths containing spaces are passed as one argument.
./tool --threads "${meta_cpus:-1}" --input "$par_input" --output "$par_output"

# Use meta_memory_gb for memory-intensive tools (falls back to 8 if unset)
./tool --memory "${meta_memory_gb:-8}G" --input "$par_input" --output "$par_output"
```
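
The fallbacks above (`:-1`, `:-8`) always pass a value to the tool. An alternative sketch, assuming the tool has sensible defaults of its own, omits a flag entirely when the meta variable is unset. Here `tool`, `--threads` and `--memory` are the same placeholders as above, and `echo` stands in for the real invocation:

```bash
# `echo` prints the command line instead of running a real tool.
# `${var:+...}` expands to the flag only when the variable is set and non-empty.
run_tool() {
  echo tool \
    ${meta_cpus:+--threads "$meta_cpus"} \
    ${meta_memory_gb:+--memory "${meta_memory_gb}G"} \
    --input "$par_input" \
    --output "$par_output"
}

# Illustrative values; in a real component Viash sets these variables.
par_input=reads.fq
par_output=out.txt
meta_cpus=4
unset meta_memory_gb
run_tool
# prints: tool --threads 4 --input reads.fq --output out.txt
```
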

### Setting Meta Values

```bash
# When running with viash
viash run config.vsh.yaml --cpus 8 --memory 16GB -- --input file.txt

# When using built executables
./my_tool ---cpus 8 ---memory 16GB --input file.txt
```

For more details, see the [Viash Variables Documentation](https://viash.io/guide/component/variables.html).

## Implementation

See [Script Development Guide](SCRIPT_DEVELOPMENT.md) for detailed script writing guidelines.

## Testing

See [Testing Guide](TESTING.md) for comprehensive testing practices.

## Documentation

### Version Documentation

Add version detection to the Docker engine setup:

```yaml
engines:
  - type: docker
    image: quay.io/biocontainers/xxx:2.5.4--he96a11b_6
    setup:
      - type: docker
        run:
          # Quoted because the command contains ": ", which YAML would
          # otherwise read as a key/value separator.
          - "xxx --version 2>&1 | head -1 | sed 's/.*version /xxx: /' > /var/software_versions.txt"
```

**Common version extraction patterns:**

```bash
# For tools that output "Tool version X.Y.Z"
tool --version 2>&1 | head -1 | sed 's/.*version /tool: /' > /var/software_versions.txt

# For tools that output just the version number
echo "tool: $(tool --version 2>&1 | head -1)" > /var/software_versions.txt

# For tools with complex version output
tool --version 2>&1 | grep -E "^[0-9]" | head -1 | sed 's/^/tool: /' > /var/software_versions.txt
```
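
The `sed` substitution in the first pattern leaves a line unchanged when it does not contain `version `, so a malformed entry can end up in the file unnoticed. A sketch of a stricter variant follows; `format_version` is a hypothetical helper, and the sample input is made up for illustration:

```bash
# Hypothetical helper: read raw `--version` output on stdin and print
# "<name>: <version>", failing if the first line lacks "version ".
format_version() {
  local name="$1" line
  line=$(head -n 1)
  case "$line" in
    *"version "*) printf '%s: %s\n' "$name" "${line##*version }" ;;
    *) echo "unexpected version output: $line" >&2; return 1 ;;
  esac
}

# Made-up sample output, for illustration only.
printf 'xxx version 2.5.4\n' | format_version xxx
# prints: xxx: 2.5.4
```
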