# Component Development Guide
This guide provides detailed step-by-step instructions for creating a new component in biobox.
## Table of Contents
- [Initial Setup](#initial-setup)
- [Configuration](#configuration)
- [Arguments](#arguments)
- [Meta Variables](#meta-variables)
- [Implementation](#implementation)
- [Testing](#testing)
- [Documentation](#documentation)
## Initial Setup
### Step 1: Find a component to contribute
* Find a tool to contribute to this repo.
* Check whether it is already in the [Project board](https://github.com/orgs/viash-hub/projects/1).
* Check whether there is a corresponding [Snakemake wrapper](https://github.com/snakemake/snakemake-wrappers/blob/master/bio) or [nf-core module](https://github.com/nf-core/modules/tree/master/modules/nf-core) which we can use as inspiration.
* Create an issue to show that you are working on this component.
### Step 2: Find a suitable container
Search the web for `biocontainer <name of component>` and pick the most suitable container. The tag listing is typically at `https://quay.io/repository/biocontainers/xxx?tab=tags`.
If no such container is found, you can create a custom container in a later step.
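
As a small convenience, the two strings used in this guide (the tag listing URL and the image reference passed to `docker run`) can be derived from the tool name. This is only a sketch: the function names `biocontainer_tags_url` and `biocontainer_image` are hypothetical, and it assumes the tool is published under the `biocontainers` organisation on quay.io.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the quay.io page that lists the available tags.
biocontainer_tags_url() {
  printf 'https://quay.io/repository/biocontainers/%s?tab=tags\n' "$1"
}

# Hypothetical helper: print the image reference for a given tool and tag.
biocontainer_image() {
  printf 'quay.io/biocontainers/%s:%s\n' "$1" "$2"
}

# "xxx" and "tag" are the same placeholders used elsewhere in this guide.
biocontainer_tags_url xxx
biocontainer_image xxx tag
```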
### Step 3: Create help file
To help develop the component, we store the `--help` output of the tool in a file at `src/xxx/help.txt`.
```bash
cat <<EOF > src/xxx/help.txt
\`\`\`sh
xxx --help
\`\`\`
EOF
docker run quay.io/biocontainers/xxx:tag xxx --help >> src/xxx/help.txt
```
**Notes:**
* This help file has no functional purpose, but it is useful for the developer to see the help output of the tool.
* Some tools might not have a `--help` argument but instead have a `-h` argument.
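
The fallback from `--help` to `-h` mentioned in the notes can be sketched as a small wrapper. The function name `capture_help` is hypothetical, and the sketch assumes the tool exits with status 0 and prints something when asked for help; tools that print help but exit non-zero would need the check relaxed.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of biobox): print a command's help text,
# trying `--help` first and falling back to `-h`.
capture_help() {
  local out
  if out="$("$@" --help 2>&1)" && [ -n "$out" ]; then
    printf '%s\n' "$out"
  elif out="$("$@" -h 2>&1)" && [ -n "$out" ]; then
    printf '%s\n' "$out"
  else
    echo "no help output from: $*" >&2
    return 1
  fi
}

# Example (commented out because it needs Docker and a real image tag):
# capture_help docker run quay.io/biocontainers/xxx:tag xxx >> src/xxx/help.txt
```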
## Configuration
### Metadata Setup
Fill in the relevant metadata fields in the config:
```yaml
name: bowtie2_build
namespace: bowtie2
description: |
  Build Bowtie2 index files from reference sequences.
keywords: [Alignment, Indexing]
links:
  homepage: https://bowtie-bio.sourceforge.net/bowtie2/index.shtml
  documentation: https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
  repository: https://github.com/BenLangmead/bowtie2
references:
  doi: 10.1038/nmeth.1923
license: GPL-3.0
requirements:
  commands: [bowtie2-build]
authors:
  - __merge__: /src/_authors/robrecht_cannoodt.yaml
    roles: [author, maintainer]
```
### Requirements Specification
The `requirements` section documents the dependencies needed by your component:
```yaml
requirements:
  commands: [bowtie2-build, bowtie2]
```
**Why specify commands:**
- Documents which executables the component expects
- Enables validation that the Docker container has required tools
- Helps users understand dependencies
- Facilitates automated testing and CI/CD
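
To check by hand that a container really provides the listed commands, something like the following could be run inside it. This is a minimal sketch, not how Viash itself performs validation; the function name `check_commands` is hypothetical.

```bash
#!/usr/bin/env bash

# Hypothetical helper: report which of the given commands are missing from PATH.
# Prints one missing command per line and returns non-zero if any is missing.
check_commands() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Example: inside the bowtie2 container one might run
# check_commands bowtie2-build bowtie2
```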
## Arguments
### Input Arguments
By looking at the help file, add input arguments to the config file:
```yaml
argument_groups:
  - name: Inputs
    arguments:
      - name: --bam
        alternatives: -x
        type: file
        description: |
          File in SAM/BAM/CRAM format with main alignments as generated by STAR
          (`Aligned.out.sam`). Arriba extracts candidate reads from this file.
        required: true
        example: Aligned.out.bam
```
**Key principles:**
* Argument names should be formatted in `--snake_case`
* Input arguments can have `multiple: true` to allow multiple files
* **Descriptions must be formatted in markdown** - they will be used downstream for rendering documentation
* You can make minor changes to the formatting of arguments to improve clarity and better utilize markdown structure
* Use markdown features like code blocks, lists, emphasis, and links to enhance readability
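
The `--snake_case` naming rule can be checked mechanically. The sketch below is one possible reading of that rule (two dashes, a lowercase letter, then lowercase letters, digits and single underscores); the function name `is_snake_case_arg` and the exact pattern are assumptions, not a biobox or Viash utility.

```bash
#!/usr/bin/env bash

# Hypothetical check: succeed only if the argument name looks like `--snake_case`.
is_snake_case_arg() {
  printf '%s\n' "$1" | grep -Eq '^--[a-z][a-z0-9]*(_[a-z0-9]+)*$'
}

# Report each candidate name as ok or not ok.
for name in --bam --min_length --minLength --min-length; do
  if is_snake_case_arg "$name"; then
    echo "ok:     $name"
  else
    echo "not ok: $name"
  fi
done
```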
### Output Arguments
Add output arguments based on the tool's help:
```yaml
argument_groups:
  - name: Outputs
    arguments:
      - name: --fusions
        alternatives: -o
        type: file
        direction: output
        description: |
          Output file with fusions that have passed all filters.
        required: true
        example: fusions.tsv
```
**Note:** Preferably, outputs should be files rather than directories.
### Other Arguments
Add all other arguments with these exceptions:
* Arguments related to CPU and memory requirements are handled separately
* Version (`-v`, `--version`) or help (`-h`, `--help`) arguments should be excluded
* If the help file lists default values, mention them in the description rather than setting them as `default` in the config
**Boolean handling:**
* Prefer using `boolean_true` over `boolean_false` to avoid confusion in Nextflow workflows
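
On the script side, a `boolean_true` argument typically maps onto a plain flag of the wrapped tool. The sketch below assumes that the script receives the value as the string `true` or `false`; the argument names `--verbose` and `--keep_temp`, the tool flag `--keep-temp` and the function `render_command` are made up for illustration.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the command line that would be run for two
# boolean_true arguments, given their values ("true" or "false").
render_command() {
  local par_verbose="$1" par_keep_temp="$2"

  # Unset values that are "false" so that ${var:+...} expands to nothing.
  [ "$par_verbose" = "false" ] && unset par_verbose
  [ "$par_keep_temp" = "false" ] && unset par_keep_temp

  # `echo` stands in for running the real tool.
  echo tool ${par_verbose:+--verbose} ${par_keep_temp:+--keep-temp}
}

render_command true false
```

Only flags whose value is `true` end up on the command line, so the tool's own defaults apply to everything else.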
### Description Formatting Guidelines
Argument descriptions should always be written in **markdown format** as they are used downstream for documentation rendering. Here are best practices:
**Good markdown formatting examples:**
```yaml
description: |
  Input FASTQ file containing reads. Supports compressed files (`.gz`, `.bz2`).

  **Supported formats:**
  - FASTQ (`.fastq`, `.fq`)
  - Compressed FASTQ (`.fastq.gz`, `.fq.gz`)

  See the [FASTQ format specification](https://en.wikipedia.org/wiki/FASTQ_format) for details.
```
```yaml
description: |
  Maximum number of mismatches allowed during alignment.

  **Default behavior:**
  - For reads ≤50bp: 2 mismatches
  - For reads >50bp: 3 mismatches

  Set to `0` for exact matches only.
```
**Formatting improvements you can make:**
- Add code formatting for file extensions, parameters, and values
- Use lists and bullet points for multiple options
- Add emphasis with **bold** or *italic* text
- Include links to external documentation
- Structure complex descriptions with headers
- Use code blocks for examples
**Original tool help vs. improved description:**
```
# Original: "Input file in BAM format"
# Improved:
description: |
  Input file in BAM format containing aligned sequences.
  The file must be coordinate-sorted and indexed. Use `samtools sort`
  and `samtools index` if needed.
```
## Meta Variables
**Important:** Never add `threads`, `cores`, `cpus`, or `memory` as regular parameters. Instead, use Viash's built-in meta variables.
### Available Meta Variables
Viash provides several meta variables that are automatically available in your scripts:
- **`meta_cpus`** (integer): Maximum number of logical CPUs the component can use
- **`meta_memory_*`** (long): Maximum memory allocation in various units:
  - `meta_memory_b`, `meta_memory_kb`, `meta_memory_mb`
  - `meta_memory_gb`, `meta_memory_tb`, `meta_memory_pb`
  - `meta_memory_kib`, `meta_memory_mib`, `meta_memory_gib`, `meta_memory_tib`, `meta_memory_pib`
- **`meta_temp_dir`** (string): Temporary directory for the component
- **`meta_resources_dir`** (string): Path to component resources
- **`meta_name`** (string): Component name (useful for logging)
- **`meta_executable`** (string): Path to the wrapped executable
- **`meta_config`** (string): Path to the processed config YAML
### Usage Example
```bash
# Use meta_cpus instead of a threads parameter
./tool --threads "${meta_cpus:-1}" --input "$par_input" --output "$par_output"

# Use meta_memory_gb for memory-intensive tools
./tool --memory "${meta_memory_gb:-8}G" --input "$par_input" --output "$par_output"
```
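
An alternative to hard-coding a fallback value is to pass a resource flag only when the corresponding meta variable is set, and otherwise leave the decision to the tool. The sketch below illustrates this; `--threads` and `--memory` are the same placeholder flags as in the example above, and the function name `resource_flags` is hypothetical.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print resource flags derived from the meta variables.
# A flag is emitted only when its meta variable is set and non-empty.
resource_flags() {
  echo ${meta_cpus:+--threads "$meta_cpus"} ${meta_memory_gb:+--memory "${meta_memory_gb}G"}
}

# Example values; in a real component these are set by Viash.
meta_cpus=8
unset meta_memory_gb
resource_flags
```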
### Setting Meta Values
```bash
# When running with viash
viash run config.vsh.yaml --cpus 8 --memory 16GB -- --input file.txt
# When using built executables
./my_tool ---cpus 8 ---memory 16GB --input file.txt
```
For more details, see the [Viash Variables Documentation](https://viash.io/guide/component/variables.html).
## Implementation
See [Script Development Guide](SCRIPT_DEVELOPMENT.md) for detailed script writing guidelines.
## Testing
See [Testing Guide](TESTING.md) for comprehensive testing practices.
## Documentation
### Version Documentation
Add version detection to the Docker engine setup:
```yaml
engines:
  - type: docker
    image: quay.io/biocontainers/xxx:2.5.4--he96a11b_6
    setup:
      - type: docker
        run:
          - xxx --version 2>&1 | head -1 | sed 's/.*version /xxx: /' > /var/software_versions.txt
```
**Common version extraction patterns:**
```bash
# For tools that output "Tool version X.Y.Z"
tool --version 2>&1 | head -1 | sed 's/.*version /tool: /' > /var/software_versions.txt
# For tools that output just the version number
echo "tool: $(tool --version 2>&1 | head -1)" > /var/software_versions.txt
# For tools with complex version output
tool --version 2>&1 | grep -E "^[0-9]" | head -1 | sed 's/^/tool: /' > /var/software_versions.txt
```
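
Before baking a pattern into the Docker setup, it can be tried on a sample string without starting a container. The sketch below wraps the first pattern in a hypothetical function `format_version`; the sample text is made up and only stands in for real `--version` output.

```bash
#!/usr/bin/env bash

# Hypothetical helper: read raw `--version` output on stdin and print
# "<tool>: <version>", assuming output of the form "... version X.Y.Z".
format_version() {
  local tool="$1"
  head -1 | sed "s/.*version /${tool}: /"
}

# Made-up sample output; only the first line is used.
printf 'tool version 1.2.3\nsome extra build info\n' | format_version tool
# prints "tool: 1.2.3"
```

If the result is not of the form `tool: X.Y.Z`, one of the other patterns above is probably a better fit.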