Build pipeline: viash-hub.biobox.v0.4.x-58lg9
Source commit: 736f18e988
Source message: Merge remote-tracking branch 'origin/main' into v0.4.x
Component Development Guide
This guide provides detailed step-by-step instructions for creating a new component in biobox.
Initial Setup
Step 1: Find a component to contribute
- Find a tool to contribute to this repo.
- Check whether it is already in the Project board.
- Check whether there is a corresponding Snakemake wrapper or nf-core module which we can use as inspiration.
- Create an issue to show that you are working on this component.
Step 2: Find a suitable container
Search the web for "biocontainer <name of component>" and pick the most suitable container. Typically the link will be https://quay.io/repository/biocontainers/xxx?tab=tags.
If no such container is found, you can create a custom container in a later step.
Step 3: Create help file
To help develop the component, we store the --help output of the tool in a file at src/xxx/help.txt.
cat <<EOF > src/xxx/help.txt
\`\`\`sh
xxx --help
\`\`\`
EOF
docker run quay.io/biocontainers/xxx:tag xxx --help >> src/xxx/help.txt
Notes:
- This help file has no functional purpose, but it is useful for the developer to see the help output of the tool.
- Some tools might not have a --help argument but instead have a -h argument.
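The fallback from --help to -h can be scripted. The following is a minimal sketch, not part of biobox: the function name capture_help is hypothetical, and it assumes a tool that does not support --help exits with a non-zero status, which is not true of every tool.

```sh
# capture_help OUT_FILE CMD [ARGS...]
# Hypothetical helper: runs CMD with --help and, if that exits non-zero,
# retries with -h. The captured text (stdout and stderr) is appended to
# OUT_FILE, so a header written earlier is preserved.
capture_help() {
  out_file="$1"
  shift
  if ! help_text="$("$@" --help 2>&1)"; then
    help_text="$("$@" -h 2>&1)" || true
  fi
  printf '%s\n' "$help_text" >> "$out_file"
}
```

With the placeholder image from this step, a call would look like: capture_help src/xxx/help.txt docker run quay.io/biocontainers/xxx:tag xxx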
Configuration
Metadata Setup
Fill in the relevant metadata fields in the config:
name: bowtie2_build
namespace: bowtie2
description: |
  Build Bowtie2 index files from reference sequences.
keywords: [Alignment, Indexing]
links:
  homepage: https://bowtie-bio.sourceforge.net/bowtie2/index.shtml
  documentation: https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
  repository: https://github.com/BenLangmead/bowtie2
references:
  doi: 10.1038/nmeth.1923
license: GPL-3.0
requirements:
  commands: [bowtie2-build]
authors:
  - __merge__: /src/_authors/robrecht_cannoodt.yaml
    roles: [author, maintainer]
Requirements Specification
The requirements section documents the dependencies needed by your component:
requirements:
  commands: [bowtie2-build, bowtie2]
Why specify commands:
- Documents which executables the component expects
- Enables validation that the Docker container has required tools
- Helps users understand dependencies
- Facilitates automated testing and CI/CD
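To illustrate what validating the listed commands could look like, here is a minimal sketch that checks whether each command is on the PATH. It is an illustration only: check_commands is a hypothetical helper, and this is not necessarily how Viash or the biobox CI performs the check.

```sh
# check_commands CMD...
# Hypothetical helper: reports every command that cannot be found on the
# PATH to stderr and returns 1 if at least one is missing, 0 otherwise.
check_commands() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run inside the container, check_commands bowtie2-build bowtie2 would succeed only if both executables from the requirements section are available.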
Arguments
Input Arguments
By looking at the help file, add input arguments to the config file:
argument_groups:
  - name: Inputs
    arguments:
      - name: --bam
        alternatives: -x
        type: file
        description: |
          File in SAM/BAM/CRAM format with main alignments as generated by STAR
          (`Aligned.out.sam`). Arriba extracts candidate reads from this file.
        required: true
        example: Aligned.out.bam
Key principles:
- Argument names should be formatted in --snake_case
- Input arguments can have multiple: true to allow multiple files
- Descriptions must be formatted in markdown, as they are used downstream for rendering documentation
- You can make minor changes to the formatting of arguments to improve clarity and better utilize markdown structure
- Use markdown features like code blocks, lists, emphasis, and links to enhance readability
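When an input argument has multiple: true, a shell script has to split the received value into individual files. The sketch below assumes the values arrive as one string joined by a separator, with ; as the default. That matches my understanding of recent Viash versions, but the separator is configurable and older versions used a different default, so check your Viash version. The helper name split_multiple is hypothetical.

```sh
# split_multiple VALUE [SEPARATOR]
# Hypothetical helper: prints each element of a separator-joined VALUE on
# its own line. SEPARATOR defaults to ";" (an assumption, see above).
# The body runs in a subshell so the IFS and globbing changes do not leak.
split_multiple() (
  IFS="${2:-;}"
  set -f  # disable filename globbing while splitting
  for item in $1; do
    printf '%s\n' "$item"
  done
)
```

For example, split_multiple "$par_input" emits one path per line, which can be piped into a while read loop. Paths that themselves contain newlines are not handled by this sketch.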
Output Arguments
Add output arguments based on the tool's help:
argument_groups:
  - name: Outputs
    arguments:
      - name: --fusions
        alternatives: -o
        type: file
        direction: output
        description: |
          Output file with fusions that have passed all filters.
        required: true
        example: fusions.tsv
Note: Preferably, outputs should be files rather than directories.
Other Arguments
Add all other arguments with these exceptions:
- Arguments related to CPU and memory requirements are handled separately
- Version (-v, --version) or help (-h, --help) arguments should be excluded
- If the help file lists defaults, add them to the description rather than as defaults
Boolean handling:
- Prefer using boolean_true over boolean_false to avoid confusion in Nextflow workflows
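A boolean_true argument then has to be translated into the tool's own flag inside the script. The following is a minimal sketch, assuming the script receives the value as the string true or false in the corresponding par_ variable. The names flag_if_true, par_skip_duplicates and --skip-duplicates are hypothetical.

```sh
# flag_if_true VALUE FLAG
# Hypothetical helper: prints FLAG only when VALUE is exactly "true";
# prints nothing for "false", an empty string, or any other value.
flag_if_true() {
  if [ "$1" = "true" ]; then
    printf '%s\n' "$2"
  fi
}
```

A call such as xxx $(flag_if_true "$par_skip_duplicates" --skip-duplicates) then passes the flag to the tool only when the user enabled it. The command substitution is left unquoted on purpose so that an empty result adds no argument.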
Description Formatting Guidelines
Argument descriptions should always be written in markdown format as they are used downstream for documentation rendering. Here are best practices:
Good markdown formatting examples:
description: |
  Input FASTQ file containing reads. Supports compressed files (`.gz`, `.bz2`).

  **Supported formats:**
  - FASTQ (`.fastq`, `.fq`)
  - Compressed FASTQ (`.fastq.gz`, `.fq.gz`)

  See the [FASTQ format specification](https://en.wikipedia.org/wiki/FASTQ_format) for details.

description: |
  Maximum number of mismatches allowed during alignment.

  **Default behavior:**
  - For reads ≤50bp: 2 mismatches
  - For reads >50bp: 3 mismatches

  Set to `0` for exact matches only.
Formatting improvements you can make:
- Add code formatting for file extensions, parameters, and values
- Use lists and bullet points for multiple options
- Add emphasis with bold or italic text
- Include links to external documentation
- Structure complex descriptions with headers
- Use code blocks for examples
Original tool help vs. improved description:
# Original: "Input file in BAM format"
# Improved:
description: |
  Input file in BAM format containing aligned sequences.

  The file must be coordinate-sorted and indexed. Use `samtools sort`
  and `samtools index` if needed.
Meta Variables
Important: Never add threads, cores, cpus, or memory as regular parameters. Instead, use Viash's built-in meta variables.
Available Meta Variables
Viash provides several meta variables that are automatically available in your scripts:
- meta_cpus (integer): Maximum number of logical CPUs the component can use
- meta_memory_* (long): Maximum memory allocation in various units:
  - meta_memory_b, meta_memory_kb, meta_memory_mb
  - meta_memory_gb, meta_memory_tb, meta_memory_pb
  - meta_memory_kib, meta_memory_mib, meta_memory_gib, meta_memory_tib, meta_memory_pib
- meta_temp_dir (string): Temporary directory for the component
- meta_resources_dir (string): Path to component resources
- meta_name (string): Component name (useful for logging)
- meta_executable (string): Path to the wrapped executable
- meta_config (string): Path to the processed config YAML
Usage Example
# Use meta_cpus instead of a threads parameter
./tool --threads "${meta_cpus:-1}" --input "$par_input" --output "$par_output"
# Use meta_memory_gb for memory-intensive tools
./tool --memory "${meta_memory_gb:-8}G" --input "$par_input" --output "$par_output"
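An alternative to hard-coded fallbacks is to pass a resource flag only when the corresponding meta variable is set, so the tool's own default applies otherwise. This is a minimal sketch of that design choice. build_resource_args is a hypothetical helper, and --threads and --memory are the placeholder flags from the example above, not flags of any particular tool.

```sh
# build_resource_args
# Hypothetical helper: prints "--threads N" and/or "--memory NG" depending
# on which of meta_cpus and meta_memory_gb are set and non-empty.
# Prints an empty line when neither is set.
build_resource_args() {
  args=""
  if [ -n "${meta_cpus:-}" ]; then
    args="--threads $meta_cpus"
  fi
  if [ -n "${meta_memory_gb:-}" ]; then
    args="${args:+$args }--memory ${meta_memory_gb}G"
  fi
  printf '%s\n' "$args"
}
```

The result can be spliced into the command line unquoted, for example ./tool $(build_resource_args) --input "$par_input", so that each flag and its value become separate arguments.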
Setting Meta Values
# When running with viash
viash run config.vsh.yaml --cpus 8 --memory 16GB -- --input file.txt
# When using built executables
./my_tool ---cpus 8 ---memory 16GB --input file.txt
For more details, see the Viash Variables Documentation.
Implementation
See Script Development Guide for detailed script writing guidelines.
Testing
See Testing Guide for comprehensive testing practices.
Documentation
Version Documentation
Add version detection to the Docker engine setup:
engines:
  - type: docker
    image: quay.io/biocontainers/xxx:2.5.4--he96a11b_6
    setup:
      - type: docker
        run:
          - xxx --version 2>&1 | head -1 | sed 's/.*version /xxx: /' > /var/software_versions.txt
Common version extraction patterns:
# For tools that output "Tool version X.Y.Z"
tool --version 2>&1 | head -1 | sed 's/.*version /tool: /' > /var/software_versions.txt
# For tools that output just the version number
echo "tool: $(tool --version 2>&1 | head -1)" > /var/software_versions.txt
# For tools with complex version output
tool --version 2>&1 | grep -E "^[0-9]" | head -1 | sed 's/^/tool: /' > /var/software_versions.txt
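Before baking a pattern into the Docker setup, it can help to try it against a sample of the tool's version output. The sketch below wraps the first pattern ("Tool version X.Y.Z") in a small function that reads from stdin. format_version is a hypothetical helper name, and the sample strings are made up for illustration.

```sh
# format_version NAME
# Hypothetical helper: reads version output on stdin, keeps the first line,
# and rewrites everything up to and including "version " as "NAME: ".
format_version() {
  head -n 1 | sed "s/.*version /$1: /"
}

# Try the pattern on a made-up sample instead of the real tool
printf 'xxx version 2.5.4\nCompiled with extra options\n' | format_version xxx
```

Once the output looks right, the same pipeline can be used with the real tool: xxx --version 2>&1 | format_version xxx.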