biobox/docs/COMPONENT_DEVELOPMENT.md
Component Development Guide

This guide provides detailed step-by-step instructions for creating a new component in biobox.

Initial Setup

Step 1: Find a component to contribute

  • Find a tool to contribute to this repo.
  • Check whether it is already in the Project board.
  • Check whether there is a corresponding Snakemake wrapper or nf-core module which we can use as inspiration.
  • Create an issue to show that you are working on this component.

Step 2: Find a suitable container

Search the web for biocontainer <name of component> and pick the most suitable container. The link typically looks like https://quay.io/repository/biocontainers/xxx?tab=tags.

If no such container is found, you can create a custom container in a later step.

Step 3: Create help file

To help develop the component, we store the --help output of the tool in a file at src/xxx/help.txt.

cat <<EOF > src/xxx/help.txt
\`\`\`sh
xxx --help
\`\`\`
EOF

docker run quay.io/biocontainers/xxx:tag xxx --help >> src/xxx/help.txt

Notes:

  • This help file has no functional purpose, but it is useful for the developer to see the help output of the tool.
  • Some tools might not have a --help argument but instead have a -h argument.

Configuration

Metadata Setup

Fill in the relevant metadata fields in the config:

name: bowtie2_build
namespace: bowtie2
description: |
  Build Bowtie2 index files from reference sequences.
keywords: [Alignment, Indexing]
links:
  homepage: https://bowtie-bio.sourceforge.net/bowtie2/index.shtml
  documentation: https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
  repository: https://github.com/BenLangmead/bowtie2
references:
  doi: 10.1038/nmeth.1923
license: GPL-3.0
requirements:
  commands: [bowtie2-build]
authors:
  - __merge__: /src/_authors/robrecht_cannoodt.yaml
    roles: [author, maintainer]

Requirements Specification

The requirements section documents the dependencies needed by your component:

requirements:
  commands: [bowtie2-build, bowtie2]

Why specify commands:

  • Documents which executables the component expects
  • Enables validation that the Docker container has required tools
  • Helps users understand dependencies
  • Facilitates automated testing and CI/CD
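To illustrate the kind of validation this enables, here is a minimal sketch that checks whether a list of commands is available on the PATH. `check_commands` is a hypothetical helper written for this guide. It does not show how Viash or the biobox CI performs the check.

```shell
# Hypothetical check (not part of Viash): report which of the given
# commands cannot be found on the PATH.
check_commands() {
  missing=""
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all commands found"
}
```

Running `check_commands bowtie2-build bowtie2` inside the container would then confirm that both executables listed under `requirements.commands` are present.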

Arguments

Input Arguments

Using the help file as a reference, add the input arguments to the config file:

argument_groups:
  - name: Inputs
    arguments:
    - name: --bam
      alternatives: -x
      type: file
      description: |
        File in SAM/BAM/CRAM format with main alignments as generated by STAR
        (`Aligned.out.sam`). Arriba extracts candidate reads from this file.
      required: true
      example: Aligned.out.bam

Key principles:

  • Argument names should be formatted in --snake_case
  • Input arguments can have multiple: true to allow multiple files
  • Descriptions must be formatted in markdown - they will be used downstream for rendering documentation
  • You may lightly reformat the tool's original argument descriptions to improve clarity and make better use of markdown structure
  • Use markdown features like code blocks, lists, emphasis, and links to enhance readability
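The naming rule can be checked mechanically. The sketch below assumes that "--snake_case" means two leading dashes followed by lowercase words joined by underscores. `is_snake_case_arg` is a hypothetical helper, not an existing biobox or Viash command.

```shell
# Hypothetical helper: succeed only if the argument name looks like --snake_case
# (two dashes, then lowercase letters/digits separated by single underscores).
is_snake_case_arg() {
  printf '%s\n' "$1" | grep -Eq '^--[a-z][a-z0-9]*(_[a-z0-9]+)*$'
}
```

For example, `is_snake_case_arg --min_length` succeeds, while `--minLength` and `--min-length` are rejected.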

Output Arguments

Add output arguments based on the tool's help:

argument_groups:
  - name: Outputs
    arguments:
      - name: --fusions
        alternatives: -o
        type: file
        direction: output
        description: |
          Output file with fusions that have passed all filters.
        required: true
        example: fusions.tsv

Note: Preferably, outputs should be files rather than directories.
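When a tool can only write into a directory, one way to still expose a file output is to let it write to a scratch directory and move the relevant file to the requested path. The sketch below assumes the tool produces a known file name. `collect_output` and its arguments are hypothetical.

```shell
# Hypothetical helper: move one expected file out of a tool's output directory.
# $1: directory the tool wrote into
# $2: name of the file inside that directory
# $3: destination path (for example the value of an output argument)
collect_output() {
  src="$1/$2"
  if [ ! -f "$src" ]; then
    echo "expected output not found: $src" >&2
    return 1
  fi
  mkdir -p "$(dirname "$3")"
  mv "$src" "$3"
}
```

A script could call `collect_output "$work_dir" fusions.tsv "$par_fusions"` after the tool finishes, where `$work_dir` is a scratch directory of your choosing.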

Other Arguments

Add all other arguments with these exceptions:

  • Arguments related to CPU and memory requirements are handled separately
  • Version (-v, --version) or help (-h, --help) arguments should be excluded
  • If the help file lists defaults, mention them in the description rather than setting them as default values in the config

Boolean handling:

  • Prefer using boolean_true over boolean_false to avoid confusion in Nextflow workflows
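On the script side, a `boolean_true` argument has to be turned into a bare flag for the wrapped tool. The sketch below assumes that Viash exposes such an argument to a bash script as the string `true` or `false`. `flag_if_true` and `--verbose` are hypothetical names used only for illustration.

```shell
# Assumption: for boolean_true arguments, the par_ variable holds "true" or "false".
# Hypothetical helper: print the tool flag only when the value is "true".
# $1: value of the par_ variable, $2: flag to pass to the tool
flag_if_true() {
  if [ "$1" = "true" ]; then
    printf '%s' "$2"
  fi
}
```

A call such as `xxx $(flag_if_true "$par_verbose" --verbose) --input "$par_input"` then passes `--verbose` only when the user enabled it.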

Description Formatting Guidelines

Argument descriptions should always be written in markdown format as they are used downstream for documentation rendering. Here are best practices:

Good markdown formatting examples:

description: |
  Input FASTQ file containing reads. Supports compressed files (`.gz`, `.bz2`).
  
  **Supported formats:**
  - FASTQ (`.fastq`, `.fq`)
  - Compressed FASTQ (`.fastq.gz`, `.fq.gz`)
  
  See the [FASTQ format specification](https://en.wikipedia.org/wiki/FASTQ_format) for details.

# Second example: a numeric argument with length-dependent behavior
description: |
  Maximum number of mismatches allowed during alignment.
  
  **Default behavior:**
  - For reads ≤50bp: 2 mismatches
  - For reads >50bp: 3 mismatches
  
  Set to `0` for exact matches only.

Formatting improvements you can make:

  • Add code formatting for file extensions, parameters, and values
  • Use lists and bullet points for multiple options
  • Add emphasis with bold or italic text
  • Include links to external documentation
  • Structure complex descriptions with headers
  • Use code blocks for examples

Original tool help vs. improved description:

# Original: "Input file in BAM format"
# Improved:
description: |
  Input file in BAM format containing aligned sequences.
  
  The file must be coordinate-sorted and indexed. Use `samtools sort` 
  and `samtools index` if needed.

Meta Variables

Important: Never add threads, cores, cpus, or memory as regular parameters. Instead, use Viash's built-in meta variables.

Available Meta Variables

Viash provides several meta variables that are automatically available in your scripts:

  • meta_cpus (integer): Maximum number of logical CPUs the component can use
  • meta_memory_* (long): Maximum memory allocation in various units:
    • meta_memory_b, meta_memory_kb, meta_memory_mb
    • meta_memory_gb, meta_memory_tb, meta_memory_pb
    • meta_memory_kib, meta_memory_mib, meta_memory_gib, meta_memory_tib, meta_memory_pib
  • meta_temp_dir (string): Temporary directory for the component
  • meta_resources_dir (string): Path to component resources
  • meta_name (string): Component name (useful for logging)
  • meta_executable (string): Path to the wrapped executable
  • meta_config (string): Path to the processed config YAML
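As an illustration of how `meta_temp_dir` and `meta_name` might be used together, the following sketch creates a private scratch directory and cleans it up on exit. The fallbacks to `/tmp` and `component` are assumptions so that the snippet also runs outside Viash. `work_dir` is a hypothetical variable name.

```shell
# Fall back to /tmp and a generic name when the script runs outside Viash.
meta_temp_dir="${meta_temp_dir:-/tmp}"
meta_name="${meta_name:-component}"

# Create a private working directory and remove it when the script exits.
work_dir="$(mktemp -d "$meta_temp_dir/$meta_name-XXXXXX")"
trap 'rm -rf "$work_dir"' EXIT

# Log to stderr so the message does not mix with the tool's stdout.
echo "[$meta_name] using temporary directory $work_dir" >&2
```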

Usage Example

# Use meta_cpus instead of a threads parameter (defaults to 1 if unset)
./tool --threads "${meta_cpus:-1}" --input "$par_input" --output "$par_output"

# Use meta_memory_gb for memory-intensive tools (defaults to 8 if unset)
./tool --memory "${meta_memory_gb:-8}G" --input "$par_input" --output "$par_output"
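
Some tools expect a per-thread memory limit rather than a total. A minimal sketch of deriving one from the meta variables follows. `memory_per_thread_mb` is a hypothetical helper, and the fallback values of 8192 MB and 1 thread are arbitrary assumptions for when no limits were set.

```shell
# Hypothetical helper: split the total memory limit evenly across threads.
# $1: total memory in MB (for example "$meta_memory_mb"); defaults to 8192 if empty
# $2: number of threads (for example "$meta_cpus"); defaults to 1 if empty
memory_per_thread_mb() {
  total_mb="${1:-8192}"
  threads="${2:-1}"
  echo $(( total_mb / threads ))
}
```

The result could be passed to the tool as, say, `"$(memory_per_thread_mb "$meta_memory_mb" "$meta_cpus")M"`, depending on the unit suffix the tool expects.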

Setting Meta Values

# When running with viash
viash run config.vsh.yaml --cpus 8 --memory 16GB -- --input file.txt

# When using built executables (Viash-level options take three dashes)
./my_tool ---cpus 8 ---memory 16GB --input file.txt

For more details, see the Viash Variables Documentation.

Implementation

See Script Development Guide for detailed script writing guidelines.

Testing

See Testing Guide for comprehensive testing practices.

Documentation

Version Documentation

Add version detection to the Docker engine setup:

engines:
  - type: docker
    image: quay.io/biocontainers/xxx:2.5.4--he96a11b_6
    setup:
      - type: docker
        run:
          - xxx --version 2>&1 | head -1 | sed 's/.*version /xxx: /' > /var/software_versions.txt

Common version extraction patterns:

# For tools that output "Tool version X.Y.Z"
tool --version 2>&1 | head -1 | sed 's/.*version /tool: /' > /var/software_versions.txt

# For tools that output just the version number
echo "tool: $(tool --version 2>&1 | head -1)" > /var/software_versions.txt

# For tools with complex version output
tool --version 2>&1 | grep -E "^[0-9]" | head -1 | sed 's/^/tool: /' > /var/software_versions.txt
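
Before baking a pattern into the Docker setup, it can help to try it on a captured sample of the tool's version output. The sketch below wraps the first pattern in a hypothetical `to_version_line` helper that reads the raw output from stdin. The sample text in the usage note is made up for illustration and is not the output of any real tool.

```shell
# Hypothetical helper for trying out an extraction pattern offline.
# $1: tool name to use as the key; stdin: raw output of `tool --version`
to_version_line() {
  # Keep only the first line, then replace everything up to "version " with "<tool>: ".
  head -n 1 | sed "s/.*version /$1: /"
}
```

For example, `printf 'tool version 1.2.3\nextra build info\n' | to_version_line tool` prints `tool: 1.2.3`.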