# Component Development Guide

This guide provides detailed step-by-step instructions for creating a new component in biobox.

## Table of Contents

- [Initial Setup](#initial-setup)
- [Configuration](#configuration)
- [Arguments](#arguments)
- [Meta Variables](#meta-variables)
- [Implementation](#implementation)
- [Testing](#testing)
- [Documentation](#documentation)

## Initial Setup

### Step 1: Find a component to contribute

* Find a tool to contribute to this repo.
* Check whether it is already in the [Project board](https://github.com/orgs/viash-hub/projects/1).
* Check whether there is a corresponding [Snakemake wrapper](https://github.com/snakemake/snakemake-wrappers/blob/master/bio) or [nf-core module](https://github.com/nf-core/modules/tree/master/modules/nf-core) which we can use as inspiration.
* Create an issue to show that you are working on this component.

### Step 2: Find a suitable container

Google `biocontainer xxx` (with `xxx` replaced by the name of the tool) and find the container that is most suitable. Typically the link will be `https://quay.io/repository/biocontainers/xxx?tab=tags`.

If no such container is found, you can create a custom container in a later step.

### Step 3: Create help file

To help develop the component, we store the `--help` output of the tool in a file at `src/xxx/help.txt`.

```bash
cat <<EOF > src/xxx/help.txt
\`\`\`sh
xxx --help
\`\`\`
EOF

docker run quay.io/biocontainers/xxx:tag xxx --help >> src/xxx/help.txt
```

**Notes:**

* This help file has no functional purpose, but it is useful for the developer to see the help output of the tool.
* Some tools might not have a `--help` argument but instead have a `-h` argument.

## Configuration

### Metadata Setup

Fill in the relevant metadata fields in the config:

```yaml
name: bowtie2_build
namespace: bowtie2
description: |
  Build Bowtie2 index files from reference sequences.
keywords: [Alignment, Indexing]
links:
  homepage: https://bowtie-bio.sourceforge.net/bowtie2/index.shtml
  documentation: https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
  repository: https://github.com/BenLangmead/bowtie2
references:
  doi: 10.1038/nmeth.1923
license: GPL-3.0
requirements:
  commands: [bowtie2-build]
authors:
  - __merge__: /src/_authors/robrecht_cannoodt.yaml
    roles: [author, maintainer]
```

### Requirements Specification

The `requirements` section documents the dependencies needed by your component:

```yaml
requirements:
  commands: [bowtie2-build, bowtie2]
```

**Why specify commands:**

- Documents which executables the component expects
- Enables validation that the Docker container has required tools
- Helps users understand dependencies
- Facilitates automated testing and CI/CD

## Arguments

### Input Arguments

By looking at the help file, add input arguments to the config file:

```yaml
argument_groups:
  - name: Inputs
    arguments:
      - name: --bam
        alternatives: -x
        type: file
        description: |
          File in SAM/BAM/CRAM format with main alignments as generated by STAR
          (`Aligned.out.sam`). Arriba extracts candidate reads from this file.
        required: true
        example: Aligned.out.bam
```

**Key principles:**

* Argument names should be formatted in `--snake_case`
* Input arguments can have `multiple: true` to allow multiple files
* **Descriptions must be formatted in markdown** - they will be used downstream for rendering documentation
* You can make minor changes to the formatting of arguments to improve clarity and better utilize markdown structure
* Use markdown features like code blocks, lists, emphasis, and links to enhance readability

### Output Arguments

Add output arguments based on the tool's help:

```yaml
argument_groups:
  - name: Outputs
    arguments:
      - name: --fusions
        alternatives: -o
        type: file
        direction: output
        description: |
          Output file with fusions that have passed all filters.
        required: true
        example: fusions.tsv
```

**Note:** Preferably, outputs should be files rather than directories.

### Other Arguments

Add all other arguments with these exceptions:

* Arguments related to CPU and memory requirements are handled separately
* Version (`-v`, `--version`) or help (`-h`, `--help`) arguments should be excluded
* If the help file lists defaults, mention them in the description rather than setting them as defaults

**Boolean handling:**

* Prefer using `boolean_true` over `boolean_false` to avoid confusion in Nextflow workflows

### Description Formatting Guidelines

Argument descriptions should always be written in **markdown format** as they are used downstream for documentation rendering. Here are best practices:

**Good markdown formatting examples:**

```yaml
description: |
  Input FASTQ file containing reads. Supports compressed files (`.gz`, `.bz2`).

  **Supported formats:**
  - FASTQ (`.fastq`, `.fq`)
  - Compressed FASTQ (`.fastq.gz`, `.fq.gz`)

  See the [FASTQ format specification](https://en.wikipedia.org/wiki/FASTQ_format) for details.
```

```yaml
description: |
  Maximum number of mismatches allowed during alignment.

  **Default behavior:**
  - For reads ≤50bp: 2 mismatches
  - For reads >50bp: 3 mismatches

  Set to `0` for exact matches only.
```

**Formatting improvements you can make:**

- Add code formatting for file extensions, parameters, and values
- Use lists and bullet points for multiple options
- Add emphasis with **bold** or *italic* text
- Include links to external documentation
- Structure complex descriptions with headers
- Use code blocks for examples

**Original tool help vs. improved description:**

```
# Original:
"Input file in BAM format"

# Improved:
description: |
  Input file in BAM format containing aligned sequences.

  The file must be coordinate-sorted and indexed.
  Use `samtools sort` and `samtools index` if needed.
```

## Meta Variables

**Important:** Never add `threads`, `cores`, `cpus`, or `memory` as regular parameters.
Instead, use Viash's built-in meta variables.

### Available Meta Variables

Viash provides several meta variables that are automatically available in your scripts:

- **`meta_cpus`** (integer): Maximum number of logical CPUs the component can use
- **`meta_memory_*`** (long): Maximum memory allocation in various units:
  - `meta_memory_b`, `meta_memory_kb`, `meta_memory_mb`
  - `meta_memory_gb`, `meta_memory_tb`, `meta_memory_pb`
  - `meta_memory_kib`, `meta_memory_mib`, `meta_memory_gib`, `meta_memory_tib`, `meta_memory_pib`
- **`meta_temp_dir`** (string): Temporary directory for the component
- **`meta_resources_dir`** (string): Path to component resources
- **`meta_name`** (string): Component name (useful for logging)
- **`meta_executable`** (string): Path to the wrapped executable
- **`meta_config`** (string): Path to the processed config YAML

### Usage Example

```bash
# Use meta_cpus instead of a threads parameter; fall back to 1 if it is unset
./tool --threads "${meta_cpus:-1}" --input "$par_input" --output "$par_output"

# Use meta_memory_gb for memory-intensive tools; fall back to 8 if it is unset
./tool --memory "${meta_memory_gb:-8}G" --input "$par_input" --output "$par_output"
```

### Setting Meta Values

```bash
# When running with viash
viash run config.vsh.yaml --cpus 8 --memory 16GB -- --input file.txt

# When using built executables (note the triple dashes)
./my_tool ---cpus 8 ---memory 16GB --input file.txt
```

For more details, see the [Viash Variables Documentation](https://viash.io/guide/component/variables.html).

## Implementation

See [Script Development Guide](SCRIPT_DEVELOPMENT.md) for detailed script writing guidelines.

## Testing

See [Testing Guide](TESTING.md) for comprehensive testing practices.
## Documentation

### Version Documentation

Add version detection to the Docker engine setup:

```yaml
engines:
  - type: docker
    image: quay.io/biocontainers/xxx:2.5.4--he96a11b_6
    setup:
      - type: docker
        # A block scalar keeps YAML from misreading the ": " inside the sed expression
        run: |
          xxx --version 2>&1 | head -n 1 | sed 's/.*version /xxx: /' > /var/software_versions.txt
```

**Common version extraction patterns:**

```bash
# For tools that output "Tool version X.Y.Z"
tool --version 2>&1 | head -n 1 | sed 's/.*version /tool: /' > /var/software_versions.txt

# For tools that output just the version number
echo "tool: $(tool --version 2>&1 | head -n 1)" > /var/software_versions.txt

# For tools with complex version output
tool --version 2>&1 | grep -E "^[0-9]" | head -n 1 | sed 's/^/tool: /' > /var/software_versions.txt
```
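
Before baking one of these patterns into the image, it can help to try it on a sample of the tool's output. The sketch below feeds made-up version strings through the three patterns above. The sample strings and the `xxx` tool name are assumptions for illustration only, so substitute what your tool actually prints.

```bash
#!/bin/sh
# Made-up sample outputs standing in for `xxx --version`; real tools will differ.
sample_named='xxx version 2.5.4'
sample_bare='2.5.4'
sample_complex='Welcome to xxx
2.5.4 (build abc)'

# Pattern 1: drop everything up to "version " and prefix the tool name.
out_named=$(printf '%s\n' "$sample_named" | head -n 1 | sed 's/.*version /xxx: /')

# Pattern 2: the tool prints only the number, so prefix it directly.
out_bare="xxx: $(printf '%s\n' "$sample_bare" | head -n 1)"

# Pattern 3: skip banner lines, keep the first line that starts with a digit.
out_complex=$(printf '%s\n' "$sample_complex" | grep -E '^[0-9]' | head -n 1 | sed 's/^/xxx: /')

printf '%s\n' "$out_named" "$out_bare" "$out_complex"
```

If a printed line does not have the `name: version` shape you expect, adjust the `sed` or `grep` expression before adding it to the `setup` section.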