biobox/docs/SCRIPT_DEVELOPMENT.md
2025-09-01 11:04:56 +00:00


Script Development Guide

This guide covers best practices for writing runner scripts in biobox components.

Script Structure and Template

All Viash component scripts follow a standard structure with best practices for error handling and parameter management.

Basic Template

#!/bin/bash

## VIASH START
## VIASH END

set -eo pipefail

# unset flags
[[ "$par_option1" == "false" ]] && unset par_option1
[[ "$par_option2" == "false" ]] && unset par_option2

# Build command arguments array
cmd_args=(
  --input "$par_input"
  --output "$par_output"
  ${par_option1:+--option1}
  ${par_option2:+--option2}
  ${meta_cpus:+--threads "$meta_cpus"}
  ${meta_memory_gb:+--memory "${meta_memory_gb}G"}
)

# Execute command (xxx is a placeholder for the tool's executable)
xxx "${cmd_args[@]}"

Understanding the Viash Code Block

The ## VIASH START and ## VIASH END comments mark a special placeholder block where Viash injects runtime parameters and metadata when the component is executed.

At runtime, Viash replaces this placeholder with:

  • par_* variables containing argument values (e.g., par_input, par_output)
  • meta_* variables containing runtime metadata (e.g., meta_name, meta_cpus, meta_temp_dir)

For debugging, you can put example values between these markers to test your script locally:

## VIASH START
par_input="test_input.txt"
par_output="test_output.txt"
par_verbose="true"
meta_cpus="4"
meta_memory_gb="8"
meta_temp_dir="/tmp"
## VIASH END

This allows you to run your script directly with bash script.sh during development.
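As a rough sketch of that local workflow, the script below fills the placeholder block with debug values and prints the command it would run instead of executing it. The tool name my_tool and the dry_run variable are hypothetical and only serve the illustration:

```bash
#!/bin/bash

## VIASH START
# Debug values; Viash replaces this block at runtime
par_input="test_input.txt"
par_output="test_output.txt"
par_verbose="false"
meta_cpus="4"
## VIASH END

# unset flags
[[ "$par_verbose" == "false" ]] && unset par_verbose

cmd_args=(
  --input "$par_input"
  --output "$par_output"
  ${par_verbose:+--verbose}
  ${meta_cpus:+--threads "$meta_cpus"}
)

# Dry run: print the shell-quoted command instead of running it
# (my_tool is a hypothetical tool name)
dry_run=$(printf '%q ' my_tool "${cmd_args[@]}")
echo "$dry_run"
```

Printing with %q shows exactly how each argument would be passed, which makes quoting mistakes visible before the real tool is involved.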

Code Style Guidelines

Indentation

Use 2-space indentation consistently throughout your scripts:

# Correct - 2 spaces
unset_if_false=(
  par_verbose
  par_quiet
  par_force
)

for par in "${unset_if_false[@]}"; do
  test_val="${!par}"
  [[ "$test_val" == "false" ]] && unset "$par"
done

cmd_args=(
  --input "$par_input"
  --output "$par_output"
  ${par_verbose:+--verbose}
)

# Incorrect - 4 spaces or tabs
unset_if_false=(
    par_verbose
    par_quiet
    par_force
)

for par in "${unset_if_false[@]}"; do
    test_val="${!par}"
    [[ "$test_val" == "false" ]] && unset "$par"
done

Why 2 spaces:

  • Consistent with other biobox components
  • Better readability in terminal and code editors
  • Reduces line width for complex nested structures
  • Standard practice in many shell script projects

Key Principles

1. Error Handling

Always use set -eo pipefail:

  • set -e: Exit immediately if a command exits with a non-zero status
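  • set -o pipefail: Make a pipeline return the status of the last command in it that failed, instead of the status of its final command; combined with set -e, a failure anywhere in a pipeline stops the script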
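The effect of pipefail can be seen in a small sketch. The variable names status_without and status_with are illustrative only:

```bash
# Without pipefail, a pipeline reports the status of its final command,
# so the failure of `false` is hidden by the success of `true`.
set +o pipefail
if false | true; then status_without=0; else status_without=$?; fi

# With pipefail, the failure of any command in the pipeline is reported.
set -o pipefail
if false | true; then status_with=0; else status_with=$?; fi

echo "without pipefail: $status_without, with pipefail: $status_with"
```

The pipelines are wrapped in if statements so that the demonstration itself does not abort when set -e is active.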

2. Array-Based Arguments

Preferred approach:

cmd_args=(
  --input "$par_input"
  --output "$par_output"
  ${par_option:+--option "$par_option"}
)

xxx "${cmd_args[@]}"

Avoid repetitive appending:

# Don't do this
cmd_args+=("--input")
cmd_args+=("$par_input")
cmd_args+=("--output")
cmd_args+=("$par_output")

3. Conditional Parameter Inclusion

Use Bash parameter expansion for optional parameters:

# Include parameter only if variable is set and not empty
${meta_cpus:+--threads "$meta_cpus"}

# Include flag only if boolean is true (after unsetting false values)
${par_verbose:+--verbose}
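To make the behavior of the `:+` expansion concrete, here is a minimal sketch covering the set, empty, and unset cases. The parameters par_label and par_quiet and their flags are made up for the example:

```bash
meta_cpus="4"          # set and non-empty: flag and value are included
par_label="my sample"  # a value containing a space stays one argument
par_verbose=""         # set but empty: nothing is included
unset par_quiet        # unset: nothing is included

cmd_args=(
  ${meta_cpus:+--threads "$meta_cpus"}
  ${par_label:+--label "$par_label"}
  ${par_verbose:+--verbose}
  ${par_quiet:+--quiet}
)

# Print one argument per line, in brackets, to show the word boundaries
printf '[%s]\n' "${cmd_args[@]}"
```

Only the outer expansion is left unquoted; the inner "$par_label" is quoted, which is what keeps a value with spaces together as a single argument.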

4. Boolean Handling

Unset boolean parameters that are "false":

# Single parameter
[[ "$par_verbose" == "false" ]] && unset par_verbose

# For multiple parameters, you can use either approach:

# Option 1: Individual approach (recommended for 1-4 parameters)
[[ "$par_verbose" == "false" ]] && unset par_verbose
[[ "$par_quiet" == "false" ]] && unset par_quiet
[[ "$par_force" == "false" ]] && unset par_force
[[ "$par_recursive" == "false" ]] && unset par_recursive

# Option 2: Loop approach (recommended for 5+ parameters)
unset_if_false=(
  par_verbose
  par_quiet
  par_force
  par_recursive
  par_follow_symlinks
  par_ignore_case
  par_preserve_permissions
)

for par in "${unset_if_false[@]}"; do
  test_val="${!par}"
  [[ "$test_val" == "false" ]] && unset "$par"
done

When to use which approach:

  • Individual approach: Recommended for 1-4 boolean parameters, clearer and more direct
  • Loop approach: Recommended for many parameters (5+), reduces code duplication

The individual approach is preferred for fewer parameters because:

  • Each parameter is explicit and easy to find
  • No variable indirection complexity (${!par})
  • Simple to add/remove individual parameters
  • More readable at a glance

5. Meta Variables Usage

Important: Never use par_threads, par_cores, par_cpus, or par_memory parameters. Use Viash's built-in meta variables instead.

Available meta variables:

  • meta_cpus: Number of CPU cores available
  • meta_memory_*: Memory limits in various units (b, kb, mb, gb, tb, pb, kib, mib, gib, tib, pib)
  • meta_temp_dir: Temporary directory for the component
  • meta_resources_dir: Path to component resources

Examples:

# CPU cores: pass the flag only when Viash provides a value
${meta_cpus:+--threads "$meta_cpus"}
# CPU cores: always pass the flag, falling back to 1
--cores "${meta_cpus:-1}"

# Memory: pass the flag only when Viash provides a value
${meta_memory_gb:+--memory "${meta_memory_gb}G"}
# Memory: always pass the flag, falling back to 1024 MB
--max-memory "${meta_memory_mb:-1024}M"

# Temporary directory
--tmp-dir "${meta_temp_dir:-/tmp}"

Why use meta variables:

  • Integrates seamlessly with workflow systems like Nextflow
  • Automatically managed by Viash runtime
  • Consistent across all components
  • Prevents parameter duplication and conflicts

For complete details, see Viash Variables Documentation.
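Some tools expect resources in a derived form, such as memory per thread. The sketch below shows one possible way to compute this from the meta variables. The flag --mem-per-thread and the variables threads and mem_per_thread_mb are hypothetical and do not refer to any specific tool:

```bash
# Debug values standing in for what Viash would inject
meta_cpus="4"
meta_memory_mb="8192"

# Fall back to a single thread when no CPU count is provided
threads="${meta_cpus:-1}"

# Only compute a per-thread limit when a memory limit is known
if [[ -n "$meta_memory_mb" ]]; then
  mem_per_thread_mb=$(( meta_memory_mb / threads ))
fi

cmd_args=(
  --threads "$threads"
  ${mem_per_thread_mb:+--mem-per-thread "${mem_per_thread_mb}M"}
)

printf '%s\n' "${cmd_args[@]}"
```

Integer division rounds down, so the per-thread values never add up to more than the total limit.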

6. Proper Quoting

Always quote variables that might contain spaces or special characters:

# Correct
--input "$par_input"
--output "$par_output"

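# Special characters are also safe inside double quotes
--pattern "$par_pattern"

# Reserve @Q for strings that a shell parses again (eval, bash -c);
# in a direct argument it would add literal quote characters
eval "xxx --pattern ${par_pattern@Q}"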
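The following sketch compares plain double quotes with the @Q expansion, assuming Bash 4.4 or later (where @Q is available). The variable names direct, quoted_literal, and reparsed are illustrative:

```bash
par_pattern='a*b c'

# Direct argument: double quotes pass the value through unchanged
direct=$(printf '%s' "$par_pattern")

# @Q produces a shell-quoted form; passed directly, the quote
# characters become part of the argument
quoted_literal=$(printf '%s' "${par_pattern@Q}")

# @Q is useful when the text is parsed again by a shell, here via bash -c
reparsed=$(bash -c "printf '%s' ${par_pattern@Q}")

printf 'direct:   [%s]\n' "$direct"
printf 'literal:  [%s]\n' "$quoted_literal"
printf 'reparsed: [%s]\n' "$reparsed"
```

In this sketch, the directly passed @Q value carries surrounding single quotes, while the re-parsed one matches the original pattern again.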

7. Multiple Parameter Values

When using arguments with multiple: true in your Viash configuration, values are passed as semicolon-separated strings that need to be split into bash arrays.

In script.sh - Converting to Arrays

# Convert semicolon-separated values to bash array
IFS=';' read -ra files_array <<< "$par_files"

# Example: Use in command arguments
cmd_args=(
  -i "$par_input"
  -files "${files_array[@]}"
  -o "$par_output"
)

# Execute command
bedtools annotate "${cmd_args[@]}"
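A small sketch of the split itself, using made-up paths, shows that only semicolons act as separators, so a path containing a space survives as one array element:

```bash
# Example value as Viash would pass it for a multiple: true argument
# (the paths are made up for this illustration)
par_files="/data/db one.bed;/data/db2.bed"

# Split on semicolons only; IFS is changed just for this read command
IFS=';' read -ra files_array <<< "$par_files"

echo "${#files_array[@]} files"
for file in "${files_array[@]}"; do
  echo "file: [$file]"
done
```

Because the IFS assignment is a prefix to read, the shell's normal word splitting is unaffected for the rest of the script.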

In test.sh - Passing Multiple Values

When testing components with multiple: true parameters, you can use either format:

# Method 1: Repeated flags (recommended for readability)
"$meta_executable" \
  --input "$meta_temp_dir/query.bed" \
  --files "$meta_temp_dir/db1.bed" \
  --files "$meta_temp_dir/db2.bed" \
  --output "$meta_temp_dir/result.bed"

# Method 2: Semicolon-separated values
"$meta_executable" \
  --input "$meta_temp_dir/query.bed" \
  --files "$meta_temp_dir/db1.bed;$meta_temp_dir/db2.bed" \
  --output "$meta_temp_dir/result.bed"

Both methods work identically - Viash automatically converts repeated flags to semicolon-separated strings internally.

Complete Example

#!/bin/bash

## VIASH START
## VIASH END

set -eo pipefail

# Convert semicolon-separated files to array
IFS=';' read -ra files_array <<< "$par_files"

# Convert semicolon-separated names to array if provided
if [[ -n "${par_names}" ]]; then
  IFS=';' read -ra names_array <<< "$par_names"
fi

# Build command arguments array
cmd_args=(
  -i "$par_input"
  ${par_names:+-names "${names_array[@]}"}
  -files "${files_array[@]}"
)

# Execute command
bedtools annotate "${cmd_args[@]}" > "$par_output"

Real-World Example

Here's an example from the bowtie2_build component:

#!/bin/bash

## VIASH START
## VIASH END

set -eo pipefail

# unset flags
[[ "$par_large_index" == "false" ]] && unset par_large_index
[[ "$par_noauto" == "false" ]] && unset par_noauto
[[ "$par_packed" == "false" ]] && unset par_packed

# Create output directory
mkdir -p "$par_output"

# Determine index basename
if [ -n "$par_index_name" ]; then
    index_basename="$par_index_name"
else
    index_basename=$(basename "$par_input" .fasta)
fi

# Build command arguments
cmd_args=(
    ${par_fasta:+-f}
    ${par_cmdline:+-c}
    ${par_large_index:+--large-index}
    ${par_noauto:+-a}
    ${par_packed:+-p}
    ${par_bmax:+--bmax "$par_bmax"}
    ${par_offrate:+-o "$par_offrate"}
    "$par_input"
    "$par_output/$index_basename"
)

# Execute bowtie2-build
bowtie2-build "${cmd_args[@]}"

Advanced Patterns

Multiple Input Handling

If your tool accepts multiple inputs with custom separators:

# Convert Viash's semicolon separator to comma
par_disable_filters="${par_disable_filters//;/,}"

cmd_args=(
  --disable-filters "$par_disable_filters"
)
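Other tools expect the flag to be repeated once per value rather than a single separated list. One possible sketch, assuming a hypothetical --file flag and made-up paths:

```bash
# Example value for a multiple: true argument (paths are made up)
par_files="/data/a.bed;/data/b c.bed"

IFS=';' read -ra files_array <<< "$par_files"

# Emit one "--file <path>" pair per value
# (--file is a hypothetical flag used only for illustration)
cmd_args=()
for file in "${files_array[@]}"; do
  cmd_args+=(--file "$file")
done

printf '[%s]\n' "${cmd_args[@]}"
```

Appending inside a loop is reasonable here because the number of values is not known in advance.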

Complex File Handling

# Ensure output directory exists
mkdir -p "$(dirname "$par_output")"

# Handle relative paths
input_path=$(realpath "$par_input")
output_path=$(realpath "$par_output")
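When a tool needs scratch space, one option is a private working directory under meta_temp_dir that is removed when the script exits. This is a sketch under assumptions: the directory prefix component- and the names work_dir and intermediate are illustrative, and the output path is placed in the scratch directory only to keep the example self-contained:

```bash
# Use the Viash-provided temp dir when available, otherwise /tmp
meta_temp_dir="${meta_temp_dir:-/tmp}"

# Create a private scratch directory and remove it when the script exits
work_dir=$(mktemp -d "$meta_temp_dir/component-XXXXXX")
trap 'rm -rf "$work_dir"' EXIT

# Stand-in output path for this sketch
par_output="$work_dir/results/nested/out.txt"

# Ensure the output's parent directory exists before writing
mkdir -p "$(dirname "$par_output")"

# Write an intermediate file, then move it to the final location
intermediate="$work_dir/intermediate.txt"
echo "done" > "$intermediate"
mv "$intermediate" "$par_output"
```

In a real component, par_output would point outside the scratch directory so that the result survives the cleanup.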

Resource Management

# Use available resources
cmd_args=(
  ${meta_cpus:+--threads "$meta_cpus"}
  ${meta_memory_mb:+--memory "${meta_memory_mb}M"}
)

Common Pitfalls

1. Unquoted Variables

# Wrong - values with spaces or glob characters are split or expanded
cmd_args=(--input $par_input)

# Correct
cmd_args=(--input "$par_input")
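Counting the resulting arguments makes the difference visible. In this sketch the file name is made up:

```bash
par_input="my sample.txt"

# Unquoted: the value is split at the space into two words
unquoted_args=(--input $par_input)

# Quoted: the value stays a single argument
quoted_args=(--input "$par_input")

echo "unquoted: ${#unquoted_args[@]} arguments"
echo "quoted: ${#quoted_args[@]} arguments"
```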

2. Improper Boolean Handling

# Wrong - the string "false" is non-empty, so --verbose is still added
cmd_args=(${par_verbose:+--verbose})

# Correct - unset false values first
[[ "$par_verbose" == "false" ]] && unset par_verbose
cmd_args=(${par_verbose:+--verbose})
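A short sketch of why the unset step matters; the array names before and after are illustrative:

```bash
par_verbose="false"

# "false" is a non-empty string, so the flag is still included
before=(${par_verbose:+--verbose})

# After unsetting, the expansion yields nothing
[[ "$par_verbose" == "false" ]] && unset par_verbose
after=(${par_verbose:+--verbose})

echo "before unset: ${#before[@]} argument(s)"
echo "after unset: ${#after[@]} argument(s)"
```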

3. Array Expansion

# Wrong - expands only the first array element
tool $cmd_args

# Correct - expands array elements
tool "${cmd_args[@]}"
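The sketch below counts what each form passes on. The helper show_args is hypothetical and stands in for the tool:

```bash
cmd_args=(--input "my file.txt" --verbose)

# Hypothetical helper that reports how many arguments it received
show_args() { printf '%d arguments\n' "$#"; }

# Referencing the array by name alone yields only its first element
show_args $cmd_args

# The quoted [@] expansion passes every element as its own argument
show_args "${cmd_args[@]}"
```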

Testing Your Script

Always test your script with:

  • Empty/missing optional parameters
  • Parameters with spaces
  • Boolean true/false values
  • Edge cases specific to your tool

See Testing Guide for extensive test best practices.