Script Development Guide
This guide covers best practices for writing runner scripts in biobox components.
Table of Contents
- Script Structure and Template
- Key Principles
- Real-World Example
- Advanced Patterns
- Common Pitfalls
- Testing Your Script
Script Structure and Template
All Viash component scripts follow a standard structure with best practices for error handling and parameter management.
Basic Template
```bash
#!/bin/bash

## VIASH START
## VIASH END

set -eo pipefail

# unset flags
[[ "$par_option1" == "false" ]] && unset par_option1
[[ "$par_option2" == "false" ]] && unset par_option2

# Build command arguments array
cmd_args=(
  --input "$par_input"
  --output "$par_output"
  ${par_option1:+--option1}
  ${par_option2:+--option2}
  ${meta_cpus:+--threads "$meta_cpus"}
  ${meta_memory_gb:+--memory "${meta_memory_gb}G"}
)

# Execute command (xxx is a placeholder for the actual tool)
xxx "${cmd_args[@]}"
```
Understanding the Viash Code Block
The ## VIASH START and ## VIASH END comments mark a special placeholder block where Viash injects runtime parameters and metadata when the component is executed.
At runtime, Viash replaces this placeholder with:
- par_* variables containing argument values (e.g., par_input, par_output)
- meta_* variables containing runtime metadata (e.g., meta_name, meta_cpus, meta_temp_dir)
For debugging, you can put example code between these markers to test your script locally:
```bash
## VIASH START
par_input="test_input.txt"
par_output="test_output.txt"
par_verbose="true"
meta_cpus="4"
meta_memory_gb="8"
meta_temp_dir="/tmp"
## VIASH END
```
This allows you to run your script directly with bash script.sh during development.
Code Style Guidelines
Indentation
Use 2-space indentation consistently throughout your scripts:
```bash
# Correct - 2 spaces
unset_if_false=(
  par_verbose
  par_quiet
  par_force
)

for par in "${unset_if_false[@]}"; do
  test_val="${!par}"
  [[ "$test_val" == "false" ]] && unset "$par"
done

cmd_args=(
  --input "$par_input"
  --output "$par_output"
  ${par_verbose:+--verbose}
)

# Incorrect - 4 spaces or tabs
unset_if_false=(
    par_verbose
    par_quiet
    par_force
)

for par in "${unset_if_false[@]}"; do
    test_val="${!par}"
    [[ "$test_val" == "false" ]] && unset "$par"
done
```
Why 2 spaces:
- Consistent with other biobox components
- Better readability in terminal and code editors
- Reduces line width for complex nested structures
- Standard practice in many shell script projects
Key Principles
1. Error Handling
Always use set -eo pipefail:
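- set -e: exit immediately if a command exits with a non-zero status
- set -o pipefail: a pipeline returns the status of its rightmost failing command instead of only the last command's status, so combined with set -e a failure anywhere in a pipeline stops the script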
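To see what pipefail adds, here is a minimal sketch in which false | true stands in for a failing tool whose output is piped onward. The two helper functions are hypothetical names used only for this demonstration; each runs in a subshell so the option change does not leak out.

```bash
#!/bin/bash
# Minimal sketch: the exit status of the same pipeline with and without pipefail.
# `false | true` stands in for "a tool fails, but its output is piped onward".

status_without_pipefail() {
  ( set +e +o pipefail; false | true; echo $? )
}

status_with_pipefail() {
  ( set +e -o pipefail; false | true; echo $? )
}

echo "without pipefail: $(status_without_pipefail)"  # 0: only the last command counts
echo "with pipefail:    $(status_with_pipefail)"     # 1: the earlier failure propagates
```

In a runner script there is no need to toggle the option per pipeline; setting it once at the top, as in the template, covers every pipeline in the script.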
2. Array-Based Arguments
Preferred approach:
```bash
cmd_args=(
  --input "$par_input"
  --output "$par_output"
  ${par_option:+--option "$par_option"}
)

xxx "${cmd_args[@]}"
```
Avoid repetitive appending:
```bash
# Don't do this
cmd_args+=("--input")
cmd_args+=("$par_input")
cmd_args+=("--output")
cmd_args+=("$par_output")
```
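The literal and the appended forms build the same array; the literal is simply shorter. What either form buys over a plain string is that each element stays one argument, even when a value contains spaces. A minimal sketch with made-up values; show_args is a hypothetical helper that prints one bracketed argument per line.

```bash
#!/bin/bash
# Minimal sketch with made-up example values (not real Viash output).
par_input="my sample.fastq"
par_output="out dir/result.txt"

# Preferred: one array literal
cmd_args=(
  --input "$par_input"
  --output "$par_output"
)

# Repetitive appending builds exactly the same array, just more verbosely
appended=()
appended+=("--input")
appended+=("$par_input")
appended+=("--output")
appended+=("$par_output")

# Print one bracketed argument per line to make word boundaries visible
show_args() {
  printf '[%s]\n' "$@"
}

show_args "${cmd_args[@]}"
# [--input]
# [my sample.fastq]
# [--output]
# [out dir/result.txt]

# Same elements in the same order
[[ "$(show_args "${cmd_args[@]}")" == "$(show_args "${appended[@]}")" ]] && echo "identical"
```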
3. Conditional Parameter Inclusion
Use Bash parameter expansion for optional parameters:
```bash
# Include parameter only if variable is set and not empty
${meta_cpus:+--threads "$meta_cpus"}

# Include flag only if boolean is true (after unsetting false values)
${par_verbose:+--verbose}
```
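A minimal sketch of how ${var:+word} behaves for set, empty, and unset variables; count_args is a hypothetical helper and the values are illustrative. The last call shows why false booleans need the handling described in the next principle: the string "false" is non-empty, so the flag is still added.

```bash
#!/bin/bash
# Minimal sketch of ${var:+word}; meta_cpus and par_verbose are example values.

count_args() {
  # Build the array as in the template, then report its size and contents
  local args=(
    ${meta_cpus:+--threads "$meta_cpus"}
    ${par_verbose:+--verbose}
  )
  echo "${#args[@]}: ${args[*]}"
}

meta_cpus=4 par_verbose=true count_args     # 3: --threads 4 --verbose
meta_cpus="" par_verbose=true count_args    # 1: --verbose  (empty counts as absent)
meta_cpus="" par_verbose=false count_args   # 1: --verbose  ("false" is non-empty!)
unset meta_cpus par_verbose
count_args                                  # 0:
```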
4. Boolean Handling
Unset boolean parameters that are "false":
```bash
# Single parameter
[[ "$par_verbose" == "false" ]] && unset par_verbose

# For multiple parameters, you can use either approach:

# Option 1: Individual approach (recommended for 1-4 parameters)
[[ "$par_verbose" == "false" ]] && unset par_verbose
[[ "$par_quiet" == "false" ]] && unset par_quiet
[[ "$par_force" == "false" ]] && unset par_force
[[ "$par_recursive" == "false" ]] && unset par_recursive

# Option 2: Loop approach (recommended for 5+ parameters)
unset_if_false=(
  par_verbose
  par_quiet
  par_force
  par_recursive
  par_follow_symlinks
  par_ignore_case
  par_preserve_permissions
)

for par in "${unset_if_false[@]}"; do
  # ${!par} reads the variable whose name is stored in $par (indirect expansion)
  test_val="${!par}"
  [[ "$test_val" == "false" ]] && unset "$par"
done
```
When to use which approach:
- Individual approach: Recommended for 1-4 boolean parameters, clearer and more direct
- Loop approach: Recommended for many parameters (5+), reduces code duplication
The individual approach is preferred for fewer parameters because:
- Each parameter is explicit and easy to find
- No variable indirection complexity (${!par})
- Simple to add/remove individual parameters
- More readable at a glance
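Putting the loop approach together with conditional inclusion, here is a minimal sketch with made-up values: only the parameter set to "true" survives as a flag, while the ones set to "false" are unset and drop out of the array.

```bash
#!/bin/bash
# Minimal sketch of the loop approach; the parameter values are made up.
par_verbose="true"
par_quiet="false"
par_force="false"

unset_if_false=(
  par_verbose
  par_quiet
  par_force
)

for par in "${unset_if_false[@]}"; do
  # Indirect expansion: the value of the variable whose NAME is in $par
  test_val="${!par}"
  [[ "$test_val" == "false" ]] && unset "$par"
done

cmd_args=(
  ${par_verbose:+--verbose}
  ${par_quiet:+--quiet}
  ${par_force:+--force}
)

echo "${cmd_args[@]}"   # --verbose
```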
5. Meta Variables Usage
Important: Never use par_threads, par_cores, par_cpus, or par_memory parameters. Use Viash's built-in meta variables instead.
Available meta variables:
- meta_cpus: Number of CPU cores available
- meta_memory_*: Memory limits in various units (b, kb, mb, gb, tb, pb, kib, mib, gib, tib, pib)
- meta_temp_dir: Temporary directory for the component
- meta_resources_dir: Path to component resources
Examples:
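```bash
# CPU cores: pass the flag only when meta_cpus is set
${meta_cpus:+--threads "$meta_cpus"}
# CPU cores: always pass the flag, falling back to 1 when meta_cpus is unset
--cores "${meta_cpus:-1}"

# Memory: pass the flag only when set, appending the unit suffix the tool expects
${meta_memory_gb:+--memory "${meta_memory_gb}G"}
# Memory: always pass the flag, falling back to 1024 MB when unset
--max-memory "${meta_memory_mb:-1024}M"

# Temporary directory with fallback
--tmp-dir "${meta_temp_dir:-/tmp}"
```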
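The two forms answer different questions: :+ omits the flag entirely when the variable is unset (leaving the tool's own default in charge), while :- always passes the flag with a fallback value. Combining them, as in ${meta_cpus:+--cores "${meta_cpus:-1}"}, never uses the fallback, because the :+ branch only runs when meta_cpus is already non-empty. A minimal sketch; --threads and --cores are placeholder flags, not those of any particular tool.

```bash
#!/bin/bash
# Minimal sketch contrasting "omit when unset" (:+) with "always pass a default" (:-).

threads_args() {
  local args=( ${meta_cpus:+--threads "$meta_cpus"} )
  echo "${args[*]}"
}

cores_args() {
  local args=( --cores "${meta_cpus:-1}" )
  echo "${args[*]}"
}

unset meta_cpus
threads_args   # prints an empty line: flag omitted
cores_args     # --cores 1

meta_cpus=8
threads_args   # --threads 8
cores_args     # --cores 8
```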
Why use meta variables:
- Integrates seamlessly with workflow systems like Nextflow
- Automatically managed by Viash runtime
- Consistent across all components
- Prevents parameter duplication and conflicts
For complete details, see Viash Variables Documentation.
6. Proper Quoting
Always quote variables that might contain spaces or special characters:
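```bash
# Correct: double quotes pass spaces and special characters through unchanged
--input "$par_input"
--output "$par_output"
--pattern "$par_pattern"

# Use @Q expansion (bash 4.4+) only when the string will be parsed by a shell
# again, e.g. with eval or bash -c; it adds literal shell quoting to the value
eval "xxx --pattern ${par_pattern@Q}"
```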
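A minimal sketch of the difference, assuming bash 4.4 or later (where the @Q operator was introduced). show_args is a hypothetical stand-in for a tool that prints the arguments it receives; the pattern value is made up to contain a glob and a dollar sign.

```bash
#!/bin/bash
# Minimal sketch: plain double quotes vs ${var@Q}.
par_pattern='*.fa $HOME'

show_args() { printf '[%s]\n' "$@"; }

# Direct call: the tool receives the value exactly as given
show_args "$par_pattern"         # [*.fa $HOME]

# @Q output contains literal quote characters, which a tool would see verbatim
show_args "${par_pattern@Q}"     # ['*.fa $HOME']

# When a second round of shell parsing happens, @Q keeps the value intact
cmd_string="show_args ${par_pattern@Q}"
eval "$cmd_string"               # [*.fa $HOME]
```

Without @Q, eval "show_args $par_pattern" would let the second parse expand the glob and $HOME, which is the case @Q exists to prevent.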
7. Multiple Parameter Values
When using arguments with multiple: true in your Viash configuration, values are passed as semicolon-separated strings that need to be split into bash arrays.
In script.sh - Converting to Arrays
```bash
# Convert semicolon-separated values to bash array
IFS=';' read -ra files_array <<< "$par_files"

# Example: Use in command arguments
cmd_args=(
  -i "$par_input"
  -files "${files_array[@]}"
)

# Execute command (bedtools annotate writes its result to stdout)
bedtools annotate "${cmd_args[@]}" > "$par_output"
```
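A minimal sketch of the split on its own, using an example value shaped like the semicolon-separated string described above. The IFS=';' prefix applies only to this read, -r keeps backslashes literal, and -a stores the fields in an array, so a file name containing a space stays one element.

```bash
#!/bin/bash
# Minimal sketch: splitting a semicolon-separated value into an array.
# par_files is an example value, not real Viash output.
par_files="db1.bed;db 2.bed;db3.bed"

IFS=';' read -ra files_array <<< "$par_files"

echo "${#files_array[@]}"   # 3
printf '[%s]\n' "${files_array[@]}"
# [db1.bed]
# [db 2.bed]
# [db3.bed]
```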
In test.sh - Passing Multiple Values
When testing components with multiple: true parameters, you can use either format:
```bash
# Method 1: Repeated flags (recommended for readability)
"$meta_executable" \
  --input "$meta_temp_dir/query.bed" \
  --files "$meta_temp_dir/db1.bed" \
  --files "$meta_temp_dir/db2.bed" \
  --output "$meta_temp_dir/result.bed"

# Method 2: Semicolon-separated values
"$meta_executable" \
  --input "$meta_temp_dir/query.bed" \
  --files "$meta_temp_dir/db1.bed;$meta_temp_dir/db2.bed" \
  --output "$meta_temp_dir/result.bed"
```
Both methods work identically - Viash automatically converts repeated flags to semicolon-separated strings internally.
Complete Example
```bash
#!/bin/bash

## VIASH START
## VIASH END

set -eo pipefail

# Convert semicolon-separated files to array
IFS=';' read -ra files_array <<< "$par_files"

# Convert semicolon-separated names to array if provided
if [[ -n "${par_names}" ]]; then
  IFS=';' read -ra names_array <<< "$par_names"
fi

# Build command arguments array
cmd_args=(
  -i "$par_input"
  ${par_names:+-names "${names_array[@]}"}
  -files "${files_array[@]}"
)

# Execute command
bedtools annotate "${cmd_args[@]}" > "$par_output"
```
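The ${par_names:+-names "${names_array[@]}"} line relies on the quoted array expansion inside ${...:+...} still producing one argument per element, the same way the classic ${1+"$@"} idiom does. A minimal sketch with example values, printing the arguments with printf instead of running bedtools:

```bash
#!/bin/bash
# Minimal sketch: an optional flag followed by a whole array, via ${var:+...}.
# par_files / par_names are example values; printf stands in for bedtools.
par_files="a.bed;b.bed"
par_names="first;second set"

IFS=';' read -ra files_array <<< "$par_files"
if [[ -n "$par_names" ]]; then
  IFS=';' read -ra names_array <<< "$par_names"
fi

cmd_args=(
  -i query.bed
  ${par_names:+-names "${names_array[@]}"}
  -files "${files_array[@]}"
)

printf '[%s]\n' "${cmd_args[@]}"
# [-i]
# [query.bed]
# [-names]
# [first]
# [second set]
# [-files]
# [a.bed]
# [b.bed]
```

If par_names is empty or unset, the whole -names group disappears and names_array is never referenced.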
Real-World Example
Here's an example adapted from the bowtie2_build component:
```bash
#!/bin/bash

## VIASH START
## VIASH END

set -eo pipefail

# unset flags (every boolean used with ${...:+...} below)
[[ "$par_fasta" == "false" ]] && unset par_fasta
[[ "$par_cmdline" == "false" ]] && unset par_cmdline
[[ "$par_large_index" == "false" ]] && unset par_large_index
[[ "$par_noauto" == "false" ]] && unset par_noauto
[[ "$par_packed" == "false" ]] && unset par_packed

# Create output directory
mkdir -p "$par_output"

# Determine index basename
if [ -n "$par_index_name" ]; then
  index_basename="$par_index_name"
else
  index_basename=$(basename "$par_input" .fasta)
fi

# Build command arguments
cmd_args=(
  ${par_fasta:+-f}
  ${par_cmdline:+-c}
  ${par_large_index:+--large-index}
  ${par_noauto:+-a}
  ${par_packed:+-p}
  ${par_bmax:+--bmax "$par_bmax"}
  ${par_offrate:+-o "$par_offrate"}
  "$par_input"
  "$par_output/$index_basename"
)

# Execute bowtie2-build
bowtie2-build "${cmd_args[@]}"
```
Advanced Patterns
Multiple Input Handling
If your tool accepts multiple inputs with custom separators:
```bash
# Convert Viash's semicolon separator to comma
par_disable_filters=$(echo "$par_disable_filters" | tr ';' ',')

cmd_args=(
  --disable-filters "$par_disable_filters"
)
```
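As an alternative to piping echo into tr, bash's ${var//pattern/replacement} expansion performs the same substitution without starting extra processes. A minimal sketch with an example value; the :+ guard is an optional addition, assuming the flag is optional, that drops it when no filters were given. --disable-filters is only the placeholder flag from the snippet above.

```bash
#!/bin/bash
# Minimal sketch: semicolon-to-comma conversion with parameter expansion.
# par_disable_filters is an example value.
par_disable_filters="adapter;quality;length"

# ${var//;/,} replaces every ";" with ","
par_disable_filters="${par_disable_filters//;/,}"

# Guarded so the flag is dropped entirely when the value is empty
cmd_args=(
  ${par_disable_filters:+--disable-filters "$par_disable_filters"}
)

printf '[%s]\n' "${cmd_args[@]}"
# [--disable-filters]
# [adapter,quality,length]
```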
Complex File Handling
```bash
# Ensure output directory exists
mkdir -p "$(dirname "$par_output")"

# Handle relative paths
input_path=$(realpath "$par_input")
output_path=$(realpath "$par_output")
```
Resource Management
```bash
# Use available resources
cmd_args=(
  ${meta_cpus:+--threads "$meta_cpus"}
  ${meta_memory_mb:+--memory "${meta_memory_mb}M"}
)
```
Common Pitfalls
1. Unquoted Variables
```bash
# Wrong - can break with spaces
cmd_args=(--input $par_input)

# Correct
cmd_args=(--input "$par_input")
```
2. Improper Boolean Handling
```bash
# Wrong - will include false booleans
cmd_args=(${par_verbose:+--verbose})

# Correct - unset false values first
[[ "$par_verbose" == "false" ]] && unset par_verbose
cmd_args=(${par_verbose:+--verbose})
```
3. Array Expansion
```bash
# Wrong - expands only the first element (${cmd_args[0]}), then word-splits it
tool $cmd_args

# Correct - expands every element as a separate, intact argument
tool "${cmd_args[@]}"
```
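A minimal sketch that counts the arguments each form actually produces; count is a hypothetical helper that prints how many arguments it received.

```bash
#!/bin/bash
# Minimal sketch: what each way of expanding an array passes to a command.
cmd_args=(--input "my file.txt" --verbose)

count() { echo "$#"; }

count $cmd_args          # 1: only ${cmd_args[0]}, i.e. "--input"
count "${cmd_args[@]}"   # 3: every element, boundaries preserved
count ${cmd_args[@]}     # 4: unquoted, so "my file.txt" is split in two
```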
Testing Your Script
Always test your script with:
- Empty/missing optional parameters
- Parameters with spaces
- Boolean true/false values
- Edge cases specific to your tool
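One quick way to exercise these cases outside Viash is to wrap the argument-building part of a script in a function and compare its output against expected argument lists. This is a minimal sketch: build_args and check are hypothetical helpers, and in a real component test.sh would run "$meta_executable" as shown earlier.

```bash
#!/bin/bash
# Minimal sketch: checking argument building against edge cases.
# build_args is a hypothetical stand-in for the body of a script.sh.
set -eo pipefail

build_args() {
  [[ "$par_verbose" == "false" ]] && unset par_verbose
  local cmd_args=(
    --input "$par_input"
    ${par_verbose:+--verbose}
  )
  # Bracket each argument so word boundaries are visible in the comparison
  printf '[%s]' "${cmd_args[@]}"
}

check() {
  # check <expected> <actual>: fail loudly on mismatch
  if [[ "$1" != "$2" ]]; then
    echo "FAIL: expected '$1', got '$2'" >&2
    return 1
  fi
  echo "ok: $2"
}

# Parameter with spaces
check "[--input][a b.txt]" "$(par_input='a b.txt' par_verbose=false build_args)"
# Boolean true and false
check "[--input][x][--verbose]" "$(par_input=x par_verbose=true build_args)"
check "[--input][x]" "$(par_input=x par_verbose=false build_args)"
# Missing optional parameter
check "[--input][x]" "$(par_input=x build_args)"
```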
See Testing Guide for extensive test best practices.