# Script Development Guide

This guide covers best practices for writing runner scripts in biobox components.

## Table of Contents

- [Script Structure and Template](#script-structure-and-template)
- [Key Principles](#key-principles)
- [Real-World Example](#real-world-example)
- [Advanced Patterns](#advanced-patterns)
- [Common Pitfalls](#common-pitfalls)
- [Testing Your Script](#testing-your-script)

## Script Structure and Template

All Viash component scripts follow a standard structure with best practices for error handling and parameter management.

### Basic Template

```bash
#!/bin/bash

## VIASH START
## VIASH END

set -eo pipefail

# unset flags
[[ "$par_option1" == "false" ]] && unset par_option1
[[ "$par_option2" == "false" ]] && unset par_option2

# Build command arguments array
cmd_args=(
  --input "$par_input"
  --output "$par_output"
  ${par_option1:+--option1}
  ${par_option2:+--option2}
  ${meta_cpus:+--threads "$meta_cpus"}
  ${meta_memory_gb:+--memory "${meta_memory_gb}G"}
)

# Execute command
xxx "${cmd_args[@]}"
```

### Understanding the Viash Code Block

The `## VIASH START` and `## VIASH END` comments mark a special placeholder block where Viash injects runtime parameters and metadata when the component is executed.

**At runtime**, Viash replaces this placeholder with:

- `par_*` variables containing argument values (e.g., `par_input`, `par_output`)
- `meta_*` variables containing runtime metadata (e.g., `meta_name`, `meta_cpus`, `meta_temp_dir`)

**For debugging**, you can put example code between these markers to test your script locally:

```bash
## VIASH START
par_input="test_input.txt"
par_output="test_output.txt"
par_verbose="true"
meta_cpus="4"
meta_memory_gb="8"
meta_temp_dir="/tmp"
## VIASH END
```

This allows you to run your script directly with `bash script.sh` during development.

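
As a rough sketch of how the debug block can be used, the script below fills the placeholder with made-up values and prints the assembled command instead of running it. The debug values, the `preview` variable, and the dry-run approach are illustrative assumptions; `xxx` is the placeholder tool name from the template.

```bash
#!/bin/bash

## VIASH START
# Illustrative debug values (assumptions, not taken from a real component)
par_input="test input.txt"
par_output="test_output.txt"
par_verbose="false"
meta_cpus="4"
## VIASH END

set -eo pipefail

# unset flags
[[ "$par_verbose" == "false" ]] && unset par_verbose

# Build command arguments array
cmd_args=(
  --input "$par_input"
  --output "$par_output"
  ${par_verbose:+--verbose}
  ${meta_cpus:+--threads "$meta_cpus"}
)

# Dry run: render the command with shell quoting instead of executing it.
# "xxx" is the placeholder tool name used in the template.
preview=$(printf '%q ' xxx "${cmd_args[@]}")
echo "$preview"
```

Run directly with `bash script.sh`, this should print the command with the space in the input path escaped and without `--verbose`, because `par_verbose` is `false` in the debug block.
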
## Code Style Guidelines

### Indentation

**Use 2-space indentation consistently throughout your scripts:**

```bash
# Correct - 2 spaces
unset_if_false=(
  par_verbose
  par_quiet
  par_force
)

for par in "${unset_if_false[@]}"; do
  test_val="${!par}"
  [[ "$test_val" == "false" ]] && unset "$par"
done

cmd_args=(
  --input "$par_input"
  --output "$par_output"
  ${par_verbose:+--verbose}
)
```

```bash
# Incorrect - 4 spaces or tabs
unset_if_false=(
    par_verbose
    par_quiet
    par_force
)

for par in "${unset_if_false[@]}"; do
    test_val="${!par}"
    [[ "$test_val" == "false" ]] && unset "$par"
done
```

**Why 2 spaces:**

- Consistent with other biobox components
- Better readability in terminal and code editors
- Reduces line width for complex nested structures
- Standard practice in many shell script projects

## Key Principles

### 1. Error Handling

Always use `set -eo pipefail`:

- `set -e`: Exit immediately if a command exits with a non-zero status
- `set -o pipefail`: Make a pipeline return a non-zero status if any command in it fails, not only the last one, so that `set -e` can catch the failure

### 2. Array-Based Arguments

**Preferred approach:**

```bash
cmd_args=(
  --input "$par_input"
  --output "$par_output"
  ${par_option:+--option "$par_option"}
)

xxx "${cmd_args[@]}"
```

**Avoid repetitive appending:**

```bash
# Don't do this
cmd_args+=("--input")
cmd_args+=("$par_input")
cmd_args+=("--output")
cmd_args+=("$par_output")
```

### 3. Conditional Parameter Inclusion

Use Bash parameter expansion for optional parameters:

```bash
# Include parameter only if variable is set and not empty
${meta_cpus:+--threads "$meta_cpus"}

# Include flag only if boolean is true (after unsetting false values)
${par_verbose:+--verbose}
```

### 4. Boolean Handling

Unset boolean parameters that are "false":

```bash
# Single parameter
[[ "$par_verbose" == "false" ]] && unset par_verbose

# For multiple parameters, you can use either approach:

# Option 1: Individual approach (recommended for 1-4 parameters)
[[ "$par_verbose" == "false" ]] && unset par_verbose
[[ "$par_quiet" == "false" ]] && unset par_quiet
[[ "$par_force" == "false" ]] && unset par_force
[[ "$par_recursive" == "false" ]] && unset par_recursive

# Option 2: Loop approach (recommended for 5+ parameters)
unset_if_false=(
  par_verbose
  par_quiet
  par_force
  par_recursive
  par_follow_symlinks
  par_ignore_case
  par_preserve_permissions
)

for par in "${unset_if_false[@]}"; do
  test_val="${!par}"
  [[ "$test_val" == "false" ]] && unset "$par"
done
```

**When to use which approach:**

- **Individual approach**: Recommended for 1-4 boolean parameters, clearer and more direct
- **Loop approach**: Recommended for many parameters (5+), reduces code duplication

The individual approach is preferred for fewer parameters because:

- Each parameter is explicit and easy to find
- No variable indirection complexity (`${!par}`)
- Simple to add/remove individual parameters
- More readable at a glance

### 5. Meta Variables Usage

**Important:** Never use `par_threads`, `par_cores`, `par_cpus`, or `par_memory` parameters. Use Viash's built-in meta variables instead.

**Available meta variables:**

- `meta_cpus`: Number of CPU cores available
- `meta_memory_*`: Memory limits in various units (b, kb, mb, gb, tb, pb, kib, mib, gib, tib, pib)
- `meta_temp_dir`: Temporary directory for the component
- `meta_resources_dir`: Path to component resources

**Examples:**

```bash
# CPU cores, passed only when set
${meta_cpus:+--threads "$meta_cpus"}
# CPU cores with a fallback of 1 when unset
--cores "${meta_cpus:-1}"

# Memory with unit suffix, passed only when set
${meta_memory_gb:+--memory "${meta_memory_gb}G"}
# Memory with a fallback of 1024 MB when unset
--max-memory "${meta_memory_mb:-1024}M"

# Temporary directory with a fallback
--tmp-dir "${meta_temp_dir:-/tmp}"
```

Note that a `:-` fallback nested inside a `:+` expansion never takes effect, because `:+` only expands when the variable is already set and non-empty. Use `:+` to omit an option entirely and `:-` to always pass it with a default.

**Why use meta variables:**

- Integrates seamlessly with workflow systems like Nextflow
- Automatically managed by Viash runtime
- Consistent across all components
- Prevents parameter duplication and conflicts

For complete details, see [Viash Variables Documentation](https://viash.io/guide/component/variables.html).

### 6. Proper Quoting

Always quote variables that might contain spaces or special characters:

```bash
# Correct
--input "$par_input"
--output "$par_output"

# For special characters in a string that a shell will parse again
# (e.g. via eval or bash -c), use @Q expansion (Bash 4.4+). In a plain
# cmd_args array, double quotes alone are enough; @Q would pass literal
# quote characters to the tool.
--pattern "${par_pattern@Q}"
```

### 7. Multiple Parameter Values

When using arguments with `multiple: true` in your Viash configuration, values are passed as semicolon-separated strings that need to be split into bash arrays.

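
To make the splitting behaviour concrete, here is a minimal sketch using made-up values. The paths and the contents of `par_files` and `par_names` are illustrative assumptions; the point is that elements containing spaces stay intact and that an empty value produces an empty array.

```bash
#!/bin/bash
set -eo pipefail

# Illustrative value: two files as a single semicolon-separated string
par_files="/data/db one.bed;/data/db2.bed"

# IFS=';' applies only to this read; -a stores the fields in an array
IFS=';' read -ra files_array <<< "$par_files"

echo "count: ${#files_array[@]}"
for f in "${files_array[@]}"; do
  echo "file: $f"
done

# An empty value yields an array with zero elements
par_names=""
IFS=';' read -ra names_array <<< "$par_names"
echo "names: ${#names_array[@]}"
```

Because `IFS=';'` is set only for the `read` command, the rest of the script keeps the default word-splitting behaviour.
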
#### In script.sh - Converting to Arrays

```bash
# Convert semicolon-separated values to bash array
IFS=';' read -ra files_array <<< "$par_files"

# Example: Use in command arguments
cmd_args=(
  -i "$par_input"
  -files "${files_array[@]}"
  -o "$par_output"
)

# Execute command
bedtools annotate "${cmd_args[@]}"
```

#### In test.sh - Passing Multiple Values

When testing components with `multiple: true` parameters, you can use either format:

```bash
# Method 1: Repeated flags (recommended for readability)
"$meta_executable" \
  --input "$meta_temp_dir/query.bed" \
  --files "$meta_temp_dir/db1.bed" \
  --files "$meta_temp_dir/db2.bed" \
  --output "$meta_temp_dir/result.bed"

# Method 2: Semicolon-separated values
"$meta_executable" \
  --input "$meta_temp_dir/query.bed" \
  --files "$meta_temp_dir/db1.bed;$meta_temp_dir/db2.bed" \
  --output "$meta_temp_dir/result.bed"
```

Both methods work identically: Viash automatically converts repeated flags to semicolon-separated strings internally.

#### Complete Example

```bash
#!/bin/bash

## VIASH START
## VIASH END

set -eo pipefail

# Convert semicolon-separated files to array
IFS=';' read -ra files_array <<< "$par_files"

# Convert semicolon-separated names to array if provided
if [[ -n "${par_names}" ]]; then
  IFS=';' read -ra names_array <<< "$par_names"
fi

# Build command arguments array
cmd_args=(
  -i "$par_input"
  ${par_names:+-names "${names_array[@]}"}
  -files "${files_array[@]}"
)

# Execute command
bedtools annotate "${cmd_args[@]}" > "$par_output"
```

## Real-World Example

Here's an example from the bowtie2_build component:

```bash
#!/bin/bash

## VIASH START
## VIASH END

set -eo pipefail

# unset flags
[[ "$par_large_index" == "false" ]] && unset par_large_index
[[ "$par_noauto" == "false" ]] && unset par_noauto
[[ "$par_packed" == "false" ]] && unset par_packed

# Create output directory
mkdir -p "$par_output"

# Determine index basename
if [ -n "$par_index_name" ]; then
  index_basename="$par_index_name"
else
  index_basename=$(basename "$par_input" .fasta)
fi

# Build command arguments
cmd_args=(
  ${par_fasta:+-f}
  ${par_cmdline:+-c}
  ${par_large_index:+--large-index}
  ${par_noauto:+-a}
  ${par_packed:+-p}
  ${par_bmax:+--bmax "$par_bmax"}
  ${par_offrate:+-o "$par_offrate"}
  "$par_input"
  "$par_output/$index_basename"
)

# Execute bowtie2-build
bowtie2-build "${cmd_args[@]}"
```

## Advanced Patterns

### Multiple Input Handling

If your tool accepts multiple inputs with custom separators:

```bash
# Convert Viash's semicolon separator to comma
par_disable_filters=$(echo "$par_disable_filters" | tr ';' ',')

cmd_args=(
  --disable-filters "$par_disable_filters"
)
```

### Complex File Handling

```bash
# Ensure output directory exists
mkdir -p "$(dirname "$par_output")"

# Handle relative paths
input_path=$(realpath "$par_input")
output_path=$(realpath "$par_output")
```

### Resource Management

```bash
# Use available resources
cmd_args=(
  ${meta_cpus:+--threads "$meta_cpus"}
  ${meta_memory_mb:+--memory "${meta_memory_mb}M"}
)
```

## Common Pitfalls

### 1. Unquoted Variables

```bash
# Wrong - can break with spaces
cmd_args=(--input $par_input)

# Correct
cmd_args=(--input "$par_input")
```

### 2. Improper Boolean Handling

```bash
# Wrong - "false" is a non-empty string, so the flag is still added
cmd_args=(${par_verbose:+--verbose})

# Correct - unset false values first
[[ "$par_verbose" == "false" ]] && unset par_verbose
cmd_args=(${par_verbose:+--verbose})
```

### 3. Array Expansion

```bash
# Wrong - expands only the first element of the array
tool $cmd_args

# Correct - expands every element as a separate argument
tool "${cmd_args[@]}"
```

## Testing Your Script

Always test your script with:

- Empty/missing optional parameters
- Parameters with spaces
- Boolean true/false values
- Edge cases specific to your tool

See [Testing Guide](docs/TESTING.md) for more detailed testing best practices.