# Script Development Guide
This guide covers best practices for writing runner scripts in biobox components.
## Table of Contents
- [Script Structure and Template](#script-structure-and-template)
- [Code Style Guidelines](#code-style-guidelines)
- [Key Principles](#key-principles)
- [Real-World Example](#real-world-example)
- [Advanced Patterns](#advanced-patterns)
- [Common Pitfalls](#common-pitfalls)
- [Testing Your Script](#testing-your-script)
## Script Structure and Template
All biobox runner scripts should follow a standard structure that builds in error handling and consistent parameter management.
### Basic Template
```bash
#!/bin/bash
## VIASH START
## VIASH END
set -eo pipefail
# unset flags
[[ "$par_option1" == "false" ]] && unset par_option1
[[ "$par_option2" == "false" ]] && unset par_option2
# Build command arguments array
cmd_args=(
  --input "$par_input"
  --output "$par_output"
  ${par_option1:+--option1}
  ${par_option2:+--option2}
  ${meta_cpus:+--threads "$meta_cpus"}
  ${meta_memory_gb:+--memory "${meta_memory_gb}G"}
)
# Execute command
xxx "${cmd_args[@]}"
```
### Understanding the Viash Code Block
The `## VIASH START` and `## VIASH END` comments mark a special placeholder block where Viash injects runtime parameters and metadata when the component is executed.
**At runtime**, Viash replaces this placeholder with:
- `par_*` variables containing argument values (e.g., `par_input`, `par_output`)
- `meta_*` variables containing runtime metadata (e.g., `meta_name`, `meta_cpus`, `meta_temp_dir`)

**For debugging**, you can put example values between these markers to test your script locally:
```bash
## VIASH START
par_input="test_input.txt"
par_output="test_output.txt"
par_verbose="true"
meta_cpus="4"
meta_memory_gb="8"
meta_temp_dir="/tmp"
## VIASH END
```
This allows you to run your script directly with `bash script.sh` during development.
## Code Style Guidelines
### Indentation
**Use 2-space indentation consistently throughout your scripts:**
```bash
# Correct - 2 spaces
unset_if_false=(
  par_verbose
  par_quiet
  par_force
)
for par in "${unset_if_false[@]}"; do
  test_val="${!par}"
  [[ "$test_val" == "false" ]] && unset $par
done
cmd_args=(
  --input "$par_input"
  --output "$par_output"
  ${par_verbose:+--verbose}
)
```
```bash
# Incorrect - 4 spaces or tabs
unset_if_false=(
    par_verbose
    par_quiet
    par_force
)
for par in "${unset_if_false[@]}"; do
    test_val="${!par}"
    [[ "$test_val" == "false" ]] && unset $par
done
```
**Why 2 spaces:**
- Consistent with other biobox components
- Better readability in terminal and code editors
- Reduces line width for complex nested structures
- Standard practice in many shell script projects
## Key Principles
### 1. Error Handling
Always use `set -eo pipefail`:
- `set -e`: Exit immediately if a command exits with a non-zero status
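- `set -o pipefail`: A pipeline returns a non-zero status if any command in it fails, not just the last one, so `set -e` also catches failures in the middle of a pipeline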
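The effect of `pipefail` can be seen in a minimal sketch. It deliberately leaves out `set -e` so that both statuses can be recorded, and `false | true` is only a stand-in for a real pipeline whose first stage fails:

```bash
#!/bin/bash
# Without pipefail, only the status of the last command in the pipeline counts.
set +o pipefail
status_without=0
false | true || status_without=$?

# With pipefail, the failing first stage makes the whole pipeline fail.
set -o pipefail
status_with=0
false | true || status_with=$?

echo "without pipefail: $status_without"
echo "with pipefail:    $status_with"
```

The first status is 0 and the second is 1, which is why a failing `samtools view ... | sort`-style pipeline would go unnoticed without `pipefail`.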
### 2. Array-Based Arguments
**Preferred approach:**
```bash
cmd_args=(
  --input "$par_input"
  --output "$par_output"
  ${par_option:+--option "$par_option"}
)
xxx "${cmd_args[@]}"
```
**Avoid repetitive appending:**
```bash
# Don't do this
cmd_args+=("--input")
cmd_args+=("$par_input")
cmd_args+=("--output")
cmd_args+=("$par_output")
```
### 3. Conditional Parameter Inclusion
Use Bash parameter expansion for optional parameters:
```bash
# Include parameter only if variable is set and not empty
${meta_cpus:+--threads "$meta_cpus"}
# Include flag only if boolean is true (after unsetting false values)
${par_verbose:+--verbose}
```
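As a small illustration of how `${var:+...}` behaves, the sketch below uses made-up values (`par_quiet` is a hypothetical parameter) and shows that both empty and unset variables contribute nothing to the array:

```bash
#!/bin/bash
# Hypothetical values for illustration only.
meta_cpus="4"
par_verbose=""     # empty: ':+' treats this the same as unset
unset par_quiet    # unset

cmd_args=(
  ${meta_cpus:+--threads "$meta_cpus"}
  ${par_verbose:+--verbose}
  ${par_quiet:+--quiet}
)

# Print one argument per line
printf '%s\n' "${cmd_args[@]}"
```

Only `--threads` and `4` end up in `cmd_args`; the two flags are dropped entirely rather than being passed as empty strings.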
### 4. Boolean Handling
Unset boolean parameters that are "false":
```bash
# Single parameter
[[ "$par_verbose" == "false" ]] && unset par_verbose
# For multiple parameters, you can use either approach:
# Option 1: Individual approach (recommended for 1-4 parameters)
[[ "$par_verbose" == "false" ]] && unset par_verbose
[[ "$par_quiet" == "false" ]] && unset par_quiet
[[ "$par_force" == "false" ]] && unset par_force
[[ "$par_recursive" == "false" ]] && unset par_recursive
# Option 2: Loop approach (recommended for 5+ parameters)
unset_if_false=(
  par_verbose
  par_quiet
  par_force
  par_recursive
  par_follow_symlinks
  par_ignore_case
  par_preserve_permissions
)
for par in "${unset_if_false[@]}"; do
  test_val="${!par}"
  [[ "$test_val" == "false" ]] && unset $par
done
```
**When to use which approach:**
- **Individual approach** (1-4 boolean parameters): clearer and more direct
- **Loop approach** (5+ boolean parameters): reduces code duplication

The individual approach is preferred for fewer parameters because:
- Each parameter is explicit and easy to find
- No variable indirection complexity (`${!par}`)
- Simple to add/remove individual parameters
- More readable at a glance
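To make the indirection in the loop approach concrete, here is a minimal end-to-end sketch with made-up values. `${!par}` reads the variable whose name is stored in `par`, and `unset "$par"` removes that variable:

```bash
#!/bin/bash
# Hypothetical values, as Viash might set them for boolean arguments.
par_verbose="true"
par_quiet="false"
par_force="false"

unset_if_false=(par_verbose par_quiet par_force)
for par in "${unset_if_false[@]}"; do
  # ${!par} is indirect expansion: the value of the variable named by $par
  if [[ "${!par}" == "false" ]]; then
    unset "$par"
  fi
done

cmd_args=(
  ${par_verbose:+--verbose}
  ${par_quiet:+--quiet}
  ${par_force:+--force}
)

echo "${cmd_args[*]}"
```

After the loop only `par_verbose` is still set, so the array contains just `--verbose`.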
### 5. Meta Variables Usage
**Important:** Never use `par_threads`, `par_cores`, `par_cpus`, or `par_memory` parameters. Use Viash's built-in meta variables instead.
**Available meta variables:**
- `meta_cpus`: Number of CPU cores available
- `meta_memory_*`: Memory limits in various units (b, kb, mb, gb, tb, pb, kib, mib, gib, tib, pib)
- `meta_temp_dir`: Temporary directory for the component
- `meta_resources_dir`: Path to component resources

**Examples:**
```bash
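# CPU cores: pass the option only when meta_cpus is set
${meta_cpus:+--threads "$meta_cpus"}
# CPU cores with fallback: always pass the option, defaulting to 1
--cores "${meta_cpus:-1}"
# Memory with unit suffix: pass the option only when meta_memory_gb is set
${meta_memory_gb:+--memory "${meta_memory_gb}G"}
# Memory with fallback: always pass the option, defaulting to 1024 MB
--max-memory "${meta_memory_mb:-1024}M"
# Temporary directory
--tmp-dir "${meta_temp_dir:-/tmp}"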
```
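The two expansions behave differently when a meta variable is missing: `:+` omits the option, while `:-` substitutes a default. A minimal sketch, assuming `meta_cpus` is unset and `meta_memory_gb` is `8` (the option names are placeholders, not those of a specific tool):

```bash
#!/bin/bash
# Hypothetical runtime state: no CPU limit given, 8 GB of memory available.
unset meta_cpus
meta_memory_gb="8"

cmd_args=(
  # ':-' always yields a value, falling back to 1 here
  --cores "${meta_cpus:-1}"
  # ':+' yields the option only when the variable is set and non-empty
  ${meta_cpus:+--threads "$meta_cpus"}
  ${meta_memory_gb:+--memory "${meta_memory_gb}G"}
)

printf '%s\n' "${cmd_args[@]}"
```

This yields `--cores 1 --memory 8G`; `--threads` is left out because `meta_cpus` is unset. Note that nesting a `:-` fallback inside a `:+` expansion has no effect, since the outer expansion only fires when the variable already has a value.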
**Why use meta variables:**
- Integrates seamlessly with workflow systems like Nextflow
- Automatically managed by Viash runtime
- Consistent across all components
- Prevents parameter duplication and conflicts

For complete details, see [Viash Variables Documentation](https://viash.io/guide/component/variables.html).
### 6. Proper Quoting
Always quote variables that might contain spaces or special characters:
```bash
# Correct
--input "$par_input"
--output "$par_output"
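# Double quotes are sufficient when the value is passed directly as an argument,
# even if it contains special characters
--pattern "$par_pattern"
# Use @Q expansion only when the value will be re-parsed by a shell (e.g. via eval or bash -c);
# passed directly, "${par_pattern@Q}" would hand the tool literal quote characters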
```
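To see what `@Q` actually produces, the following sketch compares plain double quoting with `@Q` for a hypothetical pattern. It assumes Bash 4.4 or newer, where the `@Q` operator is available:

```bash
#!/bin/bash
# Hypothetical pattern containing a space and a glob character.
par_pattern='a b*'

# Plain double quotes: the value is passed through unchanged.
direct="$par_pattern"

# @Q: the value is wrapped in shell quoting, quote characters included.
quoted="${par_pattern@Q}"

# A second shell that re-parses the string strips that quoting again.
reparsed=$(bash -c "printf '%s' ${par_pattern@Q}")

printf 'direct:   %s\n' "$direct"
printf 'quoted:   %s\n' "$quoted"
printf 'reparsed: %s\n' "$reparsed"
```

`direct` and `reparsed` both hold `a b*`, while `quoted` holds `'a b*'` including the single quotes. That makes `@Q` a good fit for strings that go through `eval` or `bash -c`, and a poor fit for arguments handed straight to a tool.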
### 7. Multiple Parameter Values
When using arguments with `multiple: true` in your Viash configuration, values are passed as semicolon-separated strings that need to be split into bash arrays.
#### In script.sh - Converting to Arrays
```bash
# Convert semicolon-separated values to bash array
IFS=';' read -ra files_array <<< "$par_files"
# Example: Use in command arguments
cmd_args=(
  -i "$par_input"
  -files "${files_array[@]}"
  -o "$par_output"
)
# Execute command
bedtools annotate "${cmd_args[@]}"
```
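The split itself can be checked in isolation. This sketch uses made-up file names, one of which contains a space, to show that `IFS=';' read -ra` splits only on semicolons:

```bash
#!/bin/bash
# Hypothetical value as Viash would pass it for a 'multiple: true' argument.
par_files="db 1.bed;db2.bed;db3.bed"

# IFS is changed only for this one read command.
IFS=';' read -ra files_array <<< "$par_files"

echo "count: ${#files_array[@]}"
printf '<%s>\n' "${files_array[@]}"
```

The array has three elements and the first one is `db 1.bed`, space intact. Because the `IFS` assignment prefixes the `read` command, the shell's normal word splitting is unaffected afterwards.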
#### In test.sh - Passing Multiple Values
When testing components with `multiple: true` parameters, you can use either format:
```bash
# Method 1: Repeated flags (recommended for readability)
"$meta_executable" \
  --input "$meta_temp_dir/query.bed" \
  --files "$meta_temp_dir/db1.bed" \
  --files "$meta_temp_dir/db2.bed" \
  --output "$meta_temp_dir/result.bed"
# Method 2: Semicolon-separated values
"$meta_executable" \
  --input "$meta_temp_dir/query.bed" \
  --files "$meta_temp_dir/db1.bed;$meta_temp_dir/db2.bed" \
  --output "$meta_temp_dir/result.bed"
```
Both methods work identically - Viash automatically converts repeated flags to semicolon-separated strings internally.
#### Complete Example
```bash
#!/bin/bash
## VIASH START
## VIASH END
set -eo pipefail
# Convert semicolon-separated files to array
IFS=';' read -ra files_array <<< "$par_files"
# Convert semicolon-separated names to array if provided
if [[ -n "${par_names}" ]]; then
  IFS=';' read -ra names_array <<< "$par_names"
fi
# Build command arguments array
cmd_args=(
  -i "$par_input"
  ${par_names:+-names "${names_array[@]}"}
  -files "${files_array[@]}"
)
# Execute command
bedtools annotate "${cmd_args[@]}" > "$par_output"
```
## Real-World Example
Here's an example from the bowtie2_build component:
```bash
#!/bin/bash
## VIASH START
## VIASH END
set -eo pipefail
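# unset flags
[[ "$par_fasta" == "false" ]] && unset par_fasta
[[ "$par_cmdline" == "false" ]] && unset par_cmdline
[[ "$par_large_index" == "false" ]] && unset par_large_index
[[ "$par_noauto" == "false" ]] && unset par_noauto
[[ "$par_packed" == "false" ]] && unset par_packed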
# Create output directory
mkdir -p "$par_output"
# Determine index basename
if [ -n "$par_index_name" ]; then
  index_basename="$par_index_name"
else
  index_basename=$(basename "$par_input" .fasta)
fi
# Build command arguments
cmd_args=(
  ${par_fasta:+-f}
  ${par_cmdline:+-c}
  ${par_large_index:+--large-index}
  ${par_noauto:+-a}
  ${par_packed:+-p}
  ${par_bmax:+--bmax "$par_bmax"}
  ${par_offrate:+-o "$par_offrate"}
  "$par_input"
  "$par_output/$index_basename"
)
# Execute bowtie2-build
bowtie2-build "${cmd_args[@]}"
```
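One detail of the basename step is worth knowing: `basename` removes the suffix only when it matches exactly. The sketch below uses hypothetical paths, and the `${file%.*}` alternative is shown purely as an illustration, not as what the component does:

```bash
#!/bin/bash
# Suffix matches: the extension is stripped.
name_fasta=$(basename "/data/genome.fasta" .fasta)

# Suffix does not match: the file name is returned unchanged.
name_fa=$(basename "/data/genome.fa" .fasta)

# Parameter expansion that strips whatever the final extension is.
file="genome.fa"
name_any="${file%.*}"

echo "$name_fasta"
echo "$name_fa"
echo "$name_any"
```

This prints `genome`, `genome.fa`, and `genome`, so inputs named `.fa` or `.fna` would keep their extension in the index basename unless `par_index_name` is given.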
## Advanced Patterns
### Multiple Input Handling
If your tool accepts multiple inputs with custom separators:
```bash
# Convert Viash's semicolon separator to comma
par_disable_filters=$(echo "$par_disable_filters" | tr ';' ',')
cmd_args=(
  --disable-filters "$par_disable_filters"
)
```
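The same conversion can also be done without a subshell by using Bash pattern substitution. A short sketch with made-up filter names shows that both forms give the same result:

```bash
#!/bin/bash
# Hypothetical semicolon-separated value from Viash.
par_disable_filters="low_complexity;duplicates;adapters"

# Variant 1: external tr, as in the snippet above
via_tr=$(echo "$par_disable_filters" | tr ';' ',')

# Variant 2: pure Bash, ${var//pattern/replacement} replaces every match
via_expansion="${par_disable_filters//;/,}"

echo "$via_tr"
echo "$via_expansion"
```

Both variables hold `low_complexity,duplicates,adapters`. The expansion form avoids `echo`, which can mangle values that start with a dash on some shells.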
### Complex File Handling
```bash
# Ensure output directory exists
mkdir -p "$(dirname "$par_output")"
# Handle relative paths
input_path=$(realpath "$par_input")
output_path=$(realpath "$par_output")
```
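A self-contained way to try these steps is to run them in a throwaway directory. The paths below are hypothetical, and the sketch assumes a `realpath` command is available (as on most Linux systems with GNU coreutils):

```bash
#!/bin/bash
# Scratch directory that is removed when the script exits.
work_dir=$(mktemp -d)
trap 'rm -rf "$work_dir"' EXIT

# Hypothetical input with a non-canonical path and a nested output location.
touch "$work_dir/input.txt"
par_input="$work_dir/./input.txt"
par_output="$work_dir/results/nested/out.txt"

# Create all missing parent directories of the output file.
mkdir -p "$(dirname "$par_output")"

# Resolve the input to a canonical absolute path.
input_path=$(realpath "$par_input")

echo "output dir: $(dirname "$par_output")"
echo "input path: $input_path"
```

After this runs, `results/nested` exists and `input_path` no longer contains the `/./` segment. Resolving the output path before the file exists depends on the `realpath` implementation, so creating the parent directory first is the safer order.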
### Resource Management
```bash
# Use available resources
cmd_args=(
  ${meta_cpus:+--threads "$meta_cpus"}
  ${meta_memory_mb:+--memory "${meta_memory_mb}M"}
)
```
## Common Pitfalls
### 1. Unquoted Variables
```bash
# Wrong - can break with spaces
cmd_args=(--input $par_input)
# Correct
cmd_args=(--input "$par_input")
```
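Counting the resulting array elements makes the difference visible. This sketch uses a hypothetical file name with a space and assumes the default `IFS`:

```bash
#!/bin/bash
# Hypothetical input path containing a space.
par_input="my sample.txt"

unquoted=(--input $par_input)    # word splitting breaks the path in two
quoted=(--input "$par_input")    # the path stays a single argument

echo "unquoted: ${#unquoted[@]} elements"
echo "quoted:   ${#quoted[@]} elements"
```

The unquoted array has three elements (`--input`, `my`, `sample.txt`), the quoted one has two, so the tool would see a stray positional argument in the first case.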
### 2. Improper Boolean Handling
```bash
# Wrong - will include false booleans
cmd_args=(${par_verbose:+--verbose})
# Correct - unset false values first
[[ "$par_verbose" == "false" ]] && unset par_verbose
cmd_args=(${par_verbose:+--verbose})
```
### 3. Array Expansion
```bash
# Wrong - expands only the first array element
tool $cmd_args
# Correct - expands array elements
tool "${cmd_args[@]}"
```
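What each form expands to can be checked by capturing the result in a second array. The argument values are made up for illustration:

```bash
#!/bin/bash
# Hypothetical argument list with one value containing a space.
cmd_args=(--input "my file.txt" --verbose)

# $cmd_args is shorthand for ${cmd_args[0]}: only the first element.
first_only=($cmd_args)

# "${cmd_args[@]}" expands to every element, each as its own word.
all=("${cmd_args[@]}")

echo "first_only: ${#first_only[@]} element(s): ${first_only[*]}"
echo "all:        ${#all[@]} element(s)"
```

`first_only` contains just `--input`, while `all` keeps all three elements with `my file.txt` intact, so the unquoted form silently drops most of the command line.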
## Testing Your Script
Always test your script with:
- Empty/missing optional parameters
- Parameters with spaces
- Boolean true/false values
- Edge cases specific to your tool

See the [Testing Guide](docs/TESTING.md) for more testing best practices.
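Before writing a full component test, the argument-building logic can be exercised on its own. The sketch below is only an illustration: `build_cmd_args`, `par_label`, and the option names are hypothetical and stand in for whatever your script builds.

```bash
#!/bin/bash
# Hypothetical helper mirroring the template: fills the global array cmd_args
# from par_* variables, dropping false booleans and missing options.
build_cmd_args() {
  local verbose="$par_verbose"
  [[ "$verbose" == "false" ]] && verbose=""
  cmd_args=(
    --input "$par_input"
    ${par_label:+--label "$par_label"}
    ${verbose:+--verbose}
  )
}

# Case 1: optional parameter missing, boolean false
par_input="in.txt"
unset par_label
par_verbose="false"
build_cmd_args
case1_count="${#cmd_args[@]}"

# Case 2: values with spaces, boolean true
par_input="my input.txt"
par_label="run 1"
par_verbose="true"
build_cmd_args
case2_count="${#cmd_args[@]}"

echo "case 1: $case1_count arguments"
echo "case 2: $case2_count arguments"
```

Case 1 yields two arguments and case 2 yields five, with `my input.txt` and `run 1` each kept as a single element. Checks like these catch quoting and boolean mistakes before the tool itself is ever invoked.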