# Script Development Guide

This guide covers best practices for writing runner scripts in biobox components.

## Table of Contents

- [Script Structure and Template](#script-structure-and-template)
- [Code Style Guidelines](#code-style-guidelines)
- [Key Principles](#key-principles)
- [Real-World Example](#real-world-example)
- [Advanced Patterns](#advanced-patterns)
- [Common Pitfalls](#common-pitfalls)
- [Testing Your Script](#testing-your-script)

## Script Structure and Template

All Viash component scripts follow a standard structure with best practices for error handling and parameter management.

### Basic Template

```bash
#!/bin/bash

## VIASH START
## VIASH END

set -eo pipefail

# unset flags
[[ "$par_option1" == "false" ]] && unset par_option1
[[ "$par_option2" == "false" ]] && unset par_option2

# Build command arguments array
cmd_args=(
  --input "$par_input"
  --output "$par_output"
  ${par_option1:+--option1}
  ${par_option2:+--option2}
  ${meta_cpus:+--threads "$meta_cpus"}
  ${meta_memory_gb:+--memory "${meta_memory_gb}G"}
)

# Execute command
xxx "${cmd_args[@]}"
```

### Understanding the Viash Code Block

The `## VIASH START` and `## VIASH END` comments mark a special placeholder block where Viash injects runtime parameters and metadata when the component is executed.

**At runtime**, Viash replaces this placeholder with:
- `par_*` variables containing argument values (e.g., `par_input`, `par_output`)
- `meta_*` variables containing runtime metadata (e.g., `meta_name`, `meta_cpus`, `meta_temp_dir`)

**For debugging**, you can put example code between these markers to test your script locally:

```bash
## VIASH START
par_input="test_input.txt"
par_output="test_output.txt"
par_verbose="true"
meta_cpus="4"
meta_memory_gb="8"
meta_temp_dir="/tmp"
## VIASH END
```

This allows you to run your script directly with `bash script.sh` during development.

## Code Style Guidelines

### Indentation

**Use 2-space indentation consistently throughout your scripts:**

```bash
# Correct - 2 spaces
unset_if_false=(
  par_verbose
  par_quiet
  par_force
)

for par in "${unset_if_false[@]}"; do
  test_val="${!par}"
  [[ "$test_val" == "false" ]] && unset "$par"
done

cmd_args=(
  --input "$par_input"
  --output "$par_output"
  ${par_verbose:+--verbose}
)
```

```bash
# Incorrect - 4 spaces or tabs
unset_if_false=(
    par_verbose
    par_quiet
    par_force
)

for par in "${unset_if_false[@]}"; do
    test_val="${!par}"
    [[ "$test_val" == "false" ]] && unset "$par"
done
```

**Why 2 spaces:**
- Consistent with other biobox components
- Better readability in terminal and code editors
- Reduces line width for complex nested structures
- Standard practice in many shell script projects

## Key Principles

### 1. Error Handling

Always use `set -eo pipefail`:
- `set -e`: Exit immediately if a command exits with a non-zero status
- `set -o pipefail`: Make a pipeline return a non-zero status if any command in it fails, not only the last one, so that `set -e` also catches failures in the middle of a pipeline
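
To make the effect of `pipefail` concrete, the sketch below runs the same failing pipeline in two child shells and compares the exit statuses. The pipeline `false | true` is only an illustration and is not taken from any component.

```bash
#!/bin/bash

# Illustrative only: `false | true` stands in for a pipeline whose first stage fails.

# Without pipefail, the pipeline reports the status of its last command (true).
status_without=$(bash -c 'false | true; echo "$?"')

# With pipefail, the failing stage determines the pipeline's status.
status_with=$(bash -c 'set -o pipefail; false | true; echo "$?"')

echo "without pipefail: $status_without"   # prints 0
echo "with pipefail: $status_with"         # prints 1
```

With `set -e` alone, the first variant would let the script continue even though a stage failed.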

### 2. Array-Based Arguments

**Preferred approach:**
```bash
cmd_args=(
  --input "$par_input"
  --output "$par_output"
  ${par_option:+--option "$par_option"}
)

xxx "${cmd_args[@]}"
```

**Avoid repetitive appending:**
```bash
# Don't do this
cmd_args+=("--input")
cmd_args+=("$par_input")
cmd_args+=("--output")
cmd_args+=("$par_output")
```

### 3. Conditional Parameter Inclusion

Use Bash parameter expansion for optional parameters:

```bash
# Include parameter only if variable is set and not empty
${meta_cpus:+--threads "$meta_cpus"}

# Include flag only if boolean is true (after unsetting false values)
${par_verbose:+--verbose}
```
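
The following sketch shows how these expansions behave inside an argument array. The values are made up for illustration; in a real component Viash injects them.

```bash
#!/bin/bash

# Example values (assumptions for this sketch, normally injected by Viash).
par_input="sample 1.txt"
meta_cpus="4"
unset par_verbose  # simulates a boolean flag that was unset because it was "false"

cmd_args=(
  --input "$par_input"
  ${meta_cpus:+--threads "$meta_cpus"}
  ${par_verbose:+--verbose}
)

# Print one array element per line to show how the words were split.
printf '[%s]\n' "${cmd_args[@]}"
```

The array ends up with four elements: `--input`, `sample 1.txt`, `--threads`, and `4`. The unset flag contributes nothing, and the quoted value keeps its space.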

### 4. Boolean Handling

Unset boolean parameters that are "false":

```bash
# Single parameter
[[ "$par_verbose" == "false" ]] && unset par_verbose

# For multiple parameters, you can use either approach:

# Option 1: Individual approach (recommended for 1-4 parameters)
[[ "$par_verbose" == "false" ]] && unset par_verbose
[[ "$par_quiet" == "false" ]] && unset par_quiet
[[ "$par_force" == "false" ]] && unset par_force
[[ "$par_recursive" == "false" ]] && unset par_recursive

# Option 2: Loop approach (recommended for 5+ parameters)
unset_if_false=(
  par_verbose
  par_quiet
  par_force
  par_recursive
  par_follow_symlinks
  par_ignore_case
  par_preserve_permissions
)

for par in "${unset_if_false[@]}"; do
  test_val="${!par}"
  [[ "$test_val" == "false" ]] && unset "$par"
done
```

**When to use which approach:**

- **Individual approach**: Recommended for 1-4 boolean parameters, clearer and more direct
- **Loop approach**: Recommended for many parameters (5+), reduces code duplication

The individual approach is preferred for fewer parameters because:
- Each parameter is explicit and easy to find
- No variable indirection complexity (`${!par}`)
- Simple to add/remove individual parameters
- More readable at a glance
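
For readers unfamiliar with indirect expansion, here is a small self-contained sketch of the loop approach. The parameter values are assumptions chosen for the demonstration.

```bash
#!/bin/bash

# Example values (assumptions for this sketch, normally injected by Viash).
par_verbose="true"
par_quiet="false"
par_force="false"

unset_if_false=(par_verbose par_quiet par_force)

for par in "${unset_if_false[@]}"; do
  # ${!par} reads the variable whose name is stored in $par
  [[ "${!par}" == "false" ]] && unset "$par"
done

# ${var+yes} expands to "yes" only if var is still set
echo "par_verbose set: ${par_verbose+yes}"
echo "par_quiet set: ${par_quiet+yes}"
echo "par_force set: ${par_force+yes}"
```

Only `par_verbose` survives the loop, so only `${par_verbose:+--verbose}` would add a flag to the command line.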

### 5. Meta Variables Usage

**Important:** Never use `par_threads`, `par_cores`, `par_cpus`, or `par_memory` parameters. Use Viash's built-in meta variables instead.

**Available meta variables:**
- `meta_cpus`: Number of CPU cores available
- `meta_memory_*`: Memory limits in various units (b, kb, mb, gb, tb, pb, kib, mib, gib, tib, pib)
- `meta_temp_dir`: Temporary directory for the component
- `meta_resources_dir`: Path to component resources

**Examples:**
```bash
# CPU cores: pass only when set, or always pass with a fallback
${meta_cpus:+--threads "$meta_cpus"}
--cores "${meta_cpus:-1}"

# Memory: pass only when set (with unit suffix), or always pass with a fallback
${meta_memory_gb:+--memory "${meta_memory_gb}G"}
--max-memory "${meta_memory_mb:-1024}M"

# Temporary directory
--tmp-dir "${meta_temp_dir:-/tmp}"
```

Note that a `:-` fallback nested inside a `${var:+...}` expansion never takes effect, because the outer expansion produces nothing when the variable is unset. Use one form or the other.

**Why use meta variables:**
- Integrates seamlessly with workflow systems like Nextflow
- Automatically managed by Viash runtime
- Consistent across all components
- Prevents parameter duplication and conflicts

For complete details, see [Viash Variables Documentation](https://viash.io/guide/component/variables.html).
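
As a rough illustration of how fallbacks behave when a script is run outside Viash, the sketch below resolves the meta variables into local values first. The default values (`1` thread, `/tmp`) and the option names are assumptions for this example; pick whatever suits your tool.

```bash
#!/bin/bash

# Simulate running the script directly, where no meta variables are injected.
unset meta_cpus meta_memory_gb meta_temp_dir

# Hypothetical defaults for local development.
threads="${meta_cpus:-1}"
tmp_dir="${meta_temp_dir:-/tmp}"

cmd_args=(
  --threads "$threads"
  --tmp-dir "$tmp_dir"
  ${meta_memory_gb:+--memory "${meta_memory_gb}G"}  # omitted when no limit is known
)

printf '%s\n' "${cmd_args[@]}"
```

Here the command still receives `--threads 1` and `--tmp-dir /tmp`, while `--memory` is left out entirely.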

### 6. Proper Quoting

Always quote variables that might contain spaces or special characters:

```bash
# Correct
--input "$par_input"
--output "$par_output"

# Use @Q only when the value is re-parsed by a shell (e.g. inside a
# `bash -c` or `eval` string); inside an argument array it would pass
# literal quote characters to the tool
bash -c "xxx --pattern ${par_pattern@Q}"
```

### 7. Multiple Parameter Values

When using arguments with `multiple: true` in your Viash configuration, values are passed as semicolon-separated strings that need to be split into bash arrays.

#### In script.sh - Converting to Arrays

```bash
# Convert semicolon-separated values to bash array
IFS=';' read -ra files_array <<< "$par_files"

# Example: Use in command arguments
cmd_args=(
  -i "$par_input"
  -files "${files_array[@]}"
)

# Execute command (bedtools annotate writes its result to stdout)
bedtools annotate "${cmd_args[@]}" > "$par_output"
```

#### In test.sh - Passing Multiple Values

When testing components with `multiple: true` parameters, you can use either format:

```bash
# Method 1: Repeated flags (recommended for readability)
"$meta_executable" \
  --input "$meta_temp_dir/query.bed" \
  --files "$meta_temp_dir/db1.bed" \
  --files "$meta_temp_dir/db2.bed" \
  --output "$meta_temp_dir/result.bed"

# Method 2: Semicolon-separated values
"$meta_executable" \
  --input "$meta_temp_dir/query.bed" \
  --files "$meta_temp_dir/db1.bed;$meta_temp_dir/db2.bed" \
  --output "$meta_temp_dir/result.bed"
```

Both methods work identically: Viash automatically converts repeated flags to semicolon-separated strings internally.

#### Complete Example

```bash
#!/bin/bash

## VIASH START
## VIASH END

set -eo pipefail

# Convert semicolon-separated files to array
IFS=';' read -ra files_array <<< "$par_files"

# Convert semicolon-separated names to array if provided
if [[ -n "${par_names}" ]]; then
  IFS=';' read -ra names_array <<< "$par_names"
fi

# Build command arguments array
cmd_args=(
  -i "$par_input"
  ${par_names:+-names "${names_array[@]}"}
  -files "${files_array[@]}"
)

# Execute command
bedtools annotate "${cmd_args[@]}" > "$par_output"
```
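
To check the splitting step in isolation, a small sketch like the following can help. The file names are invented for the example, and one of them deliberately contains a space.

```bash
#!/bin/bash

# Example value in the semicolon-separated format used for `multiple: true`
# arguments (the file names are placeholders).
par_files="db one.bed;db2.bed;db3.bed"

# Setting IFS only for the read command splits on ';' and nothing else.
IFS=';' read -ra files_array <<< "$par_files"

echo "count: ${#files_array[@]}"
for f in "${files_array[@]}"; do
  echo "file: $f"
done
```

This yields three elements, and `db one.bed` stays intact because only `;` is treated as a separator.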

## Real-World Example

Here's an example adapted from the bowtie2_build component:

```bash
#!/bin/bash

## VIASH START
## VIASH END

set -eo pipefail

# unset flags
[[ "$par_fasta" == "false" ]] && unset par_fasta
[[ "$par_cmdline" == "false" ]] && unset par_cmdline
[[ "$par_large_index" == "false" ]] && unset par_large_index
[[ "$par_noauto" == "false" ]] && unset par_noauto
[[ "$par_packed" == "false" ]] && unset par_packed

# Create output directory
mkdir -p "$par_output"

# Determine index basename
if [ -n "$par_index_name" ]; then
  index_basename="$par_index_name"
else
  index_basename=$(basename "$par_input" .fasta)
fi

# Build command arguments
cmd_args=(
  ${par_fasta:+-f}
  ${par_cmdline:+-c}
  ${par_large_index:+--large-index}
  ${par_noauto:+-a}
  ${par_packed:+-p}
  ${par_bmax:+--bmax "$par_bmax"}
  ${par_offrate:+-o "$par_offrate"}
  "$par_input"
  "$par_output/$index_basename"
)

# Execute bowtie2-build
bowtie2-build "${cmd_args[@]}"
```

## Advanced Patterns

### Multiple Input Handling

If your tool accepts multiple inputs with custom separators:

```bash
# Convert Viash's semicolon separator to comma
par_disable_filters=$(echo "$par_disable_filters" | tr ';' ',')

cmd_args=(
  --disable-filters "$par_disable_filters"
)
```
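
The same conversion can also be done without spawning `tr`, using Bash pattern substitution. A minimal sketch, with made-up filter names and a hypothetical `disable_filters` variable:

```bash
#!/bin/bash

# Example value; the filter names are placeholders.
par_disable_filters="filter_a;filter_b;filter_c"

# ${var//;/,} replaces every ';' in the value with ','
disable_filters="${par_disable_filters//;/,}"

echo "$disable_filters"   # prints filter_a,filter_b,filter_c
```

Either form works; the substitution avoids a subshell and does not depend on how `echo` treats values that begin with a dash.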

### Complex File Handling

```bash
# Ensure output directory exists
mkdir -p "$(dirname "$par_output")"

# Handle relative paths
input_path=$(realpath "$par_input")
output_path=$(realpath "$par_output")
```

### Resource Management

```bash
# Use available resources
cmd_args=(
  ${meta_cpus:+--threads "$meta_cpus"}
  ${meta_memory_mb:+--memory "${meta_memory_mb}M"}
)
```

## Common Pitfalls

### 1. Unquoted Variables
```bash
# Wrong - can break with spaces
cmd_args=(--input $par_input)

# Correct
cmd_args=(--input "$par_input")
```
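
The difference is easy to observe by counting array elements. In this sketch the path is an invented example containing a space, and the default `IFS` is assumed.

```bash
#!/bin/bash

par_input="my sample.txt"  # example path containing a space

unquoted=(--input $par_input)    # word splitting breaks the path in two
quoted=(--input "$par_input")    # the path stays a single element

echo "unquoted: ${#unquoted[@]} elements"   # prints 3
echo "quoted: ${#quoted[@]} elements"       # prints 2
```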

### 2. Improper Boolean Handling
```bash
# Wrong - will include false booleans
cmd_args=(${par_verbose:+--verbose})

# Correct - unset false values first
[[ "$par_verbose" == "false" ]] && unset par_verbose
cmd_args=(${par_verbose:+--verbose})
```

### 3. Array Expansion
```bash
# Wrong - expands only the first array element
tool $cmd_args

# Correct - expands all array elements
tool "${cmd_args[@]}"
```
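
A quick way to see what each form passes on is to count the arguments a command receives. `count_args` below is a hypothetical helper written for this sketch only.

```bash
#!/bin/bash

cmd_args=(--input "my sample.txt" --verbose)

# Hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

count_args $cmd_args            # only the first element is passed: prints 1
count_args "${cmd_args[@]}"     # every element, spaces preserved: prints 3
```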

## Testing Your Script

Always test your script with:
- Empty/missing optional parameters
- Parameters with spaces
- Boolean true/false values
- Edge cases specific to your tool

See [Testing Guide](docs/TESTING.md) for extensive test best practices.
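
Before wiring up a full component test, the argument-building logic can be checked on its own. The sketch below is one possible way to do that; `build_cmd_args` is a hypothetical helper that mirrors the template's logic, and the option names are placeholders rather than part of Viash or biobox.

```bash
#!/bin/bash

# Hypothetical helper: builds cmd_args from par_* variables like the template does.
build_cmd_args() {
  local verbose="$par_verbose"
  # Treat "false" as unset, without touching the caller's variable
  [[ "$verbose" == "false" ]] && verbose=""
  cmd_args=(
    --input "$par_input"
    ${par_option:+--option "$par_option"}
    ${verbose:+--verbose}
  )
}

# Case 1: optional parameter missing, boolean false
par_input="in.txt"; unset par_option; par_verbose="false"
build_cmd_args
echo "case 1: ${#cmd_args[@]} arguments"

# Case 2: values with spaces, boolean true
par_input="my input.txt"; par_option="a b"; par_verbose="true"
build_cmd_args
echo "case 2: ${#cmd_args[@]} arguments"
```

Case 1 should produce two arguments and case 2 five, with `my input.txt` and `a b` each kept as a single element.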