Files
biobox/docs/TESTING.md
CI 04a5851ff8 Build branch biobox/main with version main to biobox on branch main (7158daa)
Build pipeline: viash-hub.biobox.main-tb4cv

Source commit: 7158daa5f6

Source message: Fix bases2fastq component, update to latest practices (#190)

* wip updates

* refactor component

* assume bases2fastq follows semver

* fix version command

* add entry to changelog

* move to minor changes
2025-09-01 11:04:56 +00:00

15 KiB

Testing Guide

This guide covers best practices for writing comprehensive test scripts for biobox components.

📌 Important: All new test scripts should use the centralized test helpers located at src/_utils/test_helpers.sh. This eliminates code duplication and ensures consistency across all components.

Table of Contents

Core Principles

1. Generate Test Data in Scripts

Preferred approach: Generate test data within the test script using the centralized helper functions.

# Generate test data using centralized helpers
create_test_fasta "$meta_temp_dir/input.fasta" 3 50
create_test_fastq "$meta_temp_dir/reads.fastq" 10 35

Avoid:

  • Storing static test files in the repository
  • Fetching test data from external sources
  • Large test datasets

2. Self-Contained Tests

Tests should be completely self-contained and not depend on external resources:

test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: /src/_utils/test_helpers.sh

Only add static test files if absolutely necessary:

test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: /src/_utils/test_helpers.sh
  - type: file
    path: test_data  # Only if data generation is impractical

Test Script Structure

Configuration Setup

Add the test helpers as a resource in your component configuration:

test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: /src/_utils/test_helpers.sh

Basic Test Template

#!/bin/bash

## VIASH START
## VIASH END

# Source the centralized test helpers
source "$meta_resources_dir/test_helpers.sh"

# Initialize test environment with strict error handling
setup_test_env

#############################################
# Test execution with centralized functions
#############################################

log "Starting tests for $meta_name"

# --- Test Case 1: Basic functionality ---
log "Starting TEST 1: Basic functionality"

# Create and validate test data
test_data_dir="$meta_temp_dir/test_data"
mkdir -p "$test_data_dir"
create_test_fasta "$test_data_dir/input.fasta" 3 50
check_file_exists "$test_data_dir/input.fasta" "input FASTA file"

log "Executing $meta_name with basic parameters..."
"$meta_executable" \
  --input "$test_data_dir/input.fasta" \
  --output "$meta_temp_dir/test1"

log "Validating TEST 1 outputs..."
check_dir_exists "$meta_temp_dir/test1" "output directory"
check_file_exists "$meta_temp_dir/test1/result.txt" "result file"
check_file_not_empty "$meta_temp_dir/test1/result.txt" "result file"

log "✅ TEST 1 completed successfully"

# --- Test Case 2: Advanced parameters ---
log "Starting TEST 2: Advanced parameters"

# Create different test data
create_test_fastq "$test_data_dir/input.fastq" 10 35
check_file_exists "$test_data_dir/input.fastq" "input FASTQ file"

log "Executing $meta_name with advanced parameters..."
"$meta_executable" \
  --input "$test_data_dir/input.fastq" \
  --output "$meta_temp_dir/test2" \
  --threads 2 \
  --verbose

log "Validating TEST 2 outputs..."
check_file_exists "$meta_temp_dir/test2/advanced_result.txt" "advanced result file"
check_file_contains "$meta_temp_dir/test2/advanced_result.txt" "expected_pattern" "advanced result file"

log "✅ TEST 2 completed successfully"

print_test_summary "All tests completed successfully"

Centralized Test Helpers

The centralized test helpers located at src/_utils/test_helpers.sh provide comprehensive testing functionality to ensure consistency across all biobox components.

Available Functions

Logging Functions

  • log "message" - Log with timestamp
  • log_warn "message" - Warning message
  • log_error "message" - Error message

File/Directory Validation

  • check_file_exists path "description" - Verify file exists
  • check_dir_exists path "description" - Verify directory exists
  • check_file_not_exists path "description" - Verify file doesn't exist
  • check_dir_not_exists path "description" - Verify directory doesn't exist
  • check_file_empty path "description" - Verify file is empty
  • check_file_not_empty path "description" - Verify file is not empty

Content Validation

  • check_file_contains path "text" "description" - Verify file contains text
  • check_file_not_contains path "text" "description" - Verify file doesn't contain text
  • check_file_matches_regex path "pattern" "description" - Verify file matches regex
  • check_file_line_count path count "description" - Verify line count

Test Data Generation

  • create_test_fasta path [num_seqs] [seq_length] - Generate FASTA file
  • create_test_fastq path [num_reads] [read_length] - Generate FASTQ file
  • create_test_gtf path [num_genes] - Generate GTF file
  • create_test_gff path [num_features] - Generate GFF file
  • create_test_bed path [num_intervals] - Generate BED file
  • create_test_csv path [num_rows] - Generate CSV file
  • create_test_tsv path [num_rows] - Generate TSV file

Utility Functions

  • setup_test_env - Initialize test environment with strict error handling
  • print_test_summary "test_name" - Print completion message

Usage Example

#!/bin/bash

## VIASH START
## VIASH END

# Source centralized helpers
source "$meta_resources_dir/test_helpers.sh"
setup_test_env

log "Starting tests for $meta_name"

# Generate test data
create_test_fasta "$meta_temp_dir/input.fasta" 3 50
check_file_exists "$meta_temp_dir/input.fasta" "input FASTA file"

# Run component
"$meta_executable" \
  --input "$meta_temp_dir/input.fasta" \
  --output "$meta_temp_dir/output.txt"

# Validate output
check_file_exists "$meta_temp_dir/output.txt" "result file"
check_file_contains "$meta_temp_dir/output.txt" "expected_pattern" "result file"

print_test_summary "Basic functionality test"

Test Scenarios

1. Basic Functionality

Test the component with minimal, essential parameters:

log "Starting TEST 1: Basic functionality"

create_test_fasta "$meta_temp_dir/input.fasta" 3 50

"$meta_executable" \
  --input "$meta_temp_dir/input.fasta" \
  --output "$meta_temp_dir/output.txt"

check_file_exists "$meta_temp_dir/output.txt" "output file"
check_file_not_empty "$meta_temp_dir/output.txt" "output file"

log "✅ TEST 1 completed successfully"

2. Multiple Input Files

Test with multiple input files or complex input scenarios:

log "Starting TEST 2: Multiple input files"

create_test_fasta "$meta_temp_dir/input1.fasta" 2 30
create_test_fasta "$meta_temp_dir/input2.fasta" 2 30

"$meta_executable" \
  --input "$meta_temp_dir/input1.fasta;$meta_temp_dir/input2.fasta" \
  --output "$meta_temp_dir/output.txt"

check_file_exists "$meta_temp_dir/output.txt" "merged output file"

log "✅ TEST 2 completed successfully"

3. Optional Parameters

Test with optional parameters and advanced features:

log "Starting TEST 3: Optional parameters"

create_test_fastq "$meta_temp_dir/input.fastq" 10 35

"$meta_executable" \
  --input "$meta_temp_dir/input.fastq" \
  --output "$meta_temp_dir/output.txt" \
  --threads 2 \
  --verbose

check_file_exists "$meta_temp_dir/output.txt" "output file with options"
check_file_contains "$meta_temp_dir/output.txt" "verbose" "verbose output"

log "✅ TEST 3 completed successfully"

4. Edge Cases

Test with edge cases like empty files or unusual inputs:

log "Starting TEST 4: Edge case - empty input"

# Create empty input file
touch "$meta_temp_dir/empty.fasta"

# Test should handle empty input gracefully
if "$meta_executable" \
  --input "$meta_temp_dir/empty.fasta" \
  --output "$meta_temp_dir/output.txt" 2>/dev/null; then
  log_warn "Component succeeded with empty input - checking output"
  check_file_exists "$meta_temp_dir/output.txt" "output file for empty input"
else
  log "Expected behavior: Component properly rejected empty input"
fi

log "✅ TEST 4 completed successfully"

5. Error Handling

Test proper error handling for invalid inputs:

log "Starting TEST 5: Error handling"

# Test with non-existent input file
if "$meta_executable" \
  --input "/non/existent/file.txt" \
  --output "$meta_temp_dir/output.txt" 2>/dev/null; then
  log_error "Component should have failed with non-existent input"
  exit 1
else
  log "✅ Component properly handled non-existent input file"
fi

log "✅ TEST 5 completed successfully"

Best Practices

1. Use Centralized Test Helpers

Always use the centralized test helpers instead of defining functions individually:

# ✅ Recommended: Use centralized helpers
source "$meta_resources_dir/test_helpers.sh"
setup_test_env

# ❌ NOT recommended: Defining functions individually
set -euo pipefail
log() { echo "$(date '+%Y-%m-%d %H:%M:%S') [TEST] $*"; }

2. Strict Error Handling

The centralized helpers automatically provide strict error handling via setup_test_env:

# Automatically enabled by setup_test_env:
set -euo pipefail  # Exit on errors, undefined variables, pipe failures
export LC_ALL=C    # Consistent locale for reproducible results

3. Descriptive Validation

Use descriptive validation functions with meaningful descriptions:

# ✅ Good: Descriptive validation
check_file_exists "$output_file" "filtered feature matrix"
check_file_not_exists "$bam_file" "BAM file (should be disabled by default)"
check_file_contains "$result_file" "expected_pattern" "analysis results"

# ❌ Less helpful: Basic validation without context
check_file_exists "$output_file"

4. Organized Structure

Use $meta_temp_dir and create organized test structure:

# Create organized test structure
test_data_dir="$meta_temp_dir/test_data"
test_output_dir="$meta_temp_dir/test_output"
mkdir -p "$test_data_dir" "$test_output_dir"

create_test_fasta "$test_data_dir/input.fasta" 3 50

5. Clear Test Output

Use consistent logging with clear test boundaries:

log "Starting TEST 1: Basic functionality"
log "Executing $meta_name with basic parameters..."
log "Validating TEST 1 outputs..."
log "✅ TEST 1 completed successfully"

# Final summary
print_test_summary "All tests completed successfully"

6. Comprehensive Content Validation

Don't just check that files exist - validate their content:

# Check existence and content
check_file_exists "$meta_temp_dir/output.txt" "analysis results"
check_file_not_empty "$meta_temp_dir/output.txt" "analysis results"
check_file_contains "$meta_temp_dir/output.txt" "Number of sequences" "result summary"
check_file_line_count "$meta_temp_dir/output.txt" 10 "expected number of results"

7. Multiple Test Scenarios

Include comprehensive test coverage:

# Test 1: Basic functionality
log "Starting TEST 1: Basic functionality"
# ... test implementation ...
log "✅ TEST 1 completed successfully"

# Test 2: Advanced options
log "Starting TEST 2: Advanced options"
# ... test implementation ...
log "✅ TEST 2 completed successfully"

# Test 3: Edge cases
log "Starting TEST 3: Edge case handling"
# ... test implementation ...
log "✅ TEST 3 completed successfully"

print_test_summary "All tests completed successfully"

Viash Testing Features

Running Tests

# Test a single component
viash test config.vsh.yaml

# Test with specific resources
viash test config.vsh.yaml --cpus 4 --memory 8GB

# Test with specific setup strategy
viash test config.vsh.yaml --setup build --verbose

# Keep temporary files for debugging
viash test config.vsh.yaml --keep true

# Test all components in parallel
viash ns test --parallel

# Test specific namespace
viash ns test -q alignment --parallel

Test Execution Flow

When running viash test, Viash automatically:

  1. Creates temporary directory (available as $meta_temp_dir)
  2. Builds the main executable
  3. Builds/pulls Docker image (if using Docker engine)
  4. Iterates over all test scripts in test_resources
  5. Builds each test into executable and runs it
  6. Cleans up temporary files (unless --keep true)
  7. Returns exit code 0 if all tests succeed

Meta Variables in Tests

Your test scripts automatically have access to important meta variables:

  • $meta_executable - Path to the built component executable
  • $meta_temp_dir - Temporary directory for test files (automatically cleaned up)
  • $meta_name - Component name for logging
  • $meta_resources_dir - Path to test resources

Multiple Test Scripts

You can add multiple test scripts to cover different scenarios:

test_resources:
  - type: bash_script
    path: test_basic.sh
  - type: bash_script
    path: test_edge_cases.sh
  - type: bash_script
    path: test_large_data.sh
  - type: file
    path: /src/_utils/test_helpers.sh

Advanced Testing Options

# Test with different container setup strategies
viash test config.vsh.yaml --setup cachedbuild  # Use cached layers (faster)
viash test config.vsh.yaml --setup build        # Clean build from scratch
viash test config.vsh.yaml --setup alwaysbuild  # Always rebuild container

# Test with configuration modifications
viash test config.vsh.yaml -c '.engines[0].image = "ubuntu:22.04"'

# Test with debug mode for troubleshooting
viash test config.vsh.yaml --keep true --verbose

For more details, see the Viash Unit Testing Documentation.

Static Test Data

When to Use Static Test Data

Only use static test files when:

  • The tool requires very specific, complex file formats that are difficult to generate
  • Generating equivalent test data is impractical or overly complex
  • You need real-world data to validate complex algorithms
  • Test data is very small (<1KB preferred, <10KB maximum)

Guidelines for Static Test Data

If you must use static test data:

  1. Keep files small - Prefer <1KB, maximum <10KB
  2. Document the source - How was it created?
  3. Use minimal examples - Strip down to essential features
  4. Consider alternatives - Can you generate equivalent data?
# test_data/README.md
# Test data for complex_tool component
# Source: https://github.com/example/dataset
# Generated with: tool --export-sample --format minimal
# Date: 2025-01-01
# Size: 847 bytes
# Purpose: Tests complex file format parsing

Referencing Static Test Data

test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: /src/_utils/test_helpers.sh
  - type: file
    path: test_data
# In your test script
static_data="$meta_resources_dir/test_data/sample.complex"
check_file_exists "$static_data" "static test data"

"$meta_executable" --input "$static_data" --output "$meta_temp_dir/output.txt"