Build branch biobox/main with version main to biobox on branch main (7158daa)

Build pipeline: viash-hub.biobox.main-tb4cv Source commit: 7158daa5f6 Source message: Fix bases2fastq component, update to latest practices (#190) * wip updates * refactor component * assume bases2fastq follows semver * fix version command * add entry to changelog * move to minor changes
2025-09-01 11:04:56 +00:00
parent 9cc17eaa6f
commit 04a5851ff8
859 changed files with 311497 additions and 6746 deletions
--- a/docs/TESTING.md
+++ b/docs/TESTING.md
@@ -0,0 +1,536 @@
+# Testing Guide
+
+This guide covers best practices for writing comprehensive test scripts for biobox components.
+
+> **📌 Important:** All new test scripts should use the **centralized test helpers** located at `src/_utils/test_helpers.sh`. This eliminates code duplication and ensures consistency across all components.
+
+## Table of Contents
+
+- [Core Principles](#core-principles)
+- [Test Script Structure](#test-script-structure)
+- [Centralized Test Helpers](#centralized-test-helpers)
+- [Test Scenarios](#test-scenarios)
+- [Best Practices](#best-practices)
+- [Viash Testing Features](#viash-testing-features)
+- [Static Test Data](#static-test-data)
+
+## Core Principles
+
+### 1. Generate Test Data in Scripts
+
+**Preferred approach:** Generate test data within the test script using the centralized helper functions.
+
+```bash
+# Generate test data using centralized helpers
+create_test_fasta "$meta_temp_dir/input.fasta" 3 50
+create_test_fastq "$meta_temp_dir/reads.fastq" 10 35
+```
+
+**Avoid:**
+- Storing static test files in the repository
+- Fetching test data from external sources
+- Large test datasets
+
+### 2. Self-Contained Tests
+
+Tests should be completely self-contained and not depend on external resources:
+
+```yaml
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: /src/_utils/test_helpers.sh
+```
+
+Only add static test files if absolutely necessary:
+
+```yaml
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: /src/_utils/test_helpers.sh
+  - type: file
+    path: test_data  # Only if data generation is impractical
+```
+
+## Test Script Structure
+
+### Configuration Setup
+
+Add the test helpers as a resource in your component configuration:
+
+```yaml
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: /src/_utils/test_helpers.sh
+```
+
+### Basic Test Template
+
+```bash
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+# Source the centralized test helpers
+source "$meta_resources_dir/test_helpers.sh"
+
+# Initialize test environment with strict error handling
+setup_test_env
+
+#############################################
+# Test execution with centralized functions
+#############################################
+
+log "Starting tests for $meta_name"
+
+# --- Test Case 1: Basic functionality ---
+log "Starting TEST 1: Basic functionality"
+
+# Create and validate test data
+test_data_dir="$meta_temp_dir/test_data"
+mkdir -p "$test_data_dir"
+create_test_fasta "$test_data_dir/input.fasta" 3 50
+check_file_exists "$test_data_dir/input.fasta" "input FASTA file"
+
+log "Executing $meta_name with basic parameters..."
+"$meta_executable" \
+  --input "$test_data_dir/input.fasta" \
+  --output "$meta_temp_dir/test1"
+
+log "Validating TEST 1 outputs..."
+check_dir_exists "$meta_temp_dir/test1" "output directory"
+check_file_exists "$meta_temp_dir/test1/result.txt" "result file"
+check_file_not_empty "$meta_temp_dir/test1/result.txt" "result file"
+
+log "✅ TEST 1 completed successfully"
+
+# --- Test Case 2: Advanced parameters ---
+log "Starting TEST 2: Advanced parameters"
+
+# Create different test data
+create_test_fastq "$test_data_dir/input.fastq" 10 35
+check_file_exists "$test_data_dir/input.fastq" "input FASTQ file"
+
+log "Executing $meta_name with advanced parameters..."
+"$meta_executable" \
+  --input "$test_data_dir/input.fastq" \
+  --output "$meta_temp_dir/test2" \
+  --threads 2 \
+  --verbose
+
+log "Validating TEST 2 outputs..."
+check_file_exists "$meta_temp_dir/test2/advanced_result.txt" "advanced result file"
+check_file_contains "$meta_temp_dir/test2/advanced_result.txt" "expected_pattern" "advanced result file"
+
+log "✅ TEST 2 completed successfully"
+
+print_test_summary "All tests completed successfully"
+```
+
+## Centralized Test Helpers
+
+The centralized test helpers located at `src/_utils/test_helpers.sh` provide comprehensive testing functionality to ensure consistency across all biobox components.
+
+### Available Functions
+
+#### Logging Functions
+- `log "message"` - Log with timestamp
+- `log_warn "message"` - Warning message  
+- `log_error "message"` - Error message
+
+#### File/Directory Validation
+- `check_file_exists path "description"` - Verify file exists
+- `check_dir_exists path "description"` - Verify directory exists
+- `check_file_not_exists path "description"` - Verify file doesn't exist
+- `check_dir_not_exists path "description"` - Verify directory doesn't exist
+- `check_file_empty path "description"` - Verify file is empty
+- `check_file_not_empty path "description"` - Verify file is not empty
+
+#### Content Validation
+- `check_file_contains path "text" "description"` - Verify file contains text
+- `check_file_not_contains path "text" "description"` - Verify file doesn't contain text
+- `check_file_matches_regex path "pattern" "description"` - Verify file matches regex
+- `check_file_line_count path count "description"` - Verify line count
+
+#### Test Data Generation
+- `create_test_fasta path [num_seqs] [seq_length]` - Generate FASTA file
+- `create_test_fastq path [num_reads] [read_length]` - Generate FASTQ file
+- `create_test_gtf path [num_genes]` - Generate GTF file
+- `create_test_gff path [num_features]` - Generate GFF file
+- `create_test_bed path [num_intervals]` - Generate BED file
+- `create_test_csv path [num_rows]` - Generate CSV file
+- `create_test_tsv path [num_rows]` - Generate TSV file
+
+#### Utility Functions
+- `setup_test_env` - Initialize test environment with strict error handling
+- `print_test_summary "test_name"` - Print completion message
+
+### Usage Example
+
+```bash
+#!/bin/bash
+
+## VIASH START
+## VIASH END
+
+# Source centralized helpers
+source "$meta_resources_dir/test_helpers.sh"
+setup_test_env
+
+log "Starting tests for $meta_name"
+
+# Generate test data
+create_test_fasta "$meta_temp_dir/input.fasta" 3 50
+check_file_exists "$meta_temp_dir/input.fasta" "input FASTA file"
+
+# Run component
+"$meta_executable" \
+  --input "$meta_temp_dir/input.fasta" \
+  --output "$meta_temp_dir/output.txt"
+
+# Validate output
+check_file_exists "$meta_temp_dir/output.txt" "result file"
+check_file_contains "$meta_temp_dir/output.txt" "expected_pattern" "result file"
+
+print_test_summary "Basic functionality test"
+```
+
+## Test Scenarios
+
+### 1. Basic Functionality
+
+Test the component with minimal, essential parameters:
+
+```bash
+log "Starting TEST 1: Basic functionality"
+
+create_test_fasta "$meta_temp_dir/input.fasta" 3 50
+
+"$meta_executable" \
+  --input "$meta_temp_dir/input.fasta" \
+  --output "$meta_temp_dir/output.txt"
+
+check_file_exists "$meta_temp_dir/output.txt" "output file"
+check_file_not_empty "$meta_temp_dir/output.txt" "output file"
+
+log "✅ TEST 1 completed successfully"
+```
+
+### 2. Multiple Input Files
+
+Test with multiple input files or complex input scenarios:
+
+```bash
+log "Starting TEST 2: Multiple input files"
+
+create_test_fasta "$meta_temp_dir/input1.fasta" 2 30
+create_test_fasta "$meta_temp_dir/input2.fasta" 2 30
+
+"$meta_executable" \
+  --input "$meta_temp_dir/input1.fasta;$meta_temp_dir/input2.fasta" \
+  --output "$meta_temp_dir/output.txt"
+
+check_file_exists "$meta_temp_dir/output.txt" "merged output file"
+
+log "✅ TEST 2 completed successfully"
+```
+
+### 3. Optional Parameters
+
+Test with optional parameters and advanced features:
+
+```bash
+log "Starting TEST 3: Optional parameters"
+
+create_test_fastq "$meta_temp_dir/input.fastq" 10 35
+
+"$meta_executable" \
+  --input "$meta_temp_dir/input.fastq" \
+  --output "$meta_temp_dir/output.txt" \
+  --threads 2 \
+  --verbose
+
+check_file_exists "$meta_temp_dir/output.txt" "output file with options"
+check_file_contains "$meta_temp_dir/output.txt" "verbose" "verbose output"
+
+log "✅ TEST 3 completed successfully"
+```
+
+### 4. Edge Cases
+
+Test with edge cases like empty files or unusual inputs:
+
+```bash
+log "Starting TEST 4: Edge case - empty input"
+
+# Create empty input file
+touch "$meta_temp_dir/empty.fasta"
+
+# Test should handle empty input gracefully
+if "$meta_executable" \
+  --input "$meta_temp_dir/empty.fasta" \
+  --output "$meta_temp_dir/output.txt" 2>/dev/null; then
+  log_warn "Component succeeded with empty input - checking output"
+  check_file_exists "$meta_temp_dir/output.txt" "output file for empty input"
+else
+  log "Expected behavior: Component properly rejected empty input"
+fi
+
+log "✅ TEST 4 completed successfully"
+```
+
+### 5. Error Handling
+
+Test proper error handling for invalid inputs:
+
+```bash
+log "Starting TEST 5: Error handling"
+
+# Test with non-existent input file
+if "$meta_executable" \
+  --input "/non/existent/file.txt" \
+  --output "$meta_temp_dir/output.txt" 2>/dev/null; then
+  log_error "Component should have failed with non-existent input"
+  exit 1
+else
+  log "✅ Component properly handled non-existent input file"
+fi
+
+log "✅ TEST 5 completed successfully"
+```
+
+## Best Practices
+
+### 1. Use Centralized Test Helpers
+
+Always use the centralized test helpers instead of defining functions individually:
+
+```bash
+# ✅ Recommended: Use centralized helpers
+source "$meta_resources_dir/test_helpers.sh"
+setup_test_env
+
+# ❌ NOT recommended: Defining functions individually
+set -euo pipefail
+log() { echo "$(date '+%Y-%m-%d %H:%M:%S') [TEST] $*"; }
+```
+
+### 2. Strict Error Handling
+
+The centralized helpers automatically provide strict error handling via `setup_test_env`:
+
+```bash
+# Automatically enabled by setup_test_env:
+set -euo pipefail  # Exit on errors, undefined variables, pipe failures
+export LC_ALL=C    # Consistent locale for reproducible results
+```
+
+### 3. Descriptive Validation
+
+Use descriptive validation functions with meaningful descriptions:
+
+```bash
+# ✅ Good: Descriptive validation
+check_file_exists "$output_file" "filtered feature matrix"
+check_file_not_exists "$bam_file" "BAM file (should be disabled by default)"
+check_file_contains "$result_file" "expected_pattern" "analysis results"
+
+# ❌ Less helpful: Basic validation without context
+check_file_exists "$output_file"
+```
+
+### 4. Organized Structure
+
+Use `$meta_temp_dir` and create organized test structure:
+
+```bash
+# Create organized test structure
+test_data_dir="$meta_temp_dir/test_data"
+test_output_dir="$meta_temp_dir/test_output"
+mkdir -p "$test_data_dir" "$test_output_dir"
+
+create_test_fasta "$test_data_dir/input.fasta" 3 50
+```
+
+### 5. Clear Test Output
+
+Use consistent logging with clear test boundaries:
+
+```bash
+log "Starting TEST 1: Basic functionality"
+log "Executing $meta_name with basic parameters..."
+log "Validating TEST 1 outputs..."
+log "✅ TEST 1 completed successfully"
+
+# Final summary
+print_test_summary "All tests completed successfully"
+```
+
+### 6. Comprehensive Content Validation
+
+Don't just check that files exist - validate their content:
+
+```bash
+# Check existence and content
+check_file_exists "$meta_temp_dir/output.txt" "analysis results"
+check_file_not_empty "$meta_temp_dir/output.txt" "analysis results"
+check_file_contains "$meta_temp_dir/output.txt" "Number of sequences" "result summary"
+check_file_line_count "$meta_temp_dir/output.txt" 10 "expected number of results"
+```
+
+### 7. Multiple Test Scenarios
+
+Include comprehensive test coverage:
+
+```bash
+# Test 1: Basic functionality
+log "Starting TEST 1: Basic functionality"
+# ... test implementation ...
+log "✅ TEST 1 completed successfully"
+
+# Test 2: Advanced options
+log "Starting TEST 2: Advanced options"
+# ... test implementation ...
+log "✅ TEST 2 completed successfully"
+
+# Test 3: Edge cases
+log "Starting TEST 3: Edge case handling"
+# ... test implementation ...
+log "✅ TEST 3 completed successfully"
+
+print_test_summary "All tests completed successfully"
+```
+
+## Viash Testing Features
+
+### Running Tests
+
+```bash
+# Test a single component
+viash test config.vsh.yaml
+
+# Test with specific resources
+viash test config.vsh.yaml --cpus 4 --memory 8GB
+
+# Test with specific setup strategy
+viash test config.vsh.yaml --setup build --verbose
+
+# Keep temporary files for debugging
+viash test config.vsh.yaml --keep true
+
+# Test all components in parallel
+viash ns test --parallel
+
+# Test specific namespace
+viash ns test -q alignment --parallel
+```
+
+### Test Execution Flow
+
+When running `viash test`, Viash automatically:
+
+1. **Creates temporary directory** (available as `$meta_temp_dir`)
+2. **Builds the main executable**
+3. **Builds/pulls Docker image** (if using Docker engine)
+4. **Iterates over all test scripts** in `test_resources`
+5. **Builds each test into executable** and runs it
+6. **Cleans up** temporary files (unless `--keep true`)
+7. **Returns exit code 0** if all tests succeed
+
+### Meta Variables in Tests
+
+Your test scripts automatically have access to important meta variables:
+
+- `$meta_executable` - Path to the built component executable
+- `$meta_temp_dir` - Temporary directory for test files (automatically cleaned up)
+- `$meta_name` - Component name for logging
+- `$meta_resources_dir` - Path to test resources
+
+### Multiple Test Scripts
+
+You can add multiple test scripts to cover different scenarios:
+
+```yaml
+test_resources:
+  - type: bash_script
+    path: test_basic.sh
+  - type: bash_script
+    path: test_edge_cases.sh
+  - type: bash_script
+    path: test_large_data.sh
+  - type: file
+    path: /src/_utils/test_helpers.sh
+```
+
+### Advanced Testing Options
+
+```bash
+# Test with different container setup strategies
+viash test config.vsh.yaml --setup cachedbuild  # Use cached layers (faster)
+viash test config.vsh.yaml --setup build        # Clean build from scratch
+viash test config.vsh.yaml --setup alwaysbuild  # Always rebuild container
+
+# Test with configuration modifications
+viash test config.vsh.yaml -c '.engines[0].image = "ubuntu:22.04"'
+
+# Test with debug mode for troubleshooting
+viash test config.vsh.yaml --keep true --verbose
+```
+
+For more details, see the [Viash Unit Testing Documentation](https://viash.io/guide/component/unit-testing.html).
+
+## Static Test Data
+
+### When to Use Static Test Data
+
+Only use static test files when:
+
+- The tool requires very specific, complex file formats that are difficult to generate
+- Generating equivalent test data is impractical or overly complex
+- You need real-world data to validate complex algorithms
+- Test data is very small (<1KB preferred, <10KB maximum)
+
+### Guidelines for Static Test Data
+
+If you must use static test data:
+
+1. **Keep files small** - Prefer <1KB, maximum <10KB
+2. **Document the source** - How was it created?
+3. **Use minimal examples** - Strip down to essential features
+4. **Consider alternatives** - Can you generate equivalent data?
+
+```bash
+# test_data/README.md
+# Test data for complex_tool component
+# Source: https://github.com/example/dataset
+# Generated with: tool --export-sample --format minimal
+# Date: 2025-01-01
+# Size: 847 bytes
+# Purpose: Tests complex file format parsing
+```
+
+### Referencing Static Test Data
+
+```yaml
+test_resources:
+  - type: bash_script
+    path: test.sh
+  - type: file
+    path: /src/_utils/test_helpers.sh
+  - type: file
+    path: test_data
+```
+
+```bash
+# In your test script
+static_data="$meta_resources_dir/test_data/sample.complex"
+check_file_exists "$static_data" "static test data"
+
+"$meta_executable" --input "$static_data" --output "$meta_temp_dir/output.txt"
+```