Build pipeline: viash-hub.biobox.main-tb4cv
Source commit: 7158daa5f6
Source message: Fix bases2fastq component, update to latest practices (#190)
* wip updates
* refactor component
* assume bases2fastq follows semver
* fix version command
* add entry to changelog
* move to minor changes
7.2 KiB
7.2 KiB
Docker and Engine Best Practices
This guide covers best practices for setting up Docker engines and managing dependencies in biobox components.
Table of Contents
- Preferred Approach: Biocontainers
- Finding Biocontainers
- Version Detection
- Docker Run Syntax
- Custom Containers
- Recommended Base Containers
- Multi-tool Containers
- Container Optimization
- Testing Docker Setup
Preferred Approach: Biocontainers
Basic Setup
engines:
- type: docker
image: quay.io/biocontainers/bowtie2:2.5.4--he96a11b_6
setup:
- type: docker
run:
- bowtie2 --version 2>&1 | head -1 | sed 's/.*version /bowtie2: /' > /var/software_versions.txt
Key Requirements
- Use specific versions: Always pin to specific versions with build strings
- Include version detection: Add setup commands to create
/var/software_versions.txt - Verify command availability: Ensure the container has the required commands from
requirements.commands
Finding Biocontainers
Search Strategy
- Google search:
biocontainer <tool_name> - Direct URL:
https://quay.io/repository/biocontainers/<tool_name>?tab=tags - Check version compatibility: Choose the most recent stable version
- Verify build string: Include the complete version tag with build string
Version Selection
# Good: Specific version with build string
image: quay.io/biocontainers/samtools:1.17--hd87286a_2
# Bad: Latest or incomplete version
image: quay.io/biocontainers/samtools:latest
image: quay.io/biocontainers/samtools:1.17
Version Detection
Common Patterns
# Pattern 1: Tool outputs "Tool version X.Y.Z"
tool --version 2>&1 | head -1 | sed 's/.*version /tool: /' > /var/software_versions.txt
# Pattern 2: Tool outputs just version number
echo "tool: $(tool --version 2>&1 | head -1)" > /var/software_versions.txt
# Pattern 3: Complex version output, extract numeric part
tool --version 2>&1 | grep -E "^[0-9]" | head -1 | sed 's/^/tool: /' > /var/software_versions.txt
# Pattern 4: Version in specific format
tool --version 2>&1 | awk '{print "tool: " $NF}' > /var/software_versions.txt
Real Examples
# bowtie2
bowtie2 --version 2>&1 | head -1 | sed 's/.*version /bowtie2: /' > /var/software_versions.txt
# samtools
samtools --version 2>&1 | head -1 | sed 's/samtools /samtools: /' > /var/software_versions.txt
# fastqc
fastqc --version 2>&1 | sed 's/FastQC v/fastqc: /' > /var/software_versions.txt
Testing Version Detection
Always test your version detection command:
# Test in the container
docker run quay.io/biocontainers/tool:version bash -c "
tool --version 2>&1 | head -1 | sed 's/.*version /tool: /'
"
Docker Run Syntax
List vs Multiline Strings
Preferred: List format
run:
# Single commands
- command1 arg1 arg2
- command2 arg1 arg2
# Chained commands
- command1 && command2 && command3
Alternative: Multiline strings (for complex commands)
run: |
command1 arg1 arg2 && \
command2 arg1 arg2 && \
command3 arg1 arg2
Important: Comments inside multiline strings (run: |) become Dockerfile RUN commands and will break the build. Use comments before the run: key or use the list format.
Custom Containers
When to Use Custom Containers
Use custom containers when:
- No suitable biocontainer exists
- You need to install additional dependencies
- You need a specific base environment (R, Python, etc.)
Python-based Tools
engines:
- type: docker
image: python:3.10-slim
setup:
- type: python
packages:
- numpy~=x.x.x
- pandas~=x.x.x
- scipy~=x.x.x
R-based Tools
engines:
- type: docker
image: rocker/r2u:24.04
setup:
- type: r
cran: [devtools, BiocManager]
bioc: [Biostrings, GenomicRanges]
Compilation from Source
engines:
- type: docker
image: ubuntu:22.04
setup:
- type: apt
packages: [build-essential, cmake, git, wget]
- type: docker
run:
- wget https://github.com/user/tool/archive/v1.0.tar.gz && tar -xzf v1.0.tar.gz
- cd tool-1.0 && make && make install
- echo "tool: 1.0" > /var/software_versions.txt
Recommended Base Containers
General Purpose
- Ubuntu:
ubuntu:22.04- Good for compilation and apt packages - Alpine:
alpine:latest- Minimal size, apk packages - Debian:
debian:bookworm-slim- Stable, well-supported
Language-Specific
Python
# Basic Python
image: python:3.10-slim
# With scientific packages
image: python:3.10
# GPU-enabled
image: nvcr.io/nvidia/pytorch:23.08-py3
R
# Fast package installation
image: rocker/r2u:24.04
# Tidyverse included
image: rocker/tidyverse:4.3.0
# Bioconductor base
image: bioconductor/bioconductor_docker:RELEASE_3_17
Node.js
# LTS version
image: node:18-slim
# Alpine variant
image: node:18-alpine
Other Languages
# Java
image: openjdk:11-jre-slim
# Go
image: golang:1.20-alpine
# Rust
image: rust:1.70-slim
# Ruby
image: ruby:3.1-slim
Multi-tool Containers
Installing Multiple Tools
engines:
- type: docker
image: ubuntu:22.04
setup:
- type: apt
packages: [wget, curl, build-essential]
- type: docker
run:
# Install tool 1
- wget https://tool1.com/download && install_tool1
# Install tool 2
- wget https://tool2.com/download && install_tool2
# Create version file
- echo "tool1: $(tool1 --version)" > /var/software_versions.txt
- echo "tool2: $(tool2 --version)" >> /var/software_versions.txt
Container Optimization
Layer Efficiency
# Good: Combine related commands
setup:
- type: docker
run: |
apt-get update && \
apt-get install -y wget curl && \
wget https://tool.com/download && \
install_tool && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Bad: Separate layers for each command
setup:
- type: apt
packages: [wget, curl]
- type: docker
run: wget https://tool.com/download
- type: docker
run: install_tool
- type: docker
run: apt-get clean
Testing Docker Setup
Viash Docker Debugging
# Inspect the generated Dockerfile
viash run config.vsh.yaml -- ---dockerfile
# Build with cached layers (faster)
viash run config.vsh.yaml -- ---setup cachedbuild ---verbose
# Build from scratch (clean build)
viash run config.vsh.yaml -- ---setup build ---verbose
# Enter interactive debugging session
viash run config.vsh.yaml -- ---debug
# Check installed tools (inside container)
which tool
tool --version
# Verify version file
cat /var/software_versions.txt
Common Issues
- Command not found: Tool not in PATH or not installed
- Version detection fails: Command syntax varies between tools
- Permission issues: Tools installed in wrong location
- Missing dependencies: Tool requires additional libraries