Files
biobox/docs/DOCKER_GUIDE.md
CI 04a5851ff8 Build branch biobox/main with version main to biobox on branch main (7158daa)
Build pipeline: viash-hub.biobox.main-tb4cv

Source commit: 7158daa5f6

Source message: Fix bases2fastq component, update to latest practices (#190)

* wip updates

* refactor component

* assume bases2fastq follows semver

* fix version command

* add entry to changelog

* move to minor changes
2025-09-01 11:04:56 +00:00

7.2 KiB

Docker and Engine Best Practices

This guide covers best practices for setting up Docker engines and managing dependencies in biobox components.

Table of Contents

Preferred Approach: Biocontainers

Basic Setup

engines:
  - type: docker
    image: quay.io/biocontainers/bowtie2:2.5.4--he96a11b_6
    setup:
      - type: docker
        run:
          - bowtie2 --version 2>&1 | head -1 | sed 's/.*version /bowtie2: /' > /var/software_versions.txt

Key Requirements

  1. Use specific versions: Always pin to specific versions with build strings
  2. Include version detection: Add setup commands to create /var/software_versions.txt
  3. Verify command availability: Ensure the container has the required commands from requirements.commands

Finding Biocontainers

Search Strategy

  1. Google search: biocontainer <tool_name>
  2. Direct URL: https://quay.io/repository/biocontainers/<tool_name>?tab=tags
  3. Check version compatibility: Choose the most recent stable version
  4. Verify build string: Include the complete version tag with build string

Version Selection

# Good: Specific version with build string
image: quay.io/biocontainers/samtools:1.17--hd87286a_2

# Bad: Latest or incomplete version
image: quay.io/biocontainers/samtools:latest
image: quay.io/biocontainers/samtools:1.17

Version Detection

Common Patterns

# Pattern 1: Tool outputs "Tool version X.Y.Z"
tool --version 2>&1 | head -1 | sed 's/.*version /tool: /' > /var/software_versions.txt

# Pattern 2: Tool outputs just version number
echo "tool: $(tool --version 2>&1 | head -1)" > /var/software_versions.txt

# Pattern 3: Complex version output, extract numeric part
tool --version 2>&1 | grep -E "^[0-9]" | head -1 | sed 's/^/tool: /' > /var/software_versions.txt

# Pattern 4: Version in specific format
tool --version 2>&1 | awk '{print "tool: " $NF}' > /var/software_versions.txt

Real Examples

# bowtie2
bowtie2 --version 2>&1 | head -1 | sed 's/.*version /bowtie2: /' > /var/software_versions.txt

# samtools
samtools --version 2>&1 | head -1 | sed 's/samtools /samtools: /' > /var/software_versions.txt

# fastqc
fastqc --version 2>&1 | sed 's/FastQC v/fastqc: /' > /var/software_versions.txt

Testing Version Detection

Always test your version detection command:

# Test in the container
docker run quay.io/biocontainers/tool:version bash -c "
  tool --version 2>&1 | head -1 | sed 's/.*version /tool: /'
"

Docker Run Syntax

List vs Multiline Strings

Preferred: List format

run:
  # Single commands
  - command1 arg1 arg2
  - command2 arg1 arg2
  # Chained commands
  - command1 && command2 && command3

Alternative: Multiline strings (for complex commands)

run: |
  command1 arg1 arg2 && \
  command2 arg1 arg2 && \
  command3 arg1 arg2

Important: Comments inside multiline strings (run: |) become Dockerfile RUN commands and will break the build. Use comments before the run: key or use the list format.

Custom Containers

When to Use Custom Containers

Use custom containers when:

  • No suitable biocontainer exists
  • You need to install additional dependencies
  • You need a specific base environment (R, Python, etc.)

Python-based Tools

engines:
  - type: docker
    image: python:3.10-slim
    setup:
      - type: python
        packages: 
        - numpy~=x.x.x
        - pandas~=x.x.x
        - scipy~=x.x.x

R-based Tools

engines:
  - type: docker
    image: rocker/r2u:24.04
    setup:
      - type: r
        cran: [devtools, BiocManager]
        bioc: [Biostrings, GenomicRanges]

Compilation from Source

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [build-essential, cmake, git, wget]
      - type: docker
        run:
          - wget https://github.com/user/tool/archive/v1.0.tar.gz && tar -xzf v1.0.tar.gz
          - cd tool-1.0 && make && make install
          - echo "tool: 1.0" > /var/software_versions.txt

General Purpose

  • Ubuntu: ubuntu:22.04 - Good for compilation and apt packages
  • Alpine: alpine:latest - Minimal size, apk packages
  • Debian: debian:bookworm-slim - Stable, well-supported

Language-Specific

Python

# Basic Python
image: python:3.10-slim

# With scientific packages
image: python:3.10

# GPU-enabled
image: nvcr.io/nvidia/pytorch:23.08-py3

R

# Fast package installation
image: rocker/r2u:24.04

# Tidyverse included
image: rocker/tidyverse:4.3.0

# Bioconductor base
image: bioconductor/bioconductor_docker:RELEASE_3_17

Node.js

# LTS version
image: node:18-slim

# Alpine variant
image: node:18-alpine

Other Languages

# Java
image: openjdk:11-jre-slim

# Go
image: golang:1.20-alpine

# Rust
image: rust:1.70-slim

# Ruby
image: ruby:3.1-slim

Multi-tool Containers

Installing Multiple Tools

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [wget, curl, build-essential]
      - type: docker
        run:
          # Install tool 1
          - wget https://tool1.com/download && install_tool1
          # Install tool 2  
          - wget https://tool2.com/download && install_tool2
          # Create version file
          - echo "tool1: $(tool1 --version)" > /var/software_versions.txt
          - echo "tool2: $(tool2 --version)" >> /var/software_versions.txt

Container Optimization

Layer Efficiency

# Good: Combine related commands
setup:
  - type: docker
    run: |
      apt-get update && \
      apt-get install -y wget curl && \
      wget https://tool.com/download && \
      install_tool && \
      apt-get clean && \
      rm -rf /var/lib/apt/lists/*

# Bad: Separate layers for each command
setup:
  - type: apt
    packages: [wget, curl]
  - type: docker
    run: wget https://tool.com/download
  - type: docker
    run: install_tool
  - type: docker
    run: apt-get clean

Testing Docker Setup

Viash Docker Debugging

# Inspect the generated Dockerfile
viash run config.vsh.yaml -- ---dockerfile

# Build with cached layers (faster)
viash run config.vsh.yaml -- ---setup cachedbuild ---verbose

# Build from scratch (clean build)
viash run config.vsh.yaml -- ---setup build ---verbose

# Enter interactive debugging session
viash run config.vsh.yaml -- ---debug

# Check installed tools (inside container)
which tool
tool --version

# Verify version file
cat /var/software_versions.txt

Common Issues

  1. Command not found: Tool not in PATH or not installed
  2. Version detection fails: Command syntax varies between tools
  3. Permission issues: Tools installed in wrong location
  4. Missing dependencies: Tool requires additional libraries