biobox/CHANGELOG.md
CI 27d6133895 Build branch biobox/main with version main to biobox on branch main (9991e9a)
Build pipeline: viash-hub.biobox.main-l6hlj

Source commit: 9991e9a4f5

Source message: Bump bases2fastq to 2.2.1 (#202)
2025-10-03 08:31:27 +00:00

unreleased

MINOR CHANGES

  • falco: Update falco to 1.2.5 (PR #201).

  • bases2fastq: Bump from 2.2.0 to 2.2.1 (PR #202).

biobox 0.4.0

BREAKING CHANGES

  • fq_subsample has been removed after its functionality was copied to fq/fq_subsample. Please use the latter instead (PR #182).

  • snpeff has been removed. Please use snpeff/snpeff_ann (a functional copy of snpeff) instead, as this is the default subcommand when running the tool (PR #194).
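
For workflows that still reference the removed components, the two replacements above can be applied mechanically. The following is a minimal sketch, assuming component references are available as a plain list of names; `update_component_refs` and that list format are hypothetical and not part of biobox.

```python
# Replacements taken from the BREAKING CHANGES entries above.
REPLACEMENTS = {
    "fq_subsample": "fq/fq_subsample",
    "snpeff": "snpeff/snpeff_ann",
}


def update_component_refs(components):
    """Swap removed component names for their replacements; leave others untouched."""
    return [REPLACEMENTS.get(name, name) for name in components]


if __name__ == "__main__":
    # Hypothetical list of component names used by a workflow.
    print(update_component_refs(["fastqc", "snpeff", "fq_subsample"]))
```

Names that are already namespaced (for example fq/fq_subsample) pass through unchanged, so the helper can be run repeatedly.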

NEW FUNCTIONALITY

  • fq: Added two new components for FASTQ file processing (PR #182):

    • fq/fq_filter: Filter FASTQ files based on record names or sequence patterns.
    • fq/fq_generate: Generate a random FASTQ file pair for testing and simulation purposes.
  • bwa: Added BWA support for single-end and paired-end read alignment (PR #183).

    • bwa/bwa_index: Create BWA index files for reference genome alignment.
    • bwa/bwa_mem: BWA-MEM algorithm for sequence alignment supporting single-end and paired-end reads.
    • bwa/bwa_aln: BWA aln algorithm for aligning short sequence reads to a reference genome.
    • bwa/bwa_samse: BWA samse - generate single-end alignment in SAM format from BWA aln SAI files.
    • bwa/bwa_sampe: BWA sampe - generate paired-end alignment in SAM format from BWA aln SAI files.
  • bowtie2: Add support for Bowtie2 alignment and indexing (PR #184).

    • bowtie2/bowtie2_build: Build Bowtie2 index files from reference sequences.
    • bowtie2/bowtie2_align: Align single-end and paired-end reads using Bowtie2.
    • bowtie2/bowtie2_inspect: Extract information from Bowtie2 index files.
  • bedtools: Major expansion with 32 new components providing comprehensive genomic interval analysis (PR #188):

    • bedtools/bedtools_annotate: Annotate coverage based on overlaps with interval files
    • bedtools/bedtools_bedpetobam: Convert BEDPE to BAM format
    • bedtools/bedtools_closest: Find closest features between two interval files
    • bedtools/bedtools_cluster: Cluster nearby intervals
    • bedtools/bedtools_complement: Report intervals not covered by features
    • bedtools/bedtools_coverage: Compute coverage of features
    • bedtools/bedtools_expand: Expand blocked BED features
    • bedtools/bedtools_fisher: Compute Fisher's exact test for overlaps
    • bedtools/bedtools_flank: Create flanking intervals around features
    • bedtools/bedtools_igv: Create IGV batch scripts for visualization
    • bedtools/bedtools_jaccard: Compute Jaccard statistic between interval sets
    • bedtools/bedtools_makewindows: Make windows across genome or intervals
    • bedtools/bedtools_map: Map values from overlapping intervals
    • bedtools/bedtools_maskfasta: Mask FASTA sequences using intervals
    • bedtools/bedtools_multicov: Count coverage across multiple BAM files
    • bedtools/bedtools_multiinter: Identify common intervals across multiple files
    • bedtools/bedtools_overlap: Compute overlaps between paired-end reads and intervals
    • bedtools/bedtools_pairtobed: Find overlaps between paired-end reads and intervals
    • bedtools/bedtools_pairtopair: Find overlaps between paired-end read sets
    • bedtools/bedtools_random: Generate random intervals
    • bedtools/bedtools_reldist: Compute relative distances between features
    • bedtools/bedtools_sample: Sample random subsets of intervals
    • bedtools/bedtools_shift: Shift intervals by specified amounts
    • bedtools/bedtools_shuffle: Shuffle intervals while preserving size
    • bedtools/bedtools_slop: Extend intervals by specified amounts
    • bedtools/bedtools_spacing: Report spacing between intervals
    • bedtools/bedtools_split: Split BED12 features into individual intervals
    • bedtools/bedtools_subtract: Remove overlapping features
    • bedtools/bedtools_summary: Summarize interval statistics
    • bedtools/bedtools_tag: Tag BAM alignments with overlapping intervals
    • bedtools/bedtools_unionbedg: Combine multiple BEDGRAPH files
    • bedtools/bedtools_window: Find overlapping features within specified windows
  • Developer tools: Added GitHub Copilot integration (PR #192):

    • .github/copilot-instructions.md: Complete coding assistant guide with biobox patterns, examples, and best practices
    • .github/prompts/update-viash-component.prompt.md: Step-by-step prompt for updating existing components
    • .github/prompts/add-viash-component.prompt.md: Comprehensive prompt for creating new components from scratch

MAJOR CHANGES

  • bedtools: Enhanced 11 existing bedtools components with improved functionality and standardized interfaces (PR #188):

    • bedtools/bedtools_bamtobed: Enhanced with additional output format options
    • bedtools/bedtools_bamtofastq: Improved paired-end read handling
    • bedtools/bedtools_bed12tobed6: Standardized parameter handling
    • bedtools/bedtools_bedtobam: Enhanced genome file support
    • bedtools/bedtools_genomecov: Added scale and split options
    • bedtools/bedtools_getfasta: Improved FASTA extraction features
    • bedtools/bedtools_groupby: Enhanced grouping and operation options
    • bedtools/bedtools_intersect: Expanded intersection mode support
    • bedtools/bedtools_links: Improved link generation functionality
    • bedtools/bedtools_merge: Enhanced merging options and distance parameters
    • bedtools/bedtools_sort: Standardized sorting options
  • bcftools: Updated components to version 1.22, with broader argument coverage, improved script patterns, biobox standard compliance, and an overhauled test suite (PR #193):

    • bcftools_annotate: Added --verbosity parameter; updated to use meta_cpus instead of --threads parameter
    • bcftools_concat: Renamed --compact_PS to --compact_ps, --min_PQ to --min_pq; added --rm_dups, --drop_genotypes, --verbosity, --write_index parameters; updated to use meta_cpus instead of --threads parameter
    • bcftools_norm: Renamed --remove_duplicates to --rm_dup, added --remove_duplicates_flag as boolean alias; added --exclude, --include, --gff_annot, --multi_overlaps, --sort, --verbosity, --write_index parameters; updated to use meta_cpus instead of --threads parameter
    • bcftools_sort: Removed --max_mem and --temp_dir parameters (now use meta_memory_mb and meta_temp_dir respectively); added --verbosity, --write_index parameters
    • bcftools_stats: Renamed --allele_frequency_bins to --af_bins, --allele_frequency_tag to --af_tag, --fasta_reference to --fasta_ref, --split_by_ID to --split_by_id, --targets_overlaps to --targets_overlap; removed --allele_frequency_bins_file
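
The bcftools renames and removals listed above can be captured in a small lookup table to help migrate existing invocations. This is a minimal sketch, assuming arguments are held as a flat list of command-line tokens; the helper `migrate_args` is hypothetical, and only the flag names come from this changelog. It does not handle --threads (now driven by meta_cpus) and does not check whether a renamed flag expects a different value, so review the result by hand.

```python
# Flag renames and removals as listed in the biobox 0.4.0 changelog.
RENAMED = {
    "bcftools_concat": {"--compact_PS": "--compact_ps", "--min_PQ": "--min_pq"},
    "bcftools_norm": {"--remove_duplicates": "--rm_dup"},
    "bcftools_stats": {
        "--allele_frequency_bins": "--af_bins",
        "--allele_frequency_tag": "--af_tag",
        "--fasta_reference": "--fasta_ref",
        "--split_by_ID": "--split_by_id",
        "--targets_overlaps": "--targets_overlap",
    },
}
REMOVED = {
    "bcftools_sort": {"--max_mem", "--temp_dir"},
    "bcftools_stats": {"--allele_frequency_bins_file"},
}


def migrate_args(component, args):
    """Return (new_args, warnings) for a flat list of CLI tokens.

    Renamed flags are rewritten; removed flags are kept as-is but reported,
    so the caller decides how to replace them.
    """
    renamed = RENAMED.get(component, {})
    removed = REMOVED.get(component, set())
    new_args, warnings = [], []
    for token in args:
        if token in removed:
            warnings.append(f"{token} was removed from {component}")
        new_args.append(renamed.get(token, token))
    return new_args, warnings


if __name__ == "__main__":
    print(migrate_args("bcftools_stats", ["--allele_frequency_bins", "0.1,0.5", "--split_by_ID"]))
    print(migrate_args("bcftools_sort", ["--max_mem", "4G"]))
```

Removed flags are only reported rather than dropped because their replacements (meta_memory_mb, meta_temp_dir) are set through a different mechanism than component arguments.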

MINOR CHANGES

  • bases2fastq: Updated component with comprehensive argument support and latest practices (PR #190).

  • arriba: Updated to v2.5.0 and refactored script and tests based on latest contributing guidelines (PR #187).

  • snpeff: Updated to version 5.2f (PR #194).

BUG FIXES

  • Fix the commands property from components being overwritten by the global configuration (which only included ps) (PR #196).

DOCUMENTATION

  • Major restructuring of the documentation pages (PR #185):

    • CONTRIBUTING.md: Streamlined guide with detailed sections moved to dedicated docs/ guides.
    • README.md: Streamlined content to guide people towards what they need.
    • docs/COMPONENT_DEVELOPMENT.md: New comprehensive guide covering component creation process.
    • docs/SCRIPT_DEVELOPMENT.md: New detailed guide for script development best practices.
    • docs/TESTING.md: New comprehensive testing guide.
    • docs/DOCKER_GUIDE.md: New Docker and engine best practices guide.
  • .github/PULL_REQUEST_TEMPLATE.md: Fixed repository references to point to correct biobox repository instead of base template (PR #185).

biobox 0.3.2

NEW FUNCTIONALITY

  • fq:
    • fq/fq_lint: Validate FASTQ files for common issues (PR #179).
    • fq/fq_subsample: Sample a subset of records from single or paired FASTQ files (PR #179).

MAJOR CHANGES

  • fq_subsample: This component has been deprecated in favour of fq/fq_subsample, and will be removed in biobox 0.4.0 (PR #179).

MINOR CHANGES

  • Update README (PR #177).

  • Update author information (PR #180, PR #200).

  • fastqc: add --outdir argument (PR #181).

biobox 0.3.1

NEW FUNCTIONALITY

  • bcl_convert: add force argument (PR #171).
  • cellranger/cellranger_count: Align fastq files using Cell Ranger count (PR #163).

MINOR CHANGES

  • Replace the deprecated use of the meta variable functionality_name by just name (PR #174).

  • Bump viash to 0.9.4 (PR #175).

DOCUMENTATION

  • Update README (PR #176).

biobox 0.3.0

NEW FUNCTIONALITY

  • agat:

    • agat/agat_convert_genscan2gff: convert a genscan file into a GFF file (PR #100).
    • agat/agat_sp_add_introns: add intron features to gtf/gff file without intron features (PR #104).
    • agat/agat_sp_filter_feature_from_kill_list: remove features in a GFF file based on a kill list (PR #105).
    • agat/agat_sp_merge_annotations: merge different gff annotation files in one (PR #106).
    • agat/agat_sp_statistics: provides exhaustive statistics of a gtf/gff file (PR #107).
    • agat/agat_sq_stat_basic: provide basic statistics of a gtf/gff file (PR #110).
  • bd_rhapsody/bd_rhapsody_sequence_analysis: BD Rhapsody Sequence Analysis CWL pipeline (PR #96).

  • bedtools:

    • bedtools/bedtools_bamtobed: Converts BAM alignments to BED6 or BEDPE format (PR #109).
  • rsem/rsem_calculate_expression: Calculate expression levels (PR #93).

  • cellranger:

    • cellranger/cellranger_mkref: Build a Cell Ranger-compatible reference folder from user-supplied genome FASTA and gene GTF files (PR #164).
  • rseqc:

    • rseqc/rseqc_inner_distance: Calculate inner distance between read pairs (PR #159).
    • rseqc/rseqc_inferexperiment: Infer strandedness from sequencing reads (PR #158).
    • rseqc/bam_stat: Generate statistics from a bam file (PR #155).
  • nanoplot: Plotting tool for long read sequencing data and alignments (PR #95).

  • sgedemux: demultiplexing sequencing data generated on Singular Genomics' sequencing instruments (PR #166).

  • bases2fastq: demultiplexing sequencing data generated by Element Biosciences instruments (PR #167).

BUG FIXES

  • falco: Fix a typo in the --reverse_complement argument (PR #157).

  • cutadapt: Fix the non-functional action parameter (PR #161).

  • bbmap_bbsplit: Change argument type of build to file and add output argument index (PR #162).

  • kallisto/kallisto_index: Fix command script to use --threads option (PR #162).

  • kallisto/kallisto_quant: Change type of argument output_dir to file and add output argument log (PR #162).

  • rsem/rsem_calculate_expression: Fix output handling (PR #162).

  • sortmerna: Change type of argument aligned to file; update docker image; accept more than two reference files (PR #162).

  • umi_tools/umi_tools_extract: Remove umi_discard_reads option and change log2stderr to input argument (PR #162).

  • star/star_genome_generate: Fix passing of optional sjdb parameters (PR #170).

MINOR CHANGES

  • agat_convert_bed2gff: change type of argument inflate_off from boolean_false to boolean_true (PR #160).

  • cutadapt: change type of arguments no_indels and no_match_adapter_wildcards from boolean_false to boolean_true (PR #160).

  • Upgrade to Viash 0.9.0.

  • bbmap_bbsplit: Move to namespace bbmap (PR #162).

biobox 0.2.0

BREAKING CHANGES

  • star/star_align_reads: Change all arguments from --camelCase to --snake_case (PR #62).

  • star/star_genome_generate: Change all arguments from --camelCase to --snake_case (PR #62).
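
As a rough aid when migrating scripts, a generic camelCase to snake_case conversion can be sketched as below. This is only an illustration of the naming convention: the function `camel_to_snake` is hypothetical, and the names it produces are not guaranteed to match the actual component arguments, so always check the component's own argument list.

```python
import re


def camel_to_snake(flag: str) -> str:
    """Convert a --camelCase flag to --snake_case with a simple regex heuristic.

    An underscore is inserted before each uppercase letter that follows a
    lowercase letter or digit; runs of uppercase letters stay together.
    """
    prefix = "--" if flag.startswith("--") else ""
    name = flag[len(prefix):]
    snake = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()
    return prefix + snake


if __name__ == "__main__":
    # Example inputs are STAR-style option names; outputs are heuristic only.
    for flag in ["--genomeDir", "--runThreadN", "--outSAMtype"]:
        print(flag, "->", camel_to_snake(flag))
```

Acronyms are the main source of ambiguity with this heuristic, which is why the output should be treated as a starting point rather than the definitive new name.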

NEW FUNCTIONALITY

  • star/star_align_reads: Add star solo related arguments (PR #62).

  • bd_rhapsody/bd_rhapsody_make_reference: Create a reference for the BD Rhapsody pipeline (PR #75).

  • umi_tools/umi_tools_dedup: Deduplicate reads based on the mapping coordinate and the UMI attached to the read (PR #54).

  • seqtk:

    • seqtk/seqtk_sample: Subsamples sequences from FASTA/Q files (PR #68).
    • seqtk/seqtk_subseq: Extract sequences (complete or subsequence) from FASTA/FASTQ files based on a provided file of sequence IDs or region coordinates (PR #85).
  • agat:

    • agat/agat_convert_sp_gff2gtf: convert any GTF/GFF file into a proper GTF file (PR #76).
    • agat/agat_convert_bed2gff: convert bed file to gff format (PR #97).
    • agat/agat_convert_mfannot2gff: convert MFannot "masterfile" annotation to gff format (PR #112).
    • agat/agat_convert_embl2gff: convert an EMBL file into GFF format (PR #99).
    • agat/agat_convert_sp_gff2tsv: convert gtf/gff file into tabulated file (PR #102).
    • agat/agat_convert_sp_gxf2gxf: fixes and/or standardizes any GTF/GFF file into full sorted GTF/GFF file (PR #103).
  • bedtools:

    • bedtools/bedtools_intersect: Allows one to screen for overlaps between two sets of genomic features (PR #94).
    • bedtools/bedtools_sort: Sorts a feature file (bed/gff/vcf) by chromosome and other criteria (PR #98).
    • bedtools/bedtools_genomecov: Compute the coverage of a feature file (bed/gff/vcf/bam) among a genome (PR #128).
    • bedtools/bedtools_groupby: Summarizes a dataset column based upon common column groupings. Akin to the SQL "group by" command (PR #123).
    • bedtools/bedtools_merge: Merges overlapping BED/GFF/VCF entries into a single interval (PR #118).
    • bedtools/bedtools_bamtofastq: Convert BAM alignments to FASTQ files (PR #101).
    • bedtools/bedtools_bedtobam: Converts genomic feature records (bed/gff/vcf) to BAM format (PR #111).
    • bedtools/bedtools_bed12tobed6: Converts BED12 files to BED6 files (PR #140).
    • bedtools/bedtools_links: Creates an HTML file with links to an instance of the UCSC Genome Browser for all features / intervals in a (bed/gff/vcf) file (PR #137).
  • qualimap/qualimap_rnaseq: RNA-seq QC analysis using qualimap (PR #74).

  • rsem/rsem_prepare_reference: Prepare transcript references for RSEM (PR #89).

  • bcftools:

    • bcftools/bcftools_concat: Concatenate or combine VCF/BCF files (PR #145).
    • bcftools/bcftools_norm: Left-align and normalize indels, check if REF alleles match the reference, split multiallelic sites into multiple rows; recover multiallelics from multiple rows (PR #144).
    • bcftools/bcftools_annotate: Add or remove annotations from a VCF/BCF file (PR #143).
    • bcftools/bcftools_stats: Parses VCF or BCF and produces a txt stats file which can be plotted using plot-vcfstats (PR #142).
    • bcftools/bcftools_sort: Sorts BCF/VCF files by position and other criteria (PR #141).
  • fastqc: High throughput sequence quality control analysis tool (PR #92).

  • sortmerna: Local sequence alignment tool for mapping, clustering, and filtering rRNA from metatranscriptomic data (PR #146).

  • fq_subsample: Sample a subset of records from single or paired FASTQ files (PR #147).

  • kallisto:

    • kallisto_index: Create a kallisto index (PR #149).
    • kallisto_quant: Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads (PR #152).
  • trimgalore: Quality and adapter trimming for fastq files (PR #117).

MINOR CHANGES

  • busco components: update BUSCO to 5.7.1 (PR #72).

  • Update CI to reusable workflow in viash-io/viash-actions (PR #86).

  • Update several components in order to avoid duplicate code when using unset on boolean arguments (PR #133).

  • Bump viash to 0.9.0-RC7 (PR #134).

DOCUMENTATION

  • Extend the contributing guidelines (PR #82):

    • Update format to Viash 0.9.

    • Descriptions should be formatted in markdown.

    • Add defaults to descriptions, not as a default of the argument.

    • Explain parameter expansion.

    • Mention that the contents of the output of components in tests should be checked.

  • Add authorship to existing components (PR #88).

BUG FIXES

  • pear: fix component not exiting with the correct exitcode when PEAR fails (PR #70).

  • cutadapt: fix --par_quality_cutoff_r2 argument (PR #69).

  • cutadapt: demultiplexing is now disabled by default. It can be re-enabled by using demultiplex_mode (PR #69).

  • multiqc: update multiple separator to ; (PR #81).

biobox 0.1.0

NEW FEATURES

  • arriba: Detect gene fusions from RNA-seq data (PR #1).

  • fastp: An ultra-fast all-in-one FASTQ preprocessor (PR #3).

  • busco:

    • busco/busco_run: Assess genome assembly and annotation completeness with single copy orthologs (PR #6).
    • busco/busco_list_datasets: Lists available busco datasets (PR #18).
    • busco/busco_download_datasets: Download busco datasets (PR #19).
  • cutadapt: Remove adapter sequences from high-throughput sequencing reads (PR #7).

  • featurecounts: Assign sequence reads to genomic features (PR #11).

  • bgzip: Add bgzip functionality to compress and decompress files (PR #13).

  • pear: Paired-end read merger (PR #10).

  • lofreq/call: Call variants from a BAM file (PR #17).

  • lofreq/indelqual: Insert indel qualities into BAM file (PR #17).

  • multiqc: Aggregate results from bioinformatics analyses across many samples into a single report (PR #42).

  • star:

    • star/star_align_reads: Align reads to a reference genome (PR #22).
    • star/star_genome_generate: Generate a genome index for STAR alignment (PR #58).
  • gffread: Validate, filter, convert and perform other operations on GFF files (PR #29).

  • salmon:

    • salmon/salmon_index: Create a salmon index for the transcriptome to use Salmon in the mapping-based mode (PR #24).
    • salmon/salmon_quant: Transcript quantification from RNA-seq data (PR #24).
  • samtools:

    • samtools/samtools_flagstat: Counts the number of alignments in SAM/BAM/CRAM files for each FLAG type (PR #31).
    • samtools/samtools_idxstats: Reports alignment summary statistics for a SAM/BAM/CRAM file (PR #32).
    • samtools/samtools_index: Index SAM/BAM/CRAM files (PR #35).
    • samtools/samtools_sort: Sort SAM/BAM/CRAM files (PR #36).
    • samtools/samtools_stats: Reports alignment summary statistics for a BAM file (PR #39).
    • samtools/samtools_faidx: Indexes FASTA files to enable random access to fasta and fastq files (PR #41).
    • samtools/samtools_collate: Shuffles and groups reads in SAM/BAM/CRAM files together by their names (PR #42).
    • samtools/samtools_view: Views and converts SAM/BAM/CRAM files (PR #48).
    • samtools/samtools_fastq: Converts a SAM/BAM/CRAM file to FASTQ (PR #52).
    • samtools/samtools_fasta: Converts a SAM/BAM/CRAM file to FASTA (PR #53).
  • umi_tools:

    • umi_tools/umi_tools_extract: Flexible removal of UMI sequences from fastq reads (PR #71).
    • umi_tools/umi_tools_prepareforrsem: Fix paired-end reads in name sorted BAM file to prepare for RSEM (PR #148).
  • falco: A C++ drop-in replacement of FastQC to assess the quality of sequence read data (PR #43).

  • bedtools:

    • bedtools_getfasta: extract sequences from a FASTA file for each of the intervals defined in a BED/GFF/VCF file (PR #59).
  • bbmap:

    • bbmap_bbsplit: Split sequencing reads by mapping them to multiple references simultaneously (PR #138).

MINOR CHANGES

  • Uniformize component metadata (PR #23).

  • Update to Viash 0.8.5 (PR #25).

  • Update to Viash 0.9.0-RC3 (PR #51).

  • Update to Viash 0.9.0-RC6 (PR #63).

  • Switch to viash-hub/toolbox actions (PR #64).

DOCUMENTATION

  • Update README (PR #64).

BUG FIXES

  • Add escaping character before leading hashtag in the description field of the config file (PR #50).

  • Format URL in biobase/bcl_convert description (PR #55).