Build pipeline: viash-hub.biobox.main-l6hlj
Source commit: 9991e9a4f5
Source message: Bump bases2fastq to 2.2.1 (#202)
20 KiB
unreleased
MINOR CHANGES
-
falco: Update falco to 1.2.5 (PR #201). -
bases2fastq: Bump from 2.2.0 to 2.2.1 (PR #202).
biobox 0.4.0
BREAKING CHANGES
-
fq_subsamplehas been removed after its functionality was previously copied tofq/fq_subsample. Please use the latter instead. (PR #182). -
snpeffhas been removed. Please usesnpeff/snpeff_ann(which is a functional copy ofsnpeff) as this is the default subcommand when running this tool (PR #194)
NEW FUNCTIONALITY
-
fq: Added two new components for FASTQ file processing (PR #182):fq/fq_filter: Filter FASTQ files based on record names or sequence patterns.fq/fq_generate: Generate a random FASTQ file pair for testing and simulation purposes.
-
bwa: Added BWA support for single-end and paired-end read alignment (PR #183).bwa/bwa_index: Create BWA index files for reference genome alignment.bwa/bwa_mem: BWA-MEM algorithm for sequence alignment supporting single-end and paired-end reads.bwa/bwa_aln: BWA aln algorithm for aligning short sequence reads to a reference genome.bwa/bwa_samse: BWA samse - generate single-end alignment in SAM format from BWA aln SAI files.bwa/bwa_sampe: BWA sampe - generate paired-end alignment in SAM format from BWA aln SAI files.
-
bowtie2: Add support for Bowtie2 alignment and indexing (PR #184).bowtie2/bowtie2_build: Build Bowtie2 index files from reference sequences.bowtie2/bowtie2_align: Align single-end and paired-end reads using Bowtie2.bowtie2/bowtie2_inspect: Extract information from Bowtie2 index files.
-
bedtools: Major expansion with 32 new components providing comprehensive genomic interval analysis (PR #188):bedtools/bedtools_annotate: Annotate coverage based on overlaps with interval filesbedtools/bedtools_bedpetobam: Convert BEDPE to BAM formatbedtools/bedtools_closest: Find closest features between two interval filesbedtools/bedtools_cluster: Cluster nearby intervalsbedtools/bedtools_complement: Report intervals not covered by featuresbedtools/bedtools_coverage: Compute coverage of featuresbedtools/bedtools_expand: Expand blocked BED featuresbedtools/bedtools_fisher: Compute Fisher's exact test for overlapsbedtools/bedtools_flank: Create flanking intervals around featuresbedtools/bedtools_igv: Create IGV batch scripts for visualizationbedtools/bedtools_jaccard: Compute Jaccard statistic between interval setsbedtools/bedtools_makewindows: Make windows across genome or intervalsbedtools/bedtools_map: Map values from overlapping intervalsbedtools/bedtools_maskfasta: Mask FASTA sequences using intervalsbedtools/bedtools_multicov: Count coverage across multiple BAM filesbedtools/bedtools_multiinter: Identify common intervals across multiple filesbedtools/bedtools_overlap: Compute overlaps between paired-end reads and intervalsbedtools/bedtools_pairtobed: Find overlaps between paired-end reads and intervalsbedtools/bedtools_pairtopair: Find overlaps between paired-end read setsbedtools/bedtools_random: Generate random intervalsbedtools/bedtools_reldist: Compute relative distances between featuresbedtools/bedtools_sample: Sample random subsets of intervalsbedtools/bedtools_shift: Shift intervals by specified amountsbedtools/bedtools_shuffle: Shuffle intervals while preserving sizebedtools/bedtools_slop: Extend intervals by specified amountsbedtools/bedtools_spacing: Report spacing between intervalsbedtools/bedtools_split: Split BED12 features into individual intervalsbedtools/bedtools_subtract: Remove overlapping featuresbedtools/bedtools_summary: Summarize interval statisticsbedtools/bedtools_tag: Tag BAM alignments with overlapping intervalsbedtools/bedtools_unionbedg: Combine multiple BEDGRAPH filesbedtools/bedtools_window: Find overlapping features within specified windows
-
Developer tools: Added GitHub Copilot integration (PR #192):
.github/copilot-instructions.md: Complete coding assistant guide with biobox patterns, examples, and best practices.github/prompts/update-viash-component.prompt.md: Step-by-step prompt for updating existing components.github/prompts/add-viash-component.prompt.md: Comprehensive prompt for creating new components from scratch
MAJOR CHANGES
-
bedtools: Enhanced 11 existing bedtools components with improved functionality and standardized interfaces (PR #188):bedtools/bedtools_bamtobed: Enhanced with additional output format optionsbedtools/bedtools_bamtofastq: Improved paired-end read handlingbedtools/bedtools_bed12tobed6: Standardized parameter handlingbedtools/bedtools_bedtobam: Enhanced genome file supportbedtools/bedtools_genomecov: Added scale and split optionsbedtools/bedtools_getfasta: Improved FASTA extraction featuresbedtools/bedtools_groupby: Enhanced grouping and operation optionsbedtools/bedtools_intersect: Expanded intersection mode supportbedtools/bedtools_links: Improved link generation functionalitybedtools/bedtools_merge: Enhanced merging options and distance parametersbedtools/bedtools_sort: Standardized sorting options
-
bcftools: Updated components to version 1.22 with comprehensive improvements including enhanced argument coverage, improved script patterns, biobox standard compliance, and comprehensive testing overhaul (PR #193):bcftools_annotate: Added--verbosityparameter; updated to usemeta_cpusinstead of--threadsparameterbcftools_concat: Renamed--compact_PSto--compact_ps,--min_PQto--min_pq; added--rm_dups,--drop_genotypes,--verbosity,--write_indexparameters; updated to usemeta_cpusinstead of--threadsparameterbcftools_norm: Renamed--remove_duplicatesto--rm_dup, added--remove_duplicates_flagas boolean alias; added--exclude,--include,--gff_annot,--multi_overlaps,--sort,--verbosity,--write_indexparameters; updated to usemeta_cpusinstead of--threadsparameterbcftools_sort: Removed--max_memand--temp_dirparameters (now usemeta_memory_mbandmeta_temp_dirrespectively); added--verbosity,--write_indexparametersbcftools_stats: Renamed--allele_frequency_binsto--af_bins,--allele_frequency_bins_fileremoved,--allele_frequency_tagto--af_tag,--fasta_referenceto--fasta_ref,--split_by_IDto--split_by_id,--targets_overlapsto--targets_overlap
MINOR CHANGES
-
bases2fastq: Updated component with comprehensive argument support and latest practices (PR #190). -
arriba: Updated to v2.5.0 and refactored script and tests based on latest contributing guidelines (PR #187). -
snpeffhas been updated to version5.2f(PR #194)
BUG FIXES
- Fix the
commandsproperty from components being overwritten by the global configuration (which only includedps) (PR #196).
DOCUMENTATION
-
Major restructuring of the documentation pages (PR #185):
CONTRIBUTING.md: Streamlined guide with detailed sections moved to dedicated docs/ guides.README.md: Streamlined content to guide people towards what they need.docs/COMPONENT_DEVELOPMENT.md: New comprehensive guide covering component creation process.docs/SCRIPT_DEVELOPMENT.md: New detailed guide for script development best practices.docs/TESTING.md: New comprehensive testing guide.docs/DOCKER_GUIDE.md: New Docker and engine best practices guide.
-
.github/PULL_REQUEST_TEMPLATE.md: Fixed repository references to point to correct biobox repository instead of base template (PR #185).
biobox 0.3.2
NEW FUNCTIONALITY
fq:fq/fq_lint: Validate FASTQ files for common issues (PR #179).fq/fq_subsample: Sample a subset of records from single or paired FASTQ files (PR #179).
MAJOR CHANGES
fq_subsample: This component has been deprecated in favour offq/fq_subsample, and will be removed in biobox 0.4.0 (PR #179).
MINOR CHANGES
-
Update README (PR #177).
-
Update author information (PR #180, PR #200).
-
fastqc: add--outdirargument (PR #181).
biobox 0.3.1
NEW FUNCTIONALITY
bcl_convert: addforceargument (PR #171).cellranger/cellranger_count: Align fastq files using Cell Ranger count (PR #163).
MINOR CHANGES
-
Replace the deprecated use of the meta variable
functionality_nameby justname(PR #174). -
Bump viash to
0.9.4(PR #175).
DOCUMENTATION
- Update README (PR #176).
biobox 0.3.0
NEW FUNCTIONALITY
-
agat:agat/agat_convert_genscan2gff: convert a genscan file into a GFF file (PR #100).agat/agat_sp_add_introns: add intron features to gtf/gff file without intron features (PR #104).agat/agat_sp_filter_feature_from_kill_list: remove features in a GFF file based on a kill list (PR #105).agat/agat_sp_merge_annotations: merge different gff annotation files in one (PR #106).agat/agat_sp_statistics: provides exhaustive statistics of a gft/gff file (PR #107).agat/agat_sq_stat_basic: provide basic statistics of a gtf/gff file (PR #110).
-
bd_rhapsody/bd_rhapsody_sequence_analysis: BD Rhapsody Sequence Analysis CWL pipeline (PR #96). -
bedtools:bedtools/bedtools_bamtobed: Converts BAM alignments to BED6 or BEDPE format (PR #109).
-
rsem/rsem_calculate_expression: Calculate expression levels (PR #93). -
cellranger:cellranger/cellranger_mkref: Build a Cell Ranger-compatible reference folder from user-supplied genome FASTA and gene GTF files (PR #164).
-
rseqc:rseqc/rseqc_inner_distance: Calculate inner distance between read pairs (PR #159).rseqc/rseqc_inferexperiment: Infer strandedness from sequencing reads (PR #158).rseqc/bam_stat: Generate statistics from a bam file (PR #155).
-
nanoplot: Plotting tool for long read sequencing data and alignments (PR #95). -
sgedemux: demultiplexing sequencing data generated on Singular Genomics' sequencing instruments (PR #166). -
bases2fasta: demultiplexing sequencing data generated by Element Biosciences instruments (PR #167).
BUG FIXES
-
falco: Fix a typo in the--reverse_complementargument (PR #157). -
cutadapt: Fix the the non-functionalactionparameter (PR #161). -
bbmap_bbsplit: Change argument type ofbuildtofileand add output argumentindex(PR #162). -
kallisto/kallisto_index: Fix command script to use--threadsoption (PR #162). -
kallisto/kallisto_quant: Change type of argumentoutput_dirtofileand add output argumentlog(PR #162). -
rsem/rsem_calculate_expression: Fix output handling (PR #162). -
sortmerna: Change type pf argumentalignedtofile; update docker image; accept more than two reference files (PR #162). -
umi_tools/umi_tools_extract: Removeumi_discard_readsoption and changelog2stderrto input argument (PR #162). -
star/star_genome_generate: Fix passing of optional sjdb parameters (PR #170).
MINOR CHANGES
-
agat_convert_bed2gff: change type of argumentinflate_offfromboolean_falsetoboolean_true(PR #160). -
cutadapt: change type of argumentno_indelsandno_match_adapter_wildcardsfromboolean_falsetoboolean_true(PR #160). -
Upgrade to Viash 0.9.0.
-
bbmap_bbsplit: Move to namespacebbmap(PR #162).
biobox 0.2.0
BREAKING CHANGES
-
star/star_align_reads: Change all arguments from--camelCaseto--snake_case(PR #62). -
star/star_genome_generate: Change all arguments from--camelCaseto--snake_case(PR #62).
NEW FUNCTIONALITY
-
star/star_align_reads: Add star solo related arguments (PR #62). -
bd_rhapsody/bd_rhapsody_make_reference: Create a reference for the BD Rhapsody pipeline (PR #75). -
umitools/umitools_dedup: Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read (PR #54). -
seqtk:seqtk/seqtk_sample: Subsamples sequences from FASTA/Q files (PR #68).seqtk/seqtk_subseq: Extract the sequences (complete or subsequence) from the FASTA/FASTQ files based on a provided sequence IDs or region coordinates file (PR #85).
-
agat:agat_convert_sp_gff2gtf: convert any GTF/GFF file into a proper GTF file (PR #76).agat_convert_bed2gff: convert bed file to gff format (PR #97).agat_convert_embl2gff: convert an EMBL file into GFF format (PR #99).agat/agat_convert_sp_gff2gtf: convert any GTF/GFF file into a proper GTF file (PR #76).agat/agat_convert_bed2gff: convert bed file to gff format (PR #97).agat/agat_convert_mfannot2gff: convert MFannot "masterfile" annotation to gff format (PR #112).agat/agat_convert_embl2gff: convert an EMBL file into GFF format (PR #99).agat/agat_convert_sp_gff2tsv: convert gtf/gff file into tabulated file (PR #102).agat/agat_convert_sp_gxf2gxf: fixes and/or standardizes any GTF/GFF file into full sorted GTF/GFF file (PR #103).
-
bedtools:bedtools/bedtools_intersect: Allows one to screen for overlaps between two sets of genomic features (PR #94).bedtools/bedtools_sort: Sorts a feature file (bed/gff/vcf) by chromosome and other criteria (PR #98).bedtools/bedtools_genomecov: Compute the coverage of a feature file (bed/gff/vcf/bam) among a genome (PR #128).bedtools/bedtools_groupby: Summarizes a dataset column based upon common column groupings. Akin to the SQL "group by" command (PR #123).bedtools/bedtools_merge: Merges overlapping BED/GFF/VCF entries into a single interval (PR #118).bedtools/bedtools_bamtofastq: Convert BAM alignments to FASTQ files (PR #101).bedtools/bedtools_bedtobam: Converts genomic feature records (bed/gff/vcf) to BAM format (PR #111).bedtools/bedtools_bed12tobed6: Converts BED12 files to BED6 files (PR #140).bedtools/bedtools_links: Creates an HTML file with links to an instance of the UCSC Genome Browser for all features / intervals in a (bed/gff/vcf) file (PR #137).
-
qualimap/qualimap_rnaseq: RNA-seq QC analysis using qualimap (PR #74). -
rsem/rsem_prepare_reference: Prepare transcript references for RSEM (PR #89). -
bcftools:bcftools/bcftools_concat: Concatenate or combine VCF/BCF files (PR #145).bcftools/bcftools_norm: Left-align and normalize indels, check if REF alleles match the reference, split multiallelic sites into multiple rows; recover multiallelics from multiple rows (PR #144).bcftools/bcftools_annotate: Add or remove annotations from a VCF/BCF file (PR #143).bcftools/bcftools_stats: Parses VCF or BCF and produces a txt stats file which can be plotted using plot-vcfstats (PR #142).bcftools/bcftools_sort: Sorts BCF/VCF files by position and other criteria (PR #141).
-
fastqc: High throughput sequence quality control analysis tool (PR #92). -
sortmerna: Local sequence alignment tool for mapping, clustering, and filtering rRNA from metatranscriptomic data (PR #146). -
fq_subsample: Sample a subset of records from single or paired FASTQ files (PR #147). -
kallisto:kallisto_index: Create a kallisto index (PR #149).kallisto_quant: Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads (PR #152).
-
trimgalore: Quality and adapter trimming for fastq files (PR #117).
MINOR CHANGES
-
buscocomponents: update BUSCO to5.7.1(PR #72). -
Update CI to reusable workflow in
viash-io/viash-actions(PR #86). -
Update several components in order to avoid duplicate code when using
unseton boolean arguments (PR #133). -
Bump viash to
0.9.0-RC7(PR #134)
DOCUMENTATION
-
Extend the contributing guidelines (PR #82):
-
Update format to Viash 0.9.
-
Descriptions should be formatted in markdown.
-
Add defaults to descriptions, not as a default of the argument.
-
Explain parameter expansion.
-
Mention that the contents of the output of components in tests should be checked.
-
-
Add authorship to existing components (PR #88).
BUG FIXES
-
pear: fix component not exiting with the correct exitcode when PEAR fails (PR #70). -
cutadapt: fix--par_quality_cutoff_r2argument (PR #69). -
cutadapt: demultiplexing is now disabled by default. It can be re-enabled by usingdemultiplex_mode(PR #69). -
multiqc: update multiple separator to;(PR #81).
biobox 0.1.0
NEW FEATURES
-
arriba: Detect gene fusions from RNA-seq data (PR #1). -
fastp: An ultra-fast all-in-one FASTQ preprocessor (PR #3). -
busco:busco/busco_run: Assess genome assembly and annotation completeness with single copy orthologs (PR #6).busco/busco_list_datasets: Lists available busco datasets (PR #18).busco/busco_download_datasets: Download busco datasets (PR #19).
-
cutadapt: Remove adapter sequences from high-throughput sequencing reads (PR #7). -
featurecounts: Assign sequence reads to genomic features (PR #11). -
bgzip: Add bgzip functionality to compress and decompress files (PR #13). -
pear: Paired-end read merger (PR #10). -
lofreq/call: Call variants from a BAM file (PR #17). -
lofreq/indelqual: Insert indel qualities into BAM file (PR #17). -
multiqc: Aggregate results from bioinformatics analyses across many samples into a single report (PR #42). -
star:star/star_align_reads: Align reads to a reference genome (PR #22).star/star_genome_generate: Generate a genome index for STAR alignment (PR #58).
-
gffread: Validate, filter, convert and perform other operations on GFF files (PR #29). -
salmon:salmon/salmon_index: Create a salmon index for the transcriptome to use Salmon in the mapping-based mode (PR #24).salmon/salmon_quant: Transcript quantification from RNA-seq data (PR #24).
-
samtools:samtools/samtools_flagstat: Counts the number of alignments in SAM/BAM/CRAM files for each FLAG type (PR #31).samtools/samtools_idxstats: Reports alignment summary statistics for a SAM/BAM/CRAM file (PR #32).samtools/samtools_index: Index SAM/BAM/CRAM files (PR #35).samtools/samtools_sort: Sort SAM/BAM/CRAM files (PR #36).samtools/samtools_stats: Reports alignment summary statistics for a BAM file (PR #39).samtools/samtools_faidx: Indexes FASTA files to enable random access to fasta and fastq files (PR #41).samtools/samtools_collate: Shuffles and groups reads in SAM/BAM/CRAM files together by their names (PR #42).samtools/samtools_view: Views and converts SAM/BAM/CRAM files (PR #48).samtools/samtools_fastq: Converts a SAM/BAM/CRAM file to FASTQ (PR #52).samtools/samtools_fastq: Converts a SAM/BAM/CRAM file to FASTA (PR #53).
-
umi_tools:umi_tools/umi_tools_extract: Flexible removal of UMI sequences from fastq reads (PR #71).umi_tools/umi_tools_prepareforrsem: Fix paired-end reads in name sorted BAM file to prepare for RSEM (PR #148).
-
falco: A C++ drop-in replacement of FastQC to assess the quality of sequence read data (PR #43). -
bedtools:bedtools_getfasta: extract sequences from a FASTA file for each of the intervals defined in a BED/GFF/VCF file (PR #59).
-
bbmap:bbmap_bbsplit: Split sequencing reads by mapping them to multiple references simultaneously (PR #138).
MINOR CHANGES
-
Uniformize component metadata (PR #23).
-
Update to Viash 0.8.5 (PR #25).
-
Update to Viash 0.9.0-RC3 (PR #51).
-
Update to Viash 0.9.0-RC6 (PR #63).
-
Switch to viash-hub/toolbox actions (PR #64).
DOCUMENTATION
- Update README (PR #64).
BUG FIXES
-
Add escaping character before leading hashtag in the description field of the config file (PR #50).
-
Format URL in biobase/bcl_convert description (PR #55).