Build branch add_summary_to_csv_tests with version add_summary_to_csv_tests (6796c07)

Build pipeline: viash-hub.demultiplex.add-summary-to-csv-tests-kjm49

Source commit: 6796c074cf

Source message: Merge remote-tracking branch 'origin/main' into add_summary_to_csv_tests
CI
2025-04-29 15:14:03 +00:00
commit 8c09489b0a
89 changed files with 55626 additions and 0 deletions

8
.gitignore vendored Normal file

@@ -0,0 +1,8 @@
target
testData
test_resources
# Nextflow related files
.nextflow
.nextflow.log*
work

188
CHANGELOG.md Normal file

@@ -0,0 +1,188 @@
# demultiplex v0.3.10
## Minor changes
* Moved the test resources to their new location (PR #37).
# demultiplex v0.3.9
## Bug fixes
* Fix defaults for output arguments in the Nextflow schemas.
* Fix an issue where an integer being passed to an argument with `type: double` resulted in an error (PR #44).
## Minor changes
* Bump viash to 0.9.4, which adds support for Nextflow versions starting from 25.01 (PR #43 and #44).
# demultiplex v0.3.8
## Bug fixes
* Provide a proper error when a FASTQ file is empty after demultiplexing (PR #40).
# demultiplex v0.3.7
## Minor updates
* Ignore lines starting with '#' when parsing run information CSV (PR #39).
# demultiplex v0.3.6
## Minor updates
* Allow letter case variants for headers when looking for sample information in run information CSV (PR #38).
# demultiplex v0.3.5
## Breaking changes
* The `demultiplex` workflow now outputs a list of directories
for the `output_falco` argument (one for each barcode) instead of one directory
for the complete run. The output from the `runner` workflow remained
unchanged (PR #33).
## Minor updates
* In case Illumina data is detected in the input folder, check for the presence of the 'CopyComplete.txt' file.
This check can be disabled using `--skip_copycomplete_check` (PR #34).
# demultiplex v0.3.4
## Minor updates
* Resource labels are now automatically included during build (PR #32).
# demultiplex v0.3.3
## Breaking change
- The `runner` defines the output differently now:
  - The last part of the `--input` path is expected to be the run ID and this run ID is used to create the output directory.
  - If the input is `file.tar.gz` instead of a directory, the `file` part is used as the run ID.
  - The output structure is then as follows:
```
$publish_dir/<run_id>/<date_time_stamp>_demultiplex_<version>/
```
For instance:
```
$publish_dir
└── 200624_A00834_0183_BHMTFYDRXX
    └── 20241217_051404_demultiplex_v1.2
        ├── run_information.csv
        ├── fastq
        │   ├── Sample1_S1_L001_R1_001.fastq.gz
        │   ├── Sample23_S3_L001_R1_001.fastq.gz
        │   ├── SampleA_S2_L001_R1_001.fastq.gz
        │   ├── Undetermined_S0_L001_R1_001.fastq.gz
        │   └── sampletest_S4_L001_R1_001.fastq.gz
        └── qc
            ├── fastqc
            │   ├── Sample1_S1_L001_R1_001.fastq.gz_fastqc_data.txt
            │   ├── Sample1_S1_L001_R1_001.fastq.gz_fastqc_report.html
            │   ├── Sample1_S1_L001_R1_001.fastq.gz_summary.txt
            │   ├── Sample23_S3_L001_R1_001.fastq.gz_fastqc_data.txt
            │   ├── Sample23_S3_L001_R1_001.fastq.gz_fastqc_report.html
            │   ├── Sample23_S3_L001_R1_001.fastq.gz_summary.txt
            │   ├── SampleA_S2_L001_R1_001.fastq.gz_fastqc_data.txt
            │   ├── SampleA_S2_L001_R1_001.fastq.gz_fastqc_report.html
            │   ├── SampleA_S2_L001_R1_001.fastq.gz_summary.txt
            │   ├── Undetermined_S0_L001_R1_001.fastq.gz_fastqc_data.txt
            │   ├── Undetermined_S0_L001_R1_001.fastq.gz_fastqc_report.html
            │   ├── Undetermined_S0_L001_R1_001.fastq.gz_summary.txt
            │   ├── sampletest_S4_L001_R1_001.fastq.gz_fastqc_data.txt
            │   ├── sampletest_S4_L001_R1_001.fastq.gz_fastqc_report.html
            │   └── sampletest_S4_L001_R1_001.fastq.gz_summary.txt
            └── multiqc_report.html
```
- This logic can be avoided by providing the flag `--plain_output`.
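
The naming rules above can be illustrated with a short Python sketch. It is not the `runner` implementation: the function names `run_id_from_input` and `output_dir` are hypothetical, the timestamp format `%Y%m%d_%H%M%S` is inferred from the example `20241217_051404`, and stripping `.tgz` and `.tar` in addition to `.tar.gz` is an assumption.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Assumption: only `.tar.gz` is named in the changelog; the other two
# suffixes are included here for illustration.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar")


def run_id_from_input(input_path: str) -> str:
    """Return the last path component, without an archive suffix."""
    name = PurePosixPath(input_path.rstrip("/")).name
    for suffix in ARCHIVE_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def output_dir(publish_dir: str, input_path: str, version: str,
               launched_at: datetime) -> PurePosixPath:
    """Build <publish_dir>/<run_id>/<date_time_stamp>_demultiplex_<version>."""
    # Assumption: stamp format guessed from the example directory name.
    stamp = launched_at.strftime("%Y%m%d_%H%M%S")
    run_id = run_id_from_input(input_path)
    return PurePosixPath(publish_dir) / run_id / f"{stamp}_demultiplex_{version}"


print(output_dir(
    "publish_dir",
    "/data/200624_A00834_0183_BHMTFYDRXX.tar.gz",
    "v1.2",
    datetime(2024, 12, 17, 5, 14, 4),
))
```

A directory input such as `/data/200624_A00834_0183_BHMTFYDRXX/` yields the same run ID as the archive, so both inputs map to the same `<run_id>` folder in this sketch.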
## Minor updates
* Added `output_run_information` argument that copies the run information file to the output (PR #31).
# demultiplex v0.3.2
## Bug fixes
* Ignore empty CSV entries when parsing sample information (PR #29).
# demultiplex v0.3.1
## Minor updates
* Add `--run_information` and `--demultiplexer` arguments to `runner` workflow (PR #27).
## Bug fixes
* Fix detection of sample IDs from Illumina V2 sample sheets (PR #28).
* Provide a clear error message when `--run_information` is provided but not `--demultiplexer` (PR #27).
# demultiplex v0.3.0
## Major updates
The output handling of the workflow has been refactored to be more flexible (PR #19). This is done by creating a wrapper workflow `runner` that wraps the native `demultiplex` workflow. The `runner` workflow is responsible for setting the output directory based on the input arguments.
Three arguments specify the relative location of the three _outputs_ of the workflow:
- `fastq_output`: The directory where the demultiplexed fastq files are stored.
- `falco_output`: The directory for the `fastqc`/`falco` reports.
- `multiqc_output`: The filename for the `multiqc` report.
The target location path is determined by the following logic:
- If no `id` is provided, the output directory is set to `$publish_dir`.
- If an `id` is explicitly set using Seqera Cloud or by adding `--id <>`, the output directory is set to `$publish_dir/<id>`.
The workflow has two optional flags to be used in combination with `--id`:
- `--add_date_time`: rather than publishing the results under `$publish_dir`, this adds an additional layer `$publish_dir/<date-time-stamp>/`. This is useful when you want to keep track of multiple runs of the workflow (example: `240322_143020`).
- `--add_workflow_id`: adding this flag will add `_demultiplex_<version>` to the output directory (example: `demultiplex_v0.2.0`). When starting the workflow from a non-release, the version will be set to `version_unkonwn`.
The default structure in the output directory is:
- Two sub-directories:
  - `fastq`
  - `qc` for the reports:
    - `multiqc_report.html`
    - `fastqc/` directory containing the different fastqc (falco) reports.
The `$publish_dir` variable corresponds to the argument provided with `--publish-dir`. The `date-time-stamp` is generated by the workflow based on when it was launched and is thus guaranteed to be unique.
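
As a rough sketch of the path logic described in this section, the Python function below shows one possible reading. The name `target_dir` is hypothetical. Two points are assumptions drawn from the examples (`240322_143020`, `demultiplex_v0.2.0`) rather than stated in the text: that the extra layer nests below `<id>`, and that the two flags combine into a single `<date-time-stamp>_demultiplex_<version>` folder name.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional


def target_dir(publish_dir: str, run_id: Optional[str], version: str,
               launched_at: datetime, add_date_time: bool = False,
               add_workflow_id: bool = False) -> PurePosixPath:
    """Resolve the publish location for one workflow launch (illustrative only)."""
    base = PurePosixPath(publish_dir)
    if run_id is None:
        # No id provided: publish directly under $publish_dir.
        return base
    base = base / run_id
    parts = []
    if add_date_time:
        # Assumption: format inferred from the example `240322_143020`.
        parts.append(launched_at.strftime("%y%m%d_%H%M%S"))
    if add_workflow_id:
        parts.append(f"demultiplex_{version}")
    # Assumption: both flags share one extra directory layer below <id>.
    return base / "_".join(parts) if parts else base


launch = datetime(2024, 3, 22, 14, 30, 20)
print(target_dir("out", None, "v0.2.0", launch))
print(target_dir("out", "run1", "v0.2.0", launch, add_date_time=True))
print(target_dir("out", "run1", "v0.2.0", launch,
                 add_date_time=True, add_workflow_id=True))
```

Passing the launch time in explicitly, instead of reading the clock inside the function, keeps the sketch deterministic and easy to check against the examples.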
# demultiplex v0.2.0
## Breaking changes
* `demultiplex` workflow: renamed `sample_sheet` argument to `run_information` (PR #24)
## New features
* Add support for `bases2fastq` demultiplexer (PR #24)
## Minor updates
* Add resource labels to workflows (PR #21).
# demultiplex v0.1.1
## Minor updates
* Bump viash to 0.9.0 (PR #14).
* `demultiplex` workflow: use `v0.2.0` release instead of `main` branch for `biobox` dependencies (PR #11).
* Renamed `biobase` repository to `biobox` (PR #13 and PR #15).
# demultiplex v0.1.0
Initial release

21
LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2024 Data Intuitive
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

100
README.md Normal file

@@ -0,0 +1,100 @@
# Demultiplex.vsh
Demultiplex.vsh is a workflow for demultiplexing of raw sequencing data. Currently data from Illumina and Element Biosciences sequencers are supported.
[![ViashHub](https://img.shields.io/badge/ViashHub-demultiplex-7a4baa.svg)](https://web.viash-hub.com/packages/demultiplex)
[![GitHub](https://img.shields.io/badge/GitHub-viash--hub%2Fdemultiplex-blue.svg)](https://github.com/viash-hub/demultiplex)
[![GitHub
License](https://img.shields.io/github/license/viash-hub/demultiplex.svg)](https://github.com/viash-hub/demultiplex/blob/main/LICENSE)
[![GitHub
Issues](https://img.shields.io/github/issues/viash-hub/demultiplex.svg)](https://github.com/viash-hub/demultiplex/issues)
[![Viash
version](https://img.shields.io/badge/Viash-v0.9.4-blue)](https://viash.io)
## Workflow Overview
The workflow executes the following steps:
1. Unpack the input data (when a TAR archive is provided)
2. Run `bclconvert` or `bases2fastq`
3. Run `falco` and convert Illumina InterOp information to CSV
4. Run `multiqc` to generate a report
## Usage
Two variants of the same workflow are provided, depending on how much flexibility in the output structure is required:
* The `runner` workflow provides a predefined output structure. It requires a minimal number of parameters, at the cost of being less flexible. It is located at `target/nextflow/runner/main.nf`.
* The `demultiplex` workflow (`target/nextflow/demultiplex/main.nf`) allows for more fine-grained tuning, but requires more parameters to be provided.
### Test data
We have provided test data at `gs://viash-hub-test-data/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2`, but please feel free to bring your own. The URL of the test data can be provided as-is to the workflow, or you can download everything and specify a local path.
### Setup
In order to use the workflows in this package, you'll need to do the following:
* Install [nextflow](https://www.nextflow.io/docs/latest/install.html)
* Install a nextflow compatible executor. This workflow provides a profile for [docker](https://docs.docker.com/get-started/).
### Setting up SCM
In order to let Nextflow use the viash-hub workflows, you need to set up an [SCM](https://www.nextflow.io/docs/latest/git.html#git-configuration) file. This can be done once by creating `$HOME/.nextflow/scm` and adding the following:
```
providers {
  vsh {
    platform = 'gitlab'
    server = "packages.viash-hub.com"
  }
}
```
Alternatively, a custom location for the SCM file can be specified using the `NXF_SCM_FILE` environment variable.
You can check if everything is working by getting the `--help` for a workflow:
```bash
nextflow run \
vsh/demultiplex \
-r v0.3.9 \
--help
```
### (Optional) Resource usage tuning
Nextflow's labels can be used to specify the amount of resources a process can use. This workflow uses the following labels for CPU and memory:
* `verylowmem`, `lowmem`, `midmem`, `highmem`
* `verylowcpu`, `lowcpu`, `midcpu`, `highcpu`
The defaults for these labels can be found at `src/config/labels.config`. Nextflow checks that the resources specified for a process do not exceed what is available on the machine and will not start if they do. Create your own config file to tune the labels to your needs, for example:
```
// Resource labels (label selectors must sit inside the `process` scope)
process {
  withLabel: verylowcpu { cpus = 2 }
  withLabel: lowcpu { cpus = 8 }
  withLabel: midcpu { cpus = 16 }
  withLabel: highcpu { cpus = 16 }
  withLabel: verylowmem { memory = 4.GB }
  withLabel: lowmem { memory = 8.GB }
  withLabel: midmem { memory = 8.GB }
  withLabel: highmem { memory = 8.GB }
}
```
When starting Nextflow from the CLI, use `-c` to pass this file to Nextflow and override the defaults.
### Example
```bash
nextflow run vsh/demultiplex \
-r v0.3.9 \
-main-script target/nextflow/runner/main.nf \
--input "gs://viash-hub-test-data/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2" \
--demultiplexer bclconvert \
--publish_dir example_output/ \
-profile docker \
-c labels.config
```
## Acknowledgements
Developed in collaboration with Data Intuitive and Open Analytics.

20
_viash.yaml Normal file

@@ -0,0 +1,20 @@
name: demultiplex
description: |
  Demultiplexing pipeline
license: MIT
keywords: [bioinformatics, sequence, demultiplexing, pipeline]
links:
  issue_tracker: https://github.com/viash-hub/demultiplex/issues
  repository: https://github.com/viash-hub/demultiplex
info:
  test_resources:
    - path: gs://viash-hub-resources/demultiplex/v3
      dest: testData
viash_version: 0.9.4
config_mods: |
  .requirements.commands += ['ps']
  .runners[.type == 'nextflow'].directives.tag := '$id'
  .resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}
  .runners[.type == 'nextflow'].config.script := 'includeConfig("nextflow_labels.config")'

3
main.nf Normal file

@@ -0,0 +1,3 @@
workflow {
print("This is a dummy placeholder for pipeline execution. Please use the corresponding nf files for running pipelines.")
}

12
nextflow.config Normal file

@@ -0,0 +1,12 @@
manifest {
homePage = 'https://github.com/viash-hub/demultiplex'
description = 'Demultiplexing pipeline for sequencing data'
mainScript = 'target/nextflow/demultiplex/main.nf'
}
process {
withName: publishStatesProc {
publishDir = [ enabled: false ]
}
}

98
src/config/labels.config Normal file

@@ -0,0 +1,98 @@
process {
container = 'nextflow/bash:latest'
// default resources
memory = { 8.Gb * task.attempt }
cpus = 8
maxForks = 36
// Retry for exit codes that have something to do with memory issues
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
maxRetries = 3
maxMemory = 192.GB
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 32 }
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
}
profiles {
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
docker {
docker.fixOwnership = true
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
local {
// This config is for local processing.
process {
maxMemory = 25.GB
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 4 }
withLabel: midcpu { cpus = 6 }
withLabel: highcpu { cpus = 12 }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
}
}
}
def get_memory(to_compare) {
if (!process.containsKey("maxMemory") || !process.maxMemory) {
return to_compare
}
try {
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
return process.maxMemory
}
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) > 0) {
// Cap the request at the configured maximum.
return process.maxMemory as nextflow.util.MemoryUnit
}
else {
return to_compare
}
} catch (all) {
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
System.exit(1)
}
}


@@ -0,0 +1,48 @@
name: combine_samples
namespace: dataflow
description: Combine fastq files from across samples into one event with a list of fastq files per orientation.
argument_groups:
  - name: Input arguments
    arguments:
      - name: "--id"
        description: "ID of the new event"
        type: string
        required: true
      - name: --forward_input
        type: file
        required: true
        multiple: true
      - name: --reverse_input
        type: file
        required: false
        multiple: true
      - name: "--falco_dir"
        type: file
        required: true
  - name: Output arguments
    arguments:
      - name: --output_forward
        type: file
        direction: output
        multiple: true
        required: true
      - name: --output_reverse
        type: file
        direction: output
        multiple: true
        required: false
      - name: "--output_falco"
        type: file
        direction: output
        required: true
        multiple: true
resources:
  - type: nextflow_script
    path: main.nf
    entrypoint: run_wf
runners:
  - type: nextflow
engines:
  - type: native


@@ -0,0 +1,30 @@
workflow run_wf {
take:
input_ch
main:
output_ch = input_ch
| map { id, state ->
def newEvent = [state.id, state + ["_meta": ["join_id": id]]]
newEvent
}
| groupTuple(by: 0, sort: "hash")
| map {run_id, states ->
// Gather the following state for all samples
def forward_fastqs = states.collect{it.forward_input}.flatten()
def reverse_fastqs = states.collect{it.reverse_input}.findAll{it != null}.flatten()
def falco_dirs = states.collect{it.falco_dir}
def resultState = [
"output_forward": forward_fastqs,
"output_reverse": reverse_fastqs,
"output_falco": falco_dirs,
// The join ID is the same across all samples from the same run
"_meta": ["join_id": states[0]._meta.join_id]
]
return [run_id, resultState]
}
emit:
output_ch
}


@@ -0,0 +1,38 @@
name: gather_fastqs_and_validate
namespace: dataflow
description: |
  From a directory containing fastq files, gather the files per sample
  and validate according to the contents of the sample sheet.
argument_groups:
  - name: Input arguments
    arguments:
      - name: --input
        description: Directory containing .fastq files
        type: file
        required: true
      - name: --sample_sheet
        description: Sample sheet
        type: file
        required: true
  - name: Output arguments
    arguments:
      - name: --fastq_forward
        type: file
        direction: output
        required: true
        multiple: true
      - name: "--fastq_reverse"
        type: file
        direction: output
        required: false
        multiple: true
resources:
  - type: nextflow_script
    path: main.nf
    entrypoint: run_wf
runners:
  - type: nextflow
engines:
  - type: native


@@ -0,0 +1,120 @@
import java.util.zip.GZIPInputStream
import java.nio.file.Files
import java.io.BufferedInputStream
def is_empty(file_to_check){
/*
Checks if a file has content
*/
if (file_to_check.size() == 0) {
return true
}
def input_stream = Files.newInputStream(file_to_check)
def gzInputStream
try {
gzInputStream = new GZIPInputStream(new BufferedInputStream(input_stream))
} catch (java.io.EOFException ex) {
// This is not a gzipfile...
return false
}
def read_one_byte = gzInputStream.read()
return read_one_byte == -1
}
workflow run_wf {
take:
input_ch
main:
output_ch = input_ch
// Gather input files from BCL convert output folder
| flatMap { id, state ->
println "Processing sample sheet: $state.sample_sheet"
def sample_sheet = state.sample_sheet
def start_parsing = false
def sample_id_column_index = null
def samples = ["Undetermined"]
def original_id = id
// Parse sample sheet for sample IDs
println "Processing run information file ${sample_sheet}"
csv_lines = sample_sheet.splitCsv(header: false, sep: ',')
csv_lines.any { csv_items ->
if (csv_items.isEmpty() || csv_items[0].startsWith("#")) {
// skip empty or commented line
return
}
def possible_header = csv_items[0]
def header = possible_header.find(/\[(.*)\]/){fullmatch, header_name -> header_name}
if (header) {
if (start_parsing) {
// Stop parsing when encountering the next header
println "Encountered next header '[${header}]', stopping parsing."
return true
}
// [Data], [BCLConvert_Data] for illumina
// [Samples] or sometimes [SAMPLES] for Element Biosciences
if (header.toLowerCase() in ["data", "samples", "bclconvert_data"]) {
println "Found header [${header}], start parsing."
start_parsing = true
return
}
}
if (start_parsing) {
if ( sample_id_column_index == null) {
println "Looking for sample name column."
sample_id_column_index = csv_items.findIndexValues{it == "Sample_ID" || it == "SampleName"}
assert (!sample_id_column_index.isEmpty()):
"Could not find column 'Sample_ID' (Illumina) or 'SampleName' " +
"(Element Biosciences) in run information! Found: ${sample_id_column_index}"
assert sample_id_column_index.size() == 1, "Expected run information file to contain " +
"a column 'Sample_ID' or 'SampleName', not both. Found: ${sample_id_column_index}"
sample_id_column_index = sample_id_column_index[0]
println "Found sample names column '${csv_items[sample_id_column_index]}'."
return
}
def candidate_sample_id = csv_items[sample_id_column_index]
if (candidate_sample_id?.trim()) { // Don't add empty csv entries.
samples += csv_items[sample_id_column_index]
}
}
// This return is important! (If 'true' is returned, the parsing stops.)
return
}
assert start_parsing:
"Sample information file does not contain [Data], [Samples] or [BCLConvert_Data] header!"
assert samples.size() > 1:
"Sample information file does not seem to contain any information about the samples!"
println "Finished processing run information file, found samples: ${samples}."
println "Looking for fastq files in ${state.input}."
def allfastqs = state.input.listFiles().findAll{it.isFile() && it.name ==~ /^.+\.fastq\.gz$/}
println "Found ${allfastqs.size()} fastq files, matching them to the following samples: ${samples}."
processed_samples = samples.collect { sample_id ->
def forward_regex = ~/^${sample_id}_S(\d+)_(L(\d+)_)?R1_(\d+)\.fastq\.gz$/
def reverse_regex = ~/^${sample_id}_S(\d+)_(L(\d+)_)?R2_(\d+)\.fastq\.gz$/
def forward_fastq = state.input.listFiles().findAll{it.isFile() && it.name ==~ forward_regex}
def reverse_fastq = state.input.listFiles().findAll{it.isFile() && it.name ==~ reverse_regex}
assert forward_fastq && !forward_fastq.isEmpty(): "No forward fastq files were found for sample ${sample_id}. " +
"All fastq files in directory: ${allfastqs.collect{it.name}}"
assert (reverse_fastq.isEmpty() || (forward_fastq.size() == reverse_fastq.size())):
"Expected equal number of forward and reverse fastq files for sample ${sample_id}. " +
"Found forward: ${forward_fastq} and reverse: ${reverse_fastq}."
println "Found ${forward_fastq.size()} forward and ${reverse_fastq.size()} reverse " +
"fastq files for sample ${sample_id}"
assert forward_fastq.every{!is_empty(it)} && reverse_fastq.every{!is_empty(it)}:
"A fastq file for sample '${sample_id}' appears to be empty!"
def fastqs_state = [
"fastq_forward": forward_fastq,
"fastq_reverse": reverse_fastq,
"_meta": [ "join_id": original_id ],
]
[sample_id, fastqs_state]
}
println "Finished processing sample sheet."
return processed_samples
}
emit:
output_ch
}


@@ -0,0 +1,105 @@
name: demultiplex
description: Demultiplexing of raw sequencing data
argument_groups:
  - name: Input arguments
    arguments:
      - name: --id
        description: Unique identifier for the run
        type: string
      - name: --input
        description: Directory containing raw sequencing data
        type: file
        required: true
      - name: --run_information
        description: |
          CSV file containing sample information, which will be used as
          input for the demultiplexer. Canonically called 'SampleSheet.csv' (Illumina)
          or 'RunManifest.csv' (Element Biosciences). If not specified,
          will try to autodetect the sample sheet in the input directory.
          When this argument is provided, --demultiplexer must also be set.
        type: file
        required: false
      - name: "--demultiplexer"
        type: string
        required: false
        choices: ["bases2fastq", "bclconvert"]
        description: |
          Demultiplexer to use, choice depends on the provider
          of the instrument that was used to generate the data.
          When not using --run_information, specifying this argument is not
          required.
  - name: Output arguments
    arguments:
      - name: --output
        description: Directory to write fastq data to
        type: file
        direction: output
        required: false
        default: "$id/fastq"
      - name: "--output_falco"
        description: Directory to write falco output to
        type: file
        direction: output
        required: false
        multiple: true
        default: "$id/qc/fastqc"
      - name: "--output_multiqc"
        description: File to write the MultiQC report to
        type: file
        direction: output
        required: false
        default: "$id/qc/multiqc_report.html"
      - name: "--output_run_information"
        type: file
        direction: "output"
        required: true
        default: "$id/run_information.csv"
  - name: "Other arguments"
    arguments:
      - name: --skip_copycomplete_check
        type: boolean_true
        description: |
          Disable the check for the presence of a "CopyComplete.txt" file in input
          directory in case of Illumina data.
resources:
  - type: nextflow_script
    path: main.nf
    entrypoint: run_wf
test_resources:
  - type: nextflow_script
    path: test.nf
    entrypoint: test_illumina
  - type: nextflow_script
    path: test.nf
    entrypoint: test_bases2fastq
dependencies:
  - name: io/untar
    repository: local
  - name: dataflow/gather_fastqs_and_validate
    repository: local
  - name: io/interop_summary_to_csv
    repository: local
  - name: dataflow/combine_samples
    repository: local
  - name: bcl_convert
    repository: bb
  - name: bases2fastq
    repository: bb
  - name: falco
    repository: bb
  - name: multiqc
    repository: bb
repositories:
  - name: bb
    type: vsh
    repo: biobox
    tag: v0.3.0
runners:
  - type: nextflow
engines:
  - type: native


@@ -0,0 +1,24 @@
#!/usr/bin/env bash
# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)
# ensure that the command below is run from the root of the repository
cd "$REPO_ROOT"
viash ns build --setup cb -q demultiplex
nextflow run . \
-main-script src/demultiplex/test.nf \
-profile docker,no_publish,local \
-entry test_illumina \
-c src/config/labels.config \
--resources_test https://raw.githubusercontent.com/nf-core/test-datasets/demultiplex/testdata/NovaSeq6000/ \
-resume
nextflow run . \
-main-script src/demultiplex/test.nf \
-profile docker,no_publish,local \
-entry test_bases2fastq \
-c src/config/labels.config \
-resume

240
src/demultiplex/main.nf Normal file

@@ -0,0 +1,240 @@
workflow run_wf {
take:
input_ch
main:
samples_ch = input_ch
// untar input if needed
| untar.run(
directives: [label: ["lowmem", "lowcpu"]],
runIf: {id, state ->
def inputStr = state.input.toString()
inputStr.endsWith(".tar.gz") || \
inputStr.endsWith(".tar") || \
inputStr.endsWith(".tgz") ? true : false
},
fromState: [
"input": "input",
],
toState: { id, result, state ->
state + ["input": result.output]
},
)
// Gather input files from folder
| map {id, state ->
def newState = [:]
println("Provided run information: ${state.run_information} and demultiplexer: ${state.demultiplexer}")
// No auto-detection of run information file (it is user provided),
// in this case the demultiplexer should also be specified.
assert (!state.run_information || state.demultiplexer): "When setting --run_information, " +
"you must also provide a demultiplexer"
if (!state.run_information) {
println("Run information was not specified, auto-detecting...")
// The supported_platforms hashmap must be a 1-on-1 mapping.
// Also, its keys must be present in the 'choices' field
// for the 'demultiplexer' argument in the viash config.
def supported_platforms = [
"bclconvert": "SampleSheet.csv", // Illumina
"bases2fastq": "RunManifest.csv" // Element Biosciences
]
def found_sample_information = supported_platforms.collectEntries{demultiplexer, filename ->
println("Checking if ${filename} can be found in input folder ${state.input}.")
def resolved_filename = state.input.resolve(filename)
if (!resolved_filename.isFile()) {
resolved_filename = null
}
println("Result after looking for run information for ${demultiplexer}: ${resolved_filename}.")
[demultiplexer, resolved_filename]
}
def demultiplexer = null
def run_information = null
found_sample_information.each{demultiplexer_candidate, file_path ->
if (file_path) {
// At this point, a candidate run information file was found.
assert !run_information: "Autodetection of run information " +
"(SampleSheet, RunManifest) failed: " +
"multiple candidate files found in input folder. " +
"Please specify one using --run_information."
run_information = file_path
demultiplexer = demultiplexer_candidate
}
}
// When autodetecting, the run information should have been found
assert run_information: "No run information file (SampleSheet, RunManifest) " +
"found in input directory."
// When autodetecting, the demultiplexer must be set if the run information was found
assert demultiplexer: "State error: the demultiplexer should have been autodetected. " +
"Please report this as a bug."
// When autodetecting, the found demultiplexer must match
// with the demultiplexer that the user has provided (in case it was provided).
if (state.demultiplexer) {
assert state.demultiplexer == demultiplexer,
"Requested to use demultiplexer ${state.demultiplexer} " +
"but demultiplexer based on the autodetected run information " +
"file ${run_information} seems to indicate that the demultiplexer " +
"should be ${demultiplexer}. Either avoid specifying the demultiplexer " +
"or override the autodetection of the run information by providing " +
"the file."
}
println("Using run information ${run_information} and demultiplexer ${demultiplexer}")
// At this point, the autodetected state can override the user provided state.
newState = newState + [
"run_information": run_information,
"demultiplexer": demultiplexer,
]
} // end auto-detection logic
if (newState.demultiplexer in ["bclconvert"]) {
// Do not add InterOp to state because we generate the summary csv's in the next
// step based on the run dir, not the InterOp dir.
def interop_dir = state.input.resolve("InterOp")
assert interop_dir.isDirectory(): "Expected InterOp directory to be present."
def copycomplete_file = state.input.resolve("CopyComplete.txt")
assert (copycomplete_file.isFile() || state.skip_copycomplete_check):
"'CopyComplete.txt' file was not found!"
}
def resultState = state + newState
[id, resultState]
}
| interop_summary_to_csv.run(
runIf: {id, state -> state.demultiplexer in ["bclconvert"]},
directives: [label: ["lowmem", "verylowcpu"]],
fromState: [
"input": "input",
],
toState: [
"interop_run_summary": "output_run_summary",
"interop_index_summary": "output_index_summary",
]
)
// run bcl_convert
| bcl_convert.run(
runIf: {id, state -> state.demultiplexer in ["bclconvert"]},
directives: [label: ["highmem", "midcpu"]],
fromState: { id, state ->
[
bcl_input_directory: state.input,
sample_sheet: state.run_information,
output_directory: state.output,
reports: "reports",
logs: "logs"
]
},
toState: {id, result, state ->
def toAdd = [
"output_demultiplexer" : result.output_directory,
"run_id": id,
]
def newState = state + toAdd
return newState
}
)
// run bases2fastq
| bases2fastq.run(
runIf: {id, state -> state.demultiplexer in ["bases2fastq"]},
directives: [label: ["highmem", "midcpu"]],
fromState: [
"analysis_directory": "input",
"run_manifest": "run_information",
"output_directory": "output",
],
args: [
"no_projects": true, // Do not put output files in a subfolder for project
//"split_lanes": true,
"legacy_fastq": true, // Illumina style output names
"group_fastq": true, // No subdir per sample
],
toState: {id, result, state ->
def toAdd = [
"output_demultiplexer" : result.output_directory,
"run_id": id,
]
def newState = state + toAdd
return newState
}
)
| gather_fastqs_and_validate.run(
fromState: [
"input": "output_demultiplexer",
"sample_sheet": "run_information",
],
toState: [
"fastq_forward": "fastq_forward",
"fastq_reverse": "fastq_reverse",
],
)
output_ch = samples_ch
| falco.run(
directives: [label: ["verylowcpu", "lowmem"]],
fromState: {id, state ->
[
"input": [state.fastq_forward, state.fastq_reverse],
"outdir": "$id/qc/falco",
"summary_filename": null,
"report_filename": null,
"data_filename": null,
]
},
toState: { id, result, state ->
state + [ "output_falco" : result.outdir ]
}
)
| combine_samples.run(
fromState: { id, state ->
[
"id": state.run_id,
"forward_input": state.fastq_forward,
"reverse_input": state.fastq_reverse,
"falco_dir": state.output_falco,
]
},
toState: [
"forward_fastqs": "output_forward",
"reverse_fastqs": "output_reverse",
"output_falco": "output_falco",
]
)
| multiqc.run(
directives: [label: ["midcpu", "midmem"]],
fromState: {id, state ->
def new_state = [
"input": state.output_falco,
"output_report": state.output_multiqc,
"cl_config": 'sp: {fastqc/data: {fn: "*_fastqc_data.txt"}}'
]
if (state.demultiplexer == "bclconvert") {
new_state["input"] += [
state.interop_run_summary.getParent(),
state.interop_index_summary.getParent()
]
}
return new_state
},
toState: { id, result, state ->
state + [ "output_multiqc" : result.output_report ]
}
)
| setState(
[
//"_meta": "_meta",
"output": "output_demultiplexer",
"output_falco": "output_falco",
"output_multiqc": "output_multiqc",
"output_run_information": "run_information",
]
)
emit:
output_ch
}


@@ -0,0 +1,10 @@
manifest {
nextflowVersion = '!>=20.12.1-edge'
}
params {
rootDir = java.nio.file.Paths.get("$projectDir/../../").toAbsolutePath().normalize().toString()
}
// include common settings
includeConfig("${params.rootDir}/src/config/labels.config")

84
src/demultiplex/test.nf Normal file

@@ -0,0 +1,84 @@
nextflow.enable.dsl=2
include { demultiplex } from params.rootDir + "/target/nextflow/demultiplex/main.nf"
params.resources_test = params.rootDir + "/testData/"
workflow test_illumina {
output_ch = Channel.fromList([
[
// sample_sheet: resources_test.resolve("bcl_convert_samplesheet.csv"),
// input: resources_test.resolve("iseq-DI/"),
//sample_sheet: "https://raw.githubusercontent.com/nf-core/test-datasets/demultiplex/testdata/NovaSeq6000/SampleSheet.csv",
input: params.resources_test + "200624_A00834_0183_BHMTFYDRXX.tar.gz",
publish_dir: "output_dir/",
]
])
| map { state -> [ "run", state ] }
| demultiplex.run(
toState: { id, output, state ->
output + [ orig_input: state.input ] }
)
| view { output ->
assert output.size() == 2 : "outputs should contain two elements; [id, state]"
"Output: $output"
}
| map {id, state ->
assert state.output.isDirectory(): "Expected bclconvert output to be a directory"
state.output_falco.each{
assert it.isDirectory(): "Expected falco output to be a directory"
}
assert state.output_multiqc.isFile(): "Expected multiQC output to be a file"
fastq_files = state.output.listFiles().collect{it.name}
assert ["Undetermined_S0_L001_R1_001.fastq.gz", "Sample23_S3_L001_R1_001.fastq.gz",
"sampletest_S4_L001_R1_001.fastq.gz", "Sample1_S1_L001_R1_001.fastq.gz",
"SampleA_S2_L001_R1_001.fastq.gz"].toSet() == fastq_files.toSet(): \
"Output directory should contain the expected FASTQ files"
// Check the size of the files themselves; fastq_files only holds their names.
state.output.listFiles().each{
assert it.size() != 0: "Expected FASTQ file to not be empty"
}
assert state.output_run_information.isFile(): "Expected output run information to be a file"
expected_run_information = """[Header]
|Date,6/24/2020
|Application,Illumina DRAGEN COVIDSeq Test Pipeline
|Instrument Type,NovaSeq6000
|Assay,Illumina COVIDSeq Test
|Index Adapters,IDT-ILMN DNA-RNA UDP Indexes
|Chemistry,Amplicon
|[Settings]
|AdapterRead1,CTGTCTCTTATACACATCT
|[Data]
|Lane,Sample_ID,Sample_Type,Index_ID,Index,Index2
|1,Sample1,PatientSample,UDP0001,GAACTGAGCG,TCGTGGAGCG
|1,SampleA,PatientSample,UDP0002,AGGTCAGATA,CTACAAGATA
|1,Sample23,PatientSample,UDP0003,CGTCTCATAT,TATAGTAGCT
|1,sampletest,PatientSample,UDP0004,ATTCCATAAG,TGCCTGGTGG
|""".stripMargin()
assert state.output_run_information.text.replaceAll("\r\n", "\n") == expected_run_information
}
}
workflow test_bases2fastq {
output_ch = Channel.fromList([
[
input: "http://element-public-data.s3.amazonaws.com/bases2fastq-share/bases2fastq-v2/20230404-bases2fastq-sim-151-151-9-9.tar.gz",
publish_dir: "output_dir/",
]
])
| map { state -> [ "run", state ] }
| demultiplex.run(
toState: { id, output, state ->
output + [ orig_input: state.input ] }
)
| view { output ->
assert output.size() == 2 : "outputs should contain two elements; [id, state]"
"Output: $output"
}
| map {id, state ->
assert state.output.isDirectory(): "Expected bases2fastq output to be a directory"
state.output_falco.each{assert it.isDirectory(): "Expected falco output to be a directory"}
assert state.output_multiqc.isFile(): "Expected multiQC output to be a file"
}
}
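
The run information check in `test_illumina` normalises CRLF line endings before comparing the published sheet with the expected text. A rough shell equivalent is sketched below. The function name `same_ignoring_cr` is made up for illustration, and unlike the Groovy `replaceAll("\r\n", "\n")` it drops every carriage return, not only those directly before a newline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when two files are identical once carriage
# returns are removed. Note: $(...) also strips trailing newlines, so a
# difference in trailing blank lines goes unnoticed.
same_ignoring_cr() {
  [ "$(tr -d '\r' < "$1")" = "$(tr -d '\r' < "$2")" ]
}

# Usage with two throwaway files.
tmpdir=$(mktemp -d)
printf 'Lane,Sample_ID\r\n1,Sample1\r\n' > "$tmpdir/windows.csv"
printf 'Lane,Sample_ID\n1,Sample1\n' > "$tmpdir/unix.csv"
if same_ignoring_cr "$tmpdir/windows.csv" "$tmpdir/unix.csv"; then
  echo "same content"
fi
rm -r "$tmpdir"
```

Comparing via command substitution keeps the sketch short; for large files a `cmp` on two normalised temporary files would probably be the safer choice.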


@@ -0,0 +1,45 @@
name: interop_summary_to_csv
namespace: io
argument_groups:
- name: Input arguments
arguments:
- name: --input
description: Sequencing run folder (*not* InterOp folder).
type: file
required: true
- name: Output arguments
arguments:
- name: --output_run_summary
type: file
direction: output
required: true
- name: --output_index_summary
type: file
direction: output
required: true
requirements:
commands: ["summary", "index-summary"]
resources:
- type: bash_script
path: script.sh
test_resources:
- type: bash_script
path: test.sh
- path: /testData/iseq-DI
engines:
- type: docker
image: debian:stable-slim
setup:
- type: apt
packages:
- procps
- wget
- type: docker
run: |
wget https://github.com/Illumina/interop/releases/download/v1.3.1/interop-1.3.1-Linux-GNU.tar.gz -O /tmp/interop.tar.gz && \
tar -C /tmp/ --no-same-owner --no-same-permissions -xvf /tmp/interop.tar.gz && \
mv /tmp/interop-1.3.1-Linux-GNU/bin/index-summary /tmp/interop-1.3.1-Linux-GNU/bin/summary /usr/local/bin/
runners:
- type: executable
- type: nextflow


@@ -0,0 +1,10 @@
#!/usr/bin/env bash
set -eo pipefail
if [ ! -d "$par_input" ]; then
echo "Input directory does not exist or is not a directory"
exit 1
fi
$(which summary) --csv=1 "$par_input" 1> "$par_output_run_summary"
$(which index-summary) --csv=1 "$par_input" 1> "$par_output_index_summary"


@@ -0,0 +1,18 @@
#!/usr/bin/env bash
set -eo pipefail
# create tempdir
echo ">>> Creating temporary test directory."
TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXX")
function clean_up {
[[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
}
trap clean_up EXIT
echo ">>> Created temporary directory '$TMPDIR'."
echo ">>> Run simple execution"
./$meta_functionality_name \
--input "$meta_resources_dir/iseq-DI" \
--output_run_summary "$TMPDIR/run_summary.csv" \
--output_index_summary "$TMPDIR/index_summary.csv"

33
src/io/publish/code.sh Executable file

@@ -0,0 +1,33 @@
#!/bin/bash
set -eo pipefail
declare -A input_output_mapping=(["par_input"]="par_output"
["par_input_multiqc"]="par_output_multiqc"
["par_input_run_information"]="par_output_run_information"
)
for input_argument_name in "${!input_output_mapping[@]}"
do
input_location="${!input_argument_name}"
output_argument_name="${input_output_mapping[$input_argument_name]}"
output_location="${!output_argument_name}"
echo "Publishing $input_location -> $output_location"
echo "Creating directory if it does not exist."
mkdir -p "$(dirname "$output_location")" && echo "Containing directory $(dirname "$output_location") created"
echo "Copying files..."
cp -rL "$input_location" "$output_location"
echo "Output files for $output_location:"
ls "$output_location"
done
echo "Grouping output from $par_input_falco into $par_output_falco"
mkdir -p "$par_output_falco"
IFS=";" read -ra falco_inputs <<< "$par_input_falco"
for falco_dir in "${falco_inputs[@]}"; do
echo "Copying contents of $falco_dir"
find -H -D exec "$falco_dir" -maxdepth 1 -type f -exec cp -t "$par_output_falco" {} +
done
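
The publish script leans on two Bash features: indirect expansion (`${!name}`) to go from an argument name to its value, and `IFS=";" read -ra` to split a `;`-separated list. The sketch below isolates both with stand-in values. The paths and the function names `describe_mapping` and `split_falco_inputs` are assumptions for illustration, not part of the component, and associative arrays need Bash 4 or newer.

```shell
#!/usr/bin/env bash
# Stand-in values; in the real component viash sets the par_* variables.
par_input="work/fastq"
par_output="publish/fastq"
par_input_falco="work/falco_a;work/falco_b"

# Map the *name* of an input variable to the *name* of its output variable.
declare -A input_output_mapping=(["par_input"]="par_output")

# Hypothetical helper: print "<input value> -> <output value>" per mapping.
describe_mapping() {
  local input_name output_name
  for input_name in "${!input_output_mapping[@]}"; do
    output_name="${input_output_mapping[$input_name]}"
    # ${!var} expands to the value of the variable whose name is stored in var.
    echo "${!input_name} -> ${!output_name}"
  done
}

# Hypothetical helper: split a ';'-separated string, one element per line.
split_falco_inputs() {
  local -a dirs
  IFS=";" read -ra dirs <<< "$1"
  printf '%s\n' "${dirs[@]}"
}

describe_mapping
split_falco_inputs "$par_input_falco"
```

Setting `IFS` on the same line as `read` limits the changed separator to that one command, so the rest of the script keeps the default word splitting.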


@@ -0,0 +1,57 @@
name: "publish"
namespace: "io"
description: "Publish the processed results of the run"
argument_groups:
- name: Input arguments
arguments:
- name: --input
description: Directory containing the FASTQ data to publish.
type: file
required: true
- name: "--input_falco"
description: Directories containing the falco output to publish.
type: file
required: true
multiple: true
- name: "--input_multiqc"
description: Location of the MultiQC report to publish.
type: file
required: true
- name: "--input_run_information"
description: "Location of the run information file to publish."
type: file
required: true
- name: Output arguments
arguments:
- name: --output
type: file
direction: output
default: "fastq"
- name: --output_falco
type: file
direction: output
default: "qc/fastqc"
- name: --output_multiqc
type: file
direction: output
default: "qc/multiqc_report.html"
- name: --output_run_information
type: file
direction: output
default: run_information.csv
resources:
- type: bash_script
path: ./code.sh
engines:
- type: docker
image: debian:stable-slim
setup:
- type: apt
packages:
- procps
runners:
- type: executable
- type: nextflow


@@ -0,0 +1,44 @@
name: untar
namespace: io
description: |
Unpack a .tar file. When the .tar file contains just a single top-level directory,
put the contents of that directory into the output folder instead of the directory itself.
argument_groups:
- name: Input arguments
arguments:
- name: --input
description: Tarball file to be unpacked.
type: file
required: true
- name: Output arguments
arguments:
- name: --output
description: Directory to write the contents of the .tar file to.
type: file
direction: output
required: true
- name: "Other arguments"
arguments:
- name: "--exclude"
alternatives: ["-e"]
type: string
description: Prevents any file or member whose name matches the shell wildcard (pattern) from being extracted.
example: "docs/figures"
required: false
resources:
- type: bash_script
path: script.sh
test_resources:
- type: bash_script
path: test.sh
engines:
- type: docker
image: debian:stable-slim
setup:
- type: apt
packages:
- procps
runners:
- type: executable
- type: nextflow

41
src/io/untar/script.sh Normal file

@@ -0,0 +1,41 @@
#!/usr/bin/env bash
set -eo pipefail
extra_args=()
TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXX")
function clean_up {
[[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
}
trap clean_up EXIT
# Check if tarball contains 1 top-level directory. If so, extract the contents of the
# directory to the output directory instead of the directory itself.
echo "Directory contents:"
tar -taf "${par_input}" > "$TMPDIR/tar_contents.txt"
cat "$TMPDIR/tar_contents.txt"
printf "Checking if tarball contains only a single top-level directory: "
if [[ $(grep -o -E '^[./]*[^/]+/$' "$TMPDIR/tar_contents.txt" | uniq | wc -l) -eq 1 ]]; then
echo "It does."
echo "Extracting the contents of the top-level directory to the output directory instead of the directory itself."
# The directory can be both of the format './<directory>' (or ././<directory>) or just <directory>
# Adjust the number of stripped components accordingly by looking for './' at the beginning of the file.
starting_relative=$(grep -oP -m 1 '^(\./)*' "$TMPDIR/tar_contents.txt" | tr -d '\n' | wc -c)
n_strips=$(( ($starting_relative / 2)+1 ))
extra_args+=("--strip-components=$n_strips")
else
echo "It does not."
fi
if [ "$par_exclude" != "" ]; then
echo "Exclusion of files with wildcard '$par_exclude' requested."
extra_args+=("--exclude=$par_exclude")
fi
echo "Starting extraction of tarball '$par_input' to output directory '$par_output'."
mkdir -p "$par_output"
echo "executing 'tar --no-same-owner --no-same-permissions --directory=$par_output ${extra_args[*]} -xavf $par_input'"
tar --no-same-owner --no-same-permissions --directory="$par_output" "${extra_args[@]}" -xavf "$par_input"
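
The strip-component logic in this script can be tried in isolation on a saved archive listing. The sketch below follows the same idea under stated assumptions: the function name `count_strip_components` is hypothetical, the listing is expected in the form printed by `tar -t`, and the leading `./` prefixes are counted with parameter expansion instead of `grep -P`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the component): given a file holding the
# output of `tar -t`, print the value to pass to --strip-components, or 0
# when the listing does not show exactly one top-level directory.
count_strip_components() {
  local listing="$1" n_top_level first_line n_strips=1
  # Distinct entries that consist of a single directory ("dir/" or "./dir/").
  n_top_level=$(grep -E '^(\./)*[^/]+/$' "$listing" | sort -u | wc -l | tr -d ' ')
  if [ "$n_top_level" -ne 1 ]; then
    echo 0
    return 0
  fi
  # Every leading "./" on the first entry is one more component to strip.
  IFS= read -r first_line < "$listing"
  while [ "${first_line#./}" != "$first_line" ]; do
    first_line="${first_line#./}"
    n_strips=$((n_strips + 1))
  done
  echo "$n_strips"
}

# Usage: a listing as produced for an archive created from "./run".
listing=$(mktemp)
printf '%s\n' './run/' './run/RunInfo.xml' > "$listing"
count_strip_components "$listing"   # prints 2
rm -f "$listing"
```

Returning 0 for the "no single top-level directory" case keeps the caller simple: it only needs to add `--strip-components` when the printed number is greater than zero.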

126
src/io/untar/test.sh Normal file

@@ -0,0 +1,126 @@
#!/usr/bin/env bash
set -eo pipefail
# create tempdir
echo ">>> Creating temporary test directory."
TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXX")
function clean_up {
[[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
}
trap clean_up EXIT
echo ">>> Created temporary directory '$TMPDIR'."
INPUT_FILE="$TMPDIR/test_file.txt"
echo ">>> Creating test input file at '$TMPDIR/test_file.txt'."
echo "foo" > "$INPUT_FILE"
echo ">>> Created '$INPUT_FILE'."
echo ">>> Creating tar.gz from '$INPUT_FILE'."
TARFILE="${INPUT_FILE}.tar.gz"
tar -C "$TMPDIR" -czvf "$TARFILE" "$(basename "$INPUT_FILE")"
[[ ! -f "$TARFILE" ]] && echo ">>> Test setup failed: could not create tarfile." && exit 1
echo ">>> '$TARFILE' created."
echo ">>> Check whether tar.gz can be extracted"
echo ">>> Creating temporary output directory for test 1."
OUTPUT_DIR_1="$TMPDIR/output_test_1/"
mkdir "$OUTPUT_DIR_1"
echo ">>> Extracting '$TARFILE' to '$OUTPUT_DIR_1'".
./$meta_functionality_name \
--input "$TARFILE" \
--output "$OUTPUT_DIR_1"
echo ">>> Check whether extracted file exists"
[[ ! -f "$OUTPUT_DIR_1/test_file.txt" ]] && echo "Output file could not be found. Output directory contents: " && ls "$OUTPUT_DIR_1" && exit 1
echo ">>> Creating temporary output directory for test 2."
OUTPUT_DIR_2="$TMPDIR/output_test_2/"
mkdir "$OUTPUT_DIR_2"
echo ">>> Extracting '$TARFILE' to '$OUTPUT_DIR_2', excluding 'test_file.txt'".
./$meta_functionality_name \
--input "$TARFILE" \
--output "$OUTPUT_DIR_2" \
--exclude 'test_file.txt'
echo ">>> Check whether excluded file was not extracted"
[[ -f "$OUTPUT_DIR_2/test_file.txt" ]] && echo "File should have been excluded! Output directory contents:" && ls "$OUTPUT_DIR_2" && exit 1
echo ">>> Creating test tarball containing only 1 top-level directory."
mkdir "$TMPDIR/input_test_3/"
cp "$INPUT_FILE" "$TMPDIR/input_test_3/"
tar -C "$TMPDIR" -czvf "$TMPDIR/input_test_3.tar.gz" $(basename "$TMPDIR/input_test_3")
TARFILE_3="$TMPDIR/input_test_3.tar.gz"
echo ">>> Creating temporary output directory for test 3."
OUTPUT_DIR_3="$TMPDIR/output_test_3/"
mkdir "$OUTPUT_DIR_3"
echo "Extracting '$TARFILE_3' to '$OUTPUT_DIR_3'".
./$meta_functionality_name \
--input "$TARFILE_3" \
--output "$OUTPUT_DIR_3"
echo ">>> Check whether extracted file exists"
[[ ! -f "$OUTPUT_DIR_3/test_file.txt" ]] && echo "Output file could not be found!" && exit 1
echo ">>> Check for tar archive that contains a single directory starting with './'."
mkdir "$TMPDIR/input_test_4/"
cp "$INPUT_FILE" "$TMPDIR/input_test_4/"
pushd "$TMPDIR/"
trap popd ERR
tar -czvf "$TMPDIR/input_test_4.tar.gz" ./input_test_4
popd
trap - ERR
OUTPUT_DIR_4="$TMPDIR/output_test_4/"
echo "Extracting '$TMPDIR/input_test_4.tar.gz' to '$OUTPUT_DIR_4'".
./$meta_functionality_name \
--input "$TMPDIR/input_test_4.tar.gz" \
--output "$OUTPUT_DIR_4"
echo ">>> Check whether extracted file exists"
[[ ! -f "$OUTPUT_DIR_4/test_file.txt" ]] && echo "Output file could not be found!" && exit 1
echo ">>> Creating test tarball containing only 1 top-level directory, but it is nested."
mkdir -p "$TMPDIR/input_test_5/nested/"
cp "$INPUT_FILE" "$TMPDIR/input_test_5/nested/"
tar -C "$TMPDIR" -czvf "$TMPDIR/input_test_5.tar.gz" $(basename "$TMPDIR/input_test_5")
TARFILE_5="$TMPDIR/input_test_5.tar.gz"
echo ">>> Creating temporary output directory for test 5."
OUTPUT_DIR_5="$TMPDIR/output_test_5/"
mkdir "$OUTPUT_DIR_5"
echo "Extracting '$TARFILE_5' to '$OUTPUT_DIR_5'".
./$meta_functionality_name \
--input "$TARFILE_5" \
--output "$OUTPUT_DIR_5"
echo ">>> Check whether extracted file exists"
[[ ! -f "$OUTPUT_DIR_5/nested/test_file.txt" ]] && echo "Output file could not be found!" && exit 1
echo ">>> Creating test tarball containing 1 top-level directory with two subdirectories."
mkdir -p "$TMPDIR/input_test_6/number_1/"
mkdir "$TMPDIR/input_test_6/number_2/"
cp "$INPUT_FILE" "$TMPDIR/input_test_6/number_1/"
tar -C "$TMPDIR" -czvf "$TMPDIR/input_test_6.tar.gz" $(basename "$TMPDIR/input_test_6")
TARFILE_6="$TMPDIR/input_test_6.tar.gz"
echo ">>> Creating temporary output directory for test 6."
OUTPUT_DIR_6="$TMPDIR/output_test_6/"
mkdir "$OUTPUT_DIR_6"
echo "Extracting '$TARFILE_6' to '$OUTPUT_DIR_6'".
./$meta_functionality_name \
--input "$TARFILE_6" \
--output "$OUTPUT_DIR_6"
echo ">>> Check whether extracted file exists"
[[ ! -f "$OUTPUT_DIR_6/number_1/test_file.txt" ]] && echo "Output file could not be found!" && exit 1
[[ ! -d "$OUTPUT_DIR_6/number_2" ]] && echo "Output directory could not be found!" && exit 1
echo ">>> Test finished successfully"
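
Tests 3 and 4 differ only in how the directory is named when the tarball is created, which changes the member names stored in the archive. The sketch below makes that visible by printing both listings. The helper name `list_tarball_for` and the directory name `run` are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a tarball of a directory called "run", referring
# to it by the given member name ("run" or "./run"), and print the listing.
list_tarball_for() {
  local member="$1" workdir
  workdir=$(mktemp -d)
  mkdir "$workdir/run"
  echo "foo" > "$workdir/run/test_file.txt"
  tar -C "$workdir" -czf "$workdir/archive.tar.gz" "$member"
  tar -tzf "$workdir/archive.tar.gz"
  rm -r "$workdir"
}

list_tarball_for "run"
list_tarball_for "./run"
```

With GNU tar the second listing is expected to carry a leading `./` on every entry, which is the case the extra stripped component in the untar script accounts for.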


@@ -0,0 +1,74 @@
name: runner
description: Runner for demultiplexing of raw sequencing data
argument_groups:
- name: Input arguments
arguments:
- name: --input
description: |
Base directory of the canonical form `s3://<bucket>/<path>/<RunID>/`.
A tarball (.tar.gz, .tgz, .tar) containing run information can be provided, in which
case the RunID is set to the name of the tarball without the extension.
type: file
required: true
- name: --run_information
description: |
CSV file containing sample information, which will be used as
input for the demultiplexer. Canonically called 'SampleSheet.csv' (Illumina)
or 'RunManifest.csv' (Element Biosciences). If not specified,
will try to autodetect the sample sheet in the input directory.
Requires --demultiplexer to be set.
type: file
required: false
- name: "--demultiplexer"
type: string
required: false
choices: ["bases2fastq", "bclconvert"]
description: |
Demultiplexer to use, choice depends on the provider
of the instrument that was used to generate the data.
When not using --run_information, specifying this argument is not
required.
- name: Annotation flags
arguments:
- name: --plain_output
description: |
Flag to indicate that the output should be stored directly under $publish_dir rather than
under a subdirectory structure runID/<date_time>_demultiplex_<version>/.
type: boolean_true
- name: Output arguments
arguments:
- name: --fastq_output
type: file
direction: output
default: "fastq"
- name: --falco_output
type: file
direction: output
default: "qc/fastqc"
- name: --multiqc_output
type: file
direction: output
default: "qc/multiqc_report.html"
- name: "Other arguments"
arguments:
- name: --skip_copycomplete_check
type: boolean_true
description: |
Disable the check for the presence of a "CopyComplete.txt" file in input
directory in case of Illumina data.
resources:
- type: nextflow_script
path: main.nf
entrypoint: run_wf
dependencies:
- name: demultiplex
repository: local
- name: io/publish
repository: local
runners:
- type: nextflow
engines:
- type: native
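
The `--input` and `--plain_output` descriptions above imply a naming scheme for the run ID and the publish subdirectory. The sketch below mirrors that scheme in shell, following the argument descriptions and not every branch of the workflow. The function names, the example bucket path, and the version string are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the run ID from an input location by taking the
# last path element and dropping a tarball suffix, if there is one.
run_id_from_input() {
  local name
  name="$(basename "$1")"
  case "$name" in
    *.tar.gz) name="${name%.tar.gz}" ;;
    *.tgz)    name="${name%.tgz}" ;;
    *.tar)    name="${name%.tar}" ;;
  esac
  printf '%s\n' "$name"
}

# Hypothetical helper: subdirectory below the publish directory.
# $1: run ID, $2: timestamp, $3: version, $4: "true" when --plain_output is set
publish_subdir() {
  if [ "$4" = "true" ]; then
    printf '%s\n' "."
  else
    printf '%s\n' "${1}/${2}_demultiplex_${3}"
  fi
}

run_id=$(run_id_from_input "s3://bucket/path/200624_A00834_0183_BHMTFYDRXX.tar.gz")
publish_subdir "$run_id" "$(date +%Y%m%d_%H%M%S)" "v0.3.10" "false"
```

Passing the timestamp in as an argument keeps the helper deterministic, which makes it easier to check than a function that calls `date` itself.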

89
src/runner/main.nf Normal file

@@ -0,0 +1,89 @@
def date = new Date().format('yyyyMMdd_HHmmss')
def viash_config = java.nio.file.Paths.get("$projectDir/../../../").toAbsolutePath().normalize().toString() + "/_viash.yaml"
def version = get_version(viash_config)
workflow run_wf {
take:
input_ch
main:
output_ch = input_ch
// Extract the ID from the input.
// If the input is a tarball, strip the suffix.
| map{ id, state ->
def id_with_suffix = state.input.getFileName().toString()
[
id,
state + [ run_id: id_with_suffix - ~/\.(tar\.gz|tgz|tar)$/ ]
]
}
| demultiplex.run(
fromState: { id, state ->
def state_to_pass = [
"input": state.input,
"run_information": state.run_information,
"demultiplexer": state.demultiplexer,
"skip_copycomplete_check": state.skip_copycomplete_check,
"output": "$id/fastq",
"output_falco": "$id/qc/fastqc",
"output_multiqc": "$id/qc/multiqc_report.html",
]
if (state.run_information) {
state_to_pass += ["output_run_information": state.run_information.getName()]
}
state_to_pass
},
toState: { id, result, state ->
state + result
},
)
| publish.run(
fromState: { id, state ->
println(state.plain_output)
def id1 = (state.plain_output) ? id : "${state.run_id}/${date}"
def id2 = (state.plain_output) ? id : "${id1}_demultiplex_${version}"
def fastq_output_1 = (id2 == "run") ? state.fastq_output : "${id2}/" + state.fastq_output
def falco_output_1 = (id2 == "run") ? state.falco_output : "${id2}/" + state.falco_output
def multiqc_output_1 = (id2 == "run") ? state.multiqc_output : "${id2}/" + state.multiqc_output
def run_information_output_1 = (id2 == "run") ? "${state.output_run_information.getName()}" : "${id2}/${state.output_run_information.getName()}"
if (id2 == "run") {
println("Publishing to ${params.publish_dir}")
} else {
println("Publishing to ${params.publish_dir}/${id2}")
}
[
input: state.output,
input_falco: state.output_falco,
input_multiqc: state.output_multiqc,
input_run_information: state.output_run_information,
output: fastq_output_1,
output_falco: falco_output_1,
output_multiqc: multiqc_output_1,
output_run_information: run_information_output_1,
]
},
toState: { id, result, state -> [:] },
directives: [
publishDir: [
path: "${params.publish_dir}",
overwrite: false,
mode: "copy"
]
]
)
emit:
output_ch
}
def get_version(inputFile) {
def yamlSlurper = new groovy.yaml.YamlSlurper()
def loaded_viash_config = yamlSlurper.parse(file(inputFile))
def version = (loaded_viash_config.version) ? loaded_viash_config.version : "unknown_version"
println("Version to be used: ${version}")
return version
}


@@ -0,0 +1,12 @@
manifest {
nextflowVersion = '!>=20.12.1-edge'
}
process {
withName: publishStatesProc {
publishDir = [ enabled: false ]
}
}
// include common settings
includeConfig("${params.rootDir}/src/config/labels.config")

0
target/.build.yaml Normal file


@@ -0,0 +1,421 @@
name: "bases2fastq"
version: "v0.3.0"
authors:
- name: "Dries Schaumont"
roles:
- "author"
- "maintainer"
info:
links:
email: "dries@data-intuitive.com"
github: "DriesSchaumont"
orcid: "0000-0002-4389-0440"
linkedin: "dries-schaumont"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist"
argument_groups:
- name: "Input"
arguments:
- type: "file"
name: "--analysis_directory"
description: "Location of analysis directory"
info: null
example:
- "input"
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--run_manifest"
alternatives:
- "-r"
description: "Location of run manifest to use instead of default RunManifest.csv\
\ found in analysis directory"
info: null
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output"
arguments:
- type: "file"
name: "--output_directory"
alternatives:
- "-o"
description: "Location to save output fastqs"
info: null
example:
- "fastq_dir"
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--report"
description: "Output location for the HTML report"
info: null
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--logs"
description: "Directory containing log files"
info: null
example:
- "logs_dir"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- name: "Arguments"
arguments:
- type: "string"
name: "--chemistry_version"
description: "Run parameters override, chemistry version."
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--demux_only"
alternatives:
- "-d"
description: "Generate demux files and indexing stats without generating FASTQ\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--detect_adapters"
description: "Detect adapters sequences, overriding any sequences present in run\
\ manifest.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--error_on_missing"
description: "Terminate execution for a missing file (by default, missing files\
\ are\nskipped and execution continues). Also set by --strict.\n"
info: null
direction: "input"
- type: "string"
name: "--exclude_tile"
alternatives:
- "-e"
description: "Regex matching tile names to exclude. This flag can be specified\
\ multiple times. (e.g. L1.*C0[23]S.)\n"
info: null
required: false
direction: "input"
multiple: true
multiple_sep: ";"
- type: "string"
name: "--filter_mask"
description: "Run parameters override, custom pass filter mask.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--flowcell_id"
description: "Run parameters override, flowcell ID.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--force_index_orientation"
description: "Do not attempt to find orientation for I1/I2 reads (reverse complement).\n\
Use orientation given in run manifest.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--group_fastq"
description: "Group all FASTQ/stats/metrics for a project in the project folder.\n"
info: null
direction: "input"
- type: "integer"
name: "--i1_cycles"
description: "Run parameters override, I1 cycles.\n"
info: null
required: false
min: 1
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--i2_cycles"
description: "Run parameters override, I2 cycles\n"
info: null
required: false
min: 1
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--include_tile"
alternatives:
- "-i"
description: "Regex matching tile names to include. This flag\ncan be specified\
\ multiple times. (e.g. L1.*C0[23]S.)\n"
info: null
required: false
direction: "input"
multiple: true
multiple_sep: ";"
- type: "string"
name: "--kit_configuration"
description: "Run parameters override, kit configuration.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--legacy_fastq"
description: "Legacy naming for FASTQ files (e.g. SampleName_S1_L001_R1_001.fastq.gz)\n"
info: null
direction: "input"
- type: "string"
name: "--log_level"
alternatives:
- "-l"
description: "Severity level for logging.\n"
info: null
example:
- "INFO"
required: false
choices:
- "DEBUG"
- "INFO"
- "WARNING"
- "ERROR"
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--no_error_on_invalid"
description: "Skip invalid files and continue execution. Overridden by --strict\
\ options\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--no_projects"
description: "Disable project directories\n"
info: null
direction: "input"
- type: "integer"
name: "--num_unassigned"
description: "Max Number of unassigned sequences to report.\n"
info: null
example:
- 30
required: false
min: 0
max: 1000
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--preparation_workflow"
description: "Run parameters override, preparation workflow. \n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--qc_only"
description: "Quickly generate run stats for single tile without generating FASTQ.\n\
Use --include_tile/--exclude_tile to define custom tile set.\n"
info: null
direction: "input"
- type: "integer"
name: "--r1_cycles"
description: "Run parameters override, R1 cycles.\n"
info: null
required: false
min: 1
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--r2_cycles"
description: "Run parameters override, R2 cycles.\n"
info: null
required: false
min: 1
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--split_lanes"
description: "Split FASTQ files by lane.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--strict"
description: "In strict mode any invalid or missing input file will terminate\
\ execution \n(overrides no_error_on_invalid and sets --error_on_missing)\n"
info: null
direction: "input"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
description: "Bases2Fastq demultiplexes sequencing data generated by Element Biosciences\
\ instruments and converts base calls into FASTQ files.\n"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
info: null
status: "enabled"
requirements:
commands:
- "ps"
keywords:
- "demultiplex"
- "fastq"
- "demux"
- "Element Biosciences"
license: "Proprietary"
links:
repository: "https://github.com/viash-hub/biobox"
documentation: "https://docs.elembio.io/docs/bases2fastq/introduction/"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "elembio/bases2fastq:2.1.0"
target_registry: "images.viash-hub.com"
target_tag: "v0.3.0"
namespace_separator: "/"
setup:
- type: "apt"
packages:
- "procps"
- "tree"
interactive: false
- type: "docker"
run:
- "echo \"bases2fastq: $(bases2fastq --version | cut -d' ' -f3)\" > /var/software_versions.txt\n"
test_setup:
- type: "apt"
packages:
- "curl"
interactive: false
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/bases2fastq/config.vsh.yaml"
runner: "nextflow"
engine: "docker|native"
output: "target/nextflow/bases2fastq"
executable: "target/nextflow/bases2fastq/main.nf"
viash_version: "0.9.0"
git_commit: "d86bd5cf62104af02caa852aacd352b1aa97ed60"
git_remote: "https://x-access-token:ghs_EwAUAMYJ0K4VBHlAEMs4ZP2OyQYqJM0PSfEO@github.com/viash-hub/biobox"
git_tag: "v0.2.0-29-gd86bd5c"
package_config:
name: "biobox"
version: "v0.3.0"
description: "A collection of bioinformatics tools for working with sequence data.\n"
info: null
viash_version: "0.9.0"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.0'"
keywords:
- "bioinformatics"
- "modules"
- "sequencing"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/biobox"
issue_tracker: "https://github.com/viash-hub/biobox/issues"

File diff suppressed because it is too large


@@ -0,0 +1,126 @@
manifest {
name = 'bases2fastq'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.3.0'
description = 'Bases2Fastq demultiplexes sequencing data generated by Element Biosciences instruments and converts base calls into FASTQ files.\n'
author = 'Dries Schaumont'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
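
The `mem*` labels in the `process` block above come in two families: decimal units (`gb`, `tb`, powers of 1000) and binary units (`gib`, `tib`, powers of 1024). The following is a minimal Python sketch, not part of the generated config, that recomputes the byte count from a label name. The function name `label_to_bytes` and the `UNITS` table are illustrative assumptions.

```python
import re

# Multipliers for the unit suffixes used in the label names (assumed mapping:
# "gb"/"tb" are decimal, "gib"/"tib" are binary).
UNITS = {
    "gb": 1000**3,
    "tb": 1000**4,
    "gib": 1024**3,
    "tib": 1024**4,
}


def label_to_bytes(label: str) -> int:
    """Convert a label such as 'mem16gib' into a byte count."""
    match = re.fullmatch(r"mem(\d+)(gb|tb|gib|tib)", label)
    if match is None:
        raise ValueError(f"not a memory label: {label!r}")
    amount, unit = match.groups()
    return int(amount) * UNITS[unit]


if __name__ == "__main__":
    for name in ("mem1gb", "mem1gib", "mem16gib", "mem500tb"):
        print(name, label_to_bytes(name))
```

The gap between the two families grows with the unit: `mem1gib` is about 7% larger than `mem1gb`, while `mem1tib` is about 10% larger than `mem1tb`.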

@@ -0,0 +1,394 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "bases2fastq",
"description": "Bases2Fastq demultiplexes sequencing data generated by Element Biosciences instruments and converts base calls into FASTQ files.\n",
"type": "object",
"definitions": {
"arguments" : {
"title": "Arguments",
"type": "object",
"description": "No description",
"properties": {
"chemistry_version": {
"type":
"string",
"description": "Type: `string`. Run parameters override, chemistry version",
"help_text": "Type: `string`. Run parameters override, chemistry version."
}
,
"demux_only": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Generate demux files and indexing stats without generating FASTQ\n",
"help_text": "Type: `boolean_true`, default: `false`. Generate demux files and indexing stats without generating FASTQ\n"
,
"default":false
}
,
"detect_adapters": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Detect adapter sequences, overriding any sequences present in run manifest",
"help_text": "Type: `boolean_true`, default: `false`. Detect adapter sequences, overriding any sequences present in run manifest.\n"
,
"default":false
}
,
"error_on_missing": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Terminate execution for a missing file (by default, missing files are\nskipped and execution continues)",
"help_text": "Type: `boolean_true`, default: `false`. Terminate execution for a missing file (by default, missing files are\nskipped and execution continues). Also set by --strict.\n"
,
"default":false
}
,
"exclude_tile": {
"type":
"string",
"description": "Type: List of `string`, multiple_sep: `\";\"`. Regex matching tile names to exclude",
"help_text": "Type: List of `string`, multiple_sep: `\";\"`. Regex matching tile names to exclude. This flag can be specified multiple times. (e.g. L1.*C0[23]S.)\n"
}
,
"filter_mask": {
"type":
"string",
"description": "Type: `string`. Run parameters override, custom pass filter mask",
"help_text": "Type: `string`. Run parameters override, custom pass filter mask.\n"
}
,
"flowcell_id": {
"type":
"string",
"description": "Type: `string`. Run parameters override, flowcell ID",
"help_text": "Type: `string`. Run parameters override, flowcell ID.\n"
}
,
"force_index_orientation": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Do not attempt to find orientation for I1/I2 reads (reverse complement)",
"help_text": "Type: `boolean_true`, default: `false`. Do not attempt to find orientation for I1/I2 reads (reverse complement).\nUse orientation given in run manifest.\n"
,
"default":false
}
,
"group_fastq": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Group all FASTQ/stats/metrics for a project in the project folder",
"help_text": "Type: `boolean_true`, default: `false`. Group all FASTQ/stats/metrics for a project in the project folder.\n"
,
"default":false
}
,
"i1_cycles": {
"type":
"integer",
"description": "Type: `integer`. Run parameters override, I1 cycles",
"help_text": "Type: `integer`. Run parameters override, I1 cycles.\n"
}
,
"i2_cycles": {
"type":
"integer",
"description": "Type: `integer`. Run parameters override, I2 cycles\n",
"help_text": "Type: `integer`. Run parameters override, I2 cycles\n"
}
,
"include_tile": {
"type":
"string",
"description": "Type: List of `string`, multiple_sep: `\";\"`. Regex matching tile names to include",
"help_text": "Type: List of `string`, multiple_sep: `\";\"`. Regex matching tile names to include. This flag\ncan be specified multiple times. (e.g. L1.*C0[23]S.)\n"
}
,
"kit_configuration": {
"type":
"string",
"description": "Type: `string`. Run parameters override, kit configuration",
"help_text": "Type: `string`. Run parameters override, kit configuration.\n"
}
,
"legacy_fastq": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Legacy naming for FASTQ files (e.g. SampleName_S1_L001_R1_001.fastq.gz)",
"help_text": "Type: `boolean_true`, default: `false`. Legacy naming for FASTQ files (e.g. SampleName_S1_L001_R1_001.fastq.gz)\n"
,
"default":false
}
,
"log_level": {
"type":
"string",
"description": "Type: `string`, example: `INFO`, choices: `DEBUG`, `INFO`, `WARNING`, `ERROR`. Severity level for logging",
"help_text": "Type: `string`, example: `INFO`, choices: `DEBUG`, `INFO`, `WARNING`, `ERROR`. Severity level for logging.\n",
"enum": ["DEBUG", "INFO", "WARNING", "ERROR"]
}
,
"no_error_on_invalid": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Skip invalid files and continue execution",
"help_text": "Type: `boolean_true`, default: `false`. Skip invalid files and continue execution. Overridden by --strict options\n"
,
"default":false
}
,
"no_projects": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Disable project directories\n",
"help_text": "Type: `boolean_true`, default: `false`. Disable project directories\n"
,
"default":false
}
,
"num_unassigned": {
"type":
"integer",
"description": "Type: `integer`, example: `30`. Maximum number of unassigned sequences to report",
"help_text": "Type: `integer`, example: `30`. Maximum number of unassigned sequences to report.\n"
}
,
"preparation_workflow": {
"type":
"string",
"description": "Type: `string`. Run parameters override, preparation workflow",
"help_text": "Type: `string`. Run parameters override, preparation workflow. \n"
}
,
"qc_only": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Quickly generate run stats for single tile without generating FASTQ",
"help_text": "Type: `boolean_true`, default: `false`. Quickly generate run stats for single tile without generating FASTQ.\nUse --include_tile/--exclude_tile to define custom tile set.\n"
,
"default":false
}
,
"r1_cycles": {
"type":
"integer",
"description": "Type: `integer`. Run parameters override, R1 cycles",
"help_text": "Type: `integer`. Run parameters override, R1 cycles.\n"
}
,
"r2_cycles": {
"type":
"integer",
"description": "Type: `integer`. Run parameters override, R2 cycles",
"help_text": "Type: `integer`. Run parameters override, R2 cycles.\n"
}
,
"split_lanes": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Split FASTQ files by lane",
"help_text": "Type: `boolean_true`, default: `false`. Split FASTQ files by lane.\n"
,
"default":false
}
,
"strict": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. In strict mode, any invalid or missing input file will terminate execution\n(overrides --no_error_on_invalid and sets --error_on_missing)\n",
"help_text": "Type: `boolean_true`, default: `false`. In strict mode, any invalid or missing input file will terminate execution\n(overrides --no_error_on_invalid and sets --error_on_missing)\n"
,
"default":false
}
}
},
"input" : {
"title": "Input",
"type": "object",
"description": "No description",
"properties": {
"analysis_directory": {
"type":
"string",
"description": "Type: `file`, required, example: `input`. Location of analysis directory",
"help_text": "Type: `file`, required, example: `input`. Location of analysis directory"
}
,
"run_manifest": {
"type":
"string",
"description": "Type: `file`. Location of run manifest to use instead of default RunManifest.csv found in analysis directory",
"help_text": "Type: `file`. Location of run manifest to use instead of default RunManifest.csv found in analysis directory"
}
}
},
"output" : {
"title": "Output",
"type": "object",
"description": "No description",
"properties": {
"output_directory": {
"type":
"string",
"description": "Type: `file`, required, default: `$id.$key.output_directory.output_directory`, example: `fastq_dir`. Location to save output fastqs",
"help_text": "Type: `file`, required, default: `$id.$key.output_directory.output_directory`, example: `fastq_dir`. Location to save output fastqs"
,
"default":"$id.$key.output_directory.output_directory"
}
,
"report": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.report.report`. Output location for the HTML report",
"help_text": "Type: `file`, default: `$id.$key.report.report`. Output location for the HTML report"
,
"default":"$id.$key.report.report"
}
,
"logs": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.logs.logs`, example: `logs_dir`. Directory containing log files",
"help_text": "Type: `file`, default: `$id.$key.logs.logs`, example: `logs_dir`. Directory containing log files"
,
"default":"$id.$key.logs.logs"
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map correspond to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are resolved relative to the location of the parameter file. No such resolution is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/arguments"
},
{
"$ref": "#/definitions/input"
},
{
"$ref": "#/definitions/output"
},
{
"$ref": "#/definitions/nextflow input-output arguments"
}
]
}
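
The schema above groups parameters under `definitions` and pulls the groups together through `allOf` references. As a rough illustration of how a consumer might flatten that layout and run basic type and `enum` checks, here is a Python sketch using only the standard library. The embedded schema is a trimmed copy with a handful of the properties shown above, and the helper names `collect_properties` and `check_params` are hypothetical; this is not how Nextflow or Viash validate parameters.

```python
import json

# Trimmed copy of the schema above; only a few properties are kept.
SCHEMA = json.loads("""
{
  "definitions": {
    "arguments": {
      "properties": {
        "demux_only": {"type": "boolean", "default": false},
        "i1_cycles": {"type": "integer"},
        "log_level": {"type": "string",
                      "enum": ["DEBUG", "INFO", "WARNING", "ERROR"]}
      }
    },
    "output": {
      "properties": {
        "report": {"type": "string", "default": "$id.$key.report.report"}
      }
    }
  },
  "allOf": [
    {"$ref": "#/definitions/arguments"},
    {"$ref": "#/definitions/output"}
  ]
}
""")


def collect_properties(schema: dict) -> dict:
    """Merge the properties of every definition referenced from allOf."""
    merged = {}
    for entry in schema["allOf"]:
        # "#/definitions/<name>" -> "<name>"
        name = entry["$ref"].rsplit("/", 1)[-1]
        merged.update(schema["definitions"][name]["properties"])
    return merged


def check_params(params: dict, properties: dict) -> list:
    """Return a list of problems; an empty list means none were found."""
    py_types = {"boolean": bool, "integer": int, "string": str}
    problems = []
    for key, value in params.items():
        spec = properties.get(key)
        if spec is None:
            problems.append(f"unknown parameter: {key}")
            continue
        expected = spec["type"]
        type_ok = isinstance(value, py_types[expected])
        # bool is a subclass of int in Python, so reject it for integers.
        if expected == "integer" and isinstance(value, bool):
            type_ok = False
        if not type_ok:
            problems.append(f"{key}: expected {expected}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key}: must be one of {spec['enum']}")
    return problems


if __name__ == "__main__":
    props = collect_properties(SCHEMA)
    print(check_params({"i1_cycles": 8, "log_level": "TRACE"}, props))
```

A full validator would also need to handle `required` fields and list-valued arguments with `multiple_sep`, which this sketch leaves out.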

@@ -0,0 +1,445 @@
name: "bcl_convert"
version: "v0.3.0"
authors:
- name: "Toni Verbeiren"
roles:
- "author"
- "maintainer"
info:
links:
github: "tverbeiren"
linkedin: "verbeiren"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist and CEO"
- name: "Dorien Roosen"
roles:
- "author"
info:
links:
email: "dorien@data-intuitive.com"
github: "dorien-er"
linkedin: "dorien-roosen"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--bcl_input_directory"
alternatives:
- "-i"
description: "Input run directory"
info: null
example:
- "bcl_dir"
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--sample_sheet"
alternatives:
- "-s"
description: "Path to SampleSheet.csv file (default searched for in --bcl_input_directory)"
info: null
example:
- "bcl_dir/sample_sheet.csv"
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--run_info"
description: "Path to RunInfo.xml file (default root of BCL input directory)"
info: null
example:
- "bcl_dir/RunInfo.xml"
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Lane and tile settings"
arguments:
- type: "integer"
name: "--bcl_only_lane"
description: "Convert only specified lane number (default all lanes)"
info: null
example:
- 1
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean"
name: "--first_tile_only"
description: "Only convert first tile of input (for testing & debugging)"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--tiles"
description: "Process only a subset of tiles by a regular expression"
info: null
example:
- "s_[0-9]+_1"
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--exclude_tiles"
description: "Exclude set of tiles by a regular expression"
info: null
example:
- "s_[0-9]+_1"
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Resource arguments"
arguments:
- type: "boolean"
name: "--shared_thread_odirect_output"
description: "Use linux native asynchronous io (io_submit) for file output (Default=false)"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--bcl_num_parallel_tiles"
description: "\\# of tiles to process in parallel (default 1)"
info: null
example:
- 1
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--bcl_num_conversion_threads"
description: "\\# of threads for conversion (per tile, default # cpu threads)"
info: null
example:
- 1
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--bcl_num_compression_threads"
description: "\\# of threads for fastq.gz output compression (per tile, default\
\ # cpu threads, or HW+12)"
info: null
example:
- 1
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--bcl_num_decompression_threads"
description: "\\# of threads for bcl/cbcl input decompression (per tile, default\
\ half # cpu threads, or HW+8). Only applies when preloading files"
info: null
example:
- 1
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Run arguments"
arguments:
- type: "boolean"
name: "--bcl_only_matched_reads"
description: "For pure BCL conversion, do not output files for 'Undetermined'\
\ [unmatched] reads (output by default)"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean"
name: "--no_lane_splitting"
description: "Do not split FASTQ file by lane (false by default)"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--num_unknown_barcodes_reported"
description: "\\# of Top Unknown Barcodes to output (1000 by default)"
info: null
example:
- 1000
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean"
name: "--bcl_validate_sample_sheet_only"
description: "Only validate RunInfo.xml & SampleSheet files (produce no FASTQ\
\ files)"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean"
name: "--strict_mode"
description: "Abort if any files are missing (false by default)"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean"
name: "--sample_name_column_enabled"
description: "Use sample sheet 'Sample_Name' column when naming fastq files &\
\ subdirectories"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--output_directory"
alternatives:
- "-o"
description: "Output directory containing fastq files"
info: null
example:
- "fastq_dir"
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
- type: "boolean"
name: "--bcl_sampleproject_subdirectories"
description: "Output to subdirectories based upon sample sheet 'Sample_Project'\
\ column"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--fastq_gzip_compression_level"
description: "Set fastq output compression level 0-9 (default 1)"
info: null
example:
- 1
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--reports"
description: "Reports directory"
info: null
example:
- "reports_dir"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--logs"
description: "Logs directory"
info: null
example:
- "logs_dir"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
description: "Convert bcl files to fastq files using bcl-convert.\nInformation about\
\ upgrading from bcl2fastq via\n[Upgrading from bcl2fastq to BCL Convert](https://emea.support.illumina.com/bulletins/2020/10/upgrading-from-bcl2fastq-to-bcl-convert.html)\n\
and [BCL Convert Compatible Products](https://support.illumina.com/sequencing/sequencing_software/bcl-convert/compatibility.html)\n"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
info: null
status: "enabled"
requirements:
commands:
- "ps"
keywords:
- "demultiplex"
- "fastq"
- "bcl"
- "illumina"
license: "Proprietary"
links:
repository: "https://github.com/viash-hub/biobox"
homepage: "https://support.illumina.com/sequencing/sequencing_software/bcl-convert.html"
documentation: "https://support.illumina.com/downloads/bcl-convert-user-guide.html"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "debian:trixie-slim"
target_registry: "images.viash-hub.com"
target_tag: "v0.3.0"
namespace_separator: "/"
setup:
- type: "apt"
packages:
- "wget"
- "gdb"
- "which"
- "hostname"
- "alien"
- "procps"
interactive: false
- type: "docker"
run:
- "wget https://s3.amazonaws.com/webdata.illumina.com/downloads/software/bcl-convert/bcl-convert-4.2.7-2.el8.x86_64.rpm\
\ -O /tmp/bcl-convert.rpm && \\\nalien -i /tmp/bcl-convert.rpm && \\\nrm -rf\
\ /var/lib/apt/lists/* && \\\nrm /tmp/bcl-convert.rpm\n"
- type: "docker"
run:
- "echo \"bcl-convert: \\\"$(bcl-convert -V 2>&1 >/dev/null | sed -n '/Version/\
\ s/^bcl-convert\\ Version //p')\\\"\" > /var/software_versions.txt\n"
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/bcl_convert/config.vsh.yaml"
runner: "nextflow"
engine: "docker|native"
output: "target/nextflow/bcl_convert"
executable: "target/nextflow/bcl_convert/main.nf"
viash_version: "0.9.0"
git_commit: "d86bd5cf62104af02caa852aacd352b1aa97ed60"
git_remote: "https://github.com/viash-hub/biobox"
git_tag: "v0.2.0-29-gd86bd5c"
package_config:
name: "biobox"
version: "v0.3.0"
description: "A collection of bioinformatics tools for working with sequence data.\n"
info: null
viash_version: "0.9.0"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.0'"
keywords:
- "bioinformatics"
- "modules"
- "sequencing"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/biobox"
issue_tracker: "https://github.com/viash-hub/biobox/issues"
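
The `git_tag` value in `build_info` above (`v0.2.0-29-gd86bd5c`) appears to follow the `git describe` format: the most recent tag, the number of commits since that tag, and an abbreviated commit hash prefixed with `g`. A small Python sketch, assuming that format, that splits the value and checks the abbreviated hash against the `git_commit` field; the function name `parse_git_describe` is hypothetical.

```python
import re


def parse_git_describe(value: str):
    """Split '<tag>-<n>-g<hash>' into (tag, commits_since_tag, short_hash)."""
    match = re.fullmatch(r"(.+)-(\d+)-g([0-9a-f]+)", value)
    if match is None:
        raise ValueError(f"not in git-describe format: {value!r}")
    tag, ahead, short_hash = match.groups()
    return tag, int(ahead), short_hash


if __name__ == "__main__":
    # Values copied from the build_info section above.
    tag, ahead, short_hash = parse_git_describe("v0.2.0-29-gd86bd5c")
    commit = "d86bd5cf62104af02caa852aacd352b1aa97ed60"
    print(tag, ahead, commit.startswith(short_hash))
```

Read this way, the component was built 29 commits after the `v0.2.0` tag, even though the package `version` is already set to `v0.3.0`.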

File diff suppressed because it is too large

@@ -0,0 +1,126 @@
manifest {
name = 'bcl_convert'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.3.0'
description = 'Convert bcl files to fastq files using bcl-convert.\nInformation about upgrading from bcl2fastq via\n[Upgrading from bcl2fastq to BCL Convert](https://emea.support.illumina.com/bulletins/2020/10/upgrading-from-bcl2fastq-to-bcl-convert.html)\nand [BCL Convert Compatible Products](https://support.illumina.com/sequencing/sequencing_software/bcl-convert/compatibility.html)\n'
author = 'Toni Verbeiren, Dorien Roosen'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
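
The `tempDir` expression in the config above walks a chain of environment variables with Groovy's Elvis operator (`?:`) and falls back to `/tmp`. The same lookup order can be sketched in Python as follows; the function name `detect_temp_dir` and the injectable `env` argument are assumptions made for illustration and testing, and the sketch assumes a POSIX filesystem.

```python
import os
from pathlib import Path

# Same lookup order as the Groovy expression in the config.
TEMP_VARIABLES = ("NXF_TEMP", "VIASH_TEMP", "TEMPDIR", "TMPDIR")


def detect_temp_dir(env=None) -> Path:
    """Return the first non-empty temp directory setting, else /tmp."""
    env = os.environ if env is None else env
    for name in TEMP_VARIABLES:
        value = env.get(name)
        # Groovy's ?: treats both null and the empty string as false,
        # so unset and empty variables are skipped alike.
        if value:
            return Path(value).absolute()
    return Path("/tmp").absolute()


if __name__ == "__main__":
    print(detect_temp_dir())
    print(detect_temp_dir({"VIASH_TEMP": "/scratch", "TMPDIR": "/var/tmp"}))
```

The resulting directory only affects container runs when the `mount_temp` profile is selected, since that profile is where `docker.temp`, `podman.temp`, and `charliecloud.temp` are set.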

@@ -0,0 +1,349 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "bcl_convert",
"description": "Convert bcl files to fastq files using bcl-convert.\nInformation about upgrading from bcl2fastq via\n[Upgrading from bcl2fastq to BCL Convert](https://emea.support.illumina.com/bulletins/2020/10/upgrading-from-bcl2fastq-to-bcl-convert.html)\nand [BCL Convert Compatible Products](https://support.illumina.com/sequencing/sequencing_software/bcl-convert/compatibility.html)\n",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"bcl_input_directory": {
"type":
"string",
"description": "Type: `file`, required, example: `bcl_dir`. Input run directory",
"help_text": "Type: `file`, required, example: `bcl_dir`. Input run directory"
}
,
"sample_sheet": {
"type":
"string",
"description": "Type: `file`, example: `bcl_dir/sample_sheet.csv`. Path to SampleSheet.csv file (default searched for in --bcl_input_directory)",
"help_text": "Type: `file`, example: `bcl_dir/sample_sheet.csv`. Path to SampleSheet.csv file (default searched for in --bcl_input_directory)"
}
,
"run_info": {
"type":
"string",
"description": "Type: `file`, example: `bcl_dir/RunInfo.xml`. Path to RunInfo.xml file (default root of BCL input directory)",
"help_text": "Type: `file`, example: `bcl_dir/RunInfo.xml`. Path to RunInfo.xml file (default root of BCL input directory)"
}
}
},
"lane and tile settings" : {
"title": "Lane and tile settings",
"type": "object",
"description": "No description",
"properties": {
"bcl_only_lane": {
"type":
"integer",
"description": "Type: `integer`, example: `1`. Convert only specified lane number (default all lanes)",
"help_text": "Type: `integer`, example: `1`. Convert only specified lane number (default all lanes)"
}
,
"first_tile_only": {
"type":
"boolean",
"description": "Type: `boolean`, example: `true`. Only convert first tile of input (for testing \u0026 debugging)",
"help_text": "Type: `boolean`, example: `true`. Only convert first tile of input (for testing \u0026 debugging)"
}
,
"tiles": {
"type":
"string",
"description": "Type: `string`, example: `s_[0-9]+_1`. Process only a subset of tiles by a regular expression",
"help_text": "Type: `string`, example: `s_[0-9]+_1`. Process only a subset of tiles by a regular expression"
}
,
"exclude_tiles": {
"type":
"string",
"description": "Type: `string`, example: `s_[0-9]+_1`. Exclude set of tiles by a regular expression",
"help_text": "Type: `string`, example: `s_[0-9]+_1`. Exclude set of tiles by a regular expression"
}
}
},
"resource arguments" : {
"title": "Resource arguments",
"type": "object",
"description": "No description",
"properties": {
"shared_thread_odirect_output": {
"type":
"boolean",
"description": "Type: `boolean`, example: `true`. Use linux native asynchronous io (io_submit) for file output (Default=false)",
"help_text": "Type: `boolean`, example: `true`. Use linux native asynchronous io (io_submit) for file output (Default=false)"
}
,
"bcl_num_parallel_tiles": {
"type":
"integer",
"description": "Type: `integer`, example: `1`. \\# of tiles to process in parallel (default 1)",
"help_text": "Type: `integer`, example: `1`. \\# of tiles to process in parallel (default 1)"
}
,
"bcl_num_conversion_threads": {
"type":
"integer",
"description": "Type: `integer`, example: `1`. \\# of threads for conversion (per tile, default # cpu threads)",
"help_text": "Type: `integer`, example: `1`. \\# of threads for conversion (per tile, default # cpu threads)"
}
,
"bcl_num_compression_threads": {
"type":
"integer",
"description": "Type: `integer`, example: `1`. \\# of threads for fastq.gz output compression (per tile, default # cpu threads, or HW+12)",
"help_text": "Type: `integer`, example: `1`. \\# of threads for fastq.gz output compression (per tile, default # cpu threads, or HW+12)"
}
,
"bcl_num_decompression_threads": {
"type":
"integer",
"description": "Type: `integer`, example: `1`. \\# of threads for bcl/cbcl input decompression (per tile, default half # cpu threads, or HW+8)",
"help_text": "Type: `integer`, example: `1`. \\# of threads for bcl/cbcl input decompression (per tile, default half # cpu threads, or HW+8). Only applies when preloading files"
}
}
},
"run arguments" : {
"title": "Run arguments",
"type": "object",
"description": "No description",
"properties": {
"bcl_only_matched_reads": {
"type":
"boolean",
"description": "Type: `boolean`, example: `true`. For pure BCL conversion, do not output files for \u0027Undetermined\u0027 [unmatched] reads (output by default)",
"help_text": "Type: `boolean`, example: `true`. For pure BCL conversion, do not output files for \u0027Undetermined\u0027 [unmatched] reads (output by default)"
}
,
"no_lane_splitting": {
"type":
"boolean",
"description": "Type: `boolean`, example: `true`. Do not split FASTQ file by lane (false by default)",
"help_text": "Type: `boolean`, example: `true`. Do not split FASTQ file by lane (false by default)"
}
,
"num_unknown_barcodes_reported": {
"type":
"integer",
"description": "Type: `integer`, example: `1000`. \\# of Top Unknown Barcodes to output (1000 by default)",
"help_text": "Type: `integer`, example: `1000`. \\# of Top Unknown Barcodes to output (1000 by default)"
}
,
"bcl_validate_sample_sheet_only": {
"type":
"boolean",
"description": "Type: `boolean`, example: `true`. Only validate RunInfo.xml \u0026 SampleSheet files (produce no FASTQ files)",
"help_text": "Type: `boolean`, example: `true`. Only validate RunInfo.xml \u0026 SampleSheet files (produce no FASTQ files)"
}
,
"strict_mode": {
"type":
"boolean",
"description": "Type: `boolean`, example: `true`. Abort if any files are missing (false by default)",
"help_text": "Type: `boolean`, example: `true`. Abort if any files are missing (false by default)"
}
,
"sample_name_column_enabled": {
"type":
"boolean",
"description": "Type: `boolean`, example: `true`. Use sample sheet \u0027Sample_Name\u0027 column when naming fastq files \u0026 subdirectories",
"help_text": "Type: `boolean`, example: `true`. Use sample sheet \u0027Sample_Name\u0027 column when naming fastq files \u0026 subdirectories"
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output_directory": {
"type":
"string",
"description": "Type: `file`, required, default: `$id.$key.output_directory.output_directory`, example: `fastq_dir`. Output directory containing fastq files",
"help_text": "Type: `file`, required, default: `$id.$key.output_directory.output_directory`, example: `fastq_dir`. Output directory containing fastq files"
,
"default":"$id.$key.output_directory.output_directory"
}
,
"bcl_sampleproject_subdirectories": {
"type":
"boolean",
"description": "Type: `boolean`, example: `true`. Output to subdirectories based upon sample sheet \u0027Sample_Project\u0027 column",
"help_text": "Type: `boolean`, example: `true`. Output to subdirectories based upon sample sheet \u0027Sample_Project\u0027 column"
}
,
"fastq_gzip_compression_level": {
"type":
"integer",
"description": "Type: `integer`, example: `1`. Set fastq output compression level 0-9 (default 1)",
"help_text": "Type: `integer`, example: `1`. Set fastq output compression level 0-9 (default 1)"
}
,
"reports": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.reports.reports`, example: `reports_dir`. Reports directory",
"help_text": "Type: `file`, default: `$id.$key.reports.reports`, example: `reports_dir`. Reports directory"
,
"default":"$id.$key.reports.reports"
}
,
"logs": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.logs.logs`, example: `logs_dir`. Logs directory",
"help_text": "Type: `file`, default: `$id.$key.logs.logs`, example: `logs_dir`. Logs directory"
,
"default":"$id.$key.logs.logs"
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map correspond to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative paths are resolved relative to the location of the parameter file. No such resolution is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
},
{
"$ref": "#/definitions/lane and tile settings"
},
{
"$ref": "#/definitions/resource arguments"
},
{
"$ref": "#/definitions/run arguments"
},
{
"$ref": "#/definitions/output arguments"
},
{
"$ref": "#/definitions/nextflow input-output arguments"
}
]
}
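
The `param_list` help text in the schema above describes a JSON file as a list of maps whose keys match the pipeline arguments. A minimal sketch of generating such a file in Python follows. The argument names (`output_directory`, `no_lane_splitting`, `fastq_gzip_compression_level`) come from the schema above; the helper name `build_param_list`, the run ids and the directory names are hypothetical and only serve as illustration.

```python
import json


def build_param_list(run_ids):
    """Return one parameter map per run id (hypothetical ids and paths)."""
    return [
        {
            "id": run_id,
            # Argument names below are taken from the schema above.
            "output_directory": f"fastq_{run_id}",
            "no_lane_splitting": True,
            "fastq_gzip_compression_level": 1,
        }
        for run_id in run_ids
    ]


param_list = build_param_list(["run1", "run2"])
param_list_json = json.dumps(param_list, indent=2)
print(param_list_json)
```

Written to a file such as `data.json`, the printed text could then be passed as `--param_list data.json`, as described in the help text.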

@@ -0,0 +1,344 @@
name: "falco"
version: "v0.3.0"
authors:
- name: "Toni Verbeiren"
roles:
- "author"
- "maintainer"
info:
links:
github: "tverbeiren"
linkedin: "verbeiren"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist and CEO"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--input"
description: "input fastq files"
info: null
example:
- "input1.fastq;input2.fastq"
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: true
multiple_sep: ";"
- name: "Run arguments"
arguments:
- type: "boolean_true"
name: "--nogroup"
description: "Disable grouping of bases for reads >50bp. \nAll reports will show\
\ data for every base in \nthe read. WARNING: When using this option, \nyour\
\ plots may end up a ridiculous size. You \nhave been warned!\n"
info: null
direction: "input"
- type: "file"
name: "--contaminents"
description: "Specifies a non-default file which contains \nthe list of contaminants\
\ to screen \noverrepresented sequences against. The file \nmust contain sets\
\ of named contaminants in \nthe form name[tab]sequence. Lines prefixed \nwith\
\ a hash will be ignored. Default: \nhttps://github.com/smithlabcode/falco/blob/v1.2.2/Configuration/contaminant_list.txt\n"
info: null
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--adapters"
description: "Specifies a non-default file which contains \nthe list of adapter\
\ sequences which will be \nexplicitly searched against the library. The \nfile\
\ must contain sets of named adapters in \nthe form name[tab]sequence. Lines\
\ prefixed \nwith a hash will be ignored. Default:\nhttps://github.com/smithlabcode/falco/blob/v1.2.2/Configuration/adapter_list.txt\n"
info: null
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--limits"
description: "Specifies a non-default file which contains \na set of criteria\
\ which will be used to \ndetermine the warn/error limits for the \nvarious\
\ modules. This file can also be used \nto selectively remove some modules from\
\ the \noutput all together. The format needs to \nmirror the default limits.txt\
\ file found in \nthe Configuration folder. Default: \nhttps://github.com/smithlabcode/falco/blob/v1.2.2/Configuration/limits.txt\n"
info: null
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--subsample"
alternatives:
- "-s"
description: "[Falco only] makes falco faster (but \npossibly less accurate) by\
\ only processing \nreads that are a multiple of this value (using \n0-based\
\ indexing to number reads).\n"
info: null
example:
- 10
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--bisulfite"
alternatives:
- "-b"
description: "[Falco only] reads are whole genome \nbisulfite sequencing, and\
\ more Ts and fewer \nCs are therefore expected and will be \naccounted for\
\ in base content.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--reverse_complement"
alternatives:
- "-r"
description: "[Falco only] The input is a \nreverse-complement. All modules will\
\ be \ntested by swapping A/T and C/G\n"
info: null
direction: "input"
- name: "Output arguments"
arguments:
- type: "file"
name: "--outdir"
alternatives:
- "-o"
description: "Create all output files in the specified \noutput directory. FALCO-SPECIFIC:\
\ If the \ndirectory does not exist, the program will \ncreate it.\n"
info: null
example:
- "output"
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--format"
alternatives:
- "-f"
description: "Bypasses the normal sequence file format \ndetection and forces\
\ the program to use the \nspecified format. Valid formats are bam, sam, \nbam_mapped,\
\ sam_mapped, fastq, fq, fastq.gz \nor fq.gz.\n"
info: null
required: false
choices:
- "bam"
- "sam"
- "bam_mapped"
- "sam_mapped"
- "fastq"
- "fq"
- "fastq.gz"
- "fq.gz"
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--data_filename"
alternatives:
- "-D"
description: "[Falco only] Specify filename for FastQC \ndata output (TXT). If\
\ not specified, it will \nbe called fastq_data.txt in either the input \nfile's\
\ directory or the one specified in the \n--output flag. Only available when\
\ running \nfalco with a single input.\n"
info: null
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--report_filename"
alternatives:
- "-R"
description: "[Falco only] Specify filename for FastQC \nreport output (HTML).\
\ If not specified, it \nwill be called fastq_report.html in either \nthe input\
\ file's directory or the one \nspecified in the --output flag. Only \navailable\
\ when running falco with a single \ninput.\n"
info: null
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--summary_filename"
alternatives:
- "-S"
description: "[Falco only] Specify filename for the short \nsummary output (TXT).\
\ If not specified, it \nwill be called summary.txt in either \nthe input\
\ file's directory or the one \nspecified in the --output flag. Only \navailable\
\ when running falco with a single \ninput.\n"
info: null
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
description: "A C++ drop-in replacement of FastQC to assess the quality of sequence\
\ read data"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
info: null
status: "enabled"
requirements:
commands:
- "ps"
keywords:
- "qc"
- "fastqc"
- "sequencing"
license: "GPL-3.0"
references:
doi:
- "10.12688/f1000research.21142.2"
links:
repository: "https://github.com/smithlabcode/falco"
documentation: "https://falco.readthedocs.io/en/latest/"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "debian:trixie-slim"
target_registry: "images.viash-hub.com"
target_tag: "v0.3.0"
namespace_separator: "/"
setup:
- type: "apt"
packages:
- "wget"
- "build-essential"
- "g++"
- "zlib1g-dev"
- "procps"
interactive: false
- type: "docker"
run:
- "wget https://github.com/smithlabcode/falco/releases/download/v1.2.2/falco-1.2.2.tar.gz\
\ -O /tmp/falco.tar.gz && \\\ncd /tmp && \\\ntar xvf falco.tar.gz && \\\ncd\
\ falco-1.2.2 && \\\n./configure && \\\nmake all && \\\nmake install\n"
- type: "docker"
run:
- "echo \"falco: \\\"$(falco -v | sed -n 's/^falco //p')\\\"\" > /var/software_versions.txt\n"
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/falco/config.vsh.yaml"
runner: "nextflow"
engine: "docker|native"
output: "target/nextflow/falco"
executable: "target/nextflow/falco/main.nf"
viash_version: "0.9.0"
git_commit: "d86bd5cf62104af02caa852aacd352b1aa97ed60"
git_remote: "https://github.com/viash-hub/biobox"
git_tag: "v0.2.0-29-gd86bd5c"
package_config:
name: "biobox"
version: "v0.3.0"
description: "A collection of bioinformatics tools for working with sequence data.\n"
info: null
viash_version: "0.9.0"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.0'"
keywords:
- "bioinformatics"
- "modules"
- "sequencing"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/biobox"
issue_tracker: "https://github.com/viash-hub/biobox/issues"
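
The `--input` argument in the config above is declared with `multiple: true` and `multiple_sep: ";"`, so several FASTQ files can be passed as one delimited string such as `input1.fastq;input2.fastq`. The sketch below only illustrates what the separator means; the function name `split_multiple` is hypothetical and this is not Viash's own parsing code.

```python
def split_multiple(value, sep=";"):
    """Split a delimited argument value into its items, dropping empty ones."""
    return [item for item in value.split(sep) if item]


inputs = split_multiple("input1.fastq;input2.fastq")
print(inputs)
```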

File diff suppressed because it is too large

@@ -0,0 +1,126 @@
manifest {
name = 'falco'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.3.0'
description = 'A C++ drop-in replacement of FastQC to assess the quality of sequence read data'
author = 'Toni Verbeiren'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
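
The memory labels in the `process` block above use two unit conventions: the `gb`/`tb` labels are decimal (powers of 10), while the `gib`/`tib` labels are binary (powers of 2). A small sketch that reproduces the byte values from a label name is given below. The helper `label_to_bytes` is hypothetical and is not part of Viash or Nextflow; it only restates the arithmetic behind the values listed in the config.

```python
import re

# Bytes per unit: decimal units for gb/tb, binary units for gib/tib.
UNIT_BYTES = {
    "gb": 10**9,
    "tb": 10**12,
    "gib": 2**30,
    "tib": 2**40,
}


def label_to_bytes(label):
    """Convert a label such as 'mem8gib' to its size in bytes."""
    match = re.fullmatch(r"mem(\d+)(gb|tb|gib|tib)", label)
    if match is None:
        raise ValueError(f"not a memory label: {label!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * UNIT_BYTES[unit]


for name in ("mem1gb", "mem8gib", "mem1tb", "mem512tib"):
    print(name, label_to_bytes(name))
```

For example, `mem8gib` is 8 × 2^30 = 8589934592 bytes, matching the `withLabel: mem8gib` entry above.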

@@ -0,0 +1,227 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "falco",
"description": "A C++ drop-in replacement of FastQC to assess the quality of sequence read data",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type":
"string",
"description": "Type: List of `file`, required, example: `input1.fastq;input2.fastq`, multiple_sep: `\";\"`. input fastq files",
"help_text": "Type: List of `file`, required, example: `input1.fastq;input2.fastq`, multiple_sep: `\";\"`. input fastq files"
}
}
},
"run arguments" : {
"title": "Run arguments",
"type": "object",
"description": "No description",
"properties": {
"nogroup": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Disable grouping of bases for reads \u003e50bp",
"help_text": "Type: `boolean_true`, default: `false`. Disable grouping of bases for reads \u003e50bp. \nAll reports will show data for every base in \nthe read. WARNING: When using this option, \nyour plots may end up a ridiculous size. You \nhave been warned!\n"
,
"default":false
}
,
"contaminents": {
"type":
"string",
"description": "Type: `file`. Specifies a non-default file which contains \nthe list of contaminants to screen \noverrepresented sequences against",
"help_text": "Type: `file`. Specifies a non-default file which contains \nthe list of contaminants to screen \noverrepresented sequences against. The file \nmust contain sets of named contaminants in \nthe form name[tab]sequence. Lines prefixed \nwith a hash will be ignored. Default: \nhttps://github.com/smithlabcode/falco/blob/v1.2.2/Configuration/contaminant_list.txt\n"
}
,
"adapters": {
"type":
"string",
"description": "Type: `file`. Specifies a non-default file which contains \nthe list of adapter sequences which will be \nexplicitly searched against the library",
"help_text": "Type: `file`. Specifies a non-default file which contains \nthe list of adapter sequences which will be \nexplicitly searched against the library. The \nfile must contain sets of named adapters in \nthe form name[tab]sequence. Lines prefixed \nwith a hash will be ignored. Default:\nhttps://github.com/smithlabcode/falco/blob/v1.2.2/Configuration/adapter_list.txt\n"
}
,
"limits": {
"type":
"string",
"description": "Type: `file`. Specifies a non-default file which contains \na set of criteria which will be used to \ndetermine the warn/error limits for the \nvarious modules",
"help_text": "Type: `file`. Specifies a non-default file which contains \na set of criteria which will be used to \ndetermine the warn/error limits for the \nvarious modules. This file can also be used \nto selectively remove some modules from the \noutput all together. The format needs to \nmirror the default limits.txt file found in \nthe Configuration folder. Default: \nhttps://github.com/smithlabcode/falco/blob/v1.2.2/Configuration/limits.txt\n"
}
,
"subsample": {
"type":
"integer",
"description": "Type: `integer`, example: `10`. [Falco only] makes falco faster (but \npossibly less accurate) by only processing \nreads that are a multiple of this value (using \n0-based indexing to number reads)",
"help_text": "Type: `integer`, example: `10`. [Falco only] makes falco faster (but \npossibly less accurate) by only processing \nreads that are a multiple of this value (using \n0-based indexing to number reads).\n"
}
,
"bisulfite": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. [Falco only] reads are whole genome \nbisulfite sequencing, and more Ts and fewer \nCs are therefore expected and will be \naccounted for in base content",
"help_text": "Type: `boolean_true`, default: `false`. [Falco only] reads are whole genome \nbisulfite sequencing, and more Ts and fewer \nCs are therefore expected and will be \naccounted for in base content.\n"
,
"default":false
}
,
"reverse_complement": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. [Falco only] The input is a \nreverse-complement",
"help_text": "Type: `boolean_true`, default: `false`. [Falco only] The input is a \nreverse-complement. All modules will be \ntested by swapping A/T and C/G\n"
,
"default":false
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"outdir": {
"type":
"string",
"description": "Type: `file`, required, default: `$id.$key.outdir.outdir`, example: `output`. Create all output files in the specified \noutput directory",
"help_text": "Type: `file`, required, default: `$id.$key.outdir.outdir`, example: `output`. Create all output files in the specified \noutput directory. FALCO-SPECIFIC: If the \ndirectory does not exist, the program will \ncreate it.\n"
,
"default":"$id.$key.outdir.outdir"
}
,
"format": {
"type":
"string",
"description": "Type: `string`, choices: ``bam`, `sam`, `bam_mapped`, `sam_mapped`, `fastq`, `fq`, `fastq.gz`, `fq.gz``. Bypasses the normal sequence file format \ndetection and forces the program to use the \nspecified format",
"help_text": "Type: `string`, choices: ``bam`, `sam`, `bam_mapped`, `sam_mapped`, `fastq`, `fq`, `fastq.gz`, `fq.gz``. Bypasses the normal sequence file format \ndetection and forces the program to use the \nspecified format. Valid formats are bam, sam, \nbam_mapped, sam_mapped, fastq, fq, fastq.gz \nor fq.gz.\n",
"enum": ["bam", "sam", "bam_mapped", "sam_mapped", "fastq", "fq", "fastq.gz", "fq.gz"]
}
,
"data_filename": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.data_filename.data_filename`. [Falco only] Specify filename for FastQC \ndata output (TXT)",
"help_text": "Type: `file`, default: `$id.$key.data_filename.data_filename`. [Falco only] Specify filename for FastQC \ndata output (TXT). If not specified, it will \nbe called fastq_data.txt in either the input \nfile\u0027s directory or the one specified in the \n--output flag. Only available when running \nfalco with a single input.\n"
,
"default":"$id.$key.data_filename.data_filename"
}
,
"report_filename": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.report_filename.report_filename`. [Falco only] Specify filename for FastQC \nreport output (HTML)",
"help_text": "Type: `file`, default: `$id.$key.report_filename.report_filename`. [Falco only] Specify filename for FastQC \nreport output (HTML). If not specified, it \nwill be called fastq_report.html in either \nthe input file\u0027s directory or the one \nspecified in the --output flag. Only \navailable when running falco with a single \ninput.\n"
,
"default":"$id.$key.report_filename.report_filename"
}
,
"summary_filename": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.summary_filename.summary_filename`. [Falco only] Specify filename for the short \nsummary output (TXT)",
"help_text": "Type: `file`, default: `$id.$key.summary_filename.summary_filename`. [Falco only] Specify filename for the short \nsummary output (TXT). If not specified, it \nwill be called summary.txt in either \nthe input file\u0027s directory or the one \nspecified in the --output flag. Only \navailable when running falco with a single \ninput.\n"
,
"default":"$id.$key.summary_filename.summary_filename"
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map correspond to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative paths are resolved relative to the location of the parameter file. No such resolution is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
},
{
"$ref": "#/definitions/run arguments"
},
{
"$ref": "#/definitions/output arguments"
},
{
"$ref": "#/definitions/nextflow input-output arguments"
}
]
}
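
The `subsample` help text above says that falco processes only reads whose number is a multiple of the given value, counting reads from 0. The sketch below restates that selection rule in Python. It is an illustration of the rule as worded in the help text, not falco's actual implementation, and the function name `subsample_reads` is hypothetical.

```python
def subsample_reads(reads, step):
    """Keep the reads whose 0-based index is a multiple of `step`."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return [read for index, read in enumerate(reads) if index % step == 0]


# With 25 reads and a step of 10, reads 0, 10 and 20 are kept.
kept = subsample_reads([f"read{i}" for i in range(25)], 10)
print(kept)
```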

@@ -0,0 +1,483 @@
name: "multiqc"
version: "v0.3.0"
authors:
- name: "Dorien Roosen"
roles:
- "author"
- "maintainer"
info:
links:
email: "dorien@data-intuitive.com"
github: "dorien-er"
linkedin: "dorien-roosen"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist"
argument_groups:
- name: "Input"
arguments:
- type: "file"
name: "--input"
description: "File paths to be searched for analysis results to be included in\
\ the report.\n"
info: null
example:
- "data/results"
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: true
multiple_sep: ";"
- name: "Output"
arguments:
- type: "file"
name: "--output_report"
description: "Filepath of the generated report.\n"
info: null
example:
- "multiqc_report.html"
must_exist: false
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_data"
description: "Output directory for parsed data files. If not provided, parsed\
\ data will not be published.\n"
info: null
example:
- "multiqc_data"
must_exist: false
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_plots"
description: "Output directory for generated plots. If not provided, plots will\
\ not be published.\n"
info: null
example:
- "multiqc_plots"
must_exist: false
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- name: "Modules and analyses to run"
arguments:
- type: "string"
name: "--include_modules"
description: "Use only these modules"
info: null
example:
- "fastqc"
- "cutadapt"
required: false
direction: "input"
multiple: true
multiple_sep: ";"
- type: "string"
name: "--exclude_modules"
description: "Do not use these modules"
info: null
example:
- "fastqc"
- "cutadapt"
required: false
direction: "input"
multiple: true
multiple_sep: ";"
- type: "string"
name: "--ignore_analysis"
info: null
example:
- "run_one/*"
- "run_two/*"
required: false
direction: "input"
multiple: true
multiple_sep: ";"
- type: "string"
name: "--ignore_samples"
info: null
example:
- "sample_1*"
- "sample_3*"
required: false
direction: "input"
multiple: true
multiple_sep: ";"
- type: "boolean_true"
name: "--ignore_symlinks"
description: "Ignore symlinked directories and files"
info: null
direction: "input"
- name: "Sample name handling"
arguments:
- type: "boolean_true"
name: "--dirs"
description: "Prepend directory to sample names to avoid clashing filenames"
info: null
direction: "input"
- type: "integer"
name: "--dirs_depth"
description: "Prepend n directories to sample names. Negative number to take from\
\ start of path."
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--full_names"
description: "Do not clean the sample names (leave as full file name)"
info: null
direction: "input"
- type: "boolean_true"
name: "--fn_as_s_name"
description: "Use the log filename as the sample name"
info: null
direction: "input"
- type: "file"
name: "--replace_names"
description: "TSV file to rename sample names during report generation"
info: null
example:
- "replace_names.tsv"
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Report Customisation"
arguments:
- type: "string"
name: "--title"
description: "Report title. Printed as page header, used for filename if not otherwise\
\ specified.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--comment"
description: "Custom comment, will be printed at the top of the report.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--template"
description: "Report template to use.\n"
info: null
required: false
choices:
- "default"
- "gathered"
- "geo"
- "highcharts"
- "sections"
- "simple"
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--sample_names"
description: "TSV file containing alternative sample names for renaming buttons\
\ in the report.\n"
info: null
example:
- "sample_names.tsv"
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--sample_filters"
description: "TSV file containing show/hide patterns for the report\n"
info: null
example:
- "sample_filters.tsv"
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--custom_css_file"
description: "Custom CSS file to add to the final report\n"
info: null
example:
- "custom_style_sheet.css"
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--profile_runtime"
description: "Add analysis of how long MultiQC takes to run to the report\n"
info: null
direction: "input"
- name: "MultiQC behaviour"
arguments:
- type: "boolean_true"
name: "--verbose"
description: "Increase output verbosity.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--quiet"
description: "Only show log warnings\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--strict"
description: "Don't catch exceptions, run additional code checks to help development.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--development"
description: "Development mode. Do not compress and minimise JS, export uncompressed\
\ plot data.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--require_logs"
description: "Require all explicitly requested modules to have log files. If not,\
\ MultiQC will exit with an error.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--no_megaqc_upload"
description: "Don't upload generated report to MegaQC, even if MegaQC options\
\ are found.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--no_ansi"
description: "Disable coloured log output.\n"
info: null
direction: "input"
- type: "string"
name: "--cl_config"
description: "YAML formatted string that allows to customize MultiQC behaviour\
\ like input file detection.\n"
info: null
example:
- "qualimap_config: { general_stats_coverage: [20,40,200] }"
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output format"
arguments:
- type: "boolean_true"
name: "--flat"
description: "Use only flat plots (static images).\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--interactive"
description: "Use only interactive plots (in-browser Javascript).\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--data_dir"
description: "Force the parsed data directory to be created.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--no_data_dir"
description: "Prevent the parsed data directory from being created.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--zip_data_dir"
description: "Compress the data directory.\n"
info: null
direction: "input"
- type: "string"
name: "--data_format"
description: "Output parsed data in a different format than the default 'txt'.\n"
info: null
required: false
choices:
- "tsv"
- "csv"
- "json"
- "yaml"
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--pdf"
description: "Creates PDF report with the 'simple' template. Requires Pandoc to\
\ be installed.\n"
info: null
direction: "input"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
description: "MultiQC aggregates results from bioinformatics analyses across many\
\ samples into a single report.\nIt searches a given directory for analysis logs\
\ and compiles a HTML report. It's a general use tool, perfect for summarising the\
\ output from numerous bioinformatics tools.\n"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
- type: "file"
path: "test_data"
info:
keywords:
- "QC"
- "html report"
- "aggregate analysis"
links:
homepage: "https://multiqc.info/"
documentation: "https://multiqc.info/docs/"
repository: "https://github.com/MultiQC/MultiQC"
references:
doi: "10.1093/bioinformatics/btw354"
licence: "GPL v3 or later"
status: "enabled"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/biobox"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "quay.io/biocontainers/multiqc:1.21--pyhdfd78af_0"
target_registry: "images.viash-hub.com"
target_tag: "v0.3.0"
namespace_separator: "/"
setup:
- type: "docker"
run:
- "multiqc --version | sed 's/multiqc, version\\s\\(.*\\)/multiqc: \"\\1\"/' >\
\ /var/software_versions.txt\n"
test_setup:
- type: "apt"
packages:
- "jq"
interactive: false
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/multiqc/config.vsh.yaml"
runner: "nextflow"
engine: "docker|native"
output: "target/nextflow/multiqc"
executable: "target/nextflow/multiqc/main.nf"
viash_version: "0.9.0"
git_commit: "d86bd5cf62104af02caa852aacd352b1aa97ed60"
git_remote: "https://github.com/viash-hub/biobox"
git_tag: "v0.2.0-29-gd86bd5c"
package_config:
name: "biobox"
version: "v0.3.0"
description: "A collection of bioinformatics tools for working with sequence data.\n"
info: null
viash_version: "0.9.0"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.0'"
keywords:
- "bioinformatics"
- "modules"
- "sequencing"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/biobox"
issue_tracker: "https://github.com/viash-hub/biobox/issues"
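
The `mem*` entries in the `labels:` map above follow two unit conventions: the `gb`/`tb` names use decimal multiples (10^9 and 10^12 bytes), while the `gib`/`tib` names use binary multiples (2^30 and 2^40 bytes). The following is a minimal Python sketch that reproduces the listed byte counts from a label name. The helper `label_bytes` is hypothetical and is not part of Viash or Nextflow; it only illustrates the arithmetic.

```python
# Hypothetical helper (not part of Viash or Nextflow): reproduces the byte
# counts that the generated mem* labels assign to `memory`.
UNIT_BYTES = {
    "gib": 2**30,   # binary gibibyte
    "tib": 2**40,   # binary tebibyte
    "gb": 10**9,    # decimal gigabyte
    "tb": 10**12,   # decimal terabyte
}

def label_bytes(label: str) -> int:
    """Return the byte count for a label such as 'mem16gib'."""
    if not label.startswith("mem"):
        raise ValueError(f"not a memory label: {label}")
    body = label[len("mem"):]
    for suffix, factor in UNIT_BYTES.items():
        if body.endswith(suffix):
            # The part before the unit suffix is the integer multiplier.
            return int(body[: -len(suffix)]) * factor
    raise ValueError(f"unknown unit in label: {label}")

print(label_bytes("mem1gb"))    # 1000000000
print(label_bytes("mem16gib"))  # 17179869184
```

The two printed values match the `mem1gb` and `mem16gib` entries in the map.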

File diff suppressed because it is too large

@@ -0,0 +1,126 @@
manifest {
name = 'multiqc'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.3.0'
description = 'MultiQC aggregates results from bioinformatics analyses across many samples into a single report.\nIt searches a given directory for analysis logs and compiles a HTML report. It\'s a general use tool, perfect for summarising the output from numerous bioinformatics tools.\n'
author = 'Dorien Roosen'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
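
The `tempDir` assignment near the top of this config picks the first environment variable that is set among `NXF_TEMP`, `VIASH_TEMP`, `TEMPDIR` and `TMPDIR`, and falls back to `/tmp`. A rough Python equivalent of that fallback chain is sketched below. The function name `detect_temp_dir` and the example paths are assumptions made for illustration; like Groovy's `?:` operator, the sketch treats an empty value the same as an unset one.

```python
import os
from pathlib import Path

def detect_temp_dir(env=None) -> Path:
    """Return the first non-empty candidate variable, else /tmp, as an absolute path."""
    env = os.environ if env is None else env
    # Same lookup order as the config above.
    for name in ("NXF_TEMP", "VIASH_TEMP", "TEMPDIR", "TMPDIR"):
        value = env.get(name)
        if value:
            return Path(value).absolute()
    return Path("/tmp").absolute()

# VIASH_TEMP takes precedence over TMPDIR because it is checked earlier.
print(detect_temp_dir({"VIASH_TEMP": "/scratch/viash", "TMPDIR": "/var/tmp"}))
```

The resulting path is what the `mount_temp` profile assigns to `docker.temp`, `podman.temp` and `charliecloud.temp`.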

@@ -0,0 +1,529 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "multiqc",
"description": "MultiQC aggregates results from bioinformatics analyses across many samples into a single report.\nIt searches a given directory for analysis logs and compiles a HTML report. It\u0027s a general use tool, perfect for summarising the output from numerous bioinformatics tools.\n",
"type": "object",
"definitions": {
"input" : {
"title": "Input",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type":
"string",
"description": "Type: List of `file`, required, example: `data/results`, multiple_sep: `\";\"`. File paths to be searched for analysis results to be included in the report",
"help_text": "Type: List of `file`, required, example: `data/results`, multiple_sep: `\";\"`. File paths to be searched for analysis results to be included in the report.\n"
}
}
},
"ouput" : {
"title": "Ouput",
"type": "object",
"description": "No description",
"properties": {
"output_report": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.output_report.html`, example: `multiqc_report.html`. Filepath of the generated report",
"help_text": "Type: `file`, default: `$id.$key.output_report.html`, example: `multiqc_report.html`. Filepath of the generated report.\n"
,
"default":"$id.$key.output_report.html"
}
,
"output_data": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.output_data.output_data`, example: `multiqc_data`. Output directory for parsed data files",
"help_text": "Type: `file`, default: `$id.$key.output_data.output_data`, example: `multiqc_data`. Output directory for parsed data files. If not provided, parsed data will not be published.\n"
,
"default":"$id.$key.output_data.output_data"
}
,
"output_plots": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.output_plots.output_plots`, example: `multiqc_plots`. Output directory for generated plots",
"help_text": "Type: `file`, default: `$id.$key.output_plots.output_plots`, example: `multiqc_plots`. Output directory for generated plots. If not provided, plots will not be published.\n"
,
"default":"$id.$key.output_plots.output_plots"
}
}
},
"modules and analyses to run" : {
"title": "Modules and analyses to run",
"type": "object",
"description": "No description",
"properties": {
"include_modules": {
"type":
"string",
"description": "Type: List of `string`, example: `fastqc;cutadapt`, multiple_sep: `\";\"`. Use only these module",
"help_text": "Type: List of `string`, example: `fastqc;cutadapt`, multiple_sep: `\";\"`. Use only these module"
}
,
"exclude_modules": {
"type":
"string",
"description": "Type: List of `string`, example: `fastqc;cutadapt`, multiple_sep: `\";\"`. Do not use only these modules",
"help_text": "Type: List of `string`, example: `fastqc;cutadapt`, multiple_sep: `\";\"`. Do not use only these modules"
}
,
"ignore_analysis": {
"type":
"string",
"description": "Type: List of `string`, example: `run_one/*;run_two/*`, multiple_sep: `\";\"`. ",
"help_text": "Type: List of `string`, example: `run_one/*;run_two/*`, multiple_sep: `\";\"`. "
}
,
"ignore_samples": {
"type":
"string",
"description": "Type: List of `string`, example: `sample_1*;sample_3*`, multiple_sep: `\";\"`. ",
"help_text": "Type: List of `string`, example: `sample_1*;sample_3*`, multiple_sep: `\";\"`. "
}
,
"ignore_symlinks": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Ignore symlinked directories and files",
"help_text": "Type: `boolean_true`, default: `false`. Ignore symlinked directories and files"
,
"default":false
}
}
},
"sample name handling" : {
"title": "Sample name handling",
"type": "object",
"description": "No description",
"properties": {
"dirs": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Prepend directory to sample names to avoid clashing filenames",
"help_text": "Type: `boolean_true`, default: `false`. Prepend directory to sample names to avoid clashing filenames"
,
"default":false
}
,
"dirs_depth": {
"type":
"integer",
"description": "Type: `integer`. Prepend n directories to sample names",
"help_text": "Type: `integer`. Prepend n directories to sample names. Negative number to take from start of path."
}
,
"full_names": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Do not clean the sample names (leave as full file name)",
"help_text": "Type: `boolean_true`, default: `false`. Do not clean the sample names (leave as full file name)"
,
"default":false
}
,
"fn_as_s_name": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Use the log filename as the sample name",
"help_text": "Type: `boolean_true`, default: `false`. Use the log filename as the sample name"
,
"default":false
}
,
"replace_names": {
"type":
"string",
"description": "Type: `file`, example: `replace_names.tsv`. TSV file to rename sample names during report generation",
"help_text": "Type: `file`, example: `replace_names.tsv`. TSV file to rename sample names during report generation"
}
}
},
"report customisation" : {
"title": "Report Customisation",
"type": "object",
"description": "No description",
"properties": {
"title": {
"type":
"string",
"description": "Type: `string`. Report title",
"help_text": "Type: `string`. Report title. Printed as page header, used for filename if not otherwise specified.\n"
}
,
"comment": {
"type":
"string",
"description": "Type: `string`. Custom comment, will be printed at the top of the report",
"help_text": "Type: `string`. Custom comment, will be printed at the top of the report.\n"
}
,
"template": {
"type":
"string",
"description": "Type: `string`, choices: ``default`, `gathered`, `geo`, `highcharts`, `sections`, `simple``. Report template to use",
"help_text": "Type: `string`, choices: ``default`, `gathered`, `geo`, `highcharts`, `sections`, `simple``. Report template to use.\n",
"enum": ["default", "gathered", "geo", "highcharts", "sections", "simple"]
}
,
"sample_names": {
"type":
"string",
"description": "Type: `file`, example: `sample_names.tsv`. TSV file containing alternative sample names for renaming buttons in the report",
"help_text": "Type: `file`, example: `sample_names.tsv`. TSV file containing alternative sample names for renaming buttons in the report.\n"
}
,
"sample_filters": {
"type":
"string",
"description": "Type: `file`, example: `sample_filters.tsv`. TSV file containing show/hide patterns for the report\n",
"help_text": "Type: `file`, example: `sample_filters.tsv`. TSV file containing show/hide patterns for the report\n"
}
,
"custom_css_file": {
"type":
"string",
"description": "Type: `file`, example: `custom_style_sheet.css`. Custom CSS file to add to the final report\n",
"help_text": "Type: `file`, example: `custom_style_sheet.css`. Custom CSS file to add to the final report\n"
}
,
"profile_runtime": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Add analysis of how long MultiQC takes to run to the report\n",
"help_text": "Type: `boolean_true`, default: `false`. Add analysis of how long MultiQC takes to run to the report\n"
,
"default":false
}
}
},
"multiqc behaviour" : {
"title": "MultiQC behaviour",
"type": "object",
"description": "No description",
"properties": {
"verbose": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Increase output verbosity",
"help_text": "Type: `boolean_true`, default: `false`. Increase output verbosity.\n"
,
"default":false
}
,
"quiet": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Only show log warnings\n",
"help_text": "Type: `boolean_true`, default: `false`. Only show log warnings\n"
,
"default":false
}
,
"strict": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Don\u0027t catch exceptions, run additional code checks to help development",
"help_text": "Type: `boolean_true`, default: `false`. Don\u0027t catch exceptions, run additional code checks to help development.\n"
,
"default":false
}
,
"development": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Development mode",
"help_text": "Type: `boolean_true`, default: `false`. Development mode. Do not compress and minimise JS, export uncompressed plot data.\n"
,
"default":false
}
,
"require_logs": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Require all explicitly requested modules to have log files",
"help_text": "Type: `boolean_true`, default: `false`. Require all explicitly requested modules to have log files. If not, MultiQC will exit with an error.\n"
,
"default":false
}
,
"no_megaqc_upload": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Don\u0027t upload generated report to MegaQC, even if MegaQC options are found",
"help_text": "Type: `boolean_true`, default: `false`. Don\u0027t upload generated report to MegaQC, even if MegaQC options are found.\n"
,
"default":false
}
,
"no_ansi": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Disable coloured log output",
"help_text": "Type: `boolean_true`, default: `false`. Disable coloured log output.\n"
,
"default":false
}
,
"cl_config": {
"type":
"string",
"description": "Type: `string`, example: `qualimap_config: { general_stats_coverage: [20,40,200] }`. YAML formatted string that allows to customize MultiQC behaviour like input file detection",
"help_text": "Type: `string`, example: `qualimap_config: { general_stats_coverage: [20,40,200] }`. YAML formatted string that allows to customize MultiQC behaviour like input file detection.\n"
}
}
},
"output format" : {
"title": "Output format",
"type": "object",
"description": "No description",
"properties": {
"flat": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Use only flat plots (static images)",
"help_text": "Type: `boolean_true`, default: `false`. Use only flat plots (static images).\n"
,
"default":false
}
,
"interactive": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Use only interactive plots (in-browser Javascript)",
"help_text": "Type: `boolean_true`, default: `false`. Use only interactive plots (in-browser Javascript).\n"
,
"default":false
}
,
"data_dir": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Force the parsed data directory to be created",
"help_text": "Type: `boolean_true`, default: `false`. Force the parsed data directory to be created.\n"
,
"default":false
}
,
"no_data_dir": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Prevent the parsed data directory from being created",
"help_text": "Type: `boolean_true`, default: `false`. Prevent the parsed data directory from being created.\n"
,
"default":false
}
,
"zip_data_dir": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Compress the data directory",
"help_text": "Type: `boolean_true`, default: `false`. Compress the data directory.\n"
,
"default":false
}
,
"data_format": {
"type":
"string",
"description": "Type: `string`, choices: ``tsv`, `csv`, `json`, `yaml``. Output parsed data in a different format than the default \u0027txt\u0027",
"help_text": "Type: `string`, choices: ``tsv`, `csv`, `json`, `yaml``. Output parsed data in a different format than the default \u0027txt\u0027.\n",
"enum": ["tsv", "csv", "json", "yaml"]
}
,
"pdf": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Creates PDF report with the \u0027simple\u0027 template",
"help_text": "Type: `boolean_true`, default: `false`. Creates PDF report with the \u0027simple\u0027 template. Requires Pandoc to be installed.\n"
,
"default":false
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map corresponds to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativation is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input"
},
{
"$ref": "#/definitions/ouput"
},
{
"$ref": "#/definitions/modules and analyses to run"
},
{
"$ref": "#/definitions/sample name handling"
},
{
"$ref": "#/definitions/report customisation"
},
{
"$ref": "#/definitions/multiqc behaviour"
},
{
"$ref": "#/definitions/output format"
},
{
"$ref": "#/definitions/nextflow input-output arguments"
}
]
}
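
The schema above describes list-valued arguments as plain strings joined by `multiple_sep` (`;`) and restricts `template` and `data_format` with `enum`. The Python sketch below shows what those two conventions imply for a parameter set. It does not reproduce how Viash or Nextflow parse parameters; the function name `check_params` is hypothetical, and the choice lists are copied by hand from the schema.

```python
# Choice lists copied from the "enum" entries in the schema above.
TEMPLATE_CHOICES = ["default", "gathered", "geo", "highcharts", "sections", "simple"]
DATA_FORMAT_CHOICES = ["tsv", "csv", "json", "yaml"]

# Arguments the schema documents as "List of `string`" with multiple_sep ";".
LIST_ARGS = ("include_modules", "exclude_modules", "ignore_analysis", "ignore_samples")

def check_params(params: dict) -> dict:
    """Split ';'-separated list arguments and check enum-restricted ones."""
    checked = dict(params)
    for key in LIST_ARGS:
        if isinstance(checked.get(key), str):
            checked[key] = checked[key].split(";")
    for key, choices in (("template", TEMPLATE_CHOICES), ("data_format", DATA_FORMAT_CHOICES)):
        if key in checked and checked[key] not in choices:
            raise ValueError(f"{key} must be one of {choices}, got {checked[key]!r}")
    return checked

print(check_params({"include_modules": "fastqc;cutadapt", "template": "simple"}))
```

Passing a value outside the listed choices, for example `data_format` set to `txt`, raises a `ValueError` in this sketch, which mirrors the fact that `txt` is the implicit default and not one of the enumerated alternatives.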

@@ -0,0 +1,191 @@
name: "interop_summary_to_csv"
namespace: "io"
version: "add_summary_to_csv_tests"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--input"
description: "Sequencing run folder (*not* InterOp folder)."
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--output_run_summary"
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_index_summary"
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
- type: "file"
path: "nextflow_labels.config"
dest: "nextflow_labels.config"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
- type: "file"
path: "iseq-DI"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "summary"
- "index-summary"
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/demultiplex"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
script:
- "includeConfig(\"nextflow_labels.config\")"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "debian:stable-slim"
target_registry: "images.viash-hub.com"
target_tag: "add_summary_to_csv_tests"
namespace_separator: "/"
setup:
- type: "apt"
packages:
- "procps"
- "wget"
interactive: false
- type: "docker"
run:
- "wget https://github.com/Illumina/interop/releases/download/v1.3.1/interop-1.3.1-Linux-GNU.tar.gz\
\ -O /tmp/interop.tar.gz && \\\ntar -C /tmp/ --no-same-owner --no-same-permissions\
\ -xvf /tmp/interop.tar.gz && \\\nmv /tmp/interop-1.3.1-Linux-GNU/bin/index-summary\
\ /tmp/interop-1.3.1-Linux-GNU/bin/summary /usr/local/bin/\n"
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/io/interop_summary_to_csv/config.vsh.yaml"
runner: "executable"
engine: "docker|native"
output: "target/executable/io/interop_summary_to_csv"
executable: "target/executable/io/interop_summary_to_csv/interop_summary_to_csv"
viash_version: "0.9.4"
git_commit: "6796c074cf57a458b7cc7aeb0b992e8108c5c503"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.1.1-27-g6796c07"
package_config:
name: "demultiplex"
version: "add_summary_to_csv_tests"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'add_summary_to_csv_tests'"
keywords:
- "bioinformatics"
- "sequence"
- "demultiplexing"
- "pipeline"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/demultiplex"
issue_tracker: "https://github.com/viash-hub/demultiplex/issues"

File diff suppressed because it is too large

@@ -0,0 +1,98 @@
process {
container = 'nextflow/bash:latest'
// default resources
memory = { 8.Gb * task.attempt }
cpus = 8
maxForks = 36
// Retry for exit codes that have something to do with memory issues
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
maxRetries = 3
maxMemory = 192.GB
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 32 }
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
}
profiles {
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
docker {
docker.fixOwnership = true
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
local {
// This config is for local processing.
process {
maxMemory = 25.GB
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 4 }
withLabel: midcpu { cpus = 6 }
withLabel: highcpu { cpus = 12 }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
}
}
}
def get_memory(to_compare) {
if (!process.containsKey("maxMemory") || !process.maxMemory) {
return to_compare
}
try {
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
return process.maxMemory
}
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
return process.maxMemory as nextflow.util.MemoryUnit
}
else {
return to_compare
}
} catch (all) {
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
System.exit(1)
}
}
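
The `get_memory` helper above scales a label's memory request with `task.attempt` and caps it at `process.maxMemory`. A minimal shell sketch of that decision logic, in whole gigabytes, follows. It assumes the over-limit branch is meant to return the configured maximum; the function name `get_memory_gb` and its argument order are hypothetical and not part of the pipeline.

```shell
# Sketch of the capping logic in get_memory(), using whole gigabytes.
# Arguments: requested_gb attempt max_memory_gb max_retries
# (an empty max_memory_gb or max_retries means "not configured").
get_memory_gb() {
  gm_requested=$1
  gm_attempt=$2
  gm_max_memory=$3
  gm_max_retries=$4

  # No ceiling configured: hand the request back unchanged.
  if [ -z "$gm_max_memory" ]; then
    echo "$gm_requested"
    return 0
  fi

  if [ -n "$gm_max_retries" ] && [ "$gm_attempt" -eq "$gm_max_retries" ]; then
    # Attempt number equals maxRetries: jump straight to the ceiling.
    echo "$gm_max_memory"
  elif [ "$gm_requested" -gt "$gm_max_memory" ]; then
    # Request exceeds the ceiling: clamp it.
    echo "$gm_max_memory"
  else
    echo "$gm_requested"
  fi
}

# highmem with the default settings: 64 GB per attempt, ceiling 192 GB, maxRetries 3.
for attempt in 1 2 3; do
  echo "attempt $attempt: $(get_memory_gb $((64 * attempt)) "$attempt" 192 3) GB"
done
```

With the default ceiling the loop reports 64, 128 and 192 GB. Passing 25 as the ceiling, as the `local` profile does, clamps a 40 GB second-attempt request for `highmem` to 25 GB.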


@@ -0,0 +1,233 @@
name: "publish"
namespace: "io"
version: "add_summary_to_csv_tests"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--input"
description: "Directory containing the fastq data to publish"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--input_falco"
description: "Directories containing the falco output to publish"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--input_multiqc"
description: "Location of the MultiQC report to publish."
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--input_run_information"
description: "Location of the run information file to publish."
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--output"
info: null
default:
- "fastq"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_falco"
info: null
default:
- "qc/fastqc"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_multiqc"
info: null
default:
- "qc/multiqc_report.html"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_run_information"
info: null
default:
- "run_information.csv"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "code.sh"
is_executable: true
- type: "file"
path: "nextflow_labels.config"
dest: "nextflow_labels.config"
description: "Publish the processed results of the run"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/demultiplex"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
script:
- "includeConfig(\"nextflow_labels.config\")"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "debian:stable-slim"
target_registry: "images.viash-hub.com"
target_tag: "add_summary_to_csv_tests"
namespace_separator: "/"
setup:
- type: "apt"
packages:
- "procps"
interactive: false
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/io/publish/config.vsh.yaml"
runner: "executable"
engine: "docker|native"
output: "target/executable/io/publish"
executable: "target/executable/io/publish/publish"
viash_version: "0.9.4"
git_commit: "6796c074cf57a458b7cc7aeb0b992e8108c5c503"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.1.1-27-g6796c07"
package_config:
name: "demultiplex"
version: "add_summary_to_csv_tests"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'add_summary_to_csv_tests'"
keywords:
- "bioinformatics"
- "sequence"
- "demultiplexing"
- "pipeline"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/demultiplex"
issue_tracker: "https://github.com/viash-hub/demultiplex/issues"
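
The `labels` map in the config above mixes decimal labels (`mem50gb`, `mem1tb`) with binary ones (`mem64gib`, `mem512tib`). The byte counts follow from powers of 1000 and 1024 respectively, which the following sketch recomputes. The helper names are hypothetical and only serve the illustration.

```shell
# Decimal labels (gb, tb) use powers of 1000; binary labels (gib, tib) use powers of 1024.
gb_to_bytes()  { echo $(( $1 * 1000 * 1000 * 1000 )); }
tb_to_bytes()  { echo $(( $1 * 1000 * 1000 * 1000 * 1000 )); }
gib_to_bytes() { echo $(( $1 * 1024 * 1024 * 1024 )); }
tib_to_bytes() { echo $(( $1 * 1024 * 1024 * 1024 * 1024 )); }

# Recompute a few of the label values listed in the config.
echo "mem50gb   = $(gb_to_bytes 50).B"
echo "mem1tb    = $(tb_to_bytes 1).B"
echo "mem64gib  = $(gib_to_bytes 64).B"
echo "mem512tib = $(tib_to_bytes 512).B"
```

The largest value, 512 TiB, still fits comfortably in the 64-bit integers that common shells use for arithmetic expansion.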


@@ -0,0 +1,98 @@
process {
container = 'nextflow/bash:latest'
// default resources
memory = { 8.GB * task.attempt }
cpus = 8
maxForks = 36
// Retry for exit codes that have something to do with memory issues
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
maxRetries = 3
maxMemory = 192.GB
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 32 }
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
}
profiles {
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
docker {
docker.fixOwnership = true
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
local {
// This config is for local processing.
process {
maxMemory = 25.GB
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 4 }
withLabel: midcpu { cpus = 6 }
withLabel: highcpu { cpus = 12 }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
}
}
}
def get_memory(to_compare) {
if (!process.containsKey("maxMemory") || !process.maxMemory) {
return to_compare
}
try {
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
return process.maxMemory
}
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
return process.maxMemory as nextflow.util.MemoryUnit
}
else {
return to_compare
}
} catch (all) {
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
System.exit(1)
}
}

File diff suppressed because it is too large


@@ -0,0 +1,190 @@
name: "untar"
namespace: "io"
version: "add_summary_to_csv_tests"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--input"
description: "Tarball file to be unpacked."
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--output"
description: "Directory to write the contents of the .tar file to."
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
- name: "Other arguments"
arguments:
- type: "string"
name: "--exclude"
alternatives:
- "-e"
description: "Prevents any file or member whose name matches the shell wildcard\
\ (pattern) from being extracted."
info: null
example:
- "docs/figures"
required: false
direction: "input"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
- type: "file"
path: "nextflow_labels.config"
dest: "nextflow_labels.config"
description: "Unpack a .tar file. When the contents of the .tar file is just a single\
\ directory,\nput the contents of the directory into the output folder instead of\
\ that directory.\n"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/demultiplex"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
script:
- "includeConfig(\"nextflow_labels.config\")"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "debian:stable-slim"
target_registry: "images.viash-hub.com"
target_tag: "add_summary_to_csv_tests"
namespace_separator: "/"
setup:
- type: "apt"
packages:
- "procps"
interactive: false
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/io/untar/config.vsh.yaml"
runner: "executable"
engine: "docker|native"
output: "target/executable/io/untar"
executable: "target/executable/io/untar/untar"
viash_version: "0.9.4"
git_commit: "6796c074cf57a458b7cc7aeb0b992e8108c5c503"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.1.1-27-g6796c07"
package_config:
name: "demultiplex"
version: "add_summary_to_csv_tests"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'add_summary_to_csv_tests'"
keywords:
- "bioinformatics"
- "sequence"
- "demultiplexing"
- "pipeline"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/demultiplex"
issue_tracker: "https://github.com/viash-hub/demultiplex/issues"
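
The `untar` description above states that when a tarball contains just a single directory, the contents of that directory are placed directly in the output folder. One way to obtain that behaviour is sketched below. This is not the component's `script.sh` (which is not shown here); the function name `untar_flatten` is hypothetical, and the sketch assumes member names without a leading `./`.

```shell
# Unpack a tarball into an output directory; if everything in the archive
# lives under one top-level directory, drop that directory level.
untar_flatten() {
  uf_tarball=$1
  uf_outdir=$2
  mkdir -p "$uf_outdir"

  # List the archive members once.
  uf_listing=$(tar -tf "$uf_tarball")

  # Count the distinct first path components.
  uf_top_count=$(printf '%s\n' "$uf_listing" | cut -d/ -f1 | sort -u | wc -l | tr -d ' ')

  # A "/" in any member name means the top-level entry is a directory, not a lone file.
  case "$uf_listing" in
    */*) uf_has_dir=yes ;;
    *)   uf_has_dir=no ;;
  esac

  if [ "$uf_top_count" -eq 1 ] && [ "$uf_has_dir" = yes ]; then
    tar -xf "$uf_tarball" -C "$uf_outdir" --strip-components=1
  else
    tar -xf "$uf_tarball" -C "$uf_outdir"
  fi
}
```

Calling `untar_flatten run.tar out/` on an archive whose members all start with `run/` would leave the files directly under `out/`, while an archive with several top-level entries is unpacked unchanged.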


@@ -0,0 +1,98 @@
process {
container = 'nextflow/bash:latest'
// default resources
memory = { 8.GB * task.attempt }
cpus = 8
maxForks = 36
// Retry for exit codes that have something to do with memory issues
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
maxRetries = 3
maxMemory = 192.GB
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 32 }
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
}
profiles {
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
docker {
docker.fixOwnership = true
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
local {
// This config is for local processing.
process {
maxMemory = 25.GB
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 4 }
withLabel: midcpu { cpus = 6 }
withLabel: highcpu { cpus = 12 }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
}
}
}
def get_memory(to_compare) {
if (!process.containsKey("maxMemory") || !process.maxMemory) {
return to_compare
}
try {
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
return process.maxMemory
}
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
return process.maxMemory as nextflow.util.MemoryUnit
}
else {
return to_compare
}
} catch (all) {
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
System.exit(1)
}
}

1167
target/executable/io/untar/untar Executable file

File diff suppressed because it is too large


@@ -0,0 +1,199 @@
name: "combine_samples"
namespace: "dataflow"
version: "add_summary_to_csv_tests"
argument_groups:
- name: "Input arguments"
arguments:
- type: "string"
name: "--id"
description: "ID of the new event"
info: null
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--forward_input"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--reverse_input"
info: null
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--falco_dir"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--output_forward"
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--output_reverse"
info: null
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--output_falco"
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: true
multiple_sep: ";"
resources:
- type: "nextflow_script"
path: "main.nf"
is_executable: true
entrypoint: "run_wf"
- type: "file"
path: "nextflow_labels.config"
dest: "nextflow_labels.config"
description: "Combine fastq files from across samples into one event with a list of\
\ fastq files per orientation."
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/demultiplex"
runners:
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
script:
- "includeConfig(\"nextflow_labels.config\")"
debug: false
container: "docker"
engines:
- type: "native"
id: "native"
- type: "native"
id: "native"
build_info:
config: "src/dataflow/combine_samples/config.vsh.yaml"
runner: "nextflow"
engine: "native|native"
output: "target/nextflow/dataflow/combine_samples"
executable: "target/nextflow/dataflow/combine_samples/main.nf"
viash_version: "0.9.4"
git_commit: "6796c074cf57a458b7cc7aeb0b992e8108c5c503"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.1.1-27-g6796c07"
package_config:
name: "demultiplex"
version: "add_summary_to_csv_tests"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'add_summary_to_csv_tests'"
keywords:
- "bioinformatics"
- "sequence"
- "demultiplexing"
- "pipeline"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/demultiplex"
issue_tracker: "https://github.com/viash-hub/demultiplex/issues"
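
Several arguments in the config above are declared with `multiple: true` and `multiple_sep: ";"`, meaning a single string can carry a list of values joined by semicolons. A minimal sketch of splitting such a value is given below; the helper name `split_multiple` and the example file names are hypothetical.

```shell
# Split a value on a separator (default ";") and print one item per line.
split_multiple() {
  sm_value=$1
  sm_sep=${2:-";"}
  sm_old_ifs=$IFS
  IFS=$sm_sep
  set -f                      # disable globbing while the value is word-split
  for sm_item in $sm_value; do
    printf '%s\n' "$sm_item"
  done
  set +f
  IFS=$sm_old_ifs
}

# Example: two forward-read files passed as one argument value.
split_multiple "sample1_R1.fastq;sample2_R1.fastq"
```

Disabling globbing during the split keeps patterns such as `*.fastq` from being expanded against the current directory.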

File diff suppressed because it is too large


@@ -0,0 +1,125 @@
manifest {
name = 'dataflow/combine_samples'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'add_summary_to_csv_tests'
description = 'Combine fastq files from across samples into one event with a list of fastq files per orientation.'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
includeConfig("nextflow_labels.config")
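
The `tempDir` detection in the config above tries `NXF_TEMP`, `VIASH_TEMP`, `TEMPDIR` and `TMPDIR` in that order and falls back to `/tmp`. The same fallback chain can be sketched in shell with nested default expansions. The function name `resolve_temp_dir` is hypothetical, and unlike the config this sketch does not convert the result to an absolute path.

```shell
# Print the first non-empty candidate, falling back to /tmp.
# "${VAR:-default}" treats an unset and an empty variable alike.
resolve_temp_dir() {
  echo "${NXF_TEMP:-${VIASH_TEMP:-${TEMPDIR:-${TMPDIR:-/tmp}}}}"
}

echo "temporary directory: $(resolve_temp_dir)"
```

Setting `VIASH_TEMP=/scratch` while `NXF_TEMP` is unset or empty would make the function print `/scratch`, whatever `TMPDIR` contains.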


@@ -0,0 +1,98 @@
process {
container = 'nextflow/bash:latest'
// default resources
memory = { 8.GB * task.attempt }
cpus = 8
maxForks = 36
// Retry for exit codes that have something to do with memory issues
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
maxRetries = 3
maxMemory = 192.GB
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 32 }
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
}
profiles {
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
docker {
docker.fixOwnership = true
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
local {
// This config is for local processing.
process {
maxMemory = 25.GB
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 4 }
withLabel: midcpu { cpus = 6 }
withLabel: highcpu { cpus = 12 }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
}
}
}
def get_memory(to_compare) {
if (!process.containsKey("maxMemory") || !process.maxMemory) {
return to_compare
}
try {
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
return process.maxMemory
}
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
return process.maxMemory as nextflow.util.MemoryUnit
}
else {
return to_compare
}
} catch (all) {
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
System.exit(1)
}
}


@@ -0,0 +1,147 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "combine_samples",
"description": "Combine fastq files from across samples into one event with a list of fastq files per orientation.",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"id": {
"type":
"string",
"description": "Type: `string`, required. ID of the new event",
"help_text": "Type: `string`, required. ID of the new event"
}
,
"forward_input": {
"type":
"string",
"description": "Type: List of `file`, required, multiple_sep: `\";\"`. ",
"help_text": "Type: List of `file`, required, multiple_sep: `\";\"`. "
}
,
"reverse_input": {
"type":
"string",
"description": "Type: List of `file`, multiple_sep: `\";\"`. ",
"help_text": "Type: List of `file`, multiple_sep: `\";\"`. "
}
,
"falco_dir": {
"type":
"string",
"description": "Type: `file`, required. ",
"help_text": "Type: `file`, required. "
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output_forward": {
"type":
"string",
"description": "Type: List of `file`, required, default: `$id.$key.output_forward_*`, multiple_sep: `\";\"`. ",
"help_text": "Type: List of `file`, required, default: `$id.$key.output_forward_*`, multiple_sep: `\";\"`. "
,
"default":"$id.$key.output_forward_*"
}
,
"output_reverse": {
"type":
"string",
"description": "Type: List of `file`, default: `$id.$key.output_reverse_*`, multiple_sep: `\";\"`. ",
"help_text": "Type: List of `file`, default: `$id.$key.output_reverse_*`, multiple_sep: `\";\"`. "
,
"default":"$id.$key.output_reverse_*"
}
,
"output_falco": {
"type":
"string",
"description": "Type: List of `file`, required, default: `$id.$key.output_falco_*`, multiple_sep: `\";\"`. ",
"help_text": "Type: List of `file`, required, default: `$id.$key.output_falco_*`, multiple_sep: `\";\"`. "
,
"default":"$id.$key.output_falco_*"
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map corresponds to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativation is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
},
{
"$ref": "#/definitions/output arguments"
},
{
"$ref": "#/definitions/nextflow input-output arguments"
}
]
}

@@ -0,0 +1,175 @@
name: "gather_fastqs_and_validate"
namespace: "dataflow"
version: "add_summary_to_csv_tests"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--input"
description: "Directory containing .fastq files"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--sample_sheet"
description: "Sample sheet"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--fastq_forward"
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--fastq_reverse"
info: null
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: true
multiple_sep: ";"
resources:
- type: "nextflow_script"
path: "main.nf"
is_executable: true
entrypoint: "run_wf"
- type: "file"
path: "nextflow_labels.config"
dest: "nextflow_labels.config"
description: "From a directory containing fastq files, gather the files per sample\
\ \nand validate according to the contents of the sample sheet.\n"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/demultiplex"
runners:
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
script:
- "includeConfig(\"nextflow_labels.config\")"
debug: false
container: "docker"
engines:
- type: "native"
id: "native"
- type: "native"
id: "native"
build_info:
config: "src/dataflow/gather_fastqs_and_validate/config.vsh.yaml"
runner: "nextflow"
engine: "native|native"
output: "target/nextflow/dataflow/gather_fastqs_and_validate"
executable: "target/nextflow/dataflow/gather_fastqs_and_validate/main.nf"
viash_version: "0.9.4"
git_commit: "6796c074cf57a458b7cc7aeb0b992e8108c5c503"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.1.1-27-g6796c07"
package_config:
name: "demultiplex"
version: "add_summary_to_csv_tests"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'add_summary_to_csv_tests'"
keywords:
- "bioinformatics"
- "sequence"
- "demultiplexing"
- "pipeline"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/demultiplex"
issue_tracker: "https://github.com/viash-hub/demultiplex/issues"

File diff suppressed because it is too large

@@ -0,0 +1,125 @@
manifest {
name = 'dataflow/gather_fastqs_and_validate'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'add_summary_to_csv_tests'
description = 'From a directory containing fastq files, gather the files per sample \nand validate according to the contents of the sample sheet.\n'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
includeConfig("nextflow_labels.config")

@@ -0,0 +1,98 @@
process {
container = 'nextflow/bash:latest'
// default resources
memory = { 8.GB * task.attempt }
cpus = 8
maxForks = 36
// Retry for exit codes that have something to do with memory issues
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
maxRetries = 3
maxMemory = 192.GB
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 32 }
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
}
profiles {
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
docker {
docker.fixOwnership = true
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
local {
// This config is for local processing.
process {
maxMemory = 25.GB
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 4 }
withLabel: midcpu { cpus = 6 }
withLabel: highcpu { cpus = 12 }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
}
}
}
def get_memory(to_compare) {
if (!process.containsKey("maxMemory") || !process.maxMemory) {
return to_compare
}
try {
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
return process.maxMemory
}
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
return process.maxMemory as nextflow.util.MemoryUnit
}
else {
return to_compare
}
} catch (all) {
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
System.exit(1)
}
}
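
The `get_memory` helper above caps a label's memory request at `process.maxMemory`, and on the attempt whose number equals `process.maxRetries` it requests the ceiling outright. The following is a minimal Python sketch of that logic, assuming the capped branch is meant to return `process.maxMemory`. The function signature and the use of plain integers (in GB) instead of `nextflow.util.MemoryUnit` are illustrative assumptions, not part of the config.

```python
def get_memory(requested_gb, attempt, max_memory_gb=None, max_retries=None):
    """Illustrative stand-in for the config's get_memory helper (values in GB)."""
    # No ceiling configured: hand the request back unchanged.
    if not max_memory_gb:
        return requested_gb
    # On the attempt numbered max_retries, go straight to the ceiling.
    if max_retries and attempt == max_retries:
        return max_memory_gb
    # Otherwise never exceed the ceiling.
    return min(requested_gb, max_memory_gb)


# 'highmem' with the default settings: 64 GB per attempt, ceiling 192 GB, 3 retries.
default_profile = [get_memory(64 * a, a, 192, 3) for a in (1, 2, 3)]

# 'highmem' under the 'local' profile: 20 GB per attempt, ceiling 25 GB, 3 retries.
local_profile = [get_memory(20 * a, a, 25, 3) for a in (1, 2, 3)]
```

With the values from this config, the default `highmem` label would request 64, 128 and 192 GB over three attempts, while the `local` profile would request 20, 25 and 25 GB.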

@@ -0,0 +1,116 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "gather_fastqs_and_validate",
"description": "From a directory containing fastq files, gather the files per sample \nand validate according to the contents of the sample sheet.\n",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type":
"string",
"description": "Type: `file`, required. Directory containing .fastq files",
"help_text": "Type: `file`, required. Directory containing .fastq files"
}
,
"sample_sheet": {
"type":
"string",
"description": "Type: `file`, required. Sample sheet",
"help_text": "Type: `file`, required. Sample sheet"
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"fastq_forward": {
"type":
"string",
"description": "Type: List of `file`, required, default: `$id.$key.fastq_forward_*`, multiple_sep: `\";\"`. ",
"help_text": "Type: List of `file`, required, default: `$id.$key.fastq_forward_*`, multiple_sep: `\";\"`. "
,
"default":"$id.$key.fastq_forward_*"
}
,
"fastq_reverse": {
"type":
"string",
"description": "Type: List of `file`, default: `$id.$key.fastq_reverse_*`, multiple_sep: `\";\"`. ",
"help_text": "Type: List of `file`, default: `$id.$key.fastq_reverse_*`, multiple_sep: `\";\"`. "
,
"default":"$id.$key.fastq_reverse_*"
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map correspond to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativization is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
},
{
"$ref": "#/definitions/output arguments"
},
{
"$ref": "#/definitions/nextflow input-output arguments"
}
]
}

@@ -0,0 +1,290 @@
name: "demultiplex"
version: "add_summary_to_csv_tests"
argument_groups:
- name: "Input arguments"
arguments:
- type: "string"
name: "--id"
description: "Unique identifier for the run"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--input"
description: "Directory containing raw sequencing data"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--run_information"
description: "CSV file containing sample information, which will be used as \n\
input for the demultiplexer. Canonically called 'SampleSheet.csv' (Illumina)\n\
or 'RunManifest.csv' (Element Biosciences). If not specified,\nwill try to autodetect\
\ the sample sheet in the input directory.\nRequires --demultiplexer to be set.\n"
info: null
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--demultiplexer"
description: "Demultiplexer to use, choice depends on the provider\nof the instrument\
\ that was used to generate the data.\nWhen not using --run_information, specifying\
\ this argument is not\nrequired.\n"
info: null
required: false
choices:
- "bases2fastq"
- "bclconvert"
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--output"
description: "Directory to write fastq data to"
info: null
default:
- "$id/fastq"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_falco"
description: "Directory to write falco output to"
info: null
default:
- "$id/qc/fastqc"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--output_multiqc"
description: "Path to write the MultiQC report to"
info: null
default:
- "$id/qc/multiqc_report.html"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_run_information"
info: null
default:
- "$id/run_information.csv"
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
- name: "Other arguments"
arguments:
- type: "boolean_true"
name: "--skip_copycomplete_check"
description: "Disable the check for the presence of a \"CopyComplete.txt\" file\
\ in the input\ndirectory in case of Illumina data.\n"
info: null
direction: "input"
resources:
- type: "nextflow_script"
path: "main.nf"
is_executable: true
entrypoint: "run_wf"
- type: "file"
path: "nextflow_labels.config"
dest: "nextflow_labels.config"
description: "Demultiplexing of raw sequencing data"
test_resources:
- type: "nextflow_script"
path: "test.nf"
is_executable: true
entrypoint: "test_illumina"
- type: "nextflow_script"
path: "test.nf"
is_executable: true
entrypoint: "test_bases2fastq"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
dependencies:
- name: "io/untar"
repository:
type: "local"
- name: "dataflow/gather_fastqs_and_validate"
repository:
type: "local"
- name: "io/interop_summary_to_csv"
repository:
type: "local"
- name: "dataflow/combine_samples"
repository:
type: "local"
- name: "bcl_convert"
repository:
type: "vsh"
repo: "biobox"
tag: "v0.3.0"
- name: "bases2fastq"
repository:
type: "vsh"
repo: "biobox"
tag: "v0.3.0"
- name: "falco"
repository:
type: "vsh"
repo: "biobox"
tag: "v0.3.0"
- name: "multiqc"
repository:
type: "vsh"
repo: "biobox"
tag: "v0.3.0"
repositories:
- type: "vsh"
name: "bb"
repo: "biobox"
tag: "v0.3.0"
license: "MIT"
links:
repository: "https://github.com/viash-hub/demultiplex"
runners:
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
script:
- "includeConfig(\"nextflow_labels.config\")"
debug: false
container: "docker"
engines:
- type: "native"
id: "native"
- type: "native"
id: "native"
build_info:
config: "src/demultiplex/config.vsh.yaml"
runner: "nextflow"
engine: "native|native"
output: "target/nextflow/demultiplex"
executable: "target/nextflow/demultiplex/main.nf"
viash_version: "0.9.4"
git_commit: "6796c074cf57a458b7cc7aeb0b992e8108c5c503"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.1.1-27-g6796c07"
dependencies:
- "target/nextflow/io/untar"
- "target/nextflow/dataflow/gather_fastqs_and_validate"
- "target/nextflow/io/interop_summary_to_csv"
- "target/nextflow/dataflow/combine_samples"
- "target/dependencies/vsh/vsh/biobox/v0.3.0/nextflow/bcl_convert"
- "target/dependencies/vsh/vsh/biobox/v0.3.0/nextflow/bases2fastq"
- "target/dependencies/vsh/vsh/biobox/v0.3.0/nextflow/falco"
- "target/dependencies/vsh/vsh/biobox/v0.3.0/nextflow/multiqc"
package_config:
name: "demultiplex"
version: "add_summary_to_csv_tests"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'add_summary_to_csv_tests'"
keywords:
- "bioinformatics"
- "sequence"
- "demultiplexing"
- "pipeline"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/demultiplex"
issue_tracker: "https://github.com/viash-hub/demultiplex/issues"

File diff suppressed because it is too large

@@ -0,0 +1,125 @@
manifest {
name = 'demultiplex'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'add_summary_to_csv_tests'
description = 'Demultiplexing of raw sequencing data'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
includeConfig("nextflow_labels.config")

@@ -0,0 +1,98 @@
process {
container = 'nextflow/bash:latest'
// default resources
memory = { 8.GB * task.attempt }
cpus = 8
maxForks = 36
// Retry for exit codes that have something to do with memory issues
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
maxRetries = 3
maxMemory = 192.GB
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 32 }
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
}
profiles {
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
docker {
docker.fixOwnership = true
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
local {
// This config is for local processing.
process {
maxMemory = 25.GB
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 4 }
withLabel: midcpu { cpus = 6 }
withLabel: highcpu { cpus = 12 }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
}
}
}
def get_memory(to_compare) {
if (!process.containsKey("maxMemory") || !process.maxMemory) {
return to_compare
}
try {
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
return process.maxMemory
}
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
return process.maxMemory as nextflow.util.MemoryUnit
}
else {
return to_compare
}
} catch (all) {
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
System.exit(1)
}
}

@@ -0,0 +1,185 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "demultiplex",
"description": "Demultiplexing of raw sequencing data",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"id": {
"type":
"string",
"description": "Type: `string`. Unique identifier for the run",
"help_text": "Type: `string`. Unique identifier for the run"
}
,
"input": {
"type":
"string",
"description": "Type: `file`, required. Directory containing raw sequencing data",
"help_text": "Type: `file`, required. Directory containing raw sequencing data"
}
,
"run_information": {
"type":
"string",
"description": "Type: `file`. CSV file containing sample information, which will be used as \ninput for the demultiplexer",
"help_text": "Type: `file`. CSV file containing sample information, which will be used as \ninput for the demultiplexer. Canonically called \u0027SampleSheet.csv\u0027 (Illumina)\nor \u0027RunManifest.csv\u0027 (Element Biosciences). If not specified,\nwill try to autodetect the sample sheet in the input directory.\nRequires --demultiplexer to be set.\n"
}
,
"demultiplexer": {
"type":
"string",
"description": "Type: `string`, choices: ``bases2fastq`, `bclconvert``. Demultiplexer to use, choice depends on the provider\nof the instrument that was used to generate the data",
"help_text": "Type: `string`, choices: ``bases2fastq`, `bclconvert``. Demultiplexer to use, choice depends on the provider\nof the instrument that was used to generate the data.\nWhen not using --run_information, specifying this argument is not\nrequired.\n",
"enum": ["bases2fastq", "bclconvert"]
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output": {
"type":
"string",
"description": "Type: `file`, default: `$id/fastq`. Directory to write fastq data to",
"help_text": "Type: `file`, default: `$id/fastq`. Directory to write fastq data to"
,
"default":"$id/fastq"
}
,
"output_falco": {
"type":
"string",
"description": "Type: List of `file`, default: `$id/qc/fastqc`, multiple_sep: `\";\"`. Directory to write falco output to",
"help_text": "Type: List of `file`, default: `$id/qc/fastqc`, multiple_sep: `\";\"`. Directory to write falco output to"
,
"default":"$id/qc/fastqc"
}
,
"output_multiqc": {
"type":
"string",
"description": "Type: `file`, default: `$id/qc/multiqc_report.html`. Path to write the MultiQC report to",
"help_text": "Type: `file`, default: `$id/qc/multiqc_report.html`. Path to write the MultiQC report to"
,
"default":"$id/qc/multiqc_report.html"
}
,
"output_run_information": {
"type":
"string",
"description": "Type: `file`, required, default: `$id/run_information.csv`. ",
"help_text": "Type: `file`, required, default: `$id/run_information.csv`. "
,
"default":"$id/run_information.csv"
}
}
},
"other arguments" : {
"title": "Other arguments",
"type": "object",
"description": "No description",
"properties": {
"skip_copycomplete_check": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Disable the check for the presence of a \"CopyComplete.txt\" file in the input directory in case of Illumina data",
"help_text": "Type: `boolean_true`, default: `false`. Disable the check for the presence of a \"CopyComplete.txt\" file in the input\ndirectory in case of Illumina data.\n"
,
"default":false
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map corresponds to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativation is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
},
{
"$ref": "#/definitions/output arguments"
},
{
"$ref": "#/definitions/other arguments"
},
{
"$ref": "#/definitions/nextflow input-output arguments"
}
]
}


@@ -0,0 +1,191 @@
name: "interop_summary_to_csv"
namespace: "io"
version: "add_summary_to_csv_tests"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--input"
description: "Sequencing run folder (*not* InterOp folder)."
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--output_run_summary"
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_index_summary"
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
- type: "file"
path: "nextflow_labels.config"
dest: "nextflow_labels.config"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
- type: "file"
path: "iseq-DI"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "summary"
- "index-summary"
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/demultiplex"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
script:
- "includeConfig(\"nextflow_labels.config\")"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "debian:stable-slim"
target_registry: "images.viash-hub.com"
target_tag: "add_summary_to_csv_tests"
namespace_separator: "/"
setup:
- type: "apt"
packages:
- "procps"
- "wget"
interactive: false
- type: "docker"
run:
- "wget https://github.com/Illumina/interop/releases/download/v1.3.1/interop-1.3.1-Linux-GNU.tar.gz\
\ -O /tmp/interop.tar.gz && \\\ntar -C /tmp/ --no-same-owner --no-same-permissions\
\ -xvf /tmp/interop.tar.gz && \\\nmv /tmp/interop-1.3.1-Linux-GNU/bin/index-summary\
\ /tmp/interop-1.3.1-Linux-GNU/bin/summary /usr/local/bin/\n"
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/io/interop_summary_to_csv/config.vsh.yaml"
runner: "nextflow"
engine: "docker|native"
output: "target/nextflow/io/interop_summary_to_csv"
executable: "target/nextflow/io/interop_summary_to_csv/main.nf"
viash_version: "0.9.4"
git_commit: "6796c074cf57a458b7cc7aeb0b992e8108c5c503"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.1.1-27-g6796c07"
package_config:
name: "demultiplex"
version: "add_summary_to_csv_tests"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'add_summary_to_csv_tests'"
keywords:
- "bioinformatics"
- "sequence"
- "demultiplexing"
- "pipeline"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/demultiplex"
issue_tracker: "https://github.com/viash-hub/demultiplex/issues"

File diff suppressed because it is too large


@@ -0,0 +1,124 @@
manifest {
name = 'io/interop_summary_to_csv'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'add_summary_to_csv_tests'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
includeConfig("nextflow_labels.config")


@@ -0,0 +1,98 @@
process {
container = 'nextflow/bash:latest'
// default resources
memory = { 8.Gb * task.attempt }
cpus = 8
maxForks = 36
// Retry for exit codes that have something to do with memory issues
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
maxRetries = 3
maxMemory = 192.GB
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 32 }
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
}
profiles {
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
docker {
docker.fixOwnership = true
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
local {
// This config is for local processing.
process {
maxMemory = 25.GB
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 4 }
withLabel: midcpu { cpus = 6 }
withLabel: highcpu { cpus = 12 }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
}
}
}
def get_memory(to_compare) {
if (!process.containsKey("maxMemory") || !process.maxMemory) {
return to_compare
}
try {
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
return process.maxMemory
}
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
return process.maxMemory as nextflow.util.MemoryUnit
}
else {
return to_compare
}
} catch (all) {
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
System.exit(1)
}
}


@@ -0,0 +1,106 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "interop_summary_to_csv",
"description": "No description",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type":
"string",
"description": "Type: `file`, required. Sequencing run folder (*not* InterOp folder)",
"help_text": "Type: `file`, required. Sequencing run folder (*not* InterOp folder)."
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output_run_summary": {
"type":
"string",
"description": "Type: `file`, required, default: `$id.$key.output_run_summary`. ",
"help_text": "Type: `file`, required, default: `$id.$key.output_run_summary`. "
,
"default":"$id.$key.output_run_summary"
}
,
"output_index_summary": {
"type":
"string",
"description": "Type: `file`, required, default: `$id.$key.output_index_summary`. ",
"help_text": "Type: `file`, required, default: `$id.$key.output_index_summary`. "
,
"default":"$id.$key.output_index_summary"
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map corresponds to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativation is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
},
{
"$ref": "#/definitions/output arguments"
},
{
"$ref": "#/definitions/nextflow input-output arguments"
}
]
}


@@ -0,0 +1,233 @@
name: "publish"
namespace: "io"
version: "add_summary_to_csv_tests"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--input"
description: "Directory to write fastq data to"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--input_falco"
description: "Directory to write falco output to"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--input_multiqc"
description: "Location where to write the MultiQC report to."
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--input_run_information"
description: "Location where to write the run information to."
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--output"
info: null
default:
- "fastq"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_falco"
info: null
default:
- "qc/fastqc"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_multiqc"
info: null
default:
- "qc/multiqc_report.html"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_run_information"
info: null
default:
- "run_information.csv"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "code.sh"
is_executable: true
- type: "file"
path: "nextflow_labels.config"
dest: "nextflow_labels.config"
description: "Publish the processed results of the run"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/demultiplex"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
script:
- "includeConfig(\"nextflow_labels.config\")"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "debian:stable-slim"
target_registry: "images.viash-hub.com"
target_tag: "add_summary_to_csv_tests"
namespace_separator: "/"
setup:
- type: "apt"
packages:
- "procps"
interactive: false
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/io/publish/config.vsh.yaml"
runner: "nextflow"
engine: "docker|native"
output: "target/nextflow/io/publish"
executable: "target/nextflow/io/publish/main.nf"
viash_version: "0.9.4"
git_commit: "6796c074cf57a458b7cc7aeb0b992e8108c5c503"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.1.1-27-g6796c07"
package_config:
name: "demultiplex"
version: "add_summary_to_csv_tests"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'add_summary_to_csv_tests'"
keywords:
- "bioinformatics"
- "sequence"
- "demultiplexing"
- "pipeline"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/demultiplex"
issue_tracker: "https://github.com/viash-hub/demultiplex/issues"

File diff suppressed because it is too large


@@ -0,0 +1,125 @@
manifest {
name = 'io/publish'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'add_summary_to_csv_tests'
description = 'Publish the processed results of the run'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
includeConfig("nextflow_labels.config")


@@ -0,0 +1,98 @@
process {
container = 'nextflow/bash:latest'
// default resources
memory = { 8.Gb * task.attempt }
cpus = 8
maxForks = 36
// Retry for exit codes that have something to do with memory issues
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
maxRetries = 3
maxMemory = 192.GB
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 32 }
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
}
profiles {
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
docker {
docker.fixOwnership = true
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
local {
// This config is for local processing.
process {
maxMemory = 25.GB
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 4 }
withLabel: midcpu { cpus = 6 }
withLabel: highcpu { cpus = 12 }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
}
}
}
def get_memory(to_compare) {
if (!process.containsKey("maxMemory") || !process.maxMemory) {
return to_compare
}
try {
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
return process.maxMemory
}
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
return process.maxMemory as nextflow.util.MemoryUnit
}
else {
return to_compare
}
} catch (all) {
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
System.exit(1)
}
}


@@ -0,0 +1,158 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "publish",
"description": "Publish the processed results of the run",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type":
"string",
"description": "Type: `file`, required. Directory to write fastq data to",
"help_text": "Type: `file`, required. Directory to write fastq data to"
}
,
"input_falco": {
"type":
"string",
"description": "Type: List of `file`, required, multiple_sep: `\";\"`. Directory to write falco output to",
"help_text": "Type: List of `file`, required, multiple_sep: `\";\"`. Directory to write falco output to"
}
,
"input_multiqc": {
"type":
"string",
"description": "Type: `file`, required. Location where to write the MultiQC report to",
"help_text": "Type: `file`, required. Location where to write the MultiQC report to."
}
,
"input_run_information": {
"type":
"string",
"description": "Type: `file`, required. Location where to write the run information to",
"help_text": "Type: `file`, required. Location where to write the run information to."
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output": {
"type":
"string",
"description": "Type: `file`, default: `fastq`. ",
"help_text": "Type: `file`, default: `fastq`. "
,
"default":"fastq"
}
,
"output_falco": {
"type":
"string",
"description": "Type: `file`, default: `qc/fastqc`. ",
"help_text": "Type: `file`, default: `qc/fastqc`. "
,
"default":"qc/fastqc"
}
,
"output_multiqc": {
"type":
"string",
"description": "Type: `file`, default: `qc/multiqc_report.html`. ",
"help_text": "Type: `file`, default: `qc/multiqc_report.html`. "
,
"default":"qc/multiqc_report.html"
}
,
"output_run_information": {
"type":
"string",
"description": "Type: `file`, default: `run_information.csv`. ",
"help_text": "Type: `file`, default: `run_information.csv`. "
,
"default":"run_information.csv"
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map corresponds to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativation is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
},
{
"$ref": "#/definitions/output arguments"
},
{
"$ref": "#/definitions/nextflow input-output arguments"
}
]
}
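
The `param_list` help text above says that relative paths in a csv, json or yaml parameter file are resolved against the location of that file. A minimal Python sketch of that rule for the csv case follows. It is an illustration only, not the Viash implementation: the function name `read_param_list_csv` and the `path_keys` argument (which columns hold file paths) are assumptions made for this example.

```python
import csv
from pathlib import Path


def read_param_list_csv(csv_path, path_keys=("input",)):
    """Read a csv param_list into a list of parameter sets (dicts).

    Hypothetical helper: `path_keys` names the columns assumed to hold
    file paths; relative values in those columns are resolved against
    the directory that contains the csv file.
    """
    csv_path = Path(csv_path)
    base = csv_path.parent
    param_sets = []
    with csv_path.open(newline="") as handle:
        for row in csv.DictReader(handle):
            for key in path_keys:
                value = row.get(key)
                # Absolute paths are kept as-is; relative ones are anchored
                # at the location of the parameter file.
                if value and not Path(value).is_absolute():
                    row[key] = str(base / value)
            param_sets.append(dict(row))
    return param_sets
```

Each returned dict corresponds to one row of the csv, that is, one parameter set.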


@@ -0,0 +1,190 @@
name: "untar"
namespace: "io"
version: "add_summary_to_csv_tests"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--input"
description: "Tarball file to be unpacked."
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--output"
description: "Directory to write the contents of the .tar file to."
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
- name: "Other arguments"
arguments:
- type: "string"
name: "--exclude"
alternatives:
- "-e"
description: "Prevents any file or member whose name matches the shell wildcard\
\ (pattern) from being extracted."
info: null
example:
- "docs/figures"
required: false
direction: "input"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
- type: "file"
path: "nextflow_labels.config"
dest: "nextflow_labels.config"
description: "Unpack a .tar file. When the contents of the .tar file is just a single\
\ directory,\nput the contents of the directory into the output folder instead of\
\ that directory.\n"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/demultiplex"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
script:
- "includeConfig(\"nextflow_labels.config\")"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "debian:stable-slim"
target_registry: "images.viash-hub.com"
target_tag: "add_summary_to_csv_tests"
namespace_separator: "/"
setup:
- type: "apt"
packages:
- "procps"
interactive: false
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/io/untar/config.vsh.yaml"
runner: "nextflow"
engine: "docker|native"
output: "target/nextflow/io/untar"
executable: "target/nextflow/io/untar/main.nf"
viash_version: "0.9.4"
git_commit: "6796c074cf57a458b7cc7aeb0b992e8108c5c503"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.1.1-27-g6796c07"
package_config:
name: "demultiplex"
version: "add_summary_to_csv_tests"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'add_summary_to_csv_tests'"
keywords:
- "bioinformatics"
- "sequence"
- "demultiplexing"
- "pipeline"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/demultiplex"
issue_tracker: "https://github.com/viash-hub/demultiplex/issues"

File diff suppressed because it is too large


@@ -0,0 +1,125 @@
manifest {
name = 'io/untar'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'add_summary_to_csv_tests'
description = 'Unpack a .tar file. When the contents of the .tar file are just a single directory,\nput the contents of the directory into the output folder instead of that directory.\n'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
includeConfig("nextflow_labels.config")
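
The `withLabel` entries above follow a regular pattern: decimal labels (`gb`, `tb`) use powers of 1000, binary labels (`gib`, `tib`) use powers of 1024, and all values are written in bytes. A short Python sketch that reproduces the same table is given here as a cross-check of the numbers. The function name `resource_labels` is an assumption for illustration and does not come from Viash.

```python
def resource_labels():
    """Rebuild the label -> directive table shown in the config above."""
    labels = {}
    # Decimal units: 1 GB = 1000**3 bytes, 1 TB = 1000**4 bytes.
    for exponent, unit in ((3, "gb"), (4, "tb")):
        for n in (1, 2, 5, 10, 20, 50, 100, 200, 500):
            labels[f"mem{n}{unit}"] = f"memory = {n * 1000 ** exponent}.B"
    # Binary units: 1 GiB = 1024**3 bytes, 1 TiB = 1024**4 bytes.
    for exponent, unit in ((3, "gib"), (4, "tib")):
        for n in (1, 2, 4, 8, 16, 32, 64, 128, 256, 512):
            labels[f"mem{n}{unit}"] = f"memory = {n * 1024 ** exponent}.B"
    # CPU labels map directly to a core count.
    for n in (1, 2, 5, 10, 20, 50, 100, 200, 500, 1000):
        labels[f"cpu{n}"] = f"cpus = {n}"
    return labels
```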


@@ -0,0 +1,98 @@
process {
container = 'nextflow/bash:latest'
// default resources
memory = { 8.GB * task.attempt }
cpus = 8
maxForks = 36
// Retry for exit codes that have something to do with memory issues
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
maxRetries = 3
maxMemory = 192.GB
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 32 }
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
}
profiles {
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
docker {
docker.fixOwnership = true
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
local {
// This config is for local processing.
process {
maxMemory = 25.GB
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 4 }
withLabel: midcpu { cpus = 6 }
withLabel: highcpu { cpus = 12 }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
}
}
}
def get_memory(to_compare) {
if (!process.containsKey("maxMemory") || !process.maxMemory) {
return to_compare
}
try {
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
return process.maxMemory
}
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
return process.maxMemory as nextflow.util.MemoryUnit
}
else {
return to_compare
}
} catch (all) {
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
System.exit(1)
}
}
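
The capping rule that `get_memory` applies can be sketched in Python as follows. This is a simplified model, not the Nextflow code itself: memory amounts are plain numbers in a single unit, and `attempt`, `max_memory` and `max_retries` are passed as explicit arguments here, whereas the config reads them from `task` and `process`.

```python
def get_memory(requested, attempt, max_memory=None, max_retries=None):
    """Return the amount of memory to request for a given task attempt.

    All amounts are plain numbers in the same (arbitrary) unit.
    """
    # No cap configured: use the requested amount unchanged.
    if not max_memory:
        return requested
    # On the attempt whose number equals max_retries, request the cap itself.
    if max_retries and attempt == int(max_retries):
        return max_memory
    # Otherwise never request more than the cap.
    return min(requested, max_memory)
```

With the defaults shown above (a cap of 192 and 3 retries), a `highmem` task would request 64 on its first attempt, 128 on its second, and 192 on its third.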


@@ -0,0 +1,119 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "untar",
"description": "Unpack a .tar file. When the contents of the .tar file is just a single directory,\nput the contents of the directory into the output folder instead of that directory.\n",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type":
"string",
"description": "Type: `file`, required. Tarball file to be unpacked",
"help_text": "Type: `file`, required. Tarball file to be unpacked."
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output": {
"type":
"string",
"description": "Type: `file`, required, default: `$id.$key.output`. Directory to write the contents of the ",
"help_text": "Type: `file`, required, default: `$id.$key.output`. Directory to write the contents of the .tar file to."
,
"default":"$id.$key.output"
}
}
},
"other arguments" : {
"title": "Other arguments",
"type": "object",
"description": "No description",
"properties": {
"exclude": {
"type":
"string",
"description": "Type: `string`, example: `docs/figures`. Prevents any file or member whose name matches the shell wildcard (pattern) from being extracted",
"help_text": "Type: `string`, example: `docs/figures`. Prevents any file or member whose name matches the shell wildcard (pattern) from being extracted."
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map corresponds to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativation is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
},
{
"$ref": "#/definitions/output arguments"
},
{
"$ref": "#/definitions/other arguments"
},
{
"$ref": "#/definitions/nextflow input-output arguments"
}
]
}
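
The behaviour described for `untar` (unpack a tarball, and if it holds a single top-level directory, place that directory's contents directly in the output folder) can be sketched in Python as below. The component's actual `script.sh` is not shown in this diff, so this is an assumption-laden illustration rather than a port: the helper names are hypothetical, and `fnmatch` only approximates the shell-wildcard matching of the `--exclude` argument.

```python
import fnmatch
import shutil
import tarfile
import tempfile
from pathlib import Path


def _excluded(name, pattern):
    """True if the member path or any of its parent paths matches pattern."""
    parts = Path(name).parts
    prefixes = ("/".join(parts[:i]) for i in range(1, len(parts) + 1))
    return any(fnmatch.fnmatch(prefix, pattern) for prefix in prefixes)


def untar(tarball, output, exclude=None):
    """Unpack `tarball` into `output`, flattening a single top-level directory."""
    output = Path(output)
    output.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(tarball) as tar:
            # Skip members that match the (optional) exclude pattern.
            members = [
                m for m in tar.getmembers()
                if not (exclude and _excluded(m.name, exclude))
            ]
            tar.extractall(tmp, members=members)
        entries = list(Path(tmp).iterdir())
        # A lone top-level directory is replaced by its own contents.
        if len(entries) == 1 and entries[0].is_dir():
            entries = list(entries[0].iterdir())
        for entry in entries:
            shutil.move(str(entry), str(output / entry.name))
```

Extracting into a temporary directory first keeps the check for a single top-level directory separate from whatever already exists in the output folder.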


@@ -0,0 +1,232 @@
name: "runner"
version: "add_summary_to_csv_tests"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--input"
description: "Base directory of the canonical form `s3://<bucket>/<path>/<RunID>/`.\n\
A tarball (.tar.gz, .tgz, .tar) containing run information can be provided in\
\ which\ncase the RunID is set to the name of the tarball without the extension.\n"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--run_information"
description: "CSV file containing sample information, which will be used as \n\
input for the demultiplexer. Canonically called 'SampleSheet.csv' (Illumina)\n\
or 'RunManifest.csv' (Element Biosciences). If not specified,\nwill try to autodetect\
\ the sample sheet in the input directory.\nRequires --demultiplexer to be set.\n"
info: null
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--demultiplexer"
description: "Demultiplexer to use, choice depends on the provider\nof the instrument\
\ that was used to generate the data.\nWhen not using --run_information, specifying\
\ this argument is not\nrequired.\n"
info: null
required: false
choices:
- "bases2fastq"
- "bclconvert"
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Annotation flags"
arguments:
- type: "boolean_true"
name: "--plain_output"
description: "Flag to indicate that the output should be stored directly under\
\ $publish_dir rather than\nunder a subdirectory structure runID/<date_time>_demultiplex_<version>/.\n"
info: null
direction: "input"
- name: "Output arguments"
arguments:
- type: "file"
name: "--fastq_output"
info: null
default:
- "fastq"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--falco_output"
info: null
default:
- "qc/fastqc"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--multiqc_output"
info: null
default:
- "qc/multiqc_report.html"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- name: "Other arguments"
arguments:
- type: "boolean_true"
name: "--skip_copycomplete_check"
description: "Disable the check for the presence of a \"CopyComplete.txt\" file\
\ in input\ndirectory in case of Illumina data.\n"
info: null
direction: "input"
resources:
- type: "nextflow_script"
path: "main.nf"
is_executable: true
entrypoint: "run_wf"
- type: "file"
path: "nextflow_labels.config"
dest: "nextflow_labels.config"
description: "Runner for demultiplexing of raw sequencing data"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
dependencies:
- name: "demultiplex"
repository:
type: "local"
- name: "io/publish"
repository:
type: "local"
license: "MIT"
links:
repository: "https://github.com/viash-hub/demultiplex"
runners:
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
script:
- "includeConfig(\"nextflow_labels.config\")"
debug: false
container: "docker"
engines:
- type: "native"
id: "native"
- type: "native"
id: "native"
build_info:
config: "src/runner/config.vsh.yaml"
runner: "nextflow"
engine: "native|native"
output: "target/nextflow/runner"
executable: "target/nextflow/runner/main.nf"
viash_version: "0.9.4"
git_commit: "6796c074cf57a458b7cc7aeb0b992e8108c5c503"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.1.1-27-g6796c07"
dependencies:
- "target/nextflow/demultiplex"
- "target/nextflow/io/publish"
package_config:
name: "demultiplex"
version: "add_summary_to_csv_tests"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'add_summary_to_csv_tests'"
keywords:
- "bioinformatics"
- "sequence"
- "demultiplexing"
- "pipeline"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/demultiplex"
issue_tracker: "https://github.com/viash-hub/demultiplex/issues"

File diff suppressed because it is too large


@@ -0,0 +1,125 @@
manifest {
name = 'runner'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'add_summary_to_csv_tests'
description = 'Runner for demultiplexing of raw sequencing data'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
includeConfig("nextflow_labels.config")


@@ -0,0 +1,98 @@
process {
container = 'nextflow/bash:latest'
// default resources
memory = { 8.GB * task.attempt }
cpus = 8
maxForks = 36
// Retry for exit codes that have something to do with memory issues
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
maxRetries = 3
maxMemory = 192.GB
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 32 }
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
}
profiles {
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
docker {
docker.fixOwnership = true
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
local {
// This config is for local processing.
process {
maxMemory = 25.GB
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 4 }
withLabel: midcpu { cpus = 6 }
withLabel: highcpu { cpus = 12 }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
}
}
}
def get_memory(to_compare) {
if (!process.containsKey("maxMemory") || !process.maxMemory) {
return to_compare
}
try {
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
return process.maxMemory
}
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
return process.maxMemory as nextflow.util.MemoryUnit
}
else {
return to_compare
}
} catch (all) {
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
System.exit(1)
}
}


@@ -0,0 +1,189 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "runner",
"description": "Runner for demultiplexing of raw sequencing data",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type":
"string",
"description": "Type: `file`, required. Base directory of the canonical form `s3://\u003cbucket\u003e/\u003cpath\u003e/\u003cRunID\u003e/`",
"help_text": "Type: `file`, required. Base directory of the canonical form `s3://\u003cbucket\u003e/\u003cpath\u003e/\u003cRunID\u003e/`.\nA tarball (tar.gz, .tgz, .tar) containing run information can be provided in which\ncase the RunID is set to the name of the tarball without the extension.\n"
}
,
"run_information": {
"type":
"string",
"description": "Type: `file`. CSV file containing sample information, which will be used as \ninput for the demultiplexer",
"help_text": "Type: `file`. CSV file containing sample information, which will be used as \ninput for the demultiplexer. Canonically called \u0027SampleSheet.csv\u0027 (Illumina)\nor \u0027RunManifest.csv\u0027 (Element Biosciences). If not specified,\nwill try to autodetect the sample sheet in the input directory.\nRequires --demultiplexer to be set.\n"
}
,
"demultiplexer": {
"type":
"string",
"description": "Type: `string`, choices: ``bases2fastq`, `bclconvert``. Demultiplexer to use, choice depends on the provider\nof the instrument that was used to generate the data",
"help_text": "Type: `string`, choices: ``bases2fastq`, `bclconvert``. Demultiplexer to use, choice depends on the provider\nof the instrument that was used to generate the data.\nWhen not using --sample_sheet, specifying this argument is not\nrequired.\n",
"enum": ["bases2fastq", "bclconvert"]
}
}
},
"annotation flags" : {
"title": "Annotation flags",
"type": "object",
"description": "No description",
"properties": {
"plain_output": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Flag to indicate that the output should be stored directly under $publish_dir rather than\nunder a subdirectory structure runID/\u003cdate_time\u003e_demultiplex_\u003cversion\u003e/",
"help_text": "Type: `boolean_true`, default: `false`. Flag to indicate that the output should be stored directly under $publish_dir rather than\nunder a subdirectory structure runID/\u003cdate_time\u003e_demultiplex_\u003cversion\u003e/.\n"
,
"default":false
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"fastq_output": {
"type":
"string",
"description": "Type: `file`, default: `fastq`. ",
"help_text": "Type: `file`, default: `fastq`. "
,
"default":"fastq"
}
,
"falco_output": {
"type":
"string",
"description": "Type: `file`, default: `qc/fastqc`. ",
"help_text": "Type: `file`, default: `qc/fastqc`. "
,
"default":"qc/fastqc"
}
,
"multiqc_output": {
"type":
"string",
"description": "Type: `file`, default: `qc/multiqc_report.html`. ",
"help_text": "Type: `file`, default: `qc/multiqc_report.html`. "
,
"default":"qc/multiqc_report.html"
}
}
},
"other arguments" : {
"title": "Other arguments",
"type": "object",
"description": "No description",
"properties": {
"skip_copycomplete_check": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Disable the check for the presence of a \"CopyComplete",
"help_text": "Type: `boolean_true`, default: `false`. Disable the check for the presence of a \"CopyComplete.txt\" file in input\ndirectory in case of Illumina data.\n"
,
"default":false
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map corresponds to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativation is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
},
{
"$ref": "#/definitions/annotation flags"
},
{
"$ref": "#/definitions/output arguments"
},
{
"$ref": "#/definitions/other arguments"
},
{
"$ref": "#/definitions/nextflow input-output arguments"
}
]
}
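
Two rules from the `runner` argument descriptions lend themselves to a small worked example: the RunID is the last component of the input path, or the tarball name without its extension, and unless `--plain_output` is set the results go under `runID/<date_time>_demultiplex_<version>/`. The Python sketch below illustrates both. The function names and the timestamp format `%Y%m%d_%H%M%S` are assumptions, since the diff does not show how the workflow formats `<date_time>`.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Tarball extensions named in the --input description.
TARBALL_SUFFIXES = (".tar.gz", ".tgz", ".tar")


def run_id(input_path):
    """Derive the run ID from a run directory or tarball path."""
    name = PurePosixPath(input_path.rstrip("/")).name
    for suffix in TARBALL_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def publish_subdir(input_path, version, plain_output=False, now=None):
    """Return the subdirectory under publish_dir where results are stored."""
    if plain_output:
        # --plain_output: store results directly under publish_dir.
        return ""
    # Assumed timestamp format; the real workflow may format it differently.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"{run_id(input_path)}/{stamp}_demultiplex_{version}"
```

For example, `run_id("s3://bucket/path/run42.tar.gz")` yields `run42`, the same value as for the directory `s3://bucket/path/run42/`.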