Build branch demultiplex/update_biobox with version update_biobox to demultiplex on branch update_biobox (4076dac)

Build pipeline: viash-hub.demultiplex.update-biobox-mln6w

Source commit: 4076dacab7

Source message: Bump biobox to v0.4.0
CI
2025-09-17 11:04:59 +00:00
commit dc160be0ff
138 changed files with 61792 additions and 0 deletions

.gitignore

@@ -0,0 +1,8 @@
target
testData
test_resources
# Nextflow related files
.nextflow
.nextflow.log*
work

CHANGELOG.md

@@ -0,0 +1,267 @@
# demultiplex v0.6.0
## New functionality
* Bump biobox to `v0.4.0`. This updates `bases2fastq` to version `2.2.0` and
enables support for demultiplexing output from sequencers running AVITI OS 3.4.0 (PR #).
# demultiplex v0.5.1
## Bug fixes
* Fix disabling `publishFilesProc` and `publishStateProc` for `runner` workflow (PR #63).
* Avoid double slashes in the publish directory path in order to not create empty objects on S3 (PR #64).
# demultiplex v0.5.0
## Breaking changes
* `runner`: the name of the output directory is now determined based on the value for the event `id` instead of the name of the input folder (PR #62).
## Bug fixes
* Disable `publishFilesProc` for `runner` workflow (PR #60).
# demultiplex v0.4.4
## Bug fixes
* Only add the `transfer_complete.txt` files when the exit code of the workflow is 0 (PR #58).
# demultiplex v0.4.3
## Minor changes
* The `runner` creates a `transfer_completed.txt` file when the publishing of the output has finished (PR #57).
# demultiplex v0.4.2
## Minor changes
* Provide output from `runner` workflow so it can be used as part of a larger workflow (PR #56).
* Add workflow identifier to version information during pipeline run (PR #56).
# demultiplex v0.4.1
## Minor changes
* Split off part of the workflow logic (`detect_demultiplexer`) from the main workflow to a dedicated subworkflow (PR #52).
* Add the package config (`_viash.yaml`) to every component's target dir. This makes introspection from, e.g., a `runner` workflow much more robust (PR #53).
# demultiplex v0.4.0
## Breaking changes
* Falco has been replaced with FastQC. Falco generates FastQC compatible output, but fails to run on empty FASTQ files (PR #51).
- `runner` workflow: `falco_output` has been renamed to `output_sample_qc`.
- `demultiplex` workflow: `output_falco` has been renamed to `output_sample_qc`.
- The output file names from the sample QC no longer contain the input file extensions. Instead, the sample name is used.
(for example `sample1_S1_R2_001.fastq.gz_fastqc_report.html` becomes `sample1_S1_R2_001_fastqc_report.html`)
* `demultiplex` workflow: `output_multiqc` argument has been renamed to `multiqc_output` in order to align inner workflow and runner (PR #51).
# demultiplex v0.3.12
## New features
* Add support for Nextflow versions starting from 25.xx.xx (PR #50).
## Bug fixes
* Allow FASTQ files for `Undetermined` to be empty (PR #50).
# demultiplex v0.3.11
## New features
* Output demultiplexer logs and metrics (PR #41).
# demultiplex v0.3.10
## Minor changes
* Moved the test resources to their new location (PR #37).
# demultiplex v0.3.9
## Bug fixes
* Fix defaults for output arguments in Nextflow schemas.
* Fix an issue where an integer being passed to an argument with `type: double` resulted in an error (PR #44).
## Minor changes
* Bump viash to 0.9.4, which adds support for Nextflow versions starting from major version 25.01 (PR #43 and #44).
# demultiplex v0.3.8
## Bug fixes
* Provide a proper error when a FASTQ file is empty after demultiplexing (PR #40).
# demultiplex v0.3.7
## Minor updates
* Ignore lines starting with '#' when parsing run information CSV (PR #39).
# demultiplex v0.3.6
## Minor updates
* Accept different letter cases for headers when looking for sample information in the run information CSV (PR #38).
# demultiplex v0.3.5
## Breaking changes
* The `demultiplex` workflow now outputs a list of directories
for the `output_falco` argument (one for each barcode) instead of one directory
for the complete run. The output from the `runner` workflow remained
unchanged (PR #33).
## Minor updates
* In case Illumina data is detected in the input folder, check for the presence of the 'copyComplete.txt' file.
This check can be disabled using `--skip_copycomplete_check` (PR #34).
# demultiplex v0.3.4
## Minor updates
* Resource labels are now automatically included during build (PR #32).
# demultiplex v0.3.3
## Breaking change
- The `runner` defines the output differently now:
- The last part of the `--input` path is expected to be the run ID and this run ID is used to create the output directory.
- If the input is `file.tar.gz` instead of a directory, the `file` part is used as the run ID.
- The output structure is then as follows:
```
$publish_dir/<run_id>/<date_time_stamp>_demultiplex_<version>/
```
For instance:
```
$publish_dir
└── 200624_A00834_0183_BHMTFYDRXX
    └── 20241217_051404_demultiplex_v1.2
        ├── run_information.csv
        ├── fastq
        │   ├── Sample1_S1_L001_R1_001.fastq.gz
        │   ├── Sample23_S3_L001_R1_001.fastq.gz
        │   ├── SampleA_S2_L001_R1_001.fastq.gz
        │   ├── Undetermined_S0_L001_R1_001.fastq.gz
        │   └── sampletest_S4_L001_R1_001.fastq.gz
        └── qc
            ├── fastqc
            │   ├── Sample1_S1_L001_R1_001.fastq.gz_fastqc_data.txt
            │   ├── Sample1_S1_L001_R1_001.fastq.gz_fastqc_report.html
            │   ├── Sample1_S1_L001_R1_001.fastq.gz_summary.txt
            │   ├── Sample23_S3_L001_R1_001.fastq.gz_fastqc_data.txt
            │   ├── Sample23_S3_L001_R1_001.fastq.gz_fastqc_report.html
            │   ├── Sample23_S3_L001_R1_001.fastq.gz_summary.txt
            │   ├── SampleA_S2_L001_R1_001.fastq.gz_fastqc_data.txt
            │   ├── SampleA_S2_L001_R1_001.fastq.gz_fastqc_report.html
            │   ├── SampleA_S2_L001_R1_001.fastq.gz_summary.txt
            │   ├── Undetermined_S0_L001_R1_001.fastq.gz_fastqc_data.txt
            │   ├── Undetermined_S0_L001_R1_001.fastq.gz_fastqc_report.html
            │   ├── Undetermined_S0_L001_R1_001.fastq.gz_summary.txt
            │   ├── sampletest_S4_L001_R1_001.fastq.gz_fastqc_data.txt
            │   ├── sampletest_S4_L001_R1_001.fastq.gz_fastqc_report.html
            │   └── sampletest_S4_L001_R1_001.fastq.gz_summary.txt
            └── multiqc_report.html
```
- This logic can be avoided by providing the flag `--plain_output`.
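As a rough illustration of the run ID rule above, the shell function below derives a run ID from an `--input` value. `run_id_from_input` is a hypothetical helper written for this note, not code taken from the `runner` workflow, and it only covers the two cases described (a directory or a `.tar.gz` file).

```bash
# Hypothetical helper (not part of the workflow): derive the run ID from --input
# following the v0.3.3 description: last path component, `file.tar.gz` -> `file`.
run_id_from_input() {
  local input="${1%/}"        # drop a trailing slash, if any
  local name="${input##*/}"   # keep only the last path component
  name="${name%.tar.gz}"      # strip a .tar.gz extension, if present
  printf '%s\n' "$name"
}

run_id_from_input "gs://bucket/runs/200624_A00834_0183_BHMTFYDRXX/"
run_id_from_input "/data/200624_A00834_0183_BHMTFYDRXX.tar.gz"
```

Both calls print `200624_A00834_0183_BHMTFYDRXX`.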
## Minor updates
* Added `output_run_information` argument that copies the run information file to the output (PR #31).
# demultiplex v0.3.2
## Bug fixes
* Ignore empty CSV entries when parsing sample information (PR #29).
# demultiplex v0.3.1
## Minor updates
* Add `--run_information` and `--demultiplexer` arguments to `runner` workflow (PR #27).
## Bug fixes
* Fix detection of sample IDs from Illumina V2 sample sheets (PR #28).
* Provide a clear error message when `--run_information` is provided but not `--demultiplexer` (PR #27).
# demultiplex v0.3.0
## Major updates
The output of the workflow has been refactored to be more flexible (PR #19). This is done by adding a `runner` workflow that wraps the native `demultiplex` workflow. The `runner` workflow is responsible for setting the output directory based on the input arguments:
Three arguments specify the relative location of the three _outputs_ of the workflow:
- `fastq_output`: The directory where the demultiplexed FASTQ files are stored.
- `falco_output`: The directory for the `fastqc`/`falco` reports.
- `multiqc_output`: The filename for the `multiqc` report.
The target location path is determined by the following logic:
- If no `id` is provided, the output directory is set to `$publish_dir`.
- If an `id` is explicitly set using Seqera Cloud or by adding `--id <>`, the output directory is set to `$publish_dir/<id>`.
The workflow has two optional flags to be used in combination with `--id`:
- `--add_date_time`: rather than publishing the results under `$publish_dir`, this adds an additional layer `$publish_dir/<date-time-stamp>/`. This is useful when you want to keep track of multiple runs of the workflow (example: `240322_143020`).
- `--add_workflow_id`: adding this flag will add `_demultiplex_<version>` to the output directory (example: `demultiplex_v0.2.0`). When starting the workflow from a non-release, the version will be set to `version_unkonwn`.
The default structure in the output directory is:
- Two sub-directories:
- `fastq`
- `qc` for the reports:
- `multiqc_report.html`
- `fastqc/` directory containing the different fastqc (falco) reports.
The `$publish_dir` variable corresponds to the argument provided with `--publish-dir`. The `date-time-stamp` is generated by the workflow based on when it was launched and is thus guaranteed to be unique.
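A minimal sketch of how these options could combine into a target path, assuming the layers nest as in the v0.3.3 example above (`$publish_dir/<id>/<date_time_stamp>_demultiplex_<version>/`). `output_dir` and its positional arguments are hypothetical; the actual logic lives in the `runner` workflow and may differ in details such as the stamp format.

```bash
# Hypothetical helper: compose the publish location from the options above.
# Arguments: publish_dir, id (may be empty), add_date_time (true/false),
#            add_workflow_id (true/false), version, date-time stamp.
output_dir() {
  local publish_dir="$1" id="$2" add_date_time="$3" add_workflow_id="$4"
  local version="$5" stamp="$6"
  local dir="${publish_dir%/}"   # drop a trailing slash to avoid "//" when joining
  local leaf=""
  [ -n "$id" ] && dir="$dir/$id"
  [ "$add_date_time" = true ] && leaf="$stamp"
  if [ "$add_workflow_id" = true ]; then
    # join with "_" only when a date-time stamp is already present
    leaf="${leaf:+${leaf}_}demultiplex_${version}"
  fi
  [ -n "$leaf" ] && dir="$dir/$leaf"
  printf '%s\n' "$dir"
}

output_dir /out/ RUN1 true true v0.2.0 240322_143020
```

The example call prints `/out/RUN1/240322_143020_demultiplex_v0.2.0`.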
# demultiplex v0.2.0
## Breaking changes
* `demultiplex` workflow: renamed `sample_sheet` argument to `run_information` (PR #24)
## New features
* Add support for `bases2fastq` demultiplexer (PR #24)
## Minor updates
* Add resource labels to workflows (PR #21).
# demultiplex v0.1.1
## Minor updates
* Bump viash to 0.9.0 (PR #14).
* `demultiplex` workflow: use `v0.2.0` release instead of `main` branch for `biobox` dependencies (PR #11).
* Renamed `biobase` repository to `biobox` (PR #13 and PR #15).
# demultiplex v0.1.0
Initial release

LICENSE

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2024 Data Intuitive
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

@@ -0,0 +1,219 @@
# Demultiplex.vsh
Demultiplex.vsh is a workflow for demultiplexing of raw sequencing data.
Currently data from Illumina and Element Biosciences sequencers are
supported.
[![ViashHub](https://img.shields.io/badge/ViashHub-demultiplex-7a4baa.svg)](https://web.viash-hub.com/packages/demultiplex)
[![GitHub](https://img.shields.io/badge/GitHub-viash--hub%2Fdemultiplex-blue.svg)](https://github.com/viash-hub/demultiplex)
[![GitHub
License](https://img.shields.io/github/license/viash-hub/demultiplex.svg)](https://github.com/viash-hub/demultiplex/blob/main/LICENSE)
[![GitHub
Issues](https://img.shields.io/github/issues/viash-hub/demultiplex.svg)](https://github.com/viash-hub/demultiplex/issues)
[![Viash
version](https://img.shields.io/badge/Viash-v0.9.4-blue)](https://viash.io)
## Introduction
This workflow is designed to demultiplex raw RNA-seq data
from Illumina and Element Biosciences sequencers.
The workflow is built in a modular fashion, where most of the base
functionality is provided by components from
[`biobox`](https://www.viash-hub.com/packages/biobox/latest)
supplemented by custom base components and workflow components in this
package. Each of these components can be used independently as
stand-alone modules with a standardized interface.
The full workflow can be run in two ways:
1. Run the [main
workflow](https://www.viash-hub.com/packages/demultiplex/v0.3.4/components/demultiplex)
containing the main functionality.
2. Run the [(opinionated)
`runner`](https://www.viash-hub.com/packages/demultiplex/v0.3.4/components/runner)
where a number of choices (input/output structure and location) have
been made.
## Workflow Overview
The workflow executes the following steps:
1. Unpacking the input data (when a TAR archive is provided)
2. Run `bclconvert` or `bases2fastq`
3. Run `fastqc` and convert Illumina InterOp information to CSV
4. Run `multiqc` to generate a report
## Example usage
Two variants of the same workflow are provided, depending on the
flexibility in the output structure required:
- The `runner` workflow provides a predefined output structure. It
requires the minimum number of parameters to be provided, at the cost
of being less flexible. It is located at
`target/nextflow/runner/main.nf`
- The `demultiplex` workflow (`target/nextflow/demultiplex/main.nf`)
allows for more fine-grained tuning, but requires more parameters to
be provided.
### Test data
We have provided test data at
`gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2`
(Illumina), but please feel free to bring your own. The URL of the test
data can be provided as-is to the workflow, or you can download
everything and specify a local path.
The input data should follow the structure of either Illumina or Element
Biosciences sequencers. The workflow will automatically detect which
demultiplexer to use (`bclconvert` or `bases2fastq`) based on the
presence of either `SampleSheet.csv` or `RunParameters.xml` in the input
directory. The demultiplexer can also be set explicitly using the
`--demultiplexer` parameter.
### Setup
In order to use the workflows in this package, you'll need to do the
following:
- Install [nextflow](https://www.nextflow.io/docs/latest/install.html)
- Install a Nextflow-compatible container engine. This workflow provides a
profile for [docker](https://docs.docker.com/get-started/).
### Run from Viash Hub
1. Open [Viash Hub](https://www.viash-hub.com) and browse to the
[demultiplex
component](https://www.viash-hub.com/packages/demultiplex/v0.3.4/components/demultiplex).
Press the Launch button and follow the instructions.
![](assets/demultiplex-launch-small.png)
2. We will start an example run and set profile to `docker`.
![](assets/demultiplex-launch-parameters-1.png)
3. In the next step, provide the parameters as follows and leave the
rest as default:
- `input`:
`gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2`
![](assets/demultiplex-launch-parameters-2.png)
Press the Launch button at the end to get the instructions on how to
run the workflow from the CLI.
### Run using NF-Tower / Seqera Cloud
It's possible to run the workflow directly from [Seqera
Cloud](https://cloud.seqera.io). The necessary [Nextflow schema
file](https://nextflow-io.github.io/nf-schema/latest/nextflow_schema/nextflow_schema_specification/)
has been built and provided with the workflows in order to use the
form-based input.
1. Select the option to run the workflow using Seqera Cloud. You will
need to create an API token for your account. Once this token has
been entered in the corresponding field, you will get the option to
select a Workspace and a Compute environment.
![](assets/demultiplex-launch-parameters-3.png)
2. Provide the parameters similar to the previous step.
3. In the next screen, pressing the Launch button will actually start
the workflow on Seqera Cloud. A message is shown when the submission was
successful.
![](assets/demultiplex-launch-parameters-4.png)
### Setting up SCM
In order to let nextflow use the viash-hub workflows, you need to set up
an [SCM](https://www.nextflow.io/docs/latest/git.html#git-configuration)
file. This can be done once by creating `$HOME/.nextflow/scm` and adding
the following:

    providers {
      vsh {
        platform = 'gitlab'
        server = "packages.viash-hub.com"
      }
    }

Alternatively, a custom location for the SCM file can be specified using
the `NXF_SCM_FILE` environment variable.
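The SCM file can also be created from the shell. The sketch below writes exactly the provider block shown above, honours `NXF_SCM_FILE` when it is set, and deliberately refuses to overwrite an existing SCM file (so other providers are not lost). `write_vsh_scm` is a hypothetical helper name, not something shipped with this package.

```bash
# Hypothetical helper: write the Viash Hub SCM provider for Nextflow.
# Target: $1 if given, else $NXF_SCM_FILE, else $HOME/.nextflow/scm.
write_vsh_scm() {
  local scm_file="${1:-${NXF_SCM_FILE:-$HOME/.nextflow/scm}}"
  if [ -e "$scm_file" ]; then
    # Do not clobber an existing file; merge the provider by hand instead
    echo "SCM file $scm_file already exists; add the 'vsh' provider by hand." >&2
    return 1
  fi
  mkdir -p "$(dirname "$scm_file")"
  cat > "$scm_file" <<'EOF'
providers {
  vsh {
    platform = 'gitlab'
    server = "packages.viash-hub.com"
  }
}
EOF
  echo "Wrote $scm_file"
}
```

After running `write_vsh_scm`, the `--help` check below should resolve `vsh/demultiplex`.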
You can check if everything is working by getting the `--help` for a
workflow:
``` bash
nextflow run \
vsh/demultiplex \
-r v0.3.11 \
--help
```
### Run from the CLI
Running from the CLI directly without using Viash Hub is possible as
well. The easiest is to use the integrated help functionality, for
instance using the following:
``` bash
nextflow run vsh/demultiplex \
-revision v0.3.11 \
-main-script target/nextflow/runner/main.nf \
--help
```
Having this project available locally, you can run the following
command:
``` bash
nextflow run vsh/demultiplex \
-r v0.3.11 \
-main-script target/nextflow/runner/main.nf \
--input "gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2" \
--demultiplexer bclconvert \
--skip_copycomplete_check \
--publish_dir example_output/ \
-profile docker \
-c src/config/labels.config
```
### (Optional) Resource usage tuning
Nextflow's labels can be used to specify the amount of resources a
process can use. This workflow uses the following labels for CPU and
memory:
- `verylowmem`, `lowmem`, `midmem`, `highmem`
- `verylowcpu`, `lowcpu`, `midcpu`, `highcpu`
The defaults for these labels can be found at
`src/config/labels.config`. Nextflow checks that the specified resources
for a process do not exceed what is available on the machine and will
not start if it does. Create your own config file to tune the labels to
your needs, for example:

    process {
      // Resource labels
      withLabel: verylowcpu { cpus = 2 }
      withLabel: lowcpu { cpus = 8 }
      withLabel: midcpu { cpus = 16 }
      withLabel: highcpu { cpus = 16 }
      withLabel: verylowmem { memory = 4.GB }
      withLabel: lowmem { memory = 8.GB }
      withLabel: midmem { memory = 8.GB }
      withLabel: highmem { memory = 8.GB }
    }

When starting nextflow using the CLI, you can use `-c` to provide the
file to nextflow and overwrite the defaults.
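If you would rather cap the CPU labels at what your machine offers than edit the values by hand, a small generator can write such a config. This is a sketch under stated assumptions: `make_cpu_labels` is a hypothetical helper, the starting values (2/8/16/32) are the defaults from `src/config/labels.config`, and it calls `nproc` (GNU coreutils) only when no CPU count is passed.

```bash
# Hypothetical helper: print a Nextflow config whose CPU labels never exceed
# the number of available CPUs ($1, or the output of `nproc` if omitted).
make_cpu_labels() {
  local available="${1:-$(nproc)}"
  local pair label cpus
  echo "process {"
  for pair in verylowcpu:2 lowcpu:8 midcpu:16 highcpu:32; do
    label="${pair%%:*}"
    cpus="${pair##*:}"
    # cap each label at the available CPU count
    (( cpus > available )) && cpus="$available"
    echo "  withLabel: $label { cpus = $cpus }"
  done
  echo "}"
}
```

For example, `make_cpu_labels > my_labels.config` followed by `nextflow run ... -c my_labels.config`.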
## Acknowledgements
Developed in collaboration with Data Intuitive and Open Analytics.

README.qmd

@@ -0,0 +1,191 @@
---
format: gfm
---
```{r setup, include=FALSE}
project <- yaml::read_yaml("_viash.yaml")
license <- paste0(project$links$repository, "/blob/main/LICENSE")
```
# Demultiplex.vsh
Demultiplex.vsh is a workflow for demultiplexing of raw sequencing data. Currently data from Illumina and Element Biosciences sequencers are supported.
[![ViashHub](https://img.shields.io/badge/ViashHub-demultiplex-7a4baa.svg)](https://web.viash-hub.com/packages/demultiplex)
[![GitHub](https://img.shields.io/badge/GitHub-viash--hub%2Fdemultiplex-blue.svg)](https://github.com/viash-hub/demultiplex)
[![GitHub
License](https://img.shields.io/github/license/viash-hub/demultiplex.svg)](https://github.com/viash-hub/demultiplex/blob/main/LICENSE)
[![GitHub
Issues](https://img.shields.io/github/issues/viash-hub/demultiplex.svg)](https://github.com/viash-hub/demultiplex/issues)
[![Viash
version](https://img.shields.io/badge/Viash-v0.9.4-blue)](https://viash.io)
## Introduction
This workflow is designed to demultiplex raw RNA-seq data from Illumina and Element Biosciences sequencers.
The workflow is built in a modular fashion, where most of the base functionality is provided by components from
[`biobox`](https://www.viash-hub.com/packages/biobox/latest) supplemented by custom base components and workflow components in this package. Each of these components can be used independently as stand-alone modules with a
standardized interface.
The full workflow can be run in two ways:
1. Run the [main
workflow](https://www.viash-hub.com/packages/demultiplex/v0.3.4/components/demultiplex)
containing the main functionality.
2. Run the [(opinionated)
`runner`](https://www.viash-hub.com/packages/demultiplex/v0.3.4/components/runner)
where a number of choices (input/output structure and location) have
been made.
## Workflow Overview
The workflow executes the following steps:
1. Unpacking the input data (when a TAR archive is provided)
2. Run `bclconvert` or `bases2fastq`
3. Run `fastqc` and convert Illumina InterOp information to CSV
4. Run `multiqc` to generate a report
## Example usage
Two variants of the same workflow are provided, depending on the flexibility in the output structure required:
* The `runner` workflow provides a predefined output structure. It requires the minimum number of parameters to be provided, at the cost of being less flexible. It is located at `target/nextflow/runner/main.nf`
* The `demultiplex` workflow (`target/nextflow/demultiplex/main.nf`) allows for more fine-grained tuning, but requires more parameters to be provided.
### Test data
We have provided test data at `gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2` (Illumina), but please feel free to bring your own. The URL of the test data can be provided as-is to the workflow, or you can download everything and specify a local path.
The input data should follow the structure of either Illumina or Element Biosciences sequencers. The workflow will automatically detect which demultiplexer to use (`bclconvert` or `bases2fastq`) based on the
presence of either `SampleSheet.csv` or `RunParameters.xml` in the input directory. The demultiplexer can also be set explicitly using the `--demultiplexer` parameter.
### Setup
In order to use the workflows in this package, you'll need to do the following:
* Install [nextflow](https://www.nextflow.io/docs/latest/install.html)
* Install a Nextflow-compatible container engine. This workflow provides a profile for [docker](https://docs.docker.com/get-started/).
### Run from Viash Hub
1. Open [Viash Hub](https://www.viash-hub.com) and browse to the [demultiplex
component](https://www.viash-hub.com/packages/demultiplex/v0.3.4/components/demultiplex).
Press the Launch button and follow the instructions.
![](assets/demultiplex-launch-small.png)
2. We will start an example run and set profile to `docker`.
![](assets/demultiplex-launch-parameters-1.png)
3. In the next step, provide the parameters as follows and leave the rest as default:
- `input`:
`gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2`
![](assets/demultiplex-launch-parameters-2.png)
Press the Launch button at the end to get the instructions on how to
run the workflow from the CLI.
### Run using NF-Tower / Seqera Cloud
It's possible to run the workflow directly from [Seqera
Cloud](https://cloud.seqera.io). The necessary [Nextflow schema
file](https://nextflow-io.github.io/nf-schema/latest/nextflow_schema/nextflow_schema_specification/)
has been built and provided with the workflows in order to use the
form-based input.
1. Select the option to run the workflow using Seqera Cloud. You
will need to create an API token for your account. Once this token has been
entered in the corresponding field, you will get the option to select
a Workspace and a Compute environment.
![](assets/demultiplex-launch-parameters-3.png)
2. Provide the parameters similar to the previous step.
3. In the next screen, pressing the Launch button will actually start the
workflow on Seqera Cloud. A message is shown when the submission was
successful.
![](assets/demultiplex-launch-parameters-4.png)
### Setting up SCM
In order to let nextflow use the viash-hub workflows, you need to set up an [SCM](https://www.nextflow.io/docs/latest/git.html#git-configuration) file. This can be done once by creating `$HOME/.nextflow/scm` and adding the following:
```
providers {
vsh {
platform = 'gitlab'
server = "packages.viash-hub.com"
}
}
```
Alternatively, a custom location for the SCM file can be specified using the `NXF_SCM_FILE` environment variable.
You can check if everything is working by getting the `--help` for a workflow:
```bash
nextflow run \
vsh/demultiplex \
-r v0.3.11 \
--help
```
### Run from the CLI
Running from the CLI directly without using Viash Hub is possible as well. The
easiest is to use the integrated help functionality, for instance
using the following:
``` bash
nextflow run vsh/demultiplex \
-revision v0.3.11 \
-main-script target/nextflow/runner/main.nf \
--help
```
Having this project available locally, you can run the following command:
```bash
nextflow run vsh/demultiplex \
-r v0.3.11 \
-main-script target/nextflow/runner/main.nf \
--input "gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2" \
--demultiplexer bclconvert \
--skip_copycomplete_check \
--publish_dir example_output/ \
-profile docker \
-c src/config/labels.config
```
### (Optional) Resource usage tuning
Nextflow's labels can be used to specify the amount of resources a process can use. This workflow uses the following labels for CPU and memory:
* `verylowmem`, `lowmem`, `midmem`, `highmem`
* `verylowcpu`, `lowcpu`, `midcpu`, `highcpu`
The defaults for these labels can be found at `src/config/labels.config`. Nextflow checks that the specified resources for a process do not exceed what is available on the machine and will not start if it does. Create your own config file to tune the labels to your needs, for example:
```
process {
  // Resource labels
  withLabel: verylowcpu { cpus = 2 }
  withLabel: lowcpu { cpus = 8 }
  withLabel: midcpu { cpus = 16 }
  withLabel: highcpu { cpus = 16 }
  withLabel: verylowmem { memory = 4.GB }
  withLabel: lowmem { memory = 8.GB }
  withLabel: midmem { memory = 8.GB }
  withLabel: highmem { memory = 8.GB }
}
```
When starting nextflow using the CLI, you can use `-c` to provide the file to nextflow and overwrite the defaults.
## Acknowledgements
Developed in collaboration with Data Intuitive and Open Analytics.

_viash.yaml

@@ -0,0 +1,21 @@
name: demultiplex
description: |
  Demultiplexing pipeline
license: MIT
keywords: [bioinformatics, sequence, demultiplexing, pipeline]
links:
  issue_tracker: https://github.com/viash-hub/demultiplex/issues
  repository: https://github.com/viash-hub/demultiplex
info:
  test_resources:
    - path: gs://viash-hub-resources/demultiplex/v4
      dest: testData
viash_version: 0.9.4
config_mods: |
  .requirements.commands += ['ps']
  .runners[.type == 'nextflow'].directives.tag := '$id'
  .resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}
  .runners[.type == 'nextflow'].config.script += 'includeConfig("nextflow_labels.config")'
  .resources += {path: '/_viash.yaml', dest: '_viash.yaml'}
version: update_biobox
organization: vsh

Binary files not shown: five images (59 KiB, 77 KiB, 110 KiB, 24 KiB, 54 KiB).

main.nf

@@ -0,0 +1,3 @@
workflow {
print("This is a dummy placeholder for pipeline execution. Please use the corresponding nf files for running pipelines.")
}

nextflow.config

@@ -0,0 +1,12 @@
manifest {
homePage = 'https://github.com/viash-hub/demultiplex'
description = 'Demultiplexing pipeline for sequencing data'
mainScript = 'target/nextflow/demultiplex/main.nf'
}
process {
withName: publishStatesProc {
publishDir = [ enabled: false ]
}
}

src/config/labels.config

@@ -0,0 +1,98 @@
process {
container = 'nextflow/bash:latest'
// default resources
memory = { 8.GB * task.attempt }
cpus = 8
maxForks = 36
// Retry for exit codes that have something to do with memory issues
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
maxRetries = 3
maxMemory = 192.GB
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 32 }
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
}
profiles {
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
docker {
docker.fixOwnership = true
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
local {
// This config is for local processing.
process {
maxMemory = 25.GB
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 4 }
withLabel: midcpu { cpus = 6 }
withLabel: highcpu { cpus = 12 }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
}
}
}
def get_memory(to_compare) {
if (!process.containsKey("maxMemory") || !process.maxMemory) {
return to_compare
}
try {
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
return process.maxMemory
}
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
return process.maxMemory as nextflow.util.MemoryUnit
}
else {
return to_compare
}
} catch (all) {
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
System.exit(1)
}
}
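The memory directives above grow with `task.attempt` and are capped by `get_memory`. The bash sketch below mirrors that arithmetic in whole gigabytes so the escalation is easy to follow; `effective_memory_gb` is a hypothetical name, and this is an approximation of the Groovy logic for illustration, not a replacement for it.

```bash
# Sketch of the escalation intended by get_memory(), in whole GB.
# base: label memory, attempt: task.attempt,
# max_retries / max_memory: process.maxRetries / process.maxMemory.
effective_memory_gb() {
  local base="$1" attempt="$2" max_retries="$3" max_memory="$4"
  local requested=$(( base * attempt ))
  if (( attempt == max_retries )); then
    echo "$max_memory"        # attempt equal to maxRetries: use the ceiling
  elif (( requested > max_memory )); then
    echo "$max_memory"        # never request more than the ceiling
  else
    echo "$requested"
  fi
}

for attempt in 1 2 3; do
  echo "highmem attempt $attempt: $(effective_memory_gb 64 "$attempt" 3 192) GB"
done
```

With the `highmem` label (64 GB), `maxRetries = 3` and `maxMemory = 192.GB`, the loop prints 64, 128 and 192 GB.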


@@ -0,0 +1,48 @@
name: combine_samples
namespace: dataflow
description: Combine fastq files from across samples into one event with a list of fastq files per orientation.
argument_groups:
  - name: Input arguments
    arguments:
      - name: "--id"
        description: "ID of the new event"
        type: string
        required: true
      - name: --forward_input
        type: file
        required: true
        multiple: true
      - name: --reverse_input
        type: file
        required: false
        multiple: true
      - name: "--sample_qc_dir"
        type: file
        required: true
  - name: Output arguments
    arguments:
      - name: --output_forward
        type: file
        direction: output
        multiple: true
        required: true
      - name: --output_reverse
        type: file
        direction: output
        multiple: true
        required: false
      - name: "--output_sample_qc"
        type: file
        direction: output
        required: true
        multiple: true
resources:
  - type: nextflow_script
    path: main.nf
    entrypoint: run_wf
runners:
  - type: nextflow
engines:
  - type: native


@@ -0,0 +1,30 @@
workflow run_wf {
take:
input_ch
main:
output_ch = input_ch
| map { id, state ->
def newEvent = [state.id, state + ["_meta": ["join_id": id]]]
newEvent
}
| groupTuple(by: 0, sort: "hash")
| map {run_id, states ->
// Gather the following state for all samples
def forward_fastqs = states.collect{it.forward_input}.flatten()
def reverse_fastqs = states.collect{it.reverse_input}.findAll{it != null}.flatten()
def sample_qc_dirs = states.collect{it.sample_qc_dir}
def resultState = [
"output_forward": forward_fastqs,
"output_reverse": reverse_fastqs,
"output_sample_qc": sample_qc_dirs,
// The join ID is the same across all samples from the same run
"_meta": ["join_id": states[0]._meta.join_id]
]
return [run_id, resultState]
}
emit:
output_ch
}
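`run_wf` above re-keys every sample event by its run ID, groups the events with `groupTuple`, concatenates the forward FASTQs and keeps only the non-null reverse FASTQs. As a rough shell analogue working on a TSV instead of Nextflow channels, the hypothetical helper `combine_by_run` performs the same grouping with `awk`. Unlike `groupTuple(sort: "hash")`, it keeps samples in input order.

```bash
# Hypothetical helper: group per-sample FASTQs into one record per run.
# Input (TSV on stdin): run_id <TAB> forward_fastq <TAB> reverse_fastq (may be empty)
# Output (TSV):         run_id <TAB> comma-separated forward <TAB> comma-separated reverse
combine_by_run() {
  awk -F'\t' -v OFS='\t' '
    {
      # remember the order in which runs are first seen
      if (!($1 in order)) { order[$1] = ++n; ids[n] = $1 }
      # append this forward FASTQ to the list of its run
      fwd[$1] = fwd[$1] (fwd[$1] == "" ? "" : ",") $2
      # reverse reads are optional: skip empty fields (single-end samples)
      if ($3 != "") rev[$1] = rev[$1] (rev[$1] == "" ? "" : ",") $3
    }
    END {
      for (i = 1; i <= n; i++) print ids[i], fwd[ids[i]], rev[ids[i]]
    }'
}
```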


@@ -0,0 +1,49 @@
name: gather_fastqs_and_validate
namespace: dataflow
description: |
  From a directory containing fastq files, gather the files per sample
  and validate according to the contents of the sample sheet.
argument_groups:
  - name: Input arguments
    arguments:
      - name: --input
        description: Directory containing .fastq files
        type: file
        required: true
      - name: --sample_sheet
        description: Sample sheet
        type: file
        required: true
  - name: Output arguments
    arguments:
      - name: --fastq_forward
        type: file
        direction: output
        required: true
        multiple: true
      - name: "--fastq_reverse"
        type: file
        direction: output
        required: false
        multiple: true
resources:
  - type: nextflow_script
    path: main.nf
    entrypoint: run_wf
test_resources:
  - type: nextflow_script
    path: test.nf
    entrypoint: test_gather_and_validate
  - type: nextflow_script
    path: test.nf
    entrypoint: test_undetermined_empty
  - type: nextflow_script
    path: test.nf
    entrypoint: test_without_index
  - path: test_data
runners:
  - type: nextflow
engines:
  - type: native


@@ -0,0 +1,33 @@
#!/usr/bin/env bash
set -eo pipefail
# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)
# ensure that the command below is run from the root of the repository
cd "$REPO_ROOT"
viash ns build --setup cb -q gather_fastqs_and_validate
nextflow run . \
-main-script src/dataflow/gather_fastqs_and_validate/test.nf \
-profile docker,no_publish,local \
-entry test_gather_and_validate \
-c src/config/labels.config \
-resume
nextflow run . \
-main-script src/dataflow/gather_fastqs_and_validate/test.nf \
-profile docker,no_publish,local \
-entry test_undetermined_empty \
-c src/config/labels.config \
-resume
nextflow run . \
-main-script src/dataflow/gather_fastqs_and_validate/test.nf \
-profile docker,no_publish,local \
-entry test_without_index \
-c src/config/labels.config \
-resume


@@ -0,0 +1,122 @@
import java.util.zip.GZIPInputStream
import java.util.zip.ZipException
import java.nio.file.Files
import java.io.BufferedInputStream
import java.io.EOFException
def is_empty(file_to_check){
/*
Checks if a (possibly gzip-compressed) file has no content.
*/
if (file_to_check.size() == 0) {
return true
}
def input_stream = Files.newInputStream(file_to_check)
try {
def gzInputStream
try {
gzInputStream = new GZIPInputStream(new BufferedInputStream(input_stream))
} catch (EOFException | ZipException ex) {
// Not a gzip file (too short, or wrong magic bytes), but it does have content
return false
}
def read_one_byte = gzInputStream.read()
return read_one_byte == -1
} finally {
// Closing the underlying stream releases the file handle
input_stream.close()
}
}
workflow run_wf {
take:
input_ch
main:
output_ch = input_ch
// Gather input files from BCL convert output folder
| flatMap { id, state ->
println "Processing sample sheet: $state.sample_sheet"
def sample_sheet = state.sample_sheet
def start_parsing = false
def sample_id_column_index = null
def undetermined_sample_name = "Undetermined"
def samples = [undetermined_sample_name]
def original_id = id
// Parse sample sheet for sample IDs
println "Processing run information file ${sample_sheet}"
def csv_lines = sample_sheet.splitCsv(header: false, sep: ',')
csv_lines.any { csv_items ->
if (csv_items.isEmpty() || csv_items[0].startsWith("#")) {
// skip empty or commented line
return
}
def possible_header = csv_items[0]
def header = possible_header.find(/\[(.*)\]/){fullmatch, header_name -> header_name}
if (header) {
if (start_parsing) {
// Stop parsing when encountering the next header
println "Encountered next header '[${header}]', stopping parsing."
return true
}
// [Data], [BCLConvert_Data] for illumina
// [Samples] or sometimes [SAMPLES] for Element Biosciences
if (header.toLowerCase() in ["data", "samples", "bclconvert_data"]) {
println "Found header [${header}], start parsing."
start_parsing = true
return
}
}
if (start_parsing) {
if ( sample_id_column_index == null) {
println "Looking for sample name column."
sample_id_column_index = csv_items.findIndexValues{it == "Sample_ID" || it == "SampleName"}
assert (!sample_id_column_index.isEmpty()):
"Could not find column 'Sample_ID' (Illumina) or 'SampleName' " +
"(Element Biosciences) in run information! Found columns: ${csv_items}"
assert sample_id_column_index.size() == 1, "Expected run information file to contain " +
"a column 'Sample_ID' or 'SampleName', not both. Found: ${sample_id_column_index}"
sample_id_column_index = sample_id_column_index[0]
println "Found sample names column '${csv_items[sample_id_column_index]}'."
return
}
def candidate_sample_id = csv_items[sample_id_column_index]
if (candidate_sample_id?.trim()) { // Don't add empty csv entries.
samples += csv_items[sample_id_column_index]
}
}
// This return is important! (If 'true' is returned, the parsing stops.)
return
}
assert start_parsing:
"Sample information file does not contain [Data], [Samples] or [BCLConvert_Data] header!"
assert samples.size() > 1:
"Sample information file does not seem to contain any information about the samples!"
println "Finished processing run information file, found samples: ${samples}."
println "Looking for fastq files in ${state.input}."
def allfastqs = state.input.listFiles().findAll{it.isFile() && it.name ==~ /^.+\.fastq\.gz$/}
println "Found ${allfastqs.size()} fastq files, matching them to the following samples: ${samples}."
def processed_samples = samples.collect { sample_id ->
def forward_regex = ~/^${sample_id}_S(\d+)_(L(\d+)_)?R1_(\d+)\.fastq\.gz$/
def reverse_regex = ~/^${sample_id}_S(\d+)_(L(\d+)_)?R2_(\d+)\.fastq\.gz$/
// Sort is needed here because multiple lanes (_L00*_) might be present and they need to be in the same order in both lists
def forward_fastq = state.input.listFiles().findAll{it.isFile() && it.name ==~ forward_regex}.sort()
def reverse_fastq = state.input.listFiles().findAll{it.isFile() && it.name ==~ reverse_regex}.sort()
assert forward_fastq && !forward_fastq.isEmpty(): "No forward fastq files were found for sample ${sample_id}. " +
"All fastq files in directory: ${allfastqs.collect{it.name}}"
assert (reverse_fastq.isEmpty() || (forward_fastq.size() == reverse_fastq.size())):
"Expected equal number of forward and reverse fastq files for sample ${sample_id}. " +
"Found forward: ${forward_fastq} and reverse: ${reverse_fastq}."
println "Found ${forward_fastq.size()} forward and ${reverse_fastq.size()} reverse " +
"fastq files for sample ${sample_id}"
assert sample_id == undetermined_sample_name || (forward_fastq.every{!is_empty(it)} && reverse_fastq.every{!is_empty(it)}):
"A fastq file for sample '${sample_id}' appears to be empty!"
def fastqs_state = [
"fastq_forward": forward_fastq,
"fastq_reverse": reverse_fastq,
"_meta": [ "join_id": original_id ],
]
[sample_id, fastqs_state]
}
println "Finished processing sample sheet."
return processed_samples
}
emit:
output_ch
}
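`run_wf` above performs two per-sample checks. It matches FASTQ file names to a sample ID, and it rejects empty gzip files for every sample except `Undetermined`. Both checks can be approximated in bash for quick manual inspection of a demultiplexer output folder.

This is a sketch under assumptions:
- The function names `classify_fastq` and `is_empty_fastq` are hypothetical.
- The regular expressions are transcribed from `forward_regex`/`reverse_regex`.
- `gzip -t` validates the whole stream, which is stricter than the header-only check done by `GZIPInputStream`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the repository.

# Print "forward", "reverse" or "none" for a file name, using the naming scheme
# <sample>_S<n>_[L<lane>_]R<1|2>_<chunk>.fastq.gz from gather_fastqs_and_validate.
# Like the Groovy code, the sample ID is inserted into the regex unescaped.
classify_fastq() {
  local sample_id="$1" file_name="$2"
  local re_forward="^${sample_id}_S[0-9]+_(L[0-9]+_)?R1_[0-9]+\.fastq\.gz$"
  local re_reverse="^${sample_id}_S[0-9]+_(L[0-9]+_)?R2_[0-9]+\.fastq\.gz$"
  if [[ "$file_name" =~ $re_forward ]]; then
    echo "forward"
  elif [[ "$file_name" =~ $re_reverse ]]; then
    echo "reverse"
  else
    echo "none"
  fi
}

# Succeed (exit status 0) when a FASTQ file holds no data: either the file is
# zero bytes, or it is valid gzip that decompresses to nothing.
is_empty_fastq() {
  local file="$1"
  [[ -s "$file" ]] || return 0
  # Not valid gzip: the file has content, so it is not considered empty.
  gzip -t "$file" 2>/dev/null || return 1
  [[ "$(gzip -cd "$file" | head -c 1 | wc -c)" -eq 0 ]]
}
```

Note that `classify_fastq sample1 sample10_S2_L001_R1_001.fastq.gz` prints `none`. The underscore after the sample ID prevents `sample1` from matching files of `sample10`.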


@@ -0,0 +1,10 @@
manifest {
nextflowVersion = '!>=20.12.1-edge'
}
params {
rootDir = java.nio.file.Paths.get("$projectDir/../../../").toAbsolutePath().normalize().toString()
}
// include common settings
includeConfig("${params.rootDir}/src/config/labels.config")


@@ -0,0 +1,86 @@
nextflow.enable.dsl=2
include { gather_fastqs_and_validate } from params.rootDir + "/target/nextflow/dataflow/gather_fastqs_and_validate/main.nf"
workflow test_gather_and_validate {
output_ch = Channel.fromList([
[
id: "run1",
input: params.rootDir + "/src/dataflow/gather_fastqs_and_validate/test_data/fastqs",
sample_sheet: params.rootDir + "/src/dataflow/gather_fastqs_and_validate/test_data/samplesheet.csv",
]
])
| map { state -> [state.id, state]}
| gather_fastqs_and_validate.run(toState: ["fastq_forward", "fastq_reverse"])
output_ch
| toSortedList{a, b -> a[0] <=> b[0]}
| view {"Output: $it"}
| map {
assert it.size() == 3: "Expected three fastq pairs"
// IDs are sorted, and "Undetermined" sorts before the lowercase sample names
def undetermined_pair = it[0][1]
assert undetermined_pair.fastq_forward.collect{it.name} == ["Undetermined_S1_R1_001.fastq.gz"]
assert undetermined_pair.fastq_reverse.collect{it.name} == ["Undetermined_S1_R2_001.fastq.gz"]
def sample1_pair = it[1][1]
assert sample1_pair.fastq_forward.collect{it.name} == ["sample1_S1_L001_R1_001.fastq.gz", "sample1_S1_L002_R1_001.fastq.gz"]
assert sample1_pair.fastq_reverse.collect{it.name} == ["sample1_S1_L001_R2_001.fastq.gz", "sample1_S1_L002_R2_001.fastq.gz"]
def sample2_pair = it[2][1]
assert sample2_pair.fastq_forward.collect{it.name} == ["sample2_S1_L001_R1_001.fastq.gz"]
assert sample2_pair.fastq_reverse.collect{it.name} == ["sample2_S1_L001_R2_001.fastq.gz"]
}
}
workflow test_undetermined_empty {
output_ch = Channel.fromList([
[
id: "run1",
input: params.rootDir + "/src/dataflow/gather_fastqs_and_validate/test_data/fastqs_undetermined_empty",
sample_sheet: params.rootDir + "/src/dataflow/gather_fastqs_and_validate/test_data/samplesheet.csv",
]
])
| map { state -> [state.id, state]}
| gather_fastqs_and_validate.run(toState: ["fastq_forward", "fastq_reverse"])
output_ch
| toSortedList{a, b -> a[0] <=> b[0]}
| view {"Output: $it"}
| map {
assert it.size() == 3: "Expected three fastq pairs"
// IDs are sorted, and "Undetermined" sorts before the lowercase sample names
def undetermined_pair = it[0][1]
assert undetermined_pair.fastq_forward.collect{it.name} == ["Undetermined_S1_R1_001.fastq.gz"]
assert undetermined_pair.fastq_reverse.collect{it.name} == ["Undetermined_S1_R2_001.fastq.gz"]
def sample1_pair = it[1][1]
assert sample1_pair.fastq_forward.collect{it.name} == ["sample1_S1_L001_R1_001.fastq.gz", "sample1_S1_L002_R1_001.fastq.gz"]
assert sample1_pair.fastq_reverse.collect{it.name} == ["sample1_S1_L001_R2_001.fastq.gz", "sample1_S1_L002_R2_001.fastq.gz"]
def sample2_pair = it[2][1]
assert sample2_pair.fastq_forward.collect{it.name} == ["sample2_S1_L001_R1_001.fastq.gz"]
assert sample2_pair.fastq_reverse.collect{it.name} == ["sample2_S1_L001_R2_001.fastq.gz"]
}
}
workflow test_without_index {
output_ch = Channel.fromList([
[
id: "run1",
input: params.rootDir + "/src/dataflow/gather_fastqs_and_validate/test_data/fastqs_undetermined_empty",
sample_sheet: params.rootDir + "/src/dataflow/gather_fastqs_and_validate/test_data/samplesheet_no_index.csv",
]
])
| map { state -> [state.id, state]}
| gather_fastqs_and_validate.run(toState: ["fastq_forward", "fastq_reverse"])
output_ch
| toSortedList{a, b -> a[0] <=> b[0]}
| view {"Output: $it"}
| map {
assert it.size() == 2: "Expected two fastq pairs"
def first_pair = it[0][1]
assert first_pair.fastq_forward.collect{it.name} == ["Undetermined_S1_R1_001.fastq.gz"]
assert first_pair.fastq_reverse.collect{it.name} == ["Undetermined_S1_R2_001.fastq.gz"]
}
}


@@ -0,0 +1,11 @@
[foo],,,,
[somecontent],,,,
bar,lorem,,,
,,,,
# Comment
[BCLConvert_Data],,,,
Sample_ID,Index,Index2,,
sample1,GTAGCCCTGT,GAGCATCTAT,,
sample2,TCGGCTCTAC,CCGATGGTCT,,
,,,,
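The sample sheet above exercises the parsing rules of `gather_fastqs_and_validate`:
- Comment and blank lines are skipped.
- Parsing starts at a `[Data]`, `[Samples]` or `[BCLConvert_Data]` section (case-insensitive).
- The first row of that section must contain exactly one `Sample_ID` or `SampleName` column.
- Parsing stops at the next `[...]` header.

A rough bash/awk equivalent, handy for checking a sample sheet by hand, is sketched below. The `extract_sample_ids` name is hypothetical. Unlike the Groovy code, it does not add the implicit `Undetermined` sample.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository: print the sample IDs found
# in a run information file, one per line, following the same rules as the
# sample sheet parser in gather_fastqs_and_validate.
extract_sample_ids() {
  awk -F',' '
    { sub(/\r$/, "") }                              # tolerate CRLF line endings
    /^[ \t]*$/ || $1 ~ /^#/ { next }                # skip empty and commented lines
    $1 ~ /^\[.*\]$/ {                               # a section header such as [Data]
      if (parsing) exit                             # next section: stop parsing
      section = tolower(substr($1, 2, length($1) - 2))
      if (section == "data" || section == "samples" || section == "bclconvert_data") parsing = 1
      next
    }
    parsing && col == 0 {                           # first row of the section: column names
      found = 0
      for (i = 1; i <= NF; i++) if ($i == "Sample_ID" || $i == "SampleName") { col = i; found++ }
      if (found != 1) { print "expected exactly one Sample_ID or SampleName column" > "/dev/stderr"; exit 1 }
      next
    }
    parsing && $col !~ /^[ \t]*$/ { print $col }   # skip rows with an empty sample ID
    END { if (!parsing) { print "no [Data], [Samples] or [BCLConvert_Data] section" > "/dev/stderr"; exit 1 } }
  ' "$1"
}
```

Run against the sample sheet above, this sketch prints `sample1` and `sample2`.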


@@ -0,0 +1,10 @@
[foo],,,,
[somecontent],,,,
bar,lorem,,,
,,,,
# Comment
[BCLConvert_Data],,,,
Sample_ID,Index,Index2,,
sample1,,,,
,,,,


@@ -0,0 +1,115 @@
name: demultiplex
description: Demultiplexing of raw sequencing data
argument_groups:
- name: Input arguments
arguments:
- name: --id
description: Unique identifier for the run
type: string
- name: --input
description: Directory containing raw sequencing data
type: file
required: true
- name: --run_information
description: |
CSV file containing sample information, which will be used as
input for the demultiplexer. Canonically called 'SampleSheet.csv' (Illumina)
or 'RunManifest.csv' (Element Biosciences). If not specified,
will try to autodetect the sample sheet in the input directory.
Requires --demultiplexer to be set.
type: file
required: false
- name: "--demultiplexer"
type: string
required: false
choices: ["bases2fastq", "bclconvert"]
description: |
Demultiplexer to use; the choice depends on the vendor
of the instrument that was used to generate the data.
When --run_information is not provided, this argument is optional
because the demultiplexer is then autodetected.
- name: Output arguments
arguments:
- name: --output
description: Directory to write fastq data to
type: file
direction: output
required: false
default: "$id/fastq"
- name: "--output_sample_qc"
description: Directory to write FastQC output to
type: file
direction: output
required: false
multiple: true
default: "$id/qc/fastqc"
- name: "--multiqc_output"
description: Location to write the MultiQC report to
type: file
direction: output
required: false
default: "$id/qc/multiqc_report.html"
- name: "--output_run_information"
type: file
direction: "output"
required: true
default: "$id/run_information.csv"
- name: "--demultiplexer_logs"
type: file
direction: output
required: true
default: "$id/demultiplexer_logs"
- name: "Other arguments"
arguments:
- name: --skip_copycomplete_check
type: boolean_true
description: |
Disable the check for the presence of a "CopyComplete.txt" file in the input
directory when processing Illumina data.
resources:
- type: nextflow_script
path: main.nf
entrypoint: run_wf
test_resources:
- type: nextflow_script
path: test.nf
entrypoint: test_illumina
- type: nextflow_script
path: test.nf
entrypoint: test_bases2fastq
- type: nextflow_script
path: test.nf
entrypoint: test_no_index
dependencies:
- name: io/untar
repository: local
- name: dataflow/gather_fastqs_and_validate
repository: local
- name: io/interop_summary_to_csv
repository: local
- name: dataflow/combine_samples
repository: local
- name: bcl_convert
repository: bb
- name: bases2fastq
repository: bb
- name: fastqc
repository: bb
- name: multiqc
repository: bb
- name: detect_demultiplexer
repository: local
repositories:
- name: bb
type: vsh
repo: biobox
tag: v0.4.0
runners:
- type: nextflow
engines:
- type: native


@@ -0,0 +1,31 @@
#!/usr/bin/env bash
set -eo pipefail
# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)
# ensure that the command below is run from the root of the repository
cd "$REPO_ROOT"
viash ns build --setup cb -q demultiplex
nextflow run . \
-main-script src/demultiplex/test.nf \
-profile docker,no_publish,local \
-entry test_illumina \
-c src/config/labels.config \
--resources_test https://raw.githubusercontent.com/nf-core/test-datasets/demultiplex/testdata/NovaSeq6000/ \
-resume
nextflow run . \
-main-script src/demultiplex/test.nf \
-profile docker,no_publish,local \
-entry test_bases2fastq \
-c src/config/labels.config \
-resume
nextflow run . \
-main-script src/demultiplex/test.nf \
-profile docker,no_publish,local \
-entry test_no_index \
-c src/config/labels.config \
-resume

184
src/demultiplex/main.nf Normal file

@@ -0,0 +1,184 @@
workflow run_wf {
take:
input_ch
main:
samples_ch = input_ch
// untar input if needed
| untar.run(
directives: [label: ["lowmem", "lowcpu"]],
runIf: {id, state ->
def inputStr = state.input.toString()
inputStr.endsWith(".tar.gz") || \
inputStr.endsWith(".tar") || \
inputStr.endsWith(".tgz")
},
fromState: [
"input": "input",
],
toState: { id, result, state ->
state + ["input": result.output]
},
)
// detect demultiplexer
| detect_demultiplexer.run(
fromState: [
"input": "input",
"run_information": "run_information",
"demultiplexer": "demultiplexer",
],
toState: { id, result, state ->
state + [
"demultiplexer": result.demultiplexer_output,
"run_information": result.run_information_output
]
}
)
| interop_summary_to_csv.run(
runIf: {id, state -> state.demultiplexer in ["bclconvert"]},
directives: [label: ["lowmem", "verylowcpu"]],
fromState: [
"input": "input",
],
toState: [
"interop_run_summary": "output_run_summary",
"interop_index_summary": "output_index_summary",
]
)
// run bcl_convert
| bcl_convert.run(
runIf: {id, state -> state.demultiplexer in ["bclconvert"]},
directives: [label: ["highmem", "midcpu"]],
fromState: { id, state ->
[
bcl_input_directory: state.input,
sample_sheet: state.run_information,
output_directory: state.output,
reports: state.demultiplexer_logs,
logs: state.demultiplexer_logs,
]
},
toState: {id, result, state ->
def toAdd = [
"output_demultiplexer" : result.output_directory,
"run_id": id,
"demultiplexer_logs": result.reports,
]
def newState = state + toAdd
return newState
}
)
// run bases2fastq
| bases2fastq.run(
runIf: {id, state -> state.demultiplexer in ["bases2fastq"]},
directives: [label: ["highmem", "midcpu"]],
fromState: { id, state ->
[
"analysis_directory": state.input,
"run_manifest": state.run_information,
"output_directory": state.output,
"report": state.demultiplexer_logs + "/report.html",
"logs": state.demultiplexer_logs,
]
},
args: [
"no_projects": true, // Do not put output files in a subfolder for project
//"split_lanes": true,
"legacy_fastq": true, // Illumina style output names
"group_fastq": true, // No subdir per sample
],
toState: {id, result, state ->
def toAdd = [
"output_demultiplexer" : result.output_directory,
"run_id": id,
"demultiplexer_logs": result.logs,
]
def newState = state + toAdd
return newState
}
)
| gather_fastqs_and_validate.run(
fromState: [
"input": "output_demultiplexer",
"sample_sheet": "run_information",
],
toState: [
"fastq_forward": "fastq_forward",
"fastq_reverse": "fastq_reverse",
],
)
output_ch = samples_ch
| fastqc.run(
directives: [label: ["verylowcpu", "lowmem"]],
fromState: {id, state ->
def output_base = "$id/qc/fastqc/*"
[
"input": [state.fastq_forward, state.fastq_reverse],
"html": "${output_base}_fastqc_report.html",
"summary": "${output_base}_summary.txt",
"data": "${output_base}_fastqc_data.txt",
]
},
toState: { id, result, state ->
// The output directory for all files above is the same:
// take the directory from one of the files
state + [ "output_sample_qc": result.html[0].parent ]
}
)
| combine_samples.run(
fromState: { id, state ->
[
"id": state.run_id,
"forward_input": state.fastq_forward,
"reverse_input": state.fastq_reverse,
"sample_qc_dir": state.output_sample_qc,
]
},
toState: [
"forward_fastqs": "output_forward",
"reverse_fastqs": "output_reverse",
"output_sample_qc": "output_sample_qc",
]
)
| multiqc.run(
directives: [label: ["midcpu", "midmem"]],
fromState: {id, state ->
def new_state = [
"input": state.output_sample_qc,
"output_report": state.multiqc_output,
"cl_config": 'sp: {fastqc/data: {fn: "*_fastqc_data.txt"}}'
]
if (state.demultiplexer == "bclconvert") {
new_state["input"] += [
state.interop_run_summary.getParent(),
state.interop_index_summary.getParent()
]
}
return new_state
},
toState: { id, result, state ->
state + [ "multiqc_output" : result.output_report ]
}
)
| setState(
[
//"_meta": "_meta",
"output": "output_demultiplexer",
"output_sample_qc": "output_sample_qc",
"multiqc_output": "multiqc_output",
"output_run_information": "run_information",
"demultiplexer_logs": "demultiplexer_logs"
]
)
emit:
output_ch
}
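The `runIf` closure of the `untar` step decides from the file extension alone whether the input is an archive. Directories and other paths are passed through unchanged.

The same decision can be expressed as a bash `case` statement. This assumes only these three suffixes matter; the `needs_untar` name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the runIf condition of the untar step in run_wf:
# succeed (status 0) only for paths ending in .tar.gz, .tgz or .tar.
needs_untar() {
  case "$1" in
    *.tar.gz | *.tgz | *.tar) return 0 ;;
    *) return 1 ;;
  esac
}
```

With this condition, an input such as `run.tar.bz2` is not extracted and is handed to the next step as-is.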


@@ -0,0 +1,10 @@
manifest {
nextflowVersion = '!>=20.12.1-edge'
}
params {
rootDir = java.nio.file.Paths.get("$projectDir/../../").toAbsolutePath().normalize().toString()
}
// include common settings
includeConfig("${params.rootDir}/src/config/labels.config")

201
src/demultiplex/test.nf Normal file

@@ -0,0 +1,201 @@
nextflow.enable.dsl=2
include { demultiplex } from params.rootDir + "/target/nextflow/demultiplex/main.nf"
params.resources_test = params.rootDir + "/testData/"
workflow test_illumina {
output_ch = Channel.fromList([
[
// sample_sheet: resources_test.resolve("bcl_convert_samplesheet.csv"),
// input: resources_test.resolve("iseq-DI/"),
//sample_sheet: "https://raw.githubusercontent.com/nf-core/test-datasets/demultiplex/testdata/NovaSeq6000/SampleSheet.csv",
input: params.resources_test + "200624_A00834_0183_BHMTFYDRXX.tar.gz",
publish_dir: "output_dir/",
]
])
| map { state -> [ "run", state ] }
| demultiplex.run(
toState: { id, output, state ->
output + [ orig_input: state.input ] }
)
| view { output ->
assert output.size() == 2 : "outputs should contain two elements; [id, state]"
"Output: $output"
}
event_count_ch = output_ch
| toSortedList()
| map { state ->
assert state.size() == 1 : "Expected one event in the output channel"
}
assert_ch = output_ch
| map {id, state ->
assert state.output.isDirectory(): "Expected bclconvert output to be a directory"
state.output_sample_qc.each{
assert it.isDirectory(): "Expected sample QC output to be a directory"
}
assert state.multiqc_output.isFile(): "Expected multiQC output to be a file"
def fastq_files = state.output.listFiles()
def fastq_names = fastq_files.collect{it.name}
assert ["Undetermined_S0_L001_R1_001.fastq.gz", "Sample23_S3_L001_R1_001.fastq.gz",
"sampletest_S4_L001_R1_001.fastq.gz", "Sample1_S1_L001_R1_001.fastq.gz",
"SampleA_S2_L001_R1_001.fastq.gz"].toSet() == fastq_names.toSet(): \
"Output directory should contain the expected FASTQ files"
fastq_files.each{
// Check the file size on disk, not the length of the file name
assert it.size() != 0: "Expected FASTQ file ${it.name} to not be empty"
}
assert state.output_run_information.isFile(): "Expected output run information to be a file"
def expected_run_information = """[Header]
|Date,6/24/2020
|Application,Illumina DRAGEN COVIDSeq Test Pipeline
|Instrument Type,NovaSeq6000
|Assay,Illumina COVIDSeq Test
|Index Adapters,IDT-ILMN DNA-RNA UDP Indexes
|Chemistry,Amplicon
|[Settings]
|AdapterRead1,CTGTCTCTTATACACATCT
|[Data]
|Lane,Sample_ID,Sample_Type,Index_ID,Index,Index2
|1,Sample1,PatientSample,UDP0001,GAACTGAGCG,TCGTGGAGCG
|1,SampleA,PatientSample,UDP0002,AGGTCAGATA,CTACAAGATA
|1,Sample23,PatientSample,UDP0003,CGTCTCATAT,TATAGTAGCT
|1,sampletest,PatientSample,UDP0004,ATTCCATAAG,TGCCTGGTGG
|""".stripMargin()
assert state.output_run_information.text.replaceAll("\r\n", "\n") == expected_run_information
println "ID: ${id}"
println "State: ${state}"
assert state.demultiplexer_logs.isDirectory():
"Expected BCL Convert reports to be a directory"
def logs_files = state.demultiplexer_logs.listFiles()
println "Logs files: ${logs_files}"
assert logs_files.size() > 0: "Expected BCL Convert logs dir to contain files"
assert logs_files.find { it.name == "Demultiplex_Stats.csv" }:
"Expected to find BCL Convert Demultiplex_Stats.csv"
assert logs_files.find { it.name == "Logs" }:
"Expected to find BCL Convert Logs directory"
}
}
workflow test_bases2fastq {
output_ch = Channel.fromList([
[
input: "http://element-public-data.s3.amazonaws.com/bases2fastq-share/bases2fastq-v2/20230404-bases2fastq-sim-151-151-9-9.tar.gz",
publish_dir: "output_dir/",
]
])
| map { state -> [ "run", state ] }
| demultiplex.run(
toState: { id, output, state ->
output + [ orig_input: state.input ] }
)
| view { output ->
assert output.size() == 2 : "outputs should contain two elements; [id, state]"
"Output: $output"
}
| map {id, state ->
assert state.output.isDirectory(): "Expected bases2fastq output to be a directory"
state.output_sample_qc.each{assert it.isDirectory(): "Expected sample QC output to be a directory"}
assert state.multiqc_output.isFile(): "Expected multiQC output to be a file"
def logs_files = state.demultiplexer_logs.listFiles()
println "Logs files: ${logs_files}"
assert logs_files.size() > 0: "Expected bases2fastq logs dir to contain files"
assert logs_files.find { it.name == "report.html" } != null:
"Expected to find bases2fastq report.html"
assert logs_files.find { it.name == "info" }:
"Expected to find bases2fastq info directory"
}
}
workflow test_no_index {
// Test what happens when no index is specified. All the reads go into one sample
// and the "Undetermined" should be empty
output_ch = Channel.fromList([
[
input: params.resources_test + "demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2",
demultiplexer: "bclconvert",
run_information: params.resources_test + "demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2/SampleSheetNoIndex.csv"
]
])
| map { state -> [ "run", state ] }
| demultiplex.run(
toState: { id, output, state ->
output + [ orig_input: state.input ] }
)
| view { output ->
assert output.size() == 2 : "outputs should contain two elements; [id, state]"
"Output: $output"
}
event_count_ch = output_ch
| toSortedList()
| map { state ->
assert state.size() == 1 : "Expected one event in the output channel"
}
assert_ch = output_ch
| map {id, state ->
assert state.output.isDirectory(): "Expected bclconvert output to be a directory"
state.output_sample_qc.each{
assert it.isDirectory(): "Expected sample QC output to be a directory"
}
assert state.multiqc_output.isFile(): "Expected multiQC output to be a file"
def fastq_files = state.output.listFiles()
def fastq_names = fastq_files.collect{it.name}
assert ["Undetermined_S0_R2_001.fastq.gz", "Undetermined_S0_R1_001.fastq.gz",
"SingleCell-RNA-P3-2-SI-TT-A5_S1_R1_001.fastq.gz", "SingleCell-RNA-P3-2-SI-TT-A5_S1_R2_001.fastq.gz"
].toSet() == fastq_names.toSet(): \
"Output directory should contain the expected FASTQ files"
fastq_files.findAll{!it.name.startsWith("Undetermined")}.each{
// Check the file size on disk; the Undetermined files are expected to hold no reads
assert it.size() != 0: "Expected FASTQ file ${it.name} to not be empty"
}
assert state.output_run_information.isFile(): "Expected output run information to be a file"
def expected_run_information = """[Header],,,,
|FileFormatVersion,2,,,
|RunName,SingleCell-RNA_P3_2,,,
|InstrumentPlatform,NextSeq1k2k,,,
|IndexOrientation,Forward,,,
|,,,,
|[Reads],,,,
|Read1Cycles,28,,,
|Read2Cycles,90,,,
|Index1Cycles,10,,,
|Index2Cycles,10,,,
|,,,,
|[BCLConvert_Settings],,,,
|SoftwareVersion,4.2.7,,,
|TrimUMI,0,,,
|OverrideCycles,U28;N10;N10;Y90,,,
|FastqCompressionFormat,gzip,,,
|NoLaneSplitting,TRUE,,,
|,,,,
|[BCLConvert_Data],,,,
|Sample_ID,Index,Index2,,
|SingleCell-RNA-P3-2-SI-TT-A5,,,,
|,,,,""".stripMargin()
assert state.output_run_information.text.replaceAll("\r\n", "\n") == expected_run_information
println "ID: ${id}"
println "State: ${state}"
assert state.demultiplexer_logs.isDirectory():
"Expected BCL Convert reports to be a directory"
def logs_files = state.demultiplexer_logs.listFiles()
println "Logs files: ${logs_files}"
assert logs_files.size() > 0: "Expected BCL Convert logs dir to contain files"
assert logs_files.find { it.name == "Demultiplex_Stats.csv" }:
"Expected to find BCL Convert Demultiplex_Stats.csv"
assert logs_files.find { it.name == "Logs" }:
"Expected to find BCL Convert Logs directory"
}
}


@@ -0,0 +1,57 @@
name: detect_demultiplexer
description: |
Detects the demultiplexer and accompanying sample information file which can be
used to generate the fastq files.
arguments:
- name: --id
description: Unique identifier for the run
type: string
- name: --input
description: Directory containing raw sequencing data
type: file
required: true
- name: --run_information
description: |
CSV file containing sample information, which will be used as
input for the demultiplexer. Canonically called 'SampleSheet.csv' (Illumina)
or 'RunManifest.csv' (Element Biosciences). If not specified,
will try to autodetect the sample sheet in the input directory.
Requires --demultiplexer to be set.
type: file
required: false
- name: "--demultiplexer"
type: string
required: false
choices: ["bases2fastq", "bclconvert"]
description: |
Demultiplexer to use; the choice depends on the vendor
of the instrument that was used to generate the data.
When --run_information is not provided, this argument is optional
because the demultiplexer is then autodetected.
- name: --demultiplexer_output
description: |
Demultiplexer program. The demultiplexer is either provided (with --demultiplexer),
or inferred from the contents of the input data.
type: string
direction: output
required: false
- name: --run_information_output
description: |
Sample information that can be used to demultiplex the input data.
An appropriate file was either provided (with --run_information), or
inferred from the contents of the input data.
type: file
direction: output
required: false
resources:
- type: nextflow_script
path: main.nf
entrypoint: run_wf
runners:
- type: nextflow
engines:
- type: native


@@ -0,0 +1,96 @@
workflow run_wf {
take:
input_ch // Channel with [id, state] pairs
main:
output_ch = input_ch
// Gather input files from folder
| map {id, state ->
def newState = [:]
println("Provided run information: ${state.run_information} and demultiplexer: ${state.demultiplexer}")
// No auto-detection of run information file (it is user provided),
// in this case the demultiplexer should also be specified.
assert (!state.run_information || state.demultiplexer): "When setting --run_information, " +
"you must also provide a demultiplexer"
if (!state.run_information) {
println("Run information was not specified, auto-detecting...")
// The supported_platforms map must be a one-to-one mapping.
// Also, its keys must be present in the 'choices' field
// for the 'demultiplexer' argument in the viash config.
def supported_platforms = [
"bclconvert": "SampleSheet.csv", // Illumina
"bases2fastq": "RunManifest.csv" // Element Biosciences
]
def found_sample_information = supported_platforms.collectEntries{demultiplexer, filename ->
println("Checking if ${filename} can be found in input folder ${state.input}.")
def resolved_filename = state.input.resolve(filename)
if (!resolved_filename.isFile()) {
resolved_filename = null
}
println("Result after looking for run information for ${demultiplexer}: ${resolved_filename}.")
[demultiplexer, resolved_filename]
}
def demultiplexer = null
def run_information = null
found_sample_information.each{demultiplexer_candidate, file_path ->
if (file_path) {
// At this point, a candidate run information file was found.
assert !run_information: "Autodetection of run information " +
"(SampleSheet, RunManifest) failed: " +
"multiple candidate files found in input folder. " +
"Please specify one using --run_information."
run_information = file_path
demultiplexer = demultiplexer_candidate
}
}
// When autodetecting, the run information should have been found
assert run_information: "No run information file (SampleSheet, RunManifest) " +
"found in input directory."
// When autodetecting, the demultiplexer must be set if the run information was found
assert demultiplexer: "State error: the demultiplexer should have been autodetected. " +
"Please report this as a bug."
// When autodetecting, the found demultiplexer must match
// with the demultiplexer that the user has provided (in case it was provided).
if (state.demultiplexer) {
assert state.demultiplexer == demultiplexer,
"Requested to use demultiplexer ${state.demultiplexer} " +
"but the autodetected run information file ${run_information} " +
"indicates that the demultiplexer should be ${demultiplexer}. " +
"Either avoid specifying the demultiplexer " +
"or override the autodetection of the run information by providing " +
"the file with --run_information."
}
println("Using run information ${run_information} and demultiplexer ${demultiplexer}")
// At this point, the autodetected state can override the user provided state.
newState = newState + [
"run_information": run_information,
"demultiplexer": demultiplexer,
]
} // end auto-detection logic
if (newState.demultiplexer in ["bclconvert"]) {
// Do not add InterOp to state because we generate the summary csv's in the next
// step based on the run dir, not the InterOp dir.
def interop_dir = state.input.resolve("InterOp")
assert interop_dir.isDirectory(): "Expected InterOp directory to be present."
def copycomplete_file = state.input.resolve("CopyComplete.txt")
assert (copycomplete_file.isFile() || state.skip_copycomplete_check):
"'CopyComplete.txt' file was not found!"
}
def resultState = state + newState
[id, resultState]
}
| setState(["demultiplexer_output": "demultiplexer",
"run_information_output": "run_information"])
emit:
output_ch
}
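The autodetection rules implemented above can be summarised as follows:
- Exactly one of `SampleSheet.csv` (BCL Convert) or `RunManifest.csv` (bases2fastq) must exist at the top of the run directory.
- A demultiplexer requested by the user must agree with the detected one.

Below is a bash sketch of those rules, useful for checking a run folder before launching the workflow. `autodetect_demultiplexer` is a hypothetical name, and the sketch skips the InterOp and `CopyComplete.txt` checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository.
# Usage: autodetect_demultiplexer RUN_DIR [REQUESTED_DEMULTIPLEXER]
# Prints "<demultiplexer><TAB><run information file>" on success.
autodetect_demultiplexer() {
  local run_dir="$1" requested="${2:-}"
  local found="" demultiplexer="" candidate file
  # Same one-to-one mapping as supported_platforms in run_wf.
  for candidate in "bclconvert:SampleSheet.csv" "bases2fastq:RunManifest.csv"; do
    file="$run_dir/${candidate#*:}"
    if [[ -f "$file" ]]; then
      if [[ -n "$found" ]]; then
        echo "multiple candidate run information files found; use --run_information" >&2
        return 1
      fi
      found="$file"
      demultiplexer="${candidate%%:*}"
    fi
  done
  if [[ -z "$found" ]]; then
    echo "no run information file (SampleSheet, RunManifest) found in $run_dir" >&2
    return 1
  fi
  if [[ -n "$requested" && "$requested" != "$demultiplexer" ]]; then
    echo "requested $requested, but $found indicates $demultiplexer" >&2
    return 1
  fi
  printf '%s\t%s\n' "$demultiplexer" "$found"
}
```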


@@ -0,0 +1,45 @@
name: interop_summary_to_csv
namespace: io
argument_groups:
- name: Input arguments
arguments:
- name: --input
description: Sequencing run folder (*not* InterOp folder).
type: file
required: true
- name: Output arguments
arguments:
- name: --output_run_summary
type: file
direction: output
required: true
- name: --output_index_summary
type: file
direction: output
required: true
requirements:
commands: ["summary", "index-summary"]
resources:
- type: bash_script
path: script.sh
test_resources:
- type: bash_script
path: test.sh
- path: /testData/iseq-DI
engines:
- type: docker
image: debian:stable-slim
setup:
- type: apt
packages:
- procps
- wget
- type: docker
run: |
wget https://github.com/Illumina/interop/releases/download/v1.3.1/interop-1.3.1-Linux-GNU.tar.gz -O /tmp/interop.tar.gz && \
tar -C /tmp/ --no-same-owner --no-same-permissions -xvf /tmp/interop.tar.gz && \
mv /tmp/interop-1.3.1-Linux-GNU/bin/index-summary /tmp/interop-1.3.1-Linux-GNU/bin/summary /usr/local/bin/
runners:
- type: executable
- type: nextflow


@@ -0,0 +1,10 @@
#!/usr/bin/env bash
set -eo pipefail
if [ ! -d "$par_input" ]; then
echo "Input directory does not exist or is not a directory"
exit 1
fi
$(which summary) --csv=1 "$par_input" 1> "$par_output_run_summary"
$(which index-summary) --csv=1 "$par_input" 1> "$par_output_index_summary"


@@ -0,0 +1,18 @@
#!/usr/bin/env bash
set -eo pipefail
# create tempdir
echo ">>> Creating temporary test directory."
TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXX")
function clean_up {
[[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
}
trap clean_up EXIT
echo ">>> Created temporary directory '$TMPDIR'."
echo ">>> Run simple execution"
./$meta_functionality_name \
--input "$meta_resources_dir/iseq-DI" \
--output_run_summary "$TMPDIR/run_summary.csv" \
--output_index_summary "$TMPDIR/index_summary.csv"

src/io/publish/code.sh Executable file

@@ -0,0 +1,34 @@
#!/bin/bash
set -eo pipefail
declare -A input_output_mapping=(["par_input"]="par_output"
["par_input_multiqc"]="par_output_multiqc"
["par_input_run_information"]="par_output_run_information"
["par_input_demultiplexer_logs"]="par_output_demultiplexer_logs"
)
for input_argument_name in "${!input_output_mapping[@]}"
do
input_location="${!input_argument_name}"
output_argument_name="${input_output_mapping[$input_argument_name]}"
output_location="${!output_argument_name}"
echo "Publishing $input_location -> $output_location"
echo "Creating directory if it does not exist."
mkdir -p "$(dirname "$output_location")" && echo "Parent directory of $output_location created"
echo "Copying files..."
cp -rL "$input_location" "$output_location"
echo "Output files for $output_location:"
ls "$output_location"
done
echo "Grouping output from $par_input_sample_qc into $par_output_sample_qc"
mkdir -p "$par_output_sample_qc"
IFS=";" read -ra sample_qc_inputs <<< "$par_input_sample_qc"
for qc_dir in "${sample_qc_inputs[@]}"; do
echo "Copying contents of $qc_dir"
find -H -D exec "$qc_dir" -maxdepth 1 -type f -exec cp -t "$par_output_sample_qc" {} +
done
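The loop in `code.sh` relies on two bash features: an associative array that maps the *names* of input variables to the *names* of output variables, and indirect expansion (`${!name}`), which looks up the value of the variable whose name is stored in `name`. The following is a minimal, self-contained sketch of that mechanism; the paths are hypothetical placeholders for the values viash would normally set.

```bash
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical values standing in for the variables viash would set.
par_input="/tmp/work/fastq"
par_output="fastq"
par_input_multiqc="/tmp/work/multiqc_report.html"
par_output_multiqc="qc/multiqc_report.html"

# Keys and values are variable *names*, not paths.
declare -A input_output_mapping=(
  ["par_input"]="par_output"
  ["par_input_multiqc"]="par_output_multiqc"
)

resolve_pairs() {
  local input_name output_name
  # Associative arrays are unordered; sort the keys to get a stable order.
  for input_name in $(printf '%s\n' "${!input_output_mapping[@]}" | LC_ALL=C sort); do
    output_name="${input_output_mapping[$input_name]}"
    # ${!input_name} expands to the value of the variable named by $input_name.
    printf '%s -> %s\n' "${!input_name}" "${!output_name}"
  done
}

resolve_pairs
```

Keeping only names in the array means a new input/output pair needs a single new array entry, while the copy logic stays unchanged.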

@@ -0,0 +1,64 @@
name: "publish"
namespace: "io"
description: "Publish the processed results of the run"
argument_groups:
- name: Input arguments
arguments:
- name: --input
description: Directory containing the FASTQ files to publish.
type: file
required: true
- name: "--input_sample_qc"
description: Directories containing the sample QC output to publish.
type: file
required: true
multiple: true
- name: "--input_multiqc"
description: MultiQC report to publish.
type: file
required: true
- name: "--input_run_information"
description: "Run information file (sample sheet or run manifest) to publish."
type: file
required: true
- name: "--input_demultiplexer_logs"
description: Directory containing the demultiplexer logs to publish.
type: file
required: true
- name: Output arguments
arguments:
- name: --output
type: file
direction: output
default: "fastq"
- name: --output_sample_qc
type: file
direction: output
default: "qc/fastqc"
- name: --output_multiqc
type: file
direction: output
default: "qc/multiqc_report.html"
- name: --output_run_information
type: file
direction: output
default: run_information.csv
- name: "--output_demultiplexer_logs"
type: file
direction: output
default: "demultiplexer_logs"
resources:
- type: bash_script
path: ./code.sh
engines:
- type: docker
image: debian:stable-slim
setup:
- type: apt
packages:
- procps
runners:
- type: executable
- type: nextflow
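`--input_sample_qc` is declared with `multiple: true`, so its values reach the script as a single string joined by the argument's `multiple_sep` (shown as `multiple_sep: ";"` in the generated viash configs in this commit), and `code.sh` splits it again with `IFS=";" read -ra`. A sketch of that splitting step, using a hypothetical `split_multiple` helper and hypothetical paths:

```bash
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical helper: split a ';'-joined argument value into one item per line.
split_multiple() {
  local joined="$1"
  local -a parts=()
  # IFS applies only to this read: split on ';' only, so spaces inside paths survive.
  # -r keeps backslashes literal, -a stores the fields in the array 'parts'.
  IFS=";" read -ra parts <<< "$joined"
  printf '%s\n' "${parts[@]}"
}

# Example value, as it could arrive in $par_input_sample_qc (hypothetical paths).
split_multiple "qc/sample1;qc/sample2;qc dir/sample3"
```

Setting `IFS` as a prefix of `read` leaves the shell's global `IFS` untouched for the rest of the script.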

@@ -0,0 +1,44 @@
name: untar
namespace: io
description: |
Unpack a .tar file. When the .tar file contains only a single top-level directory,
the contents of that directory are put into the output folder instead of the directory itself.
argument_groups:
- name: Input arguments
arguments:
- name: --input
description: Tarball file to be unpacked.
type: file
required: true
- name: Output arguments
arguments:
- name: --output
description: Directory to write the contents of the .tar file to.
type: file
direction: output
required: true
- name: "Other arguments"
arguments:
- name: "--exclude"
alternatives: ["-e"]
type: string
description: Prevents any file or member whose name matches the shell wildcard (pattern) from being extracted.
example: "docs/figures"
required: false
resources:
- type: bash_script
path: script.sh
test_resources:
- type: bash_script
path: test.sh
engines:
- type: docker
image: debian:stable-slim
setup:
- type: apt
packages:
- procps
runners:
- type: executable
- type: nextflow
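The flattening described above is done by passing `--strip-components` to tar. The following minimal demonstration assumes GNU tar and a writable temporary directory; the directory and file names are hypothetical.

```bash
#!/usr/bin/env bash
set -eo pipefail

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# Build an archive whose only top-level entry is the directory 'run_folder'.
mkdir -p "$workdir/src/run_folder"
echo "foo" > "$workdir/src/run_folder/file.txt"
tar -C "$workdir/src" -czf "$workdir/run.tar.gz" run_folder

# --strip-components=1 removes the leading 'run_folder/' from every member name,
# so file.txt is extracted directly into the output directory.
mkdir -p "$workdir/out"
tar -C "$workdir/out" --strip-components=1 -xzf "$workdir/run.tar.gz"
ls "$workdir/out"
```

Members whose name becomes empty after stripping (here the `run_folder/` entry itself) are skipped by tar, so no empty directory is left behind.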

src/io/untar/script.sh Normal file

@@ -0,0 +1,41 @@
#!/usr/bin/env bash
set -eo pipefail
extra_args=()
TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXX")
function clean_up {
[[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
}
trap clean_up EXIT
# Check if tarball contains 1 top-level directory. If so, extract the contents of the
# directory to the output directory instead of the directory itself.
echo "Directory contents:"
tar -taf "${par_input}" > "$TMPDIR/tar_contents.txt"
cat "$TMPDIR/tar_contents.txt"
printf "Checking if tarball contains only a single top-level directory: "
if [[ $(grep -o -E '^[./]*[^/]+/$' "$TMPDIR/tar_contents.txt" | uniq | wc -l) -eq 1 ]]; then
echo "It does."
echo "Extracting the contents of the top-level directory to the output directory instead of the directory itself."
# The directory can be listed either as './<directory>' (or '././<directory>') or as just '<directory>'.
# Adjust the number of stripped components accordingly by counting the './' segments at the start of the first entry.
starting_relative=$(grep -oP -m 1 '^(\./)*' "$TMPDIR/tar_contents.txt" | tr -d '\n' | wc -c)
n_strips=$(( ($starting_relative / 2)+1 ))
extra_args+=("--strip-components=$n_strips")
else
echo "It does not."
fi
if [ "$par_exclude" != "" ]; then
echo "Exclusion of files with wildcard '$par_exclude' requested."
extra_args+=("--exclude=$par_exclude")
fi
echo "Starting extraction of tarball '$par_input' to output directory '$par_output'."
mkdir -p "$par_output"
echo "executing 'tar --no-same-owner --no-same-permissions --directory=$par_output ${extra_args[*]} -xavf $par_input'"
tar --no-same-owner --no-same-permissions --directory="$par_output" "${extra_args[@]}" -xavf "$par_input"

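`script.sh` decides whether to flatten by counting top-level directory entries in the `tar -t` listing, and derives the number of components to strip from the leading `./` segments of the first entry. The sketch below implements both steps as pure functions over a listing, which makes the logic testable without building archives. `count_top_level_dirs` reuses the script's regular expression; `strip_components_for` is a hypothetical helper that counts `./` prefixes with bash pattern matching instead of `grep -P`.

```bash
#!/usr/bin/env bash
set -eo pipefail

# Count directory entries at the top level of a 'tar -t' listing read from stdin.
# Same regular expression as script.sh: optional leading './' segments, one name, trailing '/'.
count_top_level_dirs() {
  # grep -c prints 0 but exits 1 when nothing matches; '|| true' keeps the count.
  grep -c -E '^[./]*[^/]+/$' || true
}

# Number of components to strip for a given (first) listing entry:
# one for the top-level directory plus one per leading './' segment.
strip_components_for() {
  local entry="$1" n_prefix=0
  while [[ "$entry" == ./* ]]; do
    entry="${entry#./}"
    n_prefix=$(( n_prefix + 1 ))
  done
  echo $(( n_prefix + 1 ))
}

listing=$'./run_folder/\n./run_folder/file.txt'
echo "top-level directories: $(printf '%s\n' "$listing" | count_top_level_dirs)"
echo "components to strip: $(strip_components_for "$(head -n 1 <<< "$listing")")"
```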
src/io/untar/test.sh Normal file

@@ -0,0 +1,126 @@
#!/usr/bin/env bash
set -eo pipefail
# create tempdir
echo ">>> Creating temporary test directory."
TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXX")
function clean_up {
[[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
}
trap clean_up EXIT
echo ">>> Created temporary directory '$TMPDIR'."
INPUT_FILE="$TMPDIR/test_file.txt"
echo ">>> Creating test input file at '$TMPDIR/test_file.txt'."
echo "foo" > "$INPUT_FILE"
echo ">>> Created '$INPUT_FILE'."
echo ">>> Creating tar.gz from '$INPUT_FILE'."
TARFILE="${INPUT_FILE}.tar.gz"
tar -C "$TMPDIR" -czvf "$TARFILE" "$(basename "$INPUT_FILE")"
[[ ! -f "$TARFILE" ]] && echo ">>> Test setup failed: could not create tarfile." && exit 1
echo ">>> '$TARFILE' created."
echo ">>> Check whether tar.gz can be extracted"
echo ">>> Creating temporary output directory for test 1."
OUTPUT_DIR_1="$TMPDIR/output_test_1/"
mkdir "$OUTPUT_DIR_1"
echo ">>> Extracting '$TARFILE' to '$OUTPUT_DIR_1'."
./$meta_functionality_name \
--input "$TARFILE" \
--output "$OUTPUT_DIR_1"
echo ">>> Check whether extracted file exists"
[[ ! -f "$OUTPUT_DIR_1/test_file.txt" ]] && echo "Output file could not be found. Output directory contents: " && ls "$OUTPUT_DIR_1" && exit 1
echo ">>> Creating temporary output directory for test 2."
OUTPUT_DIR_2="$TMPDIR/output_test_2/"
mkdir "$OUTPUT_DIR_2"
echo ">>> Extracting '$TARFILE' to '$OUTPUT_DIR_2', excluding 'test_file.txt'."
./$meta_functionality_name \
--input "$TARFILE" \
--output "$OUTPUT_DIR_2" \
--exclude 'test_file.txt'
echo ">>> Check whether excluded file was not extracted"
[[ -f "$OUTPUT_DIR_2/test_file.txt" ]] && echo "File should have been excluded! Output directory contents:" && ls "$OUTPUT_DIR_2" && exit 1
echo ">>> Creating test tarball containing only 1 top-level directory."
mkdir "$TMPDIR/input_test_3/"
cp "$INPUT_FILE" "$TMPDIR/input_test_3/"
tar -C "$TMPDIR" -czvf "$TMPDIR/input_test_3.tar.gz" $(basename "$TMPDIR/input_test_3")
TARFILE_3="$TMPDIR/input_test_3.tar.gz"
echo ">>> Creating temporary output directory for test 3."
OUTPUT_DIR_3="$TMPDIR/output_test_3/"
mkdir "$OUTPUT_DIR_3"
echo ">>> Extracting '$TARFILE_3' to '$OUTPUT_DIR_3'."
./$meta_functionality_name \
--input "$TARFILE_3" \
--output "$OUTPUT_DIR_3"
echo ">>> Check whether extracted file exists"
[[ ! -f "$OUTPUT_DIR_3/test_file.txt" ]] && echo "Output file could not be found!" && exit 1
echo ">>> Check for tar archive that contains a single directory starting with './'."
mkdir "$TMPDIR/input_test_4/"
cp "$INPUT_FILE" "$TMPDIR/input_test_4/"
tar -C "$TMPDIR" -czvf "$TMPDIR/input_test_4.tar.gz" ./input_test_4
OUTPUT_DIR_4="$TMPDIR/output_test_4/"
echo ">>> Extracting '$TMPDIR/input_test_4.tar.gz' to '$OUTPUT_DIR_4'."
./$meta_functionality_name \
--input "$TMPDIR/input_test_4.tar.gz" \
--output "$OUTPUT_DIR_4"
echo ">>> Check whether extracted file exists"
[[ ! -f "$OUTPUT_DIR_4/test_file.txt" ]] && echo "Output file could not be found!" && exit 1
echo ">>> Creating test tarball containing only 1 top-level directory, but it is nested."
mkdir -p "$TMPDIR/input_test_5/nested/"
cp "$INPUT_FILE" "$TMPDIR/input_test_5/nested/"
tar -C "$TMPDIR" -czvf "$TMPDIR/input_test_5.tar.gz" $(basename "$TMPDIR/input_test_5")
TARFILE_5="$TMPDIR/input_test_5.tar.gz"
echo ">>> Creating temporary output directory for test 5."
OUTPUT_DIR_5="$TMPDIR/output_test_5/"
mkdir "$OUTPUT_DIR_5"
echo ">>> Extracting '$TARFILE_5' to '$OUTPUT_DIR_5'."
./$meta_functionality_name \
--input "$TARFILE_5" \
--output "$OUTPUT_DIR_5"
echo ">>> Check whether extracted file exists"
[[ ! -f "$OUTPUT_DIR_5/nested/test_file.txt" ]] && echo "Output file could not be found!" && exit 1
echo ">>> Creating test tarball containing two top-level directories."
mkdir -p "$TMPDIR/input_test_6/number_1/"
mkdir "$TMPDIR/input_test_6/number_2/"
cp "$INPUT_FILE" "$TMPDIR/input_test_6/number_1/"
tar -C "$TMPDIR" -czvf "$TMPDIR/input_test_6.tar.gz" $(basename "$TMPDIR/input_test_6")
TARFILE_6="$TMPDIR/input_test_6.tar.gz"
echo ">>> Creating temporary output directory for test 6."
OUTPUT_DIR_6="$TMPDIR/output_test_6/"
mkdir "$OUTPUT_DIR_6"
echo ">>> Extracting '$TARFILE_6' to '$OUTPUT_DIR_6'."
./$meta_functionality_name \
--input "$TARFILE_6" \
--output "$OUTPUT_DIR_6"
echo ">>> Check whether extracted file exists"
[[ ! -f "$OUTPUT_DIR_6/number_1/test_file.txt" ]] && echo "Output file could not be found!" && exit 1
[[ ! -d "$OUTPUT_DIR_6/number_2" ]] && echo "Output directory could not be found!" && exit 1
echo ">>> Test finished successfully"
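The checks in `test.sh` use the `[[ ! -f ... ]] && echo ... && exit 1` idiom. It works here because every check is followed by another command. However, if such a line is the last command of a script and the file *does* exist, the `[[ ... ]]` test fails and the script exits with status 1. The sketch below uses hypothetical assertion helpers that avoid this pitfall and report failures on stderr.

```bash
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical assertion helpers: print the reason on stderr and return non-zero,
# which aborts the test under 'set -e' when called as a plain command.
assert_file_exists() {
  if [[ ! -f "$1" ]]; then
    echo "Expected file '$1' to exist." >&2
    return 1
  fi
}

assert_not_exists() {
  if [[ -e "$1" ]]; then
    echo "Expected '$1' not to exist." >&2
    return 1
  fi
}

# Usage, with a throwaway directory:
demo_dir="$(mktemp -d)"
touch "$demo_dir/test_file.txt"
assert_file_exists "$demo_dir/test_file.txt"
assert_not_exists "$demo_dir/excluded.txt"
echo "All assertions passed."
rm -rf "$demo_dir"
```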

@@ -0,0 +1,86 @@
name: runner
description: Runner for demultiplexing of raw sequencing data
argument_groups:
- name: Input arguments
arguments:
- name: --input
description: |
Base directory of the canonical form `s3://<bucket>/<path>/<RunID>/`.
Alternatively, a tarball (.tar.gz, .tgz or .tar) containing the run folder can be provided.
The <RunID> is the value passed to the `id` argument.
type: file
required: true
- name: --run_information
description: |
CSV file containing sample information, which will be used as
input for the demultiplexer. Canonically called 'SampleSheet.csv' (Illumina)
or 'RunManifest.csv' (Element Biosciences). If not specified,
will try to autodetect the sample sheet in the input directory.
Requires --demultiplexer to be set.
type: file
required: false
- name: "--demultiplexer"
type: string
required: false
choices: ["bases2fastq", "bclconvert"]
description: |
Demultiplexer to use; the choice depends on the vendor
of the instrument that was used to generate the data.
This argument is only required when --run_information
is specified.
- name: Annotation flags
arguments:
- name: --plain_output
description: |
Flag to indicate that the output should be stored directly under $publish_dir rather than
under a subdirectory structure runID/<date_time>_demultiplex_<version>/.
type: boolean_true
- name: Output arguments
arguments:
- name: --fastq_output
type: file
direction: output
default: "fastq"
- name: --sample_qc_output
type: file
direction: output
default: "qc/fastqc"
- name: --multiqc_output
type: file
direction: output
default: "qc/multiqc_report.html"
- name: "--demultiplexer_logs"
type: file
direction: output
default: "demultiplexer_logs"
- name: "Other arguments"
arguments:
- name: --skip_copycomplete_check
type: boolean_true
description: |
Disable the check for the presence of a "CopyComplete.txt" file in the input
directory for Illumina data.
resources:
- type: nextflow_script
path: main.nf
entrypoint: run_wf
- path: disable_publish_processes.config
test_resources:
- type: nextflow_script
path: test.nf
entrypoint: test
dependencies:
- name: demultiplex
repository: local
- name: io/publish
repository: local
runners:
- type: nextflow
config:
script:
- includeConfig("disable_publish_processes.config")
engines:
- type: native
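The `--plain_output` flag controls whether results land directly in the publish directory or under `<RunID>/<date_time>_demultiplex_<version>/`. `main.nf` also strips trailing slashes from `publish_dir` so that S3 paths do not contain `//`. The sketch below mirrors that logic in bash. `strip_trailing_slashes` and `publish_prefix` are hypothetical names. Following `main.nf`, the event id is still used as a subdirectory with `--plain_output` unless it is `run`.

```bash
#!/usr/bin/env bash
set -eo pipefail

# Remove all trailing '/' characters (like main.nf's `- ~/\/+$/`), keeping a lone '/'.
strip_trailing_slashes() {
  local path="$1"
  while [[ "$path" == */ && "$path" != "/" ]]; do
    path="${path%/}"
  done
  printf '%s\n' "$path"
}

# Relative prefix under the publish directory, following the logic in main.nf:
# plain output uses the event id, otherwise <run_id>/<timestamp>_demultiplex_<version>.
# An id of "run" maps to the publish directory itself (empty prefix).
publish_prefix() {
  local id="$1" run_id="$2" timestamp="$3" version="$4" plain_output="$5"
  local subdir
  if [[ "$plain_output" == "true" ]]; then
    subdir="$id"
  else
    subdir="${run_id}/${timestamp}_demultiplex_${version}"
  fi
  if [[ "$subdir" != "run" ]]; then
    printf '%s/\n' "$subdir"
  fi
}

# Hypothetical values; joining with a single '/' avoids '//' in S3 keys.
publish_dir="$(strip_trailing_slashes "s3://bucket/results//")"
prefix="$(publish_prefix my_run my_run 20250917_110459 v0.0.0 false)"
echo "${publish_dir}/${prefix}transfer_completed.txt"
```

Because the prefix either is empty or ends with exactly one `/`, appending a file name directly after it never introduces a double slash.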

@@ -0,0 +1,9 @@
process {
withName: publishFilesProc {
publishDir = [ enabled: false ]
}
withName: publishStatesProc {
publishDir = [ enabled: false ]
}
}

src/runner/integration_tests.sh Executable file

@@ -0,0 +1,16 @@
#!/usr/bin/env bash
set -eo pipefail
# get the root of the repository
REPO_ROOT=$(git rev-parse --show-toplevel)
# ensure that the command below is run from the root of the repository
cd "$REPO_ROOT"
viash ns build --setup cb -q runner
nextflow run . \
-main-script src/runner/test.nf \
-entry test \
-profile docker,local \
-c src/config/labels.config \
-resume

src/runner/main.nf Normal file

@@ -0,0 +1,177 @@
import java.util.concurrent.ThreadPoolExecutor
import java.util.concurrent.atomic.AtomicBoolean
def date = new Date().format('yyyyMMdd_HHmmss')
def viash_config = java.nio.file.Paths.get("${moduleDir}/_viash.yaml")
def version = get_version(viash_config)
session = nextflow.Nextflow.getSession()
final service = session.publishDirExecutorService()
// S3 paths containing double slashes might cause issues with empty objects being created
// Remove trailing slashes from the publish dir. The params map is immutable, so create a copy
def publish_dir = params.publish_dir - ~/\/+$/
workflow run_wf {
take:
input_ch
main:
output_ch = input_ch
| map { id, state ->
// The argument names of this workflow and the demultiplex workflow may overlap.
// Store a copy of the output arguments here so the state is not accidentally overwritten.
def new_state = state + [
"fastq_output_workflow": state.fastq_output,
"multiqc_output_workflow": state.multiqc_output,
"sample_qc_output_workflow": state.sample_qc_output,
"demultiplexer_logs_workflow": state.demultiplexer_logs,
"run_id": id
]
return [id, new_state]
}
| demultiplex.run(
fromState: { id, state ->
def state_to_pass = [
"input": state.input,
"run_information": state.run_information,
"demultiplexer": state.demultiplexer,
"skip_copycomplete_check": state.skip_copycomplete_check,
"output": "$id/fastq",
"output_sample_qc": "$id/qc/fastqc",
"multiqc_output": "$id/qc/multiqc_report.html",
"demultiplexer_logs": "$id/demultiplexer_logs",
]
if (state.run_information) {
state_to_pass += ["output_run_information": state.run_information.getName()]
}
state_to_pass
},
toState: { id, result, state ->
// Duplicate the results under their own key, which makes them easier to access later.
state + result + [ to_return: result ]
},
)
| map {id, state ->
def id1 = (state.plain_output) ? id : "${state.run_id}/${date}"
def id2 = (state.plain_output) ? id : "${id1}_demultiplex_${version}"
def prefix = (id2 == "run") ? "" : "${id2}/"
def new_state = state + ["prefix": prefix]
[id, new_state]
}
| publish.run(
fromState: { id, state ->
def prefix = state.prefix
// These output names are determined by arguments.
def fastq_output_1 = "${prefix}${state.fastq_output_workflow}"
def sample_qc_output_1 = "${prefix}${state.sample_qc_output_workflow}"
def multiqc_output_1 = "${prefix}${state.multiqc_output_workflow}"
def demultiplexer_logs_output = "${prefix}${state.demultiplexer_logs_workflow}"
// The name of the output file for the run information is determined by the input file name.
def run_information_output_1 = "${prefix}${state.output_run_information.getName()}"
println("Publishing to ${publish_dir}/${prefix}")
[
input: state.output,
input_sample_qc: state.output_sample_qc,
input_multiqc: state.multiqc_output,
input_run_information: state.output_run_information,
input_demultiplexer_logs: state.demultiplexer_logs,
output: fastq_output_1,
output_sample_qc: sample_qc_output_1,
output_multiqc: multiqc_output_1,
output_run_information: run_information_output_1,
output_demultiplexer_logs: demultiplexer_logs_output,
]
},
toState: { id, result, state -> [
"fastq_output": result.output,
"prefix": state.prefix,
"multiqc_output": result.output_multiqc,
"sample_qc_output": result.output_sample_qc,
"demultiplexer_logs": result.output_demultiplexer_logs,
]
},
directives: [
publishDir: [
path: publish_dir,
overwrite: false,
mode: "copy"
]
]
)
has_published = new AtomicBoolean(false)
interval_ch = channel.interval('10s'){ i ->
// Allow this channel to stop generating events based on a later signal
if (has_published.get()) {
return channel.STOP
}
i
}
await_ch = output_ch
// Wait for demultiplexing processes to be done
| toSortedList()
// Create periodic events in order to check for the publishing to be done
| combine(interval_ch)
| until { event ->
println("Checking if publishing has finished in service ${service}")
def running_tasks = null
if(service instanceof ThreadPoolExecutor) {
def completed_tasks = service.getCompletedTaskCount()
def task_count = service.getTaskCount()
running_tasks = task_count - completed_tasks
}
else if( System.getenv('NXF_ENABLE_VIRTUAL_THREADS') ) {
running_tasks = service.threadCount()
}
else {
error("Publishing service of class ${service.getClass()} is not supported.")
}
if (running_tasks == 0) {
println("Publishing has finished all current tasks. Continuing execution.")
return true
}
println("Workflow is publishing. Waiting...")
return false
}
| last()
| map{ event ->
// Signal to interval channel to stop generating events.
has_published.compareAndSet(false, true)
return event[0]
}
| map {id, state ->
println("Creating transfer_complete.txt file.")
// prefix is either empty or ends with '/'; use the sanitized publish_dir to avoid '//' in S3 keys.
def complete_file = file("${publish_dir}/${state.prefix}transfer_completed.txt")
complete_file.text = "" // This will create a file when it does not exist.
[id, state]
}
| setState([
"fastq_output",
"multiqc_output",
"sample_qc_output",
"demultiplexer_logs"
])
emit:
await_ch
}
def get_version(input) {
def inputFile = file(input)
if (!inputFile.exists()) {
// When executing tests
return "unknown_version"
}
def yamlSlurper = new groovy.yaml.YamlSlurper()
def loaded_viash_config = yamlSlurper.parse(inputFile)
def version = (loaded_viash_config.version) ? loaded_viash_config.version : "unknown_version"
println("Version of demultiplex to be used: ${version}")
return version
}
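After publishing is handed off, `run_wf` combines the results with a 10-second `channel.interval` and uses `until` to poll the publish executor until no tasks are running. Only then does it write `transfer_completed.txt`. The sketch below shows the same poll-until-done pattern in bash. `wait_until` is a hypothetical helper, and unlike the workflow it gives up after a fixed number of attempts so that a stuck condition cannot block forever.

```bash
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical helper: run a check command every <interval> seconds until it
# succeeds, giving up after <max_attempts> checks.
wait_until() {
  local interval="$1" max_attempts="$2"
  shift 2
  local attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if "$@"; then
      echo "Condition met after ${attempt} check(s)."
      return 0
    fi
    echo "Still waiting..."
    sleep "$interval"
  done
  echo "Gave up after ${max_attempts} checks." >&2
  return 1
}

# Usage: wait for a marker file that another process is expected to create.
marker_dir="$(mktemp -d)"
marker="$marker_dir/transfer_completed.txt"
( sleep 1; touch "$marker" ) &
wait_until 1 10 test -f "$marker"
wait
rm -rf "$marker_dir"
```

Passing the check as a command plus arguments (`"$@"`) lets any command or shell function serve as the condition.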

@@ -0,0 +1,20 @@
manifest {
nextflowVersion = '!>=20.12.1-edge'
}
process {
withName: publishStatesProc {
publishDir = [ enabled: false ]
}
withName: publishFilesProc {
publishDir = [ enabled: false ]
}
}
params {
rootDir = java.nio.file.Paths.get("$projectDir/../../").toAbsolutePath().normalize().toString()
}
// include common settings
includeConfig("${params.rootDir}/src/config/labels.config")

src/runner/test.nf Normal file

@@ -0,0 +1,111 @@
import java.nio.file.Files
import nextflow.exception.WorkflowScriptErrorException
// Create temporary directory for the publish_dir if it is not defined
if (!params.publish_dir && params.publishDir) {
params.publish_dir = params.publishDir
}
if (!params.publish_dir) {
def tempDir = Files.createTempDirectory("demultiplex_runner_integration_test")
println "Created temp directory: $tempDir"
// Register shutdown hook to delete it on JVM exit
Runtime.runtime.addShutdownHook(new Thread({
try {
// Delete directory recursively
Files.walk(tempDir)
.sorted(Comparator.reverseOrder())
.forEach { Files.delete(it) }
println "Deleted temp directory: $tempDir"
} catch (Exception e) {
println "Failed to delete temp directory: $e"
}
}))
params.publish_dir = tempDir.toString()
}
// The module inherits the parameters defined before the include statement,
// therefore any parameters set afterwards will not be used by the module.
include { runner } from params.rootDir + "/target/nextflow/runner/main.nf"
params.resources_test = params.rootDir + "/testData/"
workflow test {
output_ch = Channel.fromList([
[
id: "test",
input: params.resources_test + "200624_A00834_0183_BHMTFYDRXX.tar.gz",
]
])
| map {event -> [event.id, event] }
| runner.run(
fromState: {id, state -> state }
)
all_events_ch = output_ch
| toSortedList()
| map{states ->
assert states.size() == 1
}
output_ch
| map {id, state ->
assert id == "test"
assert state.fastq_output.isDirectory()
assert state.sample_qc_output.isDirectory()
assert state.multiqc_output.isFile()
assert state.demultiplexer_logs.isDirectory()
}
workflow.onComplete = {
try {
// Nextflow only allows exceptions generated using the 'error' function (which throws WorkflowScriptErrorException).
// So in order for the assert statements to work (and for other errors to make the tests fail),
// we need to wrap them in a WorkflowScriptErrorException. See https://github.com/nextflow-io/nextflow/pull/4458/files
// The error message will show up in .nextflow.log
def publish_subdir = file("${params.publish_dir}/test")
assert publish_subdir.isDirectory()
def all_files = publish_subdir.listFiles()
assert all_files.size() == 1
def publish_dir = file(all_files[0])
// version can be unknown_version (local tests) or actual version configured in _viash.yaml
// with the new approach to fetching the version from _viash.yaml, this will be the branch name during CI builds
// Disabling this test temporarily and creating an issue for it
// assert publish_dir.name.endsWith("_demultiplex_unknown_version")
def published_items = publish_dir.listFiles()
assert published_items.size() == 5
assert published_items.collect{it.name}.toSet() == ["demultiplexer_logs", "fastq", "qc", "SampleSheet.csv", "transfer_completed.txt"].toSet()
def fastqc_files = publish_dir.resolve("qc/fastqc").listFiles()
assert fastqc_files.collect{it.name}.toSet() == [
"Sample1_S1_L001_R1_001_fastqc_data.txt",
"Sample1_S1_L001_R1_001_fastqc_report.html",
"Sample1_S1_L001_R1_001_summary.txt",
"Sample23_S3_L001_R1_001_fastqc_data.txt",
"Sample23_S3_L001_R1_001_fastqc_report.html",
"Sample23_S3_L001_R1_001_summary.txt",
"SampleA_S2_L001_R1_001_fastqc_data.txt",
"SampleA_S2_L001_R1_001_fastqc_report.html",
"SampleA_S2_L001_R1_001_summary.txt",
"sampletest_S4_L001_R1_001_fastqc_data.txt",
"sampletest_S4_L001_R1_001_fastqc_report.html",
"sampletest_S4_L001_R1_001_summary.txt",
"Undetermined_S0_L001_R1_001_fastqc_data.txt",
"Undetermined_S0_L001_R1_001_fastqc_report.html",
"Undetermined_S0_L001_R1_001_summary.txt"
].toSet()
assert publish_dir.resolve("qc/multiqc_report.html").exists()
def fastq_files = publish_dir.resolve("fastq").listFiles()
assert fastq_files.collect{it.name}.toSet() == [
"Sample1_S1_L001_R1_001.fastq.gz",
"Sample23_S3_L001_R1_001.fastq.gz",
"SampleA_S2_L001_R1_001.fastq.gz",
"sampletest_S4_L001_R1_001.fastq.gz",
"Undetermined_S0_L001_R1_001.fastq.gz"
].toSet()
assert publish_dir.resolve("SampleSheet.csv").exists()
} catch (Throwable e) {
throw new WorkflowScriptErrorException("Integration test failed!", e)
}
}
}
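The integration test compares the set of published entries to an expected set, so ordering does not matter. For shell-based component tests, a bash counterpart could use a hypothetical `assert_dir_entries` helper that sorts both lists with `LC_ALL=C` before comparing them. The sketch assumes file names without embedded newlines.

```bash
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical helper: fail unless the entries directly inside <dir> are exactly
# the expected names, in any order. Both lists are sorted with LC_ALL=C so the
# comparison does not depend on the locale.
assert_dir_entries() {
  local dir="$1"
  shift
  local actual expected
  actual="$(ls -A "$dir" | LC_ALL=C sort)"
  expected="$(printf '%s\n' "$@" | LC_ALL=C sort)"
  if [[ "$actual" != "$expected" ]]; then
    printf 'Unexpected contents of %s\nexpected:\n%s\nactual:\n%s\n' \
      "$dir" "$expected" "$actual" >&2
    return 1
  fi
}

# Usage with a throwaway directory that mimics part of the published layout.
demo_dir="$(mktemp -d)"
mkdir "$demo_dir/fastq" "$demo_dir/qc"
touch "$demo_dir/SampleSheet.csv" "$demo_dir/transfer_completed.txt"
assert_dir_entries "$demo_dir" fastq qc SampleSheet.csv transfer_completed.txt
echo "Directory contents match."
rm -rf "$demo_dir"
```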

target/.build.yaml Normal file

@@ -0,0 +1,489 @@
name: "bases2fastq"
version: "v0.4.0"
authors:
- name: "Dries Schaumont"
roles:
- "author"
- "maintainer"
info:
links:
email: "dries@data-intuitive.com"
github: "DriesSchaumont"
orcid: "0000-0002-4389-0440"
linkedin: "dries-schaumont"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist"
argument_groups:
- name: "Input"
arguments:
- type: "file"
name: "--analysis_directory"
description: "Location of analysis directory"
info: null
example:
- "input"
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--run_manifest"
alternatives:
- "-r"
description: "Location of run manifest to use instead of default RunManifest.csv\
\ found in analysis directory"
info: null
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output"
arguments:
- type: "file"
name: "--output_directory"
alternatives:
- "-o"
description: "Location to save output fastqs"
info: null
example:
- "fastq_dir"
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--report"
description: "Output location for the HTML report"
info: null
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--logs"
description: "Directory containing log files"
info: null
example:
- "logs_dir"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- name: "Arguments"
arguments:
- type: "string"
name: "--chemistry_version"
description: "Run parameters override, chemistry version."
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--demux_only"
alternatives:
- "-d"
description: "Generate demux files and indexing stats without generating FASTQ\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--detect_adapters"
description: "Detect adapters sequences, overriding any sequences present in run\
\ manifest.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--error_on_missing"
description: "Terminate execution for a missing file (by default, missing files\
\ are\nskipped and execution continues). Also set by --strict.\n"
info: null
direction: "input"
- type: "string"
name: "--exclude_tile"
alternatives:
- "-e"
description: "Regex matching tile names to exclude. This flag can be specified\
\ multiple times. (e.g. L1.*C0[23]S.)\n"
info: null
required: false
direction: "input"
multiple: true
multiple_sep: ";"
- type: "string"
name: "--filter_mask"
description: "Run parameters override, custom pass filter mask.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--flowcell_id"
description: "Run parameters override, flowcell ID.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--force_index_orientation"
description: "Do not attempt to find orientation for I1/I2 reads (reverse complement).\n\
Use orientation given in run manifest.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--group_fastq"
description: "Group all FASTQ files, stats and metrics for a project in the project folder.\n"
info: null
direction: "input"
- type: "integer"
name: "--i1_cycles"
description: "Run parameters override, I1 cycles.\n"
info: null
required: false
min: 1
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--i2_cycles"
description: "Run parameters override, I2 cycles.\n"
info: null
required: false
min: 1
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--include_tile"
alternatives:
- "-i"
description: "Regex matching tile names to include. This flag can be specified\
\ multiple times. (e.g. L1.*C0[23]S.)\n"
info: null
required: false
direction: "input"
multiple: true
multiple_sep: ";"
- type: "string"
name: "--kit_configuration"
description: "Run parameters override, kit configuration.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--legacy_fastq"
description: "Legacy naming for FASTQ files (e.g. SampleName_S1_L001_R1_001.fastq.gz)\n"
info: null
direction: "input"
- type: "string"
name: "--log_level"
alternatives:
- "-l"
description: "Severity level for logging.\n"
info: null
example:
- "INFO"
required: false
choices:
- "DEBUG"
- "INFO"
- "WARNING"
- "ERROR"
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--no_error_on_invalid"
description: "Skip invalid files and continue execution. Overridden by --strict\
\ options\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--no_projects"
description: "Disable project directories\n"
info: null
direction: "input"
- type: "integer"
name: "--num_unassigned"
description: "Maximum number of unassigned sequences to report.\n"
info: null
example:
- 30
required: false
min: 0
max: 1000
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--preparation_workflow"
description: "Run parameters override, preparation workflow. \n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--qc_only"
description: "Quickly generate run stats for single tile without generating FASTQ.\n\
Use --include_tile/--exclude_tile to define custom tile set.\n"
info: null
direction: "input"
- type: "integer"
name: "--r1_cycles"
description: "Run parameters override, R1 cycles.\n"
info: null
required: false
min: 1
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--r2_cycles"
description: "Run parameters override, R2 cycles.\n"
info: null
required: false
min: 1
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--split_lanes"
description: "Split FASTQ files by lane.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--skip_qc_report"
description: "Do not generate HTML QC report.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--skip_multi_qc"
description: "Do not generate MultiQC HTML report.\n"
info: null
direction: "input"
- type: "string"
name: "--settings"
description: "Run manifest settings override. This option may be specified multiple\
\ times.\n"
info: null
required: false
direction: "input"
multiple: true
multiple_sep: ";"
- name: "Cyto-fastq Arguments"
arguments:
- type: "string"
name: "--batch"
description: "Restrict cyto-fastq generation to batch(es) that match comma delimited\
\ list (e.g. --batch B01,B02,B03).\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--cyto_fastq_mask"
description: "Cycle mask for cyto fastq generation. This flag can be specified\
\ multiple times.\n"
info: null
required: false
direction: "input"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--panel"
description: "Local or remote path to panel JSON\n"
info: null
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--per_target_fastq"
description: "Create per-target fastq for each cell assignment target site in\
\ each DISS batch according to FastqMasks in TargetCellAssignmentManifest.\n"
info: null
direction: "input"
- type: "file"
name: "--tca_manifest"
description: "Location of TargetCellAssignmentManifest to use instead of default\
\ csv found in analysis directory\n"
info: null
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
description: "Bases2Fastq demultiplexes sequencing data generated by Element Biosciences\
\ instruments and converts base calls into FASTQ files.\n"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
- type: "file"
path: "test_helpers.sh"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "bases2fastq"
- "ps"
keywords:
- "demultiplex"
- "fastq"
- "demux"
- "Element Biosciences"
license: "Proprietary"
links:
repository: "https://github.com/Illumina/bases2fastq"
homepage: "https://www.elembio.com/"
documentation: "https://docs.elembio.io/docs/bases2fastq/introduction/"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "elembio/bases2fastq:2.2"
target_registry: "images.viash-hub.com"
target_tag: "v0.4.0"
namespace_separator: "/"
setup:
- type: "docker"
run:
- "bases2fastq --version 2>&1 | head -1 | sed 's/.*version \\([0-9\\\\.]*\\).*/bases2fastq:\
\ \\1/' > /var/software_versions.txt\n"
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/bases2fastq/config.vsh.yaml"
runner: "nextflow"
engine: "docker|native"
output: "target/nextflow/bases2fastq"
executable: "target/nextflow/bases2fastq/main.nf"
viash_version: "0.9.4"
git_commit: "666507c86de9150bfbdffdb2dbabc1dbde3c3262"
git_remote: "https://github.com/viash-hub/biobox"
package_config:
name: "biobox"
version: "v0.4.0"
summary: "A curated collection of high-quality, standalone bioinformatics components\
\ built with [Viash](https://viash.io).\n"
description: "`biobox` offers a suite of reliable bioinformatics components, similar\
\ to [nf-core/modules](https://github.com/nf-core/modules) and [snakemake-wrappers/bio](https://github.com/snakemake/snakemake-wrappers/tree/master/bio),\
\ but built using the [Viash](https://viash.io) framework.\n\nThis approach emphasizes\
\ **reusability**, **reproducibility**, and adherence to **best practices**. Key\
\ features of `biobox` components include:\n\n* **Standalone & Nextflow Ready:**\
\ Run components directly via the command line or seamlessly integrate them into\
\ Nextflow workflows.\n* **High Quality Standards:**\n * Comprehensive documentation\
\ for components and parameters.\n * Full exposure of underlying tool arguments.\n\
\ * Containerized (Docker) for dependency management and reproducibility.\n\
\ * Unit tested for verified functionality.\n"
info: null
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.4.0'"
keywords:
- "bioinformatics"
- "modules"
- "sequencing"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/biobox"
issue_tracker: "https://github.com/viash-hub/biobox/issues"
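
The Docker setup step in this config records the tool version. It keeps the first line of `bases2fastq --version` (stdout and stderr combined) and rewrites it with a `sed` substitution into `/var/software_versions.txt`. The Python function below is a minimal sketch that mirrors that `head -1 | sed` pipeline, so the extraction can be checked without building the image. The function name is hypothetical, and the sample line `bases2fastq version 2.2.0` is an assumed format, not captured output.

```python
import re


def bases2fastq_version_line(version_output: str) -> str:
    """Hypothetical re-implementation of the `head -1 | sed` version step."""
    # head -1: keep only the first line of the combined output.
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    # sed 's/.*version \([0-9\\.]*\).*/bases2fastq: \1/': the greedy `.*`
    # anchors on the last "version " in the line, and the group captures the
    # run of digits and dots after it (the bracket also admits a backslash).
    match = re.fullmatch(r".*version ([0-9\\.]*).*", first_line)
    # Without -n, sed prints a line unchanged when the substitution fails.
    if match is None:
        return first_line
    return f"bases2fastq: {match.group(1)}"


if __name__ == "__main__":
    # Assumed output format; not captured from a real bases2fastq run.
    print(bases2fastq_version_line("bases2fastq version 2.2.0"))
```

`sed` runs without `-n`, so an unexpected banner line passes through unchanged instead of being dropped. If the tool's output format changes, the new text therefore appears verbatim in the versions file rather than as an empty value.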

File diff suppressed because it is too large

@@ -0,0 +1,126 @@
manifest {
name = 'bases2fastq'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.4.0'
description = 'Bases2Fastq demultiplexes sequencing data generated by Element Biosciences instruments and converts base calls into FASTQ files.\n'
author = 'Dries Schaumont'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
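
The `tempDir` expression in this config resolves the first non-empty value among `NXF_TEMP`, `VIASH_TEMP`, `TEMPDIR` and `TMPDIR`, and falls back to `/tmp`. The `mount_temp` profile then mounts that path into Docker, Podman and Charliecloud containers. The Python sketch below reproduces the lookup order. It assumes that Groovy's Elvis operator (`?:`) skips both unset and empty variables. `resolve_temp_dir` is a hypothetical name used only for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Lookup order copied from the tempDir expression in nextflow.config.
TEMP_ENV_VARS = ("NXF_TEMP", "VIASH_TEMP", "TEMPDIR", "TMPDIR")


def resolve_temp_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the first non-empty candidate as an absolute path, else /tmp."""
    env = os.environ if env is None else env
    for name in TEMP_ENV_VARS:
        value = env.get(name)
        # Groovy's `?:` treats null and "" as false; Python's truthiness matches.
        if value:
            # Path.absolute(), like Java's toAbsolutePath(), anchors a relative
            # path at the working directory without normalising "..".
            return Path(value).absolute()
    return Path("/tmp").absolute()


if __name__ == "__main__":
    print(resolve_temp_dir())
```

Passing an explicit mapping keeps the lookup testable. Called without arguments, the sketch reads the real process environment.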

@@ -0,0 +1,287 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "bases2fastq",
"description": "Bases2Fastq demultiplexes sequencing data generated by Element Biosciences instruments and converts base calls into FASTQ files.\n",
"type": "object",
"$defs": {
"arguments": {
"title": "Arguments",
"type": "object",
"description": "No description",
"properties": {
"chemistry_version": {
"type": "string",
"description": "Run parameters override, chemistry version.",
"help_text": "Type: `string`, multiple: `False`. "
},
"demux_only": {
"type": "boolean",
"description": "Generate demux files and indexing stats without generating FASTQ\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"detect_adapters": {
"type": "boolean",
          "description": "Detect adapter sequences, overriding any sequences present in the run manifest.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"error_on_missing": {
"type": "boolean",
"description": "Terminate execution for a missing file (by default, missing files are\nskipped and execution continues)",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"exclude_tile": {
"type": "array",
"items": {
"type": "string"
},
"description": "Regex matching tile names to exclude",
"help_text": "Type: `string`, multiple: `True`. "
},
"filter_mask": {
"type": "string",
"description": "Run parameters override, custom pass filter mask.\n",
"help_text": "Type: `string`, multiple: `False`. "
},
"flowcell_id": {
"type": "string",
"description": "Run parameters override, flowcell ID.\n",
"help_text": "Type: `string`, multiple: `False`. "
},
"force_index_orientation": {
"type": "boolean",
"description": "Do not attempt to find orientation for I1/I2 reads (reverse complement).\nUse orientation given in run manifest.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"group_fastq": {
"type": "boolean",
          "description": "Group all FASTQ files, stats, and metrics for a project in the project folder.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"i1_cycles": {
"type": "integer",
"description": "Run parameters override, I1 cycles.\n",
"help_text": "Type: `integer`, multiple: `False`. "
},
"i2_cycles": {
"type": "integer",
          "description": "Run parameters override, I2 cycles.\n",
"help_text": "Type: `integer`, multiple: `False`. "
},
"include_tile": {
"type": "array",
"items": {
"type": "string"
},
"description": "Regex matching tile names to include",
"help_text": "Type: `string`, multiple: `True`. "
},
"kit_configuration": {
"type": "string",
"description": "Run parameters override, kit configuration.\n",
"help_text": "Type: `string`, multiple: `False`. "
},
"legacy_fastq": {
"type": "boolean",
"description": "Legacy naming for FASTQ files (e.g",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"log_level": {
"type": "string",
"description": "Severity level for logging.\n",
"help_text": "Type: `string`, multiple: `False`, example: `\"INFO\"`, choices: ``DEBUG`, `INFO`, `WARNING`, `ERROR``. ",
"enum": [
"DEBUG",
"INFO",
"WARNING",
"ERROR"
]
},
"no_error_on_invalid": {
"type": "boolean",
"description": "Skip invalid files and continue execution",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"no_projects": {
"type": "boolean",
"description": "Disable project directories\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"num_unassigned": {
"type": "integer",
          "description": "Maximum number of unassigned sequences to report.\n",
"help_text": "Type: `integer`, multiple: `False`, example: `30`. "
},
"preparation_workflow": {
"type": "string",
"description": "Run parameters override, preparation workflow",
"help_text": "Type: `string`, multiple: `False`. "
},
"qc_only": {
"type": "boolean",
"description": "Quickly generate run stats for single tile without generating FASTQ.\nUse --include_tile/--exclude_tile to define custom tile set.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"r1_cycles": {
"type": "integer",
"description": "Run parameters override, R1 cycles.\n",
"help_text": "Type: `integer`, multiple: `False`. "
},
"r2_cycles": {
"type": "integer",
"description": "Run parameters override, R2 cycles.\n",
"help_text": "Type: `integer`, multiple: `False`. "
},
"split_lanes": {
"type": "boolean",
"description": "Split FASTQ files by lane.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"skip_qc_report": {
"type": "boolean",
"description": "Do not generate HTML QC report.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"skip_multi_qc": {
"type": "boolean",
"description": "Do not generate MultiQC HTML report.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"settings": {
"type": "array",
"items": {
"type": "string"
},
"description": "Run manifest settings override",
"help_text": "Type: `string`, multiple: `True`. "
}
}
},
"input": {
"title": "Input",
"type": "object",
"description": "No description",
"properties": {
"analysis_directory": {
"type": "string",
"format": "path",
"exists": true,
"description": "Location of analysis directory",
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`, example: `\"input\"`. "
},
"run_manifest": {
"type": "string",
"format": "path",
"description": "Location of run manifest to use instead of default RunManifest.csv found in analysis directory",
"help_text": "Type: `file`, multiple: `False`, direction: `input`. "
}
}
},
"output": {
"title": "Output",
"type": "object",
"description": "No description",
"properties": {
"output_directory": {
"type": "string",
"format": "path",
"description": "Location to save output fastqs",
"help_text": "Type: `file`, multiple: `False`, required, default: `\"$id.$key.output_directory\"`, direction: `output`, example: `\"fastq_dir\"`. ",
"default": "$id.$key.output_directory"
},
"report": {
"type": "string",
"format": "path",
"description": "Output location for the HTML report",
"help_text": "Type: `file`, multiple: `False`, default: `\"$id.$key.report\"`, direction: `output`. ",
"default": "$id.$key.report"
},
"logs": {
"type": "string",
"format": "path",
"description": "Directory containing log files",
"help_text": "Type: `file`, multiple: `False`, default: `\"$id.$key.logs\"`, direction: `output`, example: `\"logs_dir\"`. ",
"default": "$id.$key.logs"
}
}
},
"cyto-fastq arguments": {
"title": "Cyto-fastq Arguments",
"type": "object",
"description": "No description",
"properties": {
"batch": {
"type": "string",
"description": "Restrict cyto-fastq generation to batch(es) that match comma delimited list (e.g",
"help_text": "Type: `string`, multiple: `False`. "
},
"cyto_fastq_mask": {
"type": "array",
"items": {
"type": "string"
},
"description": "Cycle mask for cyto fastq generation",
"help_text": "Type: `string`, multiple: `True`. "
},
"panel": {
"type": "string",
"format": "path",
"description": "Local or remote path to panel JSON\n",
"help_text": "Type: `file`, multiple: `False`, direction: `input`. "
},
"per_target_fastq": {
"type": "boolean",
"description": "Create per-target fastq for each cell assignment target site in each DISS batch according to FastqMasks in TargetCellAssignmentManifest.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"tca_manifest": {
"type": "string",
"format": "path",
"description": "Location of TargetCellAssignmentManifest to use instead of default csv found in analysis directory\n",
"help_text": "Type: `file`, multiple: `False`, direction: `input`. "
}
}
},
"nextflow input-output arguments": {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type": "string",
"description": "Path to an output directory.",
"help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
}
}
}
},
"allOf": [
{
"$ref": "#/$defs/arguments"
},
{
"$ref": "#/$defs/input"
},
{
"$ref": "#/$defs/output"
},
{
"$ref": "#/$defs/cyto-fastq arguments"
},
{
"$ref": "#/$defs/nextflow input-output arguments"
}
]
}
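
Each `$defs` group in the parameter schema declares its own `properties`, and only some of those properties carry a `default` or an `enum`. The Python sketch below uses only the standard library to gather the defaults and flag out-of-range choices. It works on a small hand-copied subset of the schema. `collect_defaults` and `check_enums` are hypothetical helpers and are not part of Viash or Nextflow. Values such as `$id.$key.output_directory` appear to be template placeholders filled in by the generated workflow, so the sketch passes them through untouched.

```python
import json
from typing import Any, Dict, Iterator, List, Tuple

# Hand-copied subset of the bases2fastq parameter schema (not the full file).
SCHEMA_SUBSET: Dict[str, Any] = json.loads(
    """
    {
      "$defs": {
        "arguments": {
          "properties": {
            "demux_only": {"type": "boolean", "default": false},
            "log_level": {
              "type": "string",
              "enum": ["DEBUG", "INFO", "WARNING", "ERROR"]
            }
          }
        },
        "output": {
          "properties": {
            "output_directory": {
              "type": "string",
              "default": "$id.$key.output_directory"
            }
          }
        }
      }
    }
    """
)


def _properties(schema: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (name, spec) for every property of every $defs group."""
    for group in schema.get("$defs", {}).values():
        yield from group.get("properties", {}).items()


def collect_defaults(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Map each parameter that declares a `default` to that value."""
    return {name: spec["default"] for name, spec in _properties(schema) if "default" in spec}


def check_enums(schema: Dict[str, Any], params: Dict[str, Any]) -> List[str]:
    """Report supplied values that fall outside a declared `enum`."""
    errors = []
    for name, spec in _properties(schema):
        if "enum" in spec and name in params and params[name] not in spec["enum"]:
            errors.append(f"{name}: {params[name]!r} is not one of {spec['enum']}")
    return errors


if __name__ == "__main__":
    # User-supplied values are merged last so they override schema defaults.
    params = {**collect_defaults(SCHEMA_SUBSET), "log_level": "INFO"}
    print(params)
    print(check_enums(SCHEMA_SUBSET, {"log_level": "TRACE"}))
```

Merging user values last (`{**defaults, **user}`) gives them precedence over the schema defaults, which is the usual ordering for parameter files like this one.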

@@ -0,0 +1,468 @@
name: "bcl_convert"
version: "v0.4.0"
authors:
- name: "Toni Verbeiren"
roles:
- "author"
- "maintainer"
info:
links:
github: "tverbeiren"
linkedin: "verbeiren"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist and CEO"
- name: "Dorien Roosen"
roles:
- "author"
info:
links:
email: "dorien@data-intuitive.com"
github: "dorien-er"
linkedin: "dorien-roosen"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--bcl_input_directory"
alternatives:
- "-i"
description: "Input run directory"
info: null
example:
- "bcl_dir"
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--sample_sheet"
alternatives:
- "-s"
description: "Path to SampleSheet.csv file (default searched for in --bcl_input_directory)"
info: null
example:
- "bcl_dir/sample_sheet.csv"
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--run_info"
description: "Path to RunInfo.xml file (default root of BCL input directory)"
info: null
example:
- "bcl_dir/RunInfo.xml"
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Lane and tile settings"
arguments:
- type: "integer"
name: "--bcl_only_lane"
description: "Convert only specified lane number (default all lanes)"
info: null
example:
- 1
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean"
name: "--first_tile_only"
description: "Only convert first tile of input (for testing & debugging)"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--tiles"
description: "Process only a subset of tiles by a regular expression"
info: null
example:
- "s_[0-9]+_1"
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--exclude_tiles"
description: "Exclude set of tiles by a regular expression"
info: null
example:
- "s_[0-9]+_1"
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Resource arguments"
arguments:
- type: "boolean"
name: "--shared_thread_odirect_output"
    description: "Use Linux native asynchronous I/O (io_submit) for file output (false by default)"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--bcl_num_parallel_tiles"
description: "\\# of tiles to process in parallel (default 1)"
info: null
example:
- 1
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--bcl_num_conversion_threads"
description: "\\# of threads for conversion (per tile, default # cpu threads)"
info: null
example:
- 1
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--bcl_num_compression_threads"
description: "\\# of threads for fastq.gz output compression (per tile, default\
\ # cpu threads, or HW+12)"
info: null
example:
- 1
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--bcl_num_decompression_threads"
description: "\\# of threads for bcl/cbcl input decompression (per tile, default\
\ half # cpu threads, or HW+8). Only applies when preloading files"
info: null
example:
- 1
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Run arguments"
arguments:
- type: "boolean"
name: "--bcl_only_matched_reads"
description: "For pure BCL conversion, do not output files for 'Undetermined'\
\ [unmatched] reads (output by default)"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean"
name: "--no_lane_splitting"
description: "Do not split FASTQ file by lane (false by default)"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--num_unknown_barcodes_reported"
description: "\\# of Top Unknown Barcodes to output (1000 by default)"
info: null
example:
- 1000
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean"
name: "--bcl_validate_sample_sheet_only"
description: "Only validate RunInfo.xml & SampleSheet files (produce no FASTQ\
\ files)"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean"
name: "--strict_mode"
description: "Abort if any files are missing (false by default)"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean"
name: "--sample_name_column_enabled"
description: "Use sample sheet 'Sample_Name' column when naming fastq files &\
\ subdirectories"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--output_directory"
alternatives:
- "-o"
    description: "Output directory containing fastq files"
info: null
example:
- "fastq_dir"
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
- type: "boolean"
name: "--bcl_sampleproject_subdirectories"
description: "Output to subdirectories based upon sample sheet 'Sample_Project'\
\ column"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--fastq_gzip_compression_level"
description: "Set fastq output compression level 0-9 (default 1)"
info: null
example:
- 1
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--reports"
description: "Reports directory"
info: null
example:
- "reports_dir"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--logs"
    description: "Logs directory"
info: null
example:
- "logs_dir"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "boolean"
name: "--force"
description: "Allow destination directory to already exist and overwrite files.\n"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
description: "Convert bcl files to fastq files using bcl-convert.\nInformation about\
\ upgrading from bcl2fastq via\n[Upgrading from bcl2fastq to BCL Convert](https://emea.support.illumina.com/bulletins/2020/10/upgrading-from-bcl2fastq-to-bcl-convert.html)\n\
and [BCL Convert Compatible Products](https://support.illumina.com/sequencing/sequencing_software/bcl-convert/compatibility.html)\n"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
keywords:
- "demultiplex"
- "fastq"
- "bcl"
- "illumina"
license: "Proprietary"
links:
repository: "https://github.com/viash-hub/biobox"
homepage: "https://support.illumina.com/sequencing/sequencing_software/bcl-convert.html"
documentation: "https://support.illumina.com/downloads/bcl-convert-user-guide.html"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "debian:trixie-slim"
target_registry: "images.viash-hub.com"
target_tag: "v0.4.0"
namespace_separator: "/"
setup:
- type: "apt"
packages:
- "wget"
- "gdb"
- "which"
- "hostname"
- "alien"
- "procps"
interactive: false
- type: "docker"
run:
- "wget https://s3.amazonaws.com/webdata.illumina.com/downloads/software/bcl-convert/bcl-convert-4.2.7-2.el8.x86_64.rpm\
\ -O /tmp/bcl-convert.rpm && \\\nalien -i /tmp/bcl-convert.rpm && \\\nrm -rf\
\ /var/lib/apt/lists/* && \\\nrm /tmp/bcl-convert.rpm\n"
- type: "docker"
run:
- "echo \"bcl-convert: \\\"$(bcl-convert -V 2>&1 >/dev/null | sed -n '/Version/\
\ s/^bcl-convert\\ Version //p')\\\"\" > /var/software_versions.txt\n"
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/bcl_convert/config.vsh.yaml"
runner: "nextflow"
engine: "docker|native"
output: "target/nextflow/bcl_convert"
executable: "target/nextflow/bcl_convert/main.nf"
viash_version: "0.9.4"
git_commit: "666507c86de9150bfbdffdb2dbabc1dbde3c3262"
git_remote: "https://github.com/viash-hub/biobox"
package_config:
name: "biobox"
version: "v0.4.0"
summary: "A curated collection of high-quality, standalone bioinformatics components\
\ built with [Viash](https://viash.io).\n"
description: "`biobox` offers a suite of reliable bioinformatics components, similar\
\ to [nf-core/modules](https://github.com/nf-core/modules) and [snakemake-wrappers/bio](https://github.com/snakemake/snakemake-wrappers/tree/master/bio),\
\ but built using the [Viash](https://viash.io) framework.\n\nThis approach emphasizes\
\ **reusability**, **reproducibility**, and adherence to **best practices**. Key\
\ features of `biobox` components include:\n\n* **Standalone & Nextflow Ready:**\
\ Run components directly via the command line or seamlessly integrate them into\
\ Nextflow workflows.\n* **High Quality Standards:**\n * Comprehensive documentation\
\ for components and parameters.\n * Full exposure of underlying tool arguments.\n\
\ * Containerized (Docker) for dependency management and reproducibility.\n\
\ * Unit tested for verified functionality.\n"
info: null
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.4.0'"
keywords:
- "bioinformatics"
- "modules"
- "sequencing"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/biobox"
issue_tracker: "https://github.com/viash-hub/biobox/issues"
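
The bcl_convert image records its version with `bcl-convert -V 2>&1 >/dev/null | sed -n '/Version/ s/^bcl-convert\ Version //p'`. The redirection order sends only stderr into the pipe. `sed -n ... p` then prints only the lines whose `bcl-convert Version ` prefix was stripped. The Python sketch below reproduces that filtering and the quoted `bcl-convert: "..."` line written to `/var/software_versions.txt`. The sample stderr text is hypothetical, and `bcl_convert_version_line` is an illustrative name only.

```python
def bcl_convert_version_line(stderr_text: str) -> str:
    """Hypothetical re-implementation of the bcl-convert version step."""
    prefix = "bcl-convert Version "
    # sed -n '/Version/ s/^bcl-convert\ Version //p' prints only lines that
    # begin with the prefix (which also satisfies the /Version/ address),
    # with that prefix removed.
    versions = [
        line[len(prefix):]
        for line in stderr_text.splitlines()
        if line.startswith(prefix)
    ]
    # Command substitution $(...) strips trailing newlines from the result.
    captured = "\n".join(versions).rstrip("\n")
    return f'bcl-convert: "{captured}"'


if __name__ == "__main__":
    # Assumed stderr layout; not captured from a real bcl-convert run.
    sample = "bcl-convert Version 00.000.000.4.2.7\nCopyright notice"
    print(bcl_convert_version_line(sample))
```

Unlike the bases2fastq step, `-n` suppresses non-matching lines. If no version line is found, the file therefore ends up with an empty quoted value rather than the raw banner.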

File diff suppressed because it is too large

@@ -0,0 +1,126 @@
manifest {
name = 'bcl_convert'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.4.0'
description = 'Convert bcl files to fastq files using bcl-convert.\nInformation about upgrading from bcl2fastq via\n[Upgrading from bcl2fastq to BCL Convert](https://emea.support.illumina.com/bulletins/2020/10/upgrading-from-bcl2fastq-to-bcl-convert.html)\nand [BCL Convert Compatible Products](https://support.illumina.com/sequencing/sequencing_software/bcl-convert/compatibility.html)\n'
author = 'Toni Verbeiren, Dorien Roosen'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
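
Both `nextflow.config` files define the same resource labels:

* The `gb`/`tb` labels use decimal multiples (10^9 and 10^12 bytes).
* The `gib`/`tib` labels use binary multiples (2^30 and 2^40 bytes).
* The `cpu` labels set `cpus` directly.

The short Python sketch below derives the directive string from a label name to make the arithmetic explicit. `label_directive` is a hypothetical helper, and the label grammar it accepts is inferred from the names in this file.

```python
import re

# Bytes per unit, inferred from the label values in nextflow.config.
UNIT_BYTES = {
    "gb": 10**9,    # decimal gigabyte
    "tb": 10**12,   # decimal terabyte
    "gib": 2**30,   # binary gibibyte
    "tib": 2**40,   # binary tebibyte
}


def label_directive(label: str) -> str:
    """Return the directive text that a label such as 'mem16gib' maps to."""
    cpu = re.fullmatch(r"cpu(\d+)", label)
    if cpu:
        return f"cpus = {int(cpu.group(1))}"
    mem = re.fullmatch(r"mem(\d+)(gb|tb|gib|tib)", label)
    if mem is None:
        raise ValueError(f"unrecognised label: {label!r}")
    return f"memory = {int(mem.group(1)) * UNIT_BYTES[mem.group(2)]}.B"


if __name__ == "__main__":
    for name in ("mem16gib", "mem16gb", "mem5tb", "cpu20"):
        print(name, "->", label_directive(name))
```

In a pipeline, the `withLabel: mem16gib` selector matches any process that declares `label 'mem16gib'`, and that process receives 17179869184 bytes (16 GiB). By contrast, `mem16gb` requests 16000000000 bytes.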

@@ -0,0 +1,205 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "bcl_convert",
"description": "Convert bcl files to fastq files using bcl-convert.\nInformation about upgrading from bcl2fastq via\n[Upgrading from bcl2fastq to BCL Convert](https://emea.support.illumina.com/bulletins/2020/10/upgrading-from-bcl2fastq-to-bcl-convert.html)\nand [BCL Convert Compatible Products](https://support.illumina.com/sequencing/sequencing_software/bcl-convert/compatibility.html)\n",
"type": "object",
"$defs": {
"input arguments": {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"bcl_input_directory": {
"type": "string",
"format": "path",
"exists": true,
"description": "Input run directory",
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`, example: `\"bcl_dir\"`. "
},
"sample_sheet": {
"type": "string",
"format": "path",
"description": "Path to SampleSheet.csv file (default searched for in --bcl_input_directory)",
"help_text": "Type: `file`, multiple: `False`, direction: `input`, example: `\"bcl_dir/sample_sheet.csv\"`. "
},
"run_info": {
"type": "string",
"format": "path",
"description": "Path to RunInfo.xml file (default root of BCL input directory)",
"help_text": "Type: `file`, multiple: `False`, direction: `input`, example: `\"bcl_dir/RunInfo.xml\"`. "
}
}
},
"lane and tile settings": {
"title": "Lane and tile settings",
"type": "object",
"description": "No description",
"properties": {
"bcl_only_lane": {
"type": "integer",
"description": "Convert only specified lane number (default all lanes)",
"help_text": "Type: `integer`, multiple: `False`, example: `1`. "
},
"first_tile_only": {
"type": "boolean",
"description": "Only convert first tile of input (for testing & debugging)",
"help_text": "Type: `boolean`, multiple: `False`, example: `true`. "
},
"tiles": {
"type": "string",
"description": "Process only a subset of tiles by a regular expression",
"help_text": "Type: `string`, multiple: `False`, example: `\"s_[0-9]+_1\"`. "
},
"exclude_tiles": {
"type": "string",
"description": "Exclude set of tiles by a regular expression",
"help_text": "Type: `string`, multiple: `False`, example: `\"s_[0-9]+_1\"`. "
}
}
},
"resource arguments": {
"title": "Resource arguments",
"type": "object",
"description": "No description",
"properties": {
"shared_thread_odirect_output": {
"type": "boolean",
"description": "Use Linux native asynchronous I/O (io_submit) for file output (default: false)",
"help_text": "Type: `boolean`, multiple: `False`, example: `true`. "
},
"bcl_num_parallel_tiles": {
"type": "integer",
"description": "\\# of tiles to process in parallel (default 1)",
"help_text": "Type: `integer`, multiple: `False`, example: `1`. "
},
"bcl_num_conversion_threads": {
"type": "integer",
"description": "\\# of threads for conversion (per tile, default # cpu threads)",
"help_text": "Type: `integer`, multiple: `False`, example: `1`. "
},
"bcl_num_compression_threads": {
"type": "integer",
"description": "\\# of threads for fastq.gz output compression (per tile, default # cpu threads, or HW+12)",
"help_text": "Type: `integer`, multiple: `False`, example: `1`. "
},
"bcl_num_decompression_threads": {
"type": "integer",
"description": "\\# of threads for bcl/cbcl input decompression (per tile, default half # cpu threads, or HW+8)",
"help_text": "Type: `integer`, multiple: `False`, example: `1`. "
}
}
},
"run arguments": {
"title": "Run arguments",
"type": "object",
"description": "No description",
"properties": {
"bcl_only_matched_reads": {
"type": "boolean",
"description": "For pure BCL conversion, do not output files for 'Undetermined' [unmatched] reads (output by default)",
"help_text": "Type: `boolean`, multiple: `False`, example: `true`. "
},
"no_lane_splitting": {
"type": "boolean",
"description": "Do not split FASTQ file by lane (false by default)",
"help_text": "Type: `boolean`, multiple: `False`, example: `true`. "
},
"num_unknown_barcodes_reported": {
"type": "integer",
"description": "\\# of Top Unknown Barcodes to output (1000 by default)",
"help_text": "Type: `integer`, multiple: `False`, example: `1000`. "
},
"bcl_validate_sample_sheet_only": {
"type": "boolean",
"description": "Only validate RunInfo.xml & SampleSheet files (produce no FASTQ files)",
"help_text": "Type: `boolean`, multiple: `False`, example: `true`. "
},
"strict_mode": {
"type": "boolean",
"description": "Abort if any files are missing (false by default)",
"help_text": "Type: `boolean`, multiple: `False`, example: `true`. "
},
"sample_name_column_enabled": {
"type": "boolean",
"description": "Use sample sheet 'Sample_Name' column when naming fastq files & subdirectories",
"help_text": "Type: `boolean`, multiple: `False`, example: `true`. "
}
}
},
"output arguments": {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output_directory": {
"type": "string",
"format": "path",
"description": "Output directory containing fastq files",
"help_text": "Type: `file`, multiple: `False`, required, default: `\"$id.$key.output_directory\"`, direction: `output`, example: `\"fastq_dir\"`. ",
"default": "$id.$key.output_directory"
},
"bcl_sampleproject_subdirectories": {
"type": "boolean",
"description": "Output to subdirectories based upon sample sheet 'Sample_Project' column",
"help_text": "Type: `boolean`, multiple: `False`, example: `true`. "
},
"fastq_gzip_compression_level": {
"type": "integer",
"description": "Set fastq output compression level 0-9 (default 1)",
"help_text": "Type: `integer`, multiple: `False`, example: `1`. "
},
"reports": {
"type": "string",
"format": "path",
"description": "Reports directory",
"help_text": "Type: `file`, multiple: `False`, default: `\"$id.$key.reports\"`, direction: `output`, example: `\"reports_dir\"`. ",
"default": "$id.$key.reports"
},
"logs": {
"type": "string",
"format": "path",
"description": "Logs directory",
"help_text": "Type: `file`, multiple: `False`, default: `\"$id.$key.logs\"`, direction: `output`, example: `\"logs_dir\"`. ",
"default": "$id.$key.logs"
},
"force": {
"type": "boolean",
"description": "Allow destination directory to already exist and overwrite files.\n",
"help_text": "Type: `boolean`, multiple: `False`, example: `true`. "
}
}
},
"nextflow input-output arguments": {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type": "string",
"description": "Path to an output directory.",
"help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
}
}
}
},
"allOf": [
{
"$ref": "#/$defs/input arguments"
},
{
"$ref": "#/$defs/lane and tile settings"
},
{
"$ref": "#/$defs/resource arguments"
},
{
"$ref": "#/$defs/run arguments"
},
{
"$ref": "#/$defs/output arguments"
},
{
"$ref": "#/$defs/nextflow input-output arguments"
}
]
}
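Each argument group of the schema above lives under `$defs`, and the top-level `allOf` pulls the groups back in through local `$ref` pointers such as `#/$defs/input arguments`. The following minimal Python sketch uses only the standard library. It works on a trimmed, hypothetical subset of the schema, follows those pointers, and collects every parameter together with any default it declares.

```python
import json
from urllib.parse import unquote

# Trimmed-down copy of the bcl_convert schema above (two groups, four properties).
SCHEMA = json.loads(r'''
{
  "$defs": {
    "input arguments": {
      "properties": {
        "bcl_input_directory": {"type": "string", "format": "path"},
        "sample_sheet": {"type": "string", "format": "path"}
      }
    },
    "output arguments": {
      "properties": {
        "output_directory": {"type": "string", "format": "path",
                             "default": "$id.$key.output_directory"},
        "fastq_gzip_compression_level": {"type": "integer"}
      }
    }
  },
  "allOf": [
    {"$ref": "#/$defs/input arguments"},
    {"$ref": "#/$defs/output arguments"}
  ]
}
''')


def resolve_ref(schema, ref):
    """Resolve a local reference such as '#/$defs/input arguments'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local refs are supported: {ref}")
    node = schema
    for token in ref[2:].split("/"):
        # Undo URI percent-encoding, then JSON Pointer escaping (RFC 6901).
        token = unquote(token).replace("~1", "/").replace("~0", "~")
        node = node[token]
    return node


def flatten_properties(schema):
    """Merge the properties of every group referenced from allOf."""
    merged = {}
    for entry in schema.get("allOf", []):
        merged.update(resolve_ref(schema, entry["$ref"]).get("properties", {}))
    return merged


props = flatten_properties(SCHEMA)
defaults = {name: p["default"] for name, p in props.items() if "default" in p}
print(sorted(props))
print(defaults)  # {'output_directory': '$id.$key.output_directory'}
```

The group names contain spaces, and the `$ref` values use them verbatim. The sketch percent-decodes each pointer token first, so `input arguments` and `input%20arguments` both resolve to the same group.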

@@ -0,0 +1,388 @@
name: "fastqc"
version: "v0.4.0"
authors:
- name: "Theodoro Gasperin Terra Camargo"
roles:
- "author"
- "maintainer"
info:
links:
email: "theodorogtc@gmail.com"
github: "tgaspe"
linkedin: "theodoro-gasperin-terra-camargo"
argument_groups:
- name: "Inputs"
arguments:
- type: "file"
name: "--input"
description: "FASTQ file(s) to be analyzed.\n"
info: null
example:
- "input.fq"
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: true
multiple_sep: ";"
- name: "Outputs"
description: "At least one of the output options (--html, --zip, --summary, --data)\
\ must be used.\n"
arguments:
- type: "file"
name: "--outdir"
description: "Output directory where the results will be saved.\n"
info: null
example:
- "results"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--html"
description: "Create the HTML report of the results. \n'*' wild card must be provided\
\ in the output file name. \nWild card will be replaced by the input file basename.\n\
e.g. \n --input \"sample_1.fq\"\n --html \"*.html\"\n would create an output\
\ html file named sample_1.html\n"
info: null
example:
- "*.html"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--zip"
description: "Create the zip file(s) containing: html report, data, images, icons,\
\ summary, etc.\n'*' wild card must be provided in the output file name.\nWild\
\ card will be replaced by the input basename.\ne.g. \n --input \"sample_1.fq\"\
\n --zip \"*.zip\"\n would create an output zip file named sample_1.zip\n"
info: null
example:
- "*.zip"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--summary"
description: "Create the summary file(s).\n'*' wild card must be provided in the\
\ output file name.\nWild card will be replaced by the input basename.\ne.g.\
\ \n --input \"sample_1.fq\"\n --summary \"*_summary.txt\"\n would create\
\ an output summary.txt file named sample_1_summary.txt\n"
info: null
example:
- "*_summary.txt"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--data"
description: "Create the data file(s).\n'*' wild card must be provided in the\
\ output file name.\nWild card will be replaced by the input basename.\ne.g.\
\ \n --input \"sample_1.fq\"\n --data \"*_data.txt\"\n would create an\
\ output data.txt file named sample_1_data.txt\n"
info: null
example:
- "*_data.txt"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: true
multiple_sep: ";"
- name: "Options"
arguments:
- type: "boolean_true"
name: "--casava"
description: "Files come from raw casava output. Files in the same sample\ngroup\
\ (differing only by the group number) will be analysed\nas a set rather than\
\ individually. Sequences with the filter\nflag set in the header will be excluded\
\ from the analysis.\nFiles must have the same names given to them by casava\n\
(including being gzipped and ending with .gz) otherwise they\nwon't be grouped\
\ together correctly.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--nano"
description: "Files come from nanopore sequences and are in fast5 format. In\n\
this mode you can pass in directories to process and the program\nwill take\
\ in all fast5 files within those directories and produce\na single output file\
\ from the sequences found in all files.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--nofilter"
description: "If running with --casava then don't remove reads flagged by\ncasava\
\ as poor quality when performing the QC analysis.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--nogroup"
description: "Disable grouping of bases for reads >50bp. \nAll reports will show\
\ data for every base in the read. \nWARNING: Using this option will cause fastqc\
\ to crash \nand burn if you use it on really long reads, and your \nplots may\
\ end up a ridiculous size. You have been warned!\n"
info: null
direction: "input"
- type: "integer"
name: "--min_length"
description: "Sets an artificial lower limit on the length of the \nsequence to\
\ be shown in the report. As long as you \nset this to a value greater than or equal\
\ to your longest \nread length then this will be the sequence length used \n\
to create your read groups. This can be useful for making\ndirectly comparable\
\ statistics from datasets with somewhat \nvariable read lengths.\n"
info: null
example:
- 0
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--format"
alternatives:
- "-f"
description: "Bypasses the normal sequence file format detection and \nforces\
\ the program to use the specified format. \nValid formats are bam, sam, bam_mapped,\
\ sam_mapped, and fastq.\n"
info: null
example:
- "bam"
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--contaminants"
alternatives:
- "-c"
description: "Specifies a non-default file which contains the list \nof contaminants\
\ to screen overrepresented sequences against. \nThe file must contain sets\
\ of named contaminants in the form\nname[tab]sequence. Lines prefixed with\
\ a hash will be ignored.\n"
info: null
example:
- "contaminants.txt"
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--adapters"
alternatives:
- "-a"
description: "Specifies a non-default file which contains the list of \nadapter\
\ sequences which will be explicitly searched against \nthe library. The file\
\ must contain sets of named adapters \nin the form name[tab]sequence. Lines\
\ prefixed with a hash will be ignored.\n"
info: null
example:
- "adapters.txt"
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--limits"
alternatives:
- "-l"
description: "Specifies a non-default file which contains \na set of criteria\
\ which will be used to determine \nthe warn/error limits for the various modules.\
\ \nThis file can also be used to selectively remove \nsome modules from the\
\ output altogether. The format \nneeds to mirror the default limits.txt file\
\ found in \nthe Configuration folder.\n"
info: null
example:
- "limits.txt"
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--kmers"
alternatives:
- "-k"
description: "Specifies the length of Kmer to look for in the Kmer \ncontent module.\
\ Specified Kmer length must be between \n2 and 10. Default length is 7 if not\
\ specified.\n"
info: null
example:
- 7
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--quiet"
alternatives:
- "-q"
description: "Suppress all progress messages on stdout and only report errors.\n"
info: null
direction: "input"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
description: "FastQC - A high throughput sequence QC analysis tool."
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
keywords:
- "Quality control"
- "BAM"
- "SAM"
- "FASTQ"
license: "GPL-3.0, Apache-2.0"
links:
repository: "https://github.com/s-andrews/FastQC"
homepage: "https://www.bioinformatics.babraham.ac.uk/projects/fastqc/"
documentation: "https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/"
issue_tracker: "https://github.com/s-andrews/FastQC/issues"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "biocontainers/fastqc:v0.11.9_cv8"
target_registry: "images.viash-hub.com"
target_tag: "v0.4.0"
namespace_separator: "/"
setup:
- type: "docker"
run:
- "echo \"fastqc: $(fastqc --version | sed -n 's/^FastQC //p')\" > /var/software_versions.txt\n"
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/fastqc/config.vsh.yaml"
runner: "nextflow"
engine: "docker|native"
output: "target/nextflow/fastqc"
executable: "target/nextflow/fastqc/main.nf"
viash_version: "0.9.4"
git_commit: "666507c86de9150bfbdffdb2dbabc1dbde3c3262"
git_remote: "https://github.com/viash-hub/biobox"
package_config:
name: "biobox"
version: "v0.4.0"
summary: "A curated collection of high-quality, standalone bioinformatics components\
\ built with [Viash](https://viash.io).\n"
description: "`biobox` offers a suite of reliable bioinformatics components, similar\
\ to [nf-core/modules](https://github.com/nf-core/modules) and [snakemake-wrappers/bio](https://github.com/snakemake/snakemake-wrappers/tree/master/bio),\
\ but built using the [Viash](https://viash.io) framework.\n\nThis approach emphasizes\
\ **reusability**, **reproducibility**, and adherence to **best practices**. Key\
\ features of `biobox` components include:\n\n* **Standalone & Nextflow Ready:**\
\ Run components directly via the command line or seamlessly integrate them into\
\ Nextflow workflows.\n* **High Quality Standards:**\n * Comprehensive documentation\
\ for components and parameters.\n * Full exposure of underlying tool arguments.\n\
\ * Containerized (Docker) for dependency management and reproducibility.\n\
\ * Unit tested for verified functionality.\n"
info: null
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.4.0'"
keywords:
- "bioinformatics"
- "modules"
- "sequencing"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/biobox"
issue_tracker: "https://github.com/viash-hub/biobox/issues"
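The `--html`, `--zip`, `--summary` and `--data` descriptions say that the `*` wildcard is replaced by the input file basename, and `--input` accepts several files joined by the `;` `multiple_sep`. The Python sketch below shows that naming rule. The helper names are hypothetical, and the list of stripped suffixes is an assumption. The component's `script.sh` may derive the basename differently.

```python
from pathlib import PurePosixPath

# Suffixes stripped to obtain the "input basename". This list is an assumption
# made for the example; the component's script.sh may handle names differently.
KNOWN_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq", ".bam", ".sam")


def input_basename(path):
    """Return the file name without directory and without a known suffix."""
    name = PurePosixPath(path).name
    for suffix in KNOWN_SUFFIXES:  # longer suffixes are checked first
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def expand_outputs(pattern, inputs):
    """Replace '*' in an output pattern once per input file.

    `inputs` may be a list or a single string using the ';' multiple_sep.
    """
    if "*" not in pattern:
        raise ValueError(f"output pattern {pattern!r} must contain a '*' wildcard")
    if isinstance(inputs, str):
        inputs = [p for p in inputs.split(";") if p]
    return [pattern.replace("*", input_basename(p)) for p in inputs]


print(expand_outputs("*.html", "sample_1.fq"))
# ['sample_1.html']
print(expand_outputs("*_summary.txt", "data/sample_1.fq;data/sample_2.fastq.gz"))
# ['sample_1_summary.txt', 'sample_2_summary.txt']
```

Requiring the wildcard keeps the output names distinct when several inputs are passed in a single call.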

File diff suppressed because it is too large

@@ -0,0 +1,126 @@
manifest {
name = 'fastqc'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.4.0'
description = 'FastQC - A high throughput sequence QC analysis tool.'
author = 'Theodoro Gasperin Terra Camargo'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
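The `tempDir` expression at the top of this config takes the first of `NXF_TEMP`, `VIASH_TEMP`, `TEMPDIR` and `TMPDIR` that is set and falls back to `/tmp`. The `mount_temp` profile then hands that directory to Docker, Podman and Charliecloud through their `temp` setting. The Python sketch below applies the same precedence. The function name is hypothetical.

```python
import os
from pathlib import Path

# Checked in the same order as the Elvis (?:) chain in the config above.
TEMP_ENV_VARS = ("NXF_TEMP", "VIASH_TEMP", "TEMPDIR", "TMPDIR")


def detect_temp_dir(environ=None):
    """Return the temp directory the config would pick for a given environment."""
    env = os.environ if environ is None else environ
    for var in TEMP_ENV_VARS:
        value = env.get(var)
        # Groovy's ?: skips null and empty strings, so unset and "" both fall through.
        if value:
            return Path(value).absolute()
    return Path("/tmp").absolute()


print(detect_temp_dir({"TMPDIR": "/scratch/tmp", "VIASH_TEMP": "/data/viash_tmp"}))
# /data/viash_tmp
```

In practice, exporting `NXF_TEMP` is enough to override a cluster-wide `TMPDIR` for Nextflow runs only.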

@@ -0,0 +1,175 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "fastqc",
"description": "FastQC - A high throughput sequence QC analysis tool.",
"type": "object",
"$defs": {
"inputs": {
"title": "Inputs",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"exists": true,
"description": "FASTQ file(s) to be analyzed.\n",
"help_text": "Type: `file`, multiple: `True`, required, direction: `input`, example: `[\"input.fq\"]`. "
}
}
},
"outputs": {
"title": "Outputs",
"type": "object",
"description": "At least one of the output options (--html, --zip, --summary, --data) must be used.\n",
"properties": {
"outdir": {
"type": "string",
"format": "path",
"description": "Output directory where the results will be saved.\n",
"help_text": "Type: `file`, multiple: `False`, default: `\"$id.$key.outdir\"`, direction: `output`, example: `\"results\"`. ",
"default": "$id.$key.outdir"
},
"html": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"description": "Create the HTML report of the results",
"help_text": "Type: `file`, multiple: `True`, default: `\"$id.$key.html_*.html\"`, direction: `output`, example: `[\"*.html\"]`. ",
"default": "$id.$key.html_*.html"
},
"zip": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"description": "Create the zip file(s) containing: html report, data, images, icons, summary, etc.\n'*' wild card must be provided in the output file name.\nWild card will be replaced by the input basename.",
"help_text": "Type: `file`, multiple: `True`, default: `\"$id.$key.zip_*.zip\"`, direction: `output`, example: `[\"*.zip\"]`. ",
"default": "$id.$key.zip_*.zip"
},
"summary": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"description": "Create the summary file(s).\n'*' wild card must be provided in the output file name.\nWild card will be replaced by the input basename.",
"help_text": "Type: `file`, multiple: `True`, default: `\"$id.$key.summary_*.txt\"`, direction: `output`, example: `[\"*_summary.txt\"]`. ",
"default": "$id.$key.summary_*.txt"
},
"data": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"description": "Create the data file(s).\n'*' wild card must be provided in the output file name.\nWild card will be replaced by the input basename.",
"help_text": "Type: `file`, multiple: `True`, default: `\"$id.$key.data_*.txt\"`, direction: `output`, example: `[\"*_data.txt\"]`. ",
"default": "$id.$key.data_*.txt"
}
}
},
"options": {
"title": "Options",
"type": "object",
"description": "No description",
"properties": {
"casava": {
"type": "boolean",
"description": "Files come from raw casava output",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"nano": {
"type": "boolean",
"description": "Files come from nanopore sequences and are in fast5 format",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"nofilter": {
"type": "boolean",
"description": "If running with --casava then don't remove reads flagged by\ncasava as poor quality when performing the QC analysis.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"nogroup": {
"type": "boolean",
"description": "Disable grouping of bases for reads >50bp",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"min_length": {
"type": "integer",
"description": "Sets an artificial lower limit on the length of the \nsequence to be shown in the report",
"help_text": "Type: `integer`, multiple: `False`, example: `0`. "
},
"format": {
"type": "string",
"description": "Bypasses the normal sequence file format detection and \nforces the program to use the specified format",
"help_text": "Type: `string`, multiple: `False`, example: `\"bam\"`. "
},
"contaminants": {
"type": "string",
"format": "path",
"description": "Specifies a non-default file which contains the list \nof contaminants to screen overrepresented sequences against",
"help_text": "Type: `file`, multiple: `False`, direction: `input`, example: `\"contaminants.txt\"`. "
},
"adapters": {
"type": "string",
"format": "path",
"description": "Specifies a non-default file which contains the list of \nadapter sequences which will be explicitly searched against \nthe library",
"help_text": "Type: `file`, multiple: `False`, direction: `input`, example: `\"adapters.txt\"`. "
},
"limits": {
"type": "string",
"format": "path",
"description": "Specifies a non-default file which contains \na set of criteria which will be used to determine \nthe warn/error limits for the various modules",
"help_text": "Type: `file`, multiple: `False`, direction: `input`, example: `\"limits.txt\"`. "
},
"kmers": {
"type": "integer",
"description": "Specifies the length of Kmer to look for in the Kmer \ncontent module",
"help_text": "Type: `integer`, multiple: `False`, example: `7`. "
},
"quiet": {
"type": "boolean",
"description": "Suppress all progress messages on stdout and only report errors.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
}
}
},
"nextflow input-output arguments": {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type": "string",
"description": "Path to an output directory.",
"help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
}
}
}
},
"allOf": [
{
"$ref": "#/$defs/inputs"
},
{
"$ref": "#/$defs/outputs"
},
{
"$ref": "#/$defs/options"
},
{
"$ref": "#/$defs/nextflow input-output arguments"
}
]
}
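In this schema the file outputs have templated defaults, such as `$id.$key.outdir` and `$id.$key.html_*.html`. Assume that `$id` stands for the event identifier and `$key` for the component key, both filled in by the Viash Nextflow runner at run time rather than by the schema itself. Under that assumption, Python's `string.Template` can illustrate the substitution. The helper name and example values are hypothetical.

```python
from string import Template

# Default output paths copied from the fastqc schema above.
DEFAULTS = {
    "outdir": "$id.$key.outdir",
    "html": "$id.$key.html_*.html",
    "zip": "$id.$key.zip_*.zip",
    "summary": "$id.$key.summary_*.txt",
    "data": "$id.$key.data_*.txt",
}


def resolve_defaults(event_id, key, defaults=DEFAULTS):
    """Fill in $id and $key; the '*' wildcard is left for the component itself."""
    return {
        name: Template(pattern).substitute(id=event_id, key=key)
        for name, pattern in defaults.items()
    }


paths = resolve_defaults("sample_1", "fastqc")
print(paths["outdir"])  # sample_1.fastqc.outdir
print(paths["html"])    # sample_1.fastqc.html_*.html
```

`Template` stops an identifier at the first `.`, so `$id.$key` expands to two separate values joined by a dot.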

@@ -0,0 +1,496 @@
name: "multiqc"
version: "v0.4.0"
authors:
- name: "Dorien Roosen"
roles:
- "author"
- "maintainer"
info:
links:
email: "dorien@data-intuitive.com"
github: "dorien-er"
linkedin: "dorien-roosen"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist"
argument_groups:
- name: "Input"
arguments:
- type: "file"
name: "--input"
description: "File paths to be searched for analysis results to be included in\
\ the report.\n"
info: null
example:
- "data/results"
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: true
multiple_sep: ";"
- name: "Output"
arguments:
- type: "file"
name: "--output_report"
description: "Filepath of the generated report.\n"
info: null
example:
- "multiqc_report.html"
must_exist: false
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_data"
description: "Output directory for parsed data files. If not provided, parsed\
\ data will not be published.\n"
info: null
example:
- "multiqc_data"
must_exist: false
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_plots"
description: "Output directory for generated plots. If not provided, plots will\
\ not be published.\n"
info: null
example:
- "multiqc_plots"
must_exist: false
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- name: "Modules and analyses to run"
arguments:
- type: "string"
name: "--include_modules"
description: "Use only these modules"
info: null
example:
- "fastqc"
- "cutadapt"
required: false
direction: "input"
multiple: true
multiple_sep: ";"
- type: "string"
name: "--exclude_modules"
description: "Do not use these modules"
info: null
example:
- "fastqc"
- "cutadapt"
required: false
direction: "input"
multiple: true
multiple_sep: ";"
- type: "string"
name: "--ignore_analysis"
info: null
example:
- "run_one/*"
- "run_two/*"
required: false
direction: "input"
multiple: true
multiple_sep: ";"
- type: "string"
name: "--ignore_samples"
info: null
example:
- "sample_1*"
- "sample_3*"
required: false
direction: "input"
multiple: true
multiple_sep: ";"
- type: "boolean_true"
name: "--ignore_symlinks"
description: "Ignore symlinked directories and files"
info: null
direction: "input"
- name: "Sample name handling"
arguments:
- type: "boolean_true"
name: "--dirs"
description: "Prepend directory to sample names to avoid clashing filenames"
info: null
direction: "input"
- type: "integer"
name: "--dirs_depth"
description: "Prepend n directories to sample names. Negative number to take from\
\ start of path."
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--full_names"
description: "Do not clean the sample names (leave as full file name)"
info: null
direction: "input"
- type: "boolean_true"
name: "--fn_as_s_name"
description: "Use the log filename as the sample name"
info: null
direction: "input"
- type: "file"
name: "--replace_names"
description: "TSV file to rename sample names during report generation"
info: null
example:
- "replace_names.tsv"
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Report Customisation"
arguments:
- type: "string"
name: "--title"
description: "Report title. Printed as page header, used for filename if not otherwise\
\ specified.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--comment"
description: "Custom comment, will be printed at the top of the report.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--template"
description: "Report template to use.\n"
info: null
required: false
choices:
- "default"
- "gathered"
- "geo"
- "highcharts"
- "sections"
- "simple"
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--sample_names"
description: "TSV file containing alternative sample names for renaming buttons\
\ in the report.\n"
info: null
example:
- "sample_names.tsv"
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--sample_filters"
description: "TSV file containing show/hide patterns for the report\n"
info: null
example:
- "sample_filters.tsv"
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--custom_css_file"
description: "Custom CSS file to add to the final report\n"
info: null
example:
- "custom_style_sheet.css"
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--profile_runtime"
description: "Add an analysis of MultiQC's own run time to the report\n"
info: null
direction: "input"
- name: "MultiQC behaviour"
arguments:
- type: "boolean_true"
name: "--verbose"
description: "Increase output verbosity.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--quiet"
description: "Only show log warnings\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--strict"
description: "Don't catch exceptions, run additional code checks to help development.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--development"
description: "Development mode. Do not compress and minimise JS, export uncompressed\
\ plot data.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--require_logs"
description: "Require all explicitly requested modules to have log files. If not,\
\ MultiQC will exit with an error.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--no_megaqc_upload"
description: "Don't upload generated report to MegaQC, even if MegaQC options\
\ are found.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--no_ansi"
description: "Disable coloured log output.\n"
info: null
direction: "input"
- type: "string"
name: "--cl_config"
description: "YAML-formatted string for customizing MultiQC behaviour, such as input\
\ file detection.\n"
info: null
example:
- "qualimap_config: { general_stats_coverage: [20,40,200] }"
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output format"
arguments:
- type: "boolean_true"
name: "--flat"
description: "Use only flat plots (static images).\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--interactive"
description: "Use only interactive plots (in-browser Javascript).\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--data_dir"
description: "Force the parsed data directory to be created.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--no_data_dir"
description: "Prevent the parsed data directory from being created.\n"
info: null
direction: "input"
- type: "boolean_true"
name: "--zip_data_dir"
description: "Compress the data directory.\n"
info: null
direction: "input"
- type: "string"
name: "--data_format"
description: "Output parsed data in a different format than the default 'txt'.\n"
info: null
required: false
choices:
- "tsv"
- "csv"
- "json"
- "yaml"
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--pdf"
description: "Creates PDF report with the 'simple' template. Requires Pandoc to\
\ be installed.\n"
info: null
direction: "input"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
description: "MultiQC aggregates results from bioinformatics analyses across many\
\ samples into a single report.\nIt searches a given directory for analysis logs\
\ and compiles an HTML report. It's a general use tool, perfect for summarising the\
\ output from numerous bioinformatics tools.\n"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
- type: "file"
path: "test_data"
info:
keywords:
- "QC"
- "html report"
- "aggregate analysis"
links:
homepage: "https://multiqc.info/"
documentation: "https://multiqc.info/docs/"
repository: "https://github.com/MultiQC/MultiQC"
references:
doi: "10.1093/bioinformatics/btw354"
licence: "GPL v3 or later"
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/biobox"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "quay.io/biocontainers/multiqc:1.21--pyhdfd78af_0"
target_registry: "images.viash-hub.com"
target_tag: "v0.4.0"
namespace_separator: "/"
setup:
- type: "docker"
run:
- "multiqc --version | sed 's/multiqc, version\\s\\(.*\\)/multiqc: \"\\1\"/' >\
\ /var/software_versions.txt\n"
test_setup:
- type: "apt"
packages:
- "jq"
interactive: false
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/multiqc/config.vsh.yaml"
runner: "nextflow"
engine: "docker|native"
output: "target/nextflow/multiqc"
executable: "target/nextflow/multiqc/main.nf"
viash_version: "0.9.4"
git_commit: "666507c86de9150bfbdffdb2dbabc1dbde3c3262"
git_remote: "https://github.com/viash-hub/biobox"
package_config:
name: "biobox"
version: "v0.4.0"
summary: "A curated collection of high-quality, standalone bioinformatics components\
\ built with [Viash](https://viash.io).\n"
description: "`biobox` offers a suite of reliable bioinformatics components, similar\
\ to [nf-core/modules](https://github.com/nf-core/modules) and [snakemake-wrappers/bio](https://github.com/snakemake/snakemake-wrappers/tree/master/bio),\
\ but built using the [Viash](https://viash.io) framework.\n\nThis approach emphasizes\
\ **reusability**, **reproducibility**, and adherence to **best practices**. Key\
\ features of `biobox` components include:\n\n* **Standalone & Nextflow Ready:**\
\ Run components directly via the command line or seamlessly integrate them into\
\ Nextflow workflows.\n* **High Quality Standards:**\n * Comprehensive documentation\
\ for components and parameters.\n * Full exposure of underlying tool arguments.\n\
\ * Containerized (Docker) for dependency management and reproducibility.\n\
\ * Unit tested for verified functionality.\n"
info: null
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.4.0'"
keywords:
- "bioinformatics"
- "modules"
- "sequencing"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/biobox"
issue_tracker: "https://github.com/viash-hub/biobox/issues"
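The `labels` map in this config writes every memory size as a raw byte count. The `gb`/`tb` labels use decimal (SI) multiples of 10^9 and 10^12 bytes, and the `gib`/`tib` labels use binary (IEC) multiples of 2^30 and 2^40 bytes. The Python below is a minimal sketch that recomputes those values. It assumes only the `mem<N><unit>` naming pattern visible in this file. `label_bytes` and `directive` are hypothetical helpers and are not part of Viash or biobox.

```python
import re

# Decimal (SI) and binary (IEC) multipliers used by the memory label names.
UNITS = {
    "gb": 10**9,
    "tb": 10**12,
    "gib": 2**30,
    "tib": 2**40,
}

# Hypothetical pattern matching labels such as "mem16gib" or "mem5tb".
LABEL_RE = re.compile(r"^mem(\d+)(gb|tb|gib|tib)$")


def label_bytes(label: str) -> int:
    """Return the number of bytes encoded by a memory label name."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not a memory label: {label!r}")
    amount, unit = match.groups()
    return int(amount) * UNITS[unit]


def directive(label: str) -> str:
    """Render the directive string in the same form as the labels map."""
    return f"memory = {label_bytes(label)}.B"


if __name__ == "__main__":
    for name in ("mem1gb", "mem16gib", "mem5tb", "mem512tib"):
        print(name, directive(name))
```

Comparing the printed strings with the corresponding entries above is a quick way to spot a typo in a hand-edited label.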

File diff suppressed because it is too large


@@ -0,0 +1,126 @@
manifest {
name = 'multiqc'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.4.0'
description = 'MultiQC aggregates results from bioinformatics analyses across many samples into a single report.\nIt searches a given directory for analysis logs and compiles an HTML report. It\'s a general use tool, perfect for summarising the output from numerous bioinformatics tools.\n'
author = 'Dorien Roosen'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
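The `tempDir` expression at the top of this config picks the first of `NXF_TEMP`, `VIASH_TEMP`, `TEMPDIR` and `TMPDIR` that holds a non-empty value. If none of them does, it falls back to `/tmp`. The result is then made absolute, and the `mount_temp` profile passes that directory to Docker, Podman and Charliecloud. The Python below is a rough equivalent of the lookup only. `resolve_temp_dir` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Variables checked in order, mirroring the Groovy Elvis (?:) chain.
TEMP_VARS = ("NXF_TEMP", "VIASH_TEMP", "TEMPDIR", "TMPDIR")


def resolve_temp_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the first non-empty temp-dir variable (else /tmp) as an absolute path."""
    env = os.environ if env is None else env
    for var in TEMP_VARS:
        value = env.get(var)
        if value:  # like Groovy's ?:, an empty string counts as unset
            return Path(value).absolute()
    return Path("/tmp").absolute()


if __name__ == "__main__":
    print(resolve_temp_dir())
```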


@@ -0,0 +1,334 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "multiqc",
"description": "MultiQC aggregates results from bioinformatics analyses across many samples into a single report.\nIt searches a given directory for analysis logs and compiles an HTML report. It's a general use tool, perfect for summarising the output from numerous bioinformatics tools.\n",
"type": "object",
"$defs": {
"input": {
"title": "Input",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"exists": true,
"description": "File paths to be searched for analysis results to be included in the report.\n",
"help_text": "Type: `file`, multiple: `True`, required, direction: `input`, example: `[\"data/results\"]`. "
}
}
},
"ouput": {
"title": "Output",
"type": "object",
"description": "No description",
"properties": {
"output_report": {
"type": "string",
"format": "path",
"description": "Filepath of the generated report.\n",
"help_text": "Type: `file`, multiple: `False`, default: `\"$id.$key.output_report.html\"`, direction: `output`, example: `\"multiqc_report.html\"`. ",
"default": "$id.$key.output_report.html"
},
"output_data": {
"type": "string",
"format": "path",
"description": "Output directory for parsed data files",
"help_text": "Type: `file`, multiple: `False`, default: `\"$id.$key.output_data\"`, direction: `output`, example: `\"multiqc_data\"`. ",
"default": "$id.$key.output_data"
},
"output_plots": {
"type": "string",
"format": "path",
"description": "Output directory for generated plots",
"help_text": "Type: `file`, multiple: `False`, default: `\"$id.$key.output_plots\"`, direction: `output`, example: `\"multiqc_plots\"`. ",
"default": "$id.$key.output_plots"
}
}
},
"modules and analyses to run": {
"title": "Modules and analyses to run",
"type": "object",
"description": "No description",
"properties": {
"include_modules": {
"type": "array",
"items": {
"type": "string"
},
"description": "Use only these modules",
"help_text": "Type: `string`, multiple: `True`, example: `[\"fastqc\";\"cutadapt\"]`. "
},
"exclude_modules": {
"type": "array",
"items": {
"type": "string"
},
"description": "Do not use these modules",
"help_text": "Type: `string`, multiple: `True`, example: `[\"fastqc\";\"cutadapt\"]`. "
},
"ignore_analysis": {
"type": "array",
"items": {
"type": "string"
},
"description": "Ignore analysis files matching these glob patterns",
"help_text": "Type: `string`, multiple: `True`, example: `[\"run_one/*\";\"run_two/*\"]`. "
},
"ignore_samples": {
"type": "array",
"items": {
"type": "string"
},
"description": "Ignore sample names matching these glob patterns",
"help_text": "Type: `string`, multiple: `True`, example: `[\"sample_1*\";\"sample_3*\"]`. "
},
"ignore_symlinks": {
"type": "boolean",
"description": "Ignore symlinked directories and files",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
}
}
},
"sample name handling": {
"title": "Sample name handling",
"type": "object",
"description": "No description",
"properties": {
"dirs": {
"type": "boolean",
"description": "Prepend directory to sample names to avoid clashing filenames",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"dirs_depth": {
"type": "integer",
"description": "Prepend n directories to sample names",
"help_text": "Type: `integer`, multiple: `False`. "
},
"full_names": {
"type": "boolean",
"description": "Do not clean the sample names (leave as full file name)",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"fn_as_s_name": {
"type": "boolean",
"description": "Use the log filename as the sample name",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"replace_names": {
"type": "string",
"format": "path",
"description": "TSV file to rename sample names during report generation",
"help_text": "Type: `file`, multiple: `False`, direction: `input`, example: `\"replace_names.tsv\"`. "
}
}
},
"report customisation": {
"title": "Report Customisation",
"type": "object",
"description": "No description",
"properties": {
"title": {
"type": "string",
"description": "Report title",
"help_text": "Type: `string`, multiple: `False`. "
},
"comment": {
"type": "string",
"description": "Custom comment, will be printed at the top of the report.\n",
"help_text": "Type: `string`, multiple: `False`. "
},
"template": {
"type": "string",
"description": "Report template to use.\n",
"help_text": "Type: `string`, multiple: `False`, choices: `default`, `gathered`, `geo`, `highcharts`, `sections`, `simple`. ",
"enum": [
"default",
"gathered",
"geo",
"highcharts",
"sections",
"simple"
]
},
"sample_names": {
"type": "string",
"format": "path",
"description": "TSV file containing alternative sample names for renaming buttons in the report.\n",
"help_text": "Type: `file`, multiple: `False`, direction: `input`, example: `\"sample_names.tsv\"`. "
},
"sample_filters": {
"type": "string",
"format": "path",
"description": "TSV file containing show/hide patterns for the report\n",
"help_text": "Type: `file`, multiple: `False`, direction: `input`, example: `\"sample_filters.tsv\"`. "
},
"custom_css_file": {
"type": "string",
"format": "path",
"description": "Custom CSS file to add to the final report\n",
"help_text": "Type: `file`, multiple: `False`, direction: `input`, example: `\"custom_style_sheet.css\"`. "
},
"profile_runtime": {
"type": "boolean",
"description": "Add an analysis of MultiQC's own run time to the report\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
}
}
},
"multiqc behaviour": {
"title": "MultiQC behaviour",
"type": "object",
"description": "No description",
"properties": {
"verbose": {
"type": "boolean",
"description": "Increase output verbosity.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"quiet": {
"type": "boolean",
"description": "Only show log warnings\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"strict": {
"type": "boolean",
"description": "Don't catch exceptions, run additional code checks to help development.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"development": {
"type": "boolean",
"description": "Development mode",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"require_logs": {
"type": "boolean",
"description": "Require all explicitly requested modules to have log files",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"no_megaqc_upload": {
"type": "boolean",
"description": "Don't upload generated report to MegaQC, even if MegaQC options are found.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"no_ansi": {
"type": "boolean",
"description": "Disable coloured log output.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"cl_config": {
"type": "string",
"description": "YAML-formatted string for customizing MultiQC behaviour, such as input file detection.\n",
"help_text": "Type: `string`, multiple: `False`, example: `\"qualimap_config: { general_stats_coverage: [20,40,200] }\"`. "
}
}
},
"output format": {
"title": "Output format",
"type": "object",
"description": "No description",
"properties": {
"flat": {
"type": "boolean",
"description": "Use only flat plots (static images).\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"interactive": {
"type": "boolean",
"description": "Use only interactive plots (in-browser Javascript).\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"data_dir": {
"type": "boolean",
"description": "Force the parsed data directory to be created.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"no_data_dir": {
"type": "boolean",
"description": "Prevent the parsed data directory from being created.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"zip_data_dir": {
"type": "boolean",
"description": "Compress the data directory.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
},
"data_format": {
"type": "string",
"description": "Output parsed data in a different format than the default 'txt'.\n",
"help_text": "Type: `string`, multiple: `False`, choices: `tsv`, `csv`, `json`, `yaml`. ",
"enum": [
"tsv",
"csv",
"json",
"yaml"
]
},
"pdf": {
"type": "boolean",
"description": "Creates PDF report with the 'simple' template",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
}
}
},
"nextflow input-output arguments": {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type": "string",
"description": "Path to an output directory.",
"help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
}
}
}
},
"allOf": [
{
"$ref": "#/$defs/input"
},
{
"$ref": "#/$defs/ouput"
},
{
"$ref": "#/$defs/modules and analyses to run"
},
{
"$ref": "#/$defs/sample name handling"
},
{
"$ref": "#/$defs/report customisation"
},
{
"$ref": "#/$defs/multiqc behaviour"
},
{
"$ref": "#/$defs/output format"
},
{
"$ref": "#/$defs/nextflow input-output arguments"
}
]
}
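This schema lists no parameters at the top level. Each argument group lives under `$defs` and is pulled in through an `allOf` entry whose `$ref` points at it. The Python below is a hand-rolled illustration, not a full JSON Schema validator; a dedicated validator library would normally be used instead. It merges the referenced groups and checks only `type` and `enum`. It takes the group name as a plain substring of the `$ref`. That handles names containing spaces, such as `#/$defs/modules and analyses to run`, but it ignores JSON Pointer escaping. `collect_properties`, `matches_type` and `check_params` are hypothetical names.

```python
from typing import Any, Dict, List

DEFS_PREFIX = "#/$defs/"


def collect_properties(schema: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Merge the 'properties' of every $defs group referenced from allOf."""
    merged: Dict[str, Dict[str, Any]] = {}
    for entry in schema.get("allOf", []):
        ref = entry["$ref"]
        if not ref.startswith(DEFS_PREFIX):
            raise ValueError(f"unsupported $ref: {ref}")
        group = schema["$defs"][ref[len(DEFS_PREFIX):]]
        merged.update(group.get("properties", {}))
    return merged


def matches_type(value: Any, json_type: str) -> bool:
    """Check the handful of JSON types used in this schema."""
    if json_type == "boolean":
        return isinstance(value, bool)
    if json_type == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if json_type == "string":
        return isinstance(value, str)
    if json_type == "array":
        return isinstance(value, list)
    return True  # other types are not checked in this sketch


def check_params(schema: Dict[str, Any], params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    props = collect_properties(schema)
    problems: List[str] = []
    for name, value in params.items():
        spec = props.get(name)
        if spec is None:
            problems.append(f"unknown parameter: {name}")
        elif not matches_type(value, spec.get("type", "")):
            problems.append(f"{name}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name}: {value!r} not in {spec['enum']}")
    return problems
```

In practice, `schema` would be the result of calling `json.load` on the schema file.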


@@ -0,0 +1,193 @@
name: "interop_summary_to_csv"
namespace: "io"
version: "update_biobox"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--input"
description: "Sequencing run folder (*not* InterOp folder)."
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--output_run_summary"
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_index_summary"
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
- type: "file"
path: "nextflow_labels.config"
dest: "nextflow_labels.config"
- type: "file"
path: "_viash.yaml"
dest: "_viash.yaml"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
- type: "file"
path: "iseq-DI"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "summary"
- "index-summary"
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/demultiplex"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
script:
- "includeConfig(\"nextflow_labels.config\")"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "debian:stable-slim"
target_registry: "images.viash-hub.com"
target_tag: "update_biobox"
namespace_separator: "/"
setup:
- type: "apt"
packages:
- "procps"
- "wget"
interactive: false
- type: "docker"
run:
- "wget https://github.com/Illumina/interop/releases/download/v1.3.1/interop-1.3.1-Linux-GNU.tar.gz\
\ -O /tmp/interop.tar.gz && \\\ntar -C /tmp/ --no-same-owner --no-same-permissions\
\ -xvf /tmp/interop.tar.gz && \\\nmv /tmp/interop-1.3.1-Linux-GNU/bin/index-summary\
\ /tmp/interop-1.3.1-Linux-GNU/bin/summary /usr/local/bin/\n"
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/io/interop_summary_to_csv/config.vsh.yaml"
runner: "executable"
engine: "docker|native"
output: "target/executable/io/interop_summary_to_csv"
executable: "target/executable/io/interop_summary_to_csv/interop_summary_to_csv"
viash_version: "0.9.4"
git_commit: "4076dacab7abd6104c859c2b7188e592568946e8"
git_remote: "https://github.com/viash-hub/demultiplex"
package_config:
name: "demultiplex"
version: "update_biobox"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-resources/demultiplex/v4"
dest: "testData"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script += 'includeConfig(\"nextflow_labels.config\"\
)'\n.resources += {path: '/_viash.yaml', dest: '_viash.yaml'}\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'update_biobox'"
keywords:
- "bioinformatics"
- "sequence"
- "demultiplexing"
- "pipeline"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/demultiplex"
issue_tracker: "https://github.com/viash-hub/demultiplex/issues"


@@ -0,0 +1,21 @@
name: demultiplex
description: |
Demultiplexing pipeline
license: MIT
keywords: [bioinformatics, sequence, demultiplexing, pipeline]
links:
issue_tracker: https://github.com/viash-hub/demultiplex/issues
repository: https://github.com/viash-hub/demultiplex
info:
test_resources:
- path: gs://viash-hub-resources/demultiplex/v4
dest: testData
viash_version: 0.9.4
config_mods: |
.requirements.commands += ['ps']
.runners[.type == 'nextflow'].directives.tag := '$id'
.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}
.runners[.type == 'nextflow'].config.script += 'includeConfig("nextflow_labels.config")'
.resources += {path: '/_viash.yaml', dest: '_viash.yaml'}
version: update_biobox
organization: vsh
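Viash applies each line of `config_mods` to every component config at build time. This is why `interop_summary_to_csv` above ends up with `ps` in `requirements.commands` and an `includeConfig("nextflow_labels.config")` entry in its Nextflow runner script. The Python below only illustrates the net effect of these five statements on a dict-shaped config. It is not how Viash evaluates config mods, and `apply_package_mods` is a hypothetical name.

```python
import copy
from typing import Any, Dict


def apply_package_mods(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with the package-level config mods applied."""
    cfg = copy.deepcopy(config)

    # .requirements.commands += ['ps']
    reqs = cfg.setdefault("requirements", {})
    reqs["commands"] = reqs.get("commands", []) + ["ps"]

    for runner in cfg.get("runners", []):
        if runner.get("type") == "nextflow":
            # .runners[.type == 'nextflow'].directives.tag := '$id'
            runner.setdefault("directives", {})["tag"] = "$id"
            # .runners[.type == 'nextflow'].config.script += 'includeConfig(...)'
            script = runner.setdefault("config", {}).setdefault("script", [])
            script.append('includeConfig("nextflow_labels.config")')

    # The two `.resources += {...}` statements.
    resources = cfg.setdefault("resources", [])
    resources.append({"path": "/src/config/labels.config", "dest": "nextflow_labels.config"})
    resources.append({"path": "/_viash.yaml", "dest": "_viash.yaml"})
    return cfg
```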

File diff suppressed because it is too large


@@ -0,0 +1,98 @@
process {
container = 'nextflow/bash:latest'
// default resources
memory = { 8.Gb * task.attempt }
cpus = 8
maxForks = 36
// Retry on exit codes 137-140, which are typically caused by memory-related failures
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
maxRetries = 3
maxMemory = 192.GB
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 32 }
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
}
profiles {
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
docker {
docker.fixOwnership = true
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
local {
// This config is for local processing.
process {
maxMemory = 25.GB
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 4 }
withLabel: midcpu { cpus = 6 }
withLabel: highcpu { cpus = 12 }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
}
}
}
def get_memory(to_compare) {
if (!process.containsKey("maxMemory") || !process.maxMemory) {
return to_compare
}
try {
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
return process.maxMemory
}
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
return process.maxMemory as nextflow.util.MemoryUnit
}
else {
return to_compare
}
} catch (all) {
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
System.exit(1)
}
}
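In this config, the retry and memory settings work together. A task that exits with status 137 to 140 is retried, up to `maxRetries` times. Each memory label multiplies its base amount by `task.attempt`. `get_memory` keeps the request at or below `maxMemory`, and it jumps straight to `maxMemory` when the attempt number equals `maxRetries`. The Python below is a minimal model of that logic. It uses plain integers in gigabytes instead of Nextflow's `MemoryUnit`, and its function names are hypothetical.

```python
from typing import Optional


def get_memory(requested_gb: int, attempt: int,
               max_memory_gb: Optional[int], max_retries: Optional[int]) -> int:
    """Cap a memory request the way the get_memory helper in the config does."""
    if not max_memory_gb:
        return requested_gb
    if max_retries and attempt == max_retries:
        # When the attempt number equals maxRetries, use the ceiling directly.
        return max_memory_gb
    return min(requested_gb, max_memory_gb)


def should_retry(exit_status: int) -> bool:
    """Mirror errorStrategy: retry on exit codes 137..140, terminate otherwise."""
    return 137 <= exit_status <= 140


if __name__ == "__main__":
    # highmem label: 64 GB * attempt, with maxMemory = 192 and maxRetries = 3
    for attempt in range(1, 5):
        print(attempt, get_memory(64 * attempt, attempt, 192, 3))
```

With the defaults in this file (`maxMemory = 192.GB`, `maxRetries = 3`), the model shows that a `lowmem` task also requests the full 192 GB on its third attempt, rather than 24 GB.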


@@ -0,0 +1,255 @@
name: "publish"
namespace: "io"
version: "update_biobox"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--input"
description: "Directory containing the FASTQ data to publish"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--input_sample_qc"
description: "Directories containing the sample QC output to publish"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--input_multiqc"
description: "MultiQC report to publish."
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--input_run_information"
description: "Run information file to publish."
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--input_demultiplexer_logs"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--output"
info: null
default:
- "fastq"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_sample_qc"
info: null
default:
- "qc/fastqc"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_multiqc"
info: null
default:
- "qc/multiqc_report.html"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_run_information"
info: null
default:
- "run_information.csv"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_demultiplexer_logs"
info: null
default:
- "demultiplexer_logs"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "code.sh"
is_executable: true
- type: "file"
path: "nextflow_labels.config"
dest: "nextflow_labels.config"
- type: "file"
path: "_viash.yaml"
dest: "_viash.yaml"
description: "Publish the processed results of the run"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/demultiplex"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
script:
- "includeConfig(\"nextflow_labels.config\")"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "debian:stable-slim"
target_registry: "images.viash-hub.com"
target_tag: "update_biobox"
namespace_separator: "/"
setup:
- type: "apt"
packages:
- "procps"
interactive: false
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/io/publish/config.vsh.yaml"
runner: "executable"
engine: "docker|native"
output: "target/executable/io/publish"
executable: "target/executable/io/publish/publish"
viash_version: "0.9.4"
git_commit: "4076dacab7abd6104c859c2b7188e592568946e8"
git_remote: "https://github.com/viash-hub/demultiplex"
package_config:
name: "demultiplex"
version: "update_biobox"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-resources/demultiplex/v4"
dest: "testData"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script += 'includeConfig(\"nextflow_labels.config\"\
)'\n.resources += {path: '/_viash.yaml', dest: '_viash.yaml'}\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'update_biobox'"
keywords:
- "bioinformatics"
- "sequence"
- "demultiplexing"
- "pipeline"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/demultiplex"
issue_tracker: "https://github.com/viash-hub/demultiplex/issues"
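The `mem*` labels in the Nextflow runner config mix two unit systems: names ending in `gb`/`tb` are decimal (`mem20gb` is 20000000000 bytes), while names ending in `gib`/`tib` are binary (`mem16gib` is 17179869184 bytes). As a minimal sketch, the Python below rebuilds the directive strings from a label name. The helpers `label_bytes` and `label_directive` are hypothetical and not part of viash.

```python
# Rebuild the byte counts behind the mem* labels: "gb"/"tb" are decimal
# units (powers of 1000), "gib"/"tib" are binary units (powers of 1024).
UNITS = {"gb": 1000**3, "tb": 1000**4, "gib": 1024**3, "tib": 1024**4}


def label_bytes(label: str) -> int:
    """Return the number of bytes encoded in a label such as 'mem16gib'."""
    if not label.startswith("mem"):
        raise ValueError(f"not a memory label: {label}")
    body = label[len("mem"):]
    for suffix, factor in UNITS.items():
        if body.endswith(suffix):
            return int(body[: -len(suffix)]) * factor
    raise ValueError(f"unknown unit in label: {label}")


def label_directive(label: str) -> str:
    """Render the directive in the same form as the labels config."""
    return f"memory = {label_bytes(label)}.B"


print(label_directive("mem20gb"))   # memory = 20000000000.B
print(label_directive("mem16gib"))  # memory = 17179869184.B
```

Picking a `gib` label rather than the similarly named `gb` label therefore requests about 7% more memory at the 16 GiB scale.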

@@ -0,0 +1,21 @@
name: demultiplex
description: |
Demultiplexing pipeline
license: MIT
keywords: [bioinformatics, sequence, demultiplexing, pipeline]
links:
issue_tracker: https://github.com/viash-hub/demultiplex/issues
repository: https://github.com/viash-hub/demultiplex
info:
test_resources:
- path: gs://viash-hub-resources/demultiplex/v4
dest: testData
viash_version: 0.9.4
config_mods: |
.requirements.commands += ['ps']
.runners[.type == 'nextflow'].directives.tag := '$id'
.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}
.runners[.type == 'nextflow'].config.script += 'includeConfig("nextflow_labels.config")'
.resources += {path: '/_viash.yaml', dest: '_viash.yaml'}
version: update_biobox
organization: vsh

@@ -0,0 +1,98 @@
process {
container = 'nextflow/bash:latest'
// default resources
memory = { 8.Gb * task.attempt }
cpus = 8
maxForks = 36
// Retry for exit codes that have something to do with memory issues
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
maxRetries = 3
maxMemory = 192.GB
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 32 }
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
}
profiles {
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
docker {
docker.fixOwnership = true
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
local {
// This config is for local processing.
process {
maxMemory = 25.GB
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 4 }
withLabel: midcpu { cpus = 6 }
withLabel: highcpu { cpus = 12 }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
}
}
}
def get_memory(to_compare) {
if (!process.containsKey("maxMemory") || !process.maxMemory) {
return to_compare
}
try {
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
return process.maxMemory
}
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
return process.maxMemory as nextflow.util.MemoryUnit
}
else {
return to_compare
}
} catch (all) {
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
System.exit(1)
}
}
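The memory labels scale their request with `task.attempt` and pass it through `get_memory()`. Below is a minimal Python model of that helper, assuming the cap branch is meant to return `process.maxMemory`. Values are plain gigabytes, and the function name and defaults (192 GB cap, 3 retries, copied from the `process` block) are for illustration only.

```python
from typing import Optional


def get_memory(requested_gb: float, attempt: int,
               max_memory_gb: Optional[float] = 192,
               max_retries: Optional[int] = 3) -> float:
    """Model of get_memory(): memory (GB) to request for one task attempt.

    requested_gb is the label value already multiplied by the attempt,
    e.g. 64 * attempt for the 'highmem' label.
    """
    if not max_memory_gb:
        return requested_gb       # no cap configured: use the request as-is
    if max_retries and attempt == max_retries:
        return max_memory_gb      # attempt equal to maxRetries: ask for the full cap
    if requested_gb > max_memory_gb:
        return max_memory_gb      # never exceed the cap
    return requested_gb


for attempt in (1, 2, 3):
    print(attempt, get_memory(64 * attempt, attempt))  # highmem: 64, 128, 192
```

Under the `local` profile (`maxMemory = 25.GB`), a `highmem` task on its second attempt would request 40 GB and be capped at 25 GB.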

File diff suppressed because it is too large

@@ -0,0 +1,192 @@
name: "untar"
namespace: "io"
version: "update_biobox"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--input"
description: "Tarball file to be unpacked."
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--output"
description: "Directory to write the contents of the .tar file to."
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
- name: "Other arguments"
arguments:
- type: "string"
name: "--exclude"
alternatives:
- "-e"
description: "Prevents any file or member whose name matches the shell wildcard\
\ (pattern) from being extracted."
info: null
example:
- "docs/figures"
required: false
direction: "input"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
- type: "file"
path: "nextflow_labels.config"
dest: "nextflow_labels.config"
- type: "file"
path: "_viash.yaml"
dest: "_viash.yaml"
description: "Unpack a .tar file. When the contents of the .tar file are just a single\
\ directory,\nput the contents of that directory into the output folder instead of\
\ the directory itself.\n"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/demultiplex"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
script:
- "includeConfig(\"nextflow_labels.config\")"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "debian:stable-slim"
target_registry: "images.viash-hub.com"
target_tag: "update_biobox"
namespace_separator: "/"
setup:
- type: "apt"
packages:
- "procps"
interactive: false
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/io/untar/config.vsh.yaml"
runner: "executable"
engine: "docker|native"
output: "target/executable/io/untar"
executable: "target/executable/io/untar/untar"
viash_version: "0.9.4"
git_commit: "4076dacab7abd6104c859c2b7188e592568946e8"
git_remote: "https://github.com/viash-hub/demultiplex"
package_config:
name: "demultiplex"
version: "update_biobox"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-resources/demultiplex/v4"
dest: "testData"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script += 'includeConfig(\"nextflow_labels.config\"\
)'\n.resources += {path: '/_viash.yaml', dest: '_viash.yaml'}\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'update_biobox'"
keywords:
- "bioinformatics"
- "sequence"
- "demultiplexing"
- "pipeline"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/demultiplex"
issue_tracker: "https://github.com/viash-hub/demultiplex/issues"
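The `untar` description specifies two behaviors: a tarball holding a single top-level directory is unwrapped, and `--exclude` skips matching members. The Python sketch below models both with the standard `tarfile` module. It is not the component's `script.sh`. `--exclude` is approximated with `fnmatch` on full member names plus everything under a matching directory, so GNU tar's own matching rules may differ.

```python
import fnmatch
import shutil
import tarfile
import tempfile
from pathlib import Path
from typing import Optional


def untar(tarball: Path, output: Path, exclude: Optional[str] = None) -> None:
    """Extract tarball into output, unwrapping a single top-level directory."""
    output.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp)
        with tarfile.open(tarball) as tar:
            # Approximate --exclude: drop members that match the pattern
            # and everything underneath a matching directory.
            members = [
                m for m in tar.getmembers()
                if not exclude
                or not (fnmatch.fnmatch(m.name, exclude)
                        or fnmatch.fnmatch(m.name, exclude.rstrip("/") + "/*"))
            ]
            # Use the safe "data" extraction filter where this Python provides it.
            extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tar.extractall(staging, members=members, **extra)
        entries = list(staging.iterdir())
        # A single top-level directory: move its contents, not the directory.
        source = entries[0] if len(entries) == 1 and entries[0].is_dir() else staging
        for entry in source.iterdir():
            shutil.move(str(entry), str(output / entry.name))
```

Extracting into a staging directory first makes the single-directory check possible without inspecting member paths.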

@@ -0,0 +1,21 @@
name: demultiplex
description: |
Demultiplexing pipeline
license: MIT
keywords: [bioinformatics, sequence, demultiplexing, pipeline]
links:
issue_tracker: https://github.com/viash-hub/demultiplex/issues
repository: https://github.com/viash-hub/demultiplex
info:
test_resources:
- path: gs://viash-hub-resources/demultiplex/v4
dest: testData
viash_version: 0.9.4
config_mods: |
.requirements.commands += ['ps']
.runners[.type == 'nextflow'].directives.tag := '$id'
.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}
.runners[.type == 'nextflow'].config.script += 'includeConfig("nextflow_labels.config")'
.resources += {path: '/_viash.yaml', dest: '_viash.yaml'}
version: update_biobox
organization: vsh

@@ -0,0 +1,98 @@
process {
container = 'nextflow/bash:latest'
// default resources
memory = { 8.Gb * task.attempt }
cpus = 8
maxForks = 36
// Retry for exit codes that have something to do with memory issues
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
maxRetries = 3
maxMemory = 192.GB
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 32 }
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
}
profiles {
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
docker {
docker.fixOwnership = true
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
local {
// This config is for local processing.
process {
maxMemory = 25.GB
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 4 }
withLabel: midcpu { cpus = 6 }
withLabel: highcpu { cpus = 12 }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
}
}
}
def get_memory(to_compare) {
if (!process.containsKey("maxMemory") || !process.maxMemory) {
return to_compare
}
try {
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
return process.maxMemory
}
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
return process.maxMemory as nextflow.util.MemoryUnit
}
else {
return to_compare
}
} catch (all) {
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
System.exit(1)
}
}
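The `errorStrategy` closure in this config retries only on exit codes 137 to 140, which the comment associates with memory issues (137 is 128 + SIGKILL, typical of the kernel OOM killer). Nextflow's `maxRetries` counts re-submissions, so `maxRetries = 3` allows up to four attempts. The Python sketch below models that policy. The function names are illustrative, not Nextflow API.

```python
MEMORY_EXIT_CODES = range(137, 141)  # Groovy's 137..140 range is inclusive


def error_strategy(exit_status: int) -> str:
    """Mirror of { task.exitStatus in 137..140 ? 'retry' : 'terminate' }."""
    return "retry" if exit_status in MEMORY_EXIT_CODES else "terminate"


def run_with_retries(exit_codes, max_retries: int = 3):
    """Return the (attempt, exit code) pairs actually executed."""
    attempts = []
    for attempt, code in enumerate(exit_codes, start=1):
        attempts.append((attempt, code))
        if code == 0:
            break  # success: no further attempts
        if error_strategy(code) == "terminate" or attempt > max_retries:
            break  # non-memory failure, or retries exhausted
    return attempts
```

A task killed with exit code 137 four times in a row therefore stops after attempt 4. A task failing with exit code 1 is not retried at all.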

1167
target/executable/io/untar/untar Executable file

File diff suppressed because it is too large

@@ -0,0 +1,201 @@
name: "combine_samples"
namespace: "dataflow"
version: "update_biobox"
argument_groups:
- name: "Input arguments"
arguments:
- type: "string"
name: "--id"
description: "ID of the new event"
info: null
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--forward_input"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--reverse_input"
info: null
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--sample_qc_dir"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--output_forward"
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--output_reverse"
info: null
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--output_sample_qc"
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: true
multiple_sep: ";"
resources:
- type: "nextflow_script"
path: "main.nf"
is_executable: true
entrypoint: "run_wf"
- type: "file"
path: "nextflow_labels.config"
dest: "nextflow_labels.config"
- type: "file"
path: "_viash.yaml"
dest: "_viash.yaml"
description: "Combine fastq files from across samples into one event with a list of\
\ fastq files per orientation."
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/demultiplex"
runners:
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
script:
- "includeConfig(\"nextflow_labels.config\")"
debug: false
container: "docker"
engines:
- type: "native"
id: "native"
- type: "native"
id: "native"
build_info:
config: "src/dataflow/combine_samples/config.vsh.yaml"
runner: "nextflow"
engine: "native|native"
output: "target/nextflow/dataflow/combine_samples"
executable: "target/nextflow/dataflow/combine_samples/main.nf"
viash_version: "0.9.4"
git_commit: "4076dacab7abd6104c859c2b7188e592568946e8"
git_remote: "https://github.com/viash-hub/demultiplex"
package_config:
name: "demultiplex"
version: "update_biobox"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-resources/demultiplex/v4"
dest: "testData"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script += 'includeConfig(\"nextflow_labels.config\"\
)'\n.resources += {path: '/_viash.yaml', dest: '_viash.yaml'}\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'update_biobox'"
keywords:
- "bioinformatics"
- "sequence"
- "demultiplexing"
- "pipeline"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/demultiplex"
issue_tracker: "https://github.com/viash-hub/demultiplex/issues"
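`combine_samples` merges per-sample events into one event identified by `--id`. Its arguments are a list of forward fastq files, an optional list of reverse files, and one sample QC directory per sample. The Python sketch below models that reshaping, assuming each sample's state is a plain dict keyed by the argument names. The function and dict layout are hypothetical. The actual logic lives in `main.nf` (entrypoint `run_wf`), which is not shown here.

```python
from typing import Dict, List, Tuple


def combine_samples(new_id: str, samples: List[Dict]) -> Tuple[str, Dict[str, List[str]]]:
    """Combine per-sample states into one event with file lists per orientation."""
    forward: List[str] = []
    reverse: List[str] = []
    sample_qc: List[str] = []
    for sample in samples:
        forward.extend(sample["forward_input"])            # required, multiple
        reverse.extend(sample.get("reverse_input") or [])  # optional (single-end runs)
        sample_qc.append(sample["sample_qc_dir"])          # required, one per sample
    state = {"output_forward": forward, "output_sample_qc": sample_qc}
    if reverse:
        state["output_reverse"] = reverse  # only set for paired-end data
    return new_id, state
```

Omitting `output_reverse` for single-end data matches `--output_reverse` being the only optional output.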

@@ -0,0 +1,21 @@
name: demultiplex
description: |
Demultiplexing pipeline
license: MIT
keywords: [bioinformatics, sequence, demultiplexing, pipeline]
links:
issue_tracker: https://github.com/viash-hub/demultiplex/issues
repository: https://github.com/viash-hub/demultiplex
info:
test_resources:
- path: gs://viash-hub-resources/demultiplex/v4
dest: testData
viash_version: 0.9.4
config_mods: |
.requirements.commands += ['ps']
.runners[.type == 'nextflow'].directives.tag := '$id'
.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}
.runners[.type == 'nextflow'].config.script += 'includeConfig("nextflow_labels.config")'
.resources += {path: '/_viash.yaml', dest: '_viash.yaml'}
version: update_biobox
organization: vsh

File diff suppressed because it is too large

@@ -0,0 +1,125 @@
manifest {
name = 'dataflow/combine_samples'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'update_biobox'
description = 'Combine fastq files from across samples into one event with a list of fastq files per orientation.'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
includeConfig("nextflow_labels.config")
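The `tempDir` assignment in this config uses Groovy's Elvis operator (`?:`). It takes the first of `NXF_TEMP`, `VIASH_TEMP`, `TEMPDIR` and `TMPDIR` that is set and non-empty, then falls back to `/tmp`. The `mount_temp` profile mounts that directory into the container engines. Below is a minimal Python equivalent. `detect_temp_dir` is an illustrative name, and the optional `env` mapping exists only so the lookup can be tested without touching the real environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

TEMP_VARS = ("NXF_TEMP", "VIASH_TEMP", "TEMPDIR", "TMPDIR")


def detect_temp_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the first non-empty temp dir variable as an absolute path."""
    env = os.environ if env is None else env
    for var in TEMP_VARS:
        value = env.get(var)
        if value:  # like Groovy truth: unset or empty values fall through
            return Path(value).absolute()
    return Path("/tmp").absolute()
```

`Path.absolute()` is used rather than `resolve()` because Java's `toAbsolutePath()` does not follow symlinks either.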

@@ -0,0 +1,98 @@
process {
container = 'nextflow/bash:latest'
// default resources
memory = { 8.Gb * task.attempt }
cpus = 8
maxForks = 36
// Retry for exit codes that have something to do with memory issues
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
maxRetries = 3
maxMemory = 192.GB
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 32 }
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
}
profiles {
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
docker {
docker.fixOwnership = true
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
local {
// This config is for local processing.
process {
maxMemory = 25.GB
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 4 }
withLabel: midcpu { cpus = 6 }
withLabel: highcpu { cpus = 12 }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
}
}
}
def get_memory(to_compare) {
if (!process.containsKey("maxMemory") || !process.maxMemory) {
return to_compare
}
try {
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
return process.maxMemory
}
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
return process.maxMemory as nextflow.util.MemoryUnit
}
else {
return to_compare
}
} catch (all) {
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
System.exit(1)
}
}

@@ -0,0 +1,106 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "combine_samples",
"description": "Combine fastq files from across samples into one event with a list of fastq files per orientation.",
"type": "object",
"$defs": {
"input arguments": {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"id": {
"type": "string",
"description": "ID of the new event",
"help_text": "Type: `string`, multiple: `False`, required. "
},
"forward_input": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"exists": true,
"description": "",
"help_text": "Type: `file`, multiple: `True`, required, direction: `input`. "
},
"reverse_input": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `True`, direction: `input`. "
},
"sample_qc_dir": {
"type": "string",
"format": "path",
"exists": true,
"description": "",
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`. "
}
}
},
"output arguments": {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output_forward": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `True`, required, default: `\"$id.$key.output_forward_*\"`, direction: `output`. ",
"default": "$id.$key.output_forward_*"
},
"output_reverse": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `True`, default: `\"$id.$key.output_reverse_*\"`, direction: `output`. ",
"default": "$id.$key.output_reverse_*"
},
"output_sample_qc": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `True`, required, default: `\"$id.$key.output_sample_qc_*\"`, direction: `output`. ",
"default": "$id.$key.output_sample_qc_*"
}
}
},
"nextflow input-output arguments": {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type": "string",
"description": "Path to an output directory.",
"help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
}
}
}
},
"allOf": [
{
"$ref": "#/$defs/input arguments"
},
{
"$ref": "#/$defs/output arguments"
},
{
"$ref": "#/$defs/nextflow input-output arguments"
}
]
}

@@ -0,0 +1,192 @@
name: "gather_fastqs_and_validate"
namespace: "dataflow"
version: "update_biobox"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--input"
description: "Directory containing .fastq files"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--sample_sheet"
description: "Sample sheet"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--fastq_forward"
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--fastq_reverse"
info: null
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: true
multiple_sep: ";"
resources:
- type: "nextflow_script"
path: "main.nf"
is_executable: true
entrypoint: "run_wf"
- type: "file"
path: "nextflow_labels.config"
dest: "nextflow_labels.config"
- type: "file"
path: "_viash.yaml"
dest: "_viash.yaml"
description: "From a directory containing fastq files, gather the files per sample\
\ \nand validate according to the contents of the sample sheet.\n"
test_resources:
- type: "nextflow_script"
path: "test.nf"
is_executable: true
entrypoint: "test_gather_and_validate"
- type: "nextflow_script"
path: "test.nf"
is_executable: true
entrypoint: "test_undetermined_empty"
- type: "nextflow_script"
path: "test.nf"
is_executable: true
entrypoint: "test_without_index"
- type: "file"
path: "test_data"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/demultiplex"
runners:
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
script:
- "includeConfig(\"nextflow_labels.config\")"
debug: false
container: "docker"
engines:
- type: "native"
id: "native"
- type: "native"
id: "native"
build_info:
config: "src/dataflow/gather_fastqs_and_validate/config.vsh.yaml"
runner: "nextflow"
engine: "native|native"
output: "target/nextflow/dataflow/gather_fastqs_and_validate"
executable: "target/nextflow/dataflow/gather_fastqs_and_validate/main.nf"
viash_version: "0.9.4"
git_commit: "4076dacab7abd6104c859c2b7188e592568946e8"
git_remote: "https://github.com/viash-hub/demultiplex"
package_config:
name: "demultiplex"
version: "update_biobox"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-resources/demultiplex/v4"
dest: "testData"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script += 'includeConfig(\"nextflow_labels.config\"\
)'\n.resources += {path: '/_viash.yaml', dest: '_viash.yaml'}\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'update_biobox'"
keywords:
- "bioinformatics"
- "sequence"
- "demultiplexing"
- "pipeline"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/demultiplex"
issue_tracker: "https://github.com/viash-hub/demultiplex/issues"
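The `labels` map in the generated config encodes two unit families. The `gb`/`tb` labels use decimal multiples (1 GB = 10^9 bytes). The `gib`/`tib` labels use binary multiples (1 GiB = 2^30 bytes). The Python sketch below recomputes the byte counts from a label name so the listed values can be checked. The helper names `label_bytes` and `nextflow_directive` are hypothetical and belong to neither viash nor Nextflow.

```python
# Sketch: recompute the byte counts behind the mem* labels.
# Assumption (inferred from the listed values): "gb"/"tb" are SI units
# (powers of 1000) and "gib"/"tib" are IEC units (powers of 1024).
UNITS = {
    "gb": 1000**3,
    "tb": 1000**4,
    "gib": 1024**3,
    "tib": 1024**4,
}


def label_bytes(label: str) -> int:
    """Return the memory in bytes encoded by a label such as 'mem16gib' or 'mem5tb'."""
    if not label.startswith("mem"):
        raise ValueError(f"not a memory label: {label}")
    body = label[len("mem"):]
    # No suffix is a tail of another ('gib' ends in 'ib', not 'gb'), so order does not matter.
    for suffix, factor in UNITS.items():
        if body.endswith(suffix):
            return int(body[: -len(suffix)]) * factor
    raise ValueError(f"unknown unit in label: {label}")


def nextflow_directive(label: str) -> str:
    """Render the directive string in the same shape as the config entries."""
    return f"memory = {label_bytes(label)}.B"


if __name__ == "__main__":
    for name in ["mem1gb", "mem1gib", "mem16gib", "mem5tb", "mem512tib"]:
        print(name, nextflow_directive(name))
```

The same values reappear in the `withLabel` entries of the generated `nextflow.config`. A label therefore maps to one fixed byte count regardless of which file it is read from.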

@@ -0,0 +1,21 @@
name: demultiplex
description: |
Demultiplexing pipeline
license: MIT
keywords: [bioinformatics, sequence, demultiplexing, pipeline]
links:
issue_tracker: https://github.com/viash-hub/demultiplex/issues
repository: https://github.com/viash-hub/demultiplex
info:
test_resources:
- path: gs://viash-hub-resources/demultiplex/v4
dest: testData
viash_version: 0.9.4
config_mods: |
.requirements.commands += ['ps']
.runners[.type == 'nextflow'].directives.tag := '$id'
.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}
.runners[.type == 'nextflow'].config.script += 'includeConfig("nextflow_labels.config")'
.resources += {path: '/_viash.yaml', dest: '_viash.yaml'}
version: update_biobox
organization: vsh
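The `config_mods` above use two operators:

- `+=` appends to a list.
- `:=` overwrites a value at a path, optionally filtered by a selector such as `[.type == 'nextflow']`.

The Python sketch below is a rough model of these semantics on a plain dict; it is not viash's parser. It assumes a hypothetical component that already declares one `native` engine. Under that assumption, an unconditional `.engines += { type: "native" }` can yield the duplicated `native|native` engine recorded in the build info. A `docker`-filtered `:=` is a no-op when no docker engine exists.

```python
import copy


def append_to(config: dict, key: str, item) -> dict:
    """Model of `.key += item`: append item to the list stored under key."""
    new = copy.deepcopy(config)
    new.setdefault(key, []).append(item)
    return new


def assign_where(config: dict, list_key: str, type_: str, field: str, value) -> dict:
    """Model of `.list_key[.type == type_].field := value`: set field on every matching entry."""
    new = copy.deepcopy(config)
    for entry in new.get(list_key, []):
        if entry.get("type") == type_:
            entry[field] = value
    return new


# Hypothetical component config that already declares one native engine.
component = {"engines": [{"type": "native", "id": "native"}]}

modded = append_to(component, "engines", {"type": "native"})
# No docker engine is present, so this selector matches nothing and changes nothing.
modded = assign_where(modded, "engines", "docker", "target_tag", "update_biobox")

print("|".join(e["type"] for e in modded["engines"]))
```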

File diff suppressed because it is too large

@@ -0,0 +1,125 @@
manifest {
name = 'dataflow/gather_fastqs_and_validate'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'update_biobox'
description = 'From a directory containing fastq files, gather the files per sample \nand validate according to the contents of the sample sheet.\n'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
includeConfig("nextflow_labels.config")
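The `tempDir` setting at the top of this config picks the first set environment variable from a fixed chain and falls back to `/tmp`. The Python sketch below mirrors that precedence so it can be tested without Nextflow; the function name `detect_temp_dir` is hypothetical. Groovy's Elvis operator (`?:`) treats both `null` and an empty string as false, so an empty variable is skipped just like an unset one. `Path.absolute()` is used because, like Java's `toAbsolutePath()`, it does not resolve symlinks.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Same order of precedence as the Elvis (?:) chain in the config.
TEMP_VARS = ("NXF_TEMP", "VIASH_TEMP", "TEMPDIR", "TMPDIR")


def detect_temp_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the first non-empty temp dir variable as an absolute path, else /tmp."""
    env = os.environ if env is None else env
    for var in TEMP_VARS:
        value = env.get(var)
        # Skip unset and empty values alike, as Groovy's ?: does.
        if value:
            return Path(value).absolute()
    return Path("/tmp")


if __name__ == "__main__":
    print(detect_temp_dir())
```

With the `mount_temp` profile enabled, the resulting directory is passed to `docker.temp`, `podman.temp` and `charliecloud.temp`.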

@@ -0,0 +1,98 @@
process {
container = 'nextflow/bash:latest'
// default resources
memory = { 8.GB * task.attempt }
cpus = 8
maxForks = 36
// Retry when the exit code is 137-140 (128 + signal number; 137 = SIGKILL, typically sent by the OOM killer); terminate on any other failure
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
maxRetries = 3
maxMemory = 192.GB
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 32 }
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
}
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
docker {
docker.fixOwnership = true
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
local {
// This config is for local processing.
process {
maxMemory = 25.GB
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 4 }
withLabel: midcpu { cpus = 6 }
withLabel: highcpu { cpus = 12 }
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
}
}
}
def get_memory(to_compare) {
if (!process.containsKey("maxMemory") || !process.maxMemory) {
return to_compare
}
try {
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
return process.maxMemory
}
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) > 0) {
return process.maxMemory as nextflow.util.MemoryUnit
}
else {
return to_compare
}
} catch (all) {
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
System.exit(1)
}
}
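The default `process` scope combines `errorStrategy`, `maxRetries` and the `get_memory()` helper. Memory grows with `task.attempt` but is capped at `maxMemory`. The Python sketch below mirrors that logic so the per-attempt values are easy to check. Two simplifications are assumptions: it works in whole gigabytes rather than Nextflow `MemoryUnit` objects, and it takes the cap branch to return `process.maxMemory`. The function names are hypothetical stand-ins for the Groovy code.

```python
from typing import Optional

MAX_MEMORY_GB = 192  # process.maxMemory in the default process scope
MAX_RETRIES = 3      # process.maxRetries


def get_memory(requested_gb: int, attempt: int,
               max_memory_gb: Optional[int] = MAX_MEMORY_GB,
               max_retries: Optional[int] = MAX_RETRIES) -> int:
    """Mirror of the config's get_memory(), in whole gigabytes."""
    if not max_memory_gb:
        return requested_gb              # no cap configured
    if max_retries and attempt == max_retries:
        return max_memory_gb             # jump straight to the cap on this attempt
    if requested_gb > max_memory_gb:
        return max_memory_gb             # never exceed the cap
    return requested_gb


def error_strategy(exit_status: int) -> str:
    """Mirror of errorStrategy: Groovy's 137..140 is an inclusive range."""
    return "retry" if 137 <= exit_status <= 140 else "terminate"


if __name__ == "__main__":
    # highmem label: 64 GB scaled by the attempt number
    for attempt in range(1, MAX_RETRIES + 2):
        print(attempt, get_memory(64 * attempt, attempt))
```

Nextflow numbers attempts from 1, and `maxRetries = 3` allows up to four attempts. The `task.attempt == maxRetries` branch therefore fires on the third attempt, not the last one. For the `lowmem` label the sketch gives 8, 16, 192 and then 32 GB on the fourth attempt.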

Some files were not shown because too many files have changed in this diff