Compare commits

...

8 Commits
v0.3.4 ... v0.3

Author SHA1 Message Date
CI
b025c3195b Build branch v0.3 with version v0.3.12 (3a84e54)
Build pipeline: viash-hub.demultiplex.v0.3-xpkf6

Source commit: 3a84e5409c

Source message: Bump version to v0.3.12
2025-05-26 11:03:33 +00:00
CI
194cb8d0f1 Build branch v0.3 with version v0.3.11 (cd2cfac)
Build pipeline: viash-hub.demultiplex.v0.3-nwj5p

Source commit: cd2cfac18e

Source message: Bump version to v0.3.11
2025-05-14 09:11:58 +00:00
CI
b90bd3453a Build branch v0.3 with version v0.3.10 (e6fbd0c)
Build pipeline: vsh-ci-build-template-xmldm

Source commit: e6fbd0c220

Source message: Release version v0.3.10
2025-05-07 11:35:04 +00:00
CI
b5f42c8055 Build branch v0.3 with version v0.3.9 (820318c)
Build pipeline: viash-hub.demultiplex.v0.3-5lwbd

Source commit: 820318c378

Source message: Bump version to v0.3.9
2025-04-25 12:27:03 +00:00
CI
cd1815354d Build branch v0.3 with version v0.3.8 (dafe2d8)
Build pipeline: viash-hub.demultiplex.v0.3-brf8v

Source commit: dafe2d8290

Source message: Bump version to v0.3.8
2025-03-27 16:16:22 +00:00
CI
e9dce89f32 Build branch v0.3 with version v0.3.7 (e177226)
Build pipeline: viash-hub.demultiplex.v0.3-dc7wn

Source commit: e177226c52

Source message: Bump version to v0.3.7
2025-03-20 20:56:15 +00:00
CI
0cbd5c00cb Build branch v0.3 with version v0.3.6 (3a985c8)
Build pipeline: viash-hub.demultiplex.v0.3-2vbxf

Source commit: 3a985c8386

Source message: Bump version to v0.3.6
2025-03-20 20:00:43 +00:00
CI
78f3a8b65e Build branch v0.3 with version v0.3.5 (18e092d)
Build pipeline: viash-hub.demultiplex.v0.3-ldxlr

Source commit: 18e092d169

Source message: Bump version to 0.3.5
2025-03-04 13:36:18 +00:00
94 changed files with 6471 additions and 2612 deletions

View File

@@ -1,3 +1,69 @@
# demultiplex v0.3.12
## New features
* Add support for Nextflow versions version starting 25.xx.xx (PR #50).
## Bug fixes
* Allow FASTQ files for `Undetermined` to be empty (PR #50).
# demultiplex v0.3.11
## New features
* Output demultiplexer logs and metrics (PR #41).
# demultiplex v0.3.10
## Minor changes
* Moved the test resources to their new location (PR #37).
# demultiplex v0.3.9
## Bug fixes
* Fix defaults for output arguments in nextflow schema's.
* Fix an issue where an integer being passed to a argument with `type: double` resulted in an error (PR #44).
## Minor changes
* Bump viash to 0.9.4, which adds support for nextflow versions starting major version 25.01 (PR #43 and #44).
# demultiplex v0.3.8
## Bug fixes
* Provide a proper error when a FASTQ file is empty after demultiplexing (PR #40).
# demultiplex v0.3.7
## Minor updates
* Ignore lines starting with '#' when parsing run information CSV (PR #39).
# demultiplex v0.3.6
## Minor updates
* Allow letter case variants for headers when looking for sample information in run information CSV (PR #38).
# demultiplex v0.3.5
## Breaking changes
* The `demultiplex` workflow now outputs a list of directories
for the `output_falco` argument (one for each barcode) instead of one directory
for the complete run. The output from the `runner` workflow remained
unchanged (PR #33).
## Minor updates
* In case Illumina data is detected in the input folder, check for the presence of the 'copyComplete.txt' file.
This check can be disabled using `--skip_copycomplete_check` (PR #34).
# demultiplex v0.3.4
## Minor updates

21
LICENSE Normal file
View File

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2024 Data Intuitive
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

219
README.md Normal file
View File

@@ -0,0 +1,219 @@
# Demultiplex.vsh
Demultiplex.vsh is a workflow for demultiplexing of raw sequencing data.
Currently data from Illumina and Element Biosciences sequencers are
supported.
[![ViashHub](https://img.shields.io/badge/ViashHub-demultiplex-7a4baa.svg)](https://web.viash-hub.com/packages/demultiplex)
[![GitHub](https://img.shields.io/badge/GitHub-viash--hub%2Fdemultiplex-blue.svg)](https://github.com/viash-hub/demultiplex)
[![GitHub
License](https://img.shields.io/github/license/viash-hub/demultiplex.svg)](https://github.com/viash-hub/demultiplex/blob/main/LICENSE)
[![GitHub
Issues](https://img.shields.io/github/issues/viash-hub/demultiplex.svg)](https://github.com/viash-hub/demultiplex/issues)
[![Viash
version](https://img.shields.io/badge/Viash-v0.9.4-blue)](https://viash.io)
## Introcuction
This workflow is designed to demultiplex raw RNA-seq sequencing data
from Illumina and Element Biosciences sequencers.
The workflow is built in a modular fashion, where most of the base
functionality is provided by components from
[`biobox`](https://www.viash-hub.com/packages/biobox/latest)
supplemented by custom base components and workflow components in this
package. Each of these components can be used independently as
stand-alone modules with a standardized interface.
The full workflow can be run in two ways:
1. Run the [main
workflow](https://www.viash-hub.com/packages/demultiplex/v0.3.4/components/demultiplex)
containing the main functionality.
2. Run the [(opinianated)
`runner`](https://www.viash-hub.com/packages/demultiplex/v0.3.4/components/runner)
where a number of choices (input/output structure and location) have
been made.
## Workflow Overview
The workflow executes the following steps:
1. Unpacking the input data (when a TAR archive is provided)
2. Run `bclconvert` or `bases2fastq`
3. Run `falco` and convert Illumina InterOp information to csv
4. Run `multiqc` to generate a report
## Example usage
Two variants of the same workflow are provided, depending on the
flexibility in the ouput structure required:
- The `runner` workflow provides a predifined output structure. It
requires the minimal amount of parameters to be provided, at the cost
of being less flexible. It is located at
`target/nextflow/runner/main.nf`
- The `demultiplex` workflow (`target/nextflow/demultiplex/main.nf`)
allows for more fine-grained tuning, but required more parameters to
be provided.
### Test data
We have provided test data at
`gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2`
(Illumina), but please feel free to bring your own. The URL of the test
data can be provided as-is to the workflow, or you can download
everything and specify a local path.
The input data should follow the structure of either Illumina or Element
Biosciences sequencers. The workflow will automatically detect which
demultiplexer to use (`bclconvert` or `bases2fastq`) based on the
presence of either `SampleSheet.csv` or `RunParameters.xml` in the input
directory. Demultiplexer can also be set explicitly using the
`--demultiplexer` parameter.
### Setup
In order to use the workflows in this package, youll need to do the
following:
- Install [nextflow](https://www.nextflow.io/docs/latest/install.html)
- Install a nextflow compatible executor. This workflow provides a
profile for [docker](https://docs.docker.com/get-started/).
### Run from Viash Hub
1. Open [Viash Hub](https://www.viash-hub.com) and browse to the
[demultiplex
component](https://www.viash-hub.com/packages/demultiplex/v0.3.4/components/demultiplex).
Press the Launch button and follow the instructions.
![](assets/demultiplex-launch-small.png)
2. We will start an example run and set profile to `docker`.
![](assets/demultiplex-launch-parameters-1.png)
3. In the next step, we provide the paramters as follows and leave the
rest as defalut:
- `input`:
`gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2`
![](assets/demultiplex-launch-parameters-2.png)
Press the Launch button at the end to get the instructions on how to
run the workflow from the CLI.
### Run using NF-Tower / Seqera Cloud
Its possible to run the workflow directly from [Seqera
Cloud](https://cloud.seqera.io). The necessary [Nextflow schema
file](https://nextflow-io.github.io/nf-schema/latest/nextflow_schema/nextflow_schema_specification/)
has been built and provided with the workflows in order to use the
form-based input.
1. Select the option to run the workflow using Seqera Cloud. You will
need to create an API token for your account. Once this token is
filled in in the corresponding field, we will get the option to
select a Workspace and a Compute environment.
![](assets/demultiplex-launch-parameters-3.png)
2. Provide the parameters similar to the previous step.
3. In the next screen, pressing the Launch button will actually start
the workflow on Seqera Cloud. A message is shown when the submit was
successful.
![](assets/demultiplex-launch-parameters-4.png)
### Setting up SCM
In order to let nextflow use the viash-hub workflows, you need to setup
a [SCM](https://www.nextflow.io/docs/latest/git.html#git-configuration)
file. This can be done once by creating `$HOME/.nextflow/scm` and adding
the following:
providers {
vsh {
platform = 'gitlab'
server = "packages.viash-hub.com"
}
}
Alternatively, a custom location for the SCM file can be specified using
the `NXF_SCM_FILE` environment variable.
You can check if everything is working by getting the `--help` for a
workflow:
``` bash
nextflow run \
vsh/demultiplex \
-r v0.3.11 \
--help
```
### Run from the CLI
Running from the CLI directly without using Viash hub is possible as
well. The easiest is to use the integrated help functionality, for
instance using the following:
``` bash
nextflow run vsh/demultiplex \
-revision v0.3.11 \
-main-script target/nextflow/workflows/runner/main.nf \
--help
```
Having this project available locally, you can run the following
command:
``` bash
nextflow run vsh/demultiplex \
-r v0.3.11 \
-main-script target/nextflow/runner/main.nf \
--input "gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2" \
--demultiplexer bclconvert \
--skip_copycomplete_check \
--publish_dir example_output/ \
-profile docker \
-c src/config/labels.config
```
### (Optional) Resource usage tuning
Nextflows labels can be used to specify the amount of resources a
process can use. This workflow uses the following labels for CPU and
memory:
- `verylowmem`, `lowmem`, `midmem`, `highmem`
- `verylowcpu`, `lowcpu`, `midcpu`, `highcpu`
The defaults for these labels can be found at
`src/config/labels.config`. Nextflow checks that the specified resources
for a process do not exceed what is available on the machine and will
not start if it does. Create your own config file to tune the labels to
your needs, for example:
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 16 }
withLabel: verylowmem { memory = 4.GB }
withLabel: lowmem { memory = 8.GB }
withLabel: midmem { memory = 8.GB }
withLabel: highmem { memory = 8.GB }
When starting nextflow using the CLI, you can use `-c` to provide the
file to nextflow and overwrite the defaults.
## Acknowledgements
Developed in collaboration with Data Intuitive and Open Analytics.

191
README.qmd Normal file
View File

@@ -0,0 +1,191 @@
---
format: gfm
---
```{r setup, include=FALSE}
project <- yaml::read_yaml("_viash.yaml")
license <- paste0(project$links$repository, "/blob/main/LICENSE")
```
# Demultiplex.vsh
Demultiplex.vsh is a workflow for demultiplexing of raw sequencing data. Currently data from Illumina and Element Biosciences sequencers are supported.
[![ViashHub](https://img.shields.io/badge/ViashHub-demultiplex-7a4baa.svg)](https://web.viash-hub.com/packages/demultiplex)
[![GitHub](https://img.shields.io/badge/GitHub-viash--hub%2Fdemultiplex-blue.svg)](https://github.com/viash-hub/demultiplex)
[![GitHub
License](https://img.shields.io/github/license/viash-hub/demultiplex.svg)](https://github.com/viash-hub/demultiplex/blob/main/LICENSE)
[![GitHub
Issues](https://img.shields.io/github/issues/viash-hub/demultiplex.svg)](https://github.com/viash-hub/demultiplex/issues)
[![Viash
version](https://img.shields.io/badge/Viash-v0.9.4-blue)](https://viash.io)
## Introcuction
This workflow is designed to demultiplex raw RNA-seq sequencing data from Illumina and Element Biosciences sequencers.
The workflow is built in a modular fashion, where most of the base functionality is provided by components from
[`biobox`](https://www.viash-hub.com/packages/biobox/latest) supplemented by custom base components and workflow components in this package. Each of these components can be used independently as stand-alone modules with a
standardized interface.
The full workflow can be run in two ways:
1. Run the [main
workflow](https://www.viash-hub.com/packages/demultiplex/v0.3.4/components/demultiplex)
containing the main functionality.
2. Run the [(opinianated)
`runner`](https://www.viash-hub.com/packages/demultiplex/v0.3.4/components/runner)
where a number of choices (input/output structure and location) have
been made.
## Workflow Overview
The workflow executes the following steps:
1. Unpacking the input data (when a TAR archive is provided)
2. Run `bclconvert` or `bases2fastq`
3. Run `falco` and convert Illumina InterOp information to csv
4. Run `multiqc` to generate a report
## Example usage
Two variants of the same workflow are provided, depending on the flexibility in the ouput structure required:
* The `runner` workflow provides a predifined output structure. It requires the minimal amount of parameters to be provided, at the cost of being less flexible. It is located at `target/nextflow/runner/main.nf`
* The `demultiplex` workflow (`target/nextflow/demultiplex/main.nf`) allows for more fine-grained tuning, but required more parameters to be provided.
### Test data
We have provided test data at `gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2` (Illumina), but please feel free to bring your own. The URL of the test data can be provided as-is to the workflow, or you can download everything and specify a local path.
The input data should follow the structure of either Illumina or Element Biosciences sequencers. The workflow will automatically detect which demultiplexer to use (`bclconvert` or `bases2fastq`) based on the
presence of either `SampleSheet.csv` or `RunParameters.xml` in the input directory. Demultiplexer can also be set explicitly using the `--demultiplexer` parameter.
### Setup
In order to use the workflows in this package, you'll need to do the following:
* Install [nextflow](https://www.nextflow.io/docs/latest/install.html)
* Install a nextflow compatible executor. This workflow provides a profile for [docker](https://docs.docker.com/get-started/).
### Run from Viash Hub
1. Open [Viash Hub](https://www.viash-hub.com) and browse to the [demultiplex
component](https://www.viash-hub.com/packages/demultiplex/v0.3.4/components/demultiplex).
Press the Launch button and follow the instructions.
![](assets/demultiplex-launch-small.png)
2. We will start an example run and set profile to `docker`.
![](assets/demultiplex-launch-parameters-1.png)
3. In the next step, we provide the paramters as follows and leave the rest as defalut:
- `input`:
`gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2`
![](assets/demultiplex-launch-parameters-2.png)
Press the Launch button at the end to get the instructions on how to
run the workflow from the CLI.
### Run using NF-Tower / Seqera Cloud
Its possible to run the workflow directly from [Seqera
Cloud](https://cloud.seqera.io). The necessary [Nextflow schema
file](https://nextflow-io.github.io/nf-schema/latest/nextflow_schema/nextflow_schema_specification/)
has been built and provided with the workflows in order to use the
form-based input.
1. Select the option to run the workflow using Seqera Cloud. You
will need to create an API token for your account. Once this token is
filled in in the corresponding field, we will get the option to select
a Workspace and a Compute environment.
![](assets/demultiplex-launch-parameters-3.png)
2. Provide the parameters similar to the previous step.
3. In the next screen, pressing the Launch button will actually start the
workflow on Seqera Cloud. A message is shown when the submit was
successful.
![](assets/demultiplex-launch-parameters-4.png)
### Setting up SCM
In order to let nextflow use the viash-hub workflows, you need to setup a [SCM](https://www.nextflow.io/docs/latest/git.html#git-configuration) file. This can be done once by creating `$HOME/.nextflow/scm` and adding the following:
```
providers {
vsh {
platform = 'gitlab'
server = "packages.viash-hub.com"
}
}
```
Alternatively, a custom location for the SCM file can be specified using the `NXF_SCM_FILE` environment variable.
You can check if everything is working by getting the `--help` for a workflow:
```bash
nextflow run \
vsh/demultiplex \
-r v0.3.11 \
--help
```
### Run from the CLI
Running from the CLI directly without using Viash hub is possible as well. The
easiest is to use the integrated help functionality, for instance
using the following:
``` bash
nextflow run vsh/demultiplex \
-revision v0.3.11 \
-main-script target/nextflow/workflows/runner/main.nf \
--help
```
Having this project available locally, you can run the following command:
```bash
nextflow run vsh/demultiplex \
-r v0.3.11 \
-main-script target/nextflow/runner/main.nf \
--input "gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2" \
--demultiplexer bclconvert \
--skip_copycomplete_check \
--publish_dir example_output/ \
-profile docker \
-c src/config/labels.config
```
### (Optional) Resource usage tuning
Nextflow's labels can be used to specify the amount of resources a process can use. This workflow uses the following labels for CPU and memory:
* `verylowmem`, `lowmem`, `midmem`, `highmem`
* `verylowcpu`, `lowcpu`, `midcpu`, `highcpu`
The defaults for these labels can be found at `src/config/labels.config`. Nextflow checks that the specified resources for a process do not exceed what is available on the machine and will not start if it does. Create your own config file to tune the labels to your needs, for example:
```
// Resource labels
withLabel: verylowcpu { cpus = 2 }
withLabel: lowcpu { cpus = 8 }
withLabel: midcpu { cpus = 16 }
withLabel: highcpu { cpus = 16 }
withLabel: verylowmem { memory = 4.GB }
withLabel: lowmem { memory = 8.GB }
withLabel: midmem { memory = 8.GB }
withLabel: highmem { memory = 8.GB }
```
When starting nextflow using the CLI, you can use `-c` to provide the file to nextflow and overwrite the defaults.
## Acknowledgements
Developed in collaboration with Data Intuitive and Open Analytics.

View File

@@ -1,5 +1,5 @@
name: demultiplex
version: v0.3.4
version: v0.3.12
description: |
Demultiplexing pipeline
license: MIT
@@ -9,13 +9,13 @@ links:
repository: https://github.com/viash-hub/demultiplex
info:
test_resources:
- path: gs://viash-hub-test-data/demultiplex/v2/
- path: gs://viash-hub-resources/demultiplex/v3
dest: testData
viash_version: 0.9.0
viash_version: 0.9.4
config_mods: |
.requirements.commands := ['ps']
.requirements.commands += ['ps']
.runners[.type == 'nextflow'].directives.tag := '$id'
.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}
.runners[.type == 'nextflow'].config.script := 'includeConfig("nextflow_labels.config")'

Binary file not shown.

After

Width:  |  Height:  |  Size: 59 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 77 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 110 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 24 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 54 KiB

View File

@@ -16,6 +16,9 @@ argument_groups:
type: file
required: false
multiple: true
- name: "--falco_dir"
type: file
required: true
- name: Output arguments
arguments:
- name: --output_forward
@@ -28,6 +31,11 @@ argument_groups:
direction: output
multiple: true
required: false
- name: "--output_falco"
type: file
direction: output
required: true
multiple: true
resources:
- type: nextflow_script
path: main.nf

View File

@@ -13,10 +13,12 @@ workflow run_wf {
// Gather the following state for all samples
def forward_fastqs = states.collect{it.forward_input}.flatten()
def reverse_fastqs = states.collect{it.reverse_input}.findAll{it != null}.flatten()
def falco_dirs = states.collect{it.falco_dir}
def resultState = [
"output_forward": forward_fastqs,
"output_reverse": reverse_fastqs,
"output_falco": falco_dirs,
// The join ID is the same across all samples from the same run
"_meta": ["join_id": states[0]._meta.join_id]
]

View File

@@ -30,6 +30,17 @@ resources:
- type: nextflow_script
path: main.nf
entrypoint: run_wf
test_resources:
- type: nextflow_script
path: test.nf
entrypoint: test_gather_and_validate
- type: nextflow_script
path: test.nf
entrypoint: test_undetermined_empty
- type: nextflow_script
path: test.nf
entrypoint: test_without_index
- path: test_data
runners:
- type: nextflow

View File

@@ -0,0 +1,33 @@
#!/usr/bin/env bash
set -eo pipefail
# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)
# ensure that the command below is run from the root of the repository
cd "$REPO_ROOT"
viash ns build --setup cb -q gather_fastqs_and_validate
nextflow run . \
-main-script src/dataflow/gather_fastqs_and_validate/test.nf \
-profile docker,no_publish,local \
-entry test_gather_and_validate \
-c src/config/labels.config \
-resume
nextflow run . \
-main-script src/dataflow/gather_fastqs_and_validate/test.nf \
-profile docker,no_publish,local \
-entry test_undetermined_empty \
-c src/config/labels.config \
-resume
nextflow run . \
-main-script src/dataflow/gather_fastqs_and_validate/test.nf \
-profile docker,no_publish,local \
-entry test_without_index \
-c src/config/labels.config \
-resume

View File

@@ -1,3 +1,26 @@
import java.util.zip.GZIPInputStream
import java.nio.file.Files
import java.io.BufferedInputStream
def is_empty(file_to_check){
/*
Checks if a file has content
*/
if (file_to_check.size() == 0) {
return true
}
def input_stream = Files.newInputStream(file_to_check)
def gzInputStream
try {
gzInputStream = new GZIPInputStream(new BufferedInputStream(input_stream))
} catch (java.io.EOFException ex) {
// This is not a gzipfile...
return false
}
def read_one_byte = gzInputStream.read()
return read_one_byte == -1
}
workflow run_wf {
take:
input_ch
@@ -10,15 +33,16 @@ workflow run_wf {
def sample_sheet = state.sample_sheet
def start_parsing = false
def sample_id_column_index = null
def samples = ["Undetermined"]
def undetermined_sample_name = "Undetermined"
def samples = [undetermined_sample_name]
def original_id = id
// Parse sample sheet for sample IDs
println "Processing run information file ${sample_sheet}"
csv_lines = sample_sheet.splitCsv(header: false, sep: ',')
csv_lines.any { csv_items ->
if (csv_items.isEmpty()) {
// skip empty line
if (csv_items.isEmpty() || csv_items[0].startsWith("#")) {
// skip empty or commented line
return
}
def possible_header = csv_items[0]
@@ -30,8 +54,8 @@ workflow run_wf {
return true
}
// [Data], [BCLConvert_Data] for illumina
// [Samples] for Element Biosciences
if (header in ["Data", "Samples", "BCLConvert_Data"]) {
// [Samples] or sometimes [SAMPLES] for Element Biosciences
if (header.toLowerCase() in ["data", "samples", "bclconvert_data"]) {
println "Found header [${header}], start parsing."
start_parsing = true
return
@@ -69,8 +93,9 @@ workflow run_wf {
processed_samples = samples.collect { sample_id ->
def forward_regex = ~/^${sample_id}_S(\d+)_(L(\d+)_)?R1_(\d+)\.fastq\.gz$/
def reverse_regex = ~/^${sample_id}_S(\d+)_(L(\d+)_)?R2_(\d+)\.fastq\.gz$/
def forward_fastq = state.input.listFiles().findAll{it.isFile() && it.name ==~ forward_regex}
def reverse_fastq = state.input.listFiles().findAll{it.isFile() && it.name ==~ reverse_regex}
// Sort is needed here because multiple lanes (_L00*_) might be present and they need to be in the same order in both lists
def forward_fastq = state.input.listFiles().findAll{it.isFile() && it.name ==~ forward_regex}.sort()
def reverse_fastq = state.input.listFiles().findAll{it.isFile() && it.name ==~ reverse_regex}.sort()
assert forward_fastq && !forward_fastq.isEmpty(): "No forward fastq files were found for sample ${sample_id}. " +
"All fastq files in directory: ${allfastqs.collect{it.name}}"
assert (reverse_fastq.isEmpty() || (forward_fastq.size() == reverse_fastq.size())):
@@ -78,6 +103,9 @@ workflow run_wf {
"Found forward: ${forward_fastq} and reverse: ${reverse_fastq}."
println "Found ${forward_fastq.size()} forward and ${reverse_fastq.size()} reverse " +
"fastq files for sample ${sample_id}"
assert sample_id == undetermined_sample_name || (forward_fastq.every{!is_empty(it)} && reverse_fastq.every{!is_empty(it)}):
"A fastq file for sample '${sample_id}' appears to be empty!"
def fastqs_state = [
"fastq_forward": forward_fastq,
"fastq_reverse": reverse_fastq,

View File

@@ -0,0 +1,10 @@
manifest {
nextflowVersion = '!>=20.12.1-edge'
}
params {
rootDir = java.nio.file.Paths.get("$projectDir/../../../").toAbsolutePath().normalize().toString()
}
// include common settings
includeConfig("${params.rootDir}/src/config/labels.config")

View File

@@ -0,0 +1,86 @@
nextflow.enable.dsl=2
include { gather_fastqs_and_validate } from params.rootDir + "/target/nextflow/dataflow/gather_fastqs_and_validate/main.nf"
workflow test_gather_and_validate {
output_ch = Channel.fromList([
[
id: "run1",
input: params.rootDir + "/src/dataflow/gather_fastqs_and_validate/test_data/fastqs",
sample_sheet: params.rootDir + "/src/dataflow/gather_fastqs_and_validate/test_data/samplesheet.csv",
]
])
| map { state -> [state.id, state]}
| gather_fastqs_and_validate.run(toState: ["fastq_forward", "fastq_reverse"])
output_ch
| toSortedList{a, b -> a[0] <=> b[0]}
| view {"Output: $it"}
| map {
assert it.size() == 3: "Expected three fastq pairs"
def first_pair = it[0][1]
assert first_pair.fastq_forward.collect{it.name} == ["Undetermined_S1_R1_001.fastq.gz"]
assert first_pair.fastq_reverse.collect{it.name} == ["Undetermined_S1_R2_001.fastq.gz"]
def second_pair = it[1][1]
assert second_pair.fastq_forward.collect{it.name} == ["sample1_S1_L001_R1_001.fastq.gz", "sample1_S1_L002_R1_001.fastq.gz"]
assert second_pair.fastq_reverse.collect{it.name} == ["sample1_S1_L001_R2_001.fastq.gz", "sample1_S1_L002_R2_001.fastq.gz"]
def undetermined_pair = it[2][1]
assert undetermined_pair.fastq_forward.collect{it.name} == ["sample2_S1_L001_R1_001.fastq.gz"]
assert undetermined_pair.fastq_reverse.collect{it.name} == ["sample2_S1_L001_R2_001.fastq.gz"]
}
}
workflow test_undetermined_empty {
output_ch = Channel.fromList([
[
id: "run1",
input: params.rootDir + "/src/dataflow/gather_fastqs_and_validate/test_data/fastqs_undetermined_empty",
sample_sheet: params.rootDir + "/src/dataflow/gather_fastqs_and_validate/test_data/samplesheet.csv",
]
])
| map { state -> [state.id, state]}
| gather_fastqs_and_validate.run(toState: ["fastq_forward", "fastq_reverse"])
output_ch
| toSortedList{a, b -> a[0] <=> b[0]}
| view {"Output: $it"}
| map {
assert it.size() == 3: "Expected three fastq pairs"
def first_pair = it[0][1]
assert first_pair.fastq_forward.collect{it.name} == ["Undetermined_S1_R1_001.fastq.gz"]
assert first_pair.fastq_reverse.collect{it.name} == ["Undetermined_S1_R2_001.fastq.gz"]
def second_pair = it[1][1]
assert second_pair.fastq_forward.collect{it.name} == ["sample1_S1_L001_R1_001.fastq.gz", "sample1_S1_L002_R1_001.fastq.gz"]
assert second_pair.fastq_reverse.collect{it.name} == ["sample1_S1_L001_R2_001.fastq.gz", "sample1_S1_L002_R2_001.fastq.gz"]
def undetermined_pair = it[2][1]
assert undetermined_pair.fastq_forward.collect{it.name} == ["sample2_S1_L001_R1_001.fastq.gz"]
assert undetermined_pair.fastq_reverse.collect{it.name} == ["sample2_S1_L001_R2_001.fastq.gz"]
}
}
workflow test_without_index {
output_ch = Channel.fromList([
[
id: "run1",
input: params.rootDir + "/src/dataflow/gather_fastqs_and_validate/test_data/fastqs_undetermined_empty",
sample_sheet: params.rootDir + "/src/dataflow/gather_fastqs_and_validate/test_data/samplesheet_no_index.csv",
]
])
| map { state -> [state.id, state]}
| gather_fastqs_and_validate.run(toState: ["fastq_forward", "fastq_reverse"])
output_ch
| toSortedList{a, b -> a[0] <=> b[0]}
| view {"Output: $it"}
| map {
assert it.size() == 2: "Expected two fastq pairs"
def first_pair = it[0][1]
assert first_pair.fastq_forward.collect{it.name} == ["Undetermined_S1_R1_001.fastq.gz"]
assert first_pair.fastq_reverse.collect{it.name} == ["Undetermined_S1_R2_001.fastq.gz"]
}
}

View File

@@ -0,0 +1,11 @@
[foo],,,,
[somecontent],,,,
bar,lorem,,,
,,,,
# Comment
[BCLConvert_Data],,,,
Sample_ID,Index,Index2,,
sample1,GTAGCCCTGT,GAGCATCTAT,,
sample2,TCGGCTCTAC,CCGATGGTCT,,
,,,,
1 [foo],,,,
2 [somecontent],,,,
3 bar,lorem,,,
4 ,,,,
5 # Comment
6 [BCLConvert_Data],,,,
7 Sample_ID,Index,Index2,,
8 sample1,GTAGCCCTGT,GAGCATCTAT,,
9 sample2,TCGGCTCTAC,CCGATGGTCT,,
10 ,,,,

View File

@@ -0,0 +1,10 @@
[foo],,,,
[somecontent],,,,
bar,lorem,,,
,,,,
# Comment
[BCLConvert_Data],,,,
Sample_ID,Index,Index2,,
sample1,,,,
,,,,
1 [foo],,,,
2 [somecontent],,,,
3 bar,lorem,,,
4 ,,,,
5 # Comment
6 [BCLConvert_Data],,,,
7 Sample_ID,Index,Index2,,
8 sample1,,,,
9 ,,,,

View File

@@ -41,6 +41,7 @@ argument_groups:
type: file
direction: output
required: false
multiple: true
default: "$id/qc/fastqc"
- name: "--output_multiqc"
description: Directory to write falco output to
@@ -53,6 +54,19 @@ argument_groups:
direction: "output"
required: true
default: "$id/run_information.csv"
- name: "--demultiplexer_logs"
type: file
direction: output
required: true
default: "$id/demultiplexer_logs"
- name: "Other arguments"
arguments:
- name: --skip_copycomplete_check
type: boolean_true
description: |
Disable the check for the presence of a "CopyComplete.txt" file in input
directory in case of Illumina data.
resources:
- type: nextflow_script
path: main.nf
@@ -87,7 +101,7 @@ repositories:
- name: bb
type: vsh
repo: biobox
tag: v0.3.0
tag: v0.3.1
runners:
- type: nextflow

View File

@@ -94,6 +94,10 @@ workflow run_wf {
// step based on the run dir, not the InterOp dir.
def interop_dir = state.input.resolve("InterOp")
assert interop_dir.isDirectory(): "Expected InterOp directory to be present."
def copycomplete_file = state.input.resolve("CopyComplete.txt")
assert (copycomplete_file.isFile() || state.skip_copycomplete_check):
"'CopyComplete.txt' file was not found!"
}
def resultState = state + newState
@@ -120,14 +124,15 @@ workflow run_wf {
bcl_input_directory: state.input,
sample_sheet: state.run_information,
output_directory: state.output,
reports: "reports",
logs: "logs"
reports: state.demultiplexer_logs,
logs: state.demultiplexer_logs,
]
},
toState: {id, result, state ->
def toAdd = [
"output_demultiplexer" : result.output_directory,
"run_id": id,
"demultiplexer_logs": result.reports,
]
def newState = state + toAdd
return newState
@@ -137,11 +142,15 @@ workflow run_wf {
| bases2fastq.run(
runIf: {id, state -> state.demultiplexer in ["bases2fastq"]},
directives: [label: ["highmem", "midcpu"]],
fromState: [
"analysis_directory": "input",
"run_manifest": "run_information",
"output_directory": "output",
],
fromState: { id, state ->
[
"analysis_directory": state.input,
"run_manifest": state.run_information,
"output_directory": state.output,
"report": state.demultiplexer_logs + "/report.html",
"logs": state.demultiplexer_logs,
]
},
args: [
"no_projects": true, // Do not put output files in a subfolder for project
//"split_lanes": true,
@@ -152,6 +161,8 @@ workflow run_wf {
def toAdd = [
"output_demultiplexer" : result.output_directory,
"run_id": id,
"demultiplexer_logs": result.logs,
]
def newState = state + toAdd
return newState
@@ -169,28 +180,12 @@ workflow run_wf {
)
output_ch = samples_ch
| combine_samples.run(
fromState: { id, state ->
[
"id": state.run_id,
"forward_input": state.fastq_forward,
"reverse_input": state.fastq_reverse,
]
},
toState: [
"forward_fastqs": "output_forward",
"reverse_fastqs": "output_reverse",
]
)
| falco.run(
directives: [label: ["lowcpu", "lowmem"]],
directives: [label: ["verylowcpu", "lowmem"]],
fromState: {id, state ->
reverse_fastqs_list = state.reverse_fastqs ? state.reverse_fastqs : []
[
"input": state.forward_fastqs + reverse_fastqs_list,
"outdir": "${state.output_falco}",
"input": [state.fastq_forward, state.fastq_reverse],
"outdir": "$id/qc/falco",
"summary_filename": null,
"report_filename": null,
"data_filename": null,
@@ -200,11 +195,28 @@ workflow run_wf {
state + [ "output_falco" : result.outdir ]
}
)
| combine_samples.run(
fromState: { id, state ->
[
"id": state.run_id,
"forward_input": state.fastq_forward,
"reverse_input": state.fastq_reverse,
"falco_dir": state.output_falco,
]
},
toState: [
"forward_fastqs": "output_forward",
"reverse_fastqs": "output_reverse",
"output_falco": "output_falco",
]
)
| multiqc.run(
directives: [label: ["midcpu", "midmem"]],
fromState: {id, state ->
def new_state = [
"input": [state.output_falco],
"input": state.output_falco,
"output_report": state.output_multiqc,
"cl_config": 'sp: {fastqc/data: {fn: "*_fastqc_data.txt"}}'
]
@@ -220,6 +232,7 @@ workflow run_wf {
state + [ "output_multiqc" : result.output_report ]
}
)
| setState(
[
//"_meta": "_meta",
@@ -227,6 +240,7 @@ workflow run_wf {
"output_falco": "output_falco",
"output_multiqc": "output_multiqc",
"output_run_information": "run_information",
"demultiplexer_logs": "demultiplexer_logs"
]
)

View File

@@ -23,9 +23,19 @@ workflow test_illumina {
assert output.size() == 2 : "outputs should contain two elements; [id, file]"
"Output: $output"
}
event_count_ch = output_ch
| toSortedList()
| map { state ->
assert state.size() == 1 : "Expected one event in the output channel"
}
assert_ch = output_ch
| map {id, state ->
assert state.output.isDirectory(): "Expected bclconvert output to be a directory"
assert state.output_falco.isDirectory(): "Expected falco output to be a directory"
state.output_falco.each{
assert it.isDirectory(): "Expected falco output to be a directory"
}
assert state.output_multiqc.isFile(): "Expected multiQC output to be a file"
fastq_files = state.output.listFiles().collect{it.name}
assert ["Undetermined_S0_L001_R1_001.fastq.gz", "Sample23_S3_L001_R1_001.fastq.gz",
@@ -55,6 +65,21 @@ workflow test_illumina {
|1,sampletest,PatientSample,UDP0004,ATTCCATAAG,TGCCTGGTGG
|""".stripMargin()
assert state.output_run_information.text.replaceAll("\r\n", "\n") == expected_run_information
println "ID: ${id}"
println "State: ${state}"
assert state.demultiplexer_logs.isDirectory():
"Expected BCL Convert reports to be a directory"
def logs_files = state.demultiplexer_logs.listFiles()
println "Logs files: ${logs_files}"
assert logs_files.size() > 0: "Expected BCL Convert logs dir to contain files"
assert logs_files.find { it.name == "Demultiplex_Stats.csv" }:
"Expected to find BCL Convert Demultiplex_Stats.csv"
assert logs_files.find { it.name == "Logs" }:
"Expected to find BCL Convert Logs directory"
}
}
@@ -76,7 +101,16 @@ workflow test_bases2fastq {
}
| map {id, state ->
assert state.output.isDirectory(): "Expected bases2fastq output to be a directory"
assert state.output_falco.isDirectory(): "Expected falco output to be a directory"
state.output_falco.each{assert it.isDirectory(): "Expected falco output to be a directory"}
assert state.output_multiqc.isFile(): "Expected multiQC output to be a file"
def logs_files = state.demultiplexer_logs.listFiles()
println "Logs files: ${logs_files}"
assert logs_files.size() > 0: "Expected bases2fastq logs dir to contain files"
assert logs_files.find { it.name == "report.html" } != null:
"Expected to find bases2fastq report.html"
assert logs_files.find { it.name == "info" }:
"Expected to find bases2fastq info directory"
}
}

View File

@@ -18,10 +18,14 @@ argument_groups:
direction: output
required: true
requirements:
- commands: ["summary", "index-summary"]
commands: ["summary", "index-summary"]
resources:
- type: bash_script
path: script.sh
test_resources:
- type: bash_script
path: test.sh
- path: /testData/iseq-DI
engines:
- type: docker
image: debian:stable-slim
@@ -38,4 +42,4 @@ engines:
runners:
- type: executable
- type: nextflow
- type: nextflow

View File

@@ -0,0 +1,18 @@
#!/usr/bin/env bash
set -eo pipefail
# create tempdir
echo ">>> Creating temporary test directory."
TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXX")
function clean_up {
[[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
}
trap clean_up EXIT
echo ">>> Created temporary directory '$TMPDIR'."
echo ">>> Run simple execution"
./$meta_functionality_name \
--input "$meta_resources_dir/iseq-DI" \
--output_run_summary "$TMPDIR/run_summary.csv" \
--output_index_summary "$TMPDIR/index_summary.csv"

View File

@@ -3,9 +3,9 @@
set -eo pipefail
declare -A input_output_mapping=(["par_input"]="par_output"
["par_input_falco"]="par_output_falco"
["par_input_multiqc"]="par_output_multiqc"
["par_input_run_information"]="par_output_run_information"
["par_input_demultiplexer_logs"]="par_output_demultiplexer_logs"
)
for input_argument_name in "${!input_output_mapping[@]}"
@@ -23,4 +23,12 @@ do
echo "Output files for $output_location:"
ls "$output_location"
done
echo "Grouping output from $par_input_falco into $par_output_falco"
mkdir -p "$par_output_falco"
IFS=";" read -ra falco_inputs <<< $par_input_falco
for falco_dir in "${falco_inputs[@]}"; do
echo "Copying contents of $falco_dir"
find -H -D exec "$falco_dir" -type f -maxdepth 1 -exec cp -t "$par_output_falco" {} +
done

View File

@@ -12,6 +12,7 @@ argument_groups:
description: Directory to write falco output to
type: file
required: true
multiple: true
- name: "--input_multiqc"
description: Location where to write the MultiQC report to.
type: file
@@ -20,6 +21,9 @@ argument_groups:
description: "Location where to write the run information to."
type: file
required: true
- name: "--input_demultiplexer_logs"
type: file
required: true
- name: Output arguments
arguments:
- name: --output
@@ -38,6 +42,10 @@ argument_groups:
type: file
direction: output
default: run_information.csv
- name: "--output_demultiplexer_logs"
type: file
direction: output
default: "demultiplexer_logs"
resources:
- type: bash_script

View File

@@ -49,7 +49,17 @@ argument_groups:
type: file
direction: output
default: "qc/multiqc_report.html"
- name: "--demultiplexer_logs"
type: file
direction: output
default: "demultiplexer_logs"
- name: "Other arguments"
arguments:
- name: --skip_copycomplete_check
type: boolean_true
description: |
Disable the check for the presence of a "CopyComplete.txt" file in input
directory in case of Illumina data.
resources:
- type: nextflow_script
path: main.nf

View File

@@ -24,9 +24,11 @@ workflow run_wf {
"input": state.input,
"run_information": state.run_information,
"demultiplexer": state.demultiplexer,
"output": "fastq",
"output_falco": "qc/fastqc",
"output_multiqc": "qc/multiqc_report.html",
"skip_copycomplete_check": state.skip_copycomplete_check,
"output": "$id/fastq",
"output_falco": "$id/qc/fastqc",
"output_multiqc": "$id/qc/multiqc_report.html",
"demultiplexer_logs": "$id/demultiplexer_logs",
]
if (state.run_information) {
state_to_pass += ["output_run_information": state.run_information.getName()]
@@ -47,6 +49,7 @@ workflow run_wf {
def falco_output_1 = (id2 == "run") ? state.falco_output : "${id2}/" + state.falco_output
def multiqc_output_1 = (id2 == "run") ? state.multiqc_output : "${id2}/" + state.multiqc_output
def run_information_output_1 = (id2 == "run") ? "${state.output_run_information.getName()}" : "${id2}/${state.output_run_information.getName()}"
def demultiplexer_logs_output = (id2 == "run") ? state.demultiplexer_logs : "${id2}/${state.demultiplexer_logs.getName()}"
if (id2 == "run") {
println("Publising to ${params.publish_dir}")
@@ -59,10 +62,12 @@ workflow run_wf {
input_falco: state.output_falco,
input_multiqc: state.output_multiqc,
input_run_information: state.output_run_information,
input_demultiplexer_logs: state.demultiplexer_logs,
output: fastq_output_1,
output_falco: falco_output_1,
output_multiqc: multiqc_output_1,
output_run_information: run_information_output_1,
output_demultiplexer_logs: demultiplexer_logs_output,
]
},
toState: { id, result, state -> [:] },

View File

@@ -1,5 +1,5 @@
name: "bases2fastq"
version: "v0.3.0"
version: "v0.3.1"
authors:
- name: "Dries Schaumont"
roles:
@@ -285,6 +285,9 @@ test_resources:
is_executable: true
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
@@ -367,7 +370,7 @@ engines:
id: "docker"
image: "elembio/bases2fastq:2.1.0"
target_registry: "images.viash-hub.com"
target_tag: "v0.3.0"
target_tag: "v0.3.1"
namespace_separator: "/"
setup:
- type: "apt"
@@ -393,23 +396,34 @@ build_info:
engine: "docker|native"
output: "target/nextflow/bases2fastq"
executable: "target/nextflow/bases2fastq/main.nf"
viash_version: "0.9.0"
git_commit: "d86bd5cf62104af02caa852aacd352b1aa97ed60"
git_remote: "https://x-access-token:ghs_EwAUAMYJ0K4VBHlAEMs4ZP2OyQYqJM0PSfEO@github.com/viash-hub/biobox"
git_tag: "v0.2.0-29-gd86bd5c"
viash_version: "0.9.4"
git_commit: "98a5f3cc745525a65c10263d25cf414eb1093223"
git_remote: "https://github.com/viash-hub/biobox"
git_tag: "v0.3.0-8-g98a5f3c"
package_config:
name: "biobox"
version: "v0.3.0"
description: "A collection of bioinformatics tools for working with sequence data.\n"
version: "v0.3.1"
summary: "A curated collection of high-quality, standalone bioinformatics components\
\ built with [Viash](https://viash.io).\n"
description: "`biobox` offers a suite of reliable bioinformatics components, similar\
\ to [nf-core/modules](https://github.com/nf-core/modules) and [snakemake-wrappers/bio](https://github.com/snakemake/snakemake-wrappers/tree/master/bio),\
\ but built using the [Viash](https://viash.io) framework.\n\nThis approach emphasizes\
\ **reusability**, **reproducibility**, and adherence to **best practices**. Key\
\ features of `biobox` components include:\n\n* **Standalone & Nextflow Ready:**\
\ Run components directly via the command line or seamlessly integrate them into\
\ Nextflow workflows.\n* **High Quality Standards:**\n * Comprehensive documentation\
\ for components and parameters.\n * Full exposure of underlying tool arguments.\n\
\ * Containerized (Docker) for dependency management and reproducibility.\n\
\ * Unit tested for verified functionality.\n"
info: null
viash_version: "0.9.0"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.0'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.1'"
keywords:
- "bioinformatics"
- "modules"

View File

@@ -1,6 +1,6 @@
// bases2fastq v0.3.0
// bases2fastq v0.3.1
//
// This wrapper script is auto-generated by viash 0.9.0 and is thus a derivative
// This wrapper script is auto-generated by viash 0.9.4 and is thus a derivative
// work thereof. This software comes with ABSOLUTELY NO WARRANTY from Data
// Intuitive.
//
@@ -85,64 +85,56 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
foundClass = "List[${e.foundClass}]"
}
} else if (par.type == "string") {
// cast to string if need be
// cast to string if need be. only cast if the value is a GString
if (value instanceof GString) {
value = value.toString()
value = value as String
}
expectedClass = value instanceof String ? null : "String"
} else if (par.type == "integer") {
// cast to integer if need be
if (value instanceof String) {
if (value !instanceof Integer) {
try {
value = value.toInteger()
value = value as Integer
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Integer"
}
}
if (value instanceof java.math.BigInteger) {
value = value.intValue()
}
expectedClass = value instanceof Integer ? null : "Integer"
} else if (par.type == "long") {
// cast to long if need be
if (value instanceof String) {
if (value !instanceof Long) {
try {
value = value.toLong()
value = value as Long
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Long"
}
}
if (value instanceof Integer) {
value = value.toLong()
}
expectedClass = value instanceof Long ? null : "Long"
} else if (par.type == "double") {
// cast to double if need be
if (value instanceof String) {
if (value !instanceof Double) {
try {
value = value.toDouble()
value = value as Double
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Double"
}
}
if (value instanceof java.math.BigDecimal) {
value = value.doubleValue()
} else if (par.type == "float") {
// cast to float if need be
if (value !instanceof Float) {
try {
value = value as Float
} catch (NumberFormatException e) {
expectedClass = "Float"
}
}
if (value instanceof Float) {
value = value.toDouble()
}
expectedClass = value instanceof Double ? null : "Double"
} else if (par.type == "boolean" | par.type == "boolean_true" | par.type == "boolean_false") {
// cast to boolean if need be
if (value instanceof String) {
def valueLower = value.toLowerCase()
if (valueLower == "true") {
value = true
} else if (valueLower == "false") {
value = false
if (value !instanceof Boolean) {
try {
value = value as Boolean
} catch (Exception e) {
expectedClass = "Boolean"
}
}
expectedClass = value instanceof Boolean ? null : "Boolean"
} else if (par.type == "file" && (par.direction == "input" || stage == "output")) {
// cast to path if need be
if (value instanceof String) {
@@ -154,10 +146,13 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
expectedClass = value instanceof Path ? null : "Path"
} else if (par.type == "file" && stage == "input" && par.direction == "output") {
// cast to string if need be
if (value instanceof GString) {
value = value.toString()
if (value !instanceof String) {
try {
value = value as String
} catch (Exception e) {
expectedClass = "String"
}
}
expectedClass = value instanceof String ? null : "String"
} else {
// didn't find a match for par.type
expectedClass = par.type
@@ -176,7 +171,7 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
Map _processInputValues(Map inputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.required) {
if (arg.required && arg.direction == "input") {
assert inputs.containsKey(arg.plainName) && inputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required input argument '${arg.plainName}' is missing"
}
@@ -195,15 +190,8 @@ Map _processInputValues(Map inputs, Map config, String id, String key) {
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/arguments/_processOutputValues.nf'
Map _processOutputValues(Map outputs, Map config, String id, String key) {
Map _checkValidOutputArgument(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
outputs = outputs.collectEntries { name, value ->
def par = config.allArguments.find { it.plainName == name && it.direction == "output" }
assert par != null : "Error in module '${key}' id '${id}': '${name}' is not a valid output argument"
@@ -216,6 +204,16 @@ Map _processOutputValues(Map outputs, Map config, String id, String key) {
return outputs
}
void _checkAllRequiredOuputsPresent(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
}
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/channel/IDChecker.nf'
class IDChecker {
final def items = [] as Set
@@ -1669,6 +1667,162 @@ def joinStates(Closure apply_) {
}
return joinStatesWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishFiles.nf'
def publishFiles(Map args) {
def key_ = args.get("key")
assert key_ != null : "publishFiles: key must be specified"
workflow publishFilesWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1]
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
[id_, inputFiles_, outputFilenames_]
}
| publishFilesProc
emit: input_ch
}
return publishFilesWf
}
process publishFilesProc {
// todo: check publishpath?
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
output:
tuple val(id), path{outputFiles}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
}
// this assumes that the state contains no other values other than those specified in the config
def publishFilesByConfig(Map args) {
def config = args.get("config")
assert config != null : "publishFilesByConfig: config must be specified"
def key_ = args.get("key", config.name)
assert key_ != null : "publishFilesByConfig: key must be specified"
workflow publishFilesSimpleWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1] // e.g. [output: new File("myoutput.h5ad"), k: 10]
def origState_ = tup[2] // e.g. [output: '$id.$key.foo.h5ad']
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
.collectMany { par ->
def plainName_ = par.plainName
// if the state does not contain the key, it's an
// optional argument for which the component did
// not generate any output OR multiple channels were emitted
// and the output was just not added to using the channel
// that is now being parsed
if (!state_.containsKey(plainName_)) {
return []
}
def value = state_[plainName_]
// if the parameter is not a file, it should be stored
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[inputPath: [], outputFilename: []]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
// that it should not be returned as a state
if (!origState_.containsKey(plainName_)) {
return []
}
def filenameTemplate = origState_[plainName_]
// if the pararameter is multiple: true, fetch the template
if (par.multiple && filenameTemplate instanceof List) {
filenameTemplate = filenameTemplate[0]
}
// instantiate the template
def filename = filenameTemplate
.replaceAll('\\$id', id_)
.replaceAll('\\$\\{id\\}', id_)
.replaceAll('\\$key', key_)
.replaceAll('\\$\\{key\\}', key_)
if (par.multiple) {
// if the parameter is multiple: true, the filename
// should contain a wildcard '*' that is replaced with
// the index of the file
assert filename.contains("*") : "Module '${key_}' id '${id_}': Multiple output files specified, but no wildcard '*' in the filename: ${filename}"
def outputPerFile = value.withIndex().collect{ val, ix ->
def filename_ix = filename.replace("*", ix.toString())
def inputPath = val instanceof File ? val.toPath() : val
[inputPath: inputPath, outputFilename: filename_ix]
}
def transposedOutputs = ["inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
} else {
def value_ = java.nio.file.Paths.get(filename)
def inputPath = value instanceof File ? value.toPath() : value
return [[inputPath: [inputPath], outputFilename: [filename]]]
}
}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
[id_, inputPaths, outputFilenames]
}
| publishFilesProc
emit: input_ch
}
return publishFilesSimpleWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishStates.nf'
def collectFiles(obj) {
if (obj instanceof java.io.File || obj instanceof Path) {
@@ -1726,8 +1880,6 @@ def publishStates(Map args) {
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
def yamlFilename = yamlTemplate_
.replaceAll('\\$id', id_)
@@ -1740,7 +1892,7 @@ def publishStates(Map args) {
// convert state to yaml blob
def yamlBlob_ = toRelativeTaggedYamlBlob([id: id_] + state_, java.nio.file.Paths.get(yamlFilename))
[id_, yamlBlob_, yamlFilename, inputFiles_, outputFilenames_]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -1752,33 +1904,17 @@ process publishStatesProc {
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), val(yamlBlob), val(yamlFile), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
tuple val(id), val(yamlBlob), val(yamlFile)
output:
tuple val(id), path{[yamlFile] + outputFiles}
tuple val(id), path{[yamlFile]}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
echo '${yamlBlob}' > '${yamlFile}'
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
cat > '${yamlFile}' << HERE
${yamlBlob}
HERE
"""
}
@@ -1809,13 +1945,10 @@ def publishStatesByConfig(Map args) {
.replaceAll('\\$\\{key\\}', key_)
def yamlDir = java.nio.file.Paths.get(yamlFilename).getParent()
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// the processed state is a list of [key, value] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (key, value) are the tuples that will be saved to the state.yaml file
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
@@ -1832,7 +1965,7 @@ def publishStatesByConfig(Map args) {
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[key: plainName_, value: value, inputPath: [], outputFilename: []]]
return [[key: plainName_, value: value]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
@@ -1863,13 +1996,9 @@ def publishStatesByConfig(Map args) {
if (yamlDir != null) {
value_ = yamlDir.relativize(value_)
}
def inputPath = val instanceof File ? val.toPath() : val
[value: value_, inputPath: inputPath, outputFilename: filename_ix]
return value_
}
def transposedOutputs = ["value", "inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
return [["key": plainName_, "value": outputPerFile]]
} else {
def value_ = java.nio.file.Paths.get(filename)
// if id contains a slash
@@ -1877,18 +2006,17 @@ def publishStatesByConfig(Map args) {
value_ = yamlDir.relativize(value_)
}
def inputPath = value instanceof File ? value.toPath() : value
return [[key: plainName_, value: value_, inputPath: [inputPath], outputFilename: [filename]]]
return [["key": plainName_, value: value_]]
}
}
def updatedState_ = processedState.collectEntries{[it.key, it.value]}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
// convert state to yaml blob
def yamlBlob_ = toTaggedYamlBlob([id: id_] + updatedState_)
[id_, yamlBlob_, yamlFilename, inputPaths, outputFilenames]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -2562,7 +2690,8 @@ def _debug(workflowArgs, debugKey) {
def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
def workflowArgs = processWorkflowArgs(args, defaultWfArgs, meta)
def key_ = workflowArgs["key"]
def multipleArgs = meta.config.allArguments.findAll{ it.multiple }.collect{it.plainName}
workflow workflowInstance {
take: input_
@@ -2719,12 +2848,36 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
// TODO: move some of the _meta.join_id wrangling to the safeJoin() function.
def chInitialOutput = chArgsWithDefaults
def chInitialOutputMulti = chArgsWithDefaults
| _debug(workflowArgs, "processed")
// run workflow
| innerWorkflowFactory(workflowArgs)
// check output tuple
| map { id_, output_ ->
def chInitialOutputList = chInitialOutputMulti instanceof List ? chInitialOutputMulti : [chInitialOutputMulti]
assert chInitialOutputList.size() > 0: "should have emitted at least one output channel"
// Add a channel ID to the events, which designates the channel the event was emitted from as a running number
// This number is used to sort the events later when the events are gathered from across the channels.
def chInitialOutputListWithIndexedEvents = chInitialOutputList.withIndex().collect{channel, channelIndex ->
def newChannel = channel
| map {tuple ->
assert tuple instanceof List :
"Error in module '${key_}': element in output channel should be a tuple [id, data, ...otherargs...]\n" +
" Example: [\"id\", [input: file('foo.txt'), arg: 10]].\n" +
" Expected class: List. Found: tuple.getClass() is ${tuple.getClass()}"
def newEvent = [channelIndex] + tuple
return newEvent
}
return newChannel
}
// Put the events into 1 channel, cover case where there is only one channel is emitted
def chInitialOutput = chInitialOutputList.size() > 1 ? \
chInitialOutputListWithIndexedEvents[0].mix(*chInitialOutputListWithIndexedEvents.tail()) : \
chInitialOutputListWithIndexedEvents[0]
def chInitialOutputProcessed = chInitialOutput
| map { tuple ->
def channelId = tuple[0]
def id_ = tuple[1]
def output_ = tuple[2]
// see if output map contains metadata
def meta_ =
@@ -2737,19 +2890,94 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
output_ = output_.findAll{k, v -> k != "_meta"}
// check value types
output_ = _processOutputValues(output_, meta.config, id_, key_)
output_ = _checkValidOutputArgument(output_, meta.config, id_, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && output_.size() == 1) {
output_ = output_.values()[0]
}
[join_id, id_, output_]
[join_id, channelId, id_, output_]
}
// | view{"chInitialOutput: ${it.take(3)}"}
// join the output [prev_id, channel_id, new_id, output] with the previous state [prev_id, state, ...]
def chPublishWithPreviousState = safeJoin(chInitialOutputProcessed, chRunFiltered, key_)
// input tuple format: [join_id, channel_id, id, output, prev_state, ...]
// output tuple format: [join_id, channel_id, id, new_state, ...]
| map{ tup ->
def new_state = workflowArgs.toState(tup.drop(2).take(3))
tup.take(3) + [new_state] + tup.drop(5)
}
if (workflowArgs.auto.publish == "state") {
def chPublishFiles = chPublishWithPreviousState
// input tuple format: [join_id, channel_id, id, new_state, ...]
// output tuple format: [join_id, channel_id, id, new_state]
| map{ tup ->
tup.take(4)
}
safeJoin(chPublishFiles, chArgsWithDefaults, key_)
// input tuple format: [join_id, channel_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(2).take(3)
}
| publishFilesByConfig(key: key_, config: meta.config)
}
// Join the state from the events that were emitted from different channels
def chJoined = chInitialOutputProcessed
| map {tuple ->
def join_id = tuple[0]
def channel_id = tuple[1]
def id = tuple[2]
def other = tuple.drop(3)
// Below, groupTuple is used to join the events. To make sure resuming a workflow
// keeps working, the output state must be deterministic. This means the state needs to be
// sorted with groupTuple's has a 'sort' argument. This argument can be set to 'hash',
// but hashing the state when it is large can be problematic in terms of performance.
// Therefore, a custom comparator function is provided. We add the channel ID to the
// states so that we can use the channel ID to sort the items.
def stateWithChannelID = [[channel_id] * other.size(), other].transpose()
// A comparator that is provided to groupTuple's 'sort' argument is applied
// to all elements of the event tuple (that is not the 'id'). The comparator
// closure that is used below expects the input to be List. So the join_id and
// channel_id must also be wrapped in a list.
[[join_id], [channel_id], id] + stateWithChannelID
}
| groupTuple(by: 2, sort: {a, b -> a[0] <=> b[0]}, size: chInitialOutputList.size(), remainder: true)
| map {join_ids, _, id, statesWithChannelID ->
// Remove the channel IDs from the states
def states = statesWithChannelID.collect{it[1]}
def newJoinId = join_ids.flatten().unique{a, b -> a <=> b}
assert newJoinId.size() == 1: "Multiple events were emitted for '$id'."
def newJoinIdUnique = newJoinId[0]
// Merge the states from the different channels
def newState = states.inject([:]){ old_state, state_to_add ->
return old_state + state_to_add.collectEntries{k, v ->
if (!multipleArgs.contains(k)) {
// if the key is not a multiple argument, we expect only one value
if (old_state.containsKey(k)) {
assert old_state[k] == v : "ID $id: multiple entries for argument $k were emitted."
}
[k, v]
} else {
// if the key is a multiple argument, append the different values into one list
def prevValue = old_state.getOrDefault(k, [])
def prevValueAsList = prevValue instanceof List ? prevValue : [prevValue]
[k, prevValueAsList + v]
}
}
}
_checkAllRequiredOuputsPresent(newState, meta.config, id, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && newState.size() == 1) {
newState = newState.values()[0]
}
return [newJoinIdUnique, id, newState]
}
// join the output [prev_id, new_id, output] with the previous state [prev_id, state, ...]
def chNewState = safeJoin(chInitialOutput, chRunFiltered, key_)
def chNewState = safeJoin(chJoined, chRunFiltered, key_)
// input tuple format: [join_id, id, output, prev_state, ...]
// output tuple format: [join_id, id, new_state, ...]
| map{ tup ->
@@ -2758,23 +2986,21 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
if (workflowArgs.auto.publish == "state") {
def chPublish = chNewState
def chPublishStates = chNewState
// input tuple format: [join_id, id, new_state, ...]
// output tuple format: [join_id, id, new_state]
| map{ tup ->
tup.take(3)
}
safeJoin(chPublish, chArgsWithDefaults, key_)
safeJoin(chPublishStates, chArgsWithDefaults, key_)
// input tuple format: [join_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(1).take(3)
}
}
| publishStatesByConfig(key: key_, config: meta.config)
}
// remove join_id and meta
chReturn = chNewState
| map { tup ->
// input tuple format: [join_id, id, new_state, ...]
@@ -2808,7 +3034,7 @@ meta = [
"resources_dir": moduleDir.toRealPath().normalize(),
"config": processConfig(readJsonBlob('''{
"name" : "bases2fastq",
"version" : "v0.3.0",
"version" : "v0.3.1",
"authors" : [
{
"name" : "Dries Schaumont",
@@ -3149,6 +3375,10 @@ meta = [
}
],
"status" : "enabled",
"scope" : {
"image" : "public",
"target" : "public"
},
"requirements" : {
"commands" : [
"ps"
@@ -3245,7 +3475,7 @@ meta = [
"id" : "docker",
"image" : "elembio/bases2fastq:2.1.0",
"target_registry" : "images.viash-hub.com",
"target_tag" : "v0.3.0",
"target_tag" : "v0.3.1",
"namespace_separator" : "/",
"setup" : [
{
@@ -3283,23 +3513,24 @@ meta = [
"runner" : "nextflow",
"engine" : "docker|native",
"output" : "target/nextflow/bases2fastq",
"viash_version" : "0.9.0",
"git_commit" : "d86bd5cf62104af02caa852aacd352b1aa97ed60",
"git_remote" : "https://x-access-token:ghs_EwAUAMYJ0K4VBHlAEMs4ZP2OyQYqJM0PSfEO@github.com/viash-hub/biobox",
"git_tag" : "v0.2.0-29-gd86bd5c"
"viash_version" : "0.9.4",
"git_commit" : "98a5f3cc745525a65c10263d25cf414eb1093223",
"git_remote" : "https://github.com/viash-hub/biobox",
"git_tag" : "v0.3.0-8-g98a5f3c"
},
"package_config" : {
"name" : "biobox",
"version" : "v0.3.0",
"description" : "A collection of bioinformatics tools for working with sequence data.\n",
"viash_version" : "0.9.0",
"version" : "v0.3.1",
"summary" : "A curated collection of high-quality, standalone bioinformatics components built with [Viash](https://viash.io).\n",
"description" : "`biobox` offers a suite of reliable bioinformatics components, similar to [nf-core/modules](https://github.com/nf-core/modules) and [snakemake-wrappers/bio](https://github.com/snakemake/snakemake-wrappers/tree/master/bio), but built using the [Viash](https://viash.io) framework.\n\nThis approach emphasizes **reusability**, **reproducibility**, and adherence to **best practices**. Key features of `biobox` components include:\n\n* **Standalone & Nextflow Ready:** Run components directly via the command line or seamlessly integrate them into Nextflow workflows.\n* **High Quality Standards:**\n * Comprehensive documentation for components and parameters.\n * Full exposure of underlying tool arguments.\n * Containerized (Docker) for dependency management and reproducibility.\n * Unit tested for verified functionality.\n",
"viash_version" : "0.9.4",
"source" : "src",
"target" : "target",
"config_mods" : [
".requirements.commands := ['ps']\n",
".engines += { type: \\"native\\" }",
".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'",
".engines[.type == 'docker'].target_tag := 'v0.3.0'"
".engines[.type == 'docker'].target_tag := 'v0.3.1'"
],
"keywords" : [
"bioinformatics",
@@ -3816,7 +4047,7 @@ def _vdsl3ProcessFactory(Map workflowArgs, Map meta, String rawScript) {
// create process from temp file
def binding = new nextflow.script.ScriptBinding([:])
def session = nextflow.Nextflow.getSession()
def parser = new nextflow.script.ScriptParser(session)
def parser = _getScriptLoader(session)
.setModule(true)
.setBinding(binding)
def moduleScript = parser.runScript(tempFile)
@@ -3830,6 +4061,27 @@ def _vdsl3ProcessFactory(Map workflowArgs, Map meta, String rawScript) {
return scriptMeta.getProcess(procKey)
}
// use Reflection to get a ScriptParser / ScriptLoader
// <25.02.0-edge: new nextflow.script.ScriptParser(session)
// >=25.02.0-edge: nextflow.script.ScriptLoaderFactory.create(session)
def _getScriptLoader(nextflow.Session session) {
// try using the old method
try {
Class<?> scriptParserClass = Class.forName('nextflow.script.ScriptParser')
return scriptParserClass.getDeclaredConstructor(nextflow.Session).newInstance(session)
} catch (ClassNotFoundException e) {
// else try with the new method
try {
Class<?> scriptLoaderFactoryClass = Class.forName('nextflow.script.ScriptLoaderFactory')
def createMethod = scriptLoaderFactoryClass.getDeclaredMethod('create', nextflow.Session)
return createMethod.invoke(null, session) // null because create is static
} catch (ClassNotFoundException | NoSuchMethodException | IllegalAccessException | java.lang.reflect.InvocationTargetException e2) {
// Handle the case where neither class is found
throw new Exception("Neither nextflow.script.ScriptParser nor nextflow.script.ScriptLoaderFactory could be found. Is this a compatible Nextflow version?", e2)
}
}
}
// defaults
meta["defaults"] = [
// key to be used to trace the process and determine output names
@@ -3843,7 +4095,7 @@ meta["defaults"] = [
"container" : {
"registry" : "images.viash-hub.com",
"image" : "vsh/biobox/bases2fastq",
"tag" : "v0.3.0"
"tag" : "v0.3.1"
},
"tag" : "$id"
}'''),

View File

@@ -2,7 +2,7 @@ manifest {
name = 'bases2fastq'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.3.0'
version = 'v0.3.1'
description = 'Bases2Fastq demultiplexes sequencing data generated by Element Biosciences instruments and converts base calls into FASTQ files.\n'
author = 'Dries Schaumont'
}

View File

@@ -310,10 +310,10 @@
"output_directory": {
"type":
"string",
"description": "Type: `file`, required, default: `$id.$key.output_directory.output_directory`, example: `fastq_dir`. Location to save output fastqs",
"help_text": "Type: `file`, required, default: `$id.$key.output_directory.output_directory`, example: `fastq_dir`. Location to save output fastqs"
"description": "Type: `file`, required, default: `$id.$key.output_directory`, example: `fastq_dir`. Location to save output fastqs",
"help_text": "Type: `file`, required, default: `$id.$key.output_directory`, example: `fastq_dir`. Location to save output fastqs"
,
"default":"$id.$key.output_directory.output_directory"
"default":"$id.$key.output_directory"
}
@@ -321,10 +321,10 @@
"report": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.report.report`. Output location for the HTML report",
"help_text": "Type: `file`, default: `$id.$key.report.report`. Output location for the HTML report"
"description": "Type: `file`, default: `$id.$key.report`. Output location for the HTML report",
"help_text": "Type: `file`, default: `$id.$key.report`. Output location for the HTML report"
,
"default":"$id.$key.report.report"
"default":"$id.$key.report"
}
@@ -332,10 +332,10 @@
"logs": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.logs.logs`, example: `logs_dir`. Directory containing log files",
"help_text": "Type: `file`, default: `$id.$key.logs.logs`, example: `logs_dir`. Directory containing log files"
"description": "Type: `file`, default: `$id.$key.logs`, example: `logs_dir`. Directory containing log files",
"help_text": "Type: `file`, default: `$id.$key.logs`, example: `logs_dir`. Directory containing log files"
,
"default":"$id.$key.logs.logs"
"default":"$id.$key.logs"
}

View File

@@ -1,5 +1,5 @@
name: "bcl_convert"
version: "v0.3.0"
version: "v0.3.1"
authors:
- name: "Toni Verbeiren"
roles:
@@ -290,6 +290,16 @@ argument_groups:
direction: "output"
multiple: false
multiple_sep: ";"
- type: "boolean"
name: "--force"
description: "Allow destination directory to already exist and overwrite files.\n"
info: null
example:
- true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "script.sh"
@@ -303,6 +313,9 @@ test_resources:
is_executable: true
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
@@ -386,7 +399,7 @@ engines:
id: "docker"
image: "debian:trixie-slim"
target_registry: "images.viash-hub.com"
target_tag: "v0.3.0"
target_tag: "v0.3.1"
namespace_separator: "/"
setup:
- type: "apt"
@@ -417,23 +430,34 @@ build_info:
engine: "docker|native"
output: "target/nextflow/bcl_convert"
executable: "target/nextflow/bcl_convert/main.nf"
viash_version: "0.9.0"
git_commit: "d86bd5cf62104af02caa852aacd352b1aa97ed60"
git_remote: "https://x-access-token:ghs_EwAUAMYJ0K4VBHlAEMs4ZP2OyQYqJM0PSfEO@github.com/viash-hub/biobox"
git_tag: "v0.2.0-29-gd86bd5c"
viash_version: "0.9.4"
git_commit: "98a5f3cc745525a65c10263d25cf414eb1093223"
git_remote: "https://github.com/viash-hub/biobox"
git_tag: "v0.3.0-8-g98a5f3c"
package_config:
name: "biobox"
version: "v0.3.0"
description: "A collection of bioinformatics tools for working with sequence data.\n"
version: "v0.3.1"
summary: "A curated collection of high-quality, standalone bioinformatics components\
\ built with [Viash](https://viash.io).\n"
description: "`biobox` offers a suite of reliable bioinformatics components, similar\
\ to [nf-core/modules](https://github.com/nf-core/modules) and [snakemake-wrappers/bio](https://github.com/snakemake/snakemake-wrappers/tree/master/bio),\
\ but built using the [Viash](https://viash.io) framework.\n\nThis approach emphasizes\
\ **reusability**, **reproducibility**, and adherence to **best practices**. Key\
\ features of `biobox` components include:\n\n* **Standalone & Nextflow Ready:**\
\ Run components directly via the command line or seamlessly integrate them into\
\ Nextflow workflows.\n* **High Quality Standards:**\n * Comprehensive documentation\
\ for components and parameters.\n * Full exposure of underlying tool arguments.\n\
\ * Containerized (Docker) for dependency management and reproducibility.\n\
\ * Unit tested for verified functionality.\n"
info: null
viash_version: "0.9.0"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.0'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.1'"
keywords:
- "bioinformatics"
- "modules"

View File

@@ -1,6 +1,6 @@
// bcl_convert v0.3.0
// bcl_convert v0.3.1
//
// This wrapper script is auto-generated by viash 0.9.0 and is thus a derivative
// This wrapper script is auto-generated by viash 0.9.4 and is thus a derivative
// work thereof. This software comes with ABSOLUTELY NO WARRANTY from Data
// Intuitive.
//
@@ -86,64 +86,56 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
foundClass = "List[${e.foundClass}]"
}
} else if (par.type == "string") {
// cast to string if need be
// cast to string if need be. only cast if the value is a GString
if (value instanceof GString) {
value = value.toString()
value = value as String
}
expectedClass = value instanceof String ? null : "String"
} else if (par.type == "integer") {
// cast to integer if need be
if (value instanceof String) {
if (value !instanceof Integer) {
try {
value = value.toInteger()
value = value as Integer
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Integer"
}
}
if (value instanceof java.math.BigInteger) {
value = value.intValue()
}
expectedClass = value instanceof Integer ? null : "Integer"
} else if (par.type == "long") {
// cast to long if need be
if (value instanceof String) {
if (value !instanceof Long) {
try {
value = value.toLong()
value = value as Long
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Long"
}
}
if (value instanceof Integer) {
value = value.toLong()
}
expectedClass = value instanceof Long ? null : "Long"
} else if (par.type == "double") {
// cast to double if need be
if (value instanceof String) {
if (value !instanceof Double) {
try {
value = value.toDouble()
value = value as Double
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Double"
}
}
if (value instanceof java.math.BigDecimal) {
value = value.doubleValue()
} else if (par.type == "float") {
// cast to float if need be
if (value !instanceof Float) {
try {
value = value as Float
} catch (NumberFormatException e) {
expectedClass = "Float"
}
}
if (value instanceof Float) {
value = value.toDouble()
}
expectedClass = value instanceof Double ? null : "Double"
} else if (par.type == "boolean" | par.type == "boolean_true" | par.type == "boolean_false") {
// cast to boolean if need be
if (value instanceof String) {
def valueLower = value.toLowerCase()
if (valueLower == "true") {
value = true
} else if (valueLower == "false") {
value = false
if (value !instanceof Boolean) {
try {
value = value as Boolean
} catch (Exception e) {
expectedClass = "Boolean"
}
}
expectedClass = value instanceof Boolean ? null : "Boolean"
} else if (par.type == "file" && (par.direction == "input" || stage == "output")) {
// cast to path if need be
if (value instanceof String) {
@@ -155,10 +147,13 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
expectedClass = value instanceof Path ? null : "Path"
} else if (par.type == "file" && stage == "input" && par.direction == "output") {
// cast to string if need be
if (value instanceof GString) {
value = value.toString()
if (value !instanceof String) {
try {
value = value as String
} catch (Exception e) {
expectedClass = "String"
}
}
expectedClass = value instanceof String ? null : "String"
} else {
// didn't find a match for par.type
expectedClass = par.type
@@ -177,7 +172,7 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
Map _processInputValues(Map inputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.required) {
if (arg.required && arg.direction == "input") {
assert inputs.containsKey(arg.plainName) && inputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required input argument '${arg.plainName}' is missing"
}
@@ -196,15 +191,8 @@ Map _processInputValues(Map inputs, Map config, String id, String key) {
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/arguments/_processOutputValues.nf'
Map _processOutputValues(Map outputs, Map config, String id, String key) {
Map _checkValidOutputArgument(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
outputs = outputs.collectEntries { name, value ->
def par = config.allArguments.find { it.plainName == name && it.direction == "output" }
assert par != null : "Error in module '${key}' id '${id}': '${name}' is not a valid output argument"
@@ -217,6 +205,16 @@ Map _processOutputValues(Map outputs, Map config, String id, String key) {
return outputs
}
void _checkAllRequiredOuputsPresent(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
}
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/channel/IDChecker.nf'
class IDChecker {
final def items = [] as Set
@@ -1670,6 +1668,162 @@ def joinStates(Closure apply_) {
}
return joinStatesWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishFiles.nf'
def publishFiles(Map args) {
def key_ = args.get("key")
assert key_ != null : "publishFiles: key must be specified"
workflow publishFilesWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1]
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
[id_, inputFiles_, outputFilenames_]
}
| publishFilesProc
emit: input_ch
}
return publishFilesWf
}
process publishFilesProc {
// todo: check publishpath?
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
output:
tuple val(id), path{outputFiles}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
}
// this assumes that the state contains no other values other than those specified in the config
def publishFilesByConfig(Map args) {
def config = args.get("config")
assert config != null : "publishFilesByConfig: config must be specified"
def key_ = args.get("key", config.name)
assert key_ != null : "publishFilesByConfig: key must be specified"
workflow publishFilesSimpleWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1] // e.g. [output: new File("myoutput.h5ad"), k: 10]
def origState_ = tup[2] // e.g. [output: '$id.$key.foo.h5ad']
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
.collectMany { par ->
def plainName_ = par.plainName
// if the state does not contain the key, it's an
// optional argument for which the component did
// not generate any output OR multiple channels were emitted
// and the output was just not added to using the channel
// that is now being parsed
if (!state_.containsKey(plainName_)) {
return []
}
def value = state_[plainName_]
// if the parameter is not a file, it should be stored
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[inputPath: [], outputFilename: []]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
// that it should not be returned as a state
if (!origState_.containsKey(plainName_)) {
return []
}
def filenameTemplate = origState_[plainName_]
// if the pararameter is multiple: true, fetch the template
if (par.multiple && filenameTemplate instanceof List) {
filenameTemplate = filenameTemplate[0]
}
// instantiate the template
def filename = filenameTemplate
.replaceAll('\\$id', id_)
.replaceAll('\\$\\{id\\}', id_)
.replaceAll('\\$key', key_)
.replaceAll('\\$\\{key\\}', key_)
if (par.multiple) {
// if the parameter is multiple: true, the filename
// should contain a wildcard '*' that is replaced with
// the index of the file
assert filename.contains("*") : "Module '${key_}' id '${id_}': Multiple output files specified, but no wildcard '*' in the filename: ${filename}"
def outputPerFile = value.withIndex().collect{ val, ix ->
def filename_ix = filename.replace("*", ix.toString())
def inputPath = val instanceof File ? val.toPath() : val
[inputPath: inputPath, outputFilename: filename_ix]
}
def transposedOutputs = ["inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
} else {
def value_ = java.nio.file.Paths.get(filename)
def inputPath = value instanceof File ? value.toPath() : value
return [[inputPath: [inputPath], outputFilename: [filename]]]
}
}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
[id_, inputPaths, outputFilenames]
}
| publishFilesProc
emit: input_ch
}
return publishFilesSimpleWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishStates.nf'
def collectFiles(obj) {
if (obj instanceof java.io.File || obj instanceof Path) {
@@ -1727,8 +1881,6 @@ def publishStates(Map args) {
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
def yamlFilename = yamlTemplate_
.replaceAll('\\$id', id_)
@@ -1741,7 +1893,7 @@ def publishStates(Map args) {
// convert state to yaml blob
def yamlBlob_ = toRelativeTaggedYamlBlob([id: id_] + state_, java.nio.file.Paths.get(yamlFilename))
[id_, yamlBlob_, yamlFilename, inputFiles_, outputFilenames_]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -1753,33 +1905,17 @@ process publishStatesProc {
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), val(yamlBlob), val(yamlFile), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
tuple val(id), val(yamlBlob), val(yamlFile)
output:
tuple val(id), path{[yamlFile] + outputFiles}
tuple val(id), path{[yamlFile]}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
echo '${yamlBlob}' > '${yamlFile}'
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
cat > '${yamlFile}' << HERE
${yamlBlob}
HERE
"""
}
@@ -1810,13 +1946,10 @@ def publishStatesByConfig(Map args) {
.replaceAll('\\$\\{key\\}', key_)
def yamlDir = java.nio.file.Paths.get(yamlFilename).getParent()
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// the processed state is a list of [key, value] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (key, value) are the tuples that will be saved to the state.yaml file
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
@@ -1833,7 +1966,7 @@ def publishStatesByConfig(Map args) {
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[key: plainName_, value: value, inputPath: [], outputFilename: []]]
return [[key: plainName_, value: value]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
@@ -1864,13 +1997,9 @@ def publishStatesByConfig(Map args) {
if (yamlDir != null) {
value_ = yamlDir.relativize(value_)
}
def inputPath = val instanceof File ? val.toPath() : val
[value: value_, inputPath: inputPath, outputFilename: filename_ix]
return value_
}
def transposedOutputs = ["value", "inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
return [["key": plainName_, "value": outputPerFile]]
} else {
def value_ = java.nio.file.Paths.get(filename)
// if id contains a slash
@@ -1878,18 +2007,17 @@ def publishStatesByConfig(Map args) {
value_ = yamlDir.relativize(value_)
}
def inputPath = value instanceof File ? value.toPath() : value
return [[key: plainName_, value: value_, inputPath: [inputPath], outputFilename: [filename]]]
return [["key": plainName_, value: value_]]
}
}
def updatedState_ = processedState.collectEntries{[it.key, it.value]}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
// convert state to yaml blob
def yamlBlob_ = toTaggedYamlBlob([id: id_] + updatedState_)
[id_, yamlBlob_, yamlFilename, inputPaths, outputFilenames]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -2563,7 +2691,8 @@ def _debug(workflowArgs, debugKey) {
def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
def workflowArgs = processWorkflowArgs(args, defaultWfArgs, meta)
def key_ = workflowArgs["key"]
def multipleArgs = meta.config.allArguments.findAll{ it.multiple }.collect{it.plainName}
workflow workflowInstance {
take: input_
@@ -2720,12 +2849,36 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
// TODO: move some of the _meta.join_id wrangling to the safeJoin() function.
def chInitialOutput = chArgsWithDefaults
def chInitialOutputMulti = chArgsWithDefaults
| _debug(workflowArgs, "processed")
// run workflow
| innerWorkflowFactory(workflowArgs)
// check output tuple
| map { id_, output_ ->
def chInitialOutputList = chInitialOutputMulti instanceof List ? chInitialOutputMulti : [chInitialOutputMulti]
assert chInitialOutputList.size() > 0: "should have emitted at least one output channel"
// Add a channel ID to the events, which designates the channel the event was emitted from as a running number
// This number is used to sort the events later when the events are gathered from across the channels.
def chInitialOutputListWithIndexedEvents = chInitialOutputList.withIndex().collect{channel, channelIndex ->
def newChannel = channel
| map {tuple ->
assert tuple instanceof List :
"Error in module '${key_}': element in output channel should be a tuple [id, data, ...otherargs...]\n" +
" Example: [\"id\", [input: file('foo.txt'), arg: 10]].\n" +
" Expected class: List. Found: tuple.getClass() is ${tuple.getClass()}"
def newEvent = [channelIndex] + tuple
return newEvent
}
return newChannel
}
// Put the events into 1 channel, cover case where there is only one channel is emitted
def chInitialOutput = chInitialOutputList.size() > 1 ? \
chInitialOutputListWithIndexedEvents[0].mix(*chInitialOutputListWithIndexedEvents.tail()) : \
chInitialOutputListWithIndexedEvents[0]
def chInitialOutputProcessed = chInitialOutput
| map { tuple ->
def channelId = tuple[0]
def id_ = tuple[1]
def output_ = tuple[2]
// see if output map contains metadata
def meta_ =
@@ -2738,19 +2891,94 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
output_ = output_.findAll{k, v -> k != "_meta"}
// check value types
output_ = _processOutputValues(output_, meta.config, id_, key_)
output_ = _checkValidOutputArgument(output_, meta.config, id_, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && output_.size() == 1) {
output_ = output_.values()[0]
}
[join_id, id_, output_]
[join_id, channelId, id_, output_]
}
// | view{"chInitialOutput: ${it.take(3)}"}
// join the output [prev_id, channel_id, new_id, output] with the previous state [prev_id, state, ...]
def chPublishWithPreviousState = safeJoin(chInitialOutputProcessed, chRunFiltered, key_)
// input tuple format: [join_id, channel_id, id, output, prev_state, ...]
// output tuple format: [join_id, channel_id, id, new_state, ...]
| map{ tup ->
def new_state = workflowArgs.toState(tup.drop(2).take(3))
tup.take(3) + [new_state] + tup.drop(5)
}
if (workflowArgs.auto.publish == "state") {
def chPublishFiles = chPublishWithPreviousState
// input tuple format: [join_id, channel_id, id, new_state, ...]
// output tuple format: [join_id, channel_id, id, new_state]
| map{ tup ->
tup.take(4)
}
safeJoin(chPublishFiles, chArgsWithDefaults, key_)
// input tuple format: [join_id, channel_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(2).take(3)
}
| publishFilesByConfig(key: key_, config: meta.config)
}
// Join the state from the events that were emitted from different channels
def chJoined = chInitialOutputProcessed
| map {tuple ->
def join_id = tuple[0]
def channel_id = tuple[1]
def id = tuple[2]
def other = tuple.drop(3)
// Below, groupTuple is used to join the events. To make sure resuming a workflow
// keeps working, the output state must be deterministic. This means the state needs to be
// sorted with groupTuple's has a 'sort' argument. This argument can be set to 'hash',
// but hashing the state when it is large can be problematic in terms of performance.
// Therefore, a custom comparator function is provided. We add the channel ID to the
// states so that we can use the channel ID to sort the items.
def stateWithChannelID = [[channel_id] * other.size(), other].transpose()
// A comparator that is provided to groupTuple's 'sort' argument is applied
// to all elements of the event tuple (that is not the 'id'). The comparator
// closure that is used below expects the input to be List. So the join_id and
// channel_id must also be wrapped in a list.
[[join_id], [channel_id], id] + stateWithChannelID
}
| groupTuple(by: 2, sort: {a, b -> a[0] <=> b[0]}, size: chInitialOutputList.size(), remainder: true)
| map {join_ids, _, id, statesWithChannelID ->
// Remove the channel IDs from the states
def states = statesWithChannelID.collect{it[1]}
def newJoinId = join_ids.flatten().unique{a, b -> a <=> b}
assert newJoinId.size() == 1: "Multiple events were emitted for '$id'."
def newJoinIdUnique = newJoinId[0]
// Merge the states from the different channels
def newState = states.inject([:]){ old_state, state_to_add ->
return old_state + state_to_add.collectEntries{k, v ->
if (!multipleArgs.contains(k)) {
// if the key is not a multiple argument, we expect only one value
if (old_state.containsKey(k)) {
assert old_state[k] == v : "ID $id: multiple entries for argument $k were emitted."
}
[k, v]
} else {
// if the key is a multiple argument, append the different values into one list
def prevValue = old_state.getOrDefault(k, [])
def prevValueAsList = prevValue instanceof List ? prevValue : [prevValue]
[k, prevValueAsList + v]
}
}
}
_checkAllRequiredOuputsPresent(newState, meta.config, id, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && newState.size() == 1) {
newState = newState.values()[0]
}
return [newJoinIdUnique, id, newState]
}
// join the output [prev_id, new_id, output] with the previous state [prev_id, state, ...]
def chNewState = safeJoin(chInitialOutput, chRunFiltered, key_)
def chNewState = safeJoin(chJoined, chRunFiltered, key_)
// input tuple format: [join_id, id, output, prev_state, ...]
// output tuple format: [join_id, id, new_state, ...]
| map{ tup ->
@@ -2759,23 +2987,21 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
if (workflowArgs.auto.publish == "state") {
def chPublish = chNewState
def chPublishStates = chNewState
// input tuple format: [join_id, id, new_state, ...]
// output tuple format: [join_id, id, new_state]
| map{ tup ->
tup.take(3)
}
safeJoin(chPublish, chArgsWithDefaults, key_)
safeJoin(chPublishStates, chArgsWithDefaults, key_)
// input tuple format: [join_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(1).take(3)
}
}
| publishStatesByConfig(key: key_, config: meta.config)
}
// remove join_id and meta
chReturn = chNewState
| map { tup ->
// input tuple format: [join_id, id, new_state, ...]
@@ -2809,7 +3035,7 @@ meta = [
"resources_dir": moduleDir.toRealPath().normalize(),
"config": processConfig(readJsonBlob('''{
"name" : "bcl_convert",
"version" : "v0.3.0",
"version" : "v0.3.1",
"authors" : [
{
"name" : "Toni Verbeiren",
@@ -3172,6 +3398,18 @@ meta = [
"direction" : "output",
"multiple" : false,
"multiple_sep" : ";"
},
{
"type" : "boolean",
"name" : "--force",
"description" : "Allow destination directory to already exist and overwrite files.\n",
"example" : [
true
],
"required" : false,
"direction" : "input",
"multiple" : false,
"multiple_sep" : ";"
}
]
}
@@ -3192,6 +3430,10 @@ meta = [
}
],
"status" : "enabled",
"scope" : {
"image" : "public",
"target" : "public"
},
"requirements" : {
"commands" : [
"ps"
@@ -3289,7 +3531,7 @@ meta = [
"id" : "docker",
"image" : "debian:trixie-slim",
"target_registry" : "images.viash-hub.com",
"target_tag" : "v0.3.0",
"target_tag" : "v0.3.1",
"namespace_separator" : "/",
"setup" : [
{
@@ -3328,23 +3570,24 @@ meta = [
"runner" : "nextflow",
"engine" : "docker|native",
"output" : "target/nextflow/bcl_convert",
"viash_version" : "0.9.0",
"git_commit" : "d86bd5cf62104af02caa852aacd352b1aa97ed60",
"git_remote" : "https://x-access-token:ghs_EwAUAMYJ0K4VBHlAEMs4ZP2OyQYqJM0PSfEO@github.com/viash-hub/biobox",
"git_tag" : "v0.2.0-29-gd86bd5c"
"viash_version" : "0.9.4",
"git_commit" : "98a5f3cc745525a65c10263d25cf414eb1093223",
"git_remote" : "https://github.com/viash-hub/biobox",
"git_tag" : "v0.3.0-8-g98a5f3c"
},
"package_config" : {
"name" : "biobox",
"version" : "v0.3.0",
"description" : "A collection of bioinformatics tools for working with sequence data.\n",
"viash_version" : "0.9.0",
"version" : "v0.3.1",
"summary" : "A curated collection of high-quality, standalone bioinformatics components built with [Viash](https://viash.io).\n",
"description" : "`biobox` offers a suite of reliable bioinformatics components, similar to [nf-core/modules](https://github.com/nf-core/modules) and [snakemake-wrappers/bio](https://github.com/snakemake/snakemake-wrappers/tree/master/bio), but built using the [Viash](https://viash.io) framework.\n\nThis approach emphasizes **reusability**, **reproducibility**, and adherence to **best practices**. Key features of `biobox` components include:\n\n* **Standalone & Nextflow Ready:** Run components directly via the command line or seamlessly integrate them into Nextflow workflows.\n* **High Quality Standards:**\n * Comprehensive documentation for components and parameters.\n * Full exposure of underlying tool arguments.\n * Containerized (Docker) for dependency management and reproducibility.\n * Unit tested for verified functionality.\n",
"viash_version" : "0.9.4",
"source" : "src",
"target" : "target",
"config_mods" : [
".requirements.commands := ['ps']\n",
".engines += { type: \\"native\\" }",
".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'",
".engines[.type == 'docker'].target_tag := 'v0.3.0'"
".engines[.type == 'docker'].target_tag := 'v0.3.1'"
],
"keywords" : [
"bioinformatics",
@@ -3395,6 +3638,7 @@ $( if [ ! -z ${VIASH_PAR_BCL_SAMPLEPROJECT_SUBDIRECTORIES+x} ]; then echo "${VIA
$( if [ ! -z ${VIASH_PAR_FASTQ_GZIP_COMPRESSION_LEVEL+x} ]; then echo "${VIASH_PAR_FASTQ_GZIP_COMPRESSION_LEVEL}" | sed "s#'#'\\"'\\"'#g;s#.*#par_fastq_gzip_compression_level='&'#" ; else echo "# par_fastq_gzip_compression_level="; fi )
$( if [ ! -z ${VIASH_PAR_REPORTS+x} ]; then echo "${VIASH_PAR_REPORTS}" | sed "s#'#'\\"'\\"'#g;s#.*#par_reports='&'#" ; else echo "# par_reports="; fi )
$( if [ ! -z ${VIASH_PAR_LOGS+x} ]; then echo "${VIASH_PAR_LOGS}" | sed "s#'#'\\"'\\"'#g;s#.*#par_logs='&'#" ; else echo "# par_logs="; fi )
$( if [ ! -z ${VIASH_PAR_FORCE+x} ]; then echo "${VIASH_PAR_FORCE}" | sed "s#'#'\\"'\\"'#g;s#.*#par_force='&'#" ; else echo "# par_force="; fi )
$( if [ ! -z ${VIASH_META_NAME+x} ]; then echo "${VIASH_META_NAME}" | sed "s#'#'\\"'\\"'#g;s#.*#meta_name='&'#" ; else echo "# meta_name="; fi )
$( if [ ! -z ${VIASH_META_FUNCTIONALITY_NAME+x} ]; then echo "${VIASH_META_FUNCTIONALITY_NAME}" | sed "s#'#'\\"'\\"'#g;s#.*#meta_functionality_name='&'#" ; else echo "# meta_functionality_name="; fi )
$( if [ ! -z ${VIASH_META_RESOURCES_DIR+x} ]; then echo "${VIASH_META_RESOURCES_DIR}" | sed "s#'#'\\"'\\"'#g;s#.*#meta_resources_dir='&'#" ; else echo "# meta_resources_dir="; fi )
@@ -3419,9 +3663,13 @@ $( if [ ! -z ${VIASH_META_MEMORY_PIB+x} ]; then echo "${VIASH_META_MEMORY_PIB}"
set -eo pipefail
[[ "\\$par_force" == "false" ]] && unset par_force
\\$(which bcl-convert) \\\\
--bcl-input-directory "\\$par_bcl_input_directory" \\\\
--output-directory "\\$par_output_directory" \\\\
\\${par_force:+--force} \\\\
\\${par_sample_sheet:+ --sample-sheet "\\$par_sample_sheet"} \\\\
\\${par_run_info:+ --run-info "\\$par_run_info"} \\\\
\\${par_bcl_only_lane:+ --bcl-only-lane "\\$par_bcl_only_lane"} \\\\
@@ -3789,7 +4037,7 @@ def _vdsl3ProcessFactory(Map workflowArgs, Map meta, String rawScript) {
// create process from temp file
def binding = new nextflow.script.ScriptBinding([:])
def session = nextflow.Nextflow.getSession()
def parser = new nextflow.script.ScriptParser(session)
def parser = _getScriptLoader(session)
.setModule(true)
.setBinding(binding)
def moduleScript = parser.runScript(tempFile)
@@ -3803,6 +4051,27 @@ def _vdsl3ProcessFactory(Map workflowArgs, Map meta, String rawScript) {
return scriptMeta.getProcess(procKey)
}
// use Reflection to get a ScriptParser / ScriptLoader
// <25.02.0-edge: new nextflow.script.ScriptParser(session)
// >=25.02.0-edge: nextflow.script.ScriptLoaderFactory.create(session)
def _getScriptLoader(nextflow.Session session) {
// try using the old method
try {
Class<?> scriptParserClass = Class.forName('nextflow.script.ScriptParser')
return scriptParserClass.getDeclaredConstructor(nextflow.Session).newInstance(session)
} catch (ClassNotFoundException e) {
// else try with the new method
try {
Class<?> scriptLoaderFactoryClass = Class.forName('nextflow.script.ScriptLoaderFactory')
def createMethod = scriptLoaderFactoryClass.getDeclaredMethod('create', nextflow.Session)
return createMethod.invoke(null, session) // null because create is static
} catch (ClassNotFoundException | NoSuchMethodException | IllegalAccessException | java.lang.reflect.InvocationTargetException e2) {
// Handle the case where neither class is found
throw new Exception("Neither nextflow.script.ScriptParser nor nextflow.script.ScriptLoaderFactory could be found. Is this a compatible Nextflow version?", e2)
}
}
}
// defaults
meta["defaults"] = [
// key to be used to trace the process and determine output names
@@ -3816,7 +4085,7 @@ meta["defaults"] = [
"container" : {
"registry" : "images.viash-hub.com",
"image" : "vsh/biobox/bcl_convert",
"tag" : "v0.3.0"
"tag" : "v0.3.1"
},
"tag" : "$id"
}'''),

View File

@@ -2,7 +2,7 @@ manifest {
name = 'bcl_convert'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.3.0'
version = 'v0.3.1'
description = 'Convert bcl files to fastq files using bcl-convert.\nInformation about upgrading from bcl2fastq via\n[Upgrading from bcl2fastq to BCL Convert](https://emea.support.illumina.com/bulletins/2020/10/upgrading-from-bcl2fastq-to-bcl-convert.html)\nand [BCL Convert Compatible Products](https://support.illumina.com/sequencing/sequencing_software/bcl-convert/compatibility.html)\n'
author = 'Toni Verbeiren, Dorien Roosen'
}

View File

@@ -237,10 +237,10 @@
"output_directory": {
"type":
"string",
"description": "Type: `file`, required, default: `$id.$key.output_directory.output_directory`, example: `fastq_dir`. Output directory containig fastq files",
"help_text": "Type: `file`, required, default: `$id.$key.output_directory.output_directory`, example: `fastq_dir`. Output directory containig fastq files"
"description": "Type: `file`, required, default: `$id.$key.output_directory`, example: `fastq_dir`. Output directory containig fastq files",
"help_text": "Type: `file`, required, default: `$id.$key.output_directory`, example: `fastq_dir`. Output directory containig fastq files"
,
"default":"$id.$key.output_directory.output_directory"
"default":"$id.$key.output_directory"
}
@@ -268,10 +268,10 @@
"reports": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.reports.reports`, example: `reports_dir`. Reports directory",
"help_text": "Type: `file`, default: `$id.$key.reports.reports`, example: `reports_dir`. Reports directory"
"description": "Type: `file`, default: `$id.$key.reports`, example: `reports_dir`. Reports directory",
"help_text": "Type: `file`, default: `$id.$key.reports`, example: `reports_dir`. Reports directory"
,
"default":"$id.$key.reports.reports"
"default":"$id.$key.reports"
}
@@ -279,10 +279,20 @@
"logs": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.logs.logs`, example: `logs_dir`. Reports directory",
"help_text": "Type: `file`, default: `$id.$key.logs.logs`, example: `logs_dir`. Reports directory"
"description": "Type: `file`, default: `$id.$key.logs`, example: `logs_dir`. Reports directory",
"help_text": "Type: `file`, default: `$id.$key.logs`, example: `logs_dir`. Reports directory"
,
"default":"$id.$key.logs.logs"
"default":"$id.$key.logs"
}
,
"force": {
"type":
"boolean",
"description": "Type: `boolean`, example: `true`. Allow destination directory to already exist and overwrite files",
"help_text": "Type: `boolean`, example: `true`. Allow destination directory to already exist and overwrite files.\n"
}

View File

@@ -1,5 +1,5 @@
name: "falco"
version: "v0.3.0"
version: "v0.3.1"
authors:
- name: "Toni Verbeiren"
roles:
@@ -203,6 +203,9 @@ test_resources:
is_executable: true
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
@@ -287,7 +290,7 @@ engines:
id: "docker"
image: "debian:trixie-slim"
target_registry: "images.viash-hub.com"
target_tag: "v0.3.0"
target_tag: "v0.3.1"
namespace_separator: "/"
setup:
- type: "apt"
@@ -316,23 +319,34 @@ build_info:
engine: "docker|native"
output: "target/nextflow/falco"
executable: "target/nextflow/falco/main.nf"
viash_version: "0.9.0"
git_commit: "d86bd5cf62104af02caa852aacd352b1aa97ed60"
git_remote: "https://x-access-token:ghs_EwAUAMYJ0K4VBHlAEMs4ZP2OyQYqJM0PSfEO@github.com/viash-hub/biobox"
git_tag: "v0.2.0-29-gd86bd5c"
viash_version: "0.9.4"
git_commit: "98a5f3cc745525a65c10263d25cf414eb1093223"
git_remote: "https://github.com/viash-hub/biobox"
git_tag: "v0.3.0-8-g98a5f3c"
package_config:
name: "biobox"
version: "v0.3.0"
description: "A collection of bioinformatics tools for working with sequence data.\n"
version: "v0.3.1"
summary: "A curated collection of high-quality, standalone bioinformatics components\
\ built with [Viash](https://viash.io).\n"
description: "`biobox` offers a suite of reliable bioinformatics components, similar\
\ to [nf-core/modules](https://github.com/nf-core/modules) and [snakemake-wrappers/bio](https://github.com/snakemake/snakemake-wrappers/tree/master/bio),\
\ but built using the [Viash](https://viash.io) framework.\n\nThis approach emphasizes\
\ **reusability**, **reproducibility**, and adherence to **best practices**. Key\
\ features of `biobox` components include:\n\n* **Standalone & Nextflow Ready:**\
\ Run components directly via the command line or seamlessly integrate them into\
\ Nextflow workflows.\n* **High Quality Standards:**\n * Comprehensive documentation\
\ for components and parameters.\n * Full exposure of underlying tool arguments.\n\
\ * Containerized (Docker) for dependency management and reproducibility.\n\
\ * Unit tested for verified functionality.\n"
info: null
viash_version: "0.9.0"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.0'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.1'"
keywords:
- "bioinformatics"
- "modules"

View File

@@ -1,6 +1,6 @@
// falco v0.3.0
// falco v0.3.1
//
// This wrapper script is auto-generated by viash 0.9.0 and is thus a derivative
// This wrapper script is auto-generated by viash 0.9.4 and is thus a derivative
// work thereof. This software comes with ABSOLUTELY NO WARRANTY from Data
// Intuitive.
//
@@ -85,64 +85,56 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
foundClass = "List[${e.foundClass}]"
}
} else if (par.type == "string") {
// cast to string if need be
// cast to string if need be. only cast if the value is a GString
if (value instanceof GString) {
value = value.toString()
value = value as String
}
expectedClass = value instanceof String ? null : "String"
} else if (par.type == "integer") {
// cast to integer if need be
if (value instanceof String) {
if (value !instanceof Integer) {
try {
value = value.toInteger()
value = value as Integer
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Integer"
}
}
if (value instanceof java.math.BigInteger) {
value = value.intValue()
}
expectedClass = value instanceof Integer ? null : "Integer"
} else if (par.type == "long") {
// cast to long if need be
if (value instanceof String) {
if (value !instanceof Long) {
try {
value = value.toLong()
value = value as Long
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Long"
}
}
if (value instanceof Integer) {
value = value.toLong()
}
expectedClass = value instanceof Long ? null : "Long"
} else if (par.type == "double") {
// cast to double if need be
if (value instanceof String) {
if (value !instanceof Double) {
try {
value = value.toDouble()
value = value as Double
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Double"
}
}
if (value instanceof java.math.BigDecimal) {
value = value.doubleValue()
} else if (par.type == "float") {
// cast to float if need be
if (value !instanceof Float) {
try {
value = value as Float
} catch (NumberFormatException e) {
expectedClass = "Float"
}
}
if (value instanceof Float) {
value = value.toDouble()
}
expectedClass = value instanceof Double ? null : "Double"
} else if (par.type == "boolean" | par.type == "boolean_true" | par.type == "boolean_false") {
// cast to boolean if need be
if (value instanceof String) {
def valueLower = value.toLowerCase()
if (valueLower == "true") {
value = true
} else if (valueLower == "false") {
value = false
if (value !instanceof Boolean) {
try {
value = value as Boolean
} catch (Exception e) {
expectedClass = "Boolean"
}
}
expectedClass = value instanceof Boolean ? null : "Boolean"
} else if (par.type == "file" && (par.direction == "input" || stage == "output")) {
// cast to path if need be
if (value instanceof String) {
@@ -154,10 +146,13 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
expectedClass = value instanceof Path ? null : "Path"
} else if (par.type == "file" && stage == "input" && par.direction == "output") {
// cast to string if need be
if (value instanceof GString) {
value = value.toString()
if (value !instanceof String) {
try {
value = value as String
} catch (Exception e) {
expectedClass = "String"
}
}
expectedClass = value instanceof String ? null : "String"
} else {
// didn't find a match for par.type
expectedClass = par.type
@@ -176,7 +171,7 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
Map _processInputValues(Map inputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.required) {
if (arg.required && arg.direction == "input") {
assert inputs.containsKey(arg.plainName) && inputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required input argument '${arg.plainName}' is missing"
}
@@ -195,15 +190,8 @@ Map _processInputValues(Map inputs, Map config, String id, String key) {
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/arguments/_processOutputValues.nf'
Map _processOutputValues(Map outputs, Map config, String id, String key) {
Map _checkValidOutputArgument(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
outputs = outputs.collectEntries { name, value ->
def par = config.allArguments.find { it.plainName == name && it.direction == "output" }
assert par != null : "Error in module '${key}' id '${id}': '${name}' is not a valid output argument"
@@ -216,6 +204,16 @@ Map _processOutputValues(Map outputs, Map config, String id, String key) {
return outputs
}
void _checkAllRequiredOuputsPresent(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
}
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/channel/IDChecker.nf'
class IDChecker {
final def items = [] as Set
@@ -1669,6 +1667,162 @@ def joinStates(Closure apply_) {
}
return joinStatesWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishFiles.nf'
def publishFiles(Map args) {
def key_ = args.get("key")
assert key_ != null : "publishFiles: key must be specified"
workflow publishFilesWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1]
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
[id_, inputFiles_, outputFilenames_]
}
| publishFilesProc
emit: input_ch
}
return publishFilesWf
}
process publishFilesProc {
// todo: check publishpath?
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
output:
tuple val(id), path{outputFiles}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
}
// this assumes that the state contains no other values other than those specified in the config
def publishFilesByConfig(Map args) {
def config = args.get("config")
assert config != null : "publishFilesByConfig: config must be specified"
def key_ = args.get("key", config.name)
assert key_ != null : "publishFilesByConfig: key must be specified"
workflow publishFilesSimpleWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1] // e.g. [output: new File("myoutput.h5ad"), k: 10]
def origState_ = tup[2] // e.g. [output: '$id.$key.foo.h5ad']
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
.collectMany { par ->
def plainName_ = par.plainName
// if the state does not contain the key, it's an
// optional argument for which the component did
// not generate any output OR multiple channels were emitted
// and the output was just not added to using the channel
// that is now being parsed
if (!state_.containsKey(plainName_)) {
return []
}
def value = state_[plainName_]
// if the parameter is not a file, it should be stored
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[inputPath: [], outputFilename: []]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
// that it should not be returned as a state
if (!origState_.containsKey(plainName_)) {
return []
}
def filenameTemplate = origState_[plainName_]
// if the pararameter is multiple: true, fetch the template
if (par.multiple && filenameTemplate instanceof List) {
filenameTemplate = filenameTemplate[0]
}
// instantiate the template
def filename = filenameTemplate
.replaceAll('\\$id', id_)
.replaceAll('\\$\\{id\\}', id_)
.replaceAll('\\$key', key_)
.replaceAll('\\$\\{key\\}', key_)
if (par.multiple) {
// if the parameter is multiple: true, the filename
// should contain a wildcard '*' that is replaced with
// the index of the file
assert filename.contains("*") : "Module '${key_}' id '${id_}': Multiple output files specified, but no wildcard '*' in the filename: ${filename}"
def outputPerFile = value.withIndex().collect{ val, ix ->
def filename_ix = filename.replace("*", ix.toString())
def inputPath = val instanceof File ? val.toPath() : val
[inputPath: inputPath, outputFilename: filename_ix]
}
def transposedOutputs = ["inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
} else {
def value_ = java.nio.file.Paths.get(filename)
def inputPath = value instanceof File ? value.toPath() : value
return [[inputPath: [inputPath], outputFilename: [filename]]]
}
}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
[id_, inputPaths, outputFilenames]
}
| publishFilesProc
emit: input_ch
}
return publishFilesSimpleWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishStates.nf'
def collectFiles(obj) {
if (obj instanceof java.io.File || obj instanceof Path) {
@@ -1726,8 +1880,6 @@ def publishStates(Map args) {
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
def yamlFilename = yamlTemplate_
.replaceAll('\\$id', id_)
@@ -1740,7 +1892,7 @@ def publishStates(Map args) {
// convert state to yaml blob
def yamlBlob_ = toRelativeTaggedYamlBlob([id: id_] + state_, java.nio.file.Paths.get(yamlFilename))
[id_, yamlBlob_, yamlFilename, inputFiles_, outputFilenames_]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -1752,33 +1904,17 @@ process publishStatesProc {
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), val(yamlBlob), val(yamlFile), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
tuple val(id), val(yamlBlob), val(yamlFile)
output:
tuple val(id), path{[yamlFile] + outputFiles}
tuple val(id), path{[yamlFile]}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
echo '${yamlBlob}' > '${yamlFile}'
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
cat > '${yamlFile}' << HERE
${yamlBlob}
HERE
"""
}
@@ -1809,13 +1945,10 @@ def publishStatesByConfig(Map args) {
.replaceAll('\\$\\{key\\}', key_)
def yamlDir = java.nio.file.Paths.get(yamlFilename).getParent()
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// the processed state is a list of [key, value] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (key, value) are the tuples that will be saved to the state.yaml file
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
@@ -1832,7 +1965,7 @@ def publishStatesByConfig(Map args) {
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[key: plainName_, value: value, inputPath: [], outputFilename: []]]
return [[key: plainName_, value: value]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
@@ -1863,13 +1996,9 @@ def publishStatesByConfig(Map args) {
if (yamlDir != null) {
value_ = yamlDir.relativize(value_)
}
def inputPath = val instanceof File ? val.toPath() : val
[value: value_, inputPath: inputPath, outputFilename: filename_ix]
return value_
}
def transposedOutputs = ["value", "inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
return [["key": plainName_, "value": outputPerFile]]
} else {
def value_ = java.nio.file.Paths.get(filename)
// if id contains a slash
@@ -1877,18 +2006,17 @@ def publishStatesByConfig(Map args) {
value_ = yamlDir.relativize(value_)
}
def inputPath = value instanceof File ? value.toPath() : value
return [[key: plainName_, value: value_, inputPath: [inputPath], outputFilename: [filename]]]
return [["key": plainName_, value: value_]]
}
}
def updatedState_ = processedState.collectEntries{[it.key, it.value]}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
// convert state to yaml blob
def yamlBlob_ = toTaggedYamlBlob([id: id_] + updatedState_)
[id_, yamlBlob_, yamlFilename, inputPaths, outputFilenames]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -2562,7 +2690,8 @@ def _debug(workflowArgs, debugKey) {
def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
def workflowArgs = processWorkflowArgs(args, defaultWfArgs, meta)
def key_ = workflowArgs["key"]
def multipleArgs = meta.config.allArguments.findAll{ it.multiple }.collect{it.plainName}
workflow workflowInstance {
take: input_
@@ -2719,12 +2848,36 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
// TODO: move some of the _meta.join_id wrangling to the safeJoin() function.
def chInitialOutput = chArgsWithDefaults
def chInitialOutputMulti = chArgsWithDefaults
| _debug(workflowArgs, "processed")
// run workflow
| innerWorkflowFactory(workflowArgs)
// check output tuple
| map { id_, output_ ->
def chInitialOutputList = chInitialOutputMulti instanceof List ? chInitialOutputMulti : [chInitialOutputMulti]
assert chInitialOutputList.size() > 0: "should have emitted at least one output channel"
// Add a channel ID to the events, which designates the channel the event was emitted from as a running number
// This number is used to sort the events later when the events are gathered from across the channels.
def chInitialOutputListWithIndexedEvents = chInitialOutputList.withIndex().collect{channel, channelIndex ->
def newChannel = channel
| map {tuple ->
assert tuple instanceof List :
"Error in module '${key_}': element in output channel should be a tuple [id, data, ...otherargs...]\n" +
" Example: [\"id\", [input: file('foo.txt'), arg: 10]].\n" +
" Expected class: List. Found: tuple.getClass() is ${tuple.getClass()}"
def newEvent = [channelIndex] + tuple
return newEvent
}
return newChannel
}
// Put the events into 1 channel, cover case where there is only one channel is emitted
def chInitialOutput = chInitialOutputList.size() > 1 ? \
chInitialOutputListWithIndexedEvents[0].mix(*chInitialOutputListWithIndexedEvents.tail()) : \
chInitialOutputListWithIndexedEvents[0]
def chInitialOutputProcessed = chInitialOutput
| map { tuple ->
def channelId = tuple[0]
def id_ = tuple[1]
def output_ = tuple[2]
// see if output map contains metadata
def meta_ =
@@ -2737,19 +2890,94 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
output_ = output_.findAll{k, v -> k != "_meta"}
// check value types
output_ = _processOutputValues(output_, meta.config, id_, key_)
output_ = _checkValidOutputArgument(output_, meta.config, id_, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && output_.size() == 1) {
output_ = output_.values()[0]
}
[join_id, id_, output_]
[join_id, channelId, id_, output_]
}
// | view{"chInitialOutput: ${it.take(3)}"}
// join the output [prev_id, channel_id, new_id, output] with the previous state [prev_id, state, ...]
def chPublishWithPreviousState = safeJoin(chInitialOutputProcessed, chRunFiltered, key_)
// input tuple format: [join_id, channel_id, id, output, prev_state, ...]
// output tuple format: [join_id, channel_id, id, new_state, ...]
| map{ tup ->
def new_state = workflowArgs.toState(tup.drop(2).take(3))
tup.take(3) + [new_state] + tup.drop(5)
}
if (workflowArgs.auto.publish == "state") {
def chPublishFiles = chPublishWithPreviousState
// input tuple format: [join_id, channel_id, id, new_state, ...]
// output tuple format: [join_id, channel_id, id, new_state]
| map{ tup ->
tup.take(4)
}
safeJoin(chPublishFiles, chArgsWithDefaults, key_)
// input tuple format: [join_id, channel_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(2).take(3)
}
| publishFilesByConfig(key: key_, config: meta.config)
}
// Join the state from the events that were emitted from different channels
def chJoined = chInitialOutputProcessed
| map {tuple ->
def join_id = tuple[0]
def channel_id = tuple[1]
def id = tuple[2]
def other = tuple.drop(3)
// Below, groupTuple is used to join the events. To make sure resuming a workflow
// keeps working, the output state must be deterministic. This means the state needs to be
// sorted with groupTuple's has a 'sort' argument. This argument can be set to 'hash',
// but hashing the state when it is large can be problematic in terms of performance.
// Therefore, a custom comparator function is provided. We add the channel ID to the
// states so that we can use the channel ID to sort the items.
def stateWithChannelID = [[channel_id] * other.size(), other].transpose()
// A comparator that is provided to groupTuple's 'sort' argument is applied
// to all elements of the event tuple (that is not the 'id'). The comparator
// closure that is used below expects the input to be List. So the join_id and
// channel_id must also be wrapped in a list.
[[join_id], [channel_id], id] + stateWithChannelID
}
| groupTuple(by: 2, sort: {a, b -> a[0] <=> b[0]}, size: chInitialOutputList.size(), remainder: true)
| map {join_ids, _, id, statesWithChannelID ->
// Remove the channel IDs from the states
def states = statesWithChannelID.collect{it[1]}
def newJoinId = join_ids.flatten().unique{a, b -> a <=> b}
assert newJoinId.size() == 1: "Multiple events were emitted for '$id'."
def newJoinIdUnique = newJoinId[0]
// Merge the states from the different channels
def newState = states.inject([:]){ old_state, state_to_add ->
return old_state + state_to_add.collectEntries{k, v ->
if (!multipleArgs.contains(k)) {
// if the key is not a multiple argument, we expect only one value
if (old_state.containsKey(k)) {
assert old_state[k] == v : "ID $id: multiple entries for argument $k were emitted."
}
[k, v]
} else {
// if the key is a multiple argument, append the different values into one list
def prevValue = old_state.getOrDefault(k, [])
def prevValueAsList = prevValue instanceof List ? prevValue : [prevValue]
[k, prevValueAsList + v]
}
}
}
_checkAllRequiredOuputsPresent(newState, meta.config, id, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && newState.size() == 1) {
newState = newState.values()[0]
}
return [newJoinIdUnique, id, newState]
}
// join the output [prev_id, new_id, output] with the previous state [prev_id, state, ...]
def chNewState = safeJoin(chInitialOutput, chRunFiltered, key_)
def chNewState = safeJoin(chJoined, chRunFiltered, key_)
// input tuple format: [join_id, id, output, prev_state, ...]
// output tuple format: [join_id, id, new_state, ...]
| map{ tup ->
@@ -2758,23 +2986,21 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
if (workflowArgs.auto.publish == "state") {
def chPublish = chNewState
def chPublishStates = chNewState
// input tuple format: [join_id, id, new_state, ...]
// output tuple format: [join_id, id, new_state]
| map{ tup ->
tup.take(3)
}
safeJoin(chPublish, chArgsWithDefaults, key_)
safeJoin(chPublishStates, chArgsWithDefaults, key_)
// input tuple format: [join_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(1).take(3)
}
}
| publishStatesByConfig(key: key_, config: meta.config)
}
// remove join_id and meta
chReturn = chNewState
| map { tup ->
// input tuple format: [join_id, id, new_state, ...]
@@ -2808,7 +3034,7 @@ meta = [
"resources_dir": moduleDir.toRealPath().normalize(),
"config": processConfig(readJsonBlob('''{
"name" : "falco",
"version" : "v0.3.0",
"version" : "v0.3.1",
"authors" : [
{
"name" : "Toni Verbeiren",
@@ -3031,6 +3257,10 @@ meta = [
}
],
"status" : "enabled",
"scope" : {
"image" : "public",
"target" : "public"
},
"requirements" : {
"commands" : [
"ps"
@@ -3131,7 +3361,7 @@ meta = [
"id" : "docker",
"image" : "debian:trixie-slim",
"target_registry" : "images.viash-hub.com",
"target_tag" : "v0.3.0",
"target_tag" : "v0.3.1",
"namespace_separator" : "/",
"setup" : [
{
@@ -3169,23 +3399,24 @@ meta = [
"runner" : "nextflow",
"engine" : "docker|native",
"output" : "target/nextflow/falco",
"viash_version" : "0.9.0",
"git_commit" : "d86bd5cf62104af02caa852aacd352b1aa97ed60",
"git_remote" : "https://x-access-token:ghs_EwAUAMYJ0K4VBHlAEMs4ZP2OyQYqJM0PSfEO@github.com/viash-hub/biobox",
"git_tag" : "v0.2.0-29-gd86bd5c"
"viash_version" : "0.9.4",
"git_commit" : "98a5f3cc745525a65c10263d25cf414eb1093223",
"git_remote" : "https://github.com/viash-hub/biobox",
"git_tag" : "v0.3.0-8-g98a5f3c"
},
"package_config" : {
"name" : "biobox",
"version" : "v0.3.0",
"description" : "A collection of bioinformatics tools for working with sequence data.\n",
"viash_version" : "0.9.0",
"version" : "v0.3.1",
"summary" : "A curated collection of high-quality, standalone bioinformatics components built with [Viash](https://viash.io).\n",
"description" : "`biobox` offers a suite of reliable bioinformatics components, similar to [nf-core/modules](https://github.com/nf-core/modules) and [snakemake-wrappers/bio](https://github.com/snakemake/snakemake-wrappers/tree/master/bio), but built using the [Viash](https://viash.io) framework.\n\nThis approach emphasizes **reusability**, **reproducibility**, and adherence to **best practices**. Key features of `biobox` components include:\n\n* **Standalone & Nextflow Ready:** Run components directly via the command line or seamlessly integrate them into Nextflow workflows.\n* **High Quality Standards:**\n * Comprehensive documentation for components and parameters.\n * Full exposure of underlying tool arguments.\n * Containerized (Docker) for dependency management and reproducibility.\n * Unit tested for verified functionality.\n",
"viash_version" : "0.9.4",
"source" : "src",
"target" : "target",
"config_mods" : [
".requirements.commands := ['ps']\n",
".engines += { type: \\"native\\" }",
".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'",
".engines[.type == 'docker'].target_tag := 'v0.3.0'"
".engines[.type == 'docker'].target_tag := 'v0.3.1'"
],
"keywords" : [
"bioinformatics",
@@ -3604,7 +3835,7 @@ def _vdsl3ProcessFactory(Map workflowArgs, Map meta, String rawScript) {
// create process from temp file
def binding = new nextflow.script.ScriptBinding([:])
def session = nextflow.Nextflow.getSession()
def parser = new nextflow.script.ScriptParser(session)
def parser = _getScriptLoader(session)
.setModule(true)
.setBinding(binding)
def moduleScript = parser.runScript(tempFile)
@@ -3618,6 +3849,27 @@ def _vdsl3ProcessFactory(Map workflowArgs, Map meta, String rawScript) {
return scriptMeta.getProcess(procKey)
}
// use Reflection to get a ScriptParser / ScriptLoader
// <25.02.0-edge: new nextflow.script.ScriptParser(session)
// >=25.02.0-edge: nextflow.script.ScriptLoaderFactory.create(session)
def _getScriptLoader(nextflow.Session session) {
// try using the old method
try {
Class<?> scriptParserClass = Class.forName('nextflow.script.ScriptParser')
return scriptParserClass.getDeclaredConstructor(nextflow.Session).newInstance(session)
} catch (ClassNotFoundException e) {
// else try with the new method
try {
Class<?> scriptLoaderFactoryClass = Class.forName('nextflow.script.ScriptLoaderFactory')
def createMethod = scriptLoaderFactoryClass.getDeclaredMethod('create', nextflow.Session)
return createMethod.invoke(null, session) // null because create is static
} catch (ClassNotFoundException | NoSuchMethodException | IllegalAccessException | java.lang.reflect.InvocationTargetException e2) {
// Handle the case where neither class is found
throw new Exception("Neither nextflow.script.ScriptParser nor nextflow.script.ScriptLoaderFactory could be found. Is this a compatible Nextflow version?", e2)
}
}
}
// defaults
meta["defaults"] = [
// key to be used to trace the process and determine output names
@@ -3631,7 +3883,7 @@ meta["defaults"] = [
"container" : {
"registry" : "images.viash-hub.com",
"image" : "vsh/biobox/falco",
"tag" : "v0.3.0"
"tag" : "v0.3.1"
},
"tag" : "$id"
}'''),

View File

@@ -2,7 +2,7 @@ manifest {
name = 'falco'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.3.0'
version = 'v0.3.1'
description = 'A C++ drop-in replacement of FastQC to assess the quality of sequence read data'
author = 'Toni Verbeiren'
}

View File

@@ -120,10 +120,10 @@
"outdir": {
"type":
"string",
"description": "Type: `file`, required, default: `$id.$key.outdir.outdir`, example: `output`. Create all output files in the specified \noutput directory",
"help_text": "Type: `file`, required, default: `$id.$key.outdir.outdir`, example: `output`. Create all output files in the specified \noutput directory. FALCO-SPECIFIC: If the \ndirectory does not exists, the program will \ncreate it.\n"
"description": "Type: `file`, required, default: `$id.$key.outdir`, example: `output`. Create all output files in the specified \noutput directory",
"help_text": "Type: `file`, required, default: `$id.$key.outdir`, example: `output`. Create all output files in the specified \noutput directory. FALCO-SPECIFIC: If the \ndirectory does not exists, the program will \ncreate it.\n"
,
"default":"$id.$key.outdir.outdir"
"default":"$id.$key.outdir"
}
@@ -143,10 +143,10 @@
"data_filename": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.data_filename.data_filename`. [Falco only] Specify filename for FastQC \ndata output (TXT)",
"help_text": "Type: `file`, default: `$id.$key.data_filename.data_filename`. [Falco only] Specify filename for FastQC \ndata output (TXT). If not specified, it will \nbe called fastq_data.txt in either the input \nfile\u0027s directory or the one specified in the \n--output flag. Only available when running \nfalco with a single input.\n"
"description": "Type: `file`, default: `$id.$key.data_filename`. [Falco only] Specify filename for FastQC \ndata output (TXT)",
"help_text": "Type: `file`, default: `$id.$key.data_filename`. [Falco only] Specify filename for FastQC \ndata output (TXT). If not specified, it will \nbe called fastq_data.txt in either the input \nfile\u0027s directory or the one specified in the \n--output flag. Only available when running \nfalco with a single input.\n"
,
"default":"$id.$key.data_filename.data_filename"
"default":"$id.$key.data_filename"
}
@@ -154,10 +154,10 @@
"report_filename": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.report_filename.report_filename`. [Falco only] Specify filename for FastQC \nreport output (HTML)",
"help_text": "Type: `file`, default: `$id.$key.report_filename.report_filename`. [Falco only] Specify filename for FastQC \nreport output (HTML). If not specified, it \nwill be called fastq_report.html in either \nthe input file\u0027s directory or the one \nspecified in the --output flag. Only \navailable when running falco with a single \ninput.\n"
"description": "Type: `file`, default: `$id.$key.report_filename`. [Falco only] Specify filename for FastQC \nreport output (HTML)",
"help_text": "Type: `file`, default: `$id.$key.report_filename`. [Falco only] Specify filename for FastQC \nreport output (HTML). If not specified, it \nwill be called fastq_report.html in either \nthe input file\u0027s directory or the one \nspecified in the --output flag. Only \navailable when running falco with a single \ninput.\n"
,
"default":"$id.$key.report_filename.report_filename"
"default":"$id.$key.report_filename"
}
@@ -165,10 +165,10 @@
"summary_filename": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.summary_filename.summary_filename`. [Falco only] Specify filename for the short \nsummary output (TXT)",
"help_text": "Type: `file`, default: `$id.$key.summary_filename.summary_filename`. [Falco only] Specify filename for the short \nsummary output (TXT). If not specified, it \nwill be called fastq_report.html in either \nthe input file\u0027s directory or the one \nspecified in the --output flag. Only \navailable when running falco with a single \ninput.\n"
"description": "Type: `file`, default: `$id.$key.summary_filename`. [Falco only] Specify filename for the short \nsummary output (TXT)",
"help_text": "Type: `file`, default: `$id.$key.summary_filename`. [Falco only] Specify filename for the short \nsummary output (TXT). If not specified, it \nwill be called fastq_report.html in either \nthe input file\u0027s directory or the one \nspecified in the --output flag. Only \navailable when running falco with a single \ninput.\n"
,
"default":"$id.$key.summary_filename.summary_filename"
"default":"$id.$key.summary_filename"
}

View File

@@ -1,5 +1,5 @@
name: "multiqc"
version: "v0.3.0"
version: "v0.3.1"
authors:
- name: "Dorien Roosen"
roles:
@@ -357,6 +357,9 @@ info:
doi: "10.1093/bioinformatics/btw354"
licence: "GPL v3 or later"
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
@@ -433,7 +436,7 @@ engines:
id: "docker"
image: "quay.io/biocontainers/multiqc:1.21--pyhdfd78af_0"
target_registry: "images.viash-hub.com"
target_tag: "v0.3.0"
target_tag: "v0.3.1"
namespace_separator: "/"
setup:
- type: "docker"
@@ -455,23 +458,34 @@ build_info:
engine: "docker|native"
output: "target/nextflow/multiqc"
executable: "target/nextflow/multiqc/main.nf"
viash_version: "0.9.0"
git_commit: "d86bd5cf62104af02caa852aacd352b1aa97ed60"
git_remote: "https://x-access-token:ghs_EwAUAMYJ0K4VBHlAEMs4ZP2OyQYqJM0PSfEO@github.com/viash-hub/biobox"
git_tag: "v0.2.0-29-gd86bd5c"
viash_version: "0.9.4"
git_commit: "98a5f3cc745525a65c10263d25cf414eb1093223"
git_remote: "https://github.com/viash-hub/biobox"
git_tag: "v0.3.0-8-g98a5f3c"
package_config:
name: "biobox"
version: "v0.3.0"
description: "A collection of bioinformatics tools for working with sequence data.\n"
version: "v0.3.1"
summary: "A curated collection of high-quality, standalone bioinformatics components\
\ built with [Viash](https://viash.io).\n"
description: "`biobox` offers a suite of reliable bioinformatics components, similar\
\ to [nf-core/modules](https://github.com/nf-core/modules) and [snakemake-wrappers/bio](https://github.com/snakemake/snakemake-wrappers/tree/master/bio),\
\ but built using the [Viash](https://viash.io) framework.\n\nThis approach emphasizes\
\ **reusability**, **reproducibility**, and adherence to **best practices**. Key\
\ features of `biobox` components include:\n\n* **Standalone & Nextflow Ready:**\
\ Run components directly via the command line or seamlessly integrate them into\
\ Nextflow workflows.\n* **High Quality Standards:**\n * Comprehensive documentation\
\ for components and parameters.\n * Full exposure of underlying tool arguments.\n\
\ * Containerized (Docker) for dependency management and reproducibility.\n\
\ * Unit tested for verified functionality.\n"
info: null
viash_version: "0.9.0"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.0'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.1'"
keywords:
- "bioinformatics"
- "modules"

View File

@@ -1,6 +1,6 @@
// multiqc v0.3.0
// multiqc v0.3.1
//
// This wrapper script is auto-generated by viash 0.9.0 and is thus a derivative
// This wrapper script is auto-generated by viash 0.9.4 and is thus a derivative
// work thereof. This software comes with ABSOLUTELY NO WARRANTY from Data
// Intuitive.
//
@@ -85,64 +85,56 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
foundClass = "List[${e.foundClass}]"
}
} else if (par.type == "string") {
// cast to string if need be
// cast to string if need be. only cast if the value is a GString
if (value instanceof GString) {
value = value.toString()
value = value as String
}
expectedClass = value instanceof String ? null : "String"
} else if (par.type == "integer") {
// cast to integer if need be
if (value instanceof String) {
if (value !instanceof Integer) {
try {
value = value.toInteger()
value = value as Integer
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Integer"
}
}
if (value instanceof java.math.BigInteger) {
value = value.intValue()
}
expectedClass = value instanceof Integer ? null : "Integer"
} else if (par.type == "long") {
// cast to long if need be
if (value instanceof String) {
if (value !instanceof Long) {
try {
value = value.toLong()
value = value as Long
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Long"
}
}
if (value instanceof Integer) {
value = value.toLong()
}
expectedClass = value instanceof Long ? null : "Long"
} else if (par.type == "double") {
// cast to double if need be
if (value instanceof String) {
if (value !instanceof Double) {
try {
value = value.toDouble()
value = value as Double
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Double"
}
}
if (value instanceof java.math.BigDecimal) {
value = value.doubleValue()
} else if (par.type == "float") {
// cast to float if need be
if (value !instanceof Float) {
try {
value = value as Float
} catch (NumberFormatException e) {
expectedClass = "Float"
}
}
if (value instanceof Float) {
value = value.toDouble()
}
expectedClass = value instanceof Double ? null : "Double"
} else if (par.type == "boolean" | par.type == "boolean_true" | par.type == "boolean_false") {
// cast to boolean if need be
if (value instanceof String) {
def valueLower = value.toLowerCase()
if (valueLower == "true") {
value = true
} else if (valueLower == "false") {
value = false
if (value !instanceof Boolean) {
try {
value = value as Boolean
} catch (Exception e) {
expectedClass = "Boolean"
}
}
expectedClass = value instanceof Boolean ? null : "Boolean"
} else if (par.type == "file" && (par.direction == "input" || stage == "output")) {
// cast to path if need be
if (value instanceof String) {
@@ -154,10 +146,13 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
expectedClass = value instanceof Path ? null : "Path"
} else if (par.type == "file" && stage == "input" && par.direction == "output") {
// cast to string if need be
if (value instanceof GString) {
value = value.toString()
if (value !instanceof String) {
try {
value = value as String
} catch (Exception e) {
expectedClass = "String"
}
}
expectedClass = value instanceof String ? null : "String"
} else {
// didn't find a match for par.type
expectedClass = par.type
@@ -176,7 +171,7 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
Map _processInputValues(Map inputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.required) {
if (arg.required && arg.direction == "input") {
assert inputs.containsKey(arg.plainName) && inputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required input argument '${arg.plainName}' is missing"
}
@@ -195,15 +190,8 @@ Map _processInputValues(Map inputs, Map config, String id, String key) {
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/arguments/_processOutputValues.nf'
Map _processOutputValues(Map outputs, Map config, String id, String key) {
Map _checkValidOutputArgument(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
outputs = outputs.collectEntries { name, value ->
def par = config.allArguments.find { it.plainName == name && it.direction == "output" }
assert par != null : "Error in module '${key}' id '${id}': '${name}' is not a valid output argument"
@@ -216,6 +204,16 @@ Map _processOutputValues(Map outputs, Map config, String id, String key) {
return outputs
}
void _checkAllRequiredOuputsPresent(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
}
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/channel/IDChecker.nf'
class IDChecker {
final def items = [] as Set
@@ -1669,6 +1667,162 @@ def joinStates(Closure apply_) {
}
return joinStatesWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishFiles.nf'
def publishFiles(Map args) {
def key_ = args.get("key")
assert key_ != null : "publishFiles: key must be specified"
workflow publishFilesWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1]
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
[id_, inputFiles_, outputFilenames_]
}
| publishFilesProc
emit: input_ch
}
return publishFilesWf
}
process publishFilesProc {
// todo: check publishpath?
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
output:
tuple val(id), path{outputFiles}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
}
// this assumes that the state contains no other values other than those specified in the config
def publishFilesByConfig(Map args) {
def config = args.get("config")
assert config != null : "publishFilesByConfig: config must be specified"
def key_ = args.get("key", config.name)
assert key_ != null : "publishFilesByConfig: key must be specified"
workflow publishFilesSimpleWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1] // e.g. [output: new File("myoutput.h5ad"), k: 10]
def origState_ = tup[2] // e.g. [output: '$id.$key.foo.h5ad']
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
.collectMany { par ->
def plainName_ = par.plainName
// if the state does not contain the key, it's an
// optional argument for which the component did
// not generate any output OR multiple channels were emitted
// and the output was just not added to using the channel
// that is now being parsed
if (!state_.containsKey(plainName_)) {
return []
}
def value = state_[plainName_]
// if the parameter is not a file, it should be stored
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[inputPath: [], outputFilename: []]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
// that it should not be returned as a state
if (!origState_.containsKey(plainName_)) {
return []
}
def filenameTemplate = origState_[plainName_]
// if the pararameter is multiple: true, fetch the template
if (par.multiple && filenameTemplate instanceof List) {
filenameTemplate = filenameTemplate[0]
}
// instantiate the template
def filename = filenameTemplate
.replaceAll('\\$id', id_)
.replaceAll('\\$\\{id\\}', id_)
.replaceAll('\\$key', key_)
.replaceAll('\\$\\{key\\}', key_)
if (par.multiple) {
// if the parameter is multiple: true, the filename
// should contain a wildcard '*' that is replaced with
// the index of the file
assert filename.contains("*") : "Module '${key_}' id '${id_}': Multiple output files specified, but no wildcard '*' in the filename: ${filename}"
def outputPerFile = value.withIndex().collect{ val, ix ->
def filename_ix = filename.replace("*", ix.toString())
def inputPath = val instanceof File ? val.toPath() : val
[inputPath: inputPath, outputFilename: filename_ix]
}
def transposedOutputs = ["inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
} else {
def value_ = java.nio.file.Paths.get(filename)
def inputPath = value instanceof File ? value.toPath() : value
return [[inputPath: [inputPath], outputFilename: [filename]]]
}
}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
[id_, inputPaths, outputFilenames]
}
| publishFilesProc
emit: input_ch
}
return publishFilesSimpleWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishStates.nf'
def collectFiles(obj) {
if (obj instanceof java.io.File || obj instanceof Path) {
@@ -1726,8 +1880,6 @@ def publishStates(Map args) {
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
def yamlFilename = yamlTemplate_
.replaceAll('\\$id', id_)
@@ -1740,7 +1892,7 @@ def publishStates(Map args) {
// convert state to yaml blob
def yamlBlob_ = toRelativeTaggedYamlBlob([id: id_] + state_, java.nio.file.Paths.get(yamlFilename))
[id_, yamlBlob_, yamlFilename, inputFiles_, outputFilenames_]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -1752,33 +1904,17 @@ process publishStatesProc {
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), val(yamlBlob), val(yamlFile), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
tuple val(id), val(yamlBlob), val(yamlFile)
output:
tuple val(id), path{[yamlFile] + outputFiles}
tuple val(id), path{[yamlFile]}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
echo '${yamlBlob}' > '${yamlFile}'
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
cat > '${yamlFile}' << HERE
${yamlBlob}
HERE
"""
}
@@ -1809,13 +1945,10 @@ def publishStatesByConfig(Map args) {
.replaceAll('\\$\\{key\\}', key_)
def yamlDir = java.nio.file.Paths.get(yamlFilename).getParent()
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// the processed state is a list of [key, value] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (key, value) are the tuples that will be saved to the state.yaml file
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
@@ -1832,7 +1965,7 @@ def publishStatesByConfig(Map args) {
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[key: plainName_, value: value, inputPath: [], outputFilename: []]]
return [[key: plainName_, value: value]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
@@ -1863,13 +1996,9 @@ def publishStatesByConfig(Map args) {
if (yamlDir != null) {
value_ = yamlDir.relativize(value_)
}
def inputPath = val instanceof File ? val.toPath() : val
[value: value_, inputPath: inputPath, outputFilename: filename_ix]
return value_
}
def transposedOutputs = ["value", "inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
return [["key": plainName_, "value": outputPerFile]]
} else {
def value_ = java.nio.file.Paths.get(filename)
// if id contains a slash
@@ -1877,18 +2006,17 @@ def publishStatesByConfig(Map args) {
value_ = yamlDir.relativize(value_)
}
def inputPath = value instanceof File ? value.toPath() : value
return [[key: plainName_, value: value_, inputPath: [inputPath], outputFilename: [filename]]]
return [["key": plainName_, value: value_]]
}
}
def updatedState_ = processedState.collectEntries{[it.key, it.value]}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
// convert state to yaml blob
def yamlBlob_ = toTaggedYamlBlob([id: id_] + updatedState_)
[id_, yamlBlob_, yamlFilename, inputPaths, outputFilenames]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -2562,7 +2690,8 @@ def _debug(workflowArgs, debugKey) {
def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
def workflowArgs = processWorkflowArgs(args, defaultWfArgs, meta)
def key_ = workflowArgs["key"]
def multipleArgs = meta.config.allArguments.findAll{ it.multiple }.collect{it.plainName}
workflow workflowInstance {
take: input_
@@ -2719,12 +2848,36 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
// TODO: move some of the _meta.join_id wrangling to the safeJoin() function.
def chInitialOutput = chArgsWithDefaults
def chInitialOutputMulti = chArgsWithDefaults
| _debug(workflowArgs, "processed")
// run workflow
| innerWorkflowFactory(workflowArgs)
// check output tuple
| map { id_, output_ ->
def chInitialOutputList = chInitialOutputMulti instanceof List ? chInitialOutputMulti : [chInitialOutputMulti]
assert chInitialOutputList.size() > 0: "should have emitted at least one output channel"
// Add a channel ID to the events, which designates the channel the event was emitted from as a running number
// This number is used to sort the events later when the events are gathered from across the channels.
def chInitialOutputListWithIndexedEvents = chInitialOutputList.withIndex().collect{channel, channelIndex ->
def newChannel = channel
| map {tuple ->
assert tuple instanceof List :
"Error in module '${key_}': element in output channel should be a tuple [id, data, ...otherargs...]\n" +
" Example: [\"id\", [input: file('foo.txt'), arg: 10]].\n" +
" Expected class: List. Found: tuple.getClass() is ${tuple.getClass()}"
def newEvent = [channelIndex] + tuple
return newEvent
}
return newChannel
}
// Put the events into 1 channel, cover case where there is only one channel is emitted
def chInitialOutput = chInitialOutputList.size() > 1 ? \
chInitialOutputListWithIndexedEvents[0].mix(*chInitialOutputListWithIndexedEvents.tail()) : \
chInitialOutputListWithIndexedEvents[0]
def chInitialOutputProcessed = chInitialOutput
| map { tuple ->
def channelId = tuple[0]
def id_ = tuple[1]
def output_ = tuple[2]
// see if output map contains metadata
def meta_ =
@@ -2737,19 +2890,94 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
output_ = output_.findAll{k, v -> k != "_meta"}
// check value types
output_ = _processOutputValues(output_, meta.config, id_, key_)
output_ = _checkValidOutputArgument(output_, meta.config, id_, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && output_.size() == 1) {
output_ = output_.values()[0]
}
[join_id, id_, output_]
[join_id, channelId, id_, output_]
}
// | view{"chInitialOutput: ${it.take(3)}"}
// join the output [prev_id, channel_id, new_id, output] with the previous state [prev_id, state, ...]
def chPublishWithPreviousState = safeJoin(chInitialOutputProcessed, chRunFiltered, key_)
// input tuple format: [join_id, channel_id, id, output, prev_state, ...]
// output tuple format: [join_id, channel_id, id, new_state, ...]
| map{ tup ->
def new_state = workflowArgs.toState(tup.drop(2).take(3))
tup.take(3) + [new_state] + tup.drop(5)
}
if (workflowArgs.auto.publish == "state") {
def chPublishFiles = chPublishWithPreviousState
// input tuple format: [join_id, channel_id, id, new_state, ...]
// output tuple format: [join_id, channel_id, id, new_state]
| map{ tup ->
tup.take(4)
}
safeJoin(chPublishFiles, chArgsWithDefaults, key_)
// input tuple format: [join_id, channel_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(2).take(3)
}
| publishFilesByConfig(key: key_, config: meta.config)
}
// Join the state from the events that were emitted from different channels
def chJoined = chInitialOutputProcessed
| map {tuple ->
def join_id = tuple[0]
def channel_id = tuple[1]
def id = tuple[2]
def other = tuple.drop(3)
// Below, groupTuple is used to join the events. To make sure resuming a workflow
// keeps working, the output state must be deterministic. This means the state needs to be
// sorted with groupTuple's has a 'sort' argument. This argument can be set to 'hash',
// but hashing the state when it is large can be problematic in terms of performance.
// Therefore, a custom comparator function is provided. We add the channel ID to the
// states so that we can use the channel ID to sort the items.
def stateWithChannelID = [[channel_id] * other.size(), other].transpose()
// A comparator that is provided to groupTuple's 'sort' argument is applied
// to all elements of the event tuple (that is not the 'id'). The comparator
// closure that is used below expects the input to be List. So the join_id and
// channel_id must also be wrapped in a list.
[[join_id], [channel_id], id] + stateWithChannelID
}
| groupTuple(by: 2, sort: {a, b -> a[0] <=> b[0]}, size: chInitialOutputList.size(), remainder: true)
| map {join_ids, _, id, statesWithChannelID ->
// Remove the channel IDs from the states
def states = statesWithChannelID.collect{it[1]}
def newJoinId = join_ids.flatten().unique{a, b -> a <=> b}
assert newJoinId.size() == 1: "Multiple events were emitted for '$id'."
def newJoinIdUnique = newJoinId[0]
// Merge the states from the different channels
def newState = states.inject([:]){ old_state, state_to_add ->
return old_state + state_to_add.collectEntries{k, v ->
if (!multipleArgs.contains(k)) {
// if the key is not a multiple argument, we expect only one value
if (old_state.containsKey(k)) {
assert old_state[k] == v : "ID $id: multiple entries for argument $k were emitted."
}
[k, v]
} else {
// if the key is a multiple argument, append the different values into one list
def prevValue = old_state.getOrDefault(k, [])
def prevValueAsList = prevValue instanceof List ? prevValue : [prevValue]
[k, prevValueAsList + v]
}
}
}
_checkAllRequiredOuputsPresent(newState, meta.config, id, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && newState.size() == 1) {
newState = newState.values()[0]
}
return [newJoinIdUnique, id, newState]
}
// join the output [prev_id, new_id, output] with the previous state [prev_id, state, ...]
def chNewState = safeJoin(chInitialOutput, chRunFiltered, key_)
def chNewState = safeJoin(chJoined, chRunFiltered, key_)
// input tuple format: [join_id, id, output, prev_state, ...]
// output tuple format: [join_id, id, new_state, ...]
| map{ tup ->
@@ -2758,23 +2986,21 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
if (workflowArgs.auto.publish == "state") {
def chPublish = chNewState
def chPublishStates = chNewState
// input tuple format: [join_id, id, new_state, ...]
// output tuple format: [join_id, id, new_state]
| map{ tup ->
tup.take(3)
}
safeJoin(chPublish, chArgsWithDefaults, key_)
safeJoin(chPublishStates, chArgsWithDefaults, key_)
// input tuple format: [join_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(1).take(3)
}
}
| publishStatesByConfig(key: key_, config: meta.config)
}
// remove join_id and meta
chReturn = chNewState
| map { tup ->
// input tuple format: [join_id, id, new_state, ...]
@@ -2808,7 +3034,7 @@ meta = [
"resources_dir": moduleDir.toRealPath().normalize(),
"config": processConfig(readJsonBlob('''{
"name" : "multiqc",
"version" : "v0.3.0",
"version" : "v0.3.1",
"authors" : [
{
"name" : "Dorien Roosen",
@@ -3246,6 +3472,10 @@ meta = [
"licence" : "GPL v3 or later"
},
"status" : "enabled",
"scope" : {
"image" : "public",
"target" : "public"
},
"requirements" : {
"commands" : [
"ps"
@@ -3335,7 +3565,7 @@ meta = [
"id" : "docker",
"image" : "quay.io/biocontainers/multiqc:1.21--pyhdfd78af_0",
"target_registry" : "images.viash-hub.com",
"target_tag" : "v0.3.0",
"target_tag" : "v0.3.1",
"namespace_separator" : "/",
"setup" : [
{
@@ -3365,23 +3595,24 @@ meta = [
"runner" : "nextflow",
"engine" : "docker|native",
"output" : "target/nextflow/multiqc",
"viash_version" : "0.9.0",
"git_commit" : "d86bd5cf62104af02caa852aacd352b1aa97ed60",
"git_remote" : "https://x-access-token:ghs_EwAUAMYJ0K4VBHlAEMs4ZP2OyQYqJM0PSfEO@github.com/viash-hub/biobox",
"git_tag" : "v0.2.0-29-gd86bd5c"
"viash_version" : "0.9.4",
"git_commit" : "98a5f3cc745525a65c10263d25cf414eb1093223",
"git_remote" : "https://github.com/viash-hub/biobox",
"git_tag" : "v0.3.0-8-g98a5f3c"
},
"package_config" : {
"name" : "biobox",
"version" : "v0.3.0",
"description" : "A collection of bioinformatics tools for working with sequence data.\n",
"viash_version" : "0.9.0",
"version" : "v0.3.1",
"summary" : "A curated collection of high-quality, standalone bioinformatics components built with [Viash](https://viash.io).\n",
"description" : "`biobox` offers a suite of reliable bioinformatics components, similar to [nf-core/modules](https://github.com/nf-core/modules) and [snakemake-wrappers/bio](https://github.com/snakemake/snakemake-wrappers/tree/master/bio), but built using the [Viash](https://viash.io) framework.\n\nThis approach emphasizes **reusability**, **reproducibility**, and adherence to **best practices**. Key features of `biobox` components include:\n\n* **Standalone & Nextflow Ready:** Run components directly via the command line or seamlessly integrate them into Nextflow workflows.\n* **High Quality Standards:**\n * Comprehensive documentation for components and parameters.\n * Full exposure of underlying tool arguments.\n * Containerized (Docker) for dependency management and reproducibility.\n * Unit tested for verified functionality.\n",
"viash_version" : "0.9.4",
"source" : "src",
"target" : "target",
"config_mods" : [
".requirements.commands := ['ps']\n",
".engines += { type: \\"native\\" }",
".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'",
".engines[.type == 'docker'].target_tag := 'v0.3.0'"
".engines[.type == 'docker'].target_tag := 'v0.3.1'"
],
"keywords" : [
"bioinformatics",
@@ -3935,7 +4166,7 @@ def _vdsl3ProcessFactory(Map workflowArgs, Map meta, String rawScript) {
// create process from temp file
def binding = new nextflow.script.ScriptBinding([:])
def session = nextflow.Nextflow.getSession()
def parser = new nextflow.script.ScriptParser(session)
def parser = _getScriptLoader(session)
.setModule(true)
.setBinding(binding)
def moduleScript = parser.runScript(tempFile)
@@ -3949,6 +4180,27 @@ def _vdsl3ProcessFactory(Map workflowArgs, Map meta, String rawScript) {
return scriptMeta.getProcess(procKey)
}
// use Reflection to get a ScriptParser / ScriptLoader
// <25.02.0-edge: new nextflow.script.ScriptParser(session)
// >=25.02.0-edge: nextflow.script.ScriptLoaderFactory.create(session)
def _getScriptLoader(nextflow.Session session) {
// try using the old method
try {
Class<?> scriptParserClass = Class.forName('nextflow.script.ScriptParser')
return scriptParserClass.getDeclaredConstructor(nextflow.Session).newInstance(session)
} catch (ClassNotFoundException e) {
// else try with the new method
try {
Class<?> scriptLoaderFactoryClass = Class.forName('nextflow.script.ScriptLoaderFactory')
def createMethod = scriptLoaderFactoryClass.getDeclaredMethod('create', nextflow.Session)
return createMethod.invoke(null, session) // null because create is static
} catch (ClassNotFoundException | NoSuchMethodException | IllegalAccessException | java.lang.reflect.InvocationTargetException e2) {
// Handle the case where neither class is found
throw new Exception("Neither nextflow.script.ScriptParser nor nextflow.script.ScriptLoaderFactory could be found. Is this a compatible Nextflow version?", e2)
}
}
}
// defaults
meta["defaults"] = [
// key to be used to trace the process and determine output names
@@ -3962,7 +4214,7 @@ meta["defaults"] = [
"container" : {
"registry" : "images.viash-hub.com",
"image" : "vsh/biobox/multiqc",
"tag" : "v0.3.0"
"tag" : "v0.3.1"
},
"tag" : "$id"
}'''),

View File

@@ -2,7 +2,7 @@ manifest {
name = 'multiqc'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.3.0'
version = 'v0.3.1'
description = 'MultiQC aggregates results from bioinformatics analyses across many samples into a single report.\nIt searches a given directory for analysis logs and compiles a HTML report. It\'s a general use tool, perfect for summarising the output from numerous bioinformatics tools.\n'
author = 'Dorien Roosen'
}

View File

@@ -48,10 +48,10 @@
"output_data": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.output_data.output_data`, example: `multiqc_data`. Output directory for parsed data files",
"help_text": "Type: `file`, default: `$id.$key.output_data.output_data`, example: `multiqc_data`. Output directory for parsed data files. If not provided, parsed data will not be published.\n"
"description": "Type: `file`, default: `$id.$key.output_data`, example: `multiqc_data`. Output directory for parsed data files",
"help_text": "Type: `file`, default: `$id.$key.output_data`, example: `multiqc_data`. Output directory for parsed data files. If not provided, parsed data will not be published.\n"
,
"default":"$id.$key.output_data.output_data"
"default":"$id.$key.output_data"
}
@@ -59,10 +59,10 @@
"output_plots": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.output_plots.output_plots`, example: `multiqc_plots`. Output directory for generated plots",
"help_text": "Type: `file`, default: `$id.$key.output_plots.output_plots`, example: `multiqc_plots`. Output directory for generated plots. If not provided, plots will not be published.\n"
"description": "Type: `file`, default: `$id.$key.output_plots`, example: `multiqc_plots`. Output directory for generated plots",
"help_text": "Type: `file`, default: `$id.$key.output_plots`, example: `multiqc_plots`. Output directory for generated plots. If not provided, plots will not be published.\n"
,
"default":"$id.$key.output_plots.output_plots"
"default":"$id.$key.output_plots"
}

View File

@@ -1,6 +1,6 @@
name: "interop_summary_to_csv"
namespace: "io"
version: "v0.3.4"
version: "v0.3.12"
argument_groups:
- name: "Input arguments"
arguments:
@@ -41,10 +41,21 @@ resources:
- type: "file"
path: "nextflow_labels.config"
dest: "nextflow_labels.config"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
- type: "file"
path: "iseq-DI"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "summary"
- "index-summary"
- "ps"
license: "MIT"
links:
@@ -121,7 +132,7 @@ engines:
id: "docker"
image: "debian:stable-slim"
target_registry: "images.viash-hub.com"
target_tag: "v0.3.4"
target_tag: "v0.3.12"
namespace_separator: "/"
setup:
- type: "apt"
@@ -145,29 +156,29 @@ build_info:
engine: "docker|native"
output: "target/executable/io/interop_summary_to_csv"
executable: "target/executable/io/interop_summary_to_csv/interop_summary_to_csv"
viash_version: "0.9.0"
git_commit: "e6f8b20d70be8f47907c08cc3c7802920cc0eee4"
git_remote: "https://x-access-token:ghs_FhW4M4TmdkVUw8Om3VZ1e7pO685xf7028pZ3@github.com/viash-hub/demultiplex"
git_tag: "v0.3.3-3-ge6f8b20"
viash_version: "0.9.4"
git_commit: "3a84e5409c5758611fff4eb585736f3737c8b896"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.3.11-2-g3a84e54"
package_config:
name: "demultiplex"
version: "v0.3.4"
version: "v0.3.12"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-test-data/demultiplex/v2/"
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.0"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n.runners[.type == 'nextflow'].directives.tag\
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.4'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.12'"
keywords:
- "bioinformatics"
- "sequence"

View File

@@ -1,8 +1,8 @@
#!/usr/bin/env bash
# interop_summary_to_csv v0.3.4
# interop_summary_to_csv v0.3.12
#
# This wrapper script is auto-generated by viash 0.9.0 and is thus a derivative
# This wrapper script is auto-generated by viash 0.9.4 and is thus a derivative
# work thereof. This software comes with ABSOLUTELY NO WARRANTY from Data
# Intuitive.
#
@@ -169,22 +169,6 @@ VIASH_META_CONFIG="$VIASH_META_RESOURCES_DIR/.config.vsh.yaml"
VIASH_META_TEMP_DIR="$VIASH_TEMP"
# ViashHelp: Display helpful explanation about this executable
function ViashHelp {
echo "interop_summary_to_csv v0.3.4"
echo ""
echo "Input arguments:"
echo " --input"
echo " type: file, required parameter, file must exist"
echo " Sequencing run folder (*not* InterOp folder)."
echo ""
echo "Output arguments:"
echo " --output_run_summary"
echo " type: file, required parameter, output, file must exist"
echo ""
echo " --output_index_summary"
echo " type: file, required parameter, output, file must exist"
}
# initialise variables
VIASH_MODE='run'
@@ -470,10 +454,10 @@ tar -C /tmp/ --no-same-owner --no-same-permissions -xvf /tmp/interop.tar.gz && \
mv /tmp/interop-1.3.1-Linux-GNU/bin/index-summary /tmp/interop-1.3.1-Linux-GNU/bin/summary /usr/local/bin/
LABEL org.opencontainers.image.description="Companion container for running component io interop_summary_to_csv"
LABEL org.opencontainers.image.created="2024-12-20T12:37:34Z"
LABEL org.opencontainers.image.created="2025-05-26T10:39:03Z"
LABEL org.opencontainers.image.source="https://github.com/viash-hub/demultiplex"
LABEL org.opencontainers.image.revision="e6f8b20d70be8f47907c08cc3c7802920cc0eee4"
LABEL org.opencontainers.image.version="v0.3.4"
LABEL org.opencontainers.image.revision="3a84e5409c5758611fff4eb585736f3737c8b896"
LABEL org.opencontainers.image.version="v0.3.12"
VIASHDOCKER
fi
@@ -587,6 +571,48 @@ fi
# initialise docker variables
VIASH_DOCKER_RUN_ARGS=(-i --rm)
# ViashHelp: Display helpful explanation about this executable
function ViashHelp {
echo "interop_summary_to_csv v0.3.12"
echo ""
echo "Input arguments:"
echo " --input"
echo " type: file, required parameter, file must exist"
echo " Sequencing run folder (*not* InterOp folder)."
echo ""
echo "Output arguments:"
echo " --output_run_summary"
echo " type: file, required parameter, output, file must exist"
echo ""
echo " --output_index_summary"
echo " type: file, required parameter, output, file must exist"
echo ""
echo "Viash built in Computational Requirements:"
echo " ---cpus=INT"
echo " Number of CPUs to use"
echo " ---memory=STRING"
echo " Amount of memory to use. Examples: 4GB, 3MiB."
echo ""
echo "Viash built in Docker:"
echo " ---setup=STRATEGY"
echo " Setup the docker container. Options are: alwaysbuild, alwayscachedbuild, ifneedbebuild, ifneedbecachedbuild, alwayspull, alwayspullelsebuild, alwayspullelsecachedbuild, ifneedbepull, ifneedbepullelsebuild, ifneedbepullelsecachedbuild, push, pushifnotpresent, donothing."
echo " Default: ifneedbepullelsecachedbuild"
echo " ---dockerfile"
echo " Print the dockerfile to stdout."
echo " ---docker_run_args=ARG"
echo " Provide runtime arguments to Docker. See the documentation on \`docker run\` for more information."
echo " ---docker_image_id"
echo " Print the docker image id to stdout."
echo " ---debug"
echo " Enter the docker container for debugging purposes."
echo ""
echo "Viash built in Engines:"
echo " ---engine=ENGINE_ID"
echo " Specify the engine to use. Options are: docker, native."
echo " Default: docker"
}
# initialise array
VIASH_POSITIONAL_ARGS=''
@@ -609,7 +635,7 @@ while [[ $# -gt 0 ]]; do
shift 1
;;
--version)
echo "interop_summary_to_csv v0.3.4"
echo "interop_summary_to_csv v0.3.12"
exit
;;
--input)
@@ -733,7 +759,7 @@ if [[ "$VIASH_ENGINE_TYPE" == "docker" ]]; then
# determine docker image id
if [[ "$VIASH_ENGINE_ID" == 'docker' ]]; then
VIASH_DOCKER_IMAGE_ID='images.viash-hub.com/vsh/demultiplex/io/interop_summary_to_csv:v0.3.4'
VIASH_DOCKER_IMAGE_ID='images.viash-hub.com/vsh/demultiplex/io/interop_summary_to_csv:v0.3.12'
fi
# print dockerfile
@@ -755,13 +781,13 @@ if [[ "$VIASH_ENGINE_TYPE" == "docker" ]]; then
# build docker image
elif [ "$VIASH_MODE" == "setup" ]; then
ViashDockerSetup "$VIASH_DOCKER_IMAGE_ID" "$VIASH_SETUP_STRATEGY"
ViashDockerCheckCommands "$VIASH_DOCKER_IMAGE_ID" 'ps' 'bash'
ViashDockerCheckCommands "$VIASH_DOCKER_IMAGE_ID" 'summary' 'index-summary' 'ps' 'bash'
exit 0
fi
# check if docker image exists
ViashDockerSetup "$VIASH_DOCKER_IMAGE_ID" ifneedbepullelsecachedbuild
ViashDockerCheckCommands "$VIASH_DOCKER_IMAGE_ID" 'ps' 'bash'
ViashDockerCheckCommands "$VIASH_DOCKER_IMAGE_ID" 'summary' 'index-summary' 'ps' 'bash'
fi
# setting computational defaults

View File

@@ -1,6 +1,6 @@
name: "publish"
namespace: "io"
version: "v0.3.4"
version: "v0.3.12"
argument_groups:
- name: "Input arguments"
arguments:
@@ -22,7 +22,7 @@ argument_groups:
create_parent: true
required: true
direction: "input"
multiple: false
multiple: true
multiple_sep: ";"
- type: "file"
name: "--input_multiqc"
@@ -44,6 +44,15 @@ argument_groups:
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--input_demultiplexer_logs"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
@@ -90,6 +99,17 @@ argument_groups:
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_demultiplexer_logs"
info: null
default:
- "demultiplexer_logs"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "code.sh"
@@ -100,6 +120,9 @@ resources:
description: "Publish the processed results of the run"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
@@ -178,7 +201,7 @@ engines:
id: "docker"
image: "debian:stable-slim"
target_registry: "images.viash-hub.com"
target_tag: "v0.3.4"
target_tag: "v0.3.12"
namespace_separator: "/"
setup:
- type: "apt"
@@ -195,29 +218,29 @@ build_info:
engine: "docker|native"
output: "target/executable/io/publish"
executable: "target/executable/io/publish/publish"
viash_version: "0.9.0"
git_commit: "e6f8b20d70be8f47907c08cc3c7802920cc0eee4"
git_remote: "https://x-access-token:ghs_FhW4M4TmdkVUw8Om3VZ1e7pO685xf7028pZ3@github.com/viash-hub/demultiplex"
git_tag: "v0.3.3-3-ge6f8b20"
viash_version: "0.9.4"
git_commit: "3a84e5409c5758611fff4eb585736f3737c8b896"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.3.11-2-g3a84e54"
package_config:
name: "demultiplex"
version: "v0.3.4"
version: "v0.3.12"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-test-data/demultiplex/v2/"
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.0"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n.runners[.type == 'nextflow'].directives.tag\
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.4'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.12'"
keywords:
- "bioinformatics"
- "sequence"

View File

@@ -1,8 +1,8 @@
#!/usr/bin/env bash
# publish v0.3.4
# publish v0.3.12
#
# This wrapper script is auto-generated by viash 0.9.0 and is thus a derivative
# This wrapper script is auto-generated by viash 0.9.4 and is thus a derivative
# work thereof. This software comes with ABSOLUTELY NO WARRANTY from Data
# Intuitive.
#
@@ -169,46 +169,6 @@ VIASH_META_CONFIG="$VIASH_META_RESOURCES_DIR/.config.vsh.yaml"
VIASH_META_TEMP_DIR="$VIASH_TEMP"
# ViashHelp: Display helpful explanation about this executable
function ViashHelp {
echo "publish v0.3.4"
echo ""
echo "Publish the processed results of the run"
echo ""
echo "Input arguments:"
echo " --input"
echo " type: file, required parameter, file must exist"
echo " Directory to write fastq data to"
echo ""
echo " --input_falco"
echo " type: file, required parameter, file must exist"
echo " Directory to write falco output to"
echo ""
echo " --input_multiqc"
echo " type: file, required parameter, file must exist"
echo " Location where to write the MultiQC report to."
echo ""
echo " --input_run_information"
echo " type: file, required parameter, file must exist"
echo " Location where to write the run information to."
echo ""
echo "Output arguments:"
echo " --output"
echo " type: file, output, file must exist"
echo " default: fastq"
echo ""
echo " --output_falco"
echo " type: file, output, file must exist"
echo " default: qc/fastqc"
echo ""
echo " --output_multiqc"
echo " type: file, output, file must exist"
echo " default: qc/multiqc_report.html"
echo ""
echo " --output_run_information"
echo " type: file, output, file must exist"
echo " default: run_information.csv"
}
# initialise variables
VIASH_MODE='run'
@@ -490,10 +450,10 @@ RUN apt-get update && \
rm -rf /var/lib/apt/lists/*
LABEL org.opencontainers.image.description="Companion container for running component io publish"
LABEL org.opencontainers.image.created="2024-12-20T12:37:34Z"
LABEL org.opencontainers.image.created="2025-05-26T10:39:03Z"
LABEL org.opencontainers.image.source="https://github.com/viash-hub/demultiplex"
LABEL org.opencontainers.image.revision="e6f8b20d70be8f47907c08cc3c7802920cc0eee4"
LABEL org.opencontainers.image.version="v0.3.4"
LABEL org.opencontainers.image.revision="3a84e5409c5758611fff4eb585736f3737c8b896"
LABEL org.opencontainers.image.version="v0.3.12"
VIASHDOCKER
fi
@@ -607,6 +567,79 @@ fi
# initialise docker variables
VIASH_DOCKER_RUN_ARGS=(-i --rm)
# ViashHelp: Display helpful explanation about this executable
function ViashHelp {
echo "publish v0.3.12"
echo ""
echo "Publish the processed results of the run"
echo ""
echo "Input arguments:"
echo " --input"
echo " type: file, required parameter, file must exist"
echo " Directory to write fastq data to"
echo ""
echo " --input_falco"
echo " type: file, required parameter, multiple values allowed, file must exist"
echo " Directory to write falco output to"
echo ""
echo " --input_multiqc"
echo " type: file, required parameter, file must exist"
echo " Location where to write the MultiQC report to."
echo ""
echo " --input_run_information"
echo " type: file, required parameter, file must exist"
echo " Location where to write the run information to."
echo ""
echo " --input_demultiplexer_logs"
echo " type: file, required parameter, file must exist"
echo ""
echo "Output arguments:"
echo " --output"
echo " type: file, output, file must exist"
echo " default: fastq"
echo ""
echo " --output_falco"
echo " type: file, output, file must exist"
echo " default: qc/fastqc"
echo ""
echo " --output_multiqc"
echo " type: file, output, file must exist"
echo " default: qc/multiqc_report.html"
echo ""
echo " --output_run_information"
echo " type: file, output, file must exist"
echo " default: run_information.csv"
echo ""
echo " --output_demultiplexer_logs"
echo " type: file, output, file must exist"
echo " default: demultiplexer_logs"
echo ""
echo "Viash built in Computational Requirements:"
echo " ---cpus=INT"
echo " Number of CPUs to use"
echo " ---memory=STRING"
echo " Amount of memory to use. Examples: 4GB, 3MiB."
echo ""
echo "Viash built in Docker:"
echo " ---setup=STRATEGY"
echo " Setup the docker container. Options are: alwaysbuild, alwayscachedbuild, ifneedbebuild, ifneedbecachedbuild, alwayspull, alwayspullelsebuild, alwayspullelsecachedbuild, ifneedbepull, ifneedbepullelsebuild, ifneedbepullelsecachedbuild, push, pushifnotpresent, donothing."
echo " Default: ifneedbepullelsecachedbuild"
echo " ---dockerfile"
echo " Print the dockerfile to stdout."
echo " ---docker_run_args=ARG"
echo " Provide runtime arguments to Docker. See the documentation on \`docker run\` for more information."
echo " ---docker_image_id"
echo " Print the docker image id to stdout."
echo " ---debug"
echo " Enter the docker container for debugging purposes."
echo ""
echo "Viash built in Engines:"
echo " ---engine=ENGINE_ID"
echo " Specify the engine to use. Options are: docker, native."
echo " Default: docker"
}
# initialise array
VIASH_POSITIONAL_ARGS=''
@@ -629,7 +662,7 @@ while [[ $# -gt 0 ]]; do
shift 1
;;
--version)
echo "publish v0.3.4"
echo "publish v0.3.12"
exit
;;
--input)
@@ -644,14 +677,20 @@ while [[ $# -gt 0 ]]; do
shift 1
;;
--input_falco)
[ -n "$VIASH_PAR_INPUT_FALCO" ] && ViashError Bad arguments for option \'--input_falco\': \'$VIASH_PAR_INPUT_FALCO\' \& \'$2\' - you should provide exactly one argument for this option. && exit 1
VIASH_PAR_INPUT_FALCO="$2"
if [ -z "$VIASH_PAR_INPUT_FALCO" ]; then
VIASH_PAR_INPUT_FALCO="$2"
else
VIASH_PAR_INPUT_FALCO="$VIASH_PAR_INPUT_FALCO;""$2"
fi
[ $# -lt 2 ] && ViashError Not enough arguments passed to --input_falco. Use "--help" to get more information on the parameters. && exit 1
shift 2
;;
--input_falco=*)
[ -n "$VIASH_PAR_INPUT_FALCO" ] && ViashError Bad arguments for option \'--input_falco=*\': \'$VIASH_PAR_INPUT_FALCO\' \& \'$2\' - you should provide exactly one argument for this option. && exit 1
VIASH_PAR_INPUT_FALCO=$(ViashRemoveFlags "$1")
if [ -z "$VIASH_PAR_INPUT_FALCO" ]; then
VIASH_PAR_INPUT_FALCO=$(ViashRemoveFlags "$1")
else
VIASH_PAR_INPUT_FALCO="$VIASH_PAR_INPUT_FALCO;"$(ViashRemoveFlags "$1")
fi
shift 1
;;
--input_multiqc)
@@ -676,6 +715,17 @@ while [[ $# -gt 0 ]]; do
VIASH_PAR_INPUT_RUN_INFORMATION=$(ViashRemoveFlags "$1")
shift 1
;;
--input_demultiplexer_logs)
[ -n "$VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS" ] && ViashError Bad arguments for option \'--input_demultiplexer_logs\': \'$VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS\' \& \'$2\' - you should provide exactly one argument for this option. && exit 1
VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS="$2"
[ $# -lt 2 ] && ViashError Not enough arguments passed to --input_demultiplexer_logs. Use "--help" to get more information on the parameters. && exit 1
shift 2
;;
--input_demultiplexer_logs=*)
[ -n "$VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS" ] && ViashError Bad arguments for option \'--input_demultiplexer_logs=*\': \'$VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS\' \& \'$2\' - you should provide exactly one argument for this option. && exit 1
VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS=$(ViashRemoveFlags "$1")
shift 1
;;
--output)
[ -n "$VIASH_PAR_OUTPUT" ] && ViashError Bad arguments for option \'--output\': \'$VIASH_PAR_OUTPUT\' \& \'$2\' - you should provide exactly one argument for this option. && exit 1
VIASH_PAR_OUTPUT="$2"
@@ -720,6 +770,17 @@ while [[ $# -gt 0 ]]; do
VIASH_PAR_OUTPUT_RUN_INFORMATION=$(ViashRemoveFlags "$1")
shift 1
;;
--output_demultiplexer_logs)
[ -n "$VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS" ] && ViashError Bad arguments for option \'--output_demultiplexer_logs\': \'$VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS\' \& \'$2\' - you should provide exactly one argument for this option. && exit 1
VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS="$2"
[ $# -lt 2 ] && ViashError Not enough arguments passed to --output_demultiplexer_logs. Use "--help" to get more information on the parameters. && exit 1
shift 2
;;
--output_demultiplexer_logs=*)
[ -n "$VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS" ] && ViashError Bad arguments for option \'--output_demultiplexer_logs=*\': \'$VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS\' \& \'$2\' - you should provide exactly one argument for this option. && exit 1
VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS=$(ViashRemoveFlags "$1")
shift 1
;;
---engine)
VIASH_ENGINE_ID="$2"
shift 2
@@ -808,7 +869,7 @@ if [[ "$VIASH_ENGINE_TYPE" == "docker" ]]; then
# determine docker image id
if [[ "$VIASH_ENGINE_ID" == 'docker' ]]; then
VIASH_DOCKER_IMAGE_ID='images.viash-hub.com/vsh/demultiplex/io/publish:v0.3.4'
VIASH_DOCKER_IMAGE_ID='images.viash-hub.com/vsh/demultiplex/io/publish:v0.3.12'
fi
# print dockerfile
@@ -908,6 +969,10 @@ if [ -z ${VIASH_PAR_INPUT_RUN_INFORMATION+x} ]; then
ViashError '--input_run_information' is a required argument. Use "--help" to get more information on the parameters.
exit 1
fi
if [ -z ${VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS+x} ]; then
ViashError '--input_demultiplexer_logs' is a required argument. Use "--help" to get more information on the parameters.
exit 1
fi
if [ -z ${VIASH_META_NAME+x} ]; then
ViashError 'name' is a required argument. Use "--help" to get more information on the parameters.
exit 1
@@ -946,15 +1011,26 @@ fi
if [ -z ${VIASH_PAR_OUTPUT_RUN_INFORMATION+x} ]; then
VIASH_PAR_OUTPUT_RUN_INFORMATION="run_information.csv"
fi
if [ -z ${VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS+x} ]; then
VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS="demultiplexer_logs"
fi
# check whether required files exist
if [ ! -z "$VIASH_PAR_INPUT" ] && [ ! -e "$VIASH_PAR_INPUT" ]; then
ViashError "Input file '$VIASH_PAR_INPUT' does not exist."
exit 1
fi
if [ ! -z "$VIASH_PAR_INPUT_FALCO" ] && [ ! -e "$VIASH_PAR_INPUT_FALCO" ]; then
ViashError "Input file '$VIASH_PAR_INPUT_FALCO' does not exist."
exit 1
if [ ! -z "$VIASH_PAR_INPUT_FALCO" ]; then
IFS=';'
set -f
for file in $VIASH_PAR_INPUT_FALCO; do
unset IFS
if [ ! -e "$file" ]; then
ViashError "Input file '$file' does not exist."
exit 1
fi
done
set +f
fi
if [ ! -z "$VIASH_PAR_INPUT_MULTIQC" ] && [ ! -e "$VIASH_PAR_INPUT_MULTIQC" ]; then
ViashError "Input file '$VIASH_PAR_INPUT_MULTIQC' does not exist."
@@ -964,6 +1040,10 @@ if [ ! -z "$VIASH_PAR_INPUT_RUN_INFORMATION" ] && [ ! -e "$VIASH_PAR_INPUT_RUN_I
ViashError "Input file '$VIASH_PAR_INPUT_RUN_INFORMATION' does not exist."
exit 1
fi
if [ ! -z "$VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS" ] && [ ! -e "$VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS" ]; then
ViashError "Input file '$VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS' does not exist."
exit 1
fi
# check whether parameters values are of the right type
if [[ -n "$VIASH_META_CPUS" ]]; then
@@ -1052,6 +1132,9 @@ fi
if [ ! -z "$VIASH_PAR_OUTPUT_RUN_INFORMATION" ] && [ ! -d "$(dirname "$VIASH_PAR_OUTPUT_RUN_INFORMATION")" ]; then
mkdir -p "$(dirname "$VIASH_PAR_OUTPUT_RUN_INFORMATION")"
fi
if [ ! -z "$VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS" ] && [ ! -d "$(dirname "$VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS")" ]; then
mkdir -p "$(dirname "$VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS")"
fi
if [ "$VIASH_ENGINE_ID" == "native" ] ; then
if [ "$VIASH_MODE" == "run" ]; then
@@ -1070,8 +1153,15 @@ if [ ! -z "$VIASH_PAR_INPUT" ]; then
VIASH_PAR_INPUT=$(ViashDockerAutodetectMount "$VIASH_PAR_INPUT")
fi
if [ ! -z "$VIASH_PAR_INPUT_FALCO" ]; then
VIASH_DIRECTORY_MOUNTS+=( "$(ViashDockerAutodetectMountArg "$VIASH_PAR_INPUT_FALCO")" )
VIASH_PAR_INPUT_FALCO=$(ViashDockerAutodetectMount "$VIASH_PAR_INPUT_FALCO")
VIASH_TEST_INPUT_FALCO=()
IFS=';'
for var in $VIASH_PAR_INPUT_FALCO; do
unset IFS
VIASH_DIRECTORY_MOUNTS+=( "$(ViashDockerAutodetectMountArg "$var")" )
var=$(ViashDockerAutodetectMount "$var")
VIASH_TEST_INPUT_FALCO+=( "$var" )
done
VIASH_PAR_INPUT_FALCO=$(IFS=';' ; echo "${VIASH_TEST_INPUT_FALCO[*]}")
fi
if [ ! -z "$VIASH_PAR_INPUT_MULTIQC" ]; then
VIASH_DIRECTORY_MOUNTS+=( "$(ViashDockerAutodetectMountArg "$VIASH_PAR_INPUT_MULTIQC")" )
@@ -1081,6 +1171,10 @@ if [ ! -z "$VIASH_PAR_INPUT_RUN_INFORMATION" ]; then
VIASH_DIRECTORY_MOUNTS+=( "$(ViashDockerAutodetectMountArg "$VIASH_PAR_INPUT_RUN_INFORMATION")" )
VIASH_PAR_INPUT_RUN_INFORMATION=$(ViashDockerAutodetectMount "$VIASH_PAR_INPUT_RUN_INFORMATION")
fi
if [ ! -z "$VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS" ]; then
VIASH_DIRECTORY_MOUNTS+=( "$(ViashDockerAutodetectMountArg "$VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS")" )
VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS=$(ViashDockerAutodetectMount "$VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS")
fi
if [ ! -z "$VIASH_PAR_OUTPUT" ]; then
VIASH_DIRECTORY_MOUNTS+=( "$(ViashDockerAutodetectMountArg "$VIASH_PAR_OUTPUT")" )
VIASH_PAR_OUTPUT=$(ViashDockerAutodetectMount "$VIASH_PAR_OUTPUT")
@@ -1101,6 +1195,11 @@ if [ ! -z "$VIASH_PAR_OUTPUT_RUN_INFORMATION" ]; then
VIASH_PAR_OUTPUT_RUN_INFORMATION=$(ViashDockerAutodetectMount "$VIASH_PAR_OUTPUT_RUN_INFORMATION")
VIASH_CHOWN_VARS+=( "$VIASH_PAR_OUTPUT_RUN_INFORMATION" )
fi
if [ ! -z "$VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS" ]; then
VIASH_DIRECTORY_MOUNTS+=( "$(ViashDockerAutodetectMountArg "$VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS")" )
VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS=$(ViashDockerAutodetectMount "$VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS")
VIASH_CHOWN_VARS+=( "$VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS" )
fi
if [ ! -z "$VIASH_META_RESOURCES_DIR" ]; then
VIASH_DIRECTORY_MOUNTS+=( "$(ViashDockerAutodetectMountArg "$VIASH_META_RESOURCES_DIR")" )
VIASH_META_RESOURCES_DIR=$(ViashDockerAutodetectMount "$VIASH_META_RESOURCES_DIR")
@@ -1174,10 +1273,12 @@ $( if [ ! -z ${VIASH_PAR_INPUT+x} ]; then echo "${VIASH_PAR_INPUT}" | sed "s#'#'
$( if [ ! -z ${VIASH_PAR_INPUT_FALCO+x} ]; then echo "${VIASH_PAR_INPUT_FALCO}" | sed "s#'#'\"'\"'#g;s#.*#par_input_falco='&'#" ; else echo "# par_input_falco="; fi )
$( if [ ! -z ${VIASH_PAR_INPUT_MULTIQC+x} ]; then echo "${VIASH_PAR_INPUT_MULTIQC}" | sed "s#'#'\"'\"'#g;s#.*#par_input_multiqc='&'#" ; else echo "# par_input_multiqc="; fi )
$( if [ ! -z ${VIASH_PAR_INPUT_RUN_INFORMATION+x} ]; then echo "${VIASH_PAR_INPUT_RUN_INFORMATION}" | sed "s#'#'\"'\"'#g;s#.*#par_input_run_information='&'#" ; else echo "# par_input_run_information="; fi )
$( if [ ! -z ${VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS+x} ]; then echo "${VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS}" | sed "s#'#'\"'\"'#g;s#.*#par_input_demultiplexer_logs='&'#" ; else echo "# par_input_demultiplexer_logs="; fi )
$( if [ ! -z ${VIASH_PAR_OUTPUT+x} ]; then echo "${VIASH_PAR_OUTPUT}" | sed "s#'#'\"'\"'#g;s#.*#par_output='&'#" ; else echo "# par_output="; fi )
$( if [ ! -z ${VIASH_PAR_OUTPUT_FALCO+x} ]; then echo "${VIASH_PAR_OUTPUT_FALCO}" | sed "s#'#'\"'\"'#g;s#.*#par_output_falco='&'#" ; else echo "# par_output_falco="; fi )
$( if [ ! -z ${VIASH_PAR_OUTPUT_MULTIQC+x} ]; then echo "${VIASH_PAR_OUTPUT_MULTIQC}" | sed "s#'#'\"'\"'#g;s#.*#par_output_multiqc='&'#" ; else echo "# par_output_multiqc="; fi )
$( if [ ! -z ${VIASH_PAR_OUTPUT_RUN_INFORMATION+x} ]; then echo "${VIASH_PAR_OUTPUT_RUN_INFORMATION}" | sed "s#'#'\"'\"'#g;s#.*#par_output_run_information='&'#" ; else echo "# par_output_run_information="; fi )
$( if [ ! -z ${VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS+x} ]; then echo "${VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS}" | sed "s#'#'\"'\"'#g;s#.*#par_output_demultiplexer_logs='&'#" ; else echo "# par_output_demultiplexer_logs="; fi )
$( if [ ! -z ${VIASH_META_NAME+x} ]; then echo "${VIASH_META_NAME}" | sed "s#'#'\"'\"'#g;s#.*#meta_name='&'#" ; else echo "# meta_name="; fi )
$( if [ ! -z ${VIASH_META_FUNCTIONALITY_NAME+x} ]; then echo "${VIASH_META_FUNCTIONALITY_NAME}" | sed "s#'#'\"'\"'#g;s#.*#meta_functionality_name='&'#" ; else echo "# meta_functionality_name="; fi )
$( if [ ! -z ${VIASH_META_RESOURCES_DIR+x} ]; then echo "${VIASH_META_RESOURCES_DIR}" | sed "s#'#'\"'\"'#g;s#.*#meta_resources_dir='&'#" ; else echo "# meta_resources_dir="; fi )
@@ -1203,9 +1304,9 @@ $( if [ ! -z ${VIASH_META_MEMORY_PIB+x} ]; then echo "${VIASH_META_MEMORY_PIB}"
set -eo pipefail
declare -A input_output_mapping=(["par_input"]="par_output"
["par_input_falco"]="par_output_falco"
["par_input_multiqc"]="par_output_multiqc"
["par_input_run_information"]="par_output_run_information"
["par_input_demultiplexer_logs"]="par_output_demultiplexer_logs"
)
for input_argument_name in "\${!input_output_mapping[@]}"
@@ -1224,6 +1325,14 @@ do
echo "Output files for \$output_location:"
ls "\$output_location"
done
echo "Grouping output from \$par_input_falco into \$par_output_falco"
mkdir -p "\$par_output_falco"
IFS=";" read -ra falco_inputs <<< \$par_input_falco
for falco_dir in "\${falco_inputs[@]}"; do
echo "Copying contents of \$falco_dir"
find -H -D exec "\$falco_dir" -type f -maxdepth 1 -exec cp -t "\$par_output_falco" {} +
done
VIASHMAIN
bash "\$tempscript" &
wait "\$!"
@@ -1238,7 +1347,17 @@ if [[ "$VIASH_ENGINE_TYPE" == "docker" ]]; then
VIASH_PAR_INPUT=$(ViashDockerStripAutomount "$VIASH_PAR_INPUT")
fi
if [ ! -z "$VIASH_PAR_INPUT_FALCO" ]; then
VIASH_PAR_INPUT_FALCO=$(ViashDockerStripAutomount "$VIASH_PAR_INPUT_FALCO")
unset VIASH_TEST_INPUT_FALCO
IFS=';'
for var in $VIASH_PAR_INPUT_FALCO; do
unset IFS
if [ -z "$VIASH_TEST_INPUT_FALCO" ]; then
VIASH_TEST_INPUT_FALCO="$(ViashDockerStripAutomount "$var")"
else
VIASH_TEST_INPUT_FALCO="$VIASH_TEST_INPUT_FALCO;""$(ViashDockerStripAutomount "$var")"
fi
done
VIASH_PAR_INPUT_FALCO="$VIASH_TEST_INPUT_FALCO"
fi
if [ ! -z "$VIASH_PAR_INPUT_MULTIQC" ]; then
VIASH_PAR_INPUT_MULTIQC=$(ViashDockerStripAutomount "$VIASH_PAR_INPUT_MULTIQC")
@@ -1246,6 +1365,9 @@ if [[ "$VIASH_ENGINE_TYPE" == "docker" ]]; then
if [ ! -z "$VIASH_PAR_INPUT_RUN_INFORMATION" ]; then
VIASH_PAR_INPUT_RUN_INFORMATION=$(ViashDockerStripAutomount "$VIASH_PAR_INPUT_RUN_INFORMATION")
fi
if [ ! -z "$VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS" ]; then
VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS=$(ViashDockerStripAutomount "$VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS")
fi
if [ ! -z "$VIASH_PAR_OUTPUT" ]; then
VIASH_PAR_OUTPUT=$(ViashDockerStripAutomount "$VIASH_PAR_OUTPUT")
fi
@@ -1258,6 +1380,9 @@ if [[ "$VIASH_ENGINE_TYPE" == "docker" ]]; then
if [ ! -z "$VIASH_PAR_OUTPUT_RUN_INFORMATION" ]; then
VIASH_PAR_OUTPUT_RUN_INFORMATION=$(ViashDockerStripAutomount "$VIASH_PAR_OUTPUT_RUN_INFORMATION")
fi
if [ ! -z "$VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS" ]; then
VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS=$(ViashDockerStripAutomount "$VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS")
fi
if [ ! -z "$VIASH_META_RESOURCES_DIR" ]; then
VIASH_META_RESOURCES_DIR=$(ViashDockerStripAutomount "$VIASH_META_RESOURCES_DIR")
fi
@@ -1290,6 +1415,10 @@ if [ ! -z "$VIASH_PAR_OUTPUT_RUN_INFORMATION" ] && [ ! -e "$VIASH_PAR_OUTPUT_RUN
ViashError "Output file '$VIASH_PAR_OUTPUT_RUN_INFORMATION' does not exist."
exit 1
fi
if [ ! -z "$VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS" ] && [ ! -e "$VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS" ]; then
ViashError "Output file '$VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS' does not exist."
exit 1
fi
exit 0

View File

@@ -1,6 +1,6 @@
name: "untar"
namespace: "io"
version: "v0.3.4"
version: "v0.3.12"
argument_groups:
- name: "Input arguments"
arguments:
@@ -57,6 +57,9 @@ test_resources:
is_executable: true
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
@@ -135,7 +138,7 @@ engines:
id: "docker"
image: "debian:stable-slim"
target_registry: "images.viash-hub.com"
target_tag: "v0.3.4"
target_tag: "v0.3.12"
namespace_separator: "/"
setup:
- type: "apt"
@@ -152,29 +155,29 @@ build_info:
engine: "docker|native"
output: "target/executable/io/untar"
executable: "target/executable/io/untar/untar"
viash_version: "0.9.0"
git_commit: "e6f8b20d70be8f47907c08cc3c7802920cc0eee4"
git_remote: "https://x-access-token:ghs_FhW4M4TmdkVUw8Om3VZ1e7pO685xf7028pZ3@github.com/viash-hub/demultiplex"
git_tag: "v0.3.3-3-ge6f8b20"
viash_version: "0.9.4"
git_commit: "3a84e5409c5758611fff4eb585736f3737c8b896"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.3.11-2-g3a84e54"
package_config:
name: "demultiplex"
version: "v0.3.4"
version: "v0.3.12"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-test-data/demultiplex/v2/"
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.0"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n.runners[.type == 'nextflow'].directives.tag\
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.4'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.12'"
keywords:
- "bioinformatics"
- "sequence"

View File

@@ -1,8 +1,8 @@
#!/usr/bin/env bash
# untar v0.3.4
# untar v0.3.12
#
# This wrapper script is auto-generated by viash 0.9.0 and is thus a derivative
# This wrapper script is auto-generated by viash 0.9.4 and is thus a derivative
# work thereof. This software comes with ABSOLUTELY NO WARRANTY from Data
# Intuitive.
#
@@ -169,32 +169,6 @@ VIASH_META_CONFIG="$VIASH_META_RESOURCES_DIR/.config.vsh.yaml"
VIASH_META_TEMP_DIR="$VIASH_TEMP"
# ViashHelp: Display helpful explanation about this executable
function ViashHelp {
echo "untar v0.3.4"
echo ""
echo "Unpack a .tar file. When the contents of the .tar file is just a single"
echo "directory,"
echo "put the contents of the directory into the output folder instead of that"
echo "directory."
echo ""
echo "Input arguments:"
echo " --input"
echo " type: file, required parameter, file must exist"
echo " Tarball file to be unpacked."
echo ""
echo "Output arguments:"
echo " --output"
echo " type: file, required parameter, output, file must exist"
echo " Directory to write the contents of the .tar file to."
echo ""
echo "Other arguments:"
echo " -e, --exclude"
echo " type: string"
echo " example: docs/figures"
echo " Prevents any file or member whose name matches the shell wildcard"
echo " (pattern) from being extracted."
}
# initialise variables
VIASH_MODE='run'
@@ -476,10 +450,10 @@ RUN apt-get update && \
rm -rf /var/lib/apt/lists/*
LABEL org.opencontainers.image.description="Companion container for running component io untar"
LABEL org.opencontainers.image.created="2024-12-20T12:37:34Z"
LABEL org.opencontainers.image.created="2025-05-26T10:39:02Z"
LABEL org.opencontainers.image.source="https://github.com/viash-hub/demultiplex"
LABEL org.opencontainers.image.revision="e6f8b20d70be8f47907c08cc3c7802920cc0eee4"
LABEL org.opencontainers.image.version="v0.3.4"
LABEL org.opencontainers.image.revision="3a84e5409c5758611fff4eb585736f3737c8b896"
LABEL org.opencontainers.image.version="v0.3.12"
VIASHDOCKER
fi
@@ -593,6 +567,58 @@ fi
# initialise docker variables
VIASH_DOCKER_RUN_ARGS=(-i --rm)
# ViashHelp: Display helpful explanation about this executable
function ViashHelp {
echo "untar v0.3.12"
echo ""
echo "Unpack a .tar file. When the contents of the .tar file is just a single"
echo "directory,"
echo "put the contents of the directory into the output folder instead of that"
echo "directory."
echo ""
echo "Input arguments:"
echo " --input"
echo " type: file, required parameter, file must exist"
echo " Tarball file to be unpacked."
echo ""
echo "Output arguments:"
echo " --output"
echo " type: file, required parameter, output, file must exist"
echo " Directory to write the contents of the .tar file to."
echo ""
echo "Other arguments:"
echo " -e, --exclude"
echo " type: string"
echo " example: docs/figures"
echo " Prevents any file or member whose name matches the shell wildcard"
echo " (pattern) from being extracted."
echo ""
echo "Viash built in Computational Requirements:"
echo " ---cpus=INT"
echo " Number of CPUs to use"
echo " ---memory=STRING"
echo " Amount of memory to use. Examples: 4GB, 3MiB."
echo ""
echo "Viash built in Docker:"
echo " ---setup=STRATEGY"
echo " Setup the docker container. Options are: alwaysbuild, alwayscachedbuild, ifneedbebuild, ifneedbecachedbuild, alwayspull, alwayspullelsebuild, alwayspullelsecachedbuild, ifneedbepull, ifneedbepullelsebuild, ifneedbepullelsecachedbuild, push, pushifnotpresent, donothing."
echo " Default: ifneedbepullelsecachedbuild"
echo " ---dockerfile"
echo " Print the dockerfile to stdout."
echo " ---docker_run_args=ARG"
echo " Provide runtime arguments to Docker. See the documentation on \`docker run\` for more information."
echo " ---docker_image_id"
echo " Print the docker image id to stdout."
echo " ---debug"
echo " Enter the docker container for debugging purposes."
echo ""
echo "Viash built in Engines:"
echo " ---engine=ENGINE_ID"
echo " Specify the engine to use. Options are: docker, native."
echo " Default: docker"
}
# initialise array
VIASH_POSITIONAL_ARGS=''
@@ -615,7 +641,7 @@ while [[ $# -gt 0 ]]; do
shift 1
;;
--version)
echo "untar v0.3.4"
echo "untar v0.3.12"
exit
;;
--input)
@@ -745,7 +771,7 @@ if [[ "$VIASH_ENGINE_TYPE" == "docker" ]]; then
# determine docker image id
if [[ "$VIASH_ENGINE_ID" == 'docker' ]]; then
VIASH_DOCKER_IMAGE_ID='images.viash-hub.com/vsh/demultiplex/io/untar:v0.3.4'
VIASH_DOCKER_IMAGE_ID='images.viash-hub.com/vsh/demultiplex/io/untar:v0.3.12'
fi
# print dockerfile

View File

@@ -1,6 +1,6 @@
name: "combine_samples"
namespace: "dataflow"
version: "v0.3.4"
version: "v0.3.12"
argument_groups:
- name: "Input arguments"
arguments:
@@ -30,6 +30,15 @@ argument_groups:
direction: "input"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--falco_dir"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
@@ -50,6 +59,15 @@ argument_groups:
direction: "output"
multiple: true
multiple_sep: ";"
- type: "file"
name: "--output_falco"
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: true
multiple_sep: ";"
resources:
- type: "nextflow_script"
path: "main.nf"
@@ -62,6 +80,9 @@ description: "Combine fastq files from across samples into one event with a list
\ fastq files per orientation."
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
@@ -143,29 +164,29 @@ build_info:
engine: "native|native"
output: "target/nextflow/dataflow/combine_samples"
executable: "target/nextflow/dataflow/combine_samples/main.nf"
viash_version: "0.9.0"
git_commit: "e6f8b20d70be8f47907c08cc3c7802920cc0eee4"
git_remote: "https://x-access-token:ghs_FhW4M4TmdkVUw8Om3VZ1e7pO685xf7028pZ3@github.com/viash-hub/demultiplex"
git_tag: "v0.3.3-3-ge6f8b20"
viash_version: "0.9.4"
git_commit: "3a84e5409c5758611fff4eb585736f3737c8b896"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.3.11-2-g3a84e54"
package_config:
name: "demultiplex"
version: "v0.3.4"
version: "v0.3.12"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-test-data/demultiplex/v2/"
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.0"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n.runners[.type == 'nextflow'].directives.tag\
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.4'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.12'"
keywords:
- "bioinformatics"
- "sequence"

View File

@@ -1,6 +1,6 @@
// combine_samples v0.3.4
// combine_samples v0.3.12
//
// This wrapper script is auto-generated by viash 0.9.0 and is thus a derivative
// This wrapper script is auto-generated by viash 0.9.4 and is thus a derivative
// work thereof. This software comes with ABSOLUTELY NO WARRANTY from Data
// Intuitive.
//
@@ -82,64 +82,56 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
foundClass = "List[${e.foundClass}]"
}
} else if (par.type == "string") {
// cast to string if need be
// cast to string if need be. only cast if the value is a GString
if (value instanceof GString) {
value = value.toString()
value = value as String
}
expectedClass = value instanceof String ? null : "String"
} else if (par.type == "integer") {
// cast to integer if need be
if (value instanceof String) {
if (value !instanceof Integer) {
try {
value = value.toInteger()
value = value as Integer
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Integer"
}
}
if (value instanceof java.math.BigInteger) {
value = value.intValue()
}
expectedClass = value instanceof Integer ? null : "Integer"
} else if (par.type == "long") {
// cast to long if need be
if (value instanceof String) {
if (value !instanceof Long) {
try {
value = value.toLong()
value = value as Long
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Long"
}
}
if (value instanceof Integer) {
value = value.toLong()
}
expectedClass = value instanceof Long ? null : "Long"
} else if (par.type == "double") {
// cast to double if need be
if (value instanceof String) {
if (value !instanceof Double) {
try {
value = value.toDouble()
value = value as Double
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Double"
}
}
if (value instanceof java.math.BigDecimal) {
value = value.doubleValue()
} else if (par.type == "float") {
// cast to float if need be
if (value !instanceof Float) {
try {
value = value as Float
} catch (NumberFormatException e) {
expectedClass = "Float"
}
}
if (value instanceof Float) {
value = value.toDouble()
}
expectedClass = value instanceof Double ? null : "Double"
} else if (par.type == "boolean" | par.type == "boolean_true" | par.type == "boolean_false") {
// cast to boolean if need be
if (value instanceof String) {
def valueLower = value.toLowerCase()
if (valueLower == "true") {
value = true
} else if (valueLower == "false") {
value = false
if (value !instanceof Boolean) {
try {
value = value as Boolean
} catch (Exception e) {
expectedClass = "Boolean"
}
}
expectedClass = value instanceof Boolean ? null : "Boolean"
} else if (par.type == "file" && (par.direction == "input" || stage == "output")) {
// cast to path if need be
if (value instanceof String) {
@@ -151,10 +143,13 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
expectedClass = value instanceof Path ? null : "Path"
} else if (par.type == "file" && stage == "input" && par.direction == "output") {
// cast to string if need be
if (value instanceof GString) {
value = value.toString()
if (value !instanceof String) {
try {
value = value as String
} catch (Exception e) {
expectedClass = "String"
}
}
expectedClass = value instanceof String ? null : "String"
} else {
// didn't find a match for par.type
expectedClass = par.type
@@ -173,7 +168,7 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
Map _processInputValues(Map inputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.required) {
if (arg.required && arg.direction == "input") {
assert inputs.containsKey(arg.plainName) && inputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required input argument '${arg.plainName}' is missing"
}
@@ -192,15 +187,8 @@ Map _processInputValues(Map inputs, Map config, String id, String key) {
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/arguments/_processOutputValues.nf'
Map _processOutputValues(Map outputs, Map config, String id, String key) {
Map _checkValidOutputArgument(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
outputs = outputs.collectEntries { name, value ->
def par = config.allArguments.find { it.plainName == name && it.direction == "output" }
assert par != null : "Error in module '${key}' id '${id}': '${name}' is not a valid output argument"
@@ -213,6 +201,16 @@ Map _processOutputValues(Map outputs, Map config, String id, String key) {
return outputs
}
void _checkAllRequiredOuputsPresent(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
}
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/channel/IDChecker.nf'
class IDChecker {
final def items = [] as Set
@@ -1666,6 +1664,162 @@ def joinStates(Closure apply_) {
}
return joinStatesWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishFiles.nf'
def publishFiles(Map args) {
def key_ = args.get("key")
assert key_ != null : "publishFiles: key must be specified"
workflow publishFilesWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1]
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
[id_, inputFiles_, outputFilenames_]
}
| publishFilesProc
emit: input_ch
}
return publishFilesWf
}
process publishFilesProc {
// todo: check publishpath?
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
output:
tuple val(id), path{outputFiles}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
}
// this assumes that the state contains no other values other than those specified in the config
def publishFilesByConfig(Map args) {
def config = args.get("config")
assert config != null : "publishFilesByConfig: config must be specified"
def key_ = args.get("key", config.name)
assert key_ != null : "publishFilesByConfig: key must be specified"
workflow publishFilesSimpleWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1] // e.g. [output: new File("myoutput.h5ad"), k: 10]
def origState_ = tup[2] // e.g. [output: '$id.$key.foo.h5ad']
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
.collectMany { par ->
def plainName_ = par.plainName
// if the state does not contain the key, it's an
// optional argument for which the component did
// not generate any output OR multiple channels were emitted
// and the output was just not added to using the channel
// that is now being parsed
if (!state_.containsKey(plainName_)) {
return []
}
def value = state_[plainName_]
// if the parameter is not a file, it should be stored
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[inputPath: [], outputFilename: []]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
// that it should not be returned as a state
if (!origState_.containsKey(plainName_)) {
return []
}
def filenameTemplate = origState_[plainName_]
// if the pararameter is multiple: true, fetch the template
if (par.multiple && filenameTemplate instanceof List) {
filenameTemplate = filenameTemplate[0]
}
// instantiate the template
def filename = filenameTemplate
.replaceAll('\\$id', id_)
.replaceAll('\\$\\{id\\}', id_)
.replaceAll('\\$key', key_)
.replaceAll('\\$\\{key\\}', key_)
if (par.multiple) {
// if the parameter is multiple: true, the filename
// should contain a wildcard '*' that is replaced with
// the index of the file
assert filename.contains("*") : "Module '${key_}' id '${id_}': Multiple output files specified, but no wildcard '*' in the filename: ${filename}"
def outputPerFile = value.withIndex().collect{ val, ix ->
def filename_ix = filename.replace("*", ix.toString())
def inputPath = val instanceof File ? val.toPath() : val
[inputPath: inputPath, outputFilename: filename_ix]
}
def transposedOutputs = ["inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
} else {
def value_ = java.nio.file.Paths.get(filename)
def inputPath = value instanceof File ? value.toPath() : value
return [[inputPath: [inputPath], outputFilename: [filename]]]
}
}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
[id_, inputPaths, outputFilenames]
}
| publishFilesProc
emit: input_ch
}
return publishFilesSimpleWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishStates.nf'
def collectFiles(obj) {
if (obj instanceof java.io.File || obj instanceof Path) {
@@ -1723,8 +1877,6 @@ def publishStates(Map args) {
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
def yamlFilename = yamlTemplate_
.replaceAll('\\$id', id_)
@@ -1737,7 +1889,7 @@ def publishStates(Map args) {
// convert state to yaml blob
def yamlBlob_ = toRelativeTaggedYamlBlob([id: id_] + state_, java.nio.file.Paths.get(yamlFilename))
[id_, yamlBlob_, yamlFilename, inputFiles_, outputFilenames_]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -1749,33 +1901,17 @@ process publishStatesProc {
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), val(yamlBlob), val(yamlFile), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
tuple val(id), val(yamlBlob), val(yamlFile)
output:
tuple val(id), path{[yamlFile] + outputFiles}
tuple val(id), path{[yamlFile]}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
echo '${yamlBlob}' > '${yamlFile}'
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
cat > '${yamlFile}' << HERE
${yamlBlob}
HERE
"""
}
@@ -1806,13 +1942,10 @@ def publishStatesByConfig(Map args) {
.replaceAll('\\$\\{key\\}', key_)
def yamlDir = java.nio.file.Paths.get(yamlFilename).getParent()
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// the processed state is a list of [key, value] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (key, value) are the tuples that will be saved to the state.yaml file
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
@@ -1829,7 +1962,7 @@ def publishStatesByConfig(Map args) {
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[key: plainName_, value: value, inputPath: [], outputFilename: []]]
return [[key: plainName_, value: value]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
@@ -1860,13 +1993,9 @@ def publishStatesByConfig(Map args) {
if (yamlDir != null) {
value_ = yamlDir.relativize(value_)
}
def inputPath = val instanceof File ? val.toPath() : val
[value: value_, inputPath: inputPath, outputFilename: filename_ix]
return value_
}
def transposedOutputs = ["value", "inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
return [["key": plainName_, "value": outputPerFile]]
} else {
def value_ = java.nio.file.Paths.get(filename)
// if id contains a slash
@@ -1874,18 +2003,17 @@ def publishStatesByConfig(Map args) {
value_ = yamlDir.relativize(value_)
}
def inputPath = value instanceof File ? value.toPath() : value
return [[key: plainName_, value: value_, inputPath: [inputPath], outputFilename: [filename]]]
return [["key": plainName_, value: value_]]
}
}
def updatedState_ = processedState.collectEntries{[it.key, it.value]}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
// convert state to yaml blob
def yamlBlob_ = toTaggedYamlBlob([id: id_] + updatedState_)
[id_, yamlBlob_, yamlFilename, inputPaths, outputFilenames]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -2559,7 +2687,8 @@ def _debug(workflowArgs, debugKey) {
def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
def workflowArgs = processWorkflowArgs(args, defaultWfArgs, meta)
def key_ = workflowArgs["key"]
def multipleArgs = meta.config.allArguments.findAll{ it.multiple }.collect{it.plainName}
workflow workflowInstance {
take: input_
@@ -2716,12 +2845,36 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
// TODO: move some of the _meta.join_id wrangling to the safeJoin() function.
def chInitialOutput = chArgsWithDefaults
def chInitialOutputMulti = chArgsWithDefaults
| _debug(workflowArgs, "processed")
// run workflow
| innerWorkflowFactory(workflowArgs)
// check output tuple
| map { id_, output_ ->
def chInitialOutputList = chInitialOutputMulti instanceof List ? chInitialOutputMulti : [chInitialOutputMulti]
assert chInitialOutputList.size() > 0: "should have emitted at least one output channel"
// Add a channel ID to the events, which designates the channel the event was emitted from as a running number
// This number is used to sort the events later when the events are gathered from across the channels.
def chInitialOutputListWithIndexedEvents = chInitialOutputList.withIndex().collect{channel, channelIndex ->
def newChannel = channel
| map {tuple ->
assert tuple instanceof List :
"Error in module '${key_}': element in output channel should be a tuple [id, data, ...otherargs...]\n" +
" Example: [\"id\", [input: file('foo.txt'), arg: 10]].\n" +
" Expected class: List. Found: tuple.getClass() is ${tuple.getClass()}"
def newEvent = [channelIndex] + tuple
return newEvent
}
return newChannel
}
// Put the events into 1 channel, cover case where there is only one channel is emitted
def chInitialOutput = chInitialOutputList.size() > 1 ? \
chInitialOutputListWithIndexedEvents[0].mix(*chInitialOutputListWithIndexedEvents.tail()) : \
chInitialOutputListWithIndexedEvents[0]
def chInitialOutputProcessed = chInitialOutput
| map { tuple ->
def channelId = tuple[0]
def id_ = tuple[1]
def output_ = tuple[2]
// see if output map contains metadata
def meta_ =
@@ -2734,19 +2887,94 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
output_ = output_.findAll{k, v -> k != "_meta"}
// check value types
output_ = _processOutputValues(output_, meta.config, id_, key_)
output_ = _checkValidOutputArgument(output_, meta.config, id_, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && output_.size() == 1) {
output_ = output_.values()[0]
}
[join_id, id_, output_]
[join_id, channelId, id_, output_]
}
// | view{"chInitialOutput: ${it.take(3)}"}
// join the output [prev_id, channel_id, new_id, output] with the previous state [prev_id, state, ...]
def chPublishWithPreviousState = safeJoin(chInitialOutputProcessed, chRunFiltered, key_)
// input tuple format: [join_id, channel_id, id, output, prev_state, ...]
// output tuple format: [join_id, channel_id, id, new_state, ...]
| map{ tup ->
def new_state = workflowArgs.toState(tup.drop(2).take(3))
tup.take(3) + [new_state] + tup.drop(5)
}
if (workflowArgs.auto.publish == "state") {
def chPublishFiles = chPublishWithPreviousState
// input tuple format: [join_id, channel_id, id, new_state, ...]
// output tuple format: [join_id, channel_id, id, new_state]
| map{ tup ->
tup.take(4)
}
safeJoin(chPublishFiles, chArgsWithDefaults, key_)
// input tuple format: [join_id, channel_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(2).take(3)
}
| publishFilesByConfig(key: key_, config: meta.config)
}
// Join the state from the events that were emitted from different channels
def chJoined = chInitialOutputProcessed
| map {tuple ->
def join_id = tuple[0]
def channel_id = tuple[1]
def id = tuple[2]
def other = tuple.drop(3)
// Below, groupTuple is used to join the events. To make sure resuming a workflow
// keeps working, the output state must be deterministic. This means the state needs to be
// sorted with groupTuple's has a 'sort' argument. This argument can be set to 'hash',
// but hashing the state when it is large can be problematic in terms of performance.
// Therefore, a custom comparator function is provided. We add the channel ID to the
// states so that we can use the channel ID to sort the items.
def stateWithChannelID = [[channel_id] * other.size(), other].transpose()
// A comparator that is provided to groupTuple's 'sort' argument is applied
// to all elements of the event tuple (that is not the 'id'). The comparator
// closure that is used below expects the input to be List. So the join_id and
// channel_id must also be wrapped in a list.
[[join_id], [channel_id], id] + stateWithChannelID
}
| groupTuple(by: 2, sort: {a, b -> a[0] <=> b[0]}, size: chInitialOutputList.size(), remainder: true)
| map {join_ids, _, id, statesWithChannelID ->
// Remove the channel IDs from the states
def states = statesWithChannelID.collect{it[1]}
def newJoinId = join_ids.flatten().unique{a, b -> a <=> b}
assert newJoinId.size() == 1: "Multiple events were emitted for '$id'."
def newJoinIdUnique = newJoinId[0]
// Merge the states from the different channels
def newState = states.inject([:]){ old_state, state_to_add ->
return old_state + state_to_add.collectEntries{k, v ->
if (!multipleArgs.contains(k)) {
// if the key is not a multiple argument, we expect only one value
if (old_state.containsKey(k)) {
assert old_state[k] == v : "ID $id: multiple entries for argument $k were emitted."
}
[k, v]
} else {
// if the key is a multiple argument, append the different values into one list
def prevValue = old_state.getOrDefault(k, [])
def prevValueAsList = prevValue instanceof List ? prevValue : [prevValue]
[k, prevValueAsList + v]
}
}
}
_checkAllRequiredOuputsPresent(newState, meta.config, id, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && newState.size() == 1) {
newState = newState.values()[0]
}
return [newJoinIdUnique, id, newState]
}
// join the output [prev_id, new_id, output] with the previous state [prev_id, state, ...]
def chNewState = safeJoin(chInitialOutput, chRunFiltered, key_)
def chNewState = safeJoin(chJoined, chRunFiltered, key_)
// input tuple format: [join_id, id, output, prev_state, ...]
// output tuple format: [join_id, id, new_state, ...]
| map{ tup ->
@@ -2755,23 +2983,21 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
if (workflowArgs.auto.publish == "state") {
def chPublish = chNewState
def chPublishStates = chNewState
// input tuple format: [join_id, id, new_state, ...]
// output tuple format: [join_id, id, new_state]
| map{ tup ->
tup.take(3)
}
safeJoin(chPublish, chArgsWithDefaults, key_)
safeJoin(chPublishStates, chArgsWithDefaults, key_)
// input tuple format: [join_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(1).take(3)
}
}
| publishStatesByConfig(key: key_, config: meta.config)
}
// remove join_id and meta
chReturn = chNewState
| map { tup ->
// input tuple format: [join_id, id, new_state, ...]
@@ -2806,7 +3032,7 @@ meta = [
"config": processConfig(readJsonBlob('''{
"name" : "combine_samples",
"namespace" : "dataflow",
"version" : "v0.3.4",
"version" : "v0.3.12",
"argument_groups" : [
{
"name" : "Input arguments",
@@ -2839,6 +3065,16 @@ meta = [
"direction" : "input",
"multiple" : true,
"multiple_sep" : ";"
},
{
"type" : "file",
"name" : "--falco_dir",
"must_exist" : true,
"create_parent" : true,
"required" : true,
"direction" : "input",
"multiple" : false,
"multiple_sep" : ";"
}
]
},
@@ -2864,6 +3100,16 @@ meta = [
"direction" : "output",
"multiple" : true,
"multiple_sep" : ";"
},
{
"type" : "file",
"name" : "--output_falco",
"must_exist" : true,
"create_parent" : true,
"required" : true,
"direction" : "output",
"multiple" : true,
"multiple_sep" : ";"
}
]
}
@@ -2883,6 +3129,10 @@ meta = [
],
"description" : "Combine fastq files from across samples into one event with a list of fastq files per orientation.",
"status" : "enabled",
"scope" : {
"image" : "public",
"target" : "public"
},
"requirements" : {
"commands" : [
"ps"
@@ -2979,31 +3229,31 @@ meta = [
"runner" : "nextflow",
"engine" : "native|native",
"output" : "target/nextflow/dataflow/combine_samples",
"viash_version" : "0.9.0",
"git_commit" : "e6f8b20d70be8f47907c08cc3c7802920cc0eee4",
"git_remote" : "https://x-access-token:ghs_FhW4M4TmdkVUw8Om3VZ1e7pO685xf7028pZ3@github.com/viash-hub/demultiplex",
"git_tag" : "v0.3.3-3-ge6f8b20"
"viash_version" : "0.9.4",
"git_commit" : "3a84e5409c5758611fff4eb585736f3737c8b896",
"git_remote" : "https://github.com/viash-hub/demultiplex",
"git_tag" : "v0.3.11-2-g3a84e54"
},
"package_config" : {
"name" : "demultiplex",
"version" : "v0.3.4",
"version" : "v0.3.12",
"description" : "Demultiplexing pipeline\n",
"info" : {
"test_resources" : [
{
"path" : "gs://viash-hub-test-data/demultiplex/v2/",
"path" : "gs://viash-hub-resources/demultiplex/v3",
"dest" : "testData"
}
]
},
"viash_version" : "0.9.0",
"viash_version" : "0.9.4",
"source" : "src",
"target" : "target",
"config_mods" : [
".requirements.commands := ['ps']\n.runners[.type == 'nextflow'].directives.tag := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n.runners[.type == 'nextflow'].config.script := 'includeConfig(\\"nextflow_labels.config\\")'\n",
".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n.runners[.type == 'nextflow'].config.script := 'includeConfig(\\"nextflow_labels.config\\")'\n",
".engines += { type: \\"native\\" }",
".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'",
".engines[.type == 'docker'].target_tag := 'v0.3.4'"
".engines[.type == 'docker'].target_tag := 'v0.3.12'"
],
"keywords" : [
"bioinformatics",
@@ -3041,10 +3291,12 @@ workflow run_wf {
// Gather the following state for all samples
def forward_fastqs = states.collect{it.forward_input}.flatten()
def reverse_fastqs = states.collect{it.reverse_input}.findAll{it != null}.flatten()
def falco_dirs = states.collect{it.falco_dir}
def resultState = [
"output_forward": forward_fastqs,
"output_reverse": reverse_fastqs,
"output_falco": falco_dirs,
// The join ID is the same across all samples from the same run
"_meta": ["join_id": states[0]._meta.join_id]
]

View File

@@ -2,7 +2,7 @@ manifest {
name = 'dataflow/combine_samples'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.3.4'
version = 'v0.3.12'
description = 'Combine fastq files from across samples into one event with a list of fastq files per orientation.'
}

View File

@@ -1,126 +1,106 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "combine_samples",
"description": "Combine fastq files from across samples into one event with a list of fastq files per orientation.",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"id": {
"type":
"string",
"description": "Type: `string`, required. ID of the new event",
"help_text": "Type: `string`, required. ID of the new event"
}
,
"forward_input": {
"type":
"string",
"description": "Type: List of `file`, required, multiple_sep: `\";\"`. ",
"help_text": "Type: List of `file`, required, multiple_sep: `\";\"`. "
}
,
"reverse_input": {
"type":
"string",
"description": "Type: List of `file`, multiple_sep: `\";\"`. ",
"help_text": "Type: List of `file`, multiple_sep: `\";\"`. "
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output_forward": {
"type":
"string",
"description": "Type: List of `file`, required, default: `$id.$key.output_forward_*.output_forward_*`, multiple_sep: `\";\"`. ",
"help_text": "Type: List of `file`, required, default: `$id.$key.output_forward_*.output_forward_*`, multiple_sep: `\";\"`. "
,
"default":"$id.$key.output_forward_*.output_forward_*"
}
,
"output_reverse": {
"type":
"string",
"description": "Type: List of `file`, default: `$id.$key.output_reverse_*.output_reverse_*`, multiple_sep: `\";\"`. ",
"help_text": "Type: List of `file`, default: `$id.$key.output_reverse_*.output_reverse_*`, multiple_sep: `\";\"`. "
,
"default":"$id.$key.output_reverse_*.output_reverse_*"
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map corresponds to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativation is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "combine_samples",
"description": "Combine fastq files from across samples into one event with a list of fastq files per orientation.",
"type": "object",
"$defs": {
"input arguments": {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"id": {
"type": "string",
"description": "ID of the new event",
"help_text": "Type: `string`, multiple: `False`, required. "
},
"forward_input": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"exists": true,
"description": "",
"help_text": "Type: `file`, multiple: `True`, required, direction: `input`. "
},
"reverse_input": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `True`, direction: `input`. "
},
"falco_dir": {
"type": "string",
"format": "path",
"exists": true,
"description": "",
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`. "
}
}
},
{
"$ref": "#/definitions/output arguments"
"output arguments": {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output_forward": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `True`, required, default: `\"$id.$key.output_forward_*\"`, direction: `output`. ",
"default": "$id.$key.output_forward_*"
},
"output_reverse": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `True`, default: `\"$id.$key.output_reverse_*\"`, direction: `output`. ",
"default": "$id.$key.output_reverse_*"
},
"output_falco": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `True`, required, default: `\"$id.$key.output_falco_*\"`, direction: `output`. ",
"default": "$id.$key.output_falco_*"
}
}
},
{
"$ref": "#/definitions/nextflow input-output arguments"
"nextflow input-output arguments": {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type": "string",
"description": "Path to an output directory.",
"help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
}
}
}
]
},
"allOf": [
{
"$ref": "#/$defs/input arguments"
},
{
"$ref": "#/$defs/output arguments"
},
{
"$ref": "#/$defs/nextflow input-output arguments"
}
]
}

View File

@@ -1,6 +1,6 @@
name: "gather_fastqs_and_validate"
namespace: "dataflow"
version: "v0.3.4"
version: "v0.3.12"
argument_groups:
- name: "Input arguments"
arguments:
@@ -54,8 +54,26 @@ resources:
dest: "nextflow_labels.config"
description: "From a directory containing fastq files, gather the files per sample\
\ \nand validate according to the contents of the sample sheet.\n"
test_resources:
- type: "nextflow_script"
path: "test.nf"
is_executable: true
entrypoint: "test_gather_and_validate"
- type: "nextflow_script"
path: "test.nf"
is_executable: true
entrypoint: "test_undetermined_empty"
- type: "nextflow_script"
path: "test.nf"
is_executable: true
entrypoint: "test_without_index"
- type: "file"
path: "test_data"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
@@ -137,29 +155,29 @@ build_info:
engine: "native|native"
output: "target/nextflow/dataflow/gather_fastqs_and_validate"
executable: "target/nextflow/dataflow/gather_fastqs_and_validate/main.nf"
viash_version: "0.9.0"
git_commit: "e6f8b20d70be8f47907c08cc3c7802920cc0eee4"
git_remote: "https://x-access-token:ghs_FhW4M4TmdkVUw8Om3VZ1e7pO685xf7028pZ3@github.com/viash-hub/demultiplex"
git_tag: "v0.3.3-3-ge6f8b20"
viash_version: "0.9.4"
git_commit: "3a84e5409c5758611fff4eb585736f3737c8b896"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.3.11-2-g3a84e54"
package_config:
name: "demultiplex"
version: "v0.3.4"
version: "v0.3.12"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-test-data/demultiplex/v2/"
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.0"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n.runners[.type == 'nextflow'].directives.tag\
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.4'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.12'"
keywords:
- "bioinformatics"
- "sequence"

View File

@@ -1,6 +1,6 @@
// gather_fastqs_and_validate v0.3.4
// gather_fastqs_and_validate v0.3.12
//
// This wrapper script is auto-generated by viash 0.9.0 and is thus a derivative
// This wrapper script is auto-generated by viash 0.9.4 and is thus a derivative
// work thereof. This software comes with ABSOLUTELY NO WARRANTY from Data
// Intuitive.
//
@@ -82,64 +82,56 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
foundClass = "List[${e.foundClass}]"
}
} else if (par.type == "string") {
// cast to string if need be
// cast to string if need be. only cast if the value is a GString
if (value instanceof GString) {
value = value.toString()
value = value as String
}
expectedClass = value instanceof String ? null : "String"
} else if (par.type == "integer") {
// cast to integer if need be
if (value instanceof String) {
if (value !instanceof Integer) {
try {
value = value.toInteger()
value = value as Integer
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Integer"
}
}
if (value instanceof java.math.BigInteger) {
value = value.intValue()
}
expectedClass = value instanceof Integer ? null : "Integer"
} else if (par.type == "long") {
// cast to long if need be
if (value instanceof String) {
if (value !instanceof Long) {
try {
value = value.toLong()
value = value as Long
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Long"
}
}
if (value instanceof Integer) {
value = value.toLong()
}
expectedClass = value instanceof Long ? null : "Long"
} else if (par.type == "double") {
// cast to double if need be
if (value instanceof String) {
if (value !instanceof Double) {
try {
value = value.toDouble()
value = value as Double
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Double"
}
}
if (value instanceof java.math.BigDecimal) {
value = value.doubleValue()
} else if (par.type == "float") {
// cast to float if need be
if (value !instanceof Float) {
try {
value = value as Float
} catch (NumberFormatException e) {
expectedClass = "Float"
}
}
if (value instanceof Float) {
value = value.toDouble()
}
expectedClass = value instanceof Double ? null : "Double"
} else if (par.type == "boolean" | par.type == "boolean_true" | par.type == "boolean_false") {
// cast to boolean if need be
if (value instanceof String) {
def valueLower = value.toLowerCase()
if (valueLower == "true") {
value = true
} else if (valueLower == "false") {
value = false
if (value !instanceof Boolean) {
try {
value = value as Boolean
} catch (Exception e) {
expectedClass = "Boolean"
}
}
expectedClass = value instanceof Boolean ? null : "Boolean"
} else if (par.type == "file" && (par.direction == "input" || stage == "output")) {
// cast to path if need be
if (value instanceof String) {
@@ -151,10 +143,13 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
expectedClass = value instanceof Path ? null : "Path"
} else if (par.type == "file" && stage == "input" && par.direction == "output") {
// cast to string if need be
if (value instanceof GString) {
value = value.toString()
if (value !instanceof String) {
try {
value = value as String
} catch (Exception e) {
expectedClass = "String"
}
}
expectedClass = value instanceof String ? null : "String"
} else {
// didn't find a match for par.type
expectedClass = par.type
@@ -173,7 +168,7 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
Map _processInputValues(Map inputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.required) {
if (arg.required && arg.direction == "input") {
assert inputs.containsKey(arg.plainName) && inputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required input argument '${arg.plainName}' is missing"
}
@@ -192,15 +187,8 @@ Map _processInputValues(Map inputs, Map config, String id, String key) {
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/arguments/_processOutputValues.nf'
Map _processOutputValues(Map outputs, Map config, String id, String key) {
Map _checkValidOutputArgument(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
outputs = outputs.collectEntries { name, value ->
def par = config.allArguments.find { it.plainName == name && it.direction == "output" }
assert par != null : "Error in module '${key}' id '${id}': '${name}' is not a valid output argument"
@@ -213,6 +201,16 @@ Map _processOutputValues(Map outputs, Map config, String id, String key) {
return outputs
}
void _checkAllRequiredOuputsPresent(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
}
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/channel/IDChecker.nf'
class IDChecker {
final def items = [] as Set
@@ -1666,6 +1664,162 @@ def joinStates(Closure apply_) {
}
return joinStatesWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishFiles.nf'
def publishFiles(Map args) {
def key_ = args.get("key")
assert key_ != null : "publishFiles: key must be specified"
workflow publishFilesWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1]
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
[id_, inputFiles_, outputFilenames_]
}
| publishFilesProc
emit: input_ch
}
return publishFilesWf
}
process publishFilesProc {
// todo: check publishpath?
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
output:
tuple val(id), path{outputFiles}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
}
// this assumes that the state contains no other values other than those specified in the config
def publishFilesByConfig(Map args) {
def config = args.get("config")
assert config != null : "publishFilesByConfig: config must be specified"
def key_ = args.get("key", config.name)
assert key_ != null : "publishFilesByConfig: key must be specified"
workflow publishFilesSimpleWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1] // e.g. [output: new File("myoutput.h5ad"), k: 10]
def origState_ = tup[2] // e.g. [output: '$id.$key.foo.h5ad']
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
.collectMany { par ->
def plainName_ = par.plainName
// if the state does not contain the key, it's an
// optional argument for which the component did
// not generate any output OR multiple channels were emitted
// and the output was just not added to using the channel
// that is now being parsed
if (!state_.containsKey(plainName_)) {
return []
}
def value = state_[plainName_]
// if the parameter is not a file, it should be stored
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[inputPath: [], outputFilename: []]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
// that it should not be returned as a state
if (!origState_.containsKey(plainName_)) {
return []
}
def filenameTemplate = origState_[plainName_]
// if the pararameter is multiple: true, fetch the template
if (par.multiple && filenameTemplate instanceof List) {
filenameTemplate = filenameTemplate[0]
}
// instantiate the template
def filename = filenameTemplate
.replaceAll('\\$id', id_)
.replaceAll('\\$\\{id\\}', id_)
.replaceAll('\\$key', key_)
.replaceAll('\\$\\{key\\}', key_)
if (par.multiple) {
// if the parameter is multiple: true, the filename
// should contain a wildcard '*' that is replaced with
// the index of the file
assert filename.contains("*") : "Module '${key_}' id '${id_}': Multiple output files specified, but no wildcard '*' in the filename: ${filename}"
def outputPerFile = value.withIndex().collect{ val, ix ->
def filename_ix = filename.replace("*", ix.toString())
def inputPath = val instanceof File ? val.toPath() : val
[inputPath: inputPath, outputFilename: filename_ix]
}
def transposedOutputs = ["inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
} else {
def value_ = java.nio.file.Paths.get(filename)
def inputPath = value instanceof File ? value.toPath() : value
return [[inputPath: [inputPath], outputFilename: [filename]]]
}
}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
[id_, inputPaths, outputFilenames]
}
| publishFilesProc
emit: input_ch
}
return publishFilesSimpleWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishStates.nf'
def collectFiles(obj) {
if (obj instanceof java.io.File || obj instanceof Path) {
@@ -1723,8 +1877,6 @@ def publishStates(Map args) {
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
def yamlFilename = yamlTemplate_
.replaceAll('\\$id', id_)
@@ -1737,7 +1889,7 @@ def publishStates(Map args) {
// convert state to yaml blob
def yamlBlob_ = toRelativeTaggedYamlBlob([id: id_] + state_, java.nio.file.Paths.get(yamlFilename))
[id_, yamlBlob_, yamlFilename, inputFiles_, outputFilenames_]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -1749,33 +1901,17 @@ process publishStatesProc {
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), val(yamlBlob), val(yamlFile), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
tuple val(id), val(yamlBlob), val(yamlFile)
output:
tuple val(id), path{[yamlFile] + outputFiles}
tuple val(id), path{[yamlFile]}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
echo '${yamlBlob}' > '${yamlFile}'
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
cat > '${yamlFile}' << HERE
${yamlBlob}
HERE
"""
}
@@ -1806,13 +1942,10 @@ def publishStatesByConfig(Map args) {
.replaceAll('\\$\\{key\\}', key_)
def yamlDir = java.nio.file.Paths.get(yamlFilename).getParent()
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// the processed state is a list of [key, value] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (key, value) are the tuples that will be saved to the state.yaml file
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
@@ -1829,7 +1962,7 @@ def publishStatesByConfig(Map args) {
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[key: plainName_, value: value, inputPath: [], outputFilename: []]]
return [[key: plainName_, value: value]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
@@ -1860,13 +1993,9 @@ def publishStatesByConfig(Map args) {
if (yamlDir != null) {
value_ = yamlDir.relativize(value_)
}
def inputPath = val instanceof File ? val.toPath() : val
[value: value_, inputPath: inputPath, outputFilename: filename_ix]
return value_
}
def transposedOutputs = ["value", "inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
return [["key": plainName_, "value": outputPerFile]]
} else {
def value_ = java.nio.file.Paths.get(filename)
// if id contains a slash
@@ -1874,18 +2003,17 @@ def publishStatesByConfig(Map args) {
value_ = yamlDir.relativize(value_)
}
def inputPath = value instanceof File ? value.toPath() : value
return [[key: plainName_, value: value_, inputPath: [inputPath], outputFilename: [filename]]]
return [["key": plainName_, value: value_]]
}
}
def updatedState_ = processedState.collectEntries{[it.key, it.value]}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
// convert state to yaml blob
def yamlBlob_ = toTaggedYamlBlob([id: id_] + updatedState_)
[id_, yamlBlob_, yamlFilename, inputPaths, outputFilenames]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -2559,7 +2687,8 @@ def _debug(workflowArgs, debugKey) {
def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
def workflowArgs = processWorkflowArgs(args, defaultWfArgs, meta)
def key_ = workflowArgs["key"]
def multipleArgs = meta.config.allArguments.findAll{ it.multiple }.collect{it.plainName}
workflow workflowInstance {
take: input_
@@ -2716,12 +2845,36 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
// TODO: move some of the _meta.join_id wrangling to the safeJoin() function.
def chInitialOutput = chArgsWithDefaults
def chInitialOutputMulti = chArgsWithDefaults
| _debug(workflowArgs, "processed")
// run workflow
| innerWorkflowFactory(workflowArgs)
// check output tuple
| map { id_, output_ ->
def chInitialOutputList = chInitialOutputMulti instanceof List ? chInitialOutputMulti : [chInitialOutputMulti]
assert chInitialOutputList.size() > 0: "should have emitted at least one output channel"
// Add a channel ID to the events, which designates the channel the event was emitted from as a running number
// This number is used to sort the events later when the events are gathered from across the channels.
def chInitialOutputListWithIndexedEvents = chInitialOutputList.withIndex().collect{channel, channelIndex ->
def newChannel = channel
| map {tuple ->
assert tuple instanceof List :
"Error in module '${key_}': element in output channel should be a tuple [id, data, ...otherargs...]\n" +
" Example: [\"id\", [input: file('foo.txt'), arg: 10]].\n" +
" Expected class: List. Found: tuple.getClass() is ${tuple.getClass()}"
def newEvent = [channelIndex] + tuple
return newEvent
}
return newChannel
}
// Put the events into 1 channel, cover case where there is only one channel is emitted
def chInitialOutput = chInitialOutputList.size() > 1 ? \
chInitialOutputListWithIndexedEvents[0].mix(*chInitialOutputListWithIndexedEvents.tail()) : \
chInitialOutputListWithIndexedEvents[0]
def chInitialOutputProcessed = chInitialOutput
| map { tuple ->
def channelId = tuple[0]
def id_ = tuple[1]
def output_ = tuple[2]
// see if output map contains metadata
def meta_ =
@@ -2734,19 +2887,94 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
output_ = output_.findAll{k, v -> k != "_meta"}
// check value types
output_ = _processOutputValues(output_, meta.config, id_, key_)
output_ = _checkValidOutputArgument(output_, meta.config, id_, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && output_.size() == 1) {
output_ = output_.values()[0]
}
[join_id, id_, output_]
[join_id, channelId, id_, output_]
}
// | view{"chInitialOutput: ${it.take(3)}"}
// join the output [prev_id, channel_id, new_id, output] with the previous state [prev_id, state, ...]
def chPublishWithPreviousState = safeJoin(chInitialOutputProcessed, chRunFiltered, key_)
// input tuple format: [join_id, channel_id, id, output, prev_state, ...]
// output tuple format: [join_id, channel_id, id, new_state, ...]
| map{ tup ->
def new_state = workflowArgs.toState(tup.drop(2).take(3))
tup.take(3) + [new_state] + tup.drop(5)
}
if (workflowArgs.auto.publish == "state") {
def chPublishFiles = chPublishWithPreviousState
// input tuple format: [join_id, channel_id, id, new_state, ...]
// output tuple format: [join_id, channel_id, id, new_state]
| map{ tup ->
tup.take(4)
}
safeJoin(chPublishFiles, chArgsWithDefaults, key_)
// input tuple format: [join_id, channel_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(2).take(3)
}
| publishFilesByConfig(key: key_, config: meta.config)
}
// Join the state from the events that were emitted from different channels
def chJoined = chInitialOutputProcessed
| map {tuple ->
def join_id = tuple[0]
def channel_id = tuple[1]
def id = tuple[2]
def other = tuple.drop(3)
// Below, groupTuple is used to join the events. To make sure resuming a workflow
// keeps working, the output state must be deterministic. This means the state needs to be
// sorted with groupTuple's has a 'sort' argument. This argument can be set to 'hash',
// but hashing the state when it is large can be problematic in terms of performance.
// Therefore, a custom comparator function is provided. We add the channel ID to the
// states so that we can use the channel ID to sort the items.
def stateWithChannelID = [[channel_id] * other.size(), other].transpose()
// A comparator that is provided to groupTuple's 'sort' argument is applied
// to all elements of the event tuple (that is not the 'id'). The comparator
// closure that is used below expects the input to be List. So the join_id and
// channel_id must also be wrapped in a list.
[[join_id], [channel_id], id] + stateWithChannelID
}
| groupTuple(by: 2, sort: {a, b -> a[0] <=> b[0]}, size: chInitialOutputList.size(), remainder: true)
| map {join_ids, _, id, statesWithChannelID ->
// Remove the channel IDs from the states
def states = statesWithChannelID.collect{it[1]}
def newJoinId = join_ids.flatten().unique{a, b -> a <=> b}
assert newJoinId.size() == 1: "Multiple events were emitted for '$id'."
def newJoinIdUnique = newJoinId[0]
// Merge the states from the different channels
def newState = states.inject([:]){ old_state, state_to_add ->
return old_state + state_to_add.collectEntries{k, v ->
if (!multipleArgs.contains(k)) {
// if the key is not a multiple argument, we expect only one value
if (old_state.containsKey(k)) {
assert old_state[k] == v : "ID $id: multiple entries for argument $k were emitted."
}
[k, v]
} else {
// if the key is a multiple argument, append the different values into one list
def prevValue = old_state.getOrDefault(k, [])
def prevValueAsList = prevValue instanceof List ? prevValue : [prevValue]
[k, prevValueAsList + v]
}
}
}
_checkAllRequiredOuputsPresent(newState, meta.config, id, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && newState.size() == 1) {
newState = newState.values()[0]
}
return [newJoinIdUnique, id, newState]
}
// join the output [prev_id, new_id, output] with the previous state [prev_id, state, ...]
def chNewState = safeJoin(chInitialOutput, chRunFiltered, key_)
def chNewState = safeJoin(chJoined, chRunFiltered, key_)
// input tuple format: [join_id, id, output, prev_state, ...]
// output tuple format: [join_id, id, new_state, ...]
| map{ tup ->
@@ -2755,23 +2983,21 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
if (workflowArgs.auto.publish == "state") {
def chPublish = chNewState
def chPublishStates = chNewState
// input tuple format: [join_id, id, new_state, ...]
// output tuple format: [join_id, id, new_state]
| map{ tup ->
tup.take(3)
}
safeJoin(chPublish, chArgsWithDefaults, key_)
safeJoin(chPublishStates, chArgsWithDefaults, key_)
// input tuple format: [join_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(1).take(3)
}
}
| publishStatesByConfig(key: key_, config: meta.config)
}
// remove join_id and meta
chReturn = chNewState
| map { tup ->
// input tuple format: [join_id, id, new_state, ...]
@@ -2806,7 +3032,7 @@ meta = [
"config": processConfig(readJsonBlob('''{
"name" : "gather_fastqs_and_validate",
"namespace" : "dataflow",
"version" : "v0.3.4",
"version" : "v0.3.12",
"argument_groups" : [
{
"name" : "Input arguments",
@@ -2875,7 +3101,35 @@ meta = [
}
],
"description" : "From a directory containing fastq files, gather the files per sample \nand validate according to the contents of the sample sheet.\n",
"test_resources" : [
{
"type" : "nextflow_script",
"path" : "test.nf",
"is_executable" : true,
"entrypoint" : "test_gather_and_validate"
},
{
"type" : "nextflow_script",
"path" : "test.nf",
"is_executable" : true,
"entrypoint" : "test_undetermined_empty"
},
{
"type" : "nextflow_script",
"path" : "test.nf",
"is_executable" : true,
"entrypoint" : "test_without_index"
},
{
"type" : "file",
"path" : "test_data"
}
],
"status" : "enabled",
"scope" : {
"image" : "public",
"target" : "public"
},
"requirements" : {
"commands" : [
"ps"
@@ -2972,31 +3226,31 @@ meta = [
"runner" : "nextflow",
"engine" : "native|native",
"output" : "target/nextflow/dataflow/gather_fastqs_and_validate",
"viash_version" : "0.9.0",
"git_commit" : "e6f8b20d70be8f47907c08cc3c7802920cc0eee4",
"git_remote" : "https://x-access-token:ghs_FhW4M4TmdkVUw8Om3VZ1e7pO685xf7028pZ3@github.com/viash-hub/demultiplex",
"git_tag" : "v0.3.3-3-ge6f8b20"
"viash_version" : "0.9.4",
"git_commit" : "3a84e5409c5758611fff4eb585736f3737c8b896",
"git_remote" : "https://github.com/viash-hub/demultiplex",
"git_tag" : "v0.3.11-2-g3a84e54"
},
"package_config" : {
"name" : "demultiplex",
"version" : "v0.3.4",
"version" : "v0.3.12",
"description" : "Demultiplexing pipeline\n",
"info" : {
"test_resources" : [
{
"path" : "gs://viash-hub-test-data/demultiplex/v2/",
"path" : "gs://viash-hub-resources/demultiplex/v3",
"dest" : "testData"
}
]
},
"viash_version" : "0.9.0",
"viash_version" : "0.9.4",
"source" : "src",
"target" : "target",
"config_mods" : [
".requirements.commands := ['ps']\n.runners[.type == 'nextflow'].directives.tag := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n.runners[.type == 'nextflow'].config.script := 'includeConfig(\\"nextflow_labels.config\\")'\n",
".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n.runners[.type == 'nextflow'].config.script := 'includeConfig(\\"nextflow_labels.config\\")'\n",
".engines += { type: \\"native\\" }",
".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'",
".engines[.type == 'docker'].target_tag := 'v0.3.4'"
".engines[.type == 'docker'].target_tag := 'v0.3.12'"
],
"keywords" : [
"bioinformatics",
@@ -3019,6 +3273,29 @@ meta = [
// inner workflow
// user-provided Nextflow code
import java.util.zip.GZIPInputStream
import java.nio.file.Files
import java.io.BufferedInputStream
def is_empty(file_to_check){
/*
Checks if a file has content
*/
if (file_to_check.size() == 0) {
return true
}
def input_stream = Files.newInputStream(file_to_check)
def gzInputStream
try {
gzInputStream = new GZIPInputStream(new BufferedInputStream(input_stream))
} catch (java.io.EOFException ex) {
// This is not a gzipfile...
return false
}
def read_one_byte = gzInputStream.read()
return read_one_byte == -1
}
workflow run_wf {
take:
input_ch
@@ -3031,15 +3308,16 @@ workflow run_wf {
def sample_sheet = state.sample_sheet
def start_parsing = false
def sample_id_column_index = null
def samples = ["Undetermined"]
def undetermined_sample_name = "Undetermined"
def samples = [undetermined_sample_name]
def original_id = id
// Parse sample sheet for sample IDs
println "Processing run information file ${sample_sheet}"
csv_lines = sample_sheet.splitCsv(header: false, sep: ',')
csv_lines.any { csv_items ->
if (csv_items.isEmpty()) {
// skip empty line
if (csv_items.isEmpty() || csv_items[0].startsWith("#")) {
// skip empty or commented line
return
}
def possible_header = csv_items[0]
@@ -3051,8 +3329,8 @@ workflow run_wf {
return true
}
// [Data], [BCLConvert_Data] for illumina
// [Samples] for Element Biosciences
if (header in ["Data", "Samples", "BCLConvert_Data"]) {
// [Samples] or sometimes [SAMPLES] for Element Biosciences
if (header.toLowerCase() in ["data", "samples", "bclconvert_data"]) {
println "Found header [${header}], start parsing."
start_parsing = true
return
@@ -3090,8 +3368,9 @@ workflow run_wf {
processed_samples = samples.collect { sample_id ->
def forward_regex = ~/^${sample_id}_S(\d+)_(L(\d+)_)?R1_(\d+)\.fastq\.gz$/
def reverse_regex = ~/^${sample_id}_S(\d+)_(L(\d+)_)?R2_(\d+)\.fastq\.gz$/
def forward_fastq = state.input.listFiles().findAll{it.isFile() && it.name ==~ forward_regex}
def reverse_fastq = state.input.listFiles().findAll{it.isFile() && it.name ==~ reverse_regex}
// Sort is needed here because multiple lanes (_L00*_) might be present and they need to be in the same order in both lists
def forward_fastq = state.input.listFiles().findAll{it.isFile() && it.name ==~ forward_regex}.sort()
def reverse_fastq = state.input.listFiles().findAll{it.isFile() && it.name ==~ reverse_regex}.sort()
assert forward_fastq && !forward_fastq.isEmpty(): "No forward fastq files were found for sample ${sample_id}. " +
"All fastq files in directory: ${allfastqs.collect{it.name}}"
assert (reverse_fastq.isEmpty() || (forward_fastq.size() == reverse_fastq.size())):
@@ -3099,6 +3378,9 @@ workflow run_wf {
"Found forward: ${forward_fastq} and reverse: ${reverse_fastq}."
println "Found ${forward_fastq.size()} forward and ${reverse_fastq.size()} reverse " +
"fastq files for sample ${sample_id}"
assert sample_id == undetermined_sample_name || (forward_fastq.every{!is_empty(it)} && reverse_fastq.every{!is_empty(it)}):
"A fastq file for sample '${sample_id}' appears to be empty!"
def fastqs_state = [
"fastq_forward": forward_fastq,
"fastq_reverse": reverse_fastq,

View File

@@ -2,7 +2,7 @@ manifest {
name = 'dataflow/gather_fastqs_and_validate'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.3.4'
version = 'v0.3.12'
description = 'From a directory containing fastq files, gather the files per sample \nand validate according to the contents of the sample sheet.\n'
}

View File

@@ -1,116 +1,79 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "gather_fastqs_and_validate",
"description": "From a directory containing fastq files, gather the files per sample \nand validate according to the contents of the sample sheet.\n",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type":
"string",
"description": "Type: `file`, required. Directory containing ",
"help_text": "Type: `file`, required. Directory containing .fastq files"
}
,
"sample_sheet": {
"type":
"string",
"description": "Type: `file`, required. Sample sheet",
"help_text": "Type: `file`, required. Sample sheet"
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"fastq_forward": {
"type":
"string",
"description": "Type: List of `file`, required, default: `$id.$key.fastq_forward_*.fastq_forward_*`, multiple_sep: `\";\"`. ",
"help_text": "Type: List of `file`, required, default: `$id.$key.fastq_forward_*.fastq_forward_*`, multiple_sep: `\";\"`. "
,
"default":"$id.$key.fastq_forward_*.fastq_forward_*"
}
,
"fastq_reverse": {
"type":
"string",
"description": "Type: List of `file`, default: `$id.$key.fastq_reverse_*.fastq_reverse_*`, multiple_sep: `\";\"`. ",
"help_text": "Type: List of `file`, default: `$id.$key.fastq_reverse_*.fastq_reverse_*`, multiple_sep: `\";\"`. "
,
"default":"$id.$key.fastq_reverse_*.fastq_reverse_*"
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map corresponds to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativation is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "gather_fastqs_and_validate",
"description": "From a directory containing fastq files, gather the files per sample \nand validate according to the contents of the sample sheet.\n",
"type": "object",
"$defs": {
"input arguments": {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type": "string",
"format": "path",
"exists": true,
"description": "Directory containing .fastq files",
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`. "
},
"sample_sheet": {
"type": "string",
"format": "path",
"exists": true,
"description": "Sample sheet",
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`. "
}
}
},
{
"$ref": "#/definitions/output arguments"
"output arguments": {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"fastq_forward": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `True`, required, default: `\"$id.$key.fastq_forward_*\"`, direction: `output`. ",
"default": "$id.$key.fastq_forward_*"
},
"fastq_reverse": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `True`, default: `\"$id.$key.fastq_reverse_*\"`, direction: `output`. ",
"default": "$id.$key.fastq_reverse_*"
}
}
},
{
"$ref": "#/definitions/nextflow input-output arguments"
"nextflow input-output arguments": {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type": "string",
"description": "Path to an output directory.",
"help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
}
}
}
]
},
"allOf": [
{
"$ref": "#/$defs/input arguments"
},
{
"$ref": "#/$defs/output arguments"
},
{
"$ref": "#/$defs/nextflow input-output arguments"
}
]
}

View File

@@ -1,5 +1,5 @@
name: "demultiplex"
version: "v0.3.4"
version: "v0.3.12"
argument_groups:
- name: "Input arguments"
arguments:
@@ -71,7 +71,7 @@ argument_groups:
create_parent: true
required: false
direction: "output"
multiple: false
multiple: true
multiple_sep: ";"
- type: "file"
name: "--output_multiqc"
@@ -96,6 +96,25 @@ argument_groups:
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--demultiplexer_logs"
info: null
default:
- "$id/demultiplexer_logs"
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
- name: "Other arguments"
arguments:
- type: "boolean_true"
name: "--skip_copycomplete_check"
description: "Disable the check for the presence of a \"CopyComplete.txt\" file\
\ in input\ndirectory in case of Illumina data.\n"
info: null
direction: "input"
resources:
- type: "nextflow_script"
path: "main.nf"
@@ -116,6 +135,9 @@ test_resources:
entrypoint: "test_bases2fastq"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
@@ -136,27 +158,27 @@ dependencies:
repository:
type: "vsh"
repo: "biobox"
tag: "v0.3.0"
tag: "v0.3.1"
- name: "bases2fastq"
repository:
type: "vsh"
repo: "biobox"
tag: "v0.3.0"
tag: "v0.3.1"
- name: "falco"
repository:
type: "vsh"
repo: "biobox"
tag: "v0.3.0"
tag: "v0.3.1"
- name: "multiqc"
repository:
type: "vsh"
repo: "biobox"
tag: "v0.3.0"
tag: "v0.3.1"
repositories:
- type: "vsh"
name: "bb"
repo: "biobox"
tag: "v0.3.0"
tag: "v0.3.1"
license: "MIT"
links:
repository: "https://github.com/viash-hub/demultiplex"
@@ -235,38 +257,38 @@ build_info:
engine: "native|native"
output: "target/nextflow/demultiplex"
executable: "target/nextflow/demultiplex/main.nf"
viash_version: "0.9.0"
git_commit: "e6f8b20d70be8f47907c08cc3c7802920cc0eee4"
git_remote: "https://x-access-token:ghs_FhW4M4TmdkVUw8Om3VZ1e7pO685xf7028pZ3@github.com/viash-hub/demultiplex"
git_tag: "v0.3.3-3-ge6f8b20"
viash_version: "0.9.4"
git_commit: "3a84e5409c5758611fff4eb585736f3737c8b896"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.3.11-2-g3a84e54"
dependencies:
- "target/nextflow/io/untar"
- "target/nextflow/dataflow/gather_fastqs_and_validate"
- "target/nextflow/io/interop_summary_to_csv"
- "target/nextflow/dataflow/combine_samples"
- "target/dependencies/vsh/vsh/biobox/v0.3.0/nextflow/bcl_convert"
- "target/dependencies/vsh/vsh/biobox/v0.3.0/nextflow/bases2fastq"
- "target/dependencies/vsh/vsh/biobox/v0.3.0/nextflow/falco"
- "target/dependencies/vsh/vsh/biobox/v0.3.0/nextflow/multiqc"
- "target/dependencies/vsh/vsh/biobox/v0.3.1/nextflow/bcl_convert"
- "target/dependencies/vsh/vsh/biobox/v0.3.1/nextflow/bases2fastq"
- "target/dependencies/vsh/vsh/biobox/v0.3.1/nextflow/falco"
- "target/dependencies/vsh/vsh/biobox/v0.3.1/nextflow/multiqc"
package_config:
name: "demultiplex"
version: "v0.3.4"
version: "v0.3.12"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-test-data/demultiplex/v2/"
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.0"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n.runners[.type == 'nextflow'].directives.tag\
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.4'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.12'"
keywords:
- "bioinformatics"
- "sequence"

View File

@@ -1,6 +1,6 @@
// demultiplex v0.3.4
// demultiplex v0.3.12
//
// This wrapper script is auto-generated by viash 0.9.0 and is thus a derivative
// This wrapper script is auto-generated by viash 0.9.4 and is thus a derivative
// work thereof. This software comes with ABSOLUTELY NO WARRANTY from Data
// Intuitive.
//
@@ -82,64 +82,56 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
foundClass = "List[${e.foundClass}]"
}
} else if (par.type == "string") {
// cast to string if need be
// cast to string if need be. only cast if the value is a GString
if (value instanceof GString) {
value = value.toString()
value = value as String
}
expectedClass = value instanceof String ? null : "String"
} else if (par.type == "integer") {
// cast to integer if need be
if (value instanceof String) {
if (value !instanceof Integer) {
try {
value = value.toInteger()
value = value as Integer
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Integer"
}
}
if (value instanceof java.math.BigInteger) {
value = value.intValue()
}
expectedClass = value instanceof Integer ? null : "Integer"
} else if (par.type == "long") {
// cast to long if need be
if (value instanceof String) {
if (value !instanceof Long) {
try {
value = value.toLong()
value = value as Long
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Long"
}
}
if (value instanceof Integer) {
value = value.toLong()
}
expectedClass = value instanceof Long ? null : "Long"
} else if (par.type == "double") {
// cast to double if need be
if (value instanceof String) {
if (value !instanceof Double) {
try {
value = value.toDouble()
value = value as Double
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Double"
}
}
if (value instanceof java.math.BigDecimal) {
value = value.doubleValue()
} else if (par.type == "float") {
// cast to float if need be
if (value !instanceof Float) {
try {
value = value as Float
} catch (NumberFormatException e) {
expectedClass = "Float"
}
}
if (value instanceof Float) {
value = value.toDouble()
}
expectedClass = value instanceof Double ? null : "Double"
} else if (par.type == "boolean" | par.type == "boolean_true" | par.type == "boolean_false") {
// cast to boolean if need be
if (value instanceof String) {
def valueLower = value.toLowerCase()
if (valueLower == "true") {
value = true
} else if (valueLower == "false") {
value = false
if (value !instanceof Boolean) {
try {
value = value as Boolean
} catch (Exception e) {
expectedClass = "Boolean"
}
}
expectedClass = value instanceof Boolean ? null : "Boolean"
} else if (par.type == "file" && (par.direction == "input" || stage == "output")) {
// cast to path if need be
if (value instanceof String) {
@@ -151,10 +143,13 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
expectedClass = value instanceof Path ? null : "Path"
} else if (par.type == "file" && stage == "input" && par.direction == "output") {
// cast to string if need be
if (value instanceof GString) {
value = value.toString()
if (value !instanceof String) {
try {
value = value as String
} catch (Exception e) {
expectedClass = "String"
}
}
expectedClass = value instanceof String ? null : "String"
} else {
// didn't find a match for par.type
expectedClass = par.type
@@ -173,7 +168,7 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
Map _processInputValues(Map inputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.required) {
if (arg.required && arg.direction == "input") {
assert inputs.containsKey(arg.plainName) && inputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required input argument '${arg.plainName}' is missing"
}
@@ -192,15 +187,8 @@ Map _processInputValues(Map inputs, Map config, String id, String key) {
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/arguments/_processOutputValues.nf'
Map _processOutputValues(Map outputs, Map config, String id, String key) {
Map _checkValidOutputArgument(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
outputs = outputs.collectEntries { name, value ->
def par = config.allArguments.find { it.plainName == name && it.direction == "output" }
assert par != null : "Error in module '${key}' id '${id}': '${name}' is not a valid output argument"
@@ -213,6 +201,16 @@ Map _processOutputValues(Map outputs, Map config, String id, String key) {
return outputs
}
void _checkAllRequiredOuputsPresent(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
}
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/channel/IDChecker.nf'
class IDChecker {
final def items = [] as Set
@@ -1666,6 +1664,162 @@ def joinStates(Closure apply_) {
}
return joinStatesWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishFiles.nf'
def publishFiles(Map args) {
def key_ = args.get("key")
assert key_ != null : "publishFiles: key must be specified"
workflow publishFilesWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1]
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
[id_, inputFiles_, outputFilenames_]
}
| publishFilesProc
emit: input_ch
}
return publishFilesWf
}
process publishFilesProc {
// todo: check publishpath?
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
output:
tuple val(id), path{outputFiles}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
}
// this assumes that the state contains no other values other than those specified in the config
def publishFilesByConfig(Map args) {
def config = args.get("config")
assert config != null : "publishFilesByConfig: config must be specified"
def key_ = args.get("key", config.name)
assert key_ != null : "publishFilesByConfig: key must be specified"
workflow publishFilesSimpleWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1] // e.g. [output: new File("myoutput.h5ad"), k: 10]
def origState_ = tup[2] // e.g. [output: '$id.$key.foo.h5ad']
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
.collectMany { par ->
def plainName_ = par.plainName
// if the state does not contain the key, it's an
// optional argument for which the component did
// not generate any output OR multiple channels were emitted
// and the output was just not added to using the channel
// that is now being parsed
if (!state_.containsKey(plainName_)) {
return []
}
def value = state_[plainName_]
// if the parameter is not a file, it should be stored
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[inputPath: [], outputFilename: []]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
// that it should not be returned as a state
if (!origState_.containsKey(plainName_)) {
return []
}
def filenameTemplate = origState_[plainName_]
// if the pararameter is multiple: true, fetch the template
if (par.multiple && filenameTemplate instanceof List) {
filenameTemplate = filenameTemplate[0]
}
// instantiate the template
def filename = filenameTemplate
.replaceAll('\\$id', id_)
.replaceAll('\\$\\{id\\}', id_)
.replaceAll('\\$key', key_)
.replaceAll('\\$\\{key\\}', key_)
if (par.multiple) {
// if the parameter is multiple: true, the filename
// should contain a wildcard '*' that is replaced with
// the index of the file
assert filename.contains("*") : "Module '${key_}' id '${id_}': Multiple output files specified, but no wildcard '*' in the filename: ${filename}"
def outputPerFile = value.withIndex().collect{ val, ix ->
def filename_ix = filename.replace("*", ix.toString())
def inputPath = val instanceof File ? val.toPath() : val
[inputPath: inputPath, outputFilename: filename_ix]
}
def transposedOutputs = ["inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
} else {
def value_ = java.nio.file.Paths.get(filename)
def inputPath = value instanceof File ? value.toPath() : value
return [[inputPath: [inputPath], outputFilename: [filename]]]
}
}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
[id_, inputPaths, outputFilenames]
}
| publishFilesProc
emit: input_ch
}
return publishFilesSimpleWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishStates.nf'
def collectFiles(obj) {
if (obj instanceof java.io.File || obj instanceof Path) {
@@ -1723,8 +1877,6 @@ def publishStates(Map args) {
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
def yamlFilename = yamlTemplate_
.replaceAll('\\$id', id_)
@@ -1737,7 +1889,7 @@ def publishStates(Map args) {
// convert state to yaml blob
def yamlBlob_ = toRelativeTaggedYamlBlob([id: id_] + state_, java.nio.file.Paths.get(yamlFilename))
[id_, yamlBlob_, yamlFilename, inputFiles_, outputFilenames_]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -1749,33 +1901,17 @@ process publishStatesProc {
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), val(yamlBlob), val(yamlFile), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
tuple val(id), val(yamlBlob), val(yamlFile)
output:
tuple val(id), path{[yamlFile] + outputFiles}
tuple val(id), path{[yamlFile]}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
echo '${yamlBlob}' > '${yamlFile}'
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
cat > '${yamlFile}' << HERE
${yamlBlob}
HERE
"""
}
@@ -1806,13 +1942,10 @@ def publishStatesByConfig(Map args) {
.replaceAll('\\$\\{key\\}', key_)
def yamlDir = java.nio.file.Paths.get(yamlFilename).getParent()
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// the processed state is a list of [key, value] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (key, value) are the tuples that will be saved to the state.yaml file
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
@@ -1829,7 +1962,7 @@ def publishStatesByConfig(Map args) {
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[key: plainName_, value: value, inputPath: [], outputFilename: []]]
return [[key: plainName_, value: value]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
@@ -1860,13 +1993,9 @@ def publishStatesByConfig(Map args) {
if (yamlDir != null) {
value_ = yamlDir.relativize(value_)
}
def inputPath = val instanceof File ? val.toPath() : val
[value: value_, inputPath: inputPath, outputFilename: filename_ix]
return value_
}
def transposedOutputs = ["value", "inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
return [["key": plainName_, "value": outputPerFile]]
} else {
def value_ = java.nio.file.Paths.get(filename)
// if id contains a slash
@@ -1874,18 +2003,17 @@ def publishStatesByConfig(Map args) {
value_ = yamlDir.relativize(value_)
}
def inputPath = value instanceof File ? value.toPath() : value
return [[key: plainName_, value: value_, inputPath: [inputPath], outputFilename: [filename]]]
return [["key": plainName_, value: value_]]
}
}
def updatedState_ = processedState.collectEntries{[it.key, it.value]}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
// convert state to yaml blob
def yamlBlob_ = toTaggedYamlBlob([id: id_] + updatedState_)
[id_, yamlBlob_, yamlFilename, inputPaths, outputFilenames]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -2559,7 +2687,8 @@ def _debug(workflowArgs, debugKey) {
def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
def workflowArgs = processWorkflowArgs(args, defaultWfArgs, meta)
def key_ = workflowArgs["key"]
def multipleArgs = meta.config.allArguments.findAll{ it.multiple }.collect{it.plainName}
workflow workflowInstance {
take: input_
@@ -2716,12 +2845,36 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
// TODO: move some of the _meta.join_id wrangling to the safeJoin() function.
def chInitialOutput = chArgsWithDefaults
def chInitialOutputMulti = chArgsWithDefaults
| _debug(workflowArgs, "processed")
// run workflow
| innerWorkflowFactory(workflowArgs)
// check output tuple
| map { id_, output_ ->
def chInitialOutputList = chInitialOutputMulti instanceof List ? chInitialOutputMulti : [chInitialOutputMulti]
assert chInitialOutputList.size() > 0: "should have emitted at least one output channel"
// Add a channel ID to the events, which designates the channel the event was emitted from as a running number
// This number is used to sort the events later when the events are gathered from across the channels.
def chInitialOutputListWithIndexedEvents = chInitialOutputList.withIndex().collect{channel, channelIndex ->
def newChannel = channel
| map {tuple ->
assert tuple instanceof List :
"Error in module '${key_}': element in output channel should be a tuple [id, data, ...otherargs...]\n" +
" Example: [\"id\", [input: file('foo.txt'), arg: 10]].\n" +
" Expected class: List. Found: tuple.getClass() is ${tuple.getClass()}"
def newEvent = [channelIndex] + tuple
return newEvent
}
return newChannel
}
// Put the events into 1 channel, cover case where there is only one channel is emitted
def chInitialOutput = chInitialOutputList.size() > 1 ? \
chInitialOutputListWithIndexedEvents[0].mix(*chInitialOutputListWithIndexedEvents.tail()) : \
chInitialOutputListWithIndexedEvents[0]
def chInitialOutputProcessed = chInitialOutput
| map { tuple ->
def channelId = tuple[0]
def id_ = tuple[1]
def output_ = tuple[2]
// see if output map contains metadata
def meta_ =
@@ -2734,19 +2887,94 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
output_ = output_.findAll{k, v -> k != "_meta"}
// check value types
output_ = _processOutputValues(output_, meta.config, id_, key_)
output_ = _checkValidOutputArgument(output_, meta.config, id_, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && output_.size() == 1) {
output_ = output_.values()[0]
}
[join_id, id_, output_]
[join_id, channelId, id_, output_]
}
// | view{"chInitialOutput: ${it.take(3)}"}
// join the output [prev_id, channel_id, new_id, output] with the previous state [prev_id, state, ...]
def chPublishWithPreviousState = safeJoin(chInitialOutputProcessed, chRunFiltered, key_)
// input tuple format: [join_id, channel_id, id, output, prev_state, ...]
// output tuple format: [join_id, channel_id, id, new_state, ...]
| map{ tup ->
def new_state = workflowArgs.toState(tup.drop(2).take(3))
tup.take(3) + [new_state] + tup.drop(5)
}
if (workflowArgs.auto.publish == "state") {
def chPublishFiles = chPublishWithPreviousState
// input tuple format: [join_id, channel_id, id, new_state, ...]
// output tuple format: [join_id, channel_id, id, new_state]
| map{ tup ->
tup.take(4)
}
safeJoin(chPublishFiles, chArgsWithDefaults, key_)
// input tuple format: [join_id, channel_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(2).take(3)
}
| publishFilesByConfig(key: key_, config: meta.config)
}
// Join the state from the events that were emitted from different channels
def chJoined = chInitialOutputProcessed
| map {tuple ->
def join_id = tuple[0]
def channel_id = tuple[1]
def id = tuple[2]
def other = tuple.drop(3)
// Below, groupTuple is used to join the events. To make sure resuming a workflow
// keeps working, the output state must be deterministic. This means the state needs to be
// sorted with groupTuple's has a 'sort' argument. This argument can be set to 'hash',
// but hashing the state when it is large can be problematic in terms of performance.
// Therefore, a custom comparator function is provided. We add the channel ID to the
// states so that we can use the channel ID to sort the items.
def stateWithChannelID = [[channel_id] * other.size(), other].transpose()
// A comparator that is provided to groupTuple's 'sort' argument is applied
// to all elements of the event tuple (that is not the 'id'). The comparator
// closure that is used below expects the input to be List. So the join_id and
// channel_id must also be wrapped in a list.
[[join_id], [channel_id], id] + stateWithChannelID
}
| groupTuple(by: 2, sort: {a, b -> a[0] <=> b[0]}, size: chInitialOutputList.size(), remainder: true)
| map {join_ids, _, id, statesWithChannelID ->
// Remove the channel IDs from the states
def states = statesWithChannelID.collect{it[1]}
def newJoinId = join_ids.flatten().unique{a, b -> a <=> b}
assert newJoinId.size() == 1: "Multiple events were emitted for '$id'."
def newJoinIdUnique = newJoinId[0]
// Merge the states from the different channels
def newState = states.inject([:]){ old_state, state_to_add ->
return old_state + state_to_add.collectEntries{k, v ->
if (!multipleArgs.contains(k)) {
// if the key is not a multiple argument, we expect only one value
if (old_state.containsKey(k)) {
assert old_state[k] == v : "ID $id: multiple entries for argument $k were emitted."
}
[k, v]
} else {
// if the key is a multiple argument, append the different values into one list
def prevValue = old_state.getOrDefault(k, [])
def prevValueAsList = prevValue instanceof List ? prevValue : [prevValue]
[k, prevValueAsList + v]
}
}
}
_checkAllRequiredOuputsPresent(newState, meta.config, id, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && newState.size() == 1) {
newState = newState.values()[0]
}
return [newJoinIdUnique, id, newState]
}
// join the output [prev_id, new_id, output] with the previous state [prev_id, state, ...]
def chNewState = safeJoin(chInitialOutput, chRunFiltered, key_)
def chNewState = safeJoin(chJoined, chRunFiltered, key_)
// input tuple format: [join_id, id, output, prev_state, ...]
// output tuple format: [join_id, id, new_state, ...]
| map{ tup ->
@@ -2755,23 +2983,21 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
if (workflowArgs.auto.publish == "state") {
def chPublish = chNewState
def chPublishStates = chNewState
// input tuple format: [join_id, id, new_state, ...]
// output tuple format: [join_id, id, new_state]
| map{ tup ->
tup.take(3)
}
safeJoin(chPublish, chArgsWithDefaults, key_)
safeJoin(chPublishStates, chArgsWithDefaults, key_)
// input tuple format: [join_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(1).take(3)
}
}
| publishStatesByConfig(key: key_, config: meta.config)
}
// remove join_id and meta
chReturn = chNewState
| map { tup ->
// input tuple format: [join_id, id, new_state, ...]
@@ -2805,7 +3031,7 @@ meta = [
"resources_dir": moduleDir.toRealPath().normalize(),
"config": processConfig(readJsonBlob('''{
"name" : "demultiplex",
"version" : "v0.3.4",
"version" : "v0.3.12",
"argument_groups" : [
{
"name" : "Input arguments",
@@ -2884,7 +3110,7 @@ meta = [
"create_parent" : true,
"required" : false,
"direction" : "output",
"multiple" : false,
"multiple" : true,
"multiple_sep" : ";"
},
{
@@ -2913,6 +3139,30 @@ meta = [
"direction" : "output",
"multiple" : false,
"multiple_sep" : ";"
},
{
"type" : "file",
"name" : "--demultiplexer_logs",
"default" : [
"$id/demultiplexer_logs"
],
"must_exist" : true,
"create_parent" : true,
"required" : true,
"direction" : "output",
"multiple" : false,
"multiple_sep" : ";"
}
]
},
{
"name" : "Other arguments",
"arguments" : [
{
"type" : "boolean_true",
"name" : "--skip_copycomplete_check",
"description" : "Disable the check for the presence of a \\"CopyComplete.txt\\" file in input\ndirectory in case of Illumina data.\n",
"direction" : "input"
}
]
}
@@ -2946,6 +3196,10 @@ meta = [
}
],
"status" : "enabled",
"scope" : {
"image" : "public",
"target" : "public"
},
"requirements" : {
"commands" : [
"ps"
@@ -2981,7 +3235,7 @@ meta = [
"repository" : {
"type" : "vsh",
"repo" : "biobox",
"tag" : "v0.3.0"
"tag" : "v0.3.1"
}
},
{
@@ -2989,7 +3243,7 @@ meta = [
"repository" : {
"type" : "vsh",
"repo" : "biobox",
"tag" : "v0.3.0"
"tag" : "v0.3.1"
}
},
{
@@ -2997,7 +3251,7 @@ meta = [
"repository" : {
"type" : "vsh",
"repo" : "biobox",
"tag" : "v0.3.0"
"tag" : "v0.3.1"
}
},
{
@@ -3005,7 +3259,7 @@ meta = [
"repository" : {
"type" : "vsh",
"repo" : "biobox",
"tag" : "v0.3.0"
"tag" : "v0.3.1"
}
}
],
@@ -3014,7 +3268,7 @@ meta = [
"type" : "vsh",
"name" : "bb",
"repo" : "biobox",
"tag" : "v0.3.0"
"tag" : "v0.3.1"
}
],
"license" : "MIT",
@@ -3108,31 +3362,31 @@ meta = [
"runner" : "nextflow",
"engine" : "native|native",
"output" : "target/nextflow/demultiplex",
"viash_version" : "0.9.0",
"git_commit" : "e6f8b20d70be8f47907c08cc3c7802920cc0eee4",
"git_remote" : "https://x-access-token:ghs_FhW4M4TmdkVUw8Om3VZ1e7pO685xf7028pZ3@github.com/viash-hub/demultiplex",
"git_tag" : "v0.3.3-3-ge6f8b20"
"viash_version" : "0.9.4",
"git_commit" : "3a84e5409c5758611fff4eb585736f3737c8b896",
"git_remote" : "https://github.com/viash-hub/demultiplex",
"git_tag" : "v0.3.11-2-g3a84e54"
},
"package_config" : {
"name" : "demultiplex",
"version" : "v0.3.4",
"version" : "v0.3.12",
"description" : "Demultiplexing pipeline\n",
"info" : {
"test_resources" : [
{
"path" : "gs://viash-hub-test-data/demultiplex/v2/",
"path" : "gs://viash-hub-resources/demultiplex/v3",
"dest" : "testData"
}
]
},
"viash_version" : "0.9.0",
"viash_version" : "0.9.4",
"source" : "src",
"target" : "target",
"config_mods" : [
".requirements.commands := ['ps']\n.runners[.type == 'nextflow'].directives.tag := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n.runners[.type == 'nextflow'].config.script := 'includeConfig(\\"nextflow_labels.config\\")'\n",
".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n.runners[.type == 'nextflow'].config.script := 'includeConfig(\\"nextflow_labels.config\\")'\n",
".engines += { type: \\"native\\" }",
".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'",
".engines[.type == 'docker'].target_tag := 'v0.3.4'"
".engines[.type == 'docker'].target_tag := 'v0.3.12'"
],
"keywords" : [
"bioinformatics",
@@ -3156,10 +3410,10 @@ include { untar } from "${meta.resources_dir}/../../nextflow/io/untar/main.nf"
include { gather_fastqs_and_validate } from "${meta.resources_dir}/../../nextflow/dataflow/gather_fastqs_and_validate/main.nf"
include { interop_summary_to_csv } from "${meta.resources_dir}/../../nextflow/io/interop_summary_to_csv/main.nf"
include { combine_samples } from "${meta.resources_dir}/../../nextflow/dataflow/combine_samples/main.nf"
include { bcl_convert } from "${meta.root_dir}/dependencies/vsh/vsh/biobox/v0.3.0/nextflow/bcl_convert/main.nf"
include { bases2fastq } from "${meta.root_dir}/dependencies/vsh/vsh/biobox/v0.3.0/nextflow/bases2fastq/main.nf"
include { falco } from "${meta.root_dir}/dependencies/vsh/vsh/biobox/v0.3.0/nextflow/falco/main.nf"
include { multiqc } from "${meta.root_dir}/dependencies/vsh/vsh/biobox/v0.3.0/nextflow/multiqc/main.nf"
include { bcl_convert } from "${meta.root_dir}/dependencies/vsh/vsh/biobox/v0.3.1/nextflow/bcl_convert/main.nf"
include { bases2fastq } from "${meta.root_dir}/dependencies/vsh/vsh/biobox/v0.3.1/nextflow/bases2fastq/main.nf"
include { falco } from "${meta.root_dir}/dependencies/vsh/vsh/biobox/v0.3.1/nextflow/falco/main.nf"
include { multiqc } from "${meta.root_dir}/dependencies/vsh/vsh/biobox/v0.3.1/nextflow/multiqc/main.nf"
// inner workflow
// user-provided Nextflow code
@@ -3259,6 +3513,10 @@ workflow run_wf {
// step based on the run dir, not the InterOp dir.
def interop_dir = state.input.resolve("InterOp")
assert interop_dir.isDirectory(): "Expected InterOp directory to be present."
def copycomplete_file = state.input.resolve("CopyComplete.txt")
assert (copycomplete_file.isFile() || state.skip_copycomplete_check):
"'CopyComplete.txt' file was not found!"
}
def resultState = state + newState
@@ -3285,14 +3543,15 @@ workflow run_wf {
bcl_input_directory: state.input,
sample_sheet: state.run_information,
output_directory: state.output,
reports: "reports",
logs: "logs"
reports: state.demultiplexer_logs,
logs: state.demultiplexer_logs,
]
},
toState: {id, result, state ->
def toAdd = [
"output_demultiplexer" : result.output_directory,
"run_id": id,
"demultiplexer_logs": result.reports,
]
def newState = state + toAdd
return newState
@@ -3302,11 +3561,15 @@ workflow run_wf {
| bases2fastq.run(
runIf: {id, state -> state.demultiplexer in ["bases2fastq"]},
directives: [label: ["highmem", "midcpu"]],
fromState: [
"analysis_directory": "input",
"run_manifest": "run_information",
"output_directory": "output",
],
fromState: { id, state ->
[
"analysis_directory": state.input,
"run_manifest": state.run_information,
"output_directory": state.output,
"report": state.demultiplexer_logs + "/report.html",
"logs": state.demultiplexer_logs,
]
},
args: [
"no_projects": true, // Do not put output files in a subfolder for project
//"split_lanes": true,
@@ -3317,6 +3580,8 @@ workflow run_wf {
def toAdd = [
"output_demultiplexer" : result.output_directory,
"run_id": id,
"demultiplexer_logs": result.logs,
]
def newState = state + toAdd
return newState
@@ -3334,28 +3599,12 @@ workflow run_wf {
)
output_ch = samples_ch
| combine_samples.run(
fromState: { id, state ->
[
"id": state.run_id,
"forward_input": state.fastq_forward,
"reverse_input": state.fastq_reverse,
]
},
toState: [
"forward_fastqs": "output_forward",
"reverse_fastqs": "output_reverse",
]
)
| falco.run(
directives: [label: ["lowcpu", "lowmem"]],
directives: [label: ["verylowcpu", "lowmem"]],
fromState: {id, state ->
reverse_fastqs_list = state.reverse_fastqs ? state.reverse_fastqs : []
[
"input": state.forward_fastqs + reverse_fastqs_list,
"outdir": "${state.output_falco}",
"input": [state.fastq_forward, state.fastq_reverse],
"outdir": "$id/qc/falco",
"summary_filename": null,
"report_filename": null,
"data_filename": null,
@@ -3365,11 +3614,28 @@ workflow run_wf {
state + [ "output_falco" : result.outdir ]
}
)
| combine_samples.run(
fromState: { id, state ->
[
"id": state.run_id,
"forward_input": state.fastq_forward,
"reverse_input": state.fastq_reverse,
"falco_dir": state.output_falco,
]
},
toState: [
"forward_fastqs": "output_forward",
"reverse_fastqs": "output_reverse",
"output_falco": "output_falco",
]
)
| multiqc.run(
directives: [label: ["midcpu", "midmem"]],
fromState: {id, state ->
def new_state = [
"input": [state.output_falco],
"input": state.output_falco,
"output_report": state.output_multiqc,
"cl_config": 'sp: {fastqc/data: {fn: "*_fastqc_data.txt"}}'
]
@@ -3385,6 +3651,7 @@ workflow run_wf {
state + [ "output_multiqc" : result.output_report ]
}
)
| setState(
[
//"_meta": "_meta",
@@ -3392,6 +3659,7 @@ workflow run_wf {
"output_falco": "output_falco",
"output_multiqc": "output_multiqc",
"output_run_information": "run_information",
"demultiplexer_logs": "demultiplexer_logs"
]
)

View File

@@ -2,7 +2,7 @@ manifest {
name = 'demultiplex'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.3.4'
version = 'v0.3.12'
description = 'Demultiplexing of raw sequencing data'
}

View File

@@ -1,160 +1,128 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "demultiplex",
"description": "Demultiplexing of raw sequencing data",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"id": {
"type":
"string",
"description": "Type: `string`. Unique identifier for the run",
"help_text": "Type: `string`. Unique identifier for the run"
}
,
"input": {
"type":
"string",
"description": "Type: `file`, required. Directory containing raw sequencing data",
"help_text": "Type: `file`, required. Directory containing raw sequencing data"
}
,
"run_information": {
"type":
"string",
"description": "Type: `file`. CSV file containing sample information, which will be used as \ninput for the demultiplexer",
"help_text": "Type: `file`. CSV file containing sample information, which will be used as \ninput for the demultiplexer. Canonically called \u0027SampleSheet.csv\u0027 (Illumina)\nor \u0027RunManifest.csv\u0027 (Element Biosciences). If not specified,\nwill try to autodetect the sample sheet in the input directory.\nRequires --demultiplexer to be set.\n"
}
,
"demultiplexer": {
"type":
"string",
"description": "Type: `string`, choices: ``bases2fastq`, `bclconvert``. Demultiplexer to use, choice depends on the provider\nof the instrument that was used to generate the data",
"help_text": "Type: `string`, choices: ``bases2fastq`, `bclconvert``. Demultiplexer to use, choice depends on the provider\nof the instrument that was used to generate the data.\nWhen not using --sample_sheet, specifying this argument is not\nrequired.\n",
"enum": ["bases2fastq", "bclconvert"]
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.output.output`. Directory to write fastq data to",
"help_text": "Type: `file`, default: `$id.$key.output.output`. Directory to write fastq data to"
,
"default":"$id.$key.output.output"
}
,
"output_falco": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.output_falco.output_falco`. Directory to write falco output to",
"help_text": "Type: `file`, default: `$id.$key.output_falco.output_falco`. Directory to write falco output to"
,
"default":"$id.$key.output_falco.output_falco"
}
,
"output_multiqc": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.output_multiqc.html`. Directory to write falco output to",
"help_text": "Type: `file`, default: `$id.$key.output_multiqc.html`. Directory to write falco output to"
,
"default":"$id.$key.output_multiqc.html"
}
,
"output_run_information": {
"type":
"string",
"description": "Type: `file`, required, default: `$id.$key.output_run_information.csv`. ",
"help_text": "Type: `file`, required, default: `$id.$key.output_run_information.csv`. "
,
"default":"$id.$key.output_run_information.csv"
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map corresponds to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativation is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "demultiplex",
"description": "Demultiplexing of raw sequencing data",
"type": "object",
"$defs": {
"input arguments": {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"id": {
"type": "string",
"description": "Unique identifier for the run",
"help_text": "Type: `string`, multiple: `False`. "
},
"input": {
"type": "string",
"format": "path",
"exists": true,
"description": "Directory containing raw sequencing data",
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`. "
},
"run_information": {
"type": "string",
"format": "path",
"description": "CSV file containing sample information, which will be used as \ninput for the demultiplexer",
"help_text": "Type: `file`, multiple: `False`, direction: `input`. "
},
"demultiplexer": {
"type": "string",
"description": "Demultiplexer to use, choice depends on the provider\nof the instrument that was used to generate the data.\nWhen not using --sample_sheet, specifying this argument is not\nrequired.\n",
"help_text": "Type: `string`, multiple: `False`, choices: ``bases2fastq`, `bclconvert``. ",
"enum": [
"bases2fastq",
"bclconvert"
]
}
}
},
{
"$ref": "#/definitions/output arguments"
"output arguments": {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output": {
"type": "string",
"format": "path",
"description": "Directory to write fastq data to",
"help_text": "Type: `file`, multiple: `False`, default: `\"$id/fastq\"`, direction: `output`. ",
"default": "$id/fastq"
},
"output_falco": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"description": "Directory to write falco output to",
"help_text": "Type: `file`, multiple: `True`, default: `[\"$id/qc/fastqc\"]`, direction: `output`. ",
"default": [
"$id/qc/fastqc"
]
},
"output_multiqc": {
"type": "string",
"format": "path",
"description": "Directory to write falco output to",
"help_text": "Type: `file`, multiple: `False`, default: `\"$id/qc/multiqc_report.html\"`, direction: `output`. ",
"default": "$id/qc/multiqc_report.html"
},
"output_run_information": {
"type": "string",
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `False`, required, default: `\"$id/run_information.csv\"`, direction: `output`. ",
"default": "$id/run_information.csv"
},
"demultiplexer_logs": {
"type": "string",
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `False`, required, default: `\"$id/demultiplexer_logs\"`, direction: `output`. ",
"default": "$id/demultiplexer_logs"
}
}
},
{
"$ref": "#/definitions/nextflow input-output arguments"
"other arguments": {
"title": "Other arguments",
"type": "object",
"description": "No description",
"properties": {
"skip_copycomplete_check": {
"type": "boolean",
"description": "Disable the check for the presence of a \"CopyComplete.txt\" file in input\ndirectory in case of Illumina data.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
}
}
},
"nextflow input-output arguments": {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type": "string",
"description": "Path to an output directory.",
"help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
}
}
}
]
},
"allOf": [
{
"$ref": "#/$defs/input arguments"
},
{
"$ref": "#/$defs/output arguments"
},
{
"$ref": "#/$defs/other arguments"
},
{
"$ref": "#/$defs/nextflow input-output arguments"
}
]
}

View File

@@ -1,6 +1,6 @@
name: "interop_summary_to_csv"
namespace: "io"
version: "v0.3.4"
version: "v0.3.12"
argument_groups:
- name: "Input arguments"
arguments:
@@ -41,10 +41,21 @@ resources:
- type: "file"
path: "nextflow_labels.config"
dest: "nextflow_labels.config"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
- type: "file"
path: "iseq-DI"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "summary"
- "index-summary"
- "ps"
license: "MIT"
links:
@@ -121,7 +132,7 @@ engines:
id: "docker"
image: "debian:stable-slim"
target_registry: "images.viash-hub.com"
target_tag: "v0.3.4"
target_tag: "v0.3.12"
namespace_separator: "/"
setup:
- type: "apt"
@@ -145,29 +156,29 @@ build_info:
engine: "docker|native"
output: "target/nextflow/io/interop_summary_to_csv"
executable: "target/nextflow/io/interop_summary_to_csv/main.nf"
viash_version: "0.9.0"
git_commit: "e6f8b20d70be8f47907c08cc3c7802920cc0eee4"
git_remote: "https://x-access-token:ghs_FhW4M4TmdkVUw8Om3VZ1e7pO685xf7028pZ3@github.com/viash-hub/demultiplex"
git_tag: "v0.3.3-3-ge6f8b20"
viash_version: "0.9.4"
git_commit: "3a84e5409c5758611fff4eb585736f3737c8b896"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.3.11-2-g3a84e54"
package_config:
name: "demultiplex"
version: "v0.3.4"
version: "v0.3.12"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-test-data/demultiplex/v2/"
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.0"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n.runners[.type == 'nextflow'].directives.tag\
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.4'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.12'"
keywords:
- "bioinformatics"
- "sequence"

View File

@@ -1,6 +1,6 @@
// interop_summary_to_csv v0.3.4
// interop_summary_to_csv v0.3.12
//
// This wrapper script is auto-generated by viash 0.9.0 and is thus a derivative
// This wrapper script is auto-generated by viash 0.9.4 and is thus a derivative
// work thereof. This software comes with ABSOLUTELY NO WARRANTY from Data
// Intuitive.
//
@@ -82,64 +82,56 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
foundClass = "List[${e.foundClass}]"
}
} else if (par.type == "string") {
// cast to string if need be
// cast to string if need be. only cast if the value is a GString
if (value instanceof GString) {
value = value.toString()
value = value as String
}
expectedClass = value instanceof String ? null : "String"
} else if (par.type == "integer") {
// cast to integer if need be
if (value instanceof String) {
if (value !instanceof Integer) {
try {
value = value.toInteger()
value = value as Integer
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Integer"
}
}
if (value instanceof java.math.BigInteger) {
value = value.intValue()
}
expectedClass = value instanceof Integer ? null : "Integer"
} else if (par.type == "long") {
// cast to long if need be
if (value instanceof String) {
if (value !instanceof Long) {
try {
value = value.toLong()
value = value as Long
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Long"
}
}
if (value instanceof Integer) {
value = value.toLong()
}
expectedClass = value instanceof Long ? null : "Long"
} else if (par.type == "double") {
// cast to double if need be
if (value instanceof String) {
if (value !instanceof Double) {
try {
value = value.toDouble()
value = value as Double
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Double"
}
}
if (value instanceof java.math.BigDecimal) {
value = value.doubleValue()
} else if (par.type == "float") {
// cast to float if need be
if (value !instanceof Float) {
try {
value = value as Float
} catch (NumberFormatException e) {
expectedClass = "Float"
}
}
if (value instanceof Float) {
value = value.toDouble()
}
expectedClass = value instanceof Double ? null : "Double"
} else if (par.type == "boolean" | par.type == "boolean_true" | par.type == "boolean_false") {
// cast to boolean if need be
if (value instanceof String) {
def valueLower = value.toLowerCase()
if (valueLower == "true") {
value = true
} else if (valueLower == "false") {
value = false
if (value !instanceof Boolean) {
try {
value = value as Boolean
} catch (Exception e) {
expectedClass = "Boolean"
}
}
expectedClass = value instanceof Boolean ? null : "Boolean"
} else if (par.type == "file" && (par.direction == "input" || stage == "output")) {
// cast to path if need be
if (value instanceof String) {
@@ -151,10 +143,13 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
expectedClass = value instanceof Path ? null : "Path"
} else if (par.type == "file" && stage == "input" && par.direction == "output") {
// cast to string if need be
if (value instanceof GString) {
value = value.toString()
if (value !instanceof String) {
try {
value = value as String
} catch (Exception e) {
expectedClass = "String"
}
}
expectedClass = value instanceof String ? null : "String"
} else {
// didn't find a match for par.type
expectedClass = par.type
@@ -173,7 +168,7 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
Map _processInputValues(Map inputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.required) {
if (arg.required && arg.direction == "input") {
assert inputs.containsKey(arg.plainName) && inputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required input argument '${arg.plainName}' is missing"
}
@@ -192,15 +187,8 @@ Map _processInputValues(Map inputs, Map config, String id, String key) {
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/arguments/_processOutputValues.nf'
Map _processOutputValues(Map outputs, Map config, String id, String key) {
Map _checkValidOutputArgument(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
outputs = outputs.collectEntries { name, value ->
def par = config.allArguments.find { it.plainName == name && it.direction == "output" }
assert par != null : "Error in module '${key}' id '${id}': '${name}' is not a valid output argument"
@@ -213,6 +201,16 @@ Map _processOutputValues(Map outputs, Map config, String id, String key) {
return outputs
}
void _checkAllRequiredOuputsPresent(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
}
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/channel/IDChecker.nf'
class IDChecker {
final def items = [] as Set
@@ -1666,6 +1664,162 @@ def joinStates(Closure apply_) {
}
return joinStatesWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishFiles.nf'
def publishFiles(Map args) {
def key_ = args.get("key")
assert key_ != null : "publishFiles: key must be specified"
workflow publishFilesWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1]
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
[id_, inputFiles_, outputFilenames_]
}
| publishFilesProc
emit: input_ch
}
return publishFilesWf
}
process publishFilesProc {
// todo: check publishpath?
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
output:
tuple val(id), path{outputFiles}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
}
// this assumes that the state contains no other values other than those specified in the config
def publishFilesByConfig(Map args) {
def config = args.get("config")
assert config != null : "publishFilesByConfig: config must be specified"
def key_ = args.get("key", config.name)
assert key_ != null : "publishFilesByConfig: key must be specified"
workflow publishFilesSimpleWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1] // e.g. [output: new File("myoutput.h5ad"), k: 10]
def origState_ = tup[2] // e.g. [output: '$id.$key.foo.h5ad']
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
.collectMany { par ->
def plainName_ = par.plainName
// if the state does not contain the key, it's an
// optional argument for which the component did
// not generate any output OR multiple channels were emitted
// and the output was just not added to using the channel
// that is now being parsed
if (!state_.containsKey(plainName_)) {
return []
}
def value = state_[plainName_]
// if the parameter is not a file, it should be stored
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[inputPath: [], outputFilename: []]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
// that it should not be returned as a state
if (!origState_.containsKey(plainName_)) {
return []
}
def filenameTemplate = origState_[plainName_]
// if the pararameter is multiple: true, fetch the template
if (par.multiple && filenameTemplate instanceof List) {
filenameTemplate = filenameTemplate[0]
}
// instantiate the template
def filename = filenameTemplate
.replaceAll('\\$id', id_)
.replaceAll('\\$\\{id\\}', id_)
.replaceAll('\\$key', key_)
.replaceAll('\\$\\{key\\}', key_)
if (par.multiple) {
// if the parameter is multiple: true, the filename
// should contain a wildcard '*' that is replaced with
// the index of the file
assert filename.contains("*") : "Module '${key_}' id '${id_}': Multiple output files specified, but no wildcard '*' in the filename: ${filename}"
def outputPerFile = value.withIndex().collect{ val, ix ->
def filename_ix = filename.replace("*", ix.toString())
def inputPath = val instanceof File ? val.toPath() : val
[inputPath: inputPath, outputFilename: filename_ix]
}
def transposedOutputs = ["inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
} else {
def value_ = java.nio.file.Paths.get(filename)
def inputPath = value instanceof File ? value.toPath() : value
return [[inputPath: [inputPath], outputFilename: [filename]]]
}
}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
[id_, inputPaths, outputFilenames]
}
| publishFilesProc
emit: input_ch
}
return publishFilesSimpleWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishStates.nf'
def collectFiles(obj) {
if (obj instanceof java.io.File || obj instanceof Path) {
@@ -1723,8 +1877,6 @@ def publishStates(Map args) {
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
def yamlFilename = yamlTemplate_
.replaceAll('\\$id', id_)
@@ -1737,7 +1889,7 @@ def publishStates(Map args) {
// convert state to yaml blob
def yamlBlob_ = toRelativeTaggedYamlBlob([id: id_] + state_, java.nio.file.Paths.get(yamlFilename))
[id_, yamlBlob_, yamlFilename, inputFiles_, outputFilenames_]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -1749,33 +1901,17 @@ process publishStatesProc {
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), val(yamlBlob), val(yamlFile), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
tuple val(id), val(yamlBlob), val(yamlFile)
output:
tuple val(id), path{[yamlFile] + outputFiles}
tuple val(id), path{[yamlFile]}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
echo '${yamlBlob}' > '${yamlFile}'
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
cat > '${yamlFile}' << HERE
${yamlBlob}
HERE
"""
}
@@ -1806,13 +1942,10 @@ def publishStatesByConfig(Map args) {
.replaceAll('\\$\\{key\\}', key_)
def yamlDir = java.nio.file.Paths.get(yamlFilename).getParent()
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// the processed state is a list of [key, value] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (key, value) are the tuples that will be saved to the state.yaml file
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
@@ -1829,7 +1962,7 @@ def publishStatesByConfig(Map args) {
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[key: plainName_, value: value, inputPath: [], outputFilename: []]]
return [[key: plainName_, value: value]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
@@ -1860,13 +1993,9 @@ def publishStatesByConfig(Map args) {
if (yamlDir != null) {
value_ = yamlDir.relativize(value_)
}
def inputPath = val instanceof File ? val.toPath() : val
[value: value_, inputPath: inputPath, outputFilename: filename_ix]
return value_
}
def transposedOutputs = ["value", "inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
return [["key": plainName_, "value": outputPerFile]]
} else {
def value_ = java.nio.file.Paths.get(filename)
// if id contains a slash
@@ -1874,18 +2003,17 @@ def publishStatesByConfig(Map args) {
value_ = yamlDir.relativize(value_)
}
def inputPath = value instanceof File ? value.toPath() : value
return [[key: plainName_, value: value_, inputPath: [inputPath], outputFilename: [filename]]]
return [["key": plainName_, value: value_]]
}
}
def updatedState_ = processedState.collectEntries{[it.key, it.value]}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
// convert state to yaml blob
def yamlBlob_ = toTaggedYamlBlob([id: id_] + updatedState_)
[id_, yamlBlob_, yamlFilename, inputPaths, outputFilenames]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -2559,7 +2687,8 @@ def _debug(workflowArgs, debugKey) {
def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
def workflowArgs = processWorkflowArgs(args, defaultWfArgs, meta)
def key_ = workflowArgs["key"]
def multipleArgs = meta.config.allArguments.findAll{ it.multiple }.collect{it.plainName}
workflow workflowInstance {
take: input_
@@ -2716,12 +2845,36 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
// TODO: move some of the _meta.join_id wrangling to the safeJoin() function.
def chInitialOutput = chArgsWithDefaults
def chInitialOutputMulti = chArgsWithDefaults
| _debug(workflowArgs, "processed")
// run workflow
| innerWorkflowFactory(workflowArgs)
// check output tuple
| map { id_, output_ ->
def chInitialOutputList = chInitialOutputMulti instanceof List ? chInitialOutputMulti : [chInitialOutputMulti]
assert chInitialOutputList.size() > 0: "should have emitted at least one output channel"
// Add a channel ID to the events, which designates the channel the event was emitted from as a running number
// This number is used to sort the events later when the events are gathered from across the channels.
def chInitialOutputListWithIndexedEvents = chInitialOutputList.withIndex().collect{channel, channelIndex ->
def newChannel = channel
| map {tuple ->
assert tuple instanceof List :
"Error in module '${key_}': element in output channel should be a tuple [id, data, ...otherargs...]\n" +
" Example: [\"id\", [input: file('foo.txt'), arg: 10]].\n" +
" Expected class: List. Found: tuple.getClass() is ${tuple.getClass()}"
def newEvent = [channelIndex] + tuple
return newEvent
}
return newChannel
}
// Put the events into 1 channel, cover case where there is only one channel is emitted
def chInitialOutput = chInitialOutputList.size() > 1 ? \
chInitialOutputListWithIndexedEvents[0].mix(*chInitialOutputListWithIndexedEvents.tail()) : \
chInitialOutputListWithIndexedEvents[0]
def chInitialOutputProcessed = chInitialOutput
| map { tuple ->
def channelId = tuple[0]
def id_ = tuple[1]
def output_ = tuple[2]
// see if output map contains metadata
def meta_ =
@@ -2734,19 +2887,94 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
output_ = output_.findAll{k, v -> k != "_meta"}
// check value types
output_ = _processOutputValues(output_, meta.config, id_, key_)
output_ = _checkValidOutputArgument(output_, meta.config, id_, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && output_.size() == 1) {
output_ = output_.values()[0]
}
[join_id, id_, output_]
[join_id, channelId, id_, output_]
}
// | view{"chInitialOutput: ${it.take(3)}"}
// join the output [prev_id, channel_id, new_id, output] with the previous state [prev_id, state, ...]
def chPublishWithPreviousState = safeJoin(chInitialOutputProcessed, chRunFiltered, key_)
// input tuple format: [join_id, channel_id, id, output, prev_state, ...]
// output tuple format: [join_id, channel_id, id, new_state, ...]
| map{ tup ->
def new_state = workflowArgs.toState(tup.drop(2).take(3))
tup.take(3) + [new_state] + tup.drop(5)
}
if (workflowArgs.auto.publish == "state") {
def chPublishFiles = chPublishWithPreviousState
// input tuple format: [join_id, channel_id, id, new_state, ...]
// output tuple format: [join_id, channel_id, id, new_state]
| map{ tup ->
tup.take(4)
}
safeJoin(chPublishFiles, chArgsWithDefaults, key_)
// input tuple format: [join_id, channel_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(2).take(3)
}
| publishFilesByConfig(key: key_, config: meta.config)
}
// Join the state from the events that were emitted from different channels
def chJoined = chInitialOutputProcessed
| map {tuple ->
def join_id = tuple[0]
def channel_id = tuple[1]
def id = tuple[2]
def other = tuple.drop(3)
// Below, groupTuple is used to join the events. To make sure resuming a workflow
// keeps working, the output state must be deterministic. This means the state needs to be
// sorted with groupTuple's has a 'sort' argument. This argument can be set to 'hash',
// but hashing the state when it is large can be problematic in terms of performance.
// Therefore, a custom comparator function is provided. We add the channel ID to the
// states so that we can use the channel ID to sort the items.
def stateWithChannelID = [[channel_id] * other.size(), other].transpose()
// A comparator that is provided to groupTuple's 'sort' argument is applied
// to all elements of the event tuple (that is not the 'id'). The comparator
// closure that is used below expects the input to be List. So the join_id and
// channel_id must also be wrapped in a list.
[[join_id], [channel_id], id] + stateWithChannelID
}
| groupTuple(by: 2, sort: {a, b -> a[0] <=> b[0]}, size: chInitialOutputList.size(), remainder: true)
| map {join_ids, _, id, statesWithChannelID ->
// Remove the channel IDs from the states
def states = statesWithChannelID.collect{it[1]}
def newJoinId = join_ids.flatten().unique{a, b -> a <=> b}
assert newJoinId.size() == 1: "Multiple events were emitted for '$id'."
def newJoinIdUnique = newJoinId[0]
// Merge the states from the different channels
def newState = states.inject([:]){ old_state, state_to_add ->
return old_state + state_to_add.collectEntries{k, v ->
if (!multipleArgs.contains(k)) {
// if the key is not a multiple argument, we expect only one value
if (old_state.containsKey(k)) {
assert old_state[k] == v : "ID $id: multiple entries for argument $k were emitted."
}
[k, v]
} else {
// if the key is a multiple argument, append the different values into one list
def prevValue = old_state.getOrDefault(k, [])
def prevValueAsList = prevValue instanceof List ? prevValue : [prevValue]
[k, prevValueAsList + v]
}
}
}
_checkAllRequiredOuputsPresent(newState, meta.config, id, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && newState.size() == 1) {
newState = newState.values()[0]
}
return [newJoinIdUnique, id, newState]
}
// join the output [prev_id, new_id, output] with the previous state [prev_id, state, ...]
def chNewState = safeJoin(chInitialOutput, chRunFiltered, key_)
def chNewState = safeJoin(chJoined, chRunFiltered, key_)
// input tuple format: [join_id, id, output, prev_state, ...]
// output tuple format: [join_id, id, new_state, ...]
| map{ tup ->
@@ -2755,23 +2983,21 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
if (workflowArgs.auto.publish == "state") {
def chPublish = chNewState
def chPublishStates = chNewState
// input tuple format: [join_id, id, new_state, ...]
// output tuple format: [join_id, id, new_state]
| map{ tup ->
tup.take(3)
}
safeJoin(chPublish, chArgsWithDefaults, key_)
safeJoin(chPublishStates, chArgsWithDefaults, key_)
// input tuple format: [join_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(1).take(3)
}
}
| publishStatesByConfig(key: key_, config: meta.config)
}
// remove join_id and meta
chReturn = chNewState
| map { tup ->
// input tuple format: [join_id, id, new_state, ...]
@@ -2806,7 +3032,7 @@ meta = [
"config": processConfig(readJsonBlob('''{
"name" : "interop_summary_to_csv",
"namespace" : "io",
"version" : "v0.3.4",
"version" : "v0.3.12",
"argument_groups" : [
{
"name" : "Input arguments",
@@ -2862,9 +3088,26 @@ meta = [
"dest" : "nextflow_labels.config"
}
],
"test_resources" : [
{
"type" : "bash_script",
"path" : "test.sh",
"is_executable" : true
},
{
"type" : "file",
"path" : "/testData/iseq-DI"
}
],
"status" : "enabled",
"scope" : {
"image" : "public",
"target" : "public"
},
"requirements" : {
"commands" : [
"summary",
"index-summary",
"ps"
]
},
@@ -2955,7 +3198,7 @@ meta = [
"id" : "docker",
"image" : "debian:stable-slim",
"target_registry" : "images.viash-hub.com",
"target_tag" : "v0.3.4",
"target_tag" : "v0.3.12",
"namespace_separator" : "/",
"setup" : [
{
@@ -2984,31 +3227,31 @@ meta = [
"runner" : "nextflow",
"engine" : "docker|native",
"output" : "target/nextflow/io/interop_summary_to_csv",
"viash_version" : "0.9.0",
"git_commit" : "e6f8b20d70be8f47907c08cc3c7802920cc0eee4",
"git_remote" : "https://x-access-token:ghs_FhW4M4TmdkVUw8Om3VZ1e7pO685xf7028pZ3@github.com/viash-hub/demultiplex",
"git_tag" : "v0.3.3-3-ge6f8b20"
"viash_version" : "0.9.4",
"git_commit" : "3a84e5409c5758611fff4eb585736f3737c8b896",
"git_remote" : "https://github.com/viash-hub/demultiplex",
"git_tag" : "v0.3.11-2-g3a84e54"
},
"package_config" : {
"name" : "demultiplex",
"version" : "v0.3.4",
"version" : "v0.3.12",
"description" : "Demultiplexing pipeline\n",
"info" : {
"test_resources" : [
{
"path" : "gs://viash-hub-test-data/demultiplex/v2/",
"path" : "gs://viash-hub-resources/demultiplex/v3",
"dest" : "testData"
}
]
},
"viash_version" : "0.9.0",
"viash_version" : "0.9.4",
"source" : "src",
"target" : "target",
"config_mods" : [
".requirements.commands := ['ps']\n.runners[.type == 'nextflow'].directives.tag := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n.runners[.type == 'nextflow'].config.script := 'includeConfig(\\"nextflow_labels.config\\")'\n",
".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n.runners[.type == 'nextflow'].config.script := 'includeConfig(\\"nextflow_labels.config\\")'\n",
".engines += { type: \\"native\\" }",
".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'",
".engines[.type == 'docker'].target_tag := 'v0.3.4'"
".engines[.type == 'docker'].target_tag := 'v0.3.12'"
],
"keywords" : [
"bioinformatics",
@@ -3404,7 +3647,7 @@ def _vdsl3ProcessFactory(Map workflowArgs, Map meta, String rawScript) {
// create process from temp file
def binding = new nextflow.script.ScriptBinding([:])
def session = nextflow.Nextflow.getSession()
def parser = new nextflow.script.ScriptParser(session)
def parser = _getScriptLoader(session)
.setModule(true)
.setBinding(binding)
def moduleScript = parser.runScript(tempFile)
@@ -3418,6 +3661,27 @@ def _vdsl3ProcessFactory(Map workflowArgs, Map meta, String rawScript) {
return scriptMeta.getProcess(procKey)
}
// use Reflection to get a ScriptParser / ScriptLoader
// <25.02.0-edge: new nextflow.script.ScriptParser(session)
// >=25.02.0-edge: nextflow.script.ScriptLoaderFactory.create(session)
def _getScriptLoader(nextflow.Session session) {
// try using the old method
try {
Class<?> scriptParserClass = Class.forName('nextflow.script.ScriptParser')
return scriptParserClass.getDeclaredConstructor(nextflow.Session).newInstance(session)
} catch (ClassNotFoundException e) {
// else try with the new method
try {
Class<?> scriptLoaderFactoryClass = Class.forName('nextflow.script.ScriptLoaderFactory')
def createMethod = scriptLoaderFactoryClass.getDeclaredMethod('create', nextflow.Session)
return createMethod.invoke(null, session) // null because create is static
} catch (ClassNotFoundException | NoSuchMethodException | IllegalAccessException | java.lang.reflect.InvocationTargetException e2) {
// Handle the case where neither class is found
throw new Exception("Neither nextflow.script.ScriptParser nor nextflow.script.ScriptLoaderFactory could be found. Is this a compatible Nextflow version?", e2)
}
}
}
// defaults
meta["defaults"] = [
// key to be used to trace the process and determine output names
@@ -3431,7 +3695,7 @@ meta["defaults"] = [
"container" : {
"registry" : "images.viash-hub.com",
"image" : "vsh/demultiplex/io/interop_summary_to_csv",
"tag" : "v0.3.4"
"tag" : "v0.3.12"
},
"tag" : "$id"
}'''),

View File

@@ -2,7 +2,7 @@ manifest {
name = 'io/interop_summary_to_csv'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.3.4'
version = 'v0.3.12'
}
process.container = 'nextflow/bash:latest'

View File

@@ -1,106 +1,66 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "interop_summary_to_csv",
"description": "No description",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type":
"string",
"description": "Type: `file`, required. Sequencing run folder (*not* InterOp folder)",
"help_text": "Type: `file`, required. Sequencing run folder (*not* InterOp folder)."
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output_run_summary": {
"type":
"string",
"description": "Type: `file`, required, default: `$id.$key.output_run_summary.output_run_summary`. ",
"help_text": "Type: `file`, required, default: `$id.$key.output_run_summary.output_run_summary`. "
,
"default":"$id.$key.output_run_summary.output_run_summary"
}
,
"output_index_summary": {
"type":
"string",
"description": "Type: `file`, required, default: `$id.$key.output_index_summary.output_index_summary`. ",
"help_text": "Type: `file`, required, default: `$id.$key.output_index_summary.output_index_summary`. "
,
"default":"$id.$key.output_index_summary.output_index_summary"
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map corresponds to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativation is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "interop_summary_to_csv",
"description": "No description",
"type": "object",
"$defs": {
"input arguments": {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type": "string",
"format": "path",
"exists": true,
"description": "Sequencing run folder (*not* InterOp folder).",
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`. "
}
}
},
{
"$ref": "#/definitions/output arguments"
"output arguments": {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output_run_summary": {
"type": "string",
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `False`, required, default: `\"$id.$key.output_run_summary\"`, direction: `output`. ",
"default": "$id.$key.output_run_summary"
},
"output_index_summary": {
"type": "string",
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `False`, required, default: `\"$id.$key.output_index_summary\"`, direction: `output`. ",
"default": "$id.$key.output_index_summary"
}
}
},
{
"$ref": "#/definitions/nextflow input-output arguments"
"nextflow input-output arguments": {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type": "string",
"description": "Path to an output directory.",
"help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
}
}
}
]
},
"allOf": [
{
"$ref": "#/$defs/input arguments"
},
{
"$ref": "#/$defs/output arguments"
},
{
"$ref": "#/$defs/nextflow input-output arguments"
}
]
}

View File

@@ -1,6 +1,6 @@
name: "publish"
namespace: "io"
version: "v0.3.4"
version: "v0.3.12"
argument_groups:
- name: "Input arguments"
arguments:
@@ -22,7 +22,7 @@ argument_groups:
create_parent: true
required: true
direction: "input"
multiple: false
multiple: true
multiple_sep: ";"
- type: "file"
name: "--input_multiqc"
@@ -44,6 +44,15 @@ argument_groups:
direction: "input"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--input_demultiplexer_logs"
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
@@ -90,6 +99,17 @@ argument_groups:
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--output_demultiplexer_logs"
info: null
default:
- "demultiplexer_logs"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "code.sh"
@@ -100,6 +120,9 @@ resources:
description: "Publish the processed results of the run"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
@@ -178,7 +201,7 @@ engines:
id: "docker"
image: "debian:stable-slim"
target_registry: "images.viash-hub.com"
target_tag: "v0.3.4"
target_tag: "v0.3.12"
namespace_separator: "/"
setup:
- type: "apt"
@@ -195,29 +218,29 @@ build_info:
engine: "docker|native"
output: "target/nextflow/io/publish"
executable: "target/nextflow/io/publish/main.nf"
viash_version: "0.9.0"
git_commit: "e6f8b20d70be8f47907c08cc3c7802920cc0eee4"
git_remote: "https://x-access-token:ghs_FhW4M4TmdkVUw8Om3VZ1e7pO685xf7028pZ3@github.com/viash-hub/demultiplex"
git_tag: "v0.3.3-3-ge6f8b20"
viash_version: "0.9.4"
git_commit: "3a84e5409c5758611fff4eb585736f3737c8b896"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.3.11-2-g3a84e54"
package_config:
name: "demultiplex"
version: "v0.3.4"
version: "v0.3.12"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-test-data/demultiplex/v2/"
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.0"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n.runners[.type == 'nextflow'].directives.tag\
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.4'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.12'"
keywords:
- "bioinformatics"
- "sequence"

View File

@@ -1,6 +1,6 @@
// publish v0.3.4
// publish v0.3.12
//
// This wrapper script is auto-generated by viash 0.9.0 and is thus a derivative
// This wrapper script is auto-generated by viash 0.9.4 and is thus a derivative
// work thereof. This software comes with ABSOLUTELY NO WARRANTY from Data
// Intuitive.
//
@@ -82,64 +82,56 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
foundClass = "List[${e.foundClass}]"
}
} else if (par.type == "string") {
// cast to string if need be
// cast to string if need be. only cast if the value is a GString
if (value instanceof GString) {
value = value.toString()
value = value as String
}
expectedClass = value instanceof String ? null : "String"
} else if (par.type == "integer") {
// cast to integer if need be
if (value instanceof String) {
if (value !instanceof Integer) {
try {
value = value.toInteger()
value = value as Integer
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Integer"
}
}
if (value instanceof java.math.BigInteger) {
value = value.intValue()
}
expectedClass = value instanceof Integer ? null : "Integer"
} else if (par.type == "long") {
// cast to long if need be
if (value instanceof String) {
if (value !instanceof Long) {
try {
value = value.toLong()
value = value as Long
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Long"
}
}
if (value instanceof Integer) {
value = value.toLong()
}
expectedClass = value instanceof Long ? null : "Long"
} else if (par.type == "double") {
// cast to double if need be
if (value instanceof String) {
if (value !instanceof Double) {
try {
value = value.toDouble()
value = value as Double
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Double"
}
}
if (value instanceof java.math.BigDecimal) {
value = value.doubleValue()
} else if (par.type == "float") {
// cast to float if need be
if (value !instanceof Float) {
try {
value = value as Float
} catch (NumberFormatException e) {
expectedClass = "Float"
}
}
if (value instanceof Float) {
value = value.toDouble()
}
expectedClass = value instanceof Double ? null : "Double"
} else if (par.type == "boolean" | par.type == "boolean_true" | par.type == "boolean_false") {
// cast to boolean if need be
if (value instanceof String) {
def valueLower = value.toLowerCase()
if (valueLower == "true") {
value = true
} else if (valueLower == "false") {
value = false
if (value !instanceof Boolean) {
try {
value = value as Boolean
} catch (Exception e) {
expectedClass = "Boolean"
}
}
expectedClass = value instanceof Boolean ? null : "Boolean"
} else if (par.type == "file" && (par.direction == "input" || stage == "output")) {
// cast to path if need be
if (value instanceof String) {
@@ -151,10 +143,13 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
expectedClass = value instanceof Path ? null : "Path"
} else if (par.type == "file" && stage == "input" && par.direction == "output") {
// cast to string if need be
if (value instanceof GString) {
value = value.toString()
if (value !instanceof String) {
try {
value = value as String
} catch (Exception e) {
expectedClass = "String"
}
}
expectedClass = value instanceof String ? null : "String"
} else {
// didn't find a match for par.type
expectedClass = par.type
@@ -173,7 +168,7 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
Map _processInputValues(Map inputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.required) {
if (arg.required && arg.direction == "input") {
assert inputs.containsKey(arg.plainName) && inputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required input argument '${arg.plainName}' is missing"
}
@@ -192,15 +187,8 @@ Map _processInputValues(Map inputs, Map config, String id, String key) {
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/arguments/_processOutputValues.nf'
Map _processOutputValues(Map outputs, Map config, String id, String key) {
Map _checkValidOutputArgument(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
outputs = outputs.collectEntries { name, value ->
def par = config.allArguments.find { it.plainName == name && it.direction == "output" }
assert par != null : "Error in module '${key}' id '${id}': '${name}' is not a valid output argument"
@@ -213,6 +201,16 @@ Map _processOutputValues(Map outputs, Map config, String id, String key) {
return outputs
}
void _checkAllRequiredOuputsPresent(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
}
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/channel/IDChecker.nf'
class IDChecker {
final def items = [] as Set
@@ -1666,6 +1664,162 @@ def joinStates(Closure apply_) {
}
return joinStatesWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishFiles.nf'
def publishFiles(Map args) {
def key_ = args.get("key")
assert key_ != null : "publishFiles: key must be specified"
workflow publishFilesWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1]
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
[id_, inputFiles_, outputFilenames_]
}
| publishFilesProc
emit: input_ch
}
return publishFilesWf
}
process publishFilesProc {
// todo: check publishpath?
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
output:
tuple val(id), path{outputFiles}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
}
// this assumes that the state contains no other values other than those specified in the config
def publishFilesByConfig(Map args) {
def config = args.get("config")
assert config != null : "publishFilesByConfig: config must be specified"
def key_ = args.get("key", config.name)
assert key_ != null : "publishFilesByConfig: key must be specified"
workflow publishFilesSimpleWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1] // e.g. [output: new File("myoutput.h5ad"), k: 10]
def origState_ = tup[2] // e.g. [output: '$id.$key.foo.h5ad']
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
.collectMany { par ->
def plainName_ = par.plainName
// if the state does not contain the key, it's an
// optional argument for which the component did
// not generate any output OR multiple channels were emitted
// and the output was just not added to using the channel
// that is now being parsed
if (!state_.containsKey(plainName_)) {
return []
}
def value = state_[plainName_]
// if the parameter is not a file, it should be stored
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[inputPath: [], outputFilename: []]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
// that it should not be returned as a state
if (!origState_.containsKey(plainName_)) {
return []
}
def filenameTemplate = origState_[plainName_]
// if the pararameter is multiple: true, fetch the template
if (par.multiple && filenameTemplate instanceof List) {
filenameTemplate = filenameTemplate[0]
}
// instantiate the template
def filename = filenameTemplate
.replaceAll('\\$id', id_)
.replaceAll('\\$\\{id\\}', id_)
.replaceAll('\\$key', key_)
.replaceAll('\\$\\{key\\}', key_)
if (par.multiple) {
// if the parameter is multiple: true, the filename
// should contain a wildcard '*' that is replaced with
// the index of the file
assert filename.contains("*") : "Module '${key_}' id '${id_}': Multiple output files specified, but no wildcard '*' in the filename: ${filename}"
def outputPerFile = value.withIndex().collect{ val, ix ->
def filename_ix = filename.replace("*", ix.toString())
def inputPath = val instanceof File ? val.toPath() : val
[inputPath: inputPath, outputFilename: filename_ix]
}
def transposedOutputs = ["inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
} else {
def value_ = java.nio.file.Paths.get(filename)
def inputPath = value instanceof File ? value.toPath() : value
return [[inputPath: [inputPath], outputFilename: [filename]]]
}
}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
[id_, inputPaths, outputFilenames]
}
| publishFilesProc
emit: input_ch
}
return publishFilesSimpleWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishStates.nf'
def collectFiles(obj) {
if (obj instanceof java.io.File || obj instanceof Path) {
@@ -1723,8 +1877,6 @@ def publishStates(Map args) {
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
def yamlFilename = yamlTemplate_
.replaceAll('\\$id', id_)
@@ -1737,7 +1889,7 @@ def publishStates(Map args) {
// convert state to yaml blob
def yamlBlob_ = toRelativeTaggedYamlBlob([id: id_] + state_, java.nio.file.Paths.get(yamlFilename))
[id_, yamlBlob_, yamlFilename, inputFiles_, outputFilenames_]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -1749,33 +1901,17 @@ process publishStatesProc {
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), val(yamlBlob), val(yamlFile), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
tuple val(id), val(yamlBlob), val(yamlFile)
output:
tuple val(id), path{[yamlFile] + outputFiles}
tuple val(id), path{[yamlFile]}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
echo '${yamlBlob}' > '${yamlFile}'
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
cat > '${yamlFile}' << HERE
${yamlBlob}
HERE
"""
}
@@ -1806,13 +1942,10 @@ def publishStatesByConfig(Map args) {
.replaceAll('\\$\\{key\\}', key_)
def yamlDir = java.nio.file.Paths.get(yamlFilename).getParent()
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// the processed state is a list of [key, value] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (key, value) are the tuples that will be saved to the state.yaml file
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
@@ -1829,7 +1962,7 @@ def publishStatesByConfig(Map args) {
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[key: plainName_, value: value, inputPath: [], outputFilename: []]]
return [[key: plainName_, value: value]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
@@ -1860,13 +1993,9 @@ def publishStatesByConfig(Map args) {
if (yamlDir != null) {
value_ = yamlDir.relativize(value_)
}
def inputPath = val instanceof File ? val.toPath() : val
[value: value_, inputPath: inputPath, outputFilename: filename_ix]
return value_
}
def transposedOutputs = ["value", "inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
return [["key": plainName_, "value": outputPerFile]]
} else {
def value_ = java.nio.file.Paths.get(filename)
// if id contains a slash
@@ -1874,18 +2003,17 @@ def publishStatesByConfig(Map args) {
value_ = yamlDir.relativize(value_)
}
def inputPath = value instanceof File ? value.toPath() : value
return [[key: plainName_, value: value_, inputPath: [inputPath], outputFilename: [filename]]]
return [["key": plainName_, value: value_]]
}
}
def updatedState_ = processedState.collectEntries{[it.key, it.value]}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
// convert state to yaml blob
def yamlBlob_ = toTaggedYamlBlob([id: id_] + updatedState_)
[id_, yamlBlob_, yamlFilename, inputPaths, outputFilenames]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -2559,7 +2687,8 @@ def _debug(workflowArgs, debugKey) {
def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
def workflowArgs = processWorkflowArgs(args, defaultWfArgs, meta)
def key_ = workflowArgs["key"]
def multipleArgs = meta.config.allArguments.findAll{ it.multiple }.collect{it.plainName}
workflow workflowInstance {
take: input_
@@ -2716,12 +2845,36 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
// TODO: move some of the _meta.join_id wrangling to the safeJoin() function.
def chInitialOutput = chArgsWithDefaults
def chInitialOutputMulti = chArgsWithDefaults
| _debug(workflowArgs, "processed")
// run workflow
| innerWorkflowFactory(workflowArgs)
// check output tuple
| map { id_, output_ ->
def chInitialOutputList = chInitialOutputMulti instanceof List ? chInitialOutputMulti : [chInitialOutputMulti]
assert chInitialOutputList.size() > 0: "should have emitted at least one output channel"
// Add a channel ID to the events, which designates the channel the event was emitted from as a running number
// This number is used to sort the events later when the events are gathered from across the channels.
def chInitialOutputListWithIndexedEvents = chInitialOutputList.withIndex().collect{channel, channelIndex ->
def newChannel = channel
| map {tuple ->
assert tuple instanceof List :
"Error in module '${key_}': element in output channel should be a tuple [id, data, ...otherargs...]\n" +
" Example: [\"id\", [input: file('foo.txt'), arg: 10]].\n" +
" Expected class: List. Found: tuple.getClass() is ${tuple.getClass()}"
def newEvent = [channelIndex] + tuple
return newEvent
}
return newChannel
}
// Put the events into 1 channel, cover case where there is only one channel is emitted
def chInitialOutput = chInitialOutputList.size() > 1 ? \
chInitialOutputListWithIndexedEvents[0].mix(*chInitialOutputListWithIndexedEvents.tail()) : \
chInitialOutputListWithIndexedEvents[0]
def chInitialOutputProcessed = chInitialOutput
| map { tuple ->
def channelId = tuple[0]
def id_ = tuple[1]
def output_ = tuple[2]
// see if output map contains metadata
def meta_ =
@@ -2734,19 +2887,94 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
output_ = output_.findAll{k, v -> k != "_meta"}
// check value types
output_ = _processOutputValues(output_, meta.config, id_, key_)
output_ = _checkValidOutputArgument(output_, meta.config, id_, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && output_.size() == 1) {
output_ = output_.values()[0]
}
[join_id, id_, output_]
[join_id, channelId, id_, output_]
}
// | view{"chInitialOutput: ${it.take(3)}"}
// join the output [prev_id, channel_id, new_id, output] with the previous state [prev_id, state, ...]
def chPublishWithPreviousState = safeJoin(chInitialOutputProcessed, chRunFiltered, key_)
// input tuple format: [join_id, channel_id, id, output, prev_state, ...]
// output tuple format: [join_id, channel_id, id, new_state, ...]
| map{ tup ->
def new_state = workflowArgs.toState(tup.drop(2).take(3))
tup.take(3) + [new_state] + tup.drop(5)
}
if (workflowArgs.auto.publish == "state") {
def chPublishFiles = chPublishWithPreviousState
// input tuple format: [join_id, channel_id, id, new_state, ...]
// output tuple format: [join_id, channel_id, id, new_state]
| map{ tup ->
tup.take(4)
}
safeJoin(chPublishFiles, chArgsWithDefaults, key_)
// input tuple format: [join_id, channel_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(2).take(3)
}
| publishFilesByConfig(key: key_, config: meta.config)
}
// Join the state from the events that were emitted from different channels
def chJoined = chInitialOutputProcessed
| map {tuple ->
def join_id = tuple[0]
def channel_id = tuple[1]
def id = tuple[2]
def other = tuple.drop(3)
// Below, groupTuple is used to join the events. To make sure resuming a workflow
// keeps working, the output state must be deterministic. This means the state needs to be
// sorted with groupTuple's has a 'sort' argument. This argument can be set to 'hash',
// but hashing the state when it is large can be problematic in terms of performance.
// Therefore, a custom comparator function is provided. We add the channel ID to the
// states so that we can use the channel ID to sort the items.
def stateWithChannelID = [[channel_id] * other.size(), other].transpose()
// A comparator that is provided to groupTuple's 'sort' argument is applied
// to all elements of the event tuple (that is not the 'id'). The comparator
// closure that is used below expects the input to be List. So the join_id and
// channel_id must also be wrapped in a list.
[[join_id], [channel_id], id] + stateWithChannelID
}
| groupTuple(by: 2, sort: {a, b -> a[0] <=> b[0]}, size: chInitialOutputList.size(), remainder: true)
| map {join_ids, _, id, statesWithChannelID ->
// Remove the channel IDs from the states
def states = statesWithChannelID.collect{it[1]}
def newJoinId = join_ids.flatten().unique{a, b -> a <=> b}
assert newJoinId.size() == 1: "Multiple events were emitted for '$id'."
def newJoinIdUnique = newJoinId[0]
// Merge the states from the different channels
def newState = states.inject([:]){ old_state, state_to_add ->
return old_state + state_to_add.collectEntries{k, v ->
if (!multipleArgs.contains(k)) {
// if the key is not a multiple argument, we expect only one value
if (old_state.containsKey(k)) {
assert old_state[k] == v : "ID $id: multiple entries for argument $k were emitted."
}
[k, v]
} else {
// if the key is a multiple argument, append the different values into one list
def prevValue = old_state.getOrDefault(k, [])
def prevValueAsList = prevValue instanceof List ? prevValue : [prevValue]
[k, prevValueAsList + v]
}
}
}
_checkAllRequiredOuputsPresent(newState, meta.config, id, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && newState.size() == 1) {
newState = newState.values()[0]
}
return [newJoinIdUnique, id, newState]
}
// join the output [prev_id, new_id, output] with the previous state [prev_id, state, ...]
def chNewState = safeJoin(chInitialOutput, chRunFiltered, key_)
def chNewState = safeJoin(chJoined, chRunFiltered, key_)
// input tuple format: [join_id, id, output, prev_state, ...]
// output tuple format: [join_id, id, new_state, ...]
| map{ tup ->
@@ -2755,23 +2983,21 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
if (workflowArgs.auto.publish == "state") {
def chPublish = chNewState
def chPublishStates = chNewState
// input tuple format: [join_id, id, new_state, ...]
// output tuple format: [join_id, id, new_state]
| map{ tup ->
tup.take(3)
}
safeJoin(chPublish, chArgsWithDefaults, key_)
safeJoin(chPublishStates, chArgsWithDefaults, key_)
// input tuple format: [join_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(1).take(3)
}
}
| publishStatesByConfig(key: key_, config: meta.config)
}
// remove join_id and meta
chReturn = chNewState
| map { tup ->
// input tuple format: [join_id, id, new_state, ...]
@@ -2806,7 +3032,7 @@ meta = [
"config": processConfig(readJsonBlob('''{
"name" : "publish",
"namespace" : "io",
"version" : "v0.3.4",
"version" : "v0.3.12",
"argument_groups" : [
{
"name" : "Input arguments",
@@ -2830,7 +3056,7 @@ meta = [
"create_parent" : true,
"required" : true,
"direction" : "input",
"multiple" : false,
"multiple" : true,
"multiple_sep" : ";"
},
{
@@ -2854,6 +3080,16 @@ meta = [
"direction" : "input",
"multiple" : false,
"multiple_sep" : ";"
},
{
"type" : "file",
"name" : "--input_demultiplexer_logs",
"must_exist" : true,
"create_parent" : true,
"required" : true,
"direction" : "input",
"multiple" : false,
"multiple_sep" : ";"
}
]
},
@@ -2911,6 +3147,19 @@ meta = [
"direction" : "output",
"multiple" : false,
"multiple_sep" : ";"
},
{
"type" : "file",
"name" : "--output_demultiplexer_logs",
"default" : [
"demultiplexer_logs"
],
"must_exist" : true,
"create_parent" : true,
"required" : false,
"direction" : "output",
"multiple" : false,
"multiple_sep" : ";"
}
]
}
@@ -2929,6 +3178,10 @@ meta = [
],
"description" : "Publish the processed results of the run",
"status" : "enabled",
"scope" : {
"image" : "public",
"target" : "public"
},
"requirements" : {
"commands" : [
"ps"
@@ -3021,7 +3274,7 @@ meta = [
"id" : "docker",
"image" : "debian:stable-slim",
"target_registry" : "images.viash-hub.com",
"target_tag" : "v0.3.4",
"target_tag" : "v0.3.12",
"namespace_separator" : "/",
"setup" : [
{
@@ -3043,31 +3296,31 @@ meta = [
"runner" : "nextflow",
"engine" : "docker|native",
"output" : "target/nextflow/io/publish",
"viash_version" : "0.9.0",
"git_commit" : "e6f8b20d70be8f47907c08cc3c7802920cc0eee4",
"git_remote" : "https://x-access-token:ghs_FhW4M4TmdkVUw8Om3VZ1e7pO685xf7028pZ3@github.com/viash-hub/demultiplex",
"git_tag" : "v0.3.3-3-ge6f8b20"
"viash_version" : "0.9.4",
"git_commit" : "3a84e5409c5758611fff4eb585736f3737c8b896",
"git_remote" : "https://github.com/viash-hub/demultiplex",
"git_tag" : "v0.3.11-2-g3a84e54"
},
"package_config" : {
"name" : "demultiplex",
"version" : "v0.3.4",
"version" : "v0.3.12",
"description" : "Demultiplexing pipeline\n",
"info" : {
"test_resources" : [
{
"path" : "gs://viash-hub-test-data/demultiplex/v2/",
"path" : "gs://viash-hub-resources/demultiplex/v3",
"dest" : "testData"
}
]
},
"viash_version" : "0.9.0",
"viash_version" : "0.9.4",
"source" : "src",
"target" : "target",
"config_mods" : [
".requirements.commands := ['ps']\n.runners[.type == 'nextflow'].directives.tag := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n.runners[.type == 'nextflow'].config.script := 'includeConfig(\\"nextflow_labels.config\\")'\n",
".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n.runners[.type == 'nextflow'].config.script := 'includeConfig(\\"nextflow_labels.config\\")'\n",
".engines += { type: \\"native\\" }",
".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'",
".engines[.type == 'docker'].target_tag := 'v0.3.4'"
".engines[.type == 'docker'].target_tag := 'v0.3.12'"
],
"keywords" : [
"bioinformatics",
@@ -3100,10 +3353,12 @@ $( if [ ! -z ${VIASH_PAR_INPUT+x} ]; then echo "${VIASH_PAR_INPUT}" | sed "s#'#'
$( if [ ! -z ${VIASH_PAR_INPUT_FALCO+x} ]; then echo "${VIASH_PAR_INPUT_FALCO}" | sed "s#'#'\\"'\\"'#g;s#.*#par_input_falco='&'#" ; else echo "# par_input_falco="; fi )
$( if [ ! -z ${VIASH_PAR_INPUT_MULTIQC+x} ]; then echo "${VIASH_PAR_INPUT_MULTIQC}" | sed "s#'#'\\"'\\"'#g;s#.*#par_input_multiqc='&'#" ; else echo "# par_input_multiqc="; fi )
$( if [ ! -z ${VIASH_PAR_INPUT_RUN_INFORMATION+x} ]; then echo "${VIASH_PAR_INPUT_RUN_INFORMATION}" | sed "s#'#'\\"'\\"'#g;s#.*#par_input_run_information='&'#" ; else echo "# par_input_run_information="; fi )
$( if [ ! -z ${VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS+x} ]; then echo "${VIASH_PAR_INPUT_DEMULTIPLEXER_LOGS}" | sed "s#'#'\\"'\\"'#g;s#.*#par_input_demultiplexer_logs='&'#" ; else echo "# par_input_demultiplexer_logs="; fi )
$( if [ ! -z ${VIASH_PAR_OUTPUT+x} ]; then echo "${VIASH_PAR_OUTPUT}" | sed "s#'#'\\"'\\"'#g;s#.*#par_output='&'#" ; else echo "# par_output="; fi )
$( if [ ! -z ${VIASH_PAR_OUTPUT_FALCO+x} ]; then echo "${VIASH_PAR_OUTPUT_FALCO}" | sed "s#'#'\\"'\\"'#g;s#.*#par_output_falco='&'#" ; else echo "# par_output_falco="; fi )
$( if [ ! -z ${VIASH_PAR_OUTPUT_MULTIQC+x} ]; then echo "${VIASH_PAR_OUTPUT_MULTIQC}" | sed "s#'#'\\"'\\"'#g;s#.*#par_output_multiqc='&'#" ; else echo "# par_output_multiqc="; fi )
$( if [ ! -z ${VIASH_PAR_OUTPUT_RUN_INFORMATION+x} ]; then echo "${VIASH_PAR_OUTPUT_RUN_INFORMATION}" | sed "s#'#'\\"'\\"'#g;s#.*#par_output_run_information='&'#" ; else echo "# par_output_run_information="; fi )
$( if [ ! -z ${VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS+x} ]; then echo "${VIASH_PAR_OUTPUT_DEMULTIPLEXER_LOGS}" | sed "s#'#'\\"'\\"'#g;s#.*#par_output_demultiplexer_logs='&'#" ; else echo "# par_output_demultiplexer_logs="; fi )
$( if [ ! -z ${VIASH_META_NAME+x} ]; then echo "${VIASH_META_NAME}" | sed "s#'#'\\"'\\"'#g;s#.*#meta_name='&'#" ; else echo "# meta_name="; fi )
$( if [ ! -z ${VIASH_META_FUNCTIONALITY_NAME+x} ]; then echo "${VIASH_META_FUNCTIONALITY_NAME}" | sed "s#'#'\\"'\\"'#g;s#.*#meta_functionality_name='&'#" ; else echo "# meta_functionality_name="; fi )
$( if [ ! -z ${VIASH_META_RESOURCES_DIR+x} ]; then echo "${VIASH_META_RESOURCES_DIR}" | sed "s#'#'\\"'\\"'#g;s#.*#meta_resources_dir='&'#" ; else echo "# meta_resources_dir="; fi )
@@ -3129,9 +3384,9 @@ $( if [ ! -z ${VIASH_META_MEMORY_PIB+x} ]; then echo "${VIASH_META_MEMORY_PIB}"
set -eo pipefail
declare -A input_output_mapping=(["par_input"]="par_output"
["par_input_falco"]="par_output_falco"
["par_input_multiqc"]="par_output_multiqc"
["par_input_run_information"]="par_output_run_information"
["par_input_demultiplexer_logs"]="par_output_demultiplexer_logs"
)
for input_argument_name in "\\${!input_output_mapping[@]}"
@@ -3150,6 +3405,14 @@ do
echo "Output files for \\$output_location:"
ls "\\$output_location"
done
echo "Grouping output from \\$par_input_falco into \\$par_output_falco"
mkdir -p "\\$par_output_falco"
IFS=";" read -ra falco_inputs <<< \\$par_input_falco
for falco_dir in "\\${falco_inputs[@]}"; do
echo "Copying contents of \\$falco_dir"
find -H -D exec "\\$falco_dir" -type f -maxdepth 1 -exec cp -t "\\$par_output_falco" {} +
done
VIASHMAIN
bash "$tempscript"
'''
@@ -3484,7 +3747,7 @@ def _vdsl3ProcessFactory(Map workflowArgs, Map meta, String rawScript) {
// create process from temp file
def binding = new nextflow.script.ScriptBinding([:])
def session = nextflow.Nextflow.getSession()
def parser = new nextflow.script.ScriptParser(session)
def parser = _getScriptLoader(session)
.setModule(true)
.setBinding(binding)
def moduleScript = parser.runScript(tempFile)
@@ -3498,6 +3761,27 @@ def _vdsl3ProcessFactory(Map workflowArgs, Map meta, String rawScript) {
return scriptMeta.getProcess(procKey)
}
// use Reflection to get a ScriptParser / ScriptLoader
// <25.02.0-edge: new nextflow.script.ScriptParser(session)
// >=25.02.0-edge: nextflow.script.ScriptLoaderFactory.create(session)
def _getScriptLoader(nextflow.Session session) {
// try using the old method
try {
Class<?> scriptParserClass = Class.forName('nextflow.script.ScriptParser')
return scriptParserClass.getDeclaredConstructor(nextflow.Session).newInstance(session)
} catch (ClassNotFoundException e) {
// else try with the new method
try {
Class<?> scriptLoaderFactoryClass = Class.forName('nextflow.script.ScriptLoaderFactory')
def createMethod = scriptLoaderFactoryClass.getDeclaredMethod('create', nextflow.Session)
return createMethod.invoke(null, session) // null because create is static
} catch (ClassNotFoundException | NoSuchMethodException | IllegalAccessException | java.lang.reflect.InvocationTargetException e2) {
// Handle the case where neither class is found
throw new Exception("Neither nextflow.script.ScriptParser nor nextflow.script.ScriptLoaderFactory could be found. Is this a compatible Nextflow version?", e2)
}
}
}
// defaults
meta["defaults"] = [
// key to be used to trace the process and determine output names
@@ -3511,7 +3795,7 @@ meta["defaults"] = [
"container" : {
"registry" : "images.viash-hub.com",
"image" : "vsh/demultiplex/io/publish",
"tag" : "v0.3.4"
"tag" : "v0.3.12"
},
"tag" : "$id"
}'''),

View File

@@ -2,7 +2,7 @@ manifest {
name = 'io/publish'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.3.4'
version = 'v0.3.12'
description = 'Publish the processed results of the run'
}

View File

@@ -1,158 +1,118 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "publish",
"description": "Publish the processed results of the run",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type":
"string",
"description": "Type: `file`, required. Directory to write fastq data to",
"help_text": "Type: `file`, required. Directory to write fastq data to"
}
,
"input_falco": {
"type":
"string",
"description": "Type: `file`, required. Directory to write falco output to",
"help_text": "Type: `file`, required. Directory to write falco output to"
}
,
"input_multiqc": {
"type":
"string",
"description": "Type: `file`, required. Location where to write the MultiQC report to",
"help_text": "Type: `file`, required. Location where to write the MultiQC report to."
}
,
"input_run_information": {
"type":
"string",
"description": "Type: `file`, required. Location where to write the run information to",
"help_text": "Type: `file`, required. Location where to write the run information to."
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.output.output`. ",
"help_text": "Type: `file`, default: `$id.$key.output.output`. "
,
"default":"$id.$key.output.output"
}
,
"output_falco": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.output_falco.output_falco`. ",
"help_text": "Type: `file`, default: `$id.$key.output_falco.output_falco`. "
,
"default":"$id.$key.output_falco.output_falco"
}
,
"output_multiqc": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.output_multiqc.html`. ",
"help_text": "Type: `file`, default: `$id.$key.output_multiqc.html`. "
,
"default":"$id.$key.output_multiqc.html"
}
,
"output_run_information": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.output_run_information.csv`. ",
"help_text": "Type: `file`, default: `$id.$key.output_run_information.csv`. "
,
"default":"$id.$key.output_run_information.csv"
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map corresponds to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativation is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "publish",
"description": "Publish the processed results of the run",
"type": "object",
"$defs": {
"input arguments": {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type": "string",
"format": "path",
"exists": true,
"description": "Directory to write fastq data to",
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`. "
},
"input_falco": {
"type": "array",
"items": {
"type": "string"
},
"format": "path",
"exists": true,
"description": "Directory to write falco output to",
"help_text": "Type: `file`, multiple: `True`, required, direction: `input`. "
},
"input_multiqc": {
"type": "string",
"format": "path",
"exists": true,
"description": "Location where to write the MultiQC report to.",
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`. "
},
"input_run_information": {
"type": "string",
"format": "path",
"exists": true,
"description": "Location where to write the run information to.",
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`. "
},
"input_demultiplexer_logs": {
"type": "string",
"format": "path",
"exists": true,
"description": "",
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`. "
}
}
},
{
"$ref": "#/definitions/output arguments"
"output arguments": {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output": {
"type": "string",
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `False`, default: `\"fastq\"`, direction: `output`. ",
"default": "fastq"
},
"output_falco": {
"type": "string",
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `False`, default: `\"qc/fastqc\"`, direction: `output`. ",
"default": "qc/fastqc"
},
"output_multiqc": {
"type": "string",
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `False`, default: `\"qc/multiqc_report.html\"`, direction: `output`. ",
"default": "qc/multiqc_report.html"
},
"output_run_information": {
"type": "string",
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `False`, default: `\"run_information.csv\"`, direction: `output`. ",
"default": "run_information.csv"
},
"output_demultiplexer_logs": {
"type": "string",
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `False`, default: `\"demultiplexer_logs\"`, direction: `output`. ",
"default": "demultiplexer_logs"
}
}
},
{
"$ref": "#/definitions/nextflow input-output arguments"
"nextflow input-output arguments": {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type": "string",
"description": "Path to an output directory.",
"help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
}
}
}
]
},
"allOf": [
{
"$ref": "#/$defs/input arguments"
},
{
"$ref": "#/$defs/output arguments"
},
{
"$ref": "#/$defs/nextflow input-output arguments"
}
]
}

View File

@@ -1,6 +1,6 @@
name: "untar"
namespace: "io"
version: "v0.3.4"
version: "v0.3.12"
argument_groups:
- name: "Input arguments"
arguments:
@@ -57,6 +57,9 @@ test_resources:
is_executable: true
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
@@ -135,7 +138,7 @@ engines:
id: "docker"
image: "debian:stable-slim"
target_registry: "images.viash-hub.com"
target_tag: "v0.3.4"
target_tag: "v0.3.12"
namespace_separator: "/"
setup:
- type: "apt"
@@ -152,29 +155,29 @@ build_info:
engine: "docker|native"
output: "target/nextflow/io/untar"
executable: "target/nextflow/io/untar/main.nf"
viash_version: "0.9.0"
git_commit: "e6f8b20d70be8f47907c08cc3c7802920cc0eee4"
git_remote: "https://x-access-token:ghs_FhW4M4TmdkVUw8Om3VZ1e7pO685xf7028pZ3@github.com/viash-hub/demultiplex"
git_tag: "v0.3.3-3-ge6f8b20"
viash_version: "0.9.4"
git_commit: "3a84e5409c5758611fff4eb585736f3737c8b896"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.3.11-2-g3a84e54"
package_config:
name: "demultiplex"
version: "v0.3.4"
version: "v0.3.12"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-test-data/demultiplex/v2/"
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.0"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n.runners[.type == 'nextflow'].directives.tag\
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.4'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.12'"
keywords:
- "bioinformatics"
- "sequence"

View File

@@ -1,6 +1,6 @@
// untar v0.3.4
// untar v0.3.12
//
// This wrapper script is auto-generated by viash 0.9.0 and is thus a derivative
// This wrapper script is auto-generated by viash 0.9.4 and is thus a derivative
// work thereof. This software comes with ABSOLUTELY NO WARRANTY from Data
// Intuitive.
//
@@ -82,64 +82,56 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
foundClass = "List[${e.foundClass}]"
}
} else if (par.type == "string") {
// cast to string if need be
// cast to string if need be. only cast if the value is a GString
if (value instanceof GString) {
value = value.toString()
value = value as String
}
expectedClass = value instanceof String ? null : "String"
} else if (par.type == "integer") {
// cast to integer if need be
if (value instanceof String) {
if (value !instanceof Integer) {
try {
value = value.toInteger()
value = value as Integer
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Integer"
}
}
if (value instanceof java.math.BigInteger) {
value = value.intValue()
}
expectedClass = value instanceof Integer ? null : "Integer"
} else if (par.type == "long") {
// cast to long if need be
if (value instanceof String) {
if (value !instanceof Long) {
try {
value = value.toLong()
value = value as Long
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Long"
}
}
if (value instanceof Integer) {
value = value.toLong()
}
expectedClass = value instanceof Long ? null : "Long"
} else if (par.type == "double") {
// cast to double if need be
if (value instanceof String) {
if (value !instanceof Double) {
try {
value = value.toDouble()
value = value as Double
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Double"
}
}
if (value instanceof java.math.BigDecimal) {
value = value.doubleValue()
} else if (par.type == "float") {
// cast to float if need be
if (value !instanceof Float) {
try {
value = value as Float
} catch (NumberFormatException e) {
expectedClass = "Float"
}
}
if (value instanceof Float) {
value = value.toDouble()
}
expectedClass = value instanceof Double ? null : "Double"
} else if (par.type == "boolean" | par.type == "boolean_true" | par.type == "boolean_false") {
// cast to boolean if need be
if (value instanceof String) {
def valueLower = value.toLowerCase()
if (valueLower == "true") {
value = true
} else if (valueLower == "false") {
value = false
if (value !instanceof Boolean) {
try {
value = value as Boolean
} catch (Exception e) {
expectedClass = "Boolean"
}
}
expectedClass = value instanceof Boolean ? null : "Boolean"
} else if (par.type == "file" && (par.direction == "input" || stage == "output")) {
// cast to path if need be
if (value instanceof String) {
@@ -151,10 +143,13 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
expectedClass = value instanceof Path ? null : "Path"
} else if (par.type == "file" && stage == "input" && par.direction == "output") {
// cast to string if need be
if (value instanceof GString) {
value = value.toString()
if (value !instanceof String) {
try {
value = value as String
} catch (Exception e) {
expectedClass = "String"
}
}
expectedClass = value instanceof String ? null : "String"
} else {
// didn't find a match for par.type
expectedClass = par.type
@@ -173,7 +168,7 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
Map _processInputValues(Map inputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.required) {
if (arg.required && arg.direction == "input") {
assert inputs.containsKey(arg.plainName) && inputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required input argument '${arg.plainName}' is missing"
}
@@ -192,15 +187,8 @@ Map _processInputValues(Map inputs, Map config, String id, String key) {
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/arguments/_processOutputValues.nf'
Map _processOutputValues(Map outputs, Map config, String id, String key) {
Map _checkValidOutputArgument(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
outputs = outputs.collectEntries { name, value ->
def par = config.allArguments.find { it.plainName == name && it.direction == "output" }
assert par != null : "Error in module '${key}' id '${id}': '${name}' is not a valid output argument"
@@ -213,6 +201,16 @@ Map _processOutputValues(Map outputs, Map config, String id, String key) {
return outputs
}
void _checkAllRequiredOuputsPresent(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
}
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/channel/IDChecker.nf'
class IDChecker {
final def items = [] as Set
@@ -1666,6 +1664,162 @@ def joinStates(Closure apply_) {
}
return joinStatesWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishFiles.nf'
def publishFiles(Map args) {
def key_ = args.get("key")
assert key_ != null : "publishFiles: key must be specified"
workflow publishFilesWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1]
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
[id_, inputFiles_, outputFilenames_]
}
| publishFilesProc
emit: input_ch
}
return publishFilesWf
}
process publishFilesProc {
// todo: check publishpath?
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
output:
tuple val(id), path{outputFiles}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
}
// this assumes that the state contains no other values other than those specified in the config
def publishFilesByConfig(Map args) {
def config = args.get("config")
assert config != null : "publishFilesByConfig: config must be specified"
def key_ = args.get("key", config.name)
assert key_ != null : "publishFilesByConfig: key must be specified"
workflow publishFilesSimpleWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1] // e.g. [output: new File("myoutput.h5ad"), k: 10]
def origState_ = tup[2] // e.g. [output: '$id.$key.foo.h5ad']
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
.collectMany { par ->
def plainName_ = par.plainName
// if the state does not contain the key, it's an
// optional argument for which the component did
// not generate any output OR multiple channels were emitted
// and the output was just not added to using the channel
// that is now being parsed
if (!state_.containsKey(plainName_)) {
return []
}
def value = state_[plainName_]
// if the parameter is not a file, it should be stored
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[inputPath: [], outputFilename: []]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
// that it should not be returned as a state
if (!origState_.containsKey(plainName_)) {
return []
}
def filenameTemplate = origState_[plainName_]
// if the pararameter is multiple: true, fetch the template
if (par.multiple && filenameTemplate instanceof List) {
filenameTemplate = filenameTemplate[0]
}
// instantiate the template
def filename = filenameTemplate
.replaceAll('\\$id', id_)
.replaceAll('\\$\\{id\\}', id_)
.replaceAll('\\$key', key_)
.replaceAll('\\$\\{key\\}', key_)
if (par.multiple) {
// if the parameter is multiple: true, the filename
// should contain a wildcard '*' that is replaced with
// the index of the file
assert filename.contains("*") : "Module '${key_}' id '${id_}': Multiple output files specified, but no wildcard '*' in the filename: ${filename}"
def outputPerFile = value.withIndex().collect{ val, ix ->
def filename_ix = filename.replace("*", ix.toString())
def inputPath = val instanceof File ? val.toPath() : val
[inputPath: inputPath, outputFilename: filename_ix]
}
def transposedOutputs = ["inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
} else {
def value_ = java.nio.file.Paths.get(filename)
def inputPath = value instanceof File ? value.toPath() : value
return [[inputPath: [inputPath], outputFilename: [filename]]]
}
}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
[id_, inputPaths, outputFilenames]
}
| publishFilesProc
emit: input_ch
}
return publishFilesSimpleWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishStates.nf'
def collectFiles(obj) {
if (obj instanceof java.io.File || obj instanceof Path) {
@@ -1723,8 +1877,6 @@ def publishStates(Map args) {
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
def yamlFilename = yamlTemplate_
.replaceAll('\\$id', id_)
@@ -1737,7 +1889,7 @@ def publishStates(Map args) {
// convert state to yaml blob
def yamlBlob_ = toRelativeTaggedYamlBlob([id: id_] + state_, java.nio.file.Paths.get(yamlFilename))
[id_, yamlBlob_, yamlFilename, inputFiles_, outputFilenames_]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -1749,33 +1901,17 @@ process publishStatesProc {
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), val(yamlBlob), val(yamlFile), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
tuple val(id), val(yamlBlob), val(yamlFile)
output:
tuple val(id), path{[yamlFile] + outputFiles}
tuple val(id), path{[yamlFile]}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
echo '${yamlBlob}' > '${yamlFile}'
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
cat > '${yamlFile}' << HERE
${yamlBlob}
HERE
"""
}
@@ -1806,13 +1942,10 @@ def publishStatesByConfig(Map args) {
.replaceAll('\\$\\{key\\}', key_)
def yamlDir = java.nio.file.Paths.get(yamlFilename).getParent()
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// the processed state is a list of [key, value] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (key, value) are the tuples that will be saved to the state.yaml file
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
@@ -1829,7 +1962,7 @@ def publishStatesByConfig(Map args) {
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[key: plainName_, value: value, inputPath: [], outputFilename: []]]
return [[key: plainName_, value: value]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
@@ -1860,13 +1993,9 @@ def publishStatesByConfig(Map args) {
if (yamlDir != null) {
value_ = yamlDir.relativize(value_)
}
def inputPath = val instanceof File ? val.toPath() : val
[value: value_, inputPath: inputPath, outputFilename: filename_ix]
return value_
}
def transposedOutputs = ["value", "inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
return [["key": plainName_, "value": outputPerFile]]
} else {
def value_ = java.nio.file.Paths.get(filename)
// if id contains a slash
@@ -1874,18 +2003,17 @@ def publishStatesByConfig(Map args) {
value_ = yamlDir.relativize(value_)
}
def inputPath = value instanceof File ? value.toPath() : value
return [[key: plainName_, value: value_, inputPath: [inputPath], outputFilename: [filename]]]
return [["key": plainName_, value: value_]]
}
}
def updatedState_ = processedState.collectEntries{[it.key, it.value]}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
// convert state to yaml blob
def yamlBlob_ = toTaggedYamlBlob([id: id_] + updatedState_)
[id_, yamlBlob_, yamlFilename, inputPaths, outputFilenames]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -2559,7 +2687,8 @@ def _debug(workflowArgs, debugKey) {
def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
def workflowArgs = processWorkflowArgs(args, defaultWfArgs, meta)
def key_ = workflowArgs["key"]
def multipleArgs = meta.config.allArguments.findAll{ it.multiple }.collect{it.plainName}
workflow workflowInstance {
take: input_
@@ -2716,12 +2845,36 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
// TODO: move some of the _meta.join_id wrangling to the safeJoin() function.
def chInitialOutput = chArgsWithDefaults
def chInitialOutputMulti = chArgsWithDefaults
| _debug(workflowArgs, "processed")
// run workflow
| innerWorkflowFactory(workflowArgs)
// check output tuple
| map { id_, output_ ->
def chInitialOutputList = chInitialOutputMulti instanceof List ? chInitialOutputMulti : [chInitialOutputMulti]
assert chInitialOutputList.size() > 0: "should have emitted at least one output channel"
// Add a channel ID to the events, which designates the channel the event was emitted from as a running number
// This number is used to sort the events later when the events are gathered from across the channels.
def chInitialOutputListWithIndexedEvents = chInitialOutputList.withIndex().collect{channel, channelIndex ->
def newChannel = channel
| map {tuple ->
assert tuple instanceof List :
"Error in module '${key_}': element in output channel should be a tuple [id, data, ...otherargs...]\n" +
" Example: [\"id\", [input: file('foo.txt'), arg: 10]].\n" +
" Expected class: List. Found: tuple.getClass() is ${tuple.getClass()}"
def newEvent = [channelIndex] + tuple
return newEvent
}
return newChannel
}
// Put the events into 1 channel, cover case where there is only one channel is emitted
def chInitialOutput = chInitialOutputList.size() > 1 ? \
chInitialOutputListWithIndexedEvents[0].mix(*chInitialOutputListWithIndexedEvents.tail()) : \
chInitialOutputListWithIndexedEvents[0]
def chInitialOutputProcessed = chInitialOutput
| map { tuple ->
def channelId = tuple[0]
def id_ = tuple[1]
def output_ = tuple[2]
// see if output map contains metadata
def meta_ =
@@ -2734,19 +2887,94 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
output_ = output_.findAll{k, v -> k != "_meta"}
// check value types
output_ = _processOutputValues(output_, meta.config, id_, key_)
output_ = _checkValidOutputArgument(output_, meta.config, id_, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && output_.size() == 1) {
output_ = output_.values()[0]
}
[join_id, id_, output_]
[join_id, channelId, id_, output_]
}
// | view{"chInitialOutput: ${it.take(3)}"}
// join the output [prev_id, channel_id, new_id, output] with the previous state [prev_id, state, ...]
def chPublishWithPreviousState = safeJoin(chInitialOutputProcessed, chRunFiltered, key_)
// input tuple format: [join_id, channel_id, id, output, prev_state, ...]
// output tuple format: [join_id, channel_id, id, new_state, ...]
| map{ tup ->
def new_state = workflowArgs.toState(tup.drop(2).take(3))
tup.take(3) + [new_state] + tup.drop(5)
}
if (workflowArgs.auto.publish == "state") {
def chPublishFiles = chPublishWithPreviousState
// input tuple format: [join_id, channel_id, id, new_state, ...]
// output tuple format: [join_id, channel_id, id, new_state]
| map{ tup ->
tup.take(4)
}
safeJoin(chPublishFiles, chArgsWithDefaults, key_)
// input tuple format: [join_id, channel_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(2).take(3)
}
| publishFilesByConfig(key: key_, config: meta.config)
}
// Join the state from the events that were emitted from different channels
def chJoined = chInitialOutputProcessed
| map {tuple ->
def join_id = tuple[0]
def channel_id = tuple[1]
def id = tuple[2]
def other = tuple.drop(3)
// Below, groupTuple is used to join the events. To make sure resuming a workflow
// keeps working, the output state must be deterministic. This means the state needs to be
// sorted with groupTuple's has a 'sort' argument. This argument can be set to 'hash',
// but hashing the state when it is large can be problematic in terms of performance.
// Therefore, a custom comparator function is provided. We add the channel ID to the
// states so that we can use the channel ID to sort the items.
def stateWithChannelID = [[channel_id] * other.size(), other].transpose()
// A comparator that is provided to groupTuple's 'sort' argument is applied
// to all elements of the event tuple (that is not the 'id'). The comparator
// closure that is used below expects the input to be List. So the join_id and
// channel_id must also be wrapped in a list.
[[join_id], [channel_id], id] + stateWithChannelID
}
| groupTuple(by: 2, sort: {a, b -> a[0] <=> b[0]}, size: chInitialOutputList.size(), remainder: true)
| map {join_ids, _, id, statesWithChannelID ->
// Remove the channel IDs from the states
def states = statesWithChannelID.collect{it[1]}
def newJoinId = join_ids.flatten().unique{a, b -> a <=> b}
assert newJoinId.size() == 1: "Multiple events were emitted for '$id'."
def newJoinIdUnique = newJoinId[0]
// Merge the states from the different channels
def newState = states.inject([:]){ old_state, state_to_add ->
return old_state + state_to_add.collectEntries{k, v ->
if (!multipleArgs.contains(k)) {
// if the key is not a multiple argument, we expect only one value
if (old_state.containsKey(k)) {
assert old_state[k] == v : "ID $id: multiple entries for argument $k were emitted."
}
[k, v]
} else {
// if the key is a multiple argument, append the different values into one list
def prevValue = old_state.getOrDefault(k, [])
def prevValueAsList = prevValue instanceof List ? prevValue : [prevValue]
[k, prevValueAsList + v]
}
}
}
_checkAllRequiredOuputsPresent(newState, meta.config, id, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && newState.size() == 1) {
newState = newState.values()[0]
}
return [newJoinIdUnique, id, newState]
}
// join the output [prev_id, new_id, output] with the previous state [prev_id, state, ...]
def chNewState = safeJoin(chInitialOutput, chRunFiltered, key_)
def chNewState = safeJoin(chJoined, chRunFiltered, key_)
// input tuple format: [join_id, id, output, prev_state, ...]
// output tuple format: [join_id, id, new_state, ...]
| map{ tup ->
@@ -2755,23 +2983,21 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
if (workflowArgs.auto.publish == "state") {
def chPublish = chNewState
def chPublishStates = chNewState
// input tuple format: [join_id, id, new_state, ...]
// output tuple format: [join_id, id, new_state]
| map{ tup ->
tup.take(3)
}
safeJoin(chPublish, chArgsWithDefaults, key_)
safeJoin(chPublishStates, chArgsWithDefaults, key_)
// input tuple format: [join_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(1).take(3)
}
}
| publishStatesByConfig(key: key_, config: meta.config)
}
// remove join_id and meta
chReturn = chNewState
| map { tup ->
// input tuple format: [join_id, id, new_state, ...]
@@ -2806,7 +3032,7 @@ meta = [
"config": processConfig(readJsonBlob('''{
"name" : "untar",
"namespace" : "io",
"version" : "v0.3.4",
"version" : "v0.3.12",
"argument_groups" : [
{
"name" : "Input arguments",
@@ -2882,6 +3108,10 @@ meta = [
}
],
"status" : "enabled",
"scope" : {
"image" : "public",
"target" : "public"
},
"requirements" : {
"commands" : [
"ps"
@@ -2974,7 +3204,7 @@ meta = [
"id" : "docker",
"image" : "debian:stable-slim",
"target_registry" : "images.viash-hub.com",
"target_tag" : "v0.3.4",
"target_tag" : "v0.3.12",
"namespace_separator" : "/",
"setup" : [
{
@@ -2996,31 +3226,31 @@ meta = [
"runner" : "nextflow",
"engine" : "docker|native",
"output" : "target/nextflow/io/untar",
"viash_version" : "0.9.0",
"git_commit" : "e6f8b20d70be8f47907c08cc3c7802920cc0eee4",
"git_remote" : "https://x-access-token:ghs_FhW4M4TmdkVUw8Om3VZ1e7pO685xf7028pZ3@github.com/viash-hub/demultiplex",
"git_tag" : "v0.3.3-3-ge6f8b20"
"viash_version" : "0.9.4",
"git_commit" : "3a84e5409c5758611fff4eb585736f3737c8b896",
"git_remote" : "https://github.com/viash-hub/demultiplex",
"git_tag" : "v0.3.11-2-g3a84e54"
},
"package_config" : {
"name" : "demultiplex",
"version" : "v0.3.4",
"version" : "v0.3.12",
"description" : "Demultiplexing pipeline\n",
"info" : {
"test_resources" : [
{
"path" : "gs://viash-hub-test-data/demultiplex/v2/",
"path" : "gs://viash-hub-resources/demultiplex/v3",
"dest" : "testData"
}
]
},
"viash_version" : "0.9.0",
"viash_version" : "0.9.4",
"source" : "src",
"target" : "target",
"config_mods" : [
".requirements.commands := ['ps']\n.runners[.type == 'nextflow'].directives.tag := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n.runners[.type == 'nextflow'].config.script := 'includeConfig(\\"nextflow_labels.config\\")'\n",
".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n.runners[.type == 'nextflow'].config.script := 'includeConfig(\\"nextflow_labels.config\\")'\n",
".engines += { type: \\"native\\" }",
".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'",
".engines[.type == 'docker'].target_tag := 'v0.3.4'"
".engines[.type == 'docker'].target_tag := 'v0.3.12'"
],
"keywords" : [
"bioinformatics",
@@ -3446,7 +3676,7 @@ def _vdsl3ProcessFactory(Map workflowArgs, Map meta, String rawScript) {
// create process from temp file
def binding = new nextflow.script.ScriptBinding([:])
def session = nextflow.Nextflow.getSession()
def parser = new nextflow.script.ScriptParser(session)
def parser = _getScriptLoader(session)
.setModule(true)
.setBinding(binding)
def moduleScript = parser.runScript(tempFile)
@@ -3460,6 +3690,27 @@ def _vdsl3ProcessFactory(Map workflowArgs, Map meta, String rawScript) {
return scriptMeta.getProcess(procKey)
}
// use Reflection to get a ScriptParser / ScriptLoader
// <25.02.0-edge: new nextflow.script.ScriptParser(session)
// >=25.02.0-edge: nextflow.script.ScriptLoaderFactory.create(session)
def _getScriptLoader(nextflow.Session session) {
// try using the old method
try {
Class<?> scriptParserClass = Class.forName('nextflow.script.ScriptParser')
return scriptParserClass.getDeclaredConstructor(nextflow.Session).newInstance(session)
} catch (ClassNotFoundException e) {
// else try with the new method
try {
Class<?> scriptLoaderFactoryClass = Class.forName('nextflow.script.ScriptLoaderFactory')
def createMethod = scriptLoaderFactoryClass.getDeclaredMethod('create', nextflow.Session)
return createMethod.invoke(null, session) // null because create is static
} catch (ClassNotFoundException | NoSuchMethodException | IllegalAccessException | java.lang.reflect.InvocationTargetException e2) {
// Handle the case where neither class is found
throw new Exception("Neither nextflow.script.ScriptParser nor nextflow.script.ScriptLoaderFactory could be found. Is this a compatible Nextflow version?", e2)
}
}
}
// defaults
meta["defaults"] = [
// key to be used to trace the process and determine output names
@@ -3473,7 +3724,7 @@ meta["defaults"] = [
"container" : {
"registry" : "images.viash-hub.com",
"image" : "vsh/demultiplex/io/untar",
"tag" : "v0.3.4"
"tag" : "v0.3.12"
},
"tag" : "$id"
}'''),

View File

@@ -2,7 +2,7 @@ manifest {
name = 'io/untar'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.3.4'
version = 'v0.3.12'
description = 'Unpack a .tar file. When the contents of the .tar file is just a single directory,\nput the contents of the directory into the output folder instead of that directory.\n'
}

View File

@@ -1,119 +1,74 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "untar",
"description": "Unpack a .tar file. When the contents of the .tar file is just a single directory,\nput the contents of the directory into the output folder instead of that directory.\n",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type":
"string",
"description": "Type: `file`, required. Tarball file to be unpacked",
"help_text": "Type: `file`, required. Tarball file to be unpacked."
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output": {
"type":
"string",
"description": "Type: `file`, required, default: `$id.$key.output.output`. Directory to write the contents of the ",
"help_text": "Type: `file`, required, default: `$id.$key.output.output`. Directory to write the contents of the .tar file to."
,
"default":"$id.$key.output.output"
}
}
},
"other arguments" : {
"title": "Other arguments",
"type": "object",
"description": "No description",
"properties": {
"exclude": {
"type":
"string",
"description": "Type: `string`, example: `docs/figures`. Prevents any file or member whose name matches the shell wildcard (pattern) from being extracted",
"help_text": "Type: `string`, example: `docs/figures`. Prevents any file or member whose name matches the shell wildcard (pattern) from being extracted."
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map corresponds to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativation is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "untar",
"description": "Unpack a .tar file. When the contents of the .tar file is just a single directory,\nput the contents of the directory into the output folder instead of that directory.\n",
"type": "object",
"$defs": {
"input arguments": {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type": "string",
"format": "path",
"exists": true,
"description": "Tarball file to be unpacked.",
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`. "
}
}
},
{
"$ref": "#/definitions/output arguments"
"output arguments": {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output": {
"type": "string",
"format": "path",
"description": "Directory to write the contents of the .tar file to.",
"help_text": "Type: `file`, multiple: `False`, required, default: `\"$id.$key.output\"`, direction: `output`. ",
"default": "$id.$key.output"
}
}
},
{
"$ref": "#/definitions/other arguments"
"other arguments": {
"title": "Other arguments",
"type": "object",
"description": "No description",
"properties": {
"exclude": {
"type": "string",
"description": "Prevents any file or member whose name matches the shell wildcard (pattern) from being extracted.",
"help_text": "Type: `string`, multiple: `False`, example: `\"docs/figures\"`. "
}
}
},
{
"$ref": "#/definitions/nextflow input-output arguments"
"nextflow input-output arguments": {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type": "string",
"description": "Path to an output directory.",
"help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
}
}
}
]
},
"allOf": [
{
"$ref": "#/$defs/input arguments"
},
{
"$ref": "#/$defs/output arguments"
},
{
"$ref": "#/$defs/other arguments"
},
{
"$ref": "#/$defs/nextflow input-output arguments"
}
]
}

View File

@@ -1,5 +1,5 @@
name: "runner"
version: "v0.3.4"
version: "v0.3.12"
argument_groups:
- name: "Input arguments"
arguments:
@@ -84,6 +84,25 @@ argument_groups:
direction: "output"
multiple: false
multiple_sep: ";"
- type: "file"
name: "--demultiplexer_logs"
info: null
default:
- "demultiplexer_logs"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- name: "Other arguments"
arguments:
- type: "boolean_true"
name: "--skip_copycomplete_check"
description: "Disable the check for the presence of a \"CopyComplete.txt\" file\
\ in input\ndirectory in case of Illumina data.\n"
info: null
direction: "input"
resources:
- type: "nextflow_script"
path: "main.nf"
@@ -95,6 +114,9 @@ resources:
description: "Runner for demultiplexing of raw sequencing data"
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
@@ -183,32 +205,32 @@ build_info:
engine: "native|native"
output: "target/nextflow/runner"
executable: "target/nextflow/runner/main.nf"
viash_version: "0.9.0"
git_commit: "e6f8b20d70be8f47907c08cc3c7802920cc0eee4"
git_remote: "https://x-access-token:ghs_FhW4M4TmdkVUw8Om3VZ1e7pO685xf7028pZ3@github.com/viash-hub/demultiplex"
git_tag: "v0.3.3-3-ge6f8b20"
viash_version: "0.9.4"
git_commit: "3a84e5409c5758611fff4eb585736f3737c8b896"
git_remote: "https://github.com/viash-hub/demultiplex"
git_tag: "v0.3.11-2-g3a84e54"
dependencies:
- "target/nextflow/demultiplex"
- "target/nextflow/io/publish"
package_config:
name: "demultiplex"
version: "v0.3.4"
version: "v0.3.12"
description: "Demultiplexing pipeline\n"
info:
test_resources:
- path: "gs://viash-hub-test-data/demultiplex/v2/"
- path: "gs://viash-hub-resources/demultiplex/v3"
dest: "testData"
viash_version: "0.9.0"
viash_version: "0.9.4"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n.runners[.type == 'nextflow'].directives.tag\
- ".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag\
\ := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n\
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
)'\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.4'"
- ".engines[.type == 'docker'].target_tag := 'v0.3.12'"
keywords:
- "bioinformatics"
- "sequence"

View File

@@ -1,6 +1,6 @@
// runner v0.3.4
// runner v0.3.12
//
// This wrapper script is auto-generated by viash 0.9.0 and is thus a derivative
// This wrapper script is auto-generated by viash 0.9.4 and is thus a derivative
// work thereof. This software comes with ABSOLUTELY NO WARRANTY from Data
// Intuitive.
//
@@ -82,64 +82,56 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
foundClass = "List[${e.foundClass}]"
}
} else if (par.type == "string") {
// cast to string if need be
// cast to string if need be. only cast if the value is a GString
if (value instanceof GString) {
value = value.toString()
value = value as String
}
expectedClass = value instanceof String ? null : "String"
} else if (par.type == "integer") {
// cast to integer if need be
if (value instanceof String) {
if (value !instanceof Integer) {
try {
value = value.toInteger()
value = value as Integer
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Integer"
}
}
if (value instanceof java.math.BigInteger) {
value = value.intValue()
}
expectedClass = value instanceof Integer ? null : "Integer"
} else if (par.type == "long") {
// cast to long if need be
if (value instanceof String) {
if (value !instanceof Long) {
try {
value = value.toLong()
value = value as Long
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Long"
}
}
if (value instanceof Integer) {
value = value.toLong()
}
expectedClass = value instanceof Long ? null : "Long"
} else if (par.type == "double") {
// cast to double if need be
if (value instanceof String) {
if (value !instanceof Double) {
try {
value = value.toDouble()
value = value as Double
} catch (NumberFormatException e) {
// do nothing
expectedClass = "Double"
}
}
if (value instanceof java.math.BigDecimal) {
value = value.doubleValue()
} else if (par.type == "float") {
// cast to float if need be
if (value !instanceof Float) {
try {
value = value as Float
} catch (NumberFormatException e) {
expectedClass = "Float"
}
}
if (value instanceof Float) {
value = value.toDouble()
}
expectedClass = value instanceof Double ? null : "Double"
} else if (par.type == "boolean" | par.type == "boolean_true" | par.type == "boolean_false") {
// cast to boolean if need be
if (value instanceof String) {
def valueLower = value.toLowerCase()
if (valueLower == "true") {
value = true
} else if (valueLower == "false") {
value = false
if (value !instanceof Boolean) {
try {
value = value as Boolean
} catch (Exception e) {
expectedClass = "Boolean"
}
}
expectedClass = value instanceof Boolean ? null : "Boolean"
} else if (par.type == "file" && (par.direction == "input" || stage == "output")) {
// cast to path if need be
if (value instanceof String) {
@@ -151,10 +143,13 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
expectedClass = value instanceof Path ? null : "Path"
} else if (par.type == "file" && stage == "input" && par.direction == "output") {
// cast to string if need be
if (value instanceof GString) {
value = value.toString()
if (value !instanceof String) {
try {
value = value as String
} catch (Exception e) {
expectedClass = "String"
}
}
expectedClass = value instanceof String ? null : "String"
} else {
// didn't find a match for par.type
expectedClass = par.type
@@ -173,7 +168,7 @@ def _checkArgumentType(String stage, Map par, Object value, String errorIdentifi
Map _processInputValues(Map inputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.required) {
if (arg.required && arg.direction == "input") {
assert inputs.containsKey(arg.plainName) && inputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required input argument '${arg.plainName}' is missing"
}
@@ -192,15 +187,8 @@ Map _processInputValues(Map inputs, Map config, String id, String key) {
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/arguments/_processOutputValues.nf'
Map _processOutputValues(Map outputs, Map config, String id, String key) {
Map _checkValidOutputArgument(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
outputs = outputs.collectEntries { name, value ->
def par = config.allArguments.find { it.plainName == name && it.direction == "output" }
assert par != null : "Error in module '${key}' id '${id}': '${name}' is not a valid output argument"
@@ -213,6 +201,16 @@ Map _processOutputValues(Map outputs, Map config, String id, String key) {
return outputs
}
void _checkAllRequiredOuputsPresent(Map outputs, Map config, String id, String key) {
if (!workflow.stubRun) {
config.allArguments.each { arg ->
if (arg.direction == "output" && arg.required) {
assert outputs.containsKey(arg.plainName) && outputs.get(arg.plainName) != null :
"Error in module '${key}' id '${id}': required output argument '${arg.plainName}' is missing"
}
}
}
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/channel/IDChecker.nf'
class IDChecker {
final def items = [] as Set
@@ -1666,6 +1664,162 @@ def joinStates(Closure apply_) {
}
return joinStatesWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishFiles.nf'
def publishFiles(Map args) {
def key_ = args.get("key")
assert key_ != null : "publishFiles: key must be specified"
workflow publishFilesWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1]
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
[id_, inputFiles_, outputFilenames_]
}
| publishFilesProc
emit: input_ch
}
return publishFilesWf
}
process publishFilesProc {
// todo: check publishpath?
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
output:
tuple val(id), path{outputFiles}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
}
// this assumes that the state contains no other values other than those specified in the config
def publishFilesByConfig(Map args) {
def config = args.get("config")
assert config != null : "publishFilesByConfig: config must be specified"
def key_ = args.get("key", config.name)
assert key_ != null : "publishFilesByConfig: key must be specified"
workflow publishFilesSimpleWf {
take: input_ch
main:
input_ch
| map { tup ->
def id_ = tup[0]
def state_ = tup[1] // e.g. [output: new File("myoutput.h5ad"), k: 10]
def origState_ = tup[2] // e.g. [output: '$id.$key.foo.h5ad']
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
.collectMany { par ->
def plainName_ = par.plainName
// if the state does not contain the key, it's an
// optional argument for which the component did
// not generate any output OR multiple channels were emitted
// and the output was just not added to using the channel
// that is now being parsed
if (!state_.containsKey(plainName_)) {
return []
}
def value = state_[plainName_]
// if the parameter is not a file, it should be stored
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[inputPath: [], outputFilename: []]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
// that it should not be returned as a state
if (!origState_.containsKey(plainName_)) {
return []
}
def filenameTemplate = origState_[plainName_]
// if the pararameter is multiple: true, fetch the template
if (par.multiple && filenameTemplate instanceof List) {
filenameTemplate = filenameTemplate[0]
}
// instantiate the template
def filename = filenameTemplate
.replaceAll('\\$id', id_)
.replaceAll('\\$\\{id\\}', id_)
.replaceAll('\\$key', key_)
.replaceAll('\\$\\{key\\}', key_)
if (par.multiple) {
// if the parameter is multiple: true, the filename
// should contain a wildcard '*' that is replaced with
// the index of the file
assert filename.contains("*") : "Module '${key_}' id '${id_}': Multiple output files specified, but no wildcard '*' in the filename: ${filename}"
def outputPerFile = value.withIndex().collect{ val, ix ->
def filename_ix = filename.replace("*", ix.toString())
def inputPath = val instanceof File ? val.toPath() : val
[inputPath: inputPath, outputFilename: filename_ix]
}
def transposedOutputs = ["inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
} else {
def value_ = java.nio.file.Paths.get(filename)
def inputPath = value instanceof File ? value.toPath() : value
return [[inputPath: [inputPath], outputFilename: [filename]]]
}
}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
[id_, inputPaths, outputFilenames]
}
| publishFilesProc
emit: input_ch
}
return publishFilesSimpleWf
}
// helper file: 'src/main/resources/io/viash/runners/nextflow/states/publishStates.nf'
def collectFiles(obj) {
if (obj instanceof java.io.File || obj instanceof Path) {
@@ -1723,8 +1877,6 @@ def publishStates(Map args) {
// the input files and the target output filenames
def inputoutputFilenames_ = collectInputOutputPaths(state_, id_ + "." + key_).transpose()
def inputFiles_ = inputoutputFilenames_[0]
def outputFilenames_ = inputoutputFilenames_[1]
def yamlFilename = yamlTemplate_
.replaceAll('\\$id', id_)
@@ -1737,7 +1889,7 @@ def publishStates(Map args) {
// convert state to yaml blob
def yamlBlob_ = toRelativeTaggedYamlBlob([id: id_] + state_, java.nio.file.Paths.get(yamlFilename))
[id_, yamlBlob_, yamlFilename, inputFiles_, outputFilenames_]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -1749,33 +1901,17 @@ process publishStatesProc {
publishDir path: "${getPublishDir()}/", mode: "copy"
tag "$id"
input:
tuple val(id), val(yamlBlob), val(yamlFile), path(inputFiles, stageAs: "_inputfile?/*"), val(outputFiles)
tuple val(id), val(yamlBlob), val(yamlFile)
output:
tuple val(id), path{[yamlFile] + outputFiles}
tuple val(id), path{[yamlFile]}
script:
def copyCommands = [
inputFiles instanceof List ? inputFiles : [inputFiles],
outputFiles instanceof List ? outputFiles : [outputFiles]
]
.transpose()
.collectMany{infile, outfile ->
if (infile.toString() != outfile.toString()) {
[
"[ -d \"\$(dirname '${outfile.toString()}')\" ] || mkdir -p \"\$(dirname '${outfile.toString()}')\"",
"cp -r '${infile.toString()}' '${outfile.toString()}'"
]
} else {
// no need to copy if infile is the same as outfile
[]
}
}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
echo '${yamlBlob}' > '${yamlFile}'
echo "Copying output files to destination folder"
${copyCommands.join("\n ")}
"""
mkdir -p "\$(dirname '${yamlFile}')"
echo "Storing state as yaml"
cat > '${yamlFile}' << HERE
${yamlBlob}
HERE
"""
}
@@ -1806,13 +1942,10 @@ def publishStatesByConfig(Map args) {
.replaceAll('\\$\\{key\\}', key_)
def yamlDir = java.nio.file.Paths.get(yamlFilename).getParent()
// the processed state is a list of [key, value, inputPath, outputFilename] tuples, where
// the processed state is a list of [key, value] tuples, where
// - key is a String
// - value is any object that can be serialized to a Yaml (so a String/Integer/Long/Double/Boolean, a List, a Map, or a Path)
// - inputPath is a List[Path]
// - outputFilename is a List[String]
// - (key, value) are the tuples that will be saved to the state.yaml file
// - (inputPath, outputFilename) are the files that will be copied from src to dest (relative to the state.yaml)
def processedState =
config.allArguments
.findAll { it.direction == "output" }
@@ -1829,7 +1962,7 @@ def publishStatesByConfig(Map args) {
// in the state as-is, but is not something that needs
// to be copied from the source path to the dest path
if (par.type != "file") {
return [[key: plainName_, value: value, inputPath: [], outputFilename: []]]
return [[key: plainName_, value: value]]
}
// if the orig state does not contain this filename,
// it's an optional argument for which the user specified
@@ -1860,13 +1993,9 @@ def publishStatesByConfig(Map args) {
if (yamlDir != null) {
value_ = yamlDir.relativize(value_)
}
def inputPath = val instanceof File ? val.toPath() : val
[value: value_, inputPath: inputPath, outputFilename: filename_ix]
return value_
}
def transposedOutputs = ["value", "inputPath", "outputFilename"].collectEntries{ key ->
[key, outputPerFile.collect{dic -> dic[key]}]
}
return [[key: plainName_] + transposedOutputs]
return [["key": plainName_, "value": outputPerFile]]
} else {
def value_ = java.nio.file.Paths.get(filename)
// if id contains a slash
@@ -1874,18 +2003,17 @@ def publishStatesByConfig(Map args) {
value_ = yamlDir.relativize(value_)
}
def inputPath = value instanceof File ? value.toPath() : value
return [[key: plainName_, value: value_, inputPath: [inputPath], outputFilename: [filename]]]
return [["key": plainName_, value: value_]]
}
}
def updatedState_ = processedState.collectEntries{[it.key, it.value]}
def inputPaths = processedState.collectMany{it.inputPath}
def outputFilenames = processedState.collectMany{it.outputFilename}
// convert state to yaml blob
def yamlBlob_ = toTaggedYamlBlob([id: id_] + updatedState_)
[id_, yamlBlob_, yamlFilename, inputPaths, outputFilenames]
[id_, yamlBlob_, yamlFilename]
}
| publishStatesProc
emit: input_ch
@@ -2559,7 +2687,8 @@ def _debug(workflowArgs, debugKey) {
def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
def workflowArgs = processWorkflowArgs(args, defaultWfArgs, meta)
def key_ = workflowArgs["key"]
def multipleArgs = meta.config.allArguments.findAll{ it.multiple }.collect{it.plainName}
workflow workflowInstance {
take: input_
@@ -2716,12 +2845,36 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
// TODO: move some of the _meta.join_id wrangling to the safeJoin() function.
def chInitialOutput = chArgsWithDefaults
def chInitialOutputMulti = chArgsWithDefaults
| _debug(workflowArgs, "processed")
// run workflow
| innerWorkflowFactory(workflowArgs)
// check output tuple
| map { id_, output_ ->
def chInitialOutputList = chInitialOutputMulti instanceof List ? chInitialOutputMulti : [chInitialOutputMulti]
assert chInitialOutputList.size() > 0: "should have emitted at least one output channel"
// Add a channel ID to the events, which designates the channel the event was emitted from as a running number
// This number is used to sort the events later when the events are gathered from across the channels.
def chInitialOutputListWithIndexedEvents = chInitialOutputList.withIndex().collect{channel, channelIndex ->
def newChannel = channel
| map {tuple ->
assert tuple instanceof List :
"Error in module '${key_}': element in output channel should be a tuple [id, data, ...otherargs...]\n" +
" Example: [\"id\", [input: file('foo.txt'), arg: 10]].\n" +
" Expected class: List. Found: tuple.getClass() is ${tuple.getClass()}"
def newEvent = [channelIndex] + tuple
return newEvent
}
return newChannel
}
// Put the events into 1 channel, cover case where there is only one channel is emitted
def chInitialOutput = chInitialOutputList.size() > 1 ? \
chInitialOutputListWithIndexedEvents[0].mix(*chInitialOutputListWithIndexedEvents.tail()) : \
chInitialOutputListWithIndexedEvents[0]
def chInitialOutputProcessed = chInitialOutput
| map { tuple ->
def channelId = tuple[0]
def id_ = tuple[1]
def output_ = tuple[2]
// see if output map contains metadata
def meta_ =
@@ -2734,19 +2887,94 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
output_ = output_.findAll{k, v -> k != "_meta"}
// check value types
output_ = _processOutputValues(output_, meta.config, id_, key_)
output_ = _checkValidOutputArgument(output_, meta.config, id_, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && output_.size() == 1) {
output_ = output_.values()[0]
}
[join_id, id_, output_]
[join_id, channelId, id_, output_]
}
// | view{"chInitialOutput: ${it.take(3)}"}
// join the output [prev_id, channel_id, new_id, output] with the previous state [prev_id, state, ...]
def chPublishWithPreviousState = safeJoin(chInitialOutputProcessed, chRunFiltered, key_)
// input tuple format: [join_id, channel_id, id, output, prev_state, ...]
// output tuple format: [join_id, channel_id, id, new_state, ...]
| map{ tup ->
def new_state = workflowArgs.toState(tup.drop(2).take(3))
tup.take(3) + [new_state] + tup.drop(5)
}
if (workflowArgs.auto.publish == "state") {
def chPublishFiles = chPublishWithPreviousState
// input tuple format: [join_id, channel_id, id, new_state, ...]
// output tuple format: [join_id, channel_id, id, new_state]
| map{ tup ->
tup.take(4)
}
safeJoin(chPublishFiles, chArgsWithDefaults, key_)
// input tuple format: [join_id, channel_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(2).take(3)
}
| publishFilesByConfig(key: key_, config: meta.config)
}
// Join the state from the events that were emitted from different channels
def chJoined = chInitialOutputProcessed
| map {tuple ->
def join_id = tuple[0]
def channel_id = tuple[1]
def id = tuple[2]
def other = tuple.drop(3)
// Below, groupTuple is used to join the events. To make sure resuming a workflow
// keeps working, the output state must be deterministic. This means the state needs to be
// sorted with groupTuple's has a 'sort' argument. This argument can be set to 'hash',
// but hashing the state when it is large can be problematic in terms of performance.
// Therefore, a custom comparator function is provided. We add the channel ID to the
// states so that we can use the channel ID to sort the items.
def stateWithChannelID = [[channel_id] * other.size(), other].transpose()
// A comparator that is provided to groupTuple's 'sort' argument is applied
// to all elements of the event tuple (that is not the 'id'). The comparator
// closure that is used below expects the input to be List. So the join_id and
// channel_id must also be wrapped in a list.
[[join_id], [channel_id], id] + stateWithChannelID
}
| groupTuple(by: 2, sort: {a, b -> a[0] <=> b[0]}, size: chInitialOutputList.size(), remainder: true)
| map {join_ids, _, id, statesWithChannelID ->
// Remove the channel IDs from the states
def states = statesWithChannelID.collect{it[1]}
def newJoinId = join_ids.flatten().unique{a, b -> a <=> b}
assert newJoinId.size() == 1: "Multiple events were emitted for '$id'."
def newJoinIdUnique = newJoinId[0]
// Merge the states from the different channels
def newState = states.inject([:]){ old_state, state_to_add ->
return old_state + state_to_add.collectEntries{k, v ->
if (!multipleArgs.contains(k)) {
// if the key is not a multiple argument, we expect only one value
if (old_state.containsKey(k)) {
assert old_state[k] == v : "ID $id: multiple entries for argument $k were emitted."
}
[k, v]
} else {
// if the key is a multiple argument, append the different values into one list
def prevValue = old_state.getOrDefault(k, [])
def prevValueAsList = prevValue instanceof List ? prevValue : [prevValue]
[k, prevValueAsList + v]
}
}
}
_checkAllRequiredOuputsPresent(newState, meta.config, id, key_)
// simplify output if need be
if (workflowArgs.auto.simplifyOutput && newState.size() == 1) {
newState = newState.values()[0]
}
return [newJoinIdUnique, id, newState]
}
// join the output [prev_id, new_id, output] with the previous state [prev_id, state, ...]
def chNewState = safeJoin(chInitialOutput, chRunFiltered, key_)
def chNewState = safeJoin(chJoined, chRunFiltered, key_)
// input tuple format: [join_id, id, output, prev_state, ...]
// output tuple format: [join_id, id, new_state, ...]
| map{ tup ->
@@ -2755,23 +2983,21 @@ def workflowFactory(Map args, Map defaultWfArgs, Map meta) {
}
if (workflowArgs.auto.publish == "state") {
def chPublish = chNewState
def chPublishStates = chNewState
// input tuple format: [join_id, id, new_state, ...]
// output tuple format: [join_id, id, new_state]
| map{ tup ->
tup.take(3)
}
safeJoin(chPublish, chArgsWithDefaults, key_)
safeJoin(chPublishStates, chArgsWithDefaults, key_)
// input tuple format: [join_id, id, new_state, orig_state, ...]
// output tuple format: [id, new_state, orig_state]
| map { tup ->
tup.drop(1).take(3)
}
}
| publishStatesByConfig(key: key_, config: meta.config)
}
// remove join_id and meta
chReturn = chNewState
| map { tup ->
// input tuple format: [join_id, id, new_state, ...]
@@ -2805,7 +3031,7 @@ meta = [
"resources_dir": moduleDir.toRealPath().normalize(),
"config": processConfig(readJsonBlob('''{
"name" : "runner",
"version" : "v0.3.4",
"version" : "v0.3.12",
"argument_groups" : [
{
"name" : "Input arguments",
@@ -2899,6 +3125,30 @@ meta = [
"direction" : "output",
"multiple" : false,
"multiple_sep" : ";"
},
{
"type" : "file",
"name" : "--demultiplexer_logs",
"default" : [
"demultiplexer_logs"
],
"must_exist" : true,
"create_parent" : true,
"required" : false,
"direction" : "output",
"multiple" : false,
"multiple_sep" : ";"
}
]
},
{
"name" : "Other arguments",
"arguments" : [
{
"type" : "boolean_true",
"name" : "--skip_copycomplete_check",
"description" : "Disable the check for the presence of a \\"CopyComplete.txt\\" file in input\ndirectory in case of Illumina data.\n",
"direction" : "input"
}
]
}
@@ -2918,6 +3168,10 @@ meta = [
],
"description" : "Runner for demultiplexing of raw sequencing data",
"status" : "enabled",
"scope" : {
"image" : "public",
"target" : "public"
},
"requirements" : {
"commands" : [
"ps"
@@ -3028,31 +3282,31 @@ meta = [
"runner" : "nextflow",
"engine" : "native|native",
"output" : "target/nextflow/runner",
"viash_version" : "0.9.0",
"git_commit" : "e6f8b20d70be8f47907c08cc3c7802920cc0eee4",
"git_remote" : "https://x-access-token:ghs_FhW4M4TmdkVUw8Om3VZ1e7pO685xf7028pZ3@github.com/viash-hub/demultiplex",
"git_tag" : "v0.3.3-3-ge6f8b20"
"viash_version" : "0.9.4",
"git_commit" : "3a84e5409c5758611fff4eb585736f3737c8b896",
"git_remote" : "https://github.com/viash-hub/demultiplex",
"git_tag" : "v0.3.11-2-g3a84e54"
},
"package_config" : {
"name" : "demultiplex",
"version" : "v0.3.4",
"version" : "v0.3.12",
"description" : "Demultiplexing pipeline\n",
"info" : {
"test_resources" : [
{
"path" : "gs://viash-hub-test-data/demultiplex/v2/",
"path" : "gs://viash-hub-resources/demultiplex/v3",
"dest" : "testData"
}
]
},
"viash_version" : "0.9.0",
"viash_version" : "0.9.4",
"source" : "src",
"target" : "target",
"config_mods" : [
".requirements.commands := ['ps']\n.runners[.type == 'nextflow'].directives.tag := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n.runners[.type == 'nextflow'].config.script := 'includeConfig(\\"nextflow_labels.config\\")'\n",
".requirements.commands += ['ps']\n.runners[.type == 'nextflow'].directives.tag := '$id'\n.resources += {path: '/src/config/labels.config', dest: 'nextflow_labels.config'}\n.runners[.type == 'nextflow'].config.script := 'includeConfig(\\"nextflow_labels.config\\")'\n",
".engines += { type: \\"native\\" }",
".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'",
".engines[.type == 'docker'].target_tag := 'v0.3.4'"
".engines[.type == 'docker'].target_tag := 'v0.3.12'"
],
"keywords" : [
"bioinformatics",
@@ -3103,9 +3357,11 @@ workflow run_wf {
"input": state.input,
"run_information": state.run_information,
"demultiplexer": state.demultiplexer,
"output": "fastq",
"output_falco": "qc/fastqc",
"output_multiqc": "qc/multiqc_report.html",
"skip_copycomplete_check": state.skip_copycomplete_check,
"output": "$id/fastq",
"output_falco": "$id/qc/fastqc",
"output_multiqc": "$id/qc/multiqc_report.html",
"demultiplexer_logs": "$id/demultiplexer_logs",
]
if (state.run_information) {
state_to_pass += ["output_run_information": state.run_information.getName()]
@@ -3126,6 +3382,7 @@ workflow run_wf {
def falco_output_1 = (id2 == "run") ? state.falco_output : "${id2}/" + state.falco_output
def multiqc_output_1 = (id2 == "run") ? state.multiqc_output : "${id2}/" + state.multiqc_output
def run_information_output_1 = (id2 == "run") ? "${state.output_run_information.getName()}" : "${id2}/${state.output_run_information.getName()}"
def demultiplexer_logs_output = (id2 == "run") ? state.demultiplexer_logs : "${id2}/${state.demultiplexer_logs.getName()}"
if (id2 == "run") {
println("Publising to ${params.publish_dir}")
@@ -3138,10 +3395,12 @@ workflow run_wf {
input_falco: state.output_falco,
input_multiqc: state.output_multiqc,
input_run_information: state.output_run_information,
input_demultiplexer_logs: state.demultiplexer_logs,
output: fastq_output_1,
output_falco: falco_output_1,
output_multiqc: multiqc_output_1,
output_run_information: run_information_output_1,
output_demultiplexer_logs: demultiplexer_logs_output,
]
},
toState: { id, result, state -> [:] },

View File

@@ -2,7 +2,7 @@ manifest {
name = 'runner'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'v0.3.4'
version = 'v0.3.12'
description = 'Runner for demultiplexing of raw sequencing data'
}

View File

@@ -1,164 +1,127 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "runner",
"description": "Runner for demultiplexing of raw sequencing data",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type":
"string",
"description": "Type: `file`, required. Base directory of the canonical form `s3://\u003cbucket\u003e/\u003cpath\u003e/\u003cRunID\u003e/`",
"help_text": "Type: `file`, required. Base directory of the canonical form `s3://\u003cbucket\u003e/\u003cpath\u003e/\u003cRunID\u003e/`.\nA tarball (tar.gz, .tgz, .tar) containing run information can be provided in which\ncase the RunID is set to the name of the tarball without the extension.\n"
}
,
"run_information": {
"type":
"string",
"description": "Type: `file`. CSV file containing sample information, which will be used as \ninput for the demultiplexer",
"help_text": "Type: `file`. CSV file containing sample information, which will be used as \ninput for the demultiplexer. Canonically called \u0027SampleSheet.csv\u0027 (Illumina)\nor \u0027RunManifest.csv\u0027 (Element Biosciences). If not specified,\nwill try to autodetect the sample sheet in the input directory.\nRequires --demultiplexer to be set.\n"
}
,
"demultiplexer": {
"type":
"string",
"description": "Type: `string`, choices: ``bases2fastq`, `bclconvert``. Demultiplexer to use, choice depends on the provider\nof the instrument that was used to generate the data",
"help_text": "Type: `string`, choices: ``bases2fastq`, `bclconvert``. Demultiplexer to use, choice depends on the provider\nof the instrument that was used to generate the data.\nWhen not using --sample_sheet, specifying this argument is not\nrequired.\n",
"enum": ["bases2fastq", "bclconvert"]
}
}
},
"annotation flags" : {
"title": "Annotation flags",
"type": "object",
"description": "No description",
"properties": {
"plain_output": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Flag to indicate that the output should be stored directly under $publish_dir rather than\nunder a subdirectory structure runID/\u003cdate_time\u003e_demultiplex_\u003cversion\u003e/",
"help_text": "Type: `boolean_true`, default: `false`. Flag to indicate that the output should be stored directly under $publish_dir rather than\nunder a subdirectory structure runID/\u003cdate_time\u003e_demultiplex_\u003cversion\u003e/.\n"
,
"default":false
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"fastq_output": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.fastq_output.fastq_output`. ",
"help_text": "Type: `file`, default: `$id.$key.fastq_output.fastq_output`. "
,
"default":"$id.$key.fastq_output.fastq_output"
}
,
"falco_output": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.falco_output.falco_output`. ",
"help_text": "Type: `file`, default: `$id.$key.falco_output.falco_output`. "
,
"default":"$id.$key.falco_output.falco_output"
}
,
"multiqc_output": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.multiqc_output.html`. ",
"help_text": "Type: `file`, default: `$id.$key.multiqc_output.html`. "
,
"default":"$id.$key.multiqc_output.html"
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map corresponds to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativation is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "runner",
"description": "Runner for demultiplexing of raw sequencing data",
"type": "object",
"$defs": {
"input arguments": {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type": "string",
"format": "path",
"exists": true,
"description": "Base directory of the canonical form `s3://<bucket>/<path>/<RunID>/`.\nA tarball (tar.gz, .tgz, .tar) containing run information can be provided in which\ncase the RunID is set to the name of the tarball without the extension.\n",
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`. "
},
"run_information": {
"type": "string",
"format": "path",
"description": "CSV file containing sample information, which will be used as \ninput for the demultiplexer",
"help_text": "Type: `file`, multiple: `False`, direction: `input`. "
},
"demultiplexer": {
"type": "string",
"description": "Demultiplexer to use, choice depends on the provider\nof the instrument that was used to generate the data.\nWhen not using --sample_sheet, specifying this argument is not\nrequired.\n",
"help_text": "Type: `string`, multiple: `False`, choices: ``bases2fastq`, `bclconvert``. ",
"enum": [
"bases2fastq",
"bclconvert"
]
}
}
},
{
"$ref": "#/definitions/annotation flags"
"annotation flags": {
"title": "Annotation flags",
"type": "object",
"description": "No description",
"properties": {
"plain_output": {
"type": "boolean",
"description": "Flag to indicate that the output should be stored directly under $publish_dir rather than\nunder a subdirectory structure runID/<date_time>_demultiplex_<version>/.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
}
}
},
{
"$ref": "#/definitions/output arguments"
"output arguments": {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"fastq_output": {
"type": "string",
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `False`, default: `\"fastq\"`, direction: `output`. ",
"default": "fastq"
},
"falco_output": {
"type": "string",
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `False`, default: `\"qc/fastqc\"`, direction: `output`. ",
"default": "qc/fastqc"
},
"multiqc_output": {
"type": "string",
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `False`, default: `\"qc/multiqc_report.html\"`, direction: `output`. ",
"default": "qc/multiqc_report.html"
},
"demultiplexer_logs": {
"type": "string",
"format": "path",
"description": "",
"help_text": "Type: `file`, multiple: `False`, default: `\"demultiplexer_logs\"`, direction: `output`. ",
"default": "demultiplexer_logs"
}
}
},
{
"$ref": "#/definitions/nextflow input-output arguments"
"other arguments": {
"title": "Other arguments",
"type": "object",
"description": "No description",
"properties": {
"skip_copycomplete_check": {
"type": "boolean",
"description": "Disable the check for the presence of a \"CopyComplete.txt\" file in input\ndirectory in case of Illumina data.\n",
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
"default": false
}
}
},
"nextflow input-output arguments": {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type": "string",
"description": "Path to an output directory.",
"help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
}
}
}
]
},
"allOf": [
{
"$ref": "#/$defs/input arguments"
},
{
"$ref": "#/$defs/annotation flags"
},
{
"$ref": "#/$defs/output arguments"
},
{
"$ref": "#/$defs/other arguments"
},
{
"$ref": "#/$defs/nextflow input-output arguments"
}
]
}