# Demultiplex.vsh

Demultiplex.vsh is a workflow for demultiplexing of raw sequencing data.
Currently data from Illumina and Element Biosciences sequencers are
supported.

[ViashHub](https://web.viash-hub.com/packages/demultiplex)
[GitHub](https://github.com/viash-hub/demultiplex)
[License](https://github.com/viash-hub/demultiplex/blob/main/LICENSE)
[Issues](https://github.com/viash-hub/demultiplex/issues)
[Viash](https://viash.io)

## Introduction

This workflow is designed to demultiplex raw RNA-seq data
from Illumina and Element Biosciences sequencers.

The workflow is built in a modular fashion, where most of the base
functionality is provided by components from
[`biobox`](https://www.viash-hub.com/packages/biobox/latest),
supplemented by custom base components and workflow components in this
package. Each of these components can be used independently as a
stand-alone module with a standardized interface.

The full workflow can be run in two ways:

1. Run the [main
   workflow](https://www.viash-hub.com/packages/demultiplex/v0.3.4/components/demultiplex)
   containing the main functionality.
2. Run the [(opinionated)
   `runner`](https://www.viash-hub.com/packages/demultiplex/v0.3.4/components/runner),
   where a number of choices (input/output structure and location) have
   been made.

## Workflow Overview

The workflow executes the following steps:

1. Unpack the input data (when a TAR archive is provided)
2. Run `bclconvert` or `bases2fastq`
3. Run `falco` and convert Illumina InterOp information to CSV
4. Run `multiqc` to generate a report

## Example usage

Two variants of the same workflow are provided, depending on the
flexibility required in the output structure:

- The `runner` workflow provides a predefined output structure. It
  requires a minimal number of parameters, at the cost
  of being less flexible. It is located at
  `target/nextflow/runner/main.nf`.
- The `demultiplex` workflow (`target/nextflow/demultiplex/main.nf`)
  allows for more fine-grained tuning, but requires more parameters to
  be provided.

### Test data

We have provided test data at
`gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2`
(Illumina), but please feel free to bring your own. The URL of the test
data can be provided as-is to the workflow, or you can download
everything and specify a local path.

The input data should follow the structure of either Illumina or Element
Biosciences sequencers. The workflow will automatically detect which
demultiplexer to use (`bclconvert` or `bases2fastq`) based on the
presence of either `SampleSheet.csv` or `RunParameters.xml` in the input
directory. The demultiplexer can also be set explicitly using the
`--demultiplexer` parameter.

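Before launching a run on local data, it can help to see which of the two marker files named in this section are present in the input directory. The following is a minimal sketch, not the workflow's own detection logic: the function name `list_marker_files` and the example path are hypothetical, and the helper only reports which files exist without deciding which demultiplexer applies.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the marker files found in a local run directory.
# It only lists files; it does not decide which demultiplexer applies.
list_marker_files() {
  local run_dir="$1"
  local name
  for name in SampleSheet.csv RunParameters.xml; do
    if [ -f "${run_dir}/${name}" ]; then
      echo "${name}"
    fi
  done
}

# Example (path is illustrative):
# list_marker_files /data/runs/my_run
```

If the output is empty or lists both files, setting `--demultiplexer` explicitly avoids relying on the automatic detection.
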
### Setup

In order to use the workflows in this package, you’ll need to do the
following:

- Install [nextflow](https://www.nextflow.io/docs/latest/install.html)
- Install a nextflow compatible executor. This workflow provides a
  profile for [docker](https://docs.docker.com/get-started/).

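A quick way to confirm that both prerequisites are installed is to check that their commands are on the `PATH`. This is a small sketch under the assumption that the tools are invoked as `nextflow` and `docker`; the helper name `check_tools` is hypothetical.

```bash
#!/usr/bin/env bash

# Hypothetical helper: report whether each required command is on the PATH.
# Returns 0 only if all commands are found.
check_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if command -v "${tool}" >/dev/null 2>&1; then
      echo "found: ${tool}"
    else
      echo "missing: ${tool}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example:
# check_tools nextflow docker
```

The check only confirms that the commands exist; it does not verify versions or that the Docker daemon is running.
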
### Run from Viash Hub

1. Open [Viash Hub](https://www.viash-hub.com) and browse to the
   [demultiplex
   component](https://www.viash-hub.com/packages/demultiplex/v0.3.4/components/demultiplex).
   Press the ‘Launch’ button and follow the instructions.

2. We will start an example run and set the profile to `docker`.

3. In the next step, we provide the parameters as follows and leave the
   rest as default:

   - `input`:
     `gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2`

Press the ‘Launch’ button at the end to get the instructions on how to
run the workflow from the CLI.

### Run using NF-Tower / Seqera Cloud

It’s possible to run the workflow directly from [Seqera
Cloud](https://cloud.seqera.io). The necessary [Nextflow schema
file](https://nextflow-io.github.io/nf-schema/latest/nextflow_schema/nextflow_schema_specification/)
has been built and provided with the workflows in order to use the
form-based input.

1. Select the option to run the workflow using Seqera Cloud. You will
   need to create an API token for your account. Once this token is
   entered in the corresponding field, you will get the option to
   select a ‘Workspace’ and a ‘Compute environment’.

2. Provide the parameters as in step 3 of the previous section.

3. On the next screen, pressing the ‘Launch’ button will actually start
   the workflow on Seqera Cloud. A message is shown when the submission
   was successful.

### Setting up SCM

In order to let nextflow use the viash-hub workflows, you need to set up
an [SCM](https://www.nextflow.io/docs/latest/git.html#git-configuration)
file. This can be done once by creating `$HOME/.nextflow/scm` and adding
the following:

    providers {
      vsh {
        platform = 'gitlab'
        server = "packages.viash-hub.com"
      }
    }

Alternatively, a custom location for the SCM file can be specified using
the `NXF_SCM_FILE` environment variable.

You can check if everything is working by getting the `--help` for a
workflow:

``` bash
nextflow run \
  vsh/demultiplex \
  -r v0.3.9 \
  --help
```

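The SCM file described in this section can also be created from the shell. The sketch below is one possible way to do it, assuming no SCM file exists yet: the helper name `write_scm_file` is hypothetical, and it deliberately refuses to overwrite an existing file, since that file may already contain other providers.

```bash
#!/usr/bin/env bash

# Hypothetical helper: write the provider block from this section to the
# given path. Refuses to overwrite an existing file, since that file may
# already hold other provider settings.
write_scm_file() {
  local scm_file="$1"
  if [ -e "${scm_file}" ]; then
    echo "refusing to overwrite existing file: ${scm_file}" >&2
    return 1
  fi
  mkdir -p "$(dirname "${scm_file}")"
  cat > "${scm_file}" <<'EOF'
providers {
  vsh {
    platform = 'gitlab'
    server = "packages.viash-hub.com"
  }
}
EOF
}

# Example: honor NXF_SCM_FILE when set, else use the default location.
# write_scm_file "${NXF_SCM_FILE:-$HOME/.nextflow/scm}"
```

If an SCM file is already present, add the `vsh` provider to it by hand instead.
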
### Run from the CLI

Running from the CLI directly without using Viash Hub is possible as
well. The easiest way is to use the integrated help functionality, for
instance using the following:

``` bash
nextflow run vsh/demultiplex \
  -revision v0.3.9 \
  -main-script target/nextflow/runner/main.nf \
  --help
```

Having this project available locally, you can run the following
command:

``` bash
nextflow run vsh/demultiplex \
  -r v0.3.9 \
  -main-script target/nextflow/runner/main.nf \
  --input "gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2" \
  --demultiplexer bclconvert \
  --skip_copycomplete_check \
  --publish_dir example_output/ \
  -profile docker \
  -c src/config/labels.config
```

### (Optional) Resource usage tuning

Nextflow’s labels can be used to specify the amount of resources a
process can use. This workflow uses the following labels for CPU and
memory:

- `verylowmem`, `lowmem`, `midmem`, `highmem`
- `verylowcpu`, `lowcpu`, `midcpu`, `highcpu`

The defaults for these labels can be found at
`src/config/labels.config`. Nextflow checks that the specified resources
for a process do not exceed what is available on the machine and will
not start if they do. Create your own config file to tune the labels to
your needs, for example:

    // Resource labels
    withLabel: verylowcpu { cpus = 2 }
    withLabel: lowcpu { cpus = 8 }
    withLabel: midcpu { cpus = 16 }
    withLabel: highcpu { cpus = 16 }

    withLabel: verylowmem { memory = 4.GB }
    withLabel: lowmem { memory = 8.GB }
    withLabel: midmem { memory = 8.GB }
    withLabel: highmem { memory = 8.GB }

When starting nextflow using the CLI, you can use `-c` to provide the
file to nextflow and override the defaults.

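As a sketch of that last step, the snippet below writes a custom config file and shows, in a comment, how it could be passed with `-c`. It assumes the `withLabel` selectors are placed inside a `process` scope, which is where Nextflow expects them. The file name `custom_labels.config`, the helper name `write_label_config` and the chosen values are illustrative only.

```bash
#!/usr/bin/env bash

# Hypothetical helper: write a small Nextflow config that overrides two labels.
write_label_config() {
  local config_file="$1"
  cat > "${config_file}" <<'EOF'
// Illustrative overrides for a small machine
process {
  withLabel: highcpu { cpus = 4 }
  withLabel: highmem { memory = 8.GB }
}
EOF
}

# Example (remaining arguments as in the CLI section):
# write_label_config custom_labels.config
# nextflow run vsh/demultiplex -r v0.3.9 \
#   -main-script target/nextflow/runner/main.nf \
#   -c custom_labels.config --help
```

Only the labels that need different values have to be listed. Which values suit a given machine depends on its available CPUs and memory.
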
## Acknowledgements

Developed in collaboration with Data Intuitive and Open Analytics.