Build branch openpipeline_composed/set-public-scope with version set-public-scope to openpipeline_composed on branch set-public-scope (5068271)
Build pipeline: openpipelines-bio.openpipeline-composed.main-vnhhz
Source commit: 50682718d6
Source message: Merge branch 'set-public-scope' of github.com:openpipelines-bio/openpipeline_composed into set-public-scope
26
.gitignore
vendored
Normal file
@@ -0,0 +1,26 @@
# IDEs and editors
/.idea
.project
.classpath
*.launch
.settings/
.vscode

# Temp
gitignore
test_results

# System Files
.DS_Store
Thumbs.db

# Nextflow
work
.nextflow*
trace-*.txt

# viash
/resources_test/

# pycache
*__pycache__*
19
CHANGELOG.md
Normal file
@@ -0,0 +1,19 @@
# openpipeline_composed x.x.x

## MINOR CHANGES

* `workflows/single_cell/process_integrate_annotate`: Set scope to `private` (PR #6).

* Bump `openpipeline` dependency version to `v4.0.4` (PR #9).

* Bump `viash` version to `0.9.7` (PR #10).

# openpipeline_composed 0.1.1

## MINOR CHANGES

* Add a README (PR #4).

# openpipeline_composed 0.1.0

Initial release containing a single-cell meta-workflow to process single-cell omics samples and perform batch integration and/or label projection.
21
LICENSE
Normal file
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 openpipelines-bio

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
42
README.md
Normal file
@@ -0,0 +1,42 @@
# OpenPipeline Composed

OpenPipeline Composed provides a comprehensive meta-workflow that combines multiple stand-alone workflows from the [OpenPipeline](https://github.com/openpipelines-bio/openpipeline/) package. It chains sample processing, batch integration, and cell type annotation into a single pipeline for single-cell multi-omics data analysis.

[](https://www.viash-hub.com/packages/openpipeline_composed)
[](https://github.com/openpipelines-bio/openpipeline_composed)
[](https://github.com/openpipelines-bio/openpipeline_composed/blob/main/LICENSE)
[](https://github.com/openpipelines-bio/openpipeline_composed/issues)
[](https://viash.io)

## Overview

The sole purpose of this package is to provide a meta-workflow that orchestrates and combines various stand-alone workflows from the [OpenPipeline](https://github.com/openpipelines-bio/openpipeline/) package. Because the processing steps are integrated into a single workflow, data can be taken from raw input to fully annotated, integrated datasets suitable for downstream analysis and atlas generation in one run.

## Functionality

The meta-workflow combines three core OpenPipeline workflows:
- [**Sample Processing**](https://www.viash-hub.com/packages/openpipeline/latest/components/workflows/multiomics/process_samples): Initial quality control, filtering, and preprocessing
- [**Batch Integration**](https://www.viash-hub.com/packages/openpipeline/latest/components?search=workflows%2Fintegration): Integration using **Harmony** or **scVI** methods
- [**Cell Type Annotation**](https://www.viash-hub.com/packages/openpipeline/latest/components?search=workflows%2Fannotation): Annotation using **scANVI** or **CellTypist** methods

## Key Features

- 🔄 **End-to-End Processing**: Complete pipeline from raw data to annotated results
- 📊 **Atlas Generation**: Create comprehensive atlases from multiple datasets and sources
- 🔬 **Multi-Modal Support**: Process RNA-seq, ATAC-seq, protein, and spatial data
- 🎯 **Method Flexibility**: Choose from multiple integration and annotation approaches
- 🧬 **Reference Integration**: Leverage existing reference datasets for annotation

## Execution via CLI or Seqera Cloud

The openpipeline_composed package is available via [Viash Hub](https://www.viash-hub.com/packages/openpipeline_composed/latest/), where you can find instructions for running the end-to-end workflow as well as individual subworkflows and components.

The workflow can also be run directly from Seqera Cloud: the Nextflow schema files needed for the form-based input are built and shipped with the workflows. However, Seqera Cloud cannot handle multiple-value parameters, which are needed to process several samples in one batch. We therefore recommend launching the workflow on Seqera Cloud via Viash Hub as well:

* Navigate to the [Viash Hub package page](https://www.viash-hub.com/packages/openpipeline_composed/latest/), select the workflow you want to launch and click the `launch` button.
* Select the execution environment of choice (e.g. `Seqera Cloud`, `Nextflow` or `Executable`).
* Fill in the form with the required parameters and launch the workflow.

## Support

For issues specific to the composed meta-workflow, please use the [GitHub issues tracker](https://github.com/openpipelines-bio/openpipeline_composed/issues). For general OpenPipeline questions, refer to the main [OpenPipeline documentation](https://openpipelines.bio/).
32
_viash.yaml
Normal file
@@ -0,0 +1,32 @@
viash_version: 0.9.7
source: src
target: target
name: openpipeline_composed
organization: vsh
links:
  repository: https://github.com/openpipelines-bio/openpipeline_composed
  docker_registry: ghcr.io
repositories:
  - name: openpipeline
    repo: openpipeline
    type: vsh
    tag: v4.0.4
  - name: openpipeline_qc
    repo: openpipeline_qc
    type: vsh
    tag: v0.2.2
  - name: biobox
    repo: biobox
    type: vsh
    tag: v0.4.2
info:
  test_resources:
    - type: s3
      path: s3://openpipelines-bio/openpipeline_incubator/resources_test
      dest: resources_test
config_mods: |
  .requirements.commands := ['ps']
  .runners[.type == 'nextflow'].directives.tag := '$id'
  .resources += {path: '/src/configs/labels.config', dest: 'nextflow_labels.config'}
  .runners[.type == 'nextflow'].config.script := 'includeConfig("nextflow_labels.config")'
version: set-public-scope
0
nextflow.config
Normal file
170
resources_test_scripts/10x_5k_anticmv.sh
Normal file
@@ -0,0 +1,170 @@
#!/bin/bash

set -eo pipefail


# ensure that the command below is run from the root of the repository
REPO_ROOT=$(git rev-parse --show-toplevel)
cd "$REPO_ROOT"

# settings
ID=10x_5k_anticmv
OUT=resources_test/$ID

# create raw directory
raw_dir="$OUT/raw"
mkdir -p "$raw_dir"

# Check whether seqkit is available
if ! command -v seqkit &> /dev/null; then
  echo "This script requires seqkit. Please make sure the binary is added to your PATH."
  exit 1
fi

# dataset page:
# https://www.10xgenomics.com/resources/datasets/integrated-gex-totalseqc-and-tcr-analysis-of-connect-generated-library-from-5k-cmv-t-cells-2-standard

# check whether reference is available
reference_dir="resources_test/reference_gencodev41_chr1/"
genome_tar="$reference_dir/reference_cellranger.tar.gz"
if [[ ! -f "$genome_tar" ]]; then
  echo "$genome_tar does not exist. Please create the reference genome first"
  exit 1
fi

# download and untar source fastq files
tar_dir="$HOME/.cache/openpipeline/5k_human_antiCMV_T_TBNK_connect_Multiplex"
if [[ ! -d "$tar_dir" ]]; then
  mkdir -p "$tar_dir"

  # download fastqs and untar
  wget "https://s3-us-west-2.amazonaws.com/10x.files/samples/cell-vdj/6.1.2/5k_human_antiCMV_T_TBNK_connect_Multiplex/5k_human_antiCMV_T_TBNK_connect_Multiplex_fastqs.tar" -O "$tar_dir.tar"
  tar -xvf "$tar_dir.tar" -C "$tar_dir" --strip-components=1
  rm "$tar_dir.tar"
fi

function seqkit_head {
  input="$1"
  output="$2"
  if [[ ! -f "$output" ]]; then
    echo "> Processing $(basename "$input")"
    seqkit head -n 200000 "$input" | gzip > "$output"
  fi
}

orig_sample_id="5k_human_antiCMV_T_TBNK_connect"

seqkit_head "$tar_dir/gex_1/${orig_sample_id}_GEX_1_S1_L001_R1_001.fastq.gz" "$raw_dir/${orig_sample_id}_GEX_1_subset_S1_L001_R1_001.fastq.gz"
seqkit_head "$tar_dir/gex_1/${orig_sample_id}_GEX_1_S1_L001_R2_001.fastq.gz" "$raw_dir/${orig_sample_id}_GEX_1_subset_S1_L001_R2_001.fastq.gz"

seqkit_head "$tar_dir/ab/${orig_sample_id}_AB_S2_L004_R1_001.fastq.gz" "$raw_dir/${orig_sample_id}_AB_subset_S2_L004_R1_001.fastq.gz"
seqkit_head "$tar_dir/ab/${orig_sample_id}_AB_S2_L004_R2_001.fastq.gz" "$raw_dir/${orig_sample_id}_AB_subset_S2_L004_R2_001.fastq.gz"

seqkit_head "$tar_dir/vdj/${orig_sample_id}_VDJ_S1_L001_R1_001.fastq.gz" "$raw_dir/${orig_sample_id}_VDJ_subset_S1_L001_R1_001.fastq.gz"
seqkit_head "$tar_dir/vdj/${orig_sample_id}_VDJ_S1_L001_R2_001.fastq.gz" "$raw_dir/${orig_sample_id}_VDJ_subset_S1_L001_R2_001.fastq.gz"

# download the antibody feature reference CSV if needed
feature_reference="$raw_dir/feature_reference.csv"
if [[ ! -f "$feature_reference" ]]; then
  wget "https://cf.10xgenomics.com/samples/cell-vdj/6.1.2/5k_human_antiCMV_T_TBNK_connect_Multiplex/5k_human_antiCMV_T_TBNK_connect_Multiplex_count_feature_reference.csv" -O "$feature_reference"
fi

# download vdj reference if needed
vdj_ref="$raw_dir/refdata-cellranger-vdj-GRCh38-alts-ensembl-7.0.0.tar.gz"
if [[ ! -f "$vdj_ref" ]]; then
  wget "https://cf.10xgenomics.com/supp/cell-vdj/refdata-cellranger-vdj-GRCh38-alts-ensembl-7.0.0.tar.gz" -O "$vdj_ref"
fi


# Run mapping pipeline
cat > /tmp/params.yaml << HERE
param_list:
  - id: "$ID"
    input: "$raw_dir"
    library_id:
      - "${orig_sample_id}_GEX_1_subset"
      - "${orig_sample_id}_AB_subset"
      - "${orig_sample_id}_VDJ_subset"
    library_type:
      - "Gene Expression"
      - "Antibody Capture"
      - "VDJ"

gex_reference: "$genome_tar"
vdj_reference: "$vdj_ref"
feature_reference: "$feature_reference"
HERE

nextflow \
  run https://packages.viash-hub.com/vsh/openpipeline \
  -r v4.0.4 \
  -main-script target/nextflow/mapping/cellranger_multi/main.nf \
  -resume \
  --publish_dir "$OUT/processed" \
  -profile docker,mount_temp \
  -params-file /tmp/params.yaml \
  -c ./src/configs/labels_ci.config

# Convert to h5mu
cat > /tmp/params.yaml << HERE
id: "$orig_sample_id"
input: "$OUT/processed/10x_5k_anticmv.cellranger_multi.output"
publish_dir: "$OUT/"
output: "*.h5mu"
HERE

nextflow \
  run https://packages.viash-hub.com/vsh/openpipeline \
  -r v4.0.4 \
  -main-script target/nextflow/convert/from_cellranger_multi_to_h5mu/main.nf \
  -resume \
  -profile docker,mount_temp \
  -params-file /tmp/params.yaml \
  -c ./src/configs/labels_ci.config

mv "$OUT/0.h5mu" "$OUT/${orig_sample_id}.h5mu"


# run qc workflow
cat > /tmp/params.yaml << HERE
id: "$ID"
input: "$OUT/$orig_sample_id.h5mu"
var_name_mitochondrial_genes: mitochondrial
var_name_ribosomal_genes: ribosomal
publish_dir: "$OUT/"
output: "${orig_sample_id}_qc.h5mu"
HERE

nextflow \
  run https://packages.viash-hub.com/vsh/openpipeline \
  -r v4.0.4 \
  -main-script target/nextflow/workflows/qc/qc/main.nf \
  -resume \
  -profile docker,mount_temp \
  -params-file /tmp/params.yaml \
  -c ./src/configs/labels_ci.config


# Run full pipeline
cat > /tmp/params.yaml << HERE
id: "$ID"
input: "$OUT/${orig_sample_id}_qc.h5mu"
publish_dir: "$OUT/"
output: "${orig_sample_id}_mms.h5mu"
HERE

nextflow \
  run https://packages.viash-hub.com/vsh/openpipeline \
  -r v4.0.4 \
  -main-script target/nextflow/workflows/multiomics/process_samples/main.nf \
  -resume \
  -profile docker,mount_temp \
  -params-file /tmp/params.yaml \
  -c ./src/configs/labels_ci.config

aws s3 sync \
  "$OUT" \
  s3://openpipelines-bio/openpipeline_incubator/resources_test/"$ID" \
  --exclude "*.yaml" \
  --delete \
  --dryrun
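The `seqkit_head` helper in the script above keeps only the first 200,000 reads of each FASTQ file so that the test data stays small. For readers without seqkit, the same subsetting can be sketched in Python using only the standard library. This is a minimal sketch that is not part of the commit: `fastq_head` is a hypothetical helper. It assumes gzipped input in which every record spans exactly four lines (no line-wrapped sequences). That holds for typical Illumina output, but the code does not check it.

```python
import gzip
from itertools import islice


def fastq_head(input_path: str, output_path: str, n_records: int = 200_000) -> int:
    """Write the first `n_records` reads of a gzipped FASTQ file to a new gzipped file.

    Hypothetical stand-in for `seqkit head -n N in.fastq.gz | gzip > out.fastq.gz`.
    Assumes every record spans exactly four lines (header, sequence, '+', qualities).
    Returns the number of complete records written.
    """
    n_lines = 0
    with gzip.open(input_path, "rt") as src, gzip.open(output_path, "wt") as dst:
        # islice stops early, so only the head of a large file is ever decompressed
        for line in islice(src, 4 * n_records):
            dst.write(line)
            n_lines += 1
    return n_lines // 4
```

The script's `[[ ! -f "$output" ]]` guard, which skips files that were already subset, is left to the caller here.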
166
resources_test_scripts/annotation_test_data.sh
Normal file
@@ -0,0 +1,166 @@
#!/bin/bash

set -eo pipefail

# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)

# ensure that the command below is run from the root of the repository
cd "$REPO_ROOT"

ID=annotation_test_data
OUT=resources_test/$ID/

# ideally, this would be a versioned pipeline run
[ -d "$OUT" ] || mkdir -p "$OUT"

# Download Tabula Sapiens Blood reference h5ad from https://doi.org/10.5281/zenodo.7587774
wget "https://zenodo.org/record/7587774/files/TS_Blood_filtered.h5ad?download=1" -O "${OUT}/tmp_TS_Blood_filtered.h5ad"

# Download Tabula Sapiens Blood pretrained model from https://doi.org/10.5281/zenodo.7580707
wget "https://zenodo.org/record/7580707/files/pretrained_models_Blood_ts.tar.gz?download=1" -O "${OUT}/tmp_pretrained_models_Blood_ts.tar.gz"

# Download PopV specific CL ontology files - needed for OnClass
# OUT_ONTOLOGY="${OUT}/ontology"
# [ -d "$OUT_ONTOLOGY" ] || mkdir -p "$OUT_ONTOLOGY"
# wget https://raw.githubusercontent.com/czbiohub/PopV/main/ontology/cl.obo \
#   -O "${OUT_ONTOLOGY}/cl.obo"
# wget https://raw.githubusercontent.com/czbiohub/PopV/main/ontology/cl.ontology \
#   -O "${OUT_ONTOLOGY}/cl.ontology"
# wget https://raw.githubusercontent.com/czbiohub/PopV/main/ontology/cl.ontology.nlp.emb \
#   -O "${OUT_ONTOLOGY}/cl.ontology.nlp.emb"


# Process Tabula Sapiens Blood reference h5ad
# (Select one individual and 100 cells per cell type)
# normalize and log1p transform data
# Add treatment and disease columns
python <<HEREDOC
import anndata as ad
import scanpy as sc
import numpy as np

# Read in data
ref_adata = ad.read_h5ad("${OUT}/tmp_TS_Blood_filtered.h5ad")
sub_ref_adata = ref_adata[ref_adata.obs["donor_assay"] == "TSP14_10x 3' v3"]
n=100
s=sub_ref_adata.obs.groupby('cell_ontology_class').cell_ontology_class.transform('count')
sub_ref_adata_final = sub_ref_adata[sub_ref_adata.obs[s>=n].groupby('cell_ontology_class').head(n).index]

# Normalize and log1p transform data
data_for_scanpy = ad.AnnData(X=sub_ref_adata_final.X)
sc.pp.normalize_total(data_for_scanpy, target_sum=10000)
sc.pp.log1p(
    data_for_scanpy,
    base=None,
    layer=None,
    copy=False,
)
sub_ref_adata_final.layers["log_normalized"] = data_for_scanpy.X

# Add treatment and disease columns
n_cells = sub_ref_adata_final.n_obs
treatment = np.random.choice(["ctrl", "stim"], size=n_cells, p=[0.5, 0.5])
disease = np.random.choice(["healthy", "diseased"], size=n_cells, p=[0.5, 0.5])
sub_ref_adata_final.obs["treatment"] = treatment
sub_ref_adata_final.obs["disease"] = disease

# Write out data
sub_ref_adata_final.write("${OUT}/TS_Blood_filtered.h5ad", compression='gzip')
HEREDOC


echo "> Converting to h5mu"
viash run src/convert/from_h5ad_to_h5mu/config.vsh.yaml --engine docker -- \
  --input "${OUT}/TS_Blood_filtered.h5ad" \
  --output "${OUT}/TS_Blood_filtered.h5mu" \
  --modality "rna"

rm "${OUT}/tmp_TS_Blood_filtered.h5ad"

echo "> Downloading pretrained CellTypist model and sample test data"
wget https://celltypist.cog.sanger.ac.uk/models/Pan_Immune_CellTypist/v2/Immune_All_Low.pkl \
  -O "${OUT}/celltypist_model_Immune_All_Low.pkl"
wget https://celltypist.cog.sanger.ac.uk/Notebook_demo_data/demo_2000_cells.h5ad \
  -O "${OUT}/demo_2000_cells.h5ad"
viash run src/convert/from_h5ad_to_h5mu/config.vsh.yaml --engine docker -- \
  --input "${OUT}/demo_2000_cells.h5ad" \
  --output "${OUT}/demo_2000_cells.h5mu" \
  --modality "rna"


echo "> Fetching OnClass data and models"
OUT_ONTOLOGY="${OUT}/ontology"
[ -d "$OUT_ONTOLOGY" ] || mkdir -p "$OUT_ONTOLOGY"
wget https://figshare.com/ndownloader/files/28394466 -O "${OUT_ONTOLOGY}/OnClass_data_public_minimal.tar.gz"
tar -xzvf "${OUT_ONTOLOGY}/OnClass_data_public_minimal.tar.gz" -C "${OUT_ONTOLOGY}" --strip-components=2
rm "${OUT_ONTOLOGY}/allen.ontology"
rm "${OUT_ONTOLOGY}/OnClass_data_public_minimal.tar.gz"

wget https://figshare.com/ndownloader/files/28394541 -O "${OUT}/OnClass_models.tar.gz"
tar -xzvf "${OUT}/OnClass_models.tar.gz" -C "${OUT}" --strip-components=1
rm "${OUT}/OnClass_models.tar.gz"
rm "${OUT}/tmp_pretrained_models_Blood_ts.tar.gz"

find "${OUT}/Pretrained_model" ! -name "example_file_model*" -type f -exec rm -f {} +
mv "${OUT}/Pretrained_model" "${OUT}/onclass_model"

echo "> Creating simple SCVI model"
viash run src/integrate/scvi/config.vsh.yaml --engine docker -- \
  --input "${OUT}/TS_Blood_filtered.h5mu" \
  --obs_batch "donor_id" \
  --var_gene_names "ensemblid" \
  --output "${OUT}/scvi_output.h5mu" \
  --output_model "${OUT}/scvi_model" \
  --max_epochs 5 \
  --n_obs_min_count 10 \
  --n_var_min_count 10

echo "> Creating SCVI model with covariates"
viash run src/integrate/scvi/config.vsh.yaml --engine docker -- \
  --input "${OUT}/TS_Blood_filtered.h5mu" \
  --obs_batch "donor_id" \
  --var_gene_names "ensemblid" \
  --obs_categorical_covariate "assay" \
  --obs_categorical_covariate "donor_assay" \
  --output "${OUT}/scvi_covariate_output.h5mu" \
  --output_model "${OUT}/scvi_covariate_model" \
  --max_epochs 5 \
  --n_obs_min_count 10 \
  --n_var_min_count 10

echo "> Creating simple SCANVI model"
viash run src/annotate/scanvi/config.vsh.yaml --engine docker -- \
  --input "${OUT}/TS_Blood_filtered.h5mu" \
  --var_gene_names "ensemblid" \
  --obs_labels "cell_ontology_class" \
  --scvi_model "${OUT}/scvi_model" \
  --output "${OUT}/scanvi_output.h5mu" \
  --output_model "${OUT}/scanvi_model" \
  --max_epochs 5

echo "> Creating SCANVI model with covariates"
viash run src/annotate/scanvi/config.vsh.yaml --engine docker -- \
  --input "${OUT}/TS_Blood_filtered.h5mu" \
  --var_gene_names "ensemblid" \
  --obs_labels "cell_ontology_class" \
  --scvi_model "${OUT}/scvi_covariate_model" \
  --output "${OUT}/scanvi_covariate_output.h5mu" \
  --output_model "${OUT}/scanvi_covariate_model" \
  --max_epochs 5

rm "${OUT}/scanvi_output.h5mu"
rm "${OUT}/scanvi_covariate_output.h5mu"
rm "${OUT}/scvi_output.h5mu"
rm "${OUT}/scvi_covariate_output.h5mu"
# Pretrained_model was already moved to onclass_model above; -f keeps `set -e` from aborting here
rm -rf "${OUT}/Pretrained_model/"

echo "> Creating Pseudobulk Data for DGEA"
viash run src/differential_expression/create_pseudobulk/config.vsh.yaml --engine docker -- \
  --input "${OUT}/TS_Blood_filtered.h5mu" \
  --obs_grouping "cell_type" \
  --obs_sample_conditions "donor_id" \
  --obs_sample_conditions "treatment" \
  --obs_sample_conditions "disease" \
  --min_num_cells_per_sample 5 \
  --output "${OUT}/TS_Blood_filtered_pseudobulk.h5mu"
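When the reference is subset in the Python heredoc above, only cell types with at least `n` cells are kept, and the first `n` cells of each are taken. The pandas idiom behind this (`groupby(...).transform("count")` followed by `groupby(...).head(n)`) can be isolated as a small, testable function. This is a sketch, not code from the commit: `subsample_per_group` is a hypothetical name, and it operates on a plain `DataFrame` that stands in for `adata.obs`.

```python
import pandas as pd


def subsample_per_group(obs: pd.DataFrame, group_col: str, n: int) -> pd.Index:
    """Return the index of the first `n` rows of every group that has at least `n` rows.

    Groups with fewer than `n` rows are dropped entirely; larger groups are
    truncated to their first `n` rows, in their original order.
    """
    # Size of each row's group, aligned with obs (one value per row)
    group_size = obs.groupby(group_col)[group_col].transform("count")
    # Keep rows from sufficiently large groups, then take n rows per group
    kept = obs[group_size >= n]
    return kept.groupby(group_col).head(n).index
```

With an AnnData object, the result would be used as `adata[subsample_per_group(adata.obs, "cell_ontology_class", 100)]`, which mirrors the `n=100` used in the script.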
151
resources_test_scripts/pbmc_1k_protein_v3.sh
Normal file
@@ -0,0 +1,151 @@
#!/bin/bash

set -eo pipefail

# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)

# ensure that the command below is run from the root of the repository
cd "$REPO_ROOT"

ID=pbmc_1k_protein_v3
OUT=resources_test/$ID/$ID
DIR=$(dirname "$OUT")

# ideally, this would be a versioned pipeline run
[ -d "$DIR" ] || mkdir -p "$DIR"

# dataset page:
# https://www.10xgenomics.com/resources/datasets/1-k-pbm-cs-from-a-healthy-donor-gene-expression-and-cell-surface-protein-3-standard-3-0-0

# download metrics summary
wget https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_protein_v3/pbmc_1k_protein_v3_metrics_summary.csv \
  -O "${OUT}_metrics_summary.csv"

# download counts h5 file
wget https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_protein_v3/pbmc_1k_protein_v3_filtered_feature_bc_matrix.h5 \
  -O "${OUT}_filtered_feature_bc_matrix.h5"

wget https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_protein_v3/pbmc_1k_protein_v3_raw_feature_bc_matrix.h5 \
  -O "${OUT}_raw_feature_bc_matrix.h5"

# download counts matrix tar gz file
wget https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_protein_v3/pbmc_1k_protein_v3_filtered_feature_bc_matrix.tar.gz \
  -O "${OUT}_filtered_feature_bc_matrix.tar.gz"

# extract matrix tar gz
mkdir -p "${OUT}_filtered_feature_bc_matrix"
tar -xvf "${OUT}_filtered_feature_bc_matrix.tar.gz" \
  -C "${OUT}_filtered_feature_bc_matrix" \
  --strip-components 1
rm "${OUT}_filtered_feature_bc_matrix.tar.gz"

cat > /tmp/params.yaml << HERE
id: "$ID"
input: "${OUT}_filtered_feature_bc_matrix.h5"
input_metrics_summary: "${OUT}_metrics_summary.csv"
output: "$(basename "$OUT")_filtered_feature_bc_matrix.h5mu"
HERE

# convert 10x h5 to h5mu
nextflow run https://packages.viash-hub.com/vsh/openpipeline \
  -latest \
  -r v4.0.4 \
  -main-script target/nextflow/convert/from_10xh5_to_h5mu/main.nf \
  -profile docker \
  -c ./src/configs/labels_ci.config \
  -params-file /tmp/params.yaml \
  --publish_dir "$DIR" \
  -resume

# run single sample
nextflow \
  run . \
  -main-script target/nextflow/workflows/rna/rna_singlesample/main.nf \
  -c src/workflows/utils/labels_ci.config \
  -profile docker \
  --id pbmc_1k_protein_v3_uss \
  --input "${OUT}_filtered_feature_bc_matrix.h5mu" \
  --output "$(basename "$OUT")_uss.h5mu" \
  --publishDir "$DIR" \
  -resume

# add the sample ID to the mudata object
nextflow \
  run . \
  -main-script target/nextflow/metadata/add_id/main.nf \
  -c src/workflows/utils/labels_ci.config \
  -profile docker \
  --id pbmc_1k_protein_v3_uss \
  --input "${OUT}_uss.h5mu" \
  --input_id "pbmc_1k_protein_v3_uss" \
  --output "$(basename "$OUT")_uss_with_id.h5mu" \
  --output_compression "gzip" \
  --publishDir "$DIR" \
  -resume

# run multisample
nextflow \
  run . \
  -main-script target/nextflow/workflows/rna/rna_multisample/main.nf \
  -c src/workflows/utils/labels_ci.config \
  -profile docker \
  --id pbmc_1k_protein_v3_ums \
  --input "${OUT}_uss_with_id.h5mu" \
  --output "$(basename "$OUT")_ums.h5mu" \
  --publishDir "$DIR" \
  -resume

rm "${OUT}_uss_with_id.h5mu"

# run dimred
nextflow \
  run . \
  -main-script target/nextflow/workflows/multiomics/dimensionality_reduction/main.nf \
  -c src/workflows/utils/labels_ci.config \
  -profile docker \
  --id pbmc_1k_protein_v3_mms \
  --input "${OUT}_ums.h5mu" \
  --output "$(basename "$OUT")_mms.h5mu" \
  --publishDir "$DIR" \
  --obs_covariates sample_id \
  -resume

# run integration
nextflow \
  run . \
  -main-script target/nextflow/workflows/integration/harmony_leiden/main.nf \
  -c src/workflows/utils/labels_ci.config \
  -profile docker \
  --id pbmc_1k_protein_v3_mms_integration \
  --input "${OUT}_mms.h5mu" \
  --output "$(basename "$OUT")_mms.h5mu" \
  --publishDir "$DIR" \
  --obs_covariates sample_id \
  -resume

python <<HEREDOC
import mudata as mu
mudata = mu.read_h5mu("${DIR}/pbmc_1k_protein_v3_filtered_feature_bc_matrix.h5mu")
mudata.mod["rna"].write_h5ad("${DIR}/pbmc_1k_protein_v3_filtered_feature_bc_matrix_rna.h5ad")
HEREDOC

# all outputs are written next to each other in $DIR (OUT is only a file-name prefix)
aws s3 sync \
  "$DIR" \
  s3://openpipelines-bio/openpipeline_incubator/resources_test/"$ID" \
  --exclude "*.yaml" \
  --delete \
  --dryrun
166
resources_test_scripts/qc_sample_data.sh
Executable file
@@ -0,0 +1,166 @@
|
||||
#/bin/bash
|
||||
|
||||
OUT_DIR=resources_test/qc_sample_data
|
||||
OUT_DIR_SPATIAL=resources_test/spatial_qc_sample_data
|
||||
|
||||
[ ! -d "$OUT_DIR" ] && mkdir -p "$OUT_DIR"
|
||||
[ ! -d "$OUT_DIR_SPATIAL" ] && mkdir -p "$OUT_DIR_SPATIAL"
|
||||
|
||||
# fetch/create h5mu from somewhere
|
||||
cat > /tmp/params_create_h5mu.yaml <<EOF
|
||||
param_list:
|
||||
- id: sample_one
|
||||
input_id: sample_one
|
||||
input: s3://openpipelines-data/10x_5k_anticmv/5k_human_antiCMV_T_TBNK_connect_qc.h5mu
|
||||
- id: sample_two
|
||||
input_id: sample_two
|
||||
input: s3://openpipelines-data/10x_5k_anticmv/5k_human_antiCMV_T_TBNK_connect_qc.h5mu
|
||||
output: '\$id.qc.h5mu'
|
||||
output_compression: gzip
|
||||
publish_dir: "$OUT_DIR"
|
||||
EOF
|
||||
|
||||
# add the sample ID to the mudata object
|
||||
nextflow run openpipelines-bio/openpipeline \
|
||||
-latest \
|
||||
-r 2.1.2 \
|
||||
-main-script target/nextflow/metadata/add_id/main.nf \
|
||||
-c src/configs/labels_ci.config \
|
||||
-profile docker \
|
||||
-params-file /tmp/params_create_h5mu.yaml \
|
||||
-resume
|
||||
|
||||
cat > /tmp/params_subset.yaml <<EOF
|
||||
param_list:
|
||||
- id: sample_one
|
||||
input: resources_test/qc_sample_data/sample_one.qc.h5mu
|
||||
- id: sample_two
|
||||
input: resources_test/qc_sample_data/sample_two.qc.h5mu
|
||||
output: '\$id.qc.h5mu'
|
||||
number_of_observations: 10000
|
||||
output_compression: gzip
|
||||
publish_dir: "$OUT_DIR"
|
||||
EOF
|
||||
|
||||
# subset h5mus
|
||||
nextflow run openpipelines-bio/openpipeline \
|
||||
-latest \
|
||||
-r 2.1.2 \
|
||||
-main-script target/nextflow/filter/subset_h5mu/main.nf \
|
||||
-c src/configs/labels_ci.config \
|
||||
-profile docker \
|
||||
-params-file /tmp/params_subset.yaml \
|
||||
-resume
|
||||
|
||||
cat > /tmp/add_metadata_obs.py <<EOF
|
||||
import mudata as mu
|
||||
import glob
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
import os
|
||||
|
||||
# Directory containing the h5mu files
|
||||
out_dir = "$(pwd)/resources_test/qc_sample_data"
|
||||
|
||||
# List of h5mu files
|
||||
h5mu_files = glob.glob(os.path.join(out_dir, "*.h5mu"))
|
||||
print(f"Found {len(h5mu_files)} h5mu files: {h5mu_files}")
|
||||
|
||||
# Metadata values to randomly assign
|
||||
donor_ids = ["donor_1", "donor_2", "donor_3"]
|
||||
cell_types = ["CD4+ T cell", "CD8+ T cell", "B cell", "NK cell", "Monocyte"]
|
||||
batches = ["batch_A", "batch_B"]
|
||||
conditions = ["treated", "control"]
|
||||
|
||||
for h5mu_file in h5mu_files:
|
||||
print(f"Processing {h5mu_file}...")
|
||||
|
||||
# Load MuData object
|
||||
mdata = mu.read_h5mu(h5mu_file)
|
||||
rna = mdata.mod["rna"]
|
||||
n_obs = rna.n_obs
|
||||
|
||||
# Generate random metadata
|
||||
np.random.seed(42 + hash(h5mu_file) % 100) # Different seed for each file but reproducible
|
||||
|
||||
# Create metadata
|
||||
rna.obs["donor_id"] = np.random.choice(donor_ids, size=n_obs)
|
||||
rna.obs["cell_type"] = np.random.choice(cell_types, size=n_obs)
|
||||
rna.obs["batch"] = np.random.choice(batches, size=n_obs)
|
||||
rna.obs["condition"] = np.random.choice(conditions, size=n_obs)
|
||||
|
||||
# Add a continuous variable too
|
||||
rna.obs["quality_score"] = np.random.uniform(0, 1, size=n_obs)
|
||||
|
||||
# Save the modified MuData object
|
||||
mu.write_h5mu(h5mu_file, mdata)
|
||||
print(f"Added metadata to {h5mu_file}")
|
||||
|
||||
print("All files processed successfully!")
|
||||
EOF
|
||||
|
||||
# Execute the Python script
|
||||
python /tmp/add_metadata_obs.py
|
||||
|
||||
# generate cellbender out for testing
|
||||
cat > /tmp/params_cellbender.yaml <<EOF
|
||||
param_list:
|
||||
- id: sample_one
|
||||
input: resources_test/qc_sample_data/sample_one.qc.h5mu
|
||||
- id: sample_two
|
||||
input: resources_test/qc_sample_data/sample_two.qc.h5mu
|
||||
output: '\$id.qc.cellbender.h5mu'
|
||||
epochs: 5
|
||||
output_compression: gzip
|
||||
publish_dir: "$OUT_DIR"
|
||||
EOF
|
||||
|
||||
nextflow run openpipelines-bio/openpipeline \
|
||||
-latest \
|
||||
-r 2.1.2 \
|
||||
-main-script target/nextflow/correction/cellbender_remove_background/main.nf \
|
||||
-c src/configs/labels_ci.config \
|
||||
-profile docker \
|
||||
-params-file /tmp/params_cellbender.yaml \
|
||||
-resume
|
||||
|
||||
# fetch spatial sample data from s3
|
||||
aws s3 sync \
|
||||
--profile di \
|
||||
s3://openpipelines-bio/openpipeline_incubator/resources_test/spatial_qc_sample_data \
|
||||
"$OUT_DIR_SPATIAL"
|
||||
|
||||
# generate json for testing
|
||||
viash run src/ingestion_qc/h5mu_to_qc_json/config.vsh.yaml --engine docker -- \
|
||||
--input "$OUT_DIR"/sample_one.qc.cellbender.h5mu \
|
||||
--input "$OUT_DIR"/sample_two.qc.cellbender.h5mu \
|
||||
--ingestion_method cellranger_multi \
|
||||
--obs_metadata "donor_id;cell_type;batch;condition" \
|
||||
--output "$OUT_DIR"/sc_dataset.json \
|
||||
--output_reporting_json "$OUT_DIR"/sc_report_structure.json
|
||||
|
||||
viash run src/ingestion_qc/h5mu_to_qc_json/config.vsh.yaml --engine docker -- \
|
||||
--input "$OUT_DIR_SPATIAL"/xenium_tiny.qc.h5mu \
|
||||
--input "$OUT_DIR_SPATIAL"/xenium_tiny.qc.h5mu \
|
||||
--ingestion_method xenium \
|
||||
--min_num_nonzero_vars 1 \
|
||||
--output "$OUT_DIR_SPATIAL"/xenium_dataset.json \
|
||||
--output_reporting_json "$OUT_DIR_SPATIAL"/xenium_report_structure.json
|
||||
|
||||
# remove all state yaml files
|
||||
rm "$OUT_DIR"/*.yaml
|
||||
rm "$OUT_DIR_SPATIAL"/*.yaml
|
||||
|
||||
# copy to s3
|
||||
aws s3 sync \
|
||||
"$OUT_DIR" \
|
||||
s3://openpipelines-bio/openpipeline_incubator/"$OUT_DIR" \
|
||||
--delete \
|
||||
--dryrun
|
||||
|
||||
|
||||
aws s3 sync \
|
||||
"$OUT_DIR_SPATIAL" \
|
||||
s3://openpipelines-bio/openpipeline_incubator/"$OUT_DIR_SPATIAL" \
|
||||
--delete \
|
||||
--dryrun
|
||||
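The metadata script above gives each `.h5mu` file its own random seed. For those seeds to be reproducible, the hash behind them must be stable across interpreter runs. Python's built-in `hash()` on strings is salted per process (see `PYTHONHASHSEED`), whereas a checksum such as `zlib.crc32` always returns the same value for the same bytes. The sketch below is a standard-library illustration (it uses `random.Random` instead of NumPy), and `per_file_rng` and `random_obs` are hypothetical helpers that are not part of this repository.

```python
import random
import zlib

DONOR_IDS = ["donor_1", "donor_2", "donor_3"]
BATCHES = ["batch_A", "batch_B"]


def per_file_rng(path: str, base_seed: int = 42) -> random.Random:
    """Return an RNG whose seed depends only on `path`, not on the process.

    zlib.crc32 is deterministic across runs and platforms, unlike the
    salted built-in hash() for str objects.
    """
    seed = base_seed + zlib.crc32(path.encode("utf-8")) % 100
    return random.Random(seed)


def random_obs(path: str, n_obs: int) -> dict:
    """Hypothetical helper: draw donor and batch labels for n_obs cells."""
    rng = per_file_rng(path)
    return {
        "donor_id": [rng.choice(DONOR_IDS) for _ in range(n_obs)],
        "batch": [rng.choice(BATCHES) for _ in range(n_obs)],
    }


if __name__ == "__main__":
    first = random_obs("sample_one.qc.h5mu", 5)
    again = random_obs("sample_one.qc.h5mu", 5)
    print(first == again)  # True: same path, same labels
```

Seeding a dedicated generator object per file, rather than the global RNG, also keeps one file's draws from influencing another's.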
74
resources_test_scripts/reference_gencodev41.sh
Normal file
@@ -0,0 +1,74 @@
|
||||
#!/bin/bash
|
||||
|
||||
set -eo pipefail
|
||||
|
||||
# ensure that the command below is run from the root of the repository
|
||||
REPO_ROOT=$(git rev-parse --show-toplevel)
|
||||
cd "$REPO_ROOT"
|
||||
|
||||
# settings
|
||||
ID=reference_gencodev41_chr1
|
||||
OUT=resources_test/$ID
|
||||
|
||||
mkdir -p "$OUT"
|
||||
|
||||
wget "https://assets.thermofisher.com/TFS-Assets/LSG/manuals/ERCC92.zip" -O "$OUT/ERCC92.zip"
|
||||
|
||||
# Download JASPAR files for reference building
|
||||
# Source of the code below: https://support.10xgenomics.com/single-cell-atac/software/release-notes/references#GRCh38-2020-A-2.0.0
|
||||
motifs_url="https://jaspar.elixir.no/download/data/2024/CORE/JASPAR2024_CORE_non-redundant_pfms_jaspar.txt"
|
||||
motifs_in="${OUT}/JASPAR2024_CORE_non-redundant_pfms_jaspar.txt"
|
||||
|
||||
if [ ! -f "$motifs_in" ]; then
|
||||
curl -sS "$motifs_url" > "$motifs_in"
|
||||
fi
|
||||
|
||||
# Change motif headers so the human-readable motif name precedes the motif
|
||||
# identifier. So ">MA0004.1 Arnt" -> ">Arnt_MA0004.1".
|
||||
motifs_modified="${OUT}/$(basename "$motifs_in").modified"
|
||||
awk '{
|
||||
if ( substr($1, 1, 1) == ">" ) {
|
||||
print ">" $2 "_" substr($1,2)
|
||||
} else {
|
||||
print
|
||||
}
|
||||
}' "$motifs_in" > "$motifs_modified"
|
||||
|
||||
|
||||
cat > /tmp/params.yaml << HERE
|
||||
param_list:
|
||||
- id: "$ID"
|
||||
genome_fasta: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/GRCh38.primary_assembly.genome.fa.gz"
|
||||
transcriptome_gtf: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/gencode.v41.annotation.gtf.gz"
|
||||
target: ["bd_rhapsody", "cellranger_arc"]
|
||||
output_fasta: "reference.fa.gz"
|
||||
output_gtf: "reference.gtf.gz"
|
||||
non_nuclear_contigs: null
|
||||
output_cellranger_arc: "reference_cellranger.tar.gz"
|
||||
output_bd_rhapsody: "reference_bd_rhapsody.tar.gz"
|
||||
bdrhap_extra_star_params: "--genomeSAindexNbases 12 --genomeSAsparseD 2"
|
||||
motifs_file: "$motifs_modified"
|
||||
subset_regex: "chr1"
|
||||
HERE
|
||||
|
||||
nextflow run https://packages.viash-hub.com/vsh/openpipeline \
|
||||
-latest \
|
||||
-r v4.0.4 \
|
||||
-main-script target/nextflow/workflows/ingestion/make_reference/main.nf \
|
||||
-profile docker \
|
||||
-c ./src/configs/labels_ci.config \
|
||||
-params-file /tmp/params.yaml \
|
||||
--publish_dir "$OUT" \
|
||||
-resume
|
||||
|
||||
rm "$motifs_modified"
|
||||
rm "$motifs_in"
|
||||
rm "$OUT/ERCC92.zip"
|
||||
|
||||
|
||||
aws s3 sync \
|
||||
"$OUT" \
|
||||
s3://openpipelines-bio/openpipeline_incubator/resources_test/"$ID" \
|
||||
--exclude "*.yaml" \
|
||||
--delete \
|
||||
--dryrun
|
||||
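The `awk` step in `reference_gencodev41.sh` rewrites JASPAR motif headers so the human-readable name comes first (`>MA0004.1 Arnt` becomes `>Arnt_MA0004.1`) and passes all other lines through unchanged. A Python equivalent, useful for checking the transformation on a few lines, could look like this. The function name is ours, and like the awk version it keeps only the first two whitespace-separated fields of a header.

```python
def rename_motif_headers(lines):
    """Mirror the awk one-liner: '>ID NAME ...' -> '>NAME_ID'."""
    out = []
    for line in lines:
        fields = line.split()  # awk's default field splitting on whitespace
        if fields and fields[0].startswith(">"):
            # awk yields an empty $2 when the header has a single field
            name = fields[1] if len(fields) > 1 else ""
            out.append(">" + name + "_" + fields[0][1:])
        else:
            out.append(line)  # matrix rows are printed unchanged
    return out


if __name__ == "__main__":
    # Header taken from the script's comment; the matrix row is illustrative.
    demo = [">MA0004.1 Arnt", "A  [ 1 2 3 ]"]
    print("\n".join(rename_motif_headers(demo)))
```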
37
resources_test_scripts/spatial_qc_sample_data.sh
Executable file
@@ -0,0 +1,37 @@
|
||||
#!/bin/bash
|
||||
|
||||
OUT_DIR=resources_test/spatial_qc_sample_data
|
||||
|
||||
[ ! -d "$OUT_DIR" ] && mkdir -p "$OUT_DIR"
|
||||
|
||||
# fetch/create h5mu from somewhere
|
||||
cat > /tmp/qc.yaml <<EOF
|
||||
param_list:
|
||||
- id: xenium_tiny
|
||||
input: s3://openpipelines-bio/openpipeline_spatial/resources_test/xenium/xenium_tiny.h5mu
|
||||
- id: Lung5_Rep2_tiny
|
||||
input: s3://openpipelines-bio/openpipeline_spatial/resources_test/cosmx/Lung5_Rep2_tiny.h5mu
|
||||
var_name_mitochondrial_genes: mitochondrial
|
||||
var_name_ribosomal_genes: ribosomal
|
||||
output: '\$id.qc.h5mu'
|
||||
output_compression: gzip
|
||||
publish_dir: "$OUT_DIR"
|
||||
EOF
|
||||
|
||||
nextflow run openpipelines-bio/openpipeline \
|
||||
-latest \
|
||||
-r 2.1.0 \
|
||||
-main-script target/nextflow/workflows/qc/qc/main.nf \
|
||||
-profile docker \
|
||||
-params-file /tmp/qc.yaml \
|
||||
-resume \
|
||||
-config src/configs/labels_ci.config
|
||||
|
||||
# copy to s3
|
||||
aws s3 sync \
|
||||
--profile di \
|
||||
resources_test/spatial_qc_sample_data \
|
||||
s3://openpipelines-bio/openpipeline_incubator/resources_test/spatial_qc_sample_data \
|
||||
--delete --dryrun \
|
||||
--exclude "*" --include "*.h5mu"
|
||||
|
||||
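The final `aws s3 sync` in `spatial_qc_sample_data.sh` combines `--exclude "*"` with `--include "*.h5mu"`. The AWS CLI documents that filters appearing later on the command line take precedence over earlier ones, so this pair uploads only `.h5mu` files. The sketch below models that rule with `fnmatch.fnmatchcase`. It is a simplification (the real CLI matches patterns against paths relative to the source directory and supports more options), and `is_synced` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def is_synced(relative_path, filters):
    """Decide whether a file is transferred.

    filters: list of ("exclude" | "include", pattern) in command-line order.
    """
    included = True  # with no matching filter, every file is synced
    for kind, pattern in filters:
        if fnmatchcase(relative_path, pattern):
            included = kind == "include"  # a later match overrides earlier ones
    return included


if __name__ == "__main__":
    filters = [("exclude", "*"), ("include", "*.h5mu")]
    for name in ["xenium_tiny.qc.h5mu", "xenium_tiny.qc.yaml", "trace.txt"]:
        print(name, is_synced(name, filters))
```

Reversing the two filters would exclude everything, because `--exclude "*"` would then be the last match for every file.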
11
src/authors/dorien_roosen.yaml
Normal file
@@ -0,0 +1,11 @@
|
||||
name: Dorien Roosen
|
||||
info:
|
||||
role: Core Team Member
|
||||
links:
|
||||
email: dorien@data-intuitive.com
|
||||
github: dorien-er
|
||||
linkedin: dorien-roosen
|
||||
organizations:
|
||||
- name: Data Intuitive
|
||||
href: https://www.data-intuitive.com
|
||||
role: Data Scientist
|
||||
11
src/authors/jakub_majercik.yaml
Normal file
@@ -0,0 +1,11 @@
|
||||
name: Jakub Majercik
|
||||
info:
|
||||
role: Contributor
|
||||
links:
|
||||
email: jakub@data-intuitive.com
|
||||
github: jakubmajercik
|
||||
linkedin: jakubmajercik
|
||||
organizations:
|
||||
- name: Data Intuitive
|
||||
href: https://www.data-intuitive.com
|
||||
role: Bioinformatics Engineer
|
||||
15
src/authors/robrecht_cannoodt.yaml
Normal file
@@ -0,0 +1,15 @@
|
||||
name: Robrecht Cannoodt
|
||||
info:
|
||||
role: Core Team Member
|
||||
links:
|
||||
email: robrecht@data-intuitive.com
|
||||
github: rcannood
|
||||
orcid: "0000-0003-3641-729X"
|
||||
linkedin: robrechtcannoodt
|
||||
organizations:
|
||||
- name: Data Intuitive
|
||||
href: https://www.data-intuitive.com
|
||||
role: Data Science Engineer
|
||||
- name: Open Problems
|
||||
href: https://openproblems.bio
|
||||
role: Core Member
|
||||
6
src/authors/weiwei_schultz.yaml
Normal file
@@ -0,0 +1,6 @@
|
||||
name: Weiwei Schultz
|
||||
info:
|
||||
role: Contributor
|
||||
organizations:
|
||||
- name: Janssen R&D US
|
||||
role: Associate Director Data Sciences
|
||||
36
src/configs/integration_tests.config
Normal file
@@ -0,0 +1,36 @@
|
||||
profiles {
|
||||
|
||||
// detect tempdir
|
||||
tempDir = java.nio.file.Paths.get(
|
||||
System.getenv('NXF_TEMP') ?:
|
||||
System.getenv('VIASH_TEMP') ?:
|
||||
System.getenv('TEMPDIR') ?:
|
||||
System.getenv('TMPDIR') ?:
|
||||
'/tmp'
|
||||
).toAbsolutePath()
|
||||
|
||||
mount_temp {
|
||||
docker.temp = tempDir
|
||||
podman.temp = tempDir
|
||||
charliecloud.temp = tempDir
|
||||
}
|
||||
|
||||
no_publish {
|
||||
process {
|
||||
withName: '.*' {
|
||||
publishDir = [
|
||||
enabled: false
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
docker {
|
||||
docker.enabled = true
|
||||
// docker.userEmulation = true
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
}
|
||||
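`integration_tests.config` resolves `tempDir` with a chain of Groovy Elvis operators (`?:`). The first of `NXF_TEMP`, `VIASH_TEMP`, `TEMPDIR` and `TMPDIR` that holds a non-empty value wins, `/tmp` is the fallback, and the `mount_temp` profile then uses that directory as the temp mount for Docker, Podman or Charliecloud. Groovy treats an empty string as false, so an empty variable is skipped just like an unset one. Python's truthiness behaves the same way, which gives the following sketch of the lookup (`resolve_temp_dir` is a hypothetical name, and `Path.absolute()` stands in for Java's `toAbsolutePath()`).

```python
import os
from pathlib import Path

TEMP_VARS = ("NXF_TEMP", "VIASH_TEMP", "TEMPDIR", "TMPDIR")


def resolve_temp_dir(env=None):
    """Return the first non-empty temp dir variable, else /tmp, as an absolute path."""
    env = os.environ if env is None else env
    for name in TEMP_VARS:
        value = env.get(name)
        if value:  # unset or empty -> fall through to the next variable
            return Path(value).absolute()
    return Path("/tmp").absolute()


if __name__ == "__main__":
    print(resolve_temp_dir())
```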
66
src/configs/labels.config
Normal file
@@ -0,0 +1,66 @@
|
||||
process {
|
||||
// Default resources for components that hardly do any processing
|
||||
memory = { 2.GB * task.attempt }
|
||||
cpus = 1
|
||||
|
||||
// Retry for exit codes that have something to do with memory issues
|
||||
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
|
||||
maxRetries = 3
|
||||
maxMemory = null
|
||||
|
||||
// CPU resources
|
||||
withLabel: singlecpu { cpus = 1 }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midcpu { cpus = 10 }
|
||||
withLabel: highcpu { cpus = 20 }
|
||||
|
||||
// Memory resources
|
||||
withLabel: verylowmem { memory = { get_memory( 4.GB * task.attempt ) } }
|
||||
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
|
||||
withLabel: midmem { memory = { get_memory( 16.GB * task.attempt ) } }
|
||||
withLabel: highmem { memory = { get_memory( 64.GB * task.attempt ) } }
|
||||
withLabel: veryhighmem { memory = { get_memory( 75.GB * task.attempt ) } }
|
||||
|
||||
// Disk space
|
||||
withLabel: lowdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: middisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: highdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: veryhighdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
// NOTE: The above labels intentionally do not have an effect by default.
|
||||
// The user should set the disk space requirements by adding the following
|
||||
// to the compute environment:
|
||||
//
|
||||
// withLabel: lowdisk { disk = { 20.GB * task.attempt } }
|
||||
// withLabel: middisk { disk = { 100.GB * task.attempt } }
|
||||
// withLabel: highdisk { disk = { 200.GB * task.attempt } }
|
||||
// withLabel: veryhighdisk { disk = { 500.GB * task.attempt } }
|
||||
}
|
||||
|
||||
def get_memory(to_compare) {
|
||||
if (!process.containsKey("maxMemory") || !process.maxMemory) {
|
||||
return to_compare
|
||||
}
|
||||
|
||||
try {
|
||||
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
|
||||
return process.maxMemory
|
||||
}
|
||||
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
|
||||
return process.maxMemory as nextflow.util.MemoryUnit
|
||||
}
|
||||
else {
|
||||
return to_compare
|
||||
}
|
||||
} catch (all) {
|
||||
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
|
||||
System.exit(1)
|
||||
}
|
||||
}
|
||||
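In `labels.config`, memory requests scale with `task.attempt`, tasks exiting with codes 137 to 140 (typically out-of-memory kills) are retried up to `maxRetries = 3` times, and `get_memory` is intended to cap each request at `process.maxMemory` when one is set, returning `maxMemory` outright on the attempt equal to `maxRetries`. The Python sketch below mirrors that logic for quick what-if checks. It is not Nextflow code, and memory is expressed as plain integers in GB.

```python
def error_strategy(exit_status):
    # The Groovy range 137..140 is inclusive on both ends.
    return "retry" if 137 <= exit_status <= 140 else "terminate"


def get_memory(requested_gb, attempt, max_memory_gb=None, max_retries=3):
    if not max_memory_gb:  # no cap configured: use the request as-is
        return requested_gb
    if max_retries and attempt == max_retries:
        return max_memory_gb  # attempt equal to maxRetries gets the maximum
    return min(requested_gb, max_memory_gb)  # otherwise cap at maxMemory


def label_memory(base_gb, attempt, max_memory_gb=None, max_retries=3):
    """Memory for a label defined as `get_memory(base_gb.GB * task.attempt)`."""
    return get_memory(base_gb * attempt, attempt, max_memory_gb, max_retries)


if __name__ == "__main__":
    # midmem (16 GB per attempt) under a 25 GB cap, attempts 1 to 4
    print([label_memory(16, a, 25) for a in range(1, 5)])
```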
105
src/configs/labels_ci.config
Normal file
@@ -0,0 +1,105 @@
|
||||
process {
|
||||
withLabel: lowmem { memory = 13.Gb }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midmem { memory = 13.Gb }
|
||||
withLabel: midcpu { cpus = 4 }
|
||||
withLabel: highmem { memory = 13.Gb }
|
||||
withLabel: highcpu { cpus = 4 }
|
||||
withLabel: veryhighmem { memory = 13.Gb }
|
||||
withLabel: lowdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: middisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: highdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: veryhighdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
}
|
||||
|
||||
env.NUMBA_CACHE_DIR = '/tmp'
|
||||
|
||||
trace {
|
||||
enabled = true
|
||||
overwrite = true
|
||||
}
|
||||
dag {
|
||||
overwrite = true
|
||||
}
|
||||
|
||||
process.maxForks = 1
|
||||
|
||||
profiles {
|
||||
// detect tempdir
|
||||
tempDir = java.nio.file.Paths.get(
|
||||
System.getenv('NXF_TEMP') ?:
|
||||
System.getenv('VIASH_TEMP') ?:
|
||||
System.getenv('TEMPDIR') ?:
|
||||
System.getenv('TMPDIR') ?:
|
||||
'/tmp'
|
||||
).toAbsolutePath()
|
||||
|
||||
mount_temp {
|
||||
docker.temp = tempDir
|
||||
podman.temp = tempDir
|
||||
charliecloud.temp = tempDir
|
||||
}
|
||||
|
||||
no_publish {
|
||||
process {
|
||||
withName: '.*' {
|
||||
publishDir = [
|
||||
enabled: false
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
docker {
|
||||
docker.fixOwnership = true
|
||||
docker.enabled = true
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
|
||||
local {
|
||||
// This config is for local processing.
|
||||
process {
|
||||
maxMemory = 25.GB
|
||||
withLabel: verylowcpu { cpus = 2 }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midcpu { cpus = 6 }
|
||||
withLabel: highcpu { cpus = 12 }
|
||||
|
||||
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
|
||||
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
|
||||
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
def get_memory(to_compare) {
|
||||
if (!process.containsKey("maxMemory") || !process.maxMemory) {
|
||||
return to_compare
|
||||
}
|
||||
|
||||
try {
|
||||
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
|
||||
return process.maxMemory
|
||||
}
|
||||
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
|
||||
return process.maxMemory as nextflow.util.MemoryUnit
|
||||
}
|
||||
else {
|
||||
return to_compare
|
||||
}
|
||||
} catch (all) {
|
||||
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
|
||||
System.exit(1)
|
||||
}
|
||||
}
|
||||
372
src/single_cell/cellranger_multi_qc/cellranger_multi.yaml
Normal file
@@ -0,0 +1,372 @@
|
||||
argument_groups:
|
||||
- name: Input files
|
||||
arguments:
|
||||
- type: file
|
||||
name: --input
|
||||
required: false
|
||||
description: |
|
||||
The FASTQ files to be analyzed. FASTQ files should conform to the naming conventions of bcl2fastq and mkfastq:
|
||||
`[Sample Name]_S[Sample Index]_L00[Lane Number]_[Read Type]_001.fastq.gz`
|
||||
example: [ mysample_S1_L001_R1_001.fastq.gz, mysample_S1_L001_R2_001.fastq.gz ]
|
||||
multiple: true
|
||||
|
||||
- name: Library arguments
|
||||
arguments:
|
||||
- type: string
|
||||
name: --library_id
|
||||
required: false
|
||||
description: |
|
||||
The Illumina sample name to analyze. This must exactly match the 'Sample Name' part
|
||||
of the FASTQ files specified in the `--input` argument.
|
||||
example: ["mysample1"]
|
||||
multiple: true
|
||||
- type: string
|
||||
name: --library_type
|
||||
required: false
|
||||
description: |
|
||||
The underlying feature type of the library.
|
||||
choices: ["Gene Expression", "VDJ", "VDJ-T", "VDJ-B", "VDJ-T-GD", "Antibody Capture",
|
||||
"CRISPR Guide Capture", "Multiplexing Capture", "Antigen Capture", "Custom"]
|
||||
example: "Gene Expression"
|
||||
multiple: true
|
||||
- type: string
|
||||
name: --library_subsample
|
||||
required: false
|
||||
description: |
|
||||
The rate at which reads from the provided FASTQ files are sampled.
|
||||
Must be strictly greater than 0 and less than or equal to 1.
|
||||
example: "0.5"
|
||||
multiple: true
|
||||
- type: string
|
||||
name: --library_lanes
|
||||
required: false
|
||||
description: Lanes associated with this sample. Defaults to using all lanes.
|
||||
example: "1-4"
|
||||
multiple: true
|
||||
- type: string
|
||||
name: "--library_chemistry"
|
||||
description: |
|
||||
Only applicable to FRP. Library-specific assay configuration. By default,
|
||||
the assay configuration is detected automatically. Typically, users will
|
||||
not need to specify a chemistry.
|
||||
|
||||
- name: Sample parameters
|
||||
# Corresponds to the [samples] section
|
||||
arguments:
|
||||
- type: string
|
||||
name: --sample_ids
|
||||
alternatives: "--cell_multiplex_sample_id"
|
||||
multiple: true
|
||||
description: |
|
||||
A name to identify a multiplexed sample. Must be alphanumeric with hyphens and/or underscores,
|
||||
and less than 64 characters. Required for Cell Multiplexing libraries.
|
||||
- type: string
|
||||
multiple: true
|
||||
name: --sample_description
|
||||
alternatives: [--cell_multiplex_description]
|
||||
description: A description for the sample.
|
||||
- type: integer
|
||||
multiple: true
|
||||
name: --sample_expect_cells
|
||||
example: 3000
|
||||
description: |
|
||||
Expected number of recovered cells, used as input to cell calling algorithm.
|
||||
- type: integer
|
||||
name: "--sample_force_cells"
|
||||
example: 3000
|
||||
multiple: true
|
||||
required: false
|
||||
description: |
|
||||
Force pipeline to use this number of cells, bypassing cell detection.
|
||||
|
||||
- name: "Feature Barcode library specific arguments"
|
||||
# Corresponds to the [feature] section
|
||||
arguments:
|
||||
- name: "--feature_reference"
|
||||
type: file
|
||||
description: |
|
||||
Path to the Feature reference CSV file, declaring Feature Barcode constructs and associated barcodes.
|
||||
Required only for Antibody Capture or CRISPR Guide Capture libraries.
|
||||
See https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/feature-bc-analysis#feature-ref for more information.
|
||||
example: "feature_reference.csv"
|
||||
required: false
|
||||
- name: "--feature_r1_length"
|
||||
type: integer
|
||||
required: false
|
||||
description: |
|
||||
Limit the length of the input Read 1 sequence of Feature Barcode libraries to the first N bases,
|
||||
where N is the user-supplied value. Note that the length includes the Barcode and UMI
|
||||
sequences so do not set this below 26.
|
||||
- name: "--feature_r2_length"
|
||||
type: integer
|
||||
required: false
|
||||
description: |
|
||||
Limit the length of the input Read 2 sequence of Feature Barcode libraries to the first N bases,
|
||||
where N is a user-supplied value. Trimming occurs before sequencing metrics are computed
|
||||
and therefore, limiting the length of Read 2 may affect Q30 scores.
|
||||
- name: "--min_crispr_umi"
|
||||
type: integer
|
||||
min: 1
|
||||
required: false
|
||||
description: |
|
||||
Set the minimum number of CRISPR guide RNA UMIs required for protospacer detection.
|
||||
If a lower or higher sensitivity is desired for detection, this value can be customized
|
||||
according to specific experimental needs. Applicable only to datasets that include a
|
||||
CRISPR Guide Capture library.
|
||||
- name: Gene expression arguments
|
||||
# Corresponds to the [gene-expression] section
|
||||
description: Arguments relevant to the analysis of gene expression data.
|
||||
arguments:
|
||||
- name: "--gex_reference"
|
||||
type: file
|
||||
description: "Genome reference index built by Cell Ranger mkref."
|
||||
example: "reference_genome.tar.gz"
|
||||
required: true
|
||||
- type: boolean
|
||||
name: "--gex_secondary_analysis"
|
||||
default: false
|
||||
description: Whether or not to run the secondary analysis e.g. clustering.
|
||||
- type: boolean
|
||||
name: "--gex_generate_bam"
|
||||
default: false
|
||||
description: Whether to generate a BAM file.
|
||||
- type: file
|
||||
name: "--tenx_cloud_token_path"
|
||||
description: The 10x Cloud Analysis user token used to enable cell annotation.
|
||||
- type: string
|
||||
name: "--cell_annotation_model"
|
||||
description: |
|
||||
Cell annotation model to use. If "auto", uses the default model for the species.
|
||||
If not given, cell annotation is not run.
|
||||
choices: ["auto", "human_pca_v1_beta", "mouse_pca_v1_beta"]
|
||||
- type: integer
|
||||
name: --gex_expect_cells
|
||||
example: 3000
|
||||
description: |
|
||||
Expected number of recovered cells, used as input to cell calling algorithm.
|
||||
- type: integer
|
||||
name: "--gex_force_cells"
|
||||
example: 3000
|
||||
description: |
|
||||
Force pipeline to use this number of cells, bypassing cell detection.
|
||||
- type: boolean
|
||||
name: "--gex_include_introns"
|
||||
default: true
|
||||
description: |
|
||||
Whether or not to include intronic reads in counts.
|
||||
This option does not apply to Fixed RNA Profiling analysis.
|
||||
- name: "--gex_r1_length"
|
||||
type: integer
|
||||
required: false
|
||||
description: |
|
||||
Limit the length of the input Read 1 sequence of Gene Expression libraries to the first N bases,
|
||||
where N is the user-supplied value. Note that the length includes the Barcode and UMI
|
||||
sequences so do not set this below 26.
|
||||
- name: "--gex_r2_length"
|
||||
type: integer
|
||||
required: false
|
||||
description: |
|
||||
Limit the length of the input Read 2 sequence of Gene Expression libraries to the first N bases,
|
||||
where N is a user-supplied value. Trimming occurs before sequencing metrics are computed
|
||||
and therefore, limiting the length of Read 2 may affect Q30 scores.
|
||||
- type: string
|
||||
name: --gex_chemistry
|
||||
default: auto
|
||||
description: |
|
||||
Assay configuration. Either specify a single value which will be applied to all libraries,
|
||||
or a number of values equal to the number of libraries. The latter is only applicable
|
||||
to Fixed RNA Profiling.
|
||||
- auto: Chemistry autodetection (default)
|
||||
- threeprime: Single Cell 3'
|
||||
- SC3Pv1, SC3Pv2, SC3Pv3(-polyA), SC3Pv4(-polyA): Single Cell 3' v1, v2, v3, or v4
|
||||
- SC3Pv3HT(-polyA): Single Cell 3' v3.1 HT
|
||||
- SC-FB: Single Cell Antibody-only 3' v2 or 5'
|
||||
- fiveprime: Single Cell 5'
|
||||
- SC5P-PE: Paired-end Single Cell 5'
|
||||
- SC5P-PE-v3: Paired-end Single Cell 5' v3
|
||||
- SC5P-R2: R2-only Single Cell 5'
|
||||
- SC5P-R2-v3: R2-only Single Cell 5' v3
|
||||
- SCP5-PE-v3: Single Cell 5' paired-end v3 (GEM-X)
|
||||
- SC5PHT : Single Cell 5' v2 HT
|
||||
- SFRP: Fixed RNA Profiling (Singleplex)
|
||||
- MFRP: Fixed RNA Profiling (Multiplex, Probe Barcode on R2)
|
||||
- MFRP-R1: Fixed RNA Profiling (Multiplex, Probe Barcode on R1)
|
||||
- MFRP-RNA: Fixed RNA Profiling (Multiplex, RNA, Probe Barcode on R2)
|
||||
- MFRP-Ab: Fixed RNA Profiling (Multiplex, Antibody, Probe Barcode at R2:69)
|
||||
- MFRP-Ab-R2pos50: Fixed RNA Profiling (Multiplex, Antibody, Probe Barcode at R2:50)
|
||||
- MFRP-RNA-R1: Fixed RNA Profiling (Multiplex, RNA, Probe Barcode on R1)
|
||||
- MFRP-Ab-R1: Fixed RNA Profiling (Multiplex, Antibody, Probe Barcode on R1)
|
||||
- ARC-v1: Gene Expression portion of Multiome data. If Cell Ranger auto-detects ARC-v1 chemistry, an error is triggered.
|
||||
See https://kb.10xgenomics.com/hc/en-us/articles/115003764132-How-does-Cell-Ranger-auto-detect-chemistry- for more information.
|
||||
choices: [ auto, threeprime, fiveprime, SC3Pv1, SC3Pv2, SC3Pv3, SC3Pv3-polyA, SC3Pv4, SC3Pv4-polyA, SC3Pv3LT, SC3Pv3HT, SC3Pv3HT-polyA,
|
||||
SC5P-PE, SC5P-PE-v3, SC5P-R2, SC-FB, SC5P-R2-v3, SCP5-PE-v3, SC5PHT, MFRP, MFRP-R1, MFRP-RNA, MFRP-Ab,
|
||||
SFRP, MFRP-Ab-R2pos50, MFRP-RNA-R1, MFRP-Ab-R1, ARC-v1]
|
||||
|
||||
- name: "VDJ related parameters"
|
||||
# The [vdj] section
|
||||
arguments:
|
||||
- name: "--vdj_reference"
|
||||
type: file
|
||||
description: "VDJ reference index built by Cell Ranger mkvdjref."
|
||||
example: "reference_vdj.tar.gz"
|
||||
required: false
|
||||
- name: "--vdj_inner_enrichment_primers"
|
||||
type: file
|
||||
description: |
|
||||
V(D)J Immune Profiling libraries: if inner enrichment primers other than those provided
|
||||
in the 10x Genomics kits are used, they need to be specified here as a
|
||||
text file with one primer per line.
|
||||
example: "enrichment_primers.txt"
|
||||
required: false
|
||||
- name: "--vdj_r1_length"
|
||||
type: integer
|
||||
required: false
|
||||
description: |
|
||||
Limit the length of the input Read 1 sequence of V(D)J libraries to the first N bases, where N is the user-supplied value.
|
||||
Note that the length includes the Barcode and UMI sequences so do not set this below 26.
|
||||
- name: "--vdj_r2_length"
|
||||
type: integer
|
||||
required: false
|
||||
description: |
|
||||
Limit the length of the input Read 2 sequence of V(D)J libraries to the first N bases, where N is a user-supplied value.
|
||||
Trimming occurs before sequencing metrics are computed and therefore, limiting the length of Read 2 may affect Q30 scores.
|
||||
- name: "--vdj_denovo"
|
||||
type: boolean
|
||||
required: false
|
||||
description: |
|
||||
Run in reference-free mode (i.e., do not use annotations). This option is not supported for multiplexed experiments.
|
||||
- name: 3' Cell multiplexing parameters (CellPlex Multiplexing)
|
||||
# cell_multiplex_oligo_ids adds to [samples] section
|
||||
# min_assignment_confidence, cmo_set and barcode_sample_assignment are added to [gene-expression]
|
||||
arguments:
|
||||
- type: string
|
||||
name: --cell_multiplex_oligo_ids
|
||||
alternatives: [--cmo_ids]
|
||||
multiple: true
|
||||
description: |
|
||||
The Cell Multiplexing oligo IDs used to multiplex this sample. If multiple CMOs were used for a sample,
|
||||
separate IDs with a pipe (e.g., CMO301|CMO302). Required for Cell Multiplexing libraries.
|
||||
|
||||
- type: double
|
||||
name: --min_assignment_confidence
|
||||
description: |
|
||||
The minimum estimated likelihood to call a sample as tagged with a Cell Multiplexing Oligo (CMO) instead of "Unassigned".
|
||||
Users may wish to tolerate a higher rate of mis-assignment in order to obtain more singlets to include in their analysis,
|
||||
or a lower rate of mis-assignment at the cost of obtaining fewer singlets.
|
||||
- type: file
|
||||
direction: input
|
||||
required: false
|
||||
name: "--cmo_set"
|
||||
description: |
|
||||
Path to a custom CMO set CSV file, declaring CMO constructs and associated barcodes. If the default CMO reference IDs that are built into
|
||||
the Cell Ranger software are required, this option does not need to be used.
|
||||
- type: file
|
||||
direction: input
|
||||
required: false
|
||||
name: "--barcode_sample_assignment"
|
||||
description: |
|
||||
Path to a barcode-sample assignment CSV file that specifies the barcodes that belong to each sample.
|
||||
|
||||
- name: Hashtag multiplexing parameters
|
||||
# Is added to [samples]
|
||||
arguments:
|
||||
- name: --hashtag_ids
|
||||
type: string
|
||||
multiple: true
|
||||
description: |
|
||||
The hashtag IDs used to multiplex this sample. If multiple antibody hashtags were used for the same sample,
|
||||
you can separate IDs with a pipe.
|
||||
|
||||
- name: On-chip multiplexing parameters
|
||||
# Is added to [samples]
|
||||
arguments:
|
||||
- name: --ocm_barcode_ids
|
||||
type: string
|
||||
multiple: true
|
||||
# Note: choices is not an option here because multiple values can be added using pipe
|
||||
description: |
|
||||
The OCM barcode IDs used to multiplex this sample. Must be one of OB1, OB2, OB3, OB4.
|
||||
If multiple OCM Barcodes were used for the same sample, you can separate IDs
|
||||
with a pipe (e.g., OB1|OB2).
|
||||
|
||||
- name: Flex multiplexing parameters
|
||||
# probe_set, filter_probes and emptydrops_minimum_umis end up in [gene-expression]
|
||||
# probe_barcode_ids ends up in [samples]
|
||||
arguments:
|
||||
- type: file
|
||||
name: "--probe_set"
|
||||
description: |
|
||||
A probe set reference CSV file. It specifies the sequences used as a reference for probe alignment and the gene ID associated with each probe.
|
||||
It must include 4 columns (probe file format 1.0.0): gene_id, probe_seq, probe_id and included, plus an optional 5th column, region (probe file format 1.0.1).
|
||||
- gene_id: The Ensembl gene identifier targeted by the probe.
|
||||
- probe_seq: The nucleotide sequence of the probe, which is complementary to the transcript sequence.
|
||||
- probe_id: The probe identifier, whose format is described in Probe identifiers.
|
||||
- included: A TRUE or FALSE flag specifying whether the probe is included in the filtered counts matrix output or excluded by the probe filter.
|
||||
See filter-probes option of cellranger multi. All probes of a gene must be marked TRUE in the included column for that gene to be included.
|
||||
- region: Present only in v1.0.1 probe set reference CSV. The gene boundary targeted by the probe. Accepted values are spliced or unspliced.
|
||||
|
||||
The file also contains a number of required metadata fields in the header in the format #key=value:
|
||||
- panel_name: The name of the probe set.
|
||||
- panel_type: Always predesigned for predesigned probe sets.
|
||||
- reference_genome: The reference genome build used for probe design.
|
||||
- reference_version: The version of the Cell Ranger reference transcriptome used for probe design.
|
||||
- probe_set_file_format: The version of the probe set file format specification that this file conforms to.
|
||||
- type: boolean # Null is also a valid option because passing this argument to cellranger (true or false) requires --probe_set
|
||||
name: "--filter_probes"
|
||||
description: |
|
||||
If 'false', include all non-deprecated probes listed in the probe set reference CSV file.
|
||||
If 'true' or not set, probes that are predicted to have off-target activity to homologous genes are excluded from analysis.
|
||||
Not filtering will result in UMI counts from all non-deprecated probes,
|
||||
including those with predicted off-target activity, to be used in the analysis.
|
||||
Probes whose ID is prefixed with DEPRECATED are always excluded from the analysis.
|
||||
|
||||
- type: string
|
||||
name: "--probe_barcode_ids"
|
||||
multiple: true
|
||||
description: |
|
||||
The Fixed RNA Probe Barcode ID used for this sample, and for multiplex GEX + Antibody Capture libraries,
|
||||
the corresponding Antibody Multiplexing Barcode IDs. 10x recommends specifying both barcodes (e.g., BC001+AB001)
|
||||
when an Antibody Capture library is present. The barcode pair order is BC+AB and they
|
||||
are separated with a "+" (no spaces). Alternatively, you can specify the Probe Barcode ID alone and
|
||||
Cell Ranger's barcode pairing auto-detection algorithm will automatically match to the corresponding Antibody
|
||||
Multiplexing Barcode.
|
||||
- type: integer
|
||||
name: --emptydrops_minimum_umis
|
||||
min: 1
|
||||
description: |
|
||||
For singleplex Flex experiments, use this option to adjust the UMI cutoff during the second step of cell calling.
|
||||
Cell Ranger will still perform the full cell calling process but will only evaluate barcodes with UMIs above
|
||||
the threshold you specify.
|
||||
|
||||
- name: Antigen Capture (BEAM) library arguments
|
||||
# These end up in the [antigen-specificity] section
|
||||
description: |
|
||||
These arguments are recommended if an Antigen Capture (BEAM) library is present.
|
||||
They are needed to calculate the antigen specificity score.
|
||||
arguments:
|
||||
- type: string
|
||||
name: --control_id
|
||||
multiple: true
|
||||
description: |
|
||||
A user-defined ID for any negative controls used in the T/BCR Antigen Capture assay. Must match id specified in the feature reference CSV.
|
||||
May only include ASCII characters and must not use whitespace, slash, quote, or comma characters.
|
||||
Each ID must be unique and must not collide with a gene identifier from the transcriptome.
|
||||
- type: string
|
||||
multiple: true
|
||||
name: --mhc_allele
|
||||
description: |
|
||||
The MHC allele for TCR Antigen Capture libraries. Must match mhc_allele name specified in the Feature Reference CSV.
|
||||
- name: "General arguments"
|
||||
description: |
|
||||
These arguments are applicable to all library types.
|
||||
arguments:
|
||||
- name: "--check_library_compatibility"
|
||||
type: boolean
|
||||
default: true
|
||||
description: |
|
||||
Optional. This option allows users to disable the check that evaluates 10x Barcode overlap between
|
||||
ibraries when multiple libraries are specified (e.g., Gene Expression + Antibody Capture). Setting
|
||||
this option to false will disable the check across all library combinations. We recommend running
|
||||
this check (default), however if the pipeline errors out, users can bypass the check to generate
|
||||
outputs for troubleshooting.
|
||||
|
||||
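The `--probe_barcode_ids` description above accepts two forms: a Probe Barcode ID on its own, or a `BC+AB` pair joined by `+` with no spaces (e.g. `BC001+AB001`). The Bash sketch below shows one way to check and split such a value before launching a run. The helper name `parse_probe_barcode_id` is hypothetical. The `BC`/`AB` + digits pattern is an assumption extrapolated from the `BC001+AB001` example; it is not a rule documented by Cell Ranger.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openpipeline or Cell Ranger): validate and
# split a --probe_barcode_ids value. Assumes IDs look like BC001 / AB001.
parse_probe_barcode_id() {
  local value="$1"
  # Group 1: probe barcode; group 3: optional antibody barcode after a literal '+'
  local re='^(BC[0-9]+)(\+(AB[0-9]+))?$'
  if [[ "$value" =~ $re ]]; then
    # No antibody barcode given: pairing is left to Cell Ranger's auto-detection
    printf 'probe=%s antibody=%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]:-auto-detect}"
  else
    printf 'invalid probe barcode ID: %s\n' "$value" >&2
    return 1
  fi
}

parse_probe_barcode_id "BC001+AB001"   # probe=BC001 antibody=AB001
parse_probe_barcode_id "BC002"         # probe=BC002 antibody=auto-detect
```

Values such as `AB001+BC001` (wrong order) or `BC001 + AB001` (spaces) are rejected, which mirrors the BC+AB order and the no-spaces rule stated in the description.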
101
src/single_cell/cellranger_multi_qc/config.vsh.yaml
Normal file
@@ -0,0 +1,101 @@
name: "cellranger_multi_qc"
namespace: "single_cell"
# scope: "private"
description: "A pipeline for running Cell Ranger multi followed by QC."
authors:
  - __merge__: /src/authors/jakub_majercik.yaml
    roles: [ author, maintainer ]
  - __merge__: /src/authors/weiwei_schultz.yaml
    roles: [ contributor ]
__merge__: /src/single_cell/cellranger_multi_qc/cellranger_multi.yaml
argument_groups:
  - name: Outputs
    arguments:
      - name: "--output_raw"
        type: file
        direction: output
        description: "The raw output folder."
        required: true
        example: output_dir/
      - name: "--output_h5mu"
        type: file
        direction: output
        description: |
          Locations for the output files. Must contain a wildcard (*) character,
          which will be replaced with the sample name.
        example: "*.h5mu"
        required: true
      - name: "--uns_metrics"
        type: string
        description: Name of the .uns slot under which to store QC metrics (if any).
        default: "metrics_cellranger"
      - name: "--output_ingestion_qc_report"
        type: file
        direction: output
        required: false
        multiple: true
        description: |
          Ingestion QC report in HTML format. Generated when --create_sample_qc_report is true
          and the sample includes a Gene Expression library.
        example: "*.sample_qc_report.html"
      - name: "--output_processed_h5mu"
        type: file
        direction: output
        required: false
        description: |
          Folder containing the QC-processed h5mu files. Generated when
          --create_sample_qc_report is true and the sample includes a Gene Expression library.
        example: "processed_h5mu/"
      - name: "--output_multiqc_report"
        type: file
        direction: output
        required: false
        description: |
          MultiQC report in HTML format. Generated when
          --create_multiqc_report is true and the sample includes a Gene Expression library.
        example: "multiqc_report.html"

  - name: "QC reports"
    description: "Options for generating optional QC reports."
    arguments:
      - name: "--create_sample_qc_report"
        type: boolean
        default: true
        description: |
          Whether to generate an ingestion QC report.
      - name: "--create_multiqc_report"
        type: boolean
        default: true
        description: |
          Whether to run FastQC on the input FASTQ files and aggregate the results
          into a MultiQC report.
      - name: "--run_cellbender"
        type: boolean
        default: false
        description: |
          Whether to run CellBender for ambient RNA removal as part of the
          sample QC report generation. Only used when --create_sample_qc_report is true.

dependencies:
  - name: workflows/ingestion/cellranger_multi
    repository: openpipeline
  - name: workflows/generate_qc_report
    repository: openpipeline_qc
  - name: fastqc
    repository: biobox
  - name: multiqc
    repository: biobox

resources:
  - type: nextflow_script
    path: main.nf
    entrypoint: run_wf

test_resources:
  - type: nextflow_script
    path: test.nf
    entrypoint: test_wf
  - path: /resources_test/10x_5k_anticmv/raw/
  - path: /resources_test/10x_5k_fixed/raw/
  - path: /resources_test/reference_gencodev41_chr1
runners:
  - type: nextflow
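According to the `--output_h5mu` description above, the path must contain a `*` that is replaced with the sample name. The following is a minimal Bash sketch of that substitution. It assumes a plain textual replacement of every `*`. The helper name `expand_output_template` is hypothetical, and this is not how viash itself implements the expansion.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand a '*' output template for one sample.
expand_output_template() {
  local template="$1" sample_id="$2" expanded
  # The template must contain a wildcard, as the --output_h5mu description requires
  if [[ "$template" != *'*'* ]]; then
    printf 'template has no wildcard: %s\n' "$template" >&2
    return 1
  fi
  # ${var//\*/...} replaces every literal '*' with the sample ID
  expanded=${template//\*/$sample_id}
  printf '%s\n' "$expanded"
}

expand_output_template "*.h5mu" "sample_anticmv"                   # sample_anticmv.h5mu
expand_output_template "*.sample_qc_report.html" "sample_anticmv"  # sample_anticmv.sample_qc_report.html
```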
38
src/single_cell/cellranger_multi_qc/integration_test.sh
Executable file
@@ -0,0 +1,38 @@
#!/bin/bash

set -eo pipefail

# get the root of the repository
REPO_ROOT=$(git rev-parse --show-toplevel)

# ensure that the commands below are run from the root of the repository
cd "$REPO_ROOT"

COMMON_ARGS=(
  run .
  -main-script src/single_cell/cellranger_multi_qc/test.nf
  -resume
  -profile docker
  -c src/configs/labels_ci.config
  -c src/configs/integration_tests.config
)

# test_wf: GEX + AB, sample QC report only
nextflow "${COMMON_ARGS[@]}" \
  -entry test_wf \
  --publish_dir test_output/cellranger_multi_qc/test_wf

# test_wf_ab_only: AB-only input, all report steps skipped
nextflow "${COMMON_ARGS[@]}" \
  -entry test_wf_ab_only \
  --publish_dir test_output/cellranger_multi_qc/test_wf_ab_only

# test_wf_both_reports: GEX + AB, both MultiQC and sample QC reports
nextflow "${COMMON_ARGS[@]}" \
  -entry test_wf_both_reports \
  --publish_dir test_output/cellranger_multi_qc/test_wf_both_reports

# test_wf_multiqc_only: GEX + AB, MultiQC report only
nextflow "${COMMON_ARGS[@]}" \
  -entry test_wf_multiqc_only \
  --publish_dir test_output/cellranger_multi_qc/test_wf_multiqc_only
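The four invocations in integration_test.sh differ only in the `-entry` name and the matching `--publish_dir`. The sketch below runs the same entries from a single loop. Like the script, it assumes it is called from the repository root. The function name and the `DRY_RUN` switch are illustrative additions and are not part of the repository.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical refactor of integration_test.sh: same arguments, one loop.
run_cellranger_multi_qc_tests() {
  local -a common_args
  common_args=(
    run .
    -main-script src/single_cell/cellranger_multi_qc/test.nf
    -resume
    -profile docker
    -c src/configs/labels_ci.config
    -c src/configs/integration_tests.config
  )
  local entry
  for entry in test_wf test_wf_ab_only test_wf_both_reports test_wf_multiqc_only; do
    local -a cmd
    cmd=(nextflow "${common_args[@]}" -entry "$entry"
      --publish_dir "test_output/cellranger_multi_qc/$entry")
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      # Print the command instead of running it
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

# Usage (from the repository root): DRY_RUN=1 run_cellranger_multi_qc_tests
```

With `DRY_RUN=1` the function prints one `nextflow` command per entry. This makes it easy to confirm that the publish directories line up with the entry names before starting the Docker-based runs.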
81
src/single_cell/cellranger_multi_qc/main.nf
Normal file
@@ -0,0 +1,81 @@
workflow run_wf {
  take:
    input_ch

  main:
    output_ch = input_ch
      | map { id, state ->
        [id, state + [_meta: [join_id: id]]]
      }

      // FastQC on the input FASTQs; only when a MultiQC report is requested
      // and the sample includes a Gene Expression library
      | fastqc.run(
        runIf: { id, state -> state.create_multiqc_report && state.library_type?.contains("Gene Expression") },
        fromState: { id, state ->
          [
            input: state.input,
            outdir: "${id}_fastqc"
          ]
        },
        toState: { id, output, state ->
          state + [output_fastqc: output.outdir]
        }
      )

      // Cell Ranger multi always runs
      | cellranger_multi.run(
        fromState: { id, state -> state },
        toState: { id, output, state ->
          state + [
            output_raw: output.output_raw,
            output_h5mu: output.output_h5mu
          ]
        }
      )

      // MultiQC aggregates the FastQC output and the raw Cell Ranger output folder
      | multiqc.run(
        runIf: { id, state -> state.create_multiqc_report && state.library_type?.contains("Gene Expression") },
        fromState: { id, state ->
          [
            input: [state.output_fastqc, state.output_raw],
            output_report: state.output_multiqc_report
          ]
        },
        toState: { id, output, state ->
          state + [_multiqc_produced: true, output_multiqc_report: output.output_report]
        }
      )

      // Ingestion QC report (optionally with CellBender) for Gene Expression samples
      | generate_qc_report.run(
        runIf: { id, state -> state.create_sample_qc_report && state.library_type?.contains("Gene Expression") },
        fromState: { id, state ->
          [
            id: id,
            input: state.output_h5mu,
            ingestion_method: "cellranger_multi",
            run_cellbender: state.run_cellbender,
            output_qc_report: state.output_ingestion_qc_report,
            output_processed_h5mu: state.output_processed_h5mu
          ]
        },
        toState: { id, output, state ->
          state + [
            _qc_report_produced: true,
            output_ingestion_qc_report: output.output_qc_report,
            output_processed_h5mu: output.output_processed_h5mu
          ]
        }
      )

      // Keep only the declared outputs; optional reports are added only if they were produced
      | map { id, state ->
        def out = [output_raw: state.output_raw, output_h5mu: state.output_h5mu]
        if (state._meta) out._meta = state._meta
        if (state._multiqc_produced) out.output_multiqc_report = state.output_multiqc_report
        if (state._qc_report_produced) {
          out.output_ingestion_qc_report = state.output_ingestion_qc_report
          out.output_processed_h5mu = state.output_processed_h5mu
        }
        [id, out]
      }

  emit:
    output_ch
}
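In `run_wf` above, each step runs under its own condition:

- FastQC and MultiQC run only when `create_multiqc_report` is true and `library_type` contains `"Gene Expression"`.
- `generate_qc_report` uses the same library check together with `create_sample_qc_report`.
- `cellranger_multi` has no `runIf` and always runs.

The Bash sketch below reproduces that decision table, which helps predict the outputs each test entry should emit. The function name is hypothetical. It only mirrors the `runIf` closures and does not call Nextflow.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: which steps would run_wf execute for one sample?
# Args: CREATE_MULTIQC CREATE_SAMPLE_QC LIBRARY_TYPE...
steps_for_sample() {
  local create_multiqc="$1" create_sample_qc="$2"
  shift 2
  local has_gex=false lib
  for lib in "$@"; do
    # List.contains() in the runIf closures is an exact element match
    if [[ "$lib" == "Gene Expression" ]]; then has_gex=true; fi
  done

  local steps=()
  if [[ "$create_multiqc" == true && "$has_gex" == true ]]; then steps+=(fastqc); fi
  steps+=(cellranger_multi)   # no runIf: always executed
  if [[ "$create_multiqc" == true && "$has_gex" == true ]]; then steps+=(multiqc); fi
  if [[ "$create_sample_qc" == true && "$has_gex" == true ]]; then steps+=(generate_qc_report); fi
  printf '%s\n' "${steps[*]}"
}

steps_for_sample true true "Gene Expression" "Antibody Capture"
# fastqc cellranger_multi multiqc generate_qc_report
steps_for_sample true true "Antibody Capture"
# cellranger_multi
steps_for_sample true false "Gene Expression" "Antibody Capture"
# fastqc cellranger_multi multiqc
```

The second and third calls correspond to the `test_wf_ab_only` and `test_wf_multiqc_only` inputs in test.nf. This matches why those tests assert that the skipped reports are absent from the output state.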
10
src/single_cell/cellranger_multi_qc/nextflow.config
Normal file
@@ -0,0 +1,10 @@
manifest {
  nextflowVersion = '!>=20.12.1-edge'
}

params {
  rootDir = java.nio.file.Paths.get("$projectDir/../../../").toAbsolutePath().normalize().toString()
}

// include common settings
includeConfig("${params.rootDir}/src/configs/labels.config")
210
src/single_cell/cellranger_multi_qc/test.nf
Normal file
@@ -0,0 +1,210 @@
nextflow.enable.dsl=2

include { cellranger_multi_qc } from params.rootDir + "/target/nextflow/single_cell/cellranger_multi_qc/main.nf"

params.resources_test = "s3://openpipelines-bio/openpipeline_incubator/resources_test/"

workflow test_wf {
  resources_test = file(params.resources_test)

  output_ch = Channel.fromList([
      [
        id: "sample_anticmv",
        input: [
          resources_test.resolve("10x_5k_anticmv/raw/5k_human_antiCMV_T_TBNK_connect_GEX_1_subset_S1_L001_R1_001.fastq.gz"),
          resources_test.resolve("10x_5k_anticmv/raw/5k_human_antiCMV_T_TBNK_connect_GEX_1_subset_S1_L001_R2_001.fastq.gz"),
          resources_test.resolve("10x_5k_anticmv/raw/5k_human_antiCMV_T_TBNK_connect_AB_subset_S2_L004_R1_001.fastq.gz"),
          resources_test.resolve("10x_5k_anticmv/raw/5k_human_antiCMV_T_TBNK_connect_AB_subset_S2_L004_R2_001.fastq.gz")
        ],
        gex_reference: resources_test.resolve("reference_gencodev41_chr1/reference_cellranger.tar.gz"),
        feature_reference: resources_test.resolve("10x_5k_anticmv/raw/feature_reference.csv"),
        library_id: ["5k_human_antiCMV_T_TBNK_connect_GEX_1_subset", "5k_human_antiCMV_T_TBNK_connect_AB_subset"],
        library_type: ["Gene Expression", "Antibody Capture"],
        output_raw: "sample_anticmv_raw/",
        output_h5mu: "sample_anticmv.h5mu",
        create_sample_qc_report: true,
        output_ingestion_qc_report: "sample_anticmv_qc_report_*.html",
        output_processed_h5mu: "sample_anticmv_processed"
      ]
    ])
    | map { state -> [state.id, state] }
    | cellranger_multi_qc
    | view { output ->
      assert output.size() == 2 : "Outputs should contain two elements; [id, state]"

      def id = output[0]
      assert id == "combined" : "Output ID should be 'combined'. Found: ${id}"

      def state = output[1]
      assert state instanceof Map : "State should be a map. Found: ${state}"

      assert state.containsKey("output_raw") : "State should contain key 'output_raw'."
      assert state.output_raw.isDirectory() : "'output_raw' should be a directory."

      assert state.containsKey("output_h5mu") : "State should contain key 'output_h5mu'."
      assert state.output_h5mu.isFile() : "'output_h5mu' should be a file."
      assert state.output_h5mu.toString().endsWith(".h5mu") : "output_h5mu should end with '.h5mu'. Found: ${state.output_h5mu}"

      assert state.containsKey("output_ingestion_qc_report") : "State should contain key 'output_ingestion_qc_report'."
      assert state.output_ingestion_qc_report instanceof List : "'output_ingestion_qc_report' should be a list."
      assert state.output_ingestion_qc_report.every { it.isFile() } : "All QC report files should exist."

      assert state.containsKey("output_processed_h5mu") : "State should contain key 'output_processed_h5mu'."
      assert state.output_processed_h5mu.isDirectory() : "'output_processed_h5mu' should be a directory."

      "Output: $output"
    }
}

workflow test_wf_ab_only {
  resources_test = file(params.resources_test)

  output_ch = Channel.fromList([
      [
        id: "sample_ab_only",
        input: [
          resources_test.resolve("10x_5k_anticmv/raw/5k_human_antiCMV_T_TBNK_connect_AB_subset_S2_L004_R1_001.fastq.gz"),
          resources_test.resolve("10x_5k_anticmv/raw/5k_human_antiCMV_T_TBNK_connect_AB_subset_S2_L004_R2_001.fastq.gz")
        ],
        gex_reference: resources_test.resolve("reference_gencodev41_chr1/reference_cellranger.tar.gz"),
        feature_reference: resources_test.resolve("10x_5k_anticmv/raw/feature_reference.csv"),
        library_id: ["5k_human_antiCMV_T_TBNK_connect_AB_subset"],
        library_type: ["Antibody Capture"],
        output_raw: "sample_ab_only_raw/",
        output_h5mu: "sample_ab_only.h5mu",
        create_sample_qc_report: true,
        create_multiqc_report: true,
        output_ingestion_qc_report: "sample_ab_only_qc_report_*.html",
        output_processed_h5mu: "sample_ab_only_processed"
      ]
    ])
    | map { state -> [state.id, state] }
    | cellranger_multi_qc
    | view { output ->
      assert output.size() == 2 : "Outputs should contain two elements; [id, state]"

      def id = output[0]
      assert id == "sample_ab_only" : "Output ID should be 'sample_ab_only'. Found: ${id}"

      def state = output[1]
      assert state instanceof Map : "State should be a map. Found: ${state}"

      assert state.containsKey("output_raw") : "State should contain key 'output_raw'."
      assert state.output_raw.isDirectory() : "'output_raw' should be a directory."

      assert state.containsKey("output_h5mu") : "State should contain key 'output_h5mu'."
      assert state.output_h5mu.isFile() : "'output_h5mu' should be a file."

      assert !state.containsKey("output_ingestion_qc_report") : "State should NOT contain 'output_ingestion_qc_report' for AB-only input."
      assert !state.containsKey("output_multiqc_report") : "State should NOT contain 'output_multiqc_report' for AB-only input."

      "Output: $output"
    }
}

workflow test_wf_both_reports {
  resources_test = file(params.resources_test)

  output_ch = Channel.fromList([
      [
        id: "sample_both_reports",
        input: [
          resources_test.resolve("10x_5k_anticmv/raw/5k_human_antiCMV_T_TBNK_connect_GEX_1_subset_S1_L001_R1_001.fastq.gz"),
          resources_test.resolve("10x_5k_anticmv/raw/5k_human_antiCMV_T_TBNK_connect_GEX_1_subset_S1_L001_R2_001.fastq.gz"),
          resources_test.resolve("10x_5k_anticmv/raw/5k_human_antiCMV_T_TBNK_connect_AB_subset_S2_L004_R1_001.fastq.gz"),
          resources_test.resolve("10x_5k_anticmv/raw/5k_human_antiCMV_T_TBNK_connect_AB_subset_S2_L004_R2_001.fastq.gz")
        ],
        gex_reference: resources_test.resolve("reference_gencodev41_chr1/reference_cellranger.tar.gz"),
        feature_reference: resources_test.resolve("10x_5k_anticmv/raw/feature_reference.csv"),
        library_id: ["5k_human_antiCMV_T_TBNK_connect_GEX_1_subset", "5k_human_antiCMV_T_TBNK_connect_AB_subset"],
        library_type: ["Gene Expression", "Antibody Capture"],
        output_raw: "sample_both_reports_raw/",
        output_h5mu: "sample_both_reports.h5mu",
        create_sample_qc_report: true,
        create_multiqc_report: true,
        output_ingestion_qc_report: "sample_both_reports_qc_report_*.html",
        output_processed_h5mu: "sample_both_reports_processed"
      ]
    ])
    | map { state -> [state.id, state] }
    | cellranger_multi_qc
    | view { output ->
      assert output.size() == 2 : "Outputs should contain two elements; [id, state]"

      def id = output[0]
      assert id == "combined" : "Output ID should be 'combined'. Found: ${id}"

      def state = output[1]
      assert state instanceof Map : "State should be a map. Found: ${state}"

      assert state.containsKey("output_raw") : "State should contain key 'output_raw'."
      assert state.output_raw.isDirectory() : "'output_raw' should be a directory."

      assert state.containsKey("output_h5mu") : "State should contain key 'output_h5mu'."
      assert state.output_h5mu.isFile() : "'output_h5mu' should be a file."
      assert state.output_h5mu.toString().endsWith(".h5mu") : "output_h5mu should end with '.h5mu'."

      assert state.containsKey("output_multiqc_report") : "State should contain key 'output_multiqc_report'."
      assert state.output_multiqc_report.isFile() : "'output_multiqc_report' should be a file."

      assert state.containsKey("output_ingestion_qc_report") : "State should contain key 'output_ingestion_qc_report'."
      assert state.output_ingestion_qc_report instanceof List : "'output_ingestion_qc_report' should be a list."
      assert state.output_ingestion_qc_report.every { it.isFile() } : "All QC report files should exist."

      assert state.containsKey("output_processed_h5mu") : "State should contain key 'output_processed_h5mu'."
      assert state.output_processed_h5mu.isDirectory() : "'output_processed_h5mu' should be a directory."

      "Output: $output"
    }
}

workflow test_wf_multiqc_only {
  resources_test = file(params.resources_test)

  output_ch = Channel.fromList([
      [
        id: "sample_multiqc_only",
        input: [
          resources_test.resolve("10x_5k_anticmv/raw/5k_human_antiCMV_T_TBNK_connect_GEX_1_subset_S1_L001_R1_001.fastq.gz"),
          resources_test.resolve("10x_5k_anticmv/raw/5k_human_antiCMV_T_TBNK_connect_GEX_1_subset_S1_L001_R2_001.fastq.gz"),
          resources_test.resolve("10x_5k_anticmv/raw/5k_human_antiCMV_T_TBNK_connect_AB_subset_S2_L004_R1_001.fastq.gz"),
          resources_test.resolve("10x_5k_anticmv/raw/5k_human_antiCMV_T_TBNK_connect_AB_subset_S2_L004_R2_001.fastq.gz")
        ],
        gex_reference: resources_test.resolve("reference_gencodev41_chr1/reference_cellranger.tar.gz"),
        feature_reference: resources_test.resolve("10x_5k_anticmv/raw/feature_reference.csv"),
        library_id: ["5k_human_antiCMV_T_TBNK_connect_GEX_1_subset", "5k_human_antiCMV_T_TBNK_connect_AB_subset"],
        library_type: ["Gene Expression", "Antibody Capture"],
        output_raw: "sample_multiqc_only_raw/",
        output_h5mu: "sample_multiqc_only.h5mu",
        create_sample_qc_report: false,
        create_multiqc_report: true
      ]
    ])
    | map { state -> [state.id, state] }
    | cellranger_multi_qc
    | view { output ->
      assert output.size() == 2 : "Outputs should contain two elements; [id, state]"

      def id = output[0]
      assert id == "sample_multiqc_only" : "Output ID should be 'sample_multiqc_only'. Found: ${id}"

      def state = output[1]
      assert state instanceof Map : "State should be a map. Found: ${state}"

      assert state.containsKey("output_raw") : "State should contain key 'output_raw'."
      assert state.output_raw.isDirectory() : "'output_raw' should be a directory."

      assert state.containsKey("output_h5mu") : "State should contain key 'output_h5mu'."
      assert state.output_h5mu.isFile() : "'output_h5mu' should be a file."
      assert state.output_h5mu.toString().endsWith(".h5mu") : "output_h5mu should end with '.h5mu'."

      assert state.containsKey("output_multiqc_report") : "State should contain key 'output_multiqc_report'."
      assert state.output_multiqc_report.isFile() : "'output_multiqc_report' should be a file."

      assert !state.containsKey("output_ingestion_qc_report") : "State should NOT contain 'output_ingestion_qc_report' when only MultiQC is enabled."
      assert !state.containsKey("output_processed_h5mu") : "State should NOT contain 'output_processed_h5mu' when only MultiQC is enabled."

      "Output: $output"
    }
}

386
src/single_cell/process_integrate_annotate/config.vsh.yaml
Normal file
@@ -0,0 +1,386 @@
name: "process_integrate_annotate"
# scope: private
namespace: "single_cell"
description: |
  A pipeline to process, integrate and annotate single cell (multi-)omics data.
  Available integration methods:
    - Harmony
    - scVI
  Available annotation methods:
    - CellTypist
    - scANVI (with scArches)

authors:
  - __merge__: /src/authors/dorien_roosen.yaml
    roles: [ author, maintainer ]
  - __merge__: /src/authors/weiwei_schultz.yaml
    roles: [ contributor ]

argument_groups:
  - name: Input (query) data arguments
    description: The input query dataset(s) to be annotated.
    arguments:
      - name: "--id"
        required: true
        type: string
        description: ID of the sample.
        example: foo
      - name: "--input"
        required: true
        type: file
        description: Input query dataset(s) to be annotated.
        example: input.h5mu
      - name: "--modality"
        default: "rna"
        type: string
        description: Modality to be processed. Should match the modality in the --reference dataset, if provided.
      - name: "--input_layer"
        type: string
        description: "The layer in the input data containing the raw counts, if .X is not to be used."
        required: false
      - name: "--input_var_gene_names"
        type: string
        required: false
        description: |
          The name of the .var column containing gene names. When not provided, the .var index will be used.
      - name: "--input_reference_gene_overlap"
        type: integer
        default: 100
        min: 1
        description: |
          The minimum number of genes present in both the reference and query datasets.

  - name: Reference data arguments
    description: Dataset to be used as a reference for label transfer and to train annotation algorithms on.
    arguments:
      - name: "--reference"
        type: file
        required: false
        example: reference.h5mu
        description: |
          The reference dataset in .h5mu format to be used as a reference mapper and to train annotation algorithms on.
      - name: "--reference_layer_raw_counts"
        type: string
        description: "The layer in the reference dataset containing the raw counts, if .X is not to be used."
        required: false
      - name: "--reference_layer_lognormalized_counts"
        type: string
        default: log_normalized
        description: "The layer in the reference dataset containing the log-normalized counts, if .X is not to be used."
      - name: "--reference_var_gene_names"
        type: string
        required: false
        description: |
          The name of the .var column containing gene names, if the .var index is not to be used.
      - name: "--reference_obs_batch"
        type: string
        required: false
        description: |
          The .obs column of the reference dataset containing the batch information.
      - name: "--reference_obs_label"
        type: string
        example: cell_type
        required: false
        description: The `.obs` key of the target labels to transfer.
      - name: "--reference_obs_label_unlabeled_category"
        type: string
        default: "Unknown"
        description: "Value in the --reference_obs_label field that indicates unlabeled observations."
      - name: "--reference_var_input"
        type: string
        required: false
        description: |
          .var column containing highly variable genes. By default, genes are not subset.

  - name: Methods
    description: The annotation and integration methods to apply to the query dataset(s).
    arguments:
      - name: "--integration_methods"
        type: string
        multiple: true
        required: false
        choices: [harmony, scvi]
        example: harmony;scvi
        description: Integration methods to be executed.
      - name: "--annotation_methods"
        type: string
        multiple: true
        required: false
        choices: [celltypist, scanvi_scarches]
        example: celltypist;scanvi_scarches
        description: Annotation methods to be executed.

  - name: "Pre-processing options: RNA filtering"
    description: Pre-processing options for filtering RNA data
    arguments:
      - name: "--rna_min_counts"
        example: 200
        type: integer
        description: Minimum number of counts captured per cell.
      - name: "--rna_max_counts"
        example: 5000000
        type: integer
        description: Maximum number of counts captured per cell.
      - name: "--rna_min_genes_per_cell"
        type: integer
        example: 200
        description: Minimum number of non-zero values per cell.
      - name: "--rna_max_genes_per_cell"
        example: 1500000
        type: integer
        description: Maximum number of non-zero values per cell.
      - name: "--rna_min_cells_per_gene"
        example: 3
        type: integer
        description: Minimum number of non-zero values per gene.
      - name: "--rna_min_fraction_mito"
        example: 0
        type: double
        description: Minimum fraction of UMIs that are mitochondrial.
      - name: "--rna_max_fraction_mito"
        type: double
        example: 0.2
        description: Maximum fraction of UMIs that are mitochondrial.

  - name: "Pre-processing options: Highly variable features detection"
    description: Pre-processing options for detecting highly variable features
    arguments:
      - name: "--n_hvg"
        type: integer
        description: |
          Number of highly variable features to keep.
          Only relevant if HVGs need to be calculated across the query and reference datasets (e.g. for --annotation_methods scvi_knn and harmony_knn).
          For reference mapping-based methods, the HVGs specified in --reference_var_input will be used.
        default: 2000

  - name: "Pre-processing options: Mitochondrial & Ribosomal Gene Detection"
    description: Pre-processing options for detecting mitochondrial and ribosomal genes
    arguments:
      - name: "--var_name_mitochondrial_genes"
        type: string
        required: false
        description: |
          In which .var slot to store a boolean array corresponding to the mitochondrial genes.
      - name: "--var_name_ribosomal_genes"
        type: string
        required: false
        description: |
          In which .var slot to store a boolean array corresponding to the ribosomal genes.
      - name: "--obs_name_mitochondrial_fraction"
        type: string
        required: false
        description: |
          When specified, write the fraction of counts originating from mitochondrial genes
          (based on --mitochondrial_gene_regex) to an .obs column with the specified name.
          Requires --var_name_mitochondrial_genes.
      - name: "--obs_name_ribosomal_fraction"
        type: string
        required: false
        description: |
          When specified, write the fraction of counts originating from ribosomal genes
          (based on --ribosomal_gene_regex) to an .obs column with the specified name.
          Requires --var_name_ribosomal_genes.
      - name: --mitochondrial_gene_regex
        type: string
        description: |
          Regex string that identifies mitochondrial genes from --var_gene_names.
          By default, detects human and mouse mitochondrial genes from a gene symbol.
        required: false
        default: "^[mM][tT]-"
      - name: --ribosomal_gene_regex
        type: string
        description: |
          Regex string that identifies ribosomal genes from --var_gene_names.
          By default, detects human and mouse ribosomal genes from a gene symbol.
        required: false
        default: "^[Mm]?[Rr][Pp][LlSs]"

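      # Illustration (example gene symbols chosen for this note, not part of the original config):
      # the default --mitochondrial_gene_regex "^[mM][tT]-" matches human MT-ND1 and mouse mt-Nd1.
      # The default --ribosomal_gene_regex "^[Mm]?[Rr][Pp][LlSs]" matches RPL13, RPS6 and mouse Rpl10,
      # and, because of the optional leading [Mm], also mitochondrial ribosomal genes such as MRPL12,
      # as well as any other symbol that happens to start with RPL or RPS.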
  - name: "Pre-processing options: QC metrics calculation options"
    description: Pre-processing options for calculating QC metrics
    arguments:
      - name: "--var_qc_metrics"
        description: |
          Keys to select a boolean (containing only True or False) column from .var.
          For each cell, calculate the proportion of total values for genes which are labeled 'True',
          compared to the total sum of the values for all genes. Defaults to the combined values specified for
          --var_name_mitochondrial_genes and --highly_variable_features_var_output.
        type: string
        multiple: True
        multiple_sep: ','
        required: false
        example: "ercc,highly_variable"

  - name: Harmony integration options
    description: Specifications for Harmony integration.
    arguments:
      - name: "--harmony_theta"
        type: double
        description: |
          Diversity clustering penalty parameter. Specify one value for each covariate in --harmony_obs_covariates.
          theta=0 does not encourage any diversity. Larger values of theta
          result in more diverse clusters.
        default: 2
        example: [0, 1, 2]
        multiple: true
      - name: "--harmony_obs_covariates"
        type: string
        description: "The .obs field(s) that define the covariate(s) to regress out."
        example: ["batch", "sample"]
        required: true
        multiple: true
        default: "sample_id"

  - name: scVI, scANVI and scArches training options
    # TODO - possibly provide separate training options for scVI, scANVI and scArches
    description: Training arguments for scVI, scANVI and scArches. Relevant for --annotation_methods 'scvi_knn' and 'scanvi_scarches'.
    arguments:
      - name: "--early_stopping"
        required: false
        type: boolean
        description: "Whether to perform early stopping with respect to the validation set."
      - name: "--early_stopping_monitor"
        choices: ["elbo_validation", "reconstruction_loss_validation", "kl_local_validation"]
        default: "elbo_validation"
        type: string
        description: "Validation metric to monitor for early stopping."
      - name: "--early_stopping_patience"
        type: integer
        min: 1
        default: 45
        description: "Number of validation epochs with no improvement after which training will be stopped."
      - name: "--early_stopping_min_delta"
        min: 0
        type: double
        default: 0.0
        description: "Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement."
      - name: "--max_epochs"
        type: integer
        description: "Number of passes through the dataset. Defaults to (20000 / number of cells) * 400 or 400, whichever is smaller."
        required: false
      - name: "--reduce_lr_on_plateau"
        description: "Whether to monitor validation loss and reduce the learning rate when the validation set `lr_scheduler_metric` plateaus."
        type: boolean
        default: True
      - name: "--lr_factor"
        description: "Factor by which to reduce the learning rate."
        type: double
        default: 0.6
        min: 0
      - name: "--lr_patience"
        description: "Number of epochs with no improvement after which the learning rate will be reduced."
        type: double
        default: 30
        min: 0

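      # Worked example (illustrative, not part of the original config) of the --max_epochs
      # default described above, min((20000 / n_cells) * 400, 400):
      #   n_cells = 100000 -> (20000 / 100000) * 400 = 80  -> 80 epochs
      #   n_cells = 50000  -> (20000 / 50000) * 400  = 160 -> 160 epochs
      #   n_cells = 10000  -> (20000 / 10000) * 400  = 800 -> capped at 400 epochs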
  - name: CellTypist reference model
    description: The CellTypist reference model to use for annotation. If not provided, the reference dataset will be used for model training.
    arguments:
      - name: "--celltypist_model"
        type: file
        description: "Pretrained model in pkl format. If not provided, the model will be trained on the reference data and --reference should be provided."
        required: false
        example: pretrained_model.pkl

- name: CellTypist annotation options
|
||||
description: Specifications for CellTypist annotation.
|
||||
arguments:
|
||||
- name: "--celltypist_feature_selection"
|
||||
type: boolean
|
||||
description: "Whether to perform feature selection."
|
||||
default: false
|
||||
- name: "--celltypist_majority_voting"
|
||||
type: boolean
|
||||
description: "Whether to refine the predicted labels by running the majority voting classifier after over-clustering."
|
||||
default: false
|
||||
- name: "--celltypist_C"
|
||||
type: double
|
||||
description: "Inverse of regularization strength in logistic regression."
|
||||
default: 1.0
|
||||
- name: "--celltypist_max_iter"
|
||||
type: integer
|
||||
description: "Maximum number of iterations before reaching the minimum of the cost function."
|
||||
default: 1000
|
||||
- name: "--celltypist_use_SGD"
|
||||
type: boolean_true
|
||||
description: "Whether to use the stochastic gradient descent algorithm."
|
||||
- name: "--celltypist_min_prop"
|
||||
type: double
|
||||
description: |
|
||||
"For the dominant cell type within a subcluster, the minimum proportion of cells required to
|
||||
support naming of the subcluster by this cell type. Ignored if majority_voting is set to False.
|
||||
Subcluster that fails to pass this proportion threshold will be assigned 'Heterogeneous'."
|
||||
default: 0
|
||||
|
||||
- name: Clustering options
|
||||
description: Arguments for Leiden clustering. Only relevant for --annotation_methods `scvi_knn`, `scanvi_scarches` and `harmony_knn`.
|
||||
arguments:
|
||||
- name: "--leiden_resolution"
|
||||
type: double
|
||||
description: Control the coarseness of the clustering. Higher values lead to more clusters.
|
||||
default: [1]
|
||||
multiple: true
|
||||
|
||||
- name: Neighbor classifier arguments
|
||||
description: Arguments related to calculating the n nearest neighbors. Only relevant for --annotation_methods `scvi_knn`, `scanvi_scarches` and `harmony_knn`.
|
||||
arguments:
|
||||
- name: "--knn_weights"
|
||||
type: string
|
||||
default: "uniform"
|
||||
choices: ["uniform", "distance"]
|
||||
description: |
|
||||
Weight function used in prediction. Possible values are:
|
||||
`uniform` (all points in each neighborhood are weighted equally) or
|
||||
`distance` (weight points by the inverse of their distance)
|
||||
- name: "--knn_n_neighbors"
|
||||
type: integer
|
||||
default: 15
|
||||
min: 5
|
||||
required: false
|
||||
description: |
|
||||
The number of neighbors to use in k-neighbor graph structure used for fast approximate nearest neighbor search with PyNNDescent.
|
||||
Larger values will result in more accurate search results at the cost of computation time.
|
||||
|
||||
- name: Outputs
|
||||
description: The output file to write the annotated dataset to.
|
||||
arguments:
|
||||
- name: "--output"
|
||||
type: file
|
||||
direction: output
|
||||
required: true
|
||||
description: |
|
||||
The output file.
|
||||
example: output.h5mu
|
||||
|
||||
dependencies:
|
||||
- name: workflows/multiomics/process_samples
|
||||
alias: process_samples_workflow
|
||||
repository: openpipeline
|
||||
- name: annotate/celltypist
|
||||
repository: openpipeline
|
||||
alias: celltypist_annotation
|
||||
- name: workflows/annotation/scanvi_scarches
|
||||
repository: openpipeline
|
||||
alias: scanvi_scarches_annotation
|
||||
- name: workflows/integration/harmony_leiden
|
||||
repository: openpipeline
|
||||
alias: harmony_integration
|
||||
- name: workflows/integration/scvi_leiden
|
||||
repository: openpipeline
|
||||
alias: scvi_integration
|
||||
|
||||
resources:
|
||||
- type: nextflow_script
|
||||
path: main.nf
|
||||
entrypoint: run_wf
|
||||
|
||||
test_resources:
|
||||
- type: nextflow_script
|
||||
path: test.nf
|
||||
entrypoint: test_wf
|
||||
- path: /resources_test/pbmc_1k_protein_v3/pbmc_1k_protein_v3_mms.h5mu
|
||||
- path: /resources_test/annotation_test_data/TS_Blood_filtered.h5mu
|
||||
- path: /resources_test/annotation_test_data/celltypist_model_Immune_All_Low.pkl
|
||||
|
||||
runners:
|
||||
- type: nextflow
|
||||
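The `--max_epochs` description states the default as (20000 / number of cells) * 400, capped at 400. A minimal shell sketch of that rule follows; the helper name `default_max_epochs` is hypothetical, and rounding to the nearest integer is an assumption that the description does not state.

```shell
#!/usr/bin/env bash
# Sketch of the documented --max_epochs default (hypothetical helper):
#   min(round((20000 / n_cells) * 400), 400)
# Rounding half up to an integer is an assumption.
default_max_epochs() {
  local n_cells="$1"
  if ! [[ "$n_cells" =~ ^[1-9][0-9]*$ ]]; then
    echo "n_cells must be a positive integer" >&2
    return 1
  fi
  awk -v n="$n_cells" 'BEGIN {
    epochs = int((20000 / n) * 400 + 0.5)  # round half up
    if (epochs > 400) epochs = 400         # cap at 400 epochs
    print epochs
  }'
}

default_max_epochs 1000    # small dataset: capped at 400
default_max_epochs 100000  # large dataset: 80
```

Smaller datasets therefore train for the full 400 epochs, while the epoch count shrinks roughly in inverse proportion to the number of cells beyond 20000.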
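The two `--knn_weights` choices can be illustrated with a toy vote for a single query cell. This is a sketch of the rule stated in the argument description, not the code used by the `scanvi_scarches` workflow; the function `knn_vote` and its tab-separated `label<TAB>distance` input are hypothetical, and it assumes the labels of the neighbors are already known.

```shell
#!/usr/bin/env bash
# Toy k-nearest-neighbor vote for one query cell (hypothetical helper).
# stdin: one "label<TAB>distance" line per neighbor.
# $1: "uniform" (each neighbor counts 1) or "distance" (each counts 1/distance).
knn_vote() {
  local weights="${1:-uniform}"
  awk -F'\t' -v w="$weights" '
    {
      if (w == "distance") {
        # Inverse-distance weight; a zero distance gets a very large weight
        # (a simplification of treating an exact match as decisive).
        score[$1] += ($2 > 0) ? 1 / $2 : 1e12
      } else {
        score[$1] += 1
      }
    }
    END {
      # Pick the label with the highest total weight (ties broken arbitrarily).
      best_score = -1
      for (label in score) {
        if (score[label] > best_score) { best = label; best_score = score[label] }
      }
      print best
    }'
}

neighbors=$(printf 'B cell\t0.1\nT cell\t1.0\nT cell\t1.0')
printf '%s\n' "$neighbors" | knn_vote uniform   # T cell (2 votes vs 1)
printf '%s\n' "$neighbors" | knn_vote distance  # B cell (weight 10 vs 2)
```

With `uniform` the majority of neighbors wins; with `distance` a single very close neighbor can outweigh several distant ones.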
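The `--celltypist_min_prop` rule can be sketched per subcluster: the most frequent predicted label is kept only if its share of the subcluster's cells reaches `min_prop`; otherwise the subcluster becomes 'Heterogeneous'. This mirrors the argument description rather than CellTypist's own implementation; `majority_vote` and its `subcluster<TAB>label` input format are hypothetical.

```shell
#!/usr/bin/env bash
# Majority voting per subcluster (hypothetical helper, sketch only).
# stdin: one "subcluster<TAB>predicted_label" line per cell.
# $1: min_prop, the minimum share of cells the dominant label needs.
majority_vote() {
  local min_prop="${1:-0}"
  awk -F'\t' -v OFS='\t' -v min_prop="$min_prop" '
    {
      n[$1]++        # cells per subcluster
      c[$1, $2]++    # cells per (subcluster, label) pair
    }
    END {
      # Find the dominant label of each subcluster (ties broken arbitrarily).
      for (key in c) {
        split(key, parts, SUBSEP)
        s = parts[1]
        if (c[key] > best[s]) { best[s] = c[key]; label[s] = parts[2] }
      }
      # Keep the dominant label only if its proportion reaches min_prop.
      for (s in n) {
        out = (best[s] / n[s] >= min_prop) ? label[s] : "Heterogeneous"
        print s, out
      }
    }' | sort
}

cells=$(printf 'c1\tB cell\nc1\tB cell\nc1\tT cell\nc2\tB cell\nc2\tT cell\nc2\tNK cell')
# c1 -> B cell (2/3 >= 0.5), c2 -> Heterogeneous (1/3 < 0.5)
printf '%s\n' "$cells" | majority_vote 0.5
```

With the default `min_prop` of 0 every subcluster keeps its dominant label, which is why the threshold only takes effect when it is raised explicitly.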
37
src/single_cell/process_integrate_annotate/integration_test.sh
Executable file
@@ -0,0 +1,37 @@
#!/bin/bash

set -eo pipefail

# get the root of the repository
REPO_ROOT=$(git rev-parse --show-toplevel)

# ensure that the commands below are run from the root of the repository
cd "$REPO_ROOT"

nextflow \
  run . \
  -main-script src/single_cell/process_integrate_annotate/test.nf \
  -entry test_wf \
  -resume \
  -profile docker \
  -c src/configs/labels_ci.config \
  -c src/configs/integration_tests.config \
  --publish_dir test

nextflow \
  run . \
  -main-script src/single_cell/process_integrate_annotate/test.nf \
  -profile docker,no_publish \
  -resume \
  -entry test_wf_2 \
  -c src/configs/labels_ci.config \
  -c src/configs/integration_tests.config

nextflow \
  run . \
  -main-script src/single_cell/process_integrate_annotate/test.nf \
  -profile docker,no_publish \
  -resume \
  -entry test_wf_3 \
  -c src/configs/labels_ci.config \
  -c src/configs/integration_tests.config
210
src/single_cell/process_integrate_annotate/main.nf
Normal file
@@ -0,0 +1,210 @@
workflow run_wf {
  take:
    input_ch

  main:
    output_ch = input_ch
      | map { id, state ->
        def new_state = state + [ "query_processed": state.output, "_meta": ["join_id": id] ]
        [id, new_state]
      }
      // Make sure parameters are filled out correctly
      | map { id, state ->
        def new_state = [:]
        // Check that at least one of annotation_methods or integration_methods is not empty
        if (!state.annotation_methods && !state.integration_methods) {
          throw new RuntimeException("At least one of --annotation_methods or --integration_methods must be provided")
        }
        // Check CellTypist arguments
        if (state.annotation_methods && state.annotation_methods.contains("celltypist") &&
          (!state.celltypist_model && !state.reference)) {
          throw new RuntimeException("Celltypist was selected as an annotation method. Either --celltypist_model or --reference must be provided.")
        }
        if (state.annotation_methods && state.annotation_methods.contains("celltypist") && state.celltypist_model && state.reference ) {
          System.err.println(
            "Warning: --celltypist_model is set and a --reference was provided. \
            The pre-trained CellTypist model will be used for CellTypist annotation; the reference will not be used to train a CellTypist model."
          )
        }

        [id, state + new_state]
      }
      | process_samples_workflow.run(
        fromState: [
          "input": "input",
          "id": "id",
          "rna_layer": "input_layer",
          "rna_min_counts": "rna_min_counts",
          "rna_max_counts": "rna_max_counts",
          "rna_min_genes_per_cell": "rna_min_genes_per_cell",
          "rna_max_genes_per_cell": "rna_max_genes_per_cell",
          "rna_min_cells_per_gene": "rna_min_cells_per_gene",
          "rna_min_fraction_mito": "rna_min_fraction_mito",
          "rna_max_fraction_mito": "rna_max_fraction_mito",
          "rna_min_fraction_ribo": "rna_min_fraction_ribo",
          "rna_max_fraction_ribo": "rna_max_fraction_ribo",
          "var_name_mitochondrial_genes": "var_name_mitochondrial_genes",
          "var_name_ribosomal_genes": "var_name_ribosomal_genes",
          "var_gene_names": "input_var_gene_names",
          "mitochondrial_gene_regex": "mitochondrial_gene_regex",
          "ribosomal_gene_regex": "ribosomal_gene_regex",
          "var_qc_metrics": "var_qc_metrics"
        ],
        args: [
          "pca_overwrite": "true",
          "add_id_obs_output": "sample_id",
          "highly_variable_features_var_output": "filter_with_hvg_query"
        ],
        toState: ["query_processed": "output"],
      )
      // Integration methods
      | harmony_integration.run(
        runIf: { id, state ->
          state.integration_methods && state.integration_methods.contains("harmony")
        },
        fromState: [
          "id": "id",
          "input": "query_processed",
          "modality": "modality",
          "theta": "harmony_theta",
          "leiden_resolution": "leiden_resolution",
          "obs_covariates": "harmony_obs_covariates"
        ],
        args: [
          "layer": "log_normalized",
          "embedding": "X_pca",
          "obsm_integrated": "X_harmony_integrated",
          "uns_neighbors": "harmony_integration_neighbors",
          "obsp_neighbor_distances": "harmony_integration_neighbor_distances",
          "obsp_neighbor_connectivities": "harmony_integration_neighbor_connectivities",
          "obs_cluster": "harmony_integration_leiden",
          "obsm_umap": "X_harmony_umap"
        ],
        toState: [ "query_processed": "output" ]
      )

      | scvi_integration.run(
        runIf: { id, state ->
          state.integration_methods && state.integration_methods.contains("scvi")
        },
        fromState: [
          "id": "id",
          "input": "query_processed",
          "layer": "input_layer",
          "modality": "modality",
          "leiden_resolution": "leiden_resolution",
          "early_stopping": "early_stopping",
          "early_stopping_monitor": "early_stopping_monitor",
          "early_stopping_patience": "early_stopping_patience",
          "early_stopping_min_delta": "early_stopping_min_delta",
          "max_epochs": "max_epochs",
          "reduce_lr_on_plateau": "reduce_lr_on_plateau",
          "lr_factor": "lr_factor",
          "lr_patience": "lr_patience"
        ],
        args: [
          "obsm_output": "X_scvi_integrated",
          "obs_batch": "sample_id",
          "var_input": "filter_with_hvg_query",
          "uns_neighbors": "scvi_integration_neighbors",
          "obsp_neighbor_distances": "scvi_integration_neighbor_distances",
          "obsp_neighbor_connectivities": "scvi_integration_neighbor_connectivities",
          "obs_cluster": "scvi_integration_leiden",
          "obsm_umap": "X_scvi_umap"
        ],
        toState: [ "query_processed": "output", "scvi_model": "output_model" ]
      )

      // Annotation methods
      | celltypist_annotation.run(
        runIf: { id, state -> state.annotation_methods && state.annotation_methods.contains("celltypist") && state.celltypist_model },
        fromState: [
          "input": "query_processed",
          "modality": "modality",
          "input_var_gene_names": "input_var_gene_names",
          "input_reference_gene_overlap": "input_reference_gene_overlap",
          "model": "celltypist_model",
          "majority_voting": "celltypist_majority_voting"
        ],
        args: [
          // log normalized counts are expected for celltypist
          "input_layer": "log_normalized",
          "output_obs_predictions": "celltypist_pred",
          "output_obs_probability": "celltypist_proba"
        ],
        toState: [ "query_processed": "output" ]
      )

      | celltypist_annotation.run(
        runIf: { id, state -> state.annotation_methods && state.annotation_methods.contains("celltypist") && !state.celltypist_model },
        fromState: [
          "input": "query_processed",
          "modality": "modality",
          "input_var_gene_names": "input_var_gene_names",
          "input_reference_gene_overlap": "input_reference_gene_overlap",
          "reference": "reference",
          "reference_layer": "reference_layer_lognormalized_counts",
          "reference_obs_target": "reference_obs_label",
          "reference_var_gene_names": "reference_var_gene_names",
          "reference_obs_batch": "reference_obs_batch",
          "reference_var_input": "reference_var_input",
          "feature_selection": "celltypist_feature_selection",
          "C": "celltypist_C",
          "max_iter": "celltypist_max_iter",
          "use_SGD": "celltypist_use_SGD",
          "min_prop": "celltypist_min_prop",
          "majority_voting": "celltypist_majority_voting"
        ],
        args: [
          // log normalized counts are expected for celltypist
          "input_layer": "log_normalized",
          "output_obs_predictions": "celltypist_pred",
          "output_obs_probability": "celltypist_proba"
        ],
        toState: [ "query_processed": "output" ]
      )

      | scanvi_scarches_annotation.run(
        runIf: { id, state -> state.annotation_methods && state.annotation_methods.contains("scanvi_scarches")},
        fromState: [
          "id": "id",
          "input": "query_processed",
          "modality": "modality",
          "layer": "input_layer",
          "input_var_gene_names": "input_var_gene_names",
          "reference": "reference",
          "reference_obs_target": "reference_obs_label",
          "reference_obs_batch_label": "reference_obs_batch",
          "reference_var_hvg": "reference_var_input",
          "reference_var_gene_names": "reference_var_gene_names",
          "unlabeled_category": "reference_obs_label_unlabeled_category",
          "early_stopping": "early_stopping",
          "early_stopping_monitor": "early_stopping_monitor",
          "early_stopping_patience": "early_stopping_patience",
          "early_stopping_min_delta": "early_stopping_min_delta",
          "max_epochs": "max_epochs",
          "reduce_lr_on_plateau": "reduce_lr_on_plateau",
          "lr_factor": "lr_factor",
          "lr_patience": "lr_patience",
          "leiden_resolution": "leiden_resolution",
          "knn_weights": "knn_weights",
          "knn_n_neighbors": "knn_n_neighbors"
        ],
        args: [
          "input_obs_batch_label": "sample_id",
          "output_obs_predictions": "scanvi_knn_pred",
          "output_obs_probability": "scanvi_knn_proba"
        ],
        toState: [ "query_processed": "output" ]
      )

      | map {id, state ->
        def new_state = state + ["output": state.query_processed]
        [id, new_state]
      }

      | setState(["output", "_meta"])

  emit:
    output_ch
}
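The parameter checks and `runIf` conditions in `run_wf` amount to a small decision table. The shell sketch below mirrors them for illustration only; `plan_steps` and `has_method` are hypothetical helpers, method lists are given as `;`-separated strings like those in `test.nf`, and only step selection and order are modelled, not the data flow between steps.

```shell
#!/usr/bin/env bash
# Sketch of the parameter checks and runIf conditions in run_wf (hypothetical).
# Method lists are ';'-separated, e.g. "celltypist;scanvi_scarches".

# True if $2 is one of the ';'-separated entries in $1 (exact match).
has_method() {
  [[ ";$1;" == *";$2;"* ]]
}

# Usage: plan_steps ANNOTATION_METHODS INTEGRATION_METHODS CELLTYPIST_MODEL REFERENCE
plan_steps() {
  local annotation="$1" integration="$2" model="$3" reference="$4"
  if [[ -z "$annotation" && -z "$integration" ]]; then
    echo "At least one of --annotation_methods or --integration_methods must be provided" >&2
    return 1
  fi
  if has_method "$annotation" celltypist; then
    if [[ -z "$model" && -z "$reference" ]]; then
      echo "Either --celltypist_model or --reference must be provided" >&2
      return 1
    elif [[ -n "$model" && -n "$reference" ]]; then
      echo "Warning: the pre-trained model takes precedence over --reference for CellTypist" >&2
    fi
  fi
  # Steps in the order run_wf chains them.
  echo "process_samples"
  if has_method "$integration" harmony; then echo "harmony_integration"; fi
  if has_method "$integration" scvi; then echo "scvi_integration"; fi
  if has_method "$annotation" celltypist; then
    if [[ -n "$model" ]]; then
      echo "celltypist_annotation (pre-trained model)"
    else
      echo "celltypist_annotation (trained on reference)"
    fi
  fi
  if has_method "$annotation" scanvi_scarches; then echo "scanvi_scarches_annotation"; fi
}

plan_steps "scanvi_scarches" "harmony" "" "ref.h5mu"
```

The example call corresponds to the `simple_execution_test` event in `test.nf` and lists `process_samples`, `harmony_integration` and `scanvi_scarches_annotation`. Matching on `;`-delimited entries keeps `scvi` from accidentally matching a longer name.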
10
src/single_cell/process_integrate_annotate/nextflow.config
Normal file
@@ -0,0 +1,10 @@
manifest {
  nextflowVersion = '!>=20.12.1-edge'
}

params {
  rootDir = java.nio.file.Paths.get("$projectDir/../../../").toAbsolutePath().normalize().toString()
}

// include common settings
includeConfig("${params.rootDir}/src/configs/labels.config")
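`rootDir` resolves to three directory levels above the component directory, which for `src/single_cell/process_integrate_annotate` is the repository root. The shell sketch below reproduces that lexical normalization with a logical `cd`; `root_dir_from` is a hypothetical helper and, unlike `Paths.normalize()`, it requires the directories to exist.

```shell
#!/usr/bin/env bash
# Resolve "<component_dir>/../../.." to a clean absolute path (hypothetical helper).
# A logical cd (bash's default, -L) removes ".." components lexically,
# similar to Paths.get(...).normalize().
root_dir_from() {
  (CDPATH='' cd -- "$1/../../.." && pwd)
}

demo=$(mktemp -d)
mkdir -p "$demo/src/single_cell/process_integrate_annotate"
root_dir_from "$demo/src/single_cell/process_integrate_annotate"  # prints the value of $demo
rm -rf "$demo"
```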
151
src/single_cell/process_integrate_annotate/test.nf
Normal file
@@ -0,0 +1,151 @@
nextflow.enable.dsl=2

include { process_integrate_annotate } from params.rootDir + "/target/nextflow/single_cell/process_integrate_annotate/main.nf"
params.resources_test = "s3://openpipelines-bio/openpipeline_incubator/resources_test/"

workflow test_wf {
  resources_test = file(params.resources_test)

  output_ch = Channel.fromList(
    [
      [
        id: "simple_annotation_test",
        input: resources_test.resolve("pbmc_1k_protein_v3/pbmc_1k_protein_v3_mms.h5mu"),
        reference: resources_test.resolve("annotation_test_data/TS_Blood_filtered.h5mu"),
        reference_var_gene_names: "ensemblid",
        reference_layer_lognormalized_counts: "log_normalized",
        reference_obs_batch: "donor_assay",
        reference_obs_label: "cell_type",
        max_epochs: "5",
        annotation_methods: "celltypist;scanvi_scarches"
      ],
      [
        id: "simple_integration_test",
        input: resources_test.resolve("pbmc_1k_protein_v3/pbmc_1k_protein_v3_mms.h5mu"),
        integration_methods: "harmony;scvi"
      ],
      [
        id: "simple_execution_test",
        input: resources_test.resolve("pbmc_1k_protein_v3/pbmc_1k_protein_v3_mms.h5mu"),
        reference: resources_test.resolve("annotation_test_data/TS_Blood_filtered.h5mu"),
        reference_var_gene_names: "ensemblid",
        reference_layer_lognormalized_counts: "log_normalized",
        reference_obs_batch: "donor_assay",
        reference_obs_label: "cell_type",
        max_epochs: "5",
        annotation_methods: "scanvi_scarches",
        integration_methods: "harmony"
      ]
    ])
    | view {"State at start: $it"}
    | map{ state -> [state.id, state] }
    | process_integrate_annotate
    | view {"After AaaS: $it"}
    | view { output ->
      assert output.size() == 2 : "Outputs should contain two elements; [id, state]"

      // check id
      def id = output[0]
      assert id == "merged" : "Output ID should be `merged`"

      // check output
      def state = output[1]
      assert state instanceof Map : "State should be a map. Found: ${state}"
      assert state.containsKey("output") : "Output should contain key 'output'."
      assert state.output.isFile() : "'output' should be a file."
      assert state.output.toString().endsWith(".h5mu") : "Output file should end with '.h5mu'. Found: ${state.output}"

      "Output: $output"
    }
}

workflow test_wf_2 {
  resources_test = file(params.resources_test)

  output_ch = Channel.fromList(
    [
      [
        id: "pbmc_with_more_params",
        input: resources_test.resolve("pbmc_1k_protein_v3/pbmc_1k_protein_v3_mms.h5mu"),
        rna_min_counts: 2,
        rna_max_counts: 1000000,
        rna_min_genes_per_cell: 1,
        rna_max_genes_per_cell: 1000000,
        rna_min_cells_per_gene: 1,
        rna_min_fraction_mito: 0.0,
        rna_max_fraction_mito: 1.0,
        prot_min_counts: 3,
        prot_max_counts: 1000000,
        prot_min_proteins_per_cell: 1,
        prot_max_proteins_per_cell: 1000000,
        prot_min_cells_per_protein: 1,
        var_name_mitochondrial_genes: 'mitochondrial',
        obs_name_mitochondrial_fraction: 'fraction_mitochondrial',
        add_id_to_obs: true,
        add_id_make_observation_keys_unique: true,
        add_id_obs_output: "sample_id",
        reference: resources_test.resolve("annotation_test_data/TS_Blood_filtered.h5mu"),
        reference_var_gene_names: "ensemblid",
        reference_layer_lognormalized_counts: "log_normalized",
        reference_obs_batch: "donor_assay",
        reference_obs_label: "cell_type",
        annotation_methods: "celltypist",
        integration_methods: "scvi"
      ]
    ])
    | view {"State at start: $it"}
    | map { state -> [state.id, state] }
    | process_integrate_annotate
    | view {"After AaaS: $it"}
    | view { output ->
      assert output.size() == 2 : "Outputs should contain two elements; [id, state]"

      // check id
      def id = output[0]
      assert id == "merged" : "Output ID should be `merged`"

      // check output
      def state = output[1]
      assert state instanceof Map : "State should be a map. Found: ${state}"
      assert state.containsKey("output") : "Output should contain key 'output'."
      assert state.output.isFile() : "'output' should be a file."
      assert state.output.toString().endsWith(".h5mu") : "Output file should end with '.h5mu'. Found: ${state.output}"

      "Output: $output"
    }
}

workflow test_wf_3 {
  resources_test = file(params.resources_test)

  output_ch = Channel.fromList(
    [
      [
        id: "celltypist_model",
        input: resources_test.resolve("pbmc_1k_protein_v3/pbmc_1k_protein_v3_mms.h5mu"),
        celltypist_model: resources_test.resolve("annotation_test_data/celltypist_model_Immune_All_Low.pkl"),
        annotation_methods: "celltypist",
        input_var_gene_names: "gene_symbol"
      ]
    ])
    | view {"State at start: $it"}
    | map{ state -> [state.id, state] }
    | process_integrate_annotate
    | view {"After AaaS: $it"}
    | view { output ->
      assert output.size() == 2 : "Outputs should contain two elements; [id, state]"

      // check id
      def id = output[0]
      assert id == "merged" : "Output ID should be `merged`"

      // check output
      def state = output[1]
      assert state instanceof Map : "State should be a map. Found: ${state}"
      assert state.containsKey("output") : "Output should contain key 'output'."
      assert state.output.isFile() : "'output' should be a file."
      assert state.output.toString().endsWith(".h5mu") : "Output file should end with '.h5mu'. Found: ${state.output}"

      "Output: $output"
    }
}
0
target/.build.yaml
Normal file
@@ -0,0 +1,388 @@
name: "fastqc"
version: "v0.4.2"
authors:
- name: "Theodoro Gasperin Terra Camargo"
  roles:
  - "author"
  - "maintainer"
  info:
    links:
      email: "theodorogtc@gmail.com"
      github: "tgaspe"
      linkedin: "theodoro-gasperin-terra-camargo"
argument_groups:
- name: "Inputs"
  arguments:
  - type: "file"
    name: "--input"
    description: "FASTQ file(s) to be analyzed.\n"
    info: null
    example:
    - "input.fq"
    must_exist: true
    create_parent: true
    required: true
    direction: "input"
    multiple: true
    multiple_sep: ";"
- name: "Outputs"
  description: "At least one of the output options (--html, --zip, --summary, --data)\
    \ must be used.\n"
  arguments:
  - type: "file"
    name: "--outdir"
    description: "Output directory where the results will be saved.\n"
    info: null
    example:
    - "results"
    must_exist: true
    create_parent: true
    required: false
    direction: "output"
    multiple: false
    multiple_sep: ";"
  - type: "file"
    name: "--html"
    description: "Create the HTML report of the results. \n'*' wild card must be provided\
      \ in the output file name. \nWild card will be replaced by the input file basename.\n\
      e.g. \n --input \"sample_1.fq\"\n --html \"*.html\"\n would create an output\
      \ html file named sample_1.html\n"
    info: null
    example:
    - "*.html"
    must_exist: true
    create_parent: true
    required: false
    direction: "output"
    multiple: true
    multiple_sep: ";"
  - type: "file"
    name: "--zip"
    description: "Create the zip file(s) containing: html report, data, images, icons,\
      \ summary, etc.\n'*' wild card must be provided in the output file name.\nWild\
      \ card will be replaced by the input basename.\ne.g. \n --input \"sample_1.fq\"\
      \n --zip \"*.zip\"\n would create an output zip file named sample_1.zip\n"
    info: null
    example:
    - "*.zip"
    must_exist: true
    create_parent: true
    required: false
    direction: "output"
    multiple: true
    multiple_sep: ";"
  - type: "file"
    name: "--summary"
    description: "Create the summary file(s).\n'*' wild card must be provided in the\
      \ output file name.\nWild card will be replaced by the input basename.\ne.g.\
      \ \n --input \"sample_1.fq\"\n --summary \"*_summary.txt\"\n would create\
      \ an output summary.txt file named sample_1_summary.txt\n"
    info: null
    example:
    - "*_summary.txt"
    must_exist: true
    create_parent: true
    required: false
    direction: "output"
    multiple: true
    multiple_sep: ";"
  - type: "file"
    name: "--data"
    description: "Create the data file(s).\n'*' wild card must be provided in the\
      \ output file name.\nWild card will be replaced by the input basename.\ne.g.\
      \ \n --input \"sample_1.fq\"\n --data \"*_data.txt\"\n would create an\
      \ output data.txt file named sample_1_data.txt\n"
    info: null
    example:
    - "*_data.txt"
    must_exist: true
    create_parent: true
    required: false
    direction: "output"
    multiple: true
    multiple_sep: ";"
- name: "Options"
  arguments:
  - type: "boolean_true"
    name: "--casava"
    description: "Files come from raw casava output. Files in the same sample\ngroup\
      \ (differing only by the group number) will be analysed\nas a set rather than\
      \ individually. Sequences with the filter\nflag set in the header will be excluded\
      \ from the analysis.\nFiles must have the same names given to them by casava\n\
      (including being gzipped and ending with .gz) otherwise they\nwon't be grouped\
      \ together correctly.\n"
    info: null
    direction: "input"
  - type: "boolean_true"
    name: "--nano"
    description: "Files come from nanopore sequences and are in fast5 format. In\n\
      this mode you can pass in directories to process and the program\nwill take\
      \ in all fast5 files within those directories and produce\na single output file\
      \ from the sequences found in all files.\n"
    info: null
    direction: "input"
  - type: "boolean_true"
    name: "--nofilter"
    description: "If running with --casava then don't remove reads flagged by\ncasava\
      \ as poor quality when performing the QC analysis.\n"
    info: null
    direction: "input"
  - type: "boolean_true"
    name: "--nogroup"
    description: "Disable grouping of bases for reads >50bp. \nAll reports will show\
      \ data for every base in the read. \nWARNING: Using this option will cause fastqc\
      \ to crash \nand burn if you use it on really long reads, and your \nplots may\
      \ end up a ridiculous size. You have been warned!\n"
    info: null
    direction: "input"
  - type: "integer"
    name: "--min_length"
    description: "Sets an artificial lower limit on the length of the \nsequence to\
      \ be shown in the report. As long as you \nset this to a value greater or equal\
      \ to your longest \nread length then this will be the sequence length used \n\
      to create your read groups. This can be useful for making\ndirectly comparable\
      \ statistics from datasets with somewhat \nvariable read lengths.\n"
    info: null
    example:
    - 0
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--format"
    alternatives:
    - "-f"
    description: "Bypasses the normal sequence file format detection and \nforces\
      \ the program to use the specified format. \nValid formats are bam, sam, bam_mapped,\
      \ sam_mapped, and fastq.\n"
    info: null
    example:
    - "bam"
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "file"
    name: "--contaminants"
    alternatives:
    - "-c"
    description: "Specifies a non-default file which contains the list \nof contaminants\
      \ to screen overrepresented sequences against. \nThe file must contain sets\
      \ of named contaminants in the form\nname[tab]sequence. Lines prefixed with\
      \ a hash will be ignored.\n"
    info: null
    example:
    - "contaminants.txt"
    must_exist: true
    create_parent: true
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "file"
    name: "--adapters"
    alternatives:
    - "-a"
    description: "Specifies a non-default file which contains the list of \nadapter\
      \ sequences which will be explicitly searched against \nthe library. The file\
      \ must contain sets of named adapters \nin the form name[tab]sequence. Lines\
      \ prefixed with a hash will be ignored.\n"
    info: null
    example:
    - "adapters.txt"
    must_exist: true
    create_parent: true
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "file"
    name: "--limits"
    alternatives:
    - "-l"
    description: "Specifies a non-default file which contains \na set of criteria\
      \ which will be used to determine \nthe warn/error limits for the various modules.\
      \ \nThis file can also be used to selectively remove \nsome modules from the\
      \ output altogether. The format \nneeds to mirror the default limits.txt file\
      \ found in \nthe Configuration folder.\n"
    info: null
    example:
    - "limits.txt"
    must_exist: true
    create_parent: true
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "integer"
    name: "--kmers"
    alternatives:
    - "-k"
    description: "Specifies the length of Kmer to look for in the Kmer \ncontent module.\
      \ Specified Kmer length must be between \n2 and 10. Default length is 7 if not\
      \ specified.\n"
    info: null
    example:
    - 7
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "boolean_true"
    name: "--quiet"
    alternatives:
    - "-q"
    description: "Suppress all progress messages on stdout and only report errors.\n"
    info: null
    direction: "input"
resources:
- type: "bash_script"
  path: "script.sh"
  is_executable: true
description: "FastQC - A high throughput sequence QC analysis tool."
test_resources:
- type: "bash_script"
  path: "test.sh"
  is_executable: true
info: null
status: "enabled"
scope:
  image: "public"
  target: "public"
requirements:
  commands:
  - "ps"
keywords:
- "Quality control"
- "BAM"
- "SAM"
- "FASTQ"
license: "GPL-3.0, Apache-2.0"
links:
  repository: "https://github.com/s-andrews/FastQC"
  homepage: "https://www.bioinformatics.babraham.ac.uk/projects/fastqc/"
  documentation: "https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/"
  issue_tracker: "https://github.com/s-andrews/FastQC/issues"
runners:
- type: "executable"
  id: "executable"
  docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
  id: "nextflow"
  directives:
    tag: "$id"
  auto:
    simplifyInput: true
    simplifyOutput: false
    transcript: false
    publish: false
  config:
    labels:
      mem1gb: "memory = 1000000000.B"
      mem2gb: "memory = 2000000000.B"
      mem5gb: "memory = 5000000000.B"
      mem10gb: "memory = 10000000000.B"
      mem20gb: "memory = 20000000000.B"
      mem50gb: "memory = 50000000000.B"
      mem100gb: "memory = 100000000000.B"
      mem200gb: "memory = 200000000000.B"
      mem500gb: "memory = 500000000000.B"
      mem1tb: "memory = 1000000000000.B"
      mem2tb: "memory = 2000000000000.B"
      mem5tb: "memory = 5000000000000.B"
      mem10tb: "memory = 10000000000000.B"
      mem20tb: "memory = 20000000000000.B"
      mem50tb: "memory = 50000000000000.B"
      mem100tb: "memory = 100000000000000.B"
      mem200tb: "memory = 200000000000000.B"
      mem500tb: "memory = 500000000000000.B"
      mem1gib: "memory = 1073741824.B"
      mem2gib: "memory = 2147483648.B"
      mem4gib: "memory = 4294967296.B"
      mem8gib: "memory = 8589934592.B"
      mem16gib: "memory = 17179869184.B"
      mem32gib: "memory = 34359738368.B"
      mem64gib: "memory = 68719476736.B"
      mem128gib: "memory = 137438953472.B"
      mem256gib: "memory = 274877906944.B"
      mem512gib: "memory = 549755813888.B"
      mem1tib: "memory = 1099511627776.B"
      mem2tib: "memory = 2199023255552.B"
      mem4tib: "memory = 4398046511104.B"
      mem8tib: "memory = 8796093022208.B"
      mem16tib: "memory = 17592186044416.B"
      mem32tib: "memory = 35184372088832.B"
      mem64tib: "memory = 70368744177664.B"
      mem128tib: "memory = 140737488355328.B"
      mem256tib: "memory = 281474976710656.B"
      mem512tib: "memory = 562949953421312.B"
      cpu1: "cpus = 1"
      cpu2: "cpus = 2"
      cpu5: "cpus = 5"
      cpu10: "cpus = 10"
      cpu20: "cpus = 20"
      cpu50: "cpus = 50"
      cpu100: "cpus = 100"
      cpu200: "cpus = 200"
      cpu500: "cpus = 500"
      cpu1000: "cpus = 1000"
  debug: false
  container: "docker"
engines:
- type: "docker"
  id: "docker"
  image: "biocontainers/fastqc:v0.11.9_cv8"
  target_registry: "images.viash-hub.com"
  target_tag: "v0.4.2"
  namespace_separator: "/"
  setup:
  - type: "docker"
    run:
    - "echo \"fastqc: $(fastqc --version | sed -n 's/^FastQC //p')\" > /var/software_versions.txt\n"
  entrypoint: []
  cmd: null
- type: "native"
  id: "native"
build_info:
  config: "src/fastqc/config.vsh.yaml"
  runner: "nextflow"
  engine: "docker|native"
  output: "target/nextflow/fastqc"
  executable: "target/nextflow/fastqc/main.nf"
  viash_version: "0.9.4"
  git_commit: "02b470d967226478af69c37eae2b1256be1b78fd"
  git_remote: "https://github.com/viash-hub/biobox"
package_config:
  name: "biobox"
  version: "v0.4.2"
  summary: "A curated collection of high-quality, standalone bioinformatics components\
    \ built with [Viash](https://viash.io).\n"
  description: "`biobox` offers a suite of reliable bioinformatics components, similar\
|
||||
\ to [nf-core/modules](https://github.com/nf-core/modules) and [snakemake-wrappers/bio](https://github.com/snakemake/snakemake-wrappers/tree/master/bio),\
|
||||
\ but built using the [Viash](https://viash.io) framework.\n\nThis approach emphasizes\
|
||||
\ **reusability**, **reproducibility**, and adherence to **best practices**. Key\
|
||||
\ features of `biobox` components include:\n\n* **Standalone & Nextflow Ready:**\
|
||||
\ Run components directly via the command line or seamlessly integrate them into\
|
||||
\ Nextflow workflows.\n* **High Quality Standards:**\n * Comprehensive documentation\
|
||||
\ for components and parameters.\n * Full exposure of underlying tool arguments.\n\
|
||||
\ * Containerized (Docker) for dependency management and reproducibility.\n\
|
||||
\ * Unit tested for verified functionality.\n"
|
||||
info: null
|
||||
viash_version: "0.9.4"
|
||||
source: "src"
|
||||
target: "target"
|
||||
config_mods:
|
||||
- ".requirements.commands += ['ps']\n"
|
||||
- ".engines += { type: \"native\" }"
|
||||
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
|
||||
- ".engines[.type == 'docker'].target_tag := 'v0.4.2'"
|
||||
keywords:
|
||||
- "bioinformatics"
|
||||
- "modules"
|
||||
- "sequencing"
|
||||
license: "MIT"
|
||||
organization: "vsh"
|
||||
links:
|
||||
repository: "https://github.com/viash-hub/biobox"
|
||||
issue_tracker: "https://github.com/viash-hub/biobox/issues"
|
||||
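The `config.labels` map in the Nextflow runner above defines two families of memory labels. The decimal labels (`mem1gb` through `mem500tb`) are multiples of 10^9 and 10^12 bytes. The binary labels (`mem1gib` through `mem512tib`) are multiples of 2^30 and 2^40 bytes. There are also ten `cpuN` labels. The Python sketch below regenerates that table from the multipliers that appear in the config, which makes it easy to cross-check the byte values. The function names are hypothetical and are not part of Viash.

```python
"""Sketch: regenerate the Viash Nextflow label table shown in the config above.

Assumption: labels follow the `mem<N><unit>` / `cpu<N>` naming seen in the
config, where gb/tb are decimal units and gib/tib are binary units.
Illustration only; this is not Viash's own code.
"""

DECIMAL_UNITS = {"gb": 10**9, "tb": 10**12}
BINARY_UNITS = {"gib": 2**30, "tib": 2**40}

# Multipliers copied from the label names in the config.
DECIMAL_STEPS = [1, 2, 5, 10, 20, 50, 100, 200, 500]
BINARY_STEPS = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
CPU_STEPS = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]


def memory_labels() -> dict:
    """Map each memory label to its directive string, e.g. 'memory = 1073741824.B'."""
    labels = {}
    for units, steps in ((DECIMAL_UNITS, DECIMAL_STEPS), (BINARY_UNITS, BINARY_STEPS)):
        for unit, factor in units.items():
            for n in steps:
                labels[f"mem{n}{unit}"] = f"memory = {n * factor}.B"
    return labels


def cpu_labels() -> dict:
    """Map each cpu label to its directive string, e.g. 'cpus = 10'."""
    return {f"cpu{n}": f"cpus = {n}" for n in CPU_STEPS}


if __name__ == "__main__":
    labels = {**memory_labels(), **cpu_labels()}
    print(len(labels))  # 18 decimal + 20 binary + 10 cpu labels -> 48
    print(labels["mem16gib"])  # memory = 17179869184.B
```

Dict insertion order follows the config order: gb, tb, gib, tib, then cpu. This means the output can be diffed line by line against the YAML.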
4111
target/dependencies/vsh/vsh/biobox/v0.4.2/nextflow/fastqc/main.nf
Normal file
File diff suppressed because it is too large
@@ -0,0 +1,126 @@
manifest {
  name = 'fastqc'
  mainScript = 'main.nf'
  nextflowVersion = '!>=20.12.1-edge'
  version = 'v0.4.2'
  description = 'FastQC - A high throughput sequence QC analysis tool.'
  author = 'Theodoro Gasperin Terra Camargo'
}

process.container = 'nextflow/bash:latest'

// detect tempdir
tempDir = java.nio.file.Paths.get(
  System.getenv('NXF_TEMP') ?:
    System.getenv('VIASH_TEMP') ?:
    System.getenv('TEMPDIR') ?:
    System.getenv('TMPDIR') ?:
    '/tmp'
).toAbsolutePath()

profiles {
  no_publish {
    process {
      withName: '.*' {
        publishDir = [
          enabled: false
        ]
      }
    }
  }
  mount_temp {
    docker.temp = tempDir
    podman.temp = tempDir
    charliecloud.temp = tempDir
  }
  docker {
    docker.enabled = true
    // docker.userEmulation = true
    singularity.enabled = false
    podman.enabled = false
    shifter.enabled = false
    charliecloud.enabled = false
  }
  singularity {
    singularity.enabled = true
    singularity.autoMounts = true
    docker.enabled = false
    podman.enabled = false
    shifter.enabled = false
    charliecloud.enabled = false
  }
  podman {
    podman.enabled = true
    docker.enabled = false
    singularity.enabled = false
    shifter.enabled = false
    charliecloud.enabled = false
  }
  shifter {
    shifter.enabled = true
    docker.enabled = false
    singularity.enabled = false
    podman.enabled = false
    charliecloud.enabled = false
  }
  charliecloud {
    charliecloud.enabled = true
    docker.enabled = false
    singularity.enabled = false
    podman.enabled = false
    shifter.enabled = false
  }
}

process{
  withLabel: mem1gb { memory = 1000000000.B }
  withLabel: mem2gb { memory = 2000000000.B }
  withLabel: mem5gb { memory = 5000000000.B }
  withLabel: mem10gb { memory = 10000000000.B }
  withLabel: mem20gb { memory = 20000000000.B }
  withLabel: mem50gb { memory = 50000000000.B }
  withLabel: mem100gb { memory = 100000000000.B }
  withLabel: mem200gb { memory = 200000000000.B }
  withLabel: mem500gb { memory = 500000000000.B }
  withLabel: mem1tb { memory = 1000000000000.B }
  withLabel: mem2tb { memory = 2000000000000.B }
  withLabel: mem5tb { memory = 5000000000000.B }
  withLabel: mem10tb { memory = 10000000000000.B }
  withLabel: mem20tb { memory = 20000000000000.B }
  withLabel: mem50tb { memory = 50000000000000.B }
  withLabel: mem100tb { memory = 100000000000000.B }
  withLabel: mem200tb { memory = 200000000000000.B }
  withLabel: mem500tb { memory = 500000000000000.B }
  withLabel: mem1gib { memory = 1073741824.B }
  withLabel: mem2gib { memory = 2147483648.B }
  withLabel: mem4gib { memory = 4294967296.B }
  withLabel: mem8gib { memory = 8589934592.B }
  withLabel: mem16gib { memory = 17179869184.B }
  withLabel: mem32gib { memory = 34359738368.B }
  withLabel: mem64gib { memory = 68719476736.B }
  withLabel: mem128gib { memory = 137438953472.B }
  withLabel: mem256gib { memory = 274877906944.B }
  withLabel: mem512gib { memory = 549755813888.B }
  withLabel: mem1tib { memory = 1099511627776.B }
  withLabel: mem2tib { memory = 2199023255552.B }
  withLabel: mem4tib { memory = 4398046511104.B }
  withLabel: mem8tib { memory = 8796093022208.B }
  withLabel: mem16tib { memory = 17592186044416.B }
  withLabel: mem32tib { memory = 35184372088832.B }
  withLabel: mem64tib { memory = 70368744177664.B }
  withLabel: mem128tib { memory = 140737488355328.B }
  withLabel: mem256tib { memory = 281474976710656.B }
  withLabel: mem512tib { memory = 562949953421312.B }
  withLabel: cpu1 { cpus = 1 }
  withLabel: cpu2 { cpus = 2 }
  withLabel: cpu5 { cpus = 5 }
  withLabel: cpu10 { cpus = 10 }
  withLabel: cpu20 { cpus = 20 }
  withLabel: cpu50 { cpus = 50 }
  withLabel: cpu100 { cpus = 100 }
  withLabel: cpu200 { cpus = 200 }
  withLabel: cpu500 { cpus = 500 }
  withLabel: cpu1000 { cpus = 1000 }
}
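The `tempDir` expression above chains Groovy's Elvis operator (`?:`). The first of `NXF_TEMP`, `VIASH_TEMP`, `TEMPDIR` and `TMPDIR` that holds a non-empty value is used. If none does, the fallback is `/tmp`, and the result is then made absolute. The `mount_temp` profile passes that directory to Docker, Podman and Charliecloud. The Python function below is a rough equivalent of the lookup, for illustration only. The name `resolve_temp_dir` is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Order of precedence copied from the nextflow.config above.
TEMP_VARS = ("NXF_TEMP", "VIASH_TEMP", "TEMPDIR", "TMPDIR")


def resolve_temp_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Mimic `System.getenv(A) ?: System.getenv(B) ?: ... ?: '/tmp'`.

    Groovy's Elvis operator skips both null and empty strings, so a variable
    that is exported but empty falls through to the next candidate.
    """
    env = os.environ if env is None else env
    for name in TEMP_VARS:
        value = env.get(name)
        if value:  # None and "" are both falsy, matching Groovy truth
            # Like Paths.get(...).toAbsolutePath(): resolve against the cwd,
            # without normalising ".." segments.
            return Path(value).absolute()
    return Path("/tmp").absolute()


if __name__ == "__main__":
    print(resolve_temp_dir())
```

The explicit `env` parameter exists only so the lookup can be exercised without touching the real environment. Nextflow itself always reads the process environment.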
@@ -0,0 +1,175 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "fastqc",
  "description": "FastQC - A high throughput sequence QC analysis tool.",
  "type": "object",
  "$defs": {
    "inputs": {
      "title": "Inputs",
      "type": "object",
      "description": "No description",
      "properties": {
        "input": {
          "type": "array",
          "items": {
            "type": "string"
          },
          "format": "path",
          "exists": true,
          "description": "FASTQ file(s) to be analyzed.\n",
          "help_text": "Type: `file`, multiple: `True`, required, direction: `input`, example: `[\"input.fq\"]`. "
        }
      }
    },
    "outputs": {
      "title": "Outputs",
      "type": "object",
      "description": "At least one of the output options (--html, --zip, --summary, --data) must be used.\n",
      "properties": {
        "outdir": {
          "type": "string",
          "format": "path",
          "description": "Output directory where the results will be saved.\n",
          "help_text": "Type: `file`, multiple: `False`, default: `\"$id.$key.outdir\"`, direction: `output`, example: `\"results\"`. ",
          "default": "$id.$key.outdir"
        },
        "html": {
          "type": "array",
          "items": {
            "type": "string"
          },
          "format": "path",
          "description": "Create the HTML report of the results",
          "help_text": "Type: `file`, multiple: `True`, default: `\"$id.$key.html_*.html\"`, direction: `output`, example: `[\"*.html\"]`. ",
          "default": "$id.$key.html_*.html"
        },
        "zip": {
          "type": "array",
          "items": {
            "type": "string"
          },
          "format": "path",
          "description": "Create the zip file(s) containing: html report, data, images, icons, summary, etc.\n'*' wild card must be provided in the output file name.\nWild card will be replaced by the input basename.\ne.g",
          "help_text": "Type: `file`, multiple: `True`, default: `\"$id.$key.zip_*.zip\"`, direction: `output`, example: `[\"*.zip\"]`. ",
          "default": "$id.$key.zip_*.zip"
        },
        "summary": {
          "type": "array",
          "items": {
            "type": "string"
          },
          "format": "path",
          "description": "Create the summary file(s).\n'*' wild card must be provided in the output file name.\nWild card will be replaced by the input basename.\ne.g",
          "help_text": "Type: `file`, multiple: `True`, default: `\"$id.$key.summary_*.txt\"`, direction: `output`, example: `[\"*_summary.txt\"]`. ",
          "default": "$id.$key.summary_*.txt"
        },
        "data": {
          "type": "array",
          "items": {
            "type": "string"
          },
          "format": "path",
          "description": "Create the data file(s).\n'*' wild card must be provided in the output file name.\nWild card will be replaced by the input basename.\ne.g",
          "help_text": "Type: `file`, multiple: `True`, default: `\"$id.$key.data_*.txt\"`, direction: `output`, example: `[\"*_data.txt\"]`. ",
          "default": "$id.$key.data_*.txt"
        }
      }
    },
    "options": {
      "title": "Options",
      "type": "object",
      "description": "No description",
      "properties": {
        "casava": {
          "type": "boolean",
          "description": "Files come from raw casava output",
          "help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
          "default": false
        },
        "nano": {
          "type": "boolean",
          "description": "Files come from nanopore sequences and are in fast5 format",
          "help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
          "default": false
        },
        "nofilter": {
          "type": "boolean",
          "description": "If running with --casava then don't remove read flagged by\ncasava as poor quality when performing the QC analysis.\n",
          "help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
          "default": false
        },
        "nogroup": {
          "type": "boolean",
          "description": "Disable grouping of bases for reads >50bp",
          "help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
          "default": false
        },
        "min_length": {
          "type": "integer",
          "description": "Sets an artificial lower limit on the length of the \nsequence to be shown in the report",
          "help_text": "Type: `integer`, multiple: `False`, example: `0`. "
        },
        "format": {
          "type": "string",
          "description": "Bypasses the normal sequence file format detection and \nforces the program to use the specified format",
          "help_text": "Type: `string`, multiple: `False`, example: `\"bam\"`. "
        },
        "contaminants": {
          "type": "string",
          "format": "path",
          "description": "Specifies a non-default file which contains the list \nof contaminants to screen overrepresented sequences against",
          "help_text": "Type: `file`, multiple: `False`, direction: `input`, example: `\"contaminants.txt\"`. "
        },
        "adapters": {
          "type": "string",
          "format": "path",
          "description": "Specifies a non-default file which contains the list of \nadapter sequences which will be explicitly searched against \nthe library",
          "help_text": "Type: `file`, multiple: `False`, direction: `input`, example: `\"adapters.txt\"`. "
        },
        "limits": {
          "type": "string",
          "format": "path",
          "description": "Specifies a non-default file which contains \na set of criteria which will be used to determine \nthe warn/error limits for the various modules",
          "help_text": "Type: `file`, multiple: `False`, direction: `input`, example: `\"limits.txt\"`. "
        },
        "kmers": {
          "type": "integer",
          "description": "Specifies the length of Kmer to look for in the Kmer \ncontent module",
          "help_text": "Type: `integer`, multiple: `False`, example: `7`. "
        },
        "quiet": {
          "type": "boolean",
          "description": "Suppress all progress messages on stdout and only report errors.\n",
          "help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
          "default": false
        }
      }
    },
    "nextflow input-output arguments": {
      "title": "Nextflow input-output arguments",
      "type": "object",
      "description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
      "properties": {
        "publish_dir": {
          "type": "string",
          "description": "Path to an output directory.",
          "help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
        }
      }
    }
  },
  "allOf": [
    {
      "$ref": "#/$defs/inputs"
    },
    {
      "$ref": "#/$defs/outputs"
    },
    {
      "$ref": "#/$defs/options"
    },
    {
      "$ref": "#/$defs/nextflow input-output arguments"
    }
  ]
}
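The schema's `Outputs` group sets two rules. First, at least one of `--html`, `--zip`, `--summary` or `--data` must be set. Second, the `*` wildcard in those file names is replaced by the input basename. The Python sketch below implements both rules. The helper names are hypothetical. The schema does not define "basename", so treating it as the file name with a `.fastq`/`.fq` suffix (optionally followed by `.gz`) removed is an assumption.

```python
from pathlib import PurePosixPath

OUTPUT_OPTIONS = ("html", "zip", "summary", "data")

# Assumption: suffixes stripped to obtain the "input basename".
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def check_outputs(params: dict) -> None:
    """Raise if none of the output options from the schema's Outputs group is set."""
    if not any(params.get(key) for key in OUTPUT_OPTIONS):
        raise ValueError("at least one of --html, --zip, --summary, --data must be used")


def expand_wildcard(template: str, input_file: str) -> str:
    """Replace '*' in an output file name template with the input's basename."""
    if "*" not in template:
        raise ValueError(f"output template {template!r} must contain a '*' wildcard")
    name = PurePosixPath(input_file).name  # drop directories
    for suffix in FASTQ_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return template.replace("*", name)


if __name__ == "__main__":
    check_outputs({"summary": ["*_summary.txt"]})
    print(expand_wildcard("*_summary.txt", "reads/sample1.fastq.gz"))  # sample1_summary.txt
```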
@@ -0,0 +1,496 @@
name: "multiqc"
version: "v0.4.2"
authors:
- name: "Dorien Roosen"
  roles:
  - "author"
  - "maintainer"
  info:
    links:
      email: "dorien@data-intuitive.com"
      github: "dorien-er"
      linkedin: "dorien-roosen"
    organizations:
    - name: "Data Intuitive"
      href: "https://www.data-intuitive.com"
      role: "Data Scientist"
argument_groups:
- name: "Input"
  arguments:
  - type: "file"
    name: "--input"
    description: "File paths to be searched for analysis results to be included in\
      \ the report.\n"
    info: null
    example:
    - "data/results"
    must_exist: true
    create_parent: true
    required: true
    direction: "input"
    multiple: true
    multiple_sep: ";"
- name: "Ouput"
  arguments:
  - type: "file"
    name: "--output_report"
    description: "Filepath of the generated report.\n"
    info: null
    example:
    - "multiqc_report.html"
    must_exist: false
    create_parent: true
    required: false
    direction: "output"
    multiple: false
    multiple_sep: ";"
  - type: "file"
    name: "--output_data"
    description: "Output directory for parsed data files. If not provided, parsed\
      \ data will not be published.\n"
    info: null
    example:
    - "multiqc_data"
    must_exist: false
    create_parent: true
    required: false
    direction: "output"
    multiple: false
    multiple_sep: ";"
  - type: "file"
    name: "--output_plots"
    description: "Output directory for generated plots. If not provided, plots will\
      \ not be published.\n"
    info: null
    example:
    - "multiqc_plots"
    must_exist: false
    create_parent: true
    required: false
    direction: "output"
    multiple: false
    multiple_sep: ";"
- name: "Modules and analyses to run"
  arguments:
  - type: "string"
    name: "--include_modules"
    description: "Use only these module"
    info: null
    example:
    - "fastqc"
    - "cutadapt"
    required: false
    direction: "input"
    multiple: true
    multiple_sep: ";"
  - type: "string"
    name: "--exclude_modules"
    description: "Do not use only these modules"
    info: null
    example:
    - "fastqc"
    - "cutadapt"
    required: false
    direction: "input"
    multiple: true
    multiple_sep: ";"
  - type: "string"
    name: "--ignore_analysis"
    info: null
    example:
    - "run_one/*"
    - "run_two/*"
    required: false
    direction: "input"
    multiple: true
    multiple_sep: ";"
  - type: "string"
    name: "--ignore_samples"
    info: null
    example:
    - "sample_1*"
    - "sample_3*"
    required: false
    direction: "input"
    multiple: true
    multiple_sep: ";"
  - type: "boolean_true"
    name: "--ignore_symlinks"
    description: "Ignore symlinked directories and files"
    info: null
    direction: "input"
- name: "Sample name handling"
  arguments:
  - type: "boolean_true"
    name: "--dirs"
    description: "Prepend directory to sample names to avoid clashing filenames"
    info: null
    direction: "input"
  - type: "integer"
    name: "--dirs_depth"
    description: "Prepend n directories to sample names. Negative number to take from\
      \ start of path."
    info: null
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "boolean_true"
    name: "--full_names"
    description: "Do not clean the sample names (leave as full file name)"
    info: null
    direction: "input"
  - type: "boolean_true"
    name: "--fn_as_s_name"
    description: "Use the log filename as the sample name"
    info: null
    direction: "input"
  - type: "file"
    name: "--replace_names"
    description: "TSV file to rename sample names during report generation"
    info: null
    example:
    - "replace_names.tsv"
    must_exist: true
    create_parent: true
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
- name: "Report Customisation"
  arguments:
  - type: "string"
    name: "--title"
    description: "Report title. Printed as page header, used for filename if not otherwise\
      \ specified.\n"
    info: null
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--comment"
    description: "Custom comment, will be printed at the top of the report.\n"
    info: null
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--template"
    description: "Report template to use.\n"
    info: null
    required: false
    choices:
    - "default"
    - "gathered"
    - "geo"
    - "highcharts"
    - "sections"
    - "simple"
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "file"
    name: "--sample_names"
    description: "TSV file containing alternative sample names for renaming buttons\
      \ in the report.\n"
    info: null
    example:
    - "sample_names.tsv"
    must_exist: true
    create_parent: true
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "file"
    name: "--sample_filters"
    description: "TSV file containing show/hide patterns for the report\n"
    info: null
    example:
    - "sample_filters.tsv"
    must_exist: true
    create_parent: true
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "file"
    name: "--custom_css_file"
    description: "Custom CSS file to add to the final report\n"
    info: null
    example:
    - "custom_style_sheet.css"
    must_exist: true
    create_parent: true
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "boolean_true"
    name: "--profile_runtime"
    description: "Add analysis of how long MultiQC takes to run to the report\n"
    info: null
    direction: "input"
- name: "MultiQC behaviour"
  arguments:
  - type: "boolean_true"
    name: "--verbose"
    description: "Increase output verbosity.\n"
    info: null
    direction: "input"
  - type: "boolean_true"
    name: "--quiet"
    description: "Only show log warnings\n"
    info: null
    direction: "input"
  - type: "boolean_true"
    name: "--strict"
    description: "Don't catch exceptions, run additional code checks to help development.\n"
    info: null
    direction: "input"
  - type: "boolean_true"
    name: "--development"
    description: "Development mode. Do not compress and minimise JS, export uncompressed\
      \ plot data.\n"
    info: null
    direction: "input"
  - type: "boolean_true"
    name: "--require_logs"
    description: "Require all explicitly requested modules to have log files. If not,\
      \ MultiQC will exit with an error.\n"
    info: null
    direction: "input"
  - type: "boolean_true"
    name: "--no_megaqc_upload"
    description: "Don't upload generated report to MegaQC, even if MegaQC options\
      \ are found.\n"
    info: null
    direction: "input"
  - type: "boolean_true"
    name: "--no_ansi"
    description: "Disable coloured log output.\n"
    info: null
    direction: "input"
  - type: "string"
    name: "--cl_config"
    description: "YAML formatted string that allows to customize MultiQC behaviour\
      \ like input file detection.\n"
    info: null
    example:
    - "qualimap_config: { general_stats_coverage: [20,40,200] }"
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
- name: "Output format"
  arguments:
  - type: "boolean_true"
    name: "--flat"
    description: "Use only flat plots (static images).\n"
    info: null
    direction: "input"
  - type: "boolean_true"
    name: "--interactive"
    description: "Use only interactive plots (in-browser Javascript).\n"
    info: null
    direction: "input"
  - type: "boolean_true"
    name: "--data_dir"
    description: "Force the parsed data directory to be created.\n"
    info: null
    direction: "input"
  - type: "boolean_true"
    name: "--no_data_dir"
    description: "Prevent the parsed data directory from being created.\n"
    info: null
    direction: "input"
  - type: "boolean_true"
    name: "--zip_data_dir"
    description: "Compress the data directory.\n"
    info: null
    direction: "input"
  - type: "string"
    name: "--data_format"
    description: "Output parsed data in a different format than the default 'txt'.\n"
    info: null
    required: false
    choices:
    - "tsv"
    - "csv"
    - "json"
    - "yaml"
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "boolean_true"
    name: "--pdf"
    description: "Creates PDF report with the 'simple' template. Requires Pandoc to\
      \ be installed.\n"
    info: null
    direction: "input"
resources:
- type: "bash_script"
  path: "script.sh"
  is_executable: true
description: "MultiQC aggregates results from bioinformatics analyses across many\
  \ samples into a single report.\nIt searches a given directory for analysis logs\
  \ and compiles a HTML report. It's a general use tool, perfect for summarising the\
  \ output from numerous bioinformatics tools.\n"
test_resources:
- type: "bash_script"
  path: "test.sh"
  is_executable: true
- type: "file"
  path: "test_data"
info:
  keywords:
  - "QC"
  - "html report"
  - "aggregate analysis"
  links:
    homepage: "https://multiqc.info/"
    documentation: "https://multiqc.info/docs/"
    repository: "https://github.com/MultiQC/MultiQC"
  references:
    doi: "10.1093/bioinformatics/btw354"
  licence: "GPL v3 or later"
status: "enabled"
scope:
  image: "public"
  target: "public"
requirements:
  commands:
  - "ps"
license: "MIT"
links:
  repository: "https://github.com/viash-hub/biobox"
runners:
- type: "executable"
  id: "executable"
  docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
  id: "nextflow"
  directives:
    tag: "$id"
  auto:
    simplifyInput: true
    simplifyOutput: false
    transcript: false
    publish: false
  config:
    labels:
      mem1gb: "memory = 1000000000.B"
      mem2gb: "memory = 2000000000.B"
      mem5gb: "memory = 5000000000.B"
      mem10gb: "memory = 10000000000.B"
      mem20gb: "memory = 20000000000.B"
      mem50gb: "memory = 50000000000.B"
      mem100gb: "memory = 100000000000.B"
      mem200gb: "memory = 200000000000.B"
      mem500gb: "memory = 500000000000.B"
      mem1tb: "memory = 1000000000000.B"
      mem2tb: "memory = 2000000000000.B"
      mem5tb: "memory = 5000000000000.B"
      mem10tb: "memory = 10000000000000.B"
      mem20tb: "memory = 20000000000000.B"
      mem50tb: "memory = 50000000000000.B"
      mem100tb: "memory = 100000000000000.B"
      mem200tb: "memory = 200000000000000.B"
      mem500tb: "memory = 500000000000000.B"
      mem1gib: "memory = 1073741824.B"
      mem2gib: "memory = 2147483648.B"
      mem4gib: "memory = 4294967296.B"
      mem8gib: "memory = 8589934592.B"
      mem16gib: "memory = 17179869184.B"
      mem32gib: "memory = 34359738368.B"
      mem64gib: "memory = 68719476736.B"
      mem128gib: "memory = 137438953472.B"
      mem256gib: "memory = 274877906944.B"
      mem512gib: "memory = 549755813888.B"
      mem1tib: "memory = 1099511627776.B"
      mem2tib: "memory = 2199023255552.B"
      mem4tib: "memory = 4398046511104.B"
      mem8tib: "memory = 8796093022208.B"
      mem16tib: "memory = 17592186044416.B"
      mem32tib: "memory = 35184372088832.B"
      mem64tib: "memory = 70368744177664.B"
      mem128tib: "memory = 140737488355328.B"
      mem256tib: "memory = 281474976710656.B"
      mem512tib: "memory = 562949953421312.B"
      cpu1: "cpus = 1"
      cpu2: "cpus = 2"
      cpu5: "cpus = 5"
      cpu10: "cpus = 10"
      cpu20: "cpus = 20"
      cpu50: "cpus = 50"
      cpu100: "cpus = 100"
      cpu200: "cpus = 200"
      cpu500: "cpus = 500"
      cpu1000: "cpus = 1000"
  debug: false
  container: "docker"
engines:
- type: "docker"
  id: "docker"
  image: "quay.io/biocontainers/multiqc:1.21--pyhdfd78af_0"
  target_registry: "images.viash-hub.com"
  target_tag: "v0.4.2"
  namespace_separator: "/"
  setup:
  - type: "docker"
    run:
    - "multiqc --version | sed 's/multiqc, version\\s\\(.*\\)/multiqc: \"\\1\"/' >\
      \ /var/software_versions.txt\n"
  test_setup:
  - type: "apt"
    packages:
    - "jq"
    interactive: false
  entrypoint: []
  cmd: null
- type: "native"
  id: "native"
build_info:
  config: "src/multiqc/config.vsh.yaml"
  runner: "nextflow"
  engine: "docker|native"
  output: "target/nextflow/multiqc"
  executable: "target/nextflow/multiqc/main.nf"
  viash_version: "0.9.4"
  git_commit: "02b470d967226478af69c37eae2b1256be1b78fd"
  git_remote: "https://github.com/viash-hub/biobox"
package_config:
  name: "biobox"
  version: "v0.4.2"
  summary: "A curated collection of high-quality, standalone bioinformatics components\
    \ built with [Viash](https://viash.io).\n"
  description: "`biobox` offers a suite of reliable bioinformatics components, similar\
    \ to [nf-core/modules](https://github.com/nf-core/modules) and [snakemake-wrappers/bio](https://github.com/snakemake/snakemake-wrappers/tree/master/bio),\
    \ but built using the [Viash](https://viash.io) framework.\n\nThis approach emphasizes\
    \ **reusability**, **reproducibility**, and adherence to **best practices**. Key\
    \ features of `biobox` components include:\n\n* **Standalone & Nextflow Ready:**\
    \ Run components directly via the command line or seamlessly integrate them into\
    \ Nextflow workflows.\n* **High Quality Standards:**\n * Comprehensive documentation\
    \ for components and parameters.\n * Full exposure of underlying tool arguments.\n\
    \ * Containerized (Docker) for dependency management and reproducibility.\n\
    \ * Unit tested for verified functionality.\n"
  info: null
  viash_version: "0.9.4"
  source: "src"
  target: "target"
  config_mods:
  - ".requirements.commands += ['ps']\n"
  - ".engines += { type: \"native\" }"
  - ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
  - ".engines[.type == 'docker'].target_tag := 'v0.4.2'"
  keywords:
  - "bioinformatics"
  - "modules"
  - "sequencing"
  license: "MIT"
  organization: "vsh"
  links:
    repository: "https://github.com/viash-hub/biobox"
    issue_tracker: "https://github.com/viash-hub/biobox/issues"
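Both components record their tool version in `/var/software_versions.txt` during the Docker setup step, each through a `sed` one-liner. For FastQC, `sed -n 's/^FastQC //p'` keeps only the text after a leading `FastQC `. For MultiQC, the substitution rewrites `multiqc, version X` as `multiqc: "X"`. The Python sketch below mirrors those two `sed` expressions. The function names are hypothetical. The sample inputs (`FastQC v0.11.9`, `multiqc, version 1.21`) are assumed formats inferred from the patterns, not captured tool output.

```python
import re


def fastqc_version_line(version_output: str) -> str:
    r"""Mirror: echo "fastqc: $(fastqc --version | sed -n 's/^FastQC //p')".

    `sed -n ... p` prints only the lines where the substitution matched;
    command substitution then drops trailing newlines.
    """
    kept = [line[len("FastQC "):] for line in version_output.splitlines()
            if line.startswith("FastQC ")]
    return "fastqc: " + "\n".join(kept)


def multiqc_version_lines(version_output: str) -> str:
    r"""Mirror: multiqc --version | sed 's/multiqc, version\s\(.*\)/multiqc: "\1"/'.

    Without -n, sed prints every line; matching lines are rewritten
    (first match per line only, since there is no `g` flag).
    """
    pattern = re.compile(r"multiqc, version\s(.*)")
    return "\n".join(pattern.sub(r'multiqc: "\1"', line, count=1)
                     for line in version_output.splitlines())


if __name__ == "__main__":
    print(fastqc_version_line("FastQC v0.11.9"))            # fastqc: v0.11.9
    print(multiqc_version_lines("multiqc, version 1.21"))   # multiqc: "1.21"
```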
4335
target/dependencies/vsh/vsh/biobox/v0.4.2/nextflow/multiqc/main.nf
Normal file
File diff suppressed because it is too large
@@ -0,0 +1,126 @@
manifest {
  name = 'multiqc'
  mainScript = 'main.nf'
  nextflowVersion = '!>=20.12.1-edge'
  version = 'v0.4.2'
  description = 'MultiQC aggregates results from bioinformatics analyses across many samples into a single report.\nIt searches a given directory for analysis logs and compiles a HTML report. It\'s a general use tool, perfect for summarising the output from numerous bioinformatics tools.\n'
  author = 'Dorien Roosen'
}

process.container = 'nextflow/bash:latest'

// detect tempdir
tempDir = java.nio.file.Paths.get(
  System.getenv('NXF_TEMP') ?:
    System.getenv('VIASH_TEMP') ?:
    System.getenv('TEMPDIR') ?:
    System.getenv('TMPDIR') ?:
    '/tmp'
).toAbsolutePath()

profiles {
  no_publish {
    process {
      withName: '.*' {
        publishDir = [
          enabled: false
        ]
      }
    }
  }
  mount_temp {
    docker.temp = tempDir
    podman.temp = tempDir
    charliecloud.temp = tempDir
  }
  docker {
    docker.enabled = true
    // docker.userEmulation = true
    singularity.enabled = false
    podman.enabled = false
    shifter.enabled = false
    charliecloud.enabled = false
  }
  singularity {
    singularity.enabled = true
    singularity.autoMounts = true
    docker.enabled = false
    podman.enabled = false
    shifter.enabled = false
    charliecloud.enabled = false
  }
  podman {
    podman.enabled = true
    docker.enabled = false
    singularity.enabled = false
    shifter.enabled = false
    charliecloud.enabled = false
  }
  shifter {
    shifter.enabled = true
    docker.enabled = false
    singularity.enabled = false
    podman.enabled = false
    charliecloud.enabled = false
  }
  charliecloud {
    charliecloud.enabled = true
    docker.enabled = false
    singularity.enabled = false
    podman.enabled = false
    shifter.enabled = false
  }
}

process{
  withLabel: mem1gb { memory = 1000000000.B }
  withLabel: mem2gb { memory = 2000000000.B }
  withLabel: mem5gb { memory = 5000000000.B }
  withLabel: mem10gb { memory = 10000000000.B }
  withLabel: mem20gb { memory = 20000000000.B }
  withLabel: mem50gb { memory = 50000000000.B }
  withLabel: mem100gb { memory = 100000000000.B }
  withLabel: mem200gb { memory = 200000000000.B }
|
||||
withLabel: mem500gb { memory = 500000000000.B }
|
||||
withLabel: mem1tb { memory = 1000000000000.B }
|
||||
withLabel: mem2tb { memory = 2000000000000.B }
|
||||
withLabel: mem5tb { memory = 5000000000000.B }
|
||||
withLabel: mem10tb { memory = 10000000000000.B }
|
||||
withLabel: mem20tb { memory = 20000000000000.B }
|
||||
withLabel: mem50tb { memory = 50000000000000.B }
|
||||
withLabel: mem100tb { memory = 100000000000000.B }
|
||||
withLabel: mem200tb { memory = 200000000000000.B }
|
||||
withLabel: mem500tb { memory = 500000000000000.B }
|
||||
withLabel: mem1gib { memory = 1073741824.B }
|
||||
withLabel: mem2gib { memory = 2147483648.B }
|
||||
withLabel: mem4gib { memory = 4294967296.B }
|
||||
withLabel: mem8gib { memory = 8589934592.B }
|
||||
withLabel: mem16gib { memory = 17179869184.B }
|
||||
withLabel: mem32gib { memory = 34359738368.B }
|
||||
withLabel: mem64gib { memory = 68719476736.B }
|
||||
withLabel: mem128gib { memory = 137438953472.B }
|
||||
withLabel: mem256gib { memory = 274877906944.B }
|
||||
withLabel: mem512gib { memory = 549755813888.B }
|
||||
withLabel: mem1tib { memory = 1099511627776.B }
|
||||
withLabel: mem2tib { memory = 2199023255552.B }
|
||||
withLabel: mem4tib { memory = 4398046511104.B }
|
||||
withLabel: mem8tib { memory = 8796093022208.B }
|
||||
withLabel: mem16tib { memory = 17592186044416.B }
|
||||
withLabel: mem32tib { memory = 35184372088832.B }
|
||||
withLabel: mem64tib { memory = 70368744177664.B }
|
||||
withLabel: mem128tib { memory = 140737488355328.B }
|
||||
withLabel: mem256tib { memory = 281474976710656.B }
|
||||
withLabel: mem512tib { memory = 562949953421312.B }
|
||||
withLabel: cpu1 { cpus = 1 }
|
||||
withLabel: cpu2 { cpus = 2 }
|
||||
withLabel: cpu5 { cpus = 5 }
|
||||
withLabel: cpu10 { cpus = 10 }
|
||||
withLabel: cpu20 { cpus = 20 }
|
||||
withLabel: cpu50 { cpus = 50 }
|
||||
withLabel: cpu100 { cpus = 100 }
|
||||
withLabel: cpu200 { cpus = 200 }
|
||||
withLabel: cpu500 { cpus = 500 }
|
||||
withLabel: cpu1000 { cpus = 1000 }
|
||||
}
|
||||
|
||||
|
||||
@@ -0,0 +1,334 @@
|
||||
{
|
||||
"$schema": "https://json-schema.org/draft/2020-12/schema",
|
||||
"title": "multiqc",
|
||||
"description": "MultiQC aggregates results from bioinformatics analyses across many samples into a single report.\nIt searches a given directory for analysis logs and compiles an HTML report. It's a general use tool, perfect for summarising the output from numerous bioinformatics tools.\n",
|
||||
"type": "object",
|
||||
"$defs": {
|
||||
"input": {
|
||||
"title": "Input",
|
||||
"type": "object",
|
||||
"description": "No description",
|
||||
"properties": {
|
||||
"input": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "string"
|
||||
},
|
||||
"format": "path",
|
||||
"exists": true,
|
||||
"description": "File paths to be searched for analysis results to be included in the report.\n",
|
||||
"help_text": "Type: `file`, multiple: `True`, required, direction: `input`, example: `[\"data/results\"]`. "
|
||||
}
|
||||
}
|
||||
},
|
||||
"ouput": {
|
||||
"title": "Output",
|
||||
"type": "object",
|
||||
"description": "No description",
|
||||
"properties": {
|
||||
"output_report": {
|
||||
"type": "string",
|
||||
"format": "path",
|
||||
"description": "Filepath of the generated report.\n",
|
||||
"help_text": "Type: `file`, multiple: `False`, default: `\"$id.$key.output_report.html\"`, direction: `output`, example: `\"multiqc_report.html\"`. ",
|
||||
"default": "$id.$key.output_report.html"
|
||||
},
|
||||
"output_data": {
|
||||
"type": "string",
|
||||
"format": "path",
|
||||
"description": "Output directory for parsed data files",
|
||||
"help_text": "Type: `file`, multiple: `False`, default: `\"$id.$key.output_data\"`, direction: `output`, example: `\"multiqc_data\"`. ",
|
||||
"default": "$id.$key.output_data"
|
||||
},
|
||||
"output_plots": {
|
||||
"type": "string",
|
||||
"format": "path",
|
||||
"description": "Output directory for generated plots",
|
||||
"help_text": "Type: `file`, multiple: `False`, default: `\"$id.$key.output_plots\"`, direction: `output`, example: `\"multiqc_plots\"`. ",
|
||||
"default": "$id.$key.output_plots"
|
||||
}
|
||||
}
|
||||
},
|
||||
"modules and analyses to run": {
|
||||
"title": "Modules and analyses to run",
|
||||
"type": "object",
|
||||
"description": "No description",
|
||||
"properties": {
|
||||
"include_modules": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "string"
|
||||
},
|
||||
"description": "Use only these modules",
|
||||
"help_text": "Type: `string`, multiple: `True`, example: `[\"fastqc\";\"cutadapt\"]`. "
|
||||
},
|
||||
"exclude_modules": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "string"
|
||||
},
|
||||
"description": "Do not use these modules",
|
||||
"help_text": "Type: `string`, multiple: `True`, example: `[\"fastqc\";\"cutadapt\"]`. "
|
||||
},
|
||||
"ignore_analysis": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "string"
|
||||
},
|
||||
"description": "Ignore analysis files matching these glob expressions",
|
||||
"help_text": "Type: `string`, multiple: `True`, example: `[\"run_one/*\";\"run_two/*\"]`. "
|
||||
},
|
||||
"ignore_samples": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "string"
|
||||
},
|
||||
"description": "Ignore sample names matching these glob expressions",
|
||||
"help_text": "Type: `string`, multiple: `True`, example: `[\"sample_1*\";\"sample_3*\"]`. "
|
||||
},
|
||||
"ignore_symlinks": {
|
||||
"type": "boolean",
|
||||
"description": "Ignore symlinked directories and files",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"sample name handling": {
|
||||
"title": "Sample name handling",
|
||||
"type": "object",
|
||||
"description": "No description",
|
||||
"properties": {
|
||||
"dirs": {
|
||||
"type": "boolean",
|
||||
"description": "Prepend directory to sample names to avoid clashing filenames",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
},
|
||||
"dirs_depth": {
|
||||
"type": "integer",
|
||||
"description": "Prepend n directories to sample names",
|
||||
"help_text": "Type: `integer`, multiple: `False`. "
|
||||
},
|
||||
"full_names": {
|
||||
"type": "boolean",
|
||||
"description": "Do not clean the sample names (leave as full file name)",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
},
|
||||
"fn_as_s_name": {
|
||||
"type": "boolean",
|
||||
"description": "Use the log filename as the sample name",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
},
|
||||
"replace_names": {
|
||||
"type": "string",
|
||||
"format": "path",
|
||||
"description": "TSV file to rename sample names during report generation",
|
||||
"help_text": "Type: `file`, multiple: `False`, direction: `input`, example: `\"replace_names.tsv\"`. "
|
||||
}
|
||||
}
|
||||
},
|
||||
"report customisation": {
|
||||
"title": "Report Customisation",
|
||||
"type": "object",
|
||||
"description": "No description",
|
||||
"properties": {
|
||||
"title": {
|
||||
"type": "string",
|
||||
"description": "Report title",
|
||||
"help_text": "Type: `string`, multiple: `False`. "
|
||||
},
|
||||
"comment": {
|
||||
"type": "string",
|
||||
"description": "Custom comment, will be printed at the top of the report.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`. "
|
||||
},
|
||||
"template": {
|
||||
"type": "string",
|
||||
"description": "Report template to use.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`, choices: ``default`, `gathered`, `geo`, `highcharts`, `sections`, `simple``. ",
|
||||
"enum": [
|
||||
"default",
|
||||
"gathered",
|
||||
"geo",
|
||||
"highcharts",
|
||||
"sections",
|
||||
"simple"
|
||||
]
|
||||
},
|
||||
"sample_names": {
|
||||
"type": "string",
|
||||
"format": "path",
|
||||
"description": "TSV file containing alternative sample names for renaming buttons in the report.\n",
|
||||
"help_text": "Type: `file`, multiple: `False`, direction: `input`, example: `\"sample_names.tsv\"`. "
|
||||
},
|
||||
"sample_filters": {
|
||||
"type": "string",
|
||||
"format": "path",
|
||||
"description": "TSV file containing show/hide patterns for the report\n",
|
||||
"help_text": "Type: `file`, multiple: `False`, direction: `input`, example: `\"sample_filters.tsv\"`. "
|
||||
},
|
||||
"custom_css_file": {
|
||||
"type": "string",
|
||||
"format": "path",
|
||||
"description": "Custom CSS file to add to the final report\n",
|
||||
"help_text": "Type: `file`, multiple: `False`, direction: `input`, example: `\"custom_style_sheet.css\"`. "
|
||||
},
|
||||
"profile_runtime": {
|
||||
"type": "boolean",
|
||||
"description": "Add analysis of how long MultiQC takes to run to the report\n",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"multiqc behaviour": {
|
||||
"title": "MultiQC behaviour",
|
||||
"type": "object",
|
||||
"description": "No description",
|
||||
"properties": {
|
||||
"verbose": {
|
||||
"type": "boolean",
|
||||
"description": "Increase output verbosity.\n",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
},
|
||||
"quiet": {
|
||||
"type": "boolean",
|
||||
"description": "Only show log warnings\n",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
},
|
||||
"strict": {
|
||||
"type": "boolean",
|
||||
"description": "Don't catch exceptions, run additional code checks to help development.\n",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
},
|
||||
"development": {
|
||||
"type": "boolean",
|
||||
"description": "Development mode",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
},
|
||||
"require_logs": {
|
||||
"type": "boolean",
|
||||
"description": "Require all explicitly requested modules to have log files",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
},
|
||||
"no_megaqc_upload": {
|
||||
"type": "boolean",
|
||||
"description": "Don't upload generated report to MegaQC, even if MegaQC options are found.\n",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
},
|
||||
"no_ansi": {
|
||||
"type": "boolean",
|
||||
"description": "Disable coloured log output.\n",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
},
|
||||
"cl_config": {
|
||||
"type": "string",
|
||||
"description": "YAML formatted string for customizing MultiQC behaviour, such as input file detection.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`, example: `\"qualimap_config: { general_stats_coverage: [20,40,200] }\"`. "
|
||||
}
|
||||
}
|
||||
},
|
||||
"output format": {
|
||||
"title": "Output format",
|
||||
"type": "object",
|
||||
"description": "No description",
|
||||
"properties": {
|
||||
"flat": {
|
||||
"type": "boolean",
|
||||
"description": "Use only flat plots (static images).\n",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
},
|
||||
"interactive": {
|
||||
"type": "boolean",
|
||||
"description": "Use only interactive plots (in-browser Javascript).\n",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
},
|
||||
"data_dir": {
|
||||
"type": "boolean",
|
||||
"description": "Force the parsed data directory to be created.\n",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
},
|
||||
"no_data_dir": {
|
||||
"type": "boolean",
|
||||
"description": "Prevent the parsed data directory from being created.\n",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
},
|
||||
"zip_data_dir": {
|
||||
"type": "boolean",
|
||||
"description": "Compress the data directory.\n",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
},
|
||||
"data_format": {
|
||||
"type": "string",
|
||||
"description": "Output parsed data in a different format than the default 'txt'.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`, choices: ``tsv`, `csv`, `json`, `yaml``. ",
|
||||
"enum": [
|
||||
"tsv",
|
||||
"csv",
|
||||
"json",
|
||||
"yaml"
|
||||
]
|
||||
},
|
||||
"pdf": {
|
||||
"type": "boolean",
|
||||
"description": "Creates PDF report with the 'simple' template",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"nextflow input-output arguments": {
|
||||
"title": "Nextflow input-output arguments",
|
||||
"type": "object",
|
||||
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
|
||||
"properties": {
|
||||
"publish_dir": {
|
||||
"type": "string",
|
||||
"description": "Path to an output directory.",
|
||||
"help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"allOf": [
|
||||
{
|
||||
"$ref": "#/$defs/input"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/ouput"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/modules and analyses to run"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/sample name handling"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/report customisation"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/multiqc behaviour"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/output format"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/nextflow input-output arguments"
|
||||
}
|
||||
]
|
||||
}
|
||||
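The schema above marks `input` as required, constrains `template` and `data_format` to fixed choices, and gives each output a `$id.$key.…` default. The following sketch is a rough, illustrative check of those rules and is not how Viash itself validates parameters. It assumes that `$id` expands to the sample/event id and `$key` to the component key, here taken to be `multiqc`. The function name `resolve_params` is hypothetical.

```python
from string import Template

# Choices copied from the "template" and "data_format" enums in the schema.
TEMPLATE_CHOICES = {"default", "gathered", "geo", "highcharts", "sections", "simple"}
DATA_FORMAT_CHOICES = {"tsv", "csv", "json", "yaml"}

# Output defaults from the output group of the schema.
OUTPUT_DEFAULTS = {
    "output_report": "$id.$key.output_report.html",
    "output_data": "$id.$key.output_data",
    "output_plots": "$id.$key.output_plots",
}


def resolve_params(params, sample_id, key="multiqc"):
    """Check required/enum arguments and fill in output defaults (illustrative only)."""
    if not params.get("input"):
        raise ValueError("'input' is required")
    for name, choices in (("template", TEMPLATE_CHOICES), ("data_format", DATA_FORMAT_CHOICES)):
        value = params.get(name)
        if value is not None and value not in choices:
            raise ValueError(f"{name}={value!r} is not one of {sorted(choices)}")
    resolved = dict(params)
    for name, pattern in OUTPUT_DEFAULTS.items():
        # Template treats '.' as the end of an identifier, so $id and $key expand cleanly.
        resolved.setdefault(name, Template(pattern).substitute(id=sample_id, key=key))
    return resolved


print(resolve_params({"input": ["data/results"]}, "sample1")["output_report"])
```

Explicitly passed outputs are kept as-is, and only missing ones receive the placeholder-based default.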
@@ -0,0 +1,187 @@
|
||||
name: "move_files_to_directory"
|
||||
version: "v0.2.0"
|
||||
authors:
|
||||
- name: "Dorien Roosen"
|
||||
roles:
|
||||
- "maintainer"
|
||||
info:
|
||||
links:
|
||||
email: "dorien@data-intuitive.com"
|
||||
github: "dorien-er"
|
||||
linkedin: "dorien-roosen"
|
||||
organizations:
|
||||
- name: "Data Intuitive"
|
||||
href: "https://www.data-intuitive.com"
|
||||
role: "Data Scientist"
|
||||
argument_groups:
|
||||
- name: "Arguments"
|
||||
arguments:
|
||||
- type: "file"
|
||||
name: "--input"
|
||||
description: "Paths of the files that will be copied into the output directory."
|
||||
info: null
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: true
|
||||
multiple_sep: ";"
|
||||
- type: "file"
|
||||
name: "--output"
|
||||
description: "Path to output directory"
|
||||
info: null
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: true
|
||||
direction: "output"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
resources:
|
||||
- type: "bash_script"
|
||||
path: "script.sh"
|
||||
is_executable: true
|
||||
summary: "Publish one or multiple files to the same directory"
|
||||
description: "This component copies one or multiple files to the same destination\
|
||||
\ directory, creating the output directory if it doesn't exist."
|
||||
test_resources:
|
||||
- type: "bash_script"
|
||||
path: "test.sh"
|
||||
is_executable: true
|
||||
info: null
|
||||
status: "enabled"
|
||||
scope:
|
||||
image: "public"
|
||||
target: "public"
|
||||
requirements:
|
||||
commands:
|
||||
- "ps"
|
||||
license: "MIT"
|
||||
links:
|
||||
repository: "https://github.com/viash-hub/craftbox"
|
||||
runners:
|
||||
- type: "executable"
|
||||
id: "executable"
|
||||
docker_setup_strategy: "ifneedbepullelsecachedbuild"
|
||||
- type: "nextflow"
|
||||
id: "nextflow"
|
||||
directives:
|
||||
tag: "$id"
|
||||
auto:
|
||||
simplifyInput: true
|
||||
simplifyOutput: false
|
||||
transcript: false
|
||||
publish: false
|
||||
config:
|
||||
labels:
|
||||
mem1gb: "memory = 1000000000.B"
|
||||
mem2gb: "memory = 2000000000.B"
|
||||
mem5gb: "memory = 5000000000.B"
|
||||
mem10gb: "memory = 10000000000.B"
|
||||
mem20gb: "memory = 20000000000.B"
|
||||
mem50gb: "memory = 50000000000.B"
|
||||
mem100gb: "memory = 100000000000.B"
|
||||
mem200gb: "memory = 200000000000.B"
|
||||
mem500gb: "memory = 500000000000.B"
|
||||
mem1tb: "memory = 1000000000000.B"
|
||||
mem2tb: "memory = 2000000000000.B"
|
||||
mem5tb: "memory = 5000000000000.B"
|
||||
mem10tb: "memory = 10000000000000.B"
|
||||
mem20tb: "memory = 20000000000000.B"
|
||||
mem50tb: "memory = 50000000000000.B"
|
||||
mem100tb: "memory = 100000000000000.B"
|
||||
mem200tb: "memory = 200000000000000.B"
|
||||
mem500tb: "memory = 500000000000000.B"
|
||||
mem1gib: "memory = 1073741824.B"
|
||||
mem2gib: "memory = 2147483648.B"
|
||||
mem4gib: "memory = 4294967296.B"
|
||||
mem8gib: "memory = 8589934592.B"
|
||||
mem16gib: "memory = 17179869184.B"
|
||||
mem32gib: "memory = 34359738368.B"
|
||||
mem64gib: "memory = 68719476736.B"
|
||||
mem128gib: "memory = 137438953472.B"
|
||||
mem256gib: "memory = 274877906944.B"
|
||||
mem512gib: "memory = 549755813888.B"
|
||||
mem1tib: "memory = 1099511627776.B"
|
||||
mem2tib: "memory = 2199023255552.B"
|
||||
mem4tib: "memory = 4398046511104.B"
|
||||
mem8tib: "memory = 8796093022208.B"
|
||||
mem16tib: "memory = 17592186044416.B"
|
||||
mem32tib: "memory = 35184372088832.B"
|
||||
mem64tib: "memory = 70368744177664.B"
|
||||
mem128tib: "memory = 140737488355328.B"
|
||||
mem256tib: "memory = 281474976710656.B"
|
||||
mem512tib: "memory = 562949953421312.B"
|
||||
cpu1: "cpus = 1"
|
||||
cpu2: "cpus = 2"
|
||||
cpu5: "cpus = 5"
|
||||
cpu10: "cpus = 10"
|
||||
cpu20: "cpus = 20"
|
||||
cpu50: "cpus = 50"
|
||||
cpu100: "cpus = 100"
|
||||
cpu200: "cpus = 200"
|
||||
cpu500: "cpus = 500"
|
||||
cpu1000: "cpus = 1000"
|
||||
debug: false
|
||||
container: "docker"
|
||||
engines:
|
||||
- type: "docker"
|
||||
id: "docker"
|
||||
image: "debian:latest"
|
||||
target_registry: "images.viash-hub.com"
|
||||
target_tag: "v0.2.0"
|
||||
namespace_separator: "/"
|
||||
setup:
|
||||
- type: "apt"
|
||||
packages:
|
||||
- "procps"
|
||||
interactive: false
|
||||
entrypoint: []
|
||||
cmd: null
|
||||
- type: "native"
|
||||
id: "native"
|
||||
build_info:
|
||||
config: "src/move_files_to_directory/config.vsh.yaml"
|
||||
runner: "nextflow"
|
||||
engine: "docker|native"
|
||||
output: "target/nextflow/move_files_to_directory"
|
||||
executable: "target/nextflow/move_files_to_directory/main.nf"
|
||||
viash_version: "0.9.4"
|
||||
git_commit: "1c1b0a4a1aff891ab678072b0ba915ac3ac71610"
|
||||
git_remote: "https://github.com/viash-hub/craftbox"
|
||||
git_tag: "v0.1.0-8-g1c1b0a4"
|
||||
package_config:
|
||||
name: "craftbox"
|
||||
version: "v0.2.0"
|
||||
summary: "A collection of custom-tailored scripts and applied utilities built with\
|
||||
\ Viash.\n"
|
||||
description: "`craftbox` is a curated collection of custom scripts and utilities\
|
||||
\ designed to tackle context-specific tasks.\n\nEmphasizing the Viash principles,\
|
||||
\ `craftbox` components aim for **reusability**, **reproducibility**, and adherence\
|
||||
\ to **best practices**. Key features generally include:\n\n* **Standalone & Nextflow\
|
||||
\ Ready:** Components are built to run directly via the command line or be smoothly\
|
||||
\ integrated into Nextflow workflows.\n* **Custom Implementations:** Contains\
|
||||
\ scripts and tools developed for particular tasks that may not be found in broader\
|
||||
\ collections.\n* **High Quality Standards (promoted by Viash):**\n * Clear\
|
||||
\ documentation for components and their parameters.\n * Full exposure of underlying\
|
||||
\ script/tool arguments for fine-grained control.\n * Containerized (Docker)\
|
||||
\ to ensure dependency management and a consistent, reproducible runtime environment.\n\
|
||||
\ * Unit tested where applicable to ensure components function as expected.\n"
|
||||
info: null
|
||||
viash_version: "0.9.4"
|
||||
source: "src"
|
||||
target: "target"
|
||||
config_mods:
|
||||
- ".requirements.commands := ['ps']\n"
|
||||
- ".engines += { type: \"native\" }"
|
||||
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
|
||||
- ".engines[.type == 'docker'].target_tag := 'v0.2.0'"
|
||||
keywords:
|
||||
- "scripts"
|
||||
- "custom"
|
||||
- "implementations"
|
||||
- "utilities"
|
||||
license: "MIT"
|
||||
organization: "vsh"
|
||||
links:
|
||||
repository: "https://github.com/viash-hub/craftbox"
|
||||
issue_tracker: "https://github.com/viash-hub/craftbox/issues"
|
||||
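According to its description, `move_files_to_directory` copies one or more files into a single destination directory and creates that directory if needed. The `--input` value is split on `multiple_sep: ";"`. The component's actual `script.sh` is not shown here, so the following Python sketch only illustrates the described behaviour. The function name mirrors the component name, but the implementation is an assumption.

```python
import shutil
import tempfile
from pathlib import Path


def move_files_to_directory(input_arg, output_dir, sep=";"):
    """Copy every file listed in `input_arg` into `output_dir`.

    `input_arg` is one string of paths joined by `sep`, mirroring the
    `multiple_sep: ";"` setting of `--input`. Files that share a basename
    overwrite each other in the destination.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the output directory if missing
    copied = []
    for raw in input_arg.split(sep):
        src = Path(raw)
        if not src.is_file():  # analogous to `must_exist: true` on --input
            raise FileNotFoundError(src)
        dest = out / src.name
        shutil.copy2(src, dest)  # copies contents and metadata; the source stays in place
        copied.append(dest)
    return copied


# Usage: copy two scratch files into a directory that does not exist yet.
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a.txt"
    b = Path(tmp) / "b.txt"
    a.write_text("alpha")
    b.write_text("beta")
    results = move_files_to_directory(f"{a};{b}", Path(tmp) / "published")
    demo_names = [p.name for p in results]
    demo_contents = [p.read_text() for p in results]
    demo_sources_kept = a.exists() and b.exists()
```

Despite the component's name, the description says it copies files, so the sketch leaves the sources in place.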
File diff suppressed because it is too large
@@ -0,0 +1,126 @@
|
||||
manifest {
|
||||
name = 'move_files_to_directory'
|
||||
mainScript = 'main.nf'
|
||||
nextflowVersion = '!>=20.12.1-edge'
|
||||
version = 'v0.2.0'
|
||||
description = 'This component copies one or multiple files to the same destination directory, creating the output directory if it doesn\'t exist.'
|
||||
author = 'Dorien Roosen'
|
||||
}
|
||||
|
||||
process.container = 'nextflow/bash:latest'
|
||||
|
||||
// detect tempdir
|
||||
tempDir = java.nio.file.Paths.get(
|
||||
System.getenv('NXF_TEMP') ?:
|
||||
System.getenv('VIASH_TEMP') ?:
|
||||
System.getenv('TEMPDIR') ?:
|
||||
System.getenv('TMPDIR') ?:
|
||||
'/tmp'
|
||||
).toAbsolutePath()
|
||||
|
||||
profiles {
|
||||
no_publish {
|
||||
process {
|
||||
withName: '.*' {
|
||||
publishDir = [
|
||||
enabled: false
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
mount_temp {
|
||||
docker.temp = tempDir
|
||||
podman.temp = tempDir
|
||||
charliecloud.temp = tempDir
|
||||
}
|
||||
docker {
|
||||
docker.enabled = true
|
||||
// docker.userEmulation = true
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
singularity {
|
||||
singularity.enabled = true
|
||||
singularity.autoMounts = true
|
||||
docker.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
podman {
|
||||
podman.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
shifter {
|
||||
shifter.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
charliecloud {
|
||||
charliecloud.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
}
|
||||
}
|
||||
|
||||
process{
|
||||
withLabel: mem1gb { memory = 1000000000.B }
|
||||
withLabel: mem2gb { memory = 2000000000.B }
|
||||
withLabel: mem5gb { memory = 5000000000.B }
|
||||
withLabel: mem10gb { memory = 10000000000.B }
|
||||
withLabel: mem20gb { memory = 20000000000.B }
|
||||
withLabel: mem50gb { memory = 50000000000.B }
|
||||
withLabel: mem100gb { memory = 100000000000.B }
|
||||
withLabel: mem200gb { memory = 200000000000.B }
|
||||
withLabel: mem500gb { memory = 500000000000.B }
|
||||
withLabel: mem1tb { memory = 1000000000000.B }
|
||||
withLabel: mem2tb { memory = 2000000000000.B }
|
||||
withLabel: mem5tb { memory = 5000000000000.B }
|
||||
withLabel: mem10tb { memory = 10000000000000.B }
|
||||
withLabel: mem20tb { memory = 20000000000000.B }
|
||||
withLabel: mem50tb { memory = 50000000000000.B }
|
||||
withLabel: mem100tb { memory = 100000000000000.B }
|
||||
withLabel: mem200tb { memory = 200000000000000.B }
|
||||
withLabel: mem500tb { memory = 500000000000000.B }
|
||||
withLabel: mem1gib { memory = 1073741824.B }
|
||||
withLabel: mem2gib { memory = 2147483648.B }
|
||||
withLabel: mem4gib { memory = 4294967296.B }
|
||||
withLabel: mem8gib { memory = 8589934592.B }
|
||||
withLabel: mem16gib { memory = 17179869184.B }
|
||||
withLabel: mem32gib { memory = 34359738368.B }
|
||||
withLabel: mem64gib { memory = 68719476736.B }
|
||||
withLabel: mem128gib { memory = 137438953472.B }
|
||||
withLabel: mem256gib { memory = 274877906944.B }
|
||||
withLabel: mem512gib { memory = 549755813888.B }
|
||||
withLabel: mem1tib { memory = 1099511627776.B }
|
||||
withLabel: mem2tib { memory = 2199023255552.B }
|
||||
withLabel: mem4tib { memory = 4398046511104.B }
|
||||
withLabel: mem8tib { memory = 8796093022208.B }
|
||||
withLabel: mem16tib { memory = 17592186044416.B }
|
||||
withLabel: mem32tib { memory = 35184372088832.B }
|
||||
withLabel: mem64tib { memory = 70368744177664.B }
|
||||
withLabel: mem128tib { memory = 140737488355328.B }
|
||||
withLabel: mem256tib { memory = 281474976710656.B }
|
||||
withLabel: mem512tib { memory = 562949953421312.B }
|
||||
withLabel: cpu1 { cpus = 1 }
|
||||
withLabel: cpu2 { cpus = 2 }
|
||||
withLabel: cpu5 { cpus = 5 }
|
||||
withLabel: cpu10 { cpus = 10 }
|
||||
withLabel: cpu20 { cpus = 20 }
|
||||
withLabel: cpu50 { cpus = 50 }
|
||||
withLabel: cpu100 { cpus = 100 }
|
||||
withLabel: cpu200 { cpus = 200 }
|
||||
withLabel: cpu500 { cpus = 500 }
|
||||
withLabel: cpu1000 { cpus = 1000 }
|
||||
}
|
||||
|
||||
|
||||
@@ -0,0 +1,81 @@
|
||||
{
|
||||
"$schema": "http://json-schema.org/draft-07/schema",
|
||||
"title": "move_files_to_directory",
|
||||
"description": "This component copies one or multiple files to the same destination directory, creating the output directory if it doesn\u0027t exist.",
|
||||
"type": "object",
|
||||
"definitions": {
|
||||
|
||||
|
||||
|
||||
"arguments" : {
|
||||
"title": "Arguments",
|
||||
"type": "object",
|
||||
"description": "No description",
|
||||
"properties": {
|
||||
|
||||
|
||||
"input": {
|
||||
"type":
|
||||
"string",
|
||||
"description": "Type: List of `file`, required, multiple_sep: `\";\"`. Paths of the files that will be copied into the output directory",
|
||||
"help_text": "Type: List of `file`, required, multiple_sep: `\";\"`. Paths of the files that will be copied into the output directory."
|
||||
|
||||
}
|
||||
|
||||
|
||||
,
|
||||
"output": {
|
||||
"type":
|
||||
"string",
|
||||
"description": "Type: `file`, required, default: `$id.$key.output`. Path to output directory",
|
||||
"help_text": "Type: `file`, required, default: `$id.$key.output`. Path to output directory"
|
||||
,
|
||||
"default":"$id.$key.output"
|
||||
}
|
||||
|
||||
|
||||
}
|
||||
},
|
||||
|
||||
|
||||
"nextflow input-output arguments" : {
|
||||
"title": "Nextflow input-output arguments",
|
||||
"type": "object",
|
||||
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
|
||||
"properties": {
|
||||
|
||||
|
||||
"publish_dir": {
|
||||
"type":
|
||||
"string",
|
||||
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
|
||||
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
|
||||
|
||||
}
|
||||
|
||||
|
||||
,
|
||||
"param_list": {
|
||||
"type":
|
||||
"string",
|
||||
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
|
||||
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map correspond to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativization is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
|
||||
"hidden": true
|
||||
|
||||
}
|
||||
|
||||
|
||||
}
|
||||
}
|
||||
},
|
||||
"allOf": [
|
||||
|
||||
{
|
||||
"$ref": "#/definitions/arguments"
|
||||
},
|
||||
|
||||
{
|
||||
"$ref": "#/definitions/nextflow input-output arguments"
|
||||
}
|
||||
]
|
||||
}
|
||||
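The `param_list` help text says that when parameters come from a csv, json or yaml file, relative paths are resolved against the location of that file. The following Python sketch shows that rule for the csv case with `id,input` columns. `file_args` is a hypothetical parameter: a real runner would derive the file-typed columns from the component config. The sketch also assumes that multi-file values use the `;` separator of `--input` and that paths follow POSIX conventions.

```python
import csv
import tempfile
from pathlib import Path


def _resolve_paths(value, base, sep):
    """Resolve each `sep`-separated path in `value` relative to `base`."""
    resolved = []
    for part in value.split(sep):
        path = Path(part)
        resolved.append(str(path if path.is_absolute() else base / path))
    return sep.join(resolved)


def read_param_list_csv(csv_path, file_args=("input",), sep=";"):
    """Read a csv param_list, making relative file paths relative to the csv's folder."""
    csv_path = Path(csv_path)
    base = csv_path.parent
    param_sets = []
    with csv_path.open(newline="") as handle:
        for row in csv.DictReader(handle):
            for arg in file_args:
                if row.get(arg):
                    row[arg] = _resolve_paths(row[arg], base, sep)
            param_sets.append(row)
    return param_sets


# Usage: one relative path and one mixed absolute/relative multi-file value.
with tempfile.TemporaryDirectory() as tmp:
    sheet = Path(tmp) / "params" / "data.csv"
    sheet.parent.mkdir()
    sheet.write_text("id,input\nfoo,foo.txt\nbar,/abs/bar.txt;baz.txt\n")
    demo = read_param_list_csv(sheet)
    demo_base = sheet.parent
```

Absolute paths pass through unchanged. As the help text notes, inline lists of maps and yaml blobs have no file location, so no relativization applies to them.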
@@ -0,0 +1,659 @@
name: "cellbender_remove_background"
namespace: "correction"
version: "v4.0.0"
argument_groups:
- name: "Inputs"
  arguments:
  - type: "file"
    name: "--input"
    alternatives:
    - "-i"
    description: "Input h5mu file. Data file on which to run tool. Data must be un-filtered:\
      \ it should include empty droplets."
    info: null
    example:
    - "input.h5mu"
    must_exist: true
    create_parent: true
    required: true
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--modality"
    description: "List of modalities to process."
    info: null
    default:
    - "rna"
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
- name: "Outputs"
  arguments:
  - type: "file"
    name: "--output"
    alternatives:
    - "-o"
    description: "Full count matrix as an h5mu file, with background RNA removed.\
      \ This file contains all the original droplet barcodes."
    info: null
    example:
    - "output.h5mu"
    must_exist: true
    create_parent: true
    required: true
    direction: "output"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--layer_output"
    description: "Output layer"
    info: null
    default:
    - "cellbender_corrected"
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--obs_background_fraction"
    info: null
    default:
    - "cellbender_background_fraction"
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--obs_cell_probability"
    info: null
    default:
    - "cellbender_cell_probability"
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--obs_cell_size"
    info: null
    default:
    - "cellbender_cell_size"
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--obs_droplet_efficiency"
    description: "Name of the column in the .obs dataframe to store the droplet efficiencies\
      \ in.\n"
    info: null
    default:
    - "cellbender_droplet_efficiency"
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--obs_latent_scale"
    info: null
    default:
    - "cellbender_latent_scale"
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--var_ambient_expression"
    info: null
    default:
    - "cellbender_ambient_expression"
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--obsm_gene_expression_encoding"
    info: null
    default:
    - "cellbender_gene_expression_encoding"
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--output_compression"
    description: "Compression format to use for the output AnnData and/or Mudata objects.\n\
      By default no compression is applied.\n"
    info: null
    example:
    - "gzip"
    required: false
    choices:
    - "gzip"
    - "lzf"
    direction: "input"
    multiple: false
    multiple_sep: ";"
- name: "Arguments"
  arguments:
  - type: "boolean"
    name: "--expected_cells_from_qc"
    description: "Will use the Cell Ranger QC to determine the estimated number of\
      \ cells"
    info: null
    default:
    - false
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "integer"
    name: "--expected_cells"
    description: "Number of cells expected in the dataset (a rough estimate within\
      \ a factor of 2 is sufficient)."
    info: null
    example:
    - 1000
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "integer"
    name: "--total_droplets_included"
    description: "The number of droplets from the rank-ordered UMI plot\nthat will\
      \ have their cell probabilities inferred as an\noutput. Include the droplets\
      \ which might contain cells.\nDroplets beyond TOTAL_DROPLETS_INCLUDED should\
      \ be\n'surely empty' droplets.\n"
    info: null
    example:
    - 25000
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "integer"
    name: "--force_cell_umi_prior"
    description: "Ignore CellBender's heuristic prior estimation, and use this prior\
      \ for UMI counts in cells."
    info: null
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "integer"
    name: "--force_empty_umi_prior"
    description: "Ignore CellBender's heuristic prior estimation, and use this prior\
      \ for UMI counts in empty droplets."
    info: null
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--model"
    description: "Which model is being used for count data.\n\n* 'naive' subtracts\
      \ the estimated ambient profile.\n* 'simple' does not model either ambient RNA\
      \ or random barcode swapping (for debugging purposes -- not recommended).\n\
      * 'ambient' assumes background RNA is incorporated into droplets.\n* 'swapping'\
      \ assumes background RNA comes from random barcode swapping (via PCR chimeras).\n\
      * 'full' uses a combined ambient and swapping model.\n"
    info: null
    default:
    - "full"
    required: false
    choices:
    - "naive"
    - "simple"
    - "ambient"
    - "swapping"
    - "full"
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "integer"
    name: "--epochs"
    description: "Number of epochs to train."
    info: null
    default:
    - 150
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "integer"
    name: "--low_count_threshold"
    description: "Droplets with UMI counts below this number are completely \nexcluded\
      \ from the analysis. This can help identify the correct \nprior for empty droplet\
      \ counts in the rare case where empty \ncounts are extremely high (over 200).\n"
    info: null
    default:
    - 5
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "integer"
    name: "--z_dim"
    description: "Dimension of latent variable z.\n"
    info: null
    default:
    - 64
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "integer"
    name: "--z_layers"
    description: "Dimension of hidden layers in the encoder for z.\n"
    info: null
    default:
    - 512
    required: false
    direction: "input"
    multiple: true
    multiple_sep: ";"
  - type: "double"
    name: "--training_fraction"
    description: "Training detail: the fraction of the data used for training.\nThe\
      \ rest is never seen by the inference algorithm. Speeds up learning.\n"
    info: null
    default:
    - 0.9
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "double"
    name: "--empty_drop_training_fraction"
    description: "Training detail: the fraction of the training data each epoch that\
      \ \nis drawn (randomly sampled) from surely empty droplets.\n"
    info: null
    default:
    - 0.2
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "integer"
    name: "--ignore_features"
    description: "Integer indices of features to ignore entirely. In the output\n\
      count matrix, the counts for these features will be unchanged.\n"
    info: null
    required: false
    direction: "input"
    multiple: true
    multiple_sep: ";"
  - type: "double"
    name: "--fpr"
    description: "Target 'delta' false positive rate in [0, 1). Use 0 for a cohort\n\
      of samples which will be jointly analyzed for differential expression.\nA false\
      \ positive is a true signal count that is erroneously removed.\nMore background\
      \ removal is accompanied by more signal removal at\nhigh values of FPR. You\
      \ can specify multiple values, which will\ncreate multiple output files.\n"
    info: null
    default:
    - 0.01
    required: false
    direction: "input"
    multiple: true
    multiple_sep: ";"
  - type: "string"
    name: "--exclude_feature_types"
    description: "Feature types to ignore during the analysis. These features will\n\
      be left unchanged in the output file.\n"
    info: null
    required: false
    direction: "input"
    multiple: true
    multiple_sep: ";"
  - type: "double"
    name: "--projected_ambient_count_threshold"
    description: "Controls how many features are included in the analysis, which\n\
      can lead to a large speedup. If a feature is expected to have less\nthan PROJECTED_AMBIENT_COUNT_THRESHOLD\
      \ counts total in all cells\n(summed), then that gene is excluded, and it will\
      \ be unchanged\nin the output count matrix. For example, \nPROJECTED_AMBIENT_COUNT_THRESHOLD\
      \ = 0 will include all features\nwhich have even a single count in any empty\
      \ droplet.\n"
    info: null
    default:
    - 0.1
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "double"
    name: "--learning_rate"
    description: "Training detail: lower learning rate for inference.\nA OneCycle\
      \ learning rate schedule is used, where the\nupper learning rate is ten times\
      \ this value. (For this\nvalue, probably do not exceed 1e-3).\n"
    info: null
    default:
    - 1.0E-4
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "double"
    name: "--final_elbo_fail_fraction"
    description: "Training is considered to have failed if \n(best_test_ELBO - final_test_ELBO)/(best_test_ELBO\
      \ - initial_test_ELBO) > FINAL_ELBO_FAIL_FRACTION.\nTraining will automatically\
      \ re-run if --num-training-tries > 1.\nBy default, will not fail training based\
      \ on final_training_ELBO.\n"
    info: null
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "double"
    name: "--epoch_elbo_fail_fraction"
    description: "Training is considered to have failed if \n(previous_epoch_test_ELBO\
      \ - current_epoch_test_ELBO)/(previous_epoch_test_ELBO - initial_train_ELBO)\
      \ > EPOCH_ELBO_FAIL_FRACTION.\nTraining will automatically re-run if --num-training-tries\
      \ > 1.\nBy default, will not fail training based on epoch_training_ELBO.\n"
    info: null
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "integer"
    name: "--num_training_tries"
    description: "Number of times to attempt to train the model. At each subsequent\
      \ attempt,\nthe learning rate is multiplied by LEARNING_RATE_RETRY_MULT.\n"
    info: null
    default:
    - 1
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "double"
    name: "--learning_rate_retry_mult"
    description: "Learning rate is multiplied by this amount each time a new training\n\
      attempt is made. (This parameter is only used if training fails based\non EPOCH_ELBO_FAIL_FRACTION\
      \ or FINAL_ELBO_FAIL_FRACTION and\nNUM_TRAINING_TRIES is > 1.) \n"
    info: null
    default:
    - 0.2
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "integer"
    name: "--posterior_batch_size"
    description: "Training detail: size of batches when creating the posterior.\n\
      Reduce this to avoid running out of GPU memory creating the posterior\n(will\
      \ be slower).\n"
    info: null
    default:
    - 128
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--posterior_regulation"
    description: "Posterior regularization method. (For experts: not required for\
      \ normal usage,\nsee documentation). \n\n* PRq is approximate quantile-targeting.\n\
      * PRmu is approximate mean-targeting aggregated over genes (behavior of v0.2.0).\n\
      * PRmu_gene is approximate mean-targeting per gene.\n"
    info: null
    required: false
    choices:
    - "PRq"
    - "PRmu"
    - "PRmu_gene"
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "double"
    name: "--alpha"
    description: "Tunable parameter alpha for the PRq posterior regularization method\n\
      (not normally used: see documentation).\n"
    info: null
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "double"
    name: "--q"
    description: "Tunable parameter q for the CDF threshold estimation method (not\n\
      normally used: see documentation).\n"
    info: null
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--estimator"
    description: "Output denoised count estimation method. (For experts: not required\n\
      for normal usage, see documentation).\n"
    info: null
    default:
    - "mckp"
    required: false
    choices:
    - "map"
    - "mean"
    - "cdf"
    - "sample"
    - "mckp"
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "boolean_true"
    name: "--estimator_multiple_cpu"
    description: "Including the flag --estimator-multiple-cpu will use more than one\n\
      CPU to compute the MCKP output count estimator in parallel (does nothing\nfor\
      \ other estimators).\n"
    info: null
    direction: "input"
  - type: "boolean"
    name: "--constant_learning_rate"
    description: "Including the flag --constant-learning-rate will use the ClippedAdam\n\
      optimizer instead of the OneCycleLR learning rate schedule, which is\nthe default.\
      \ Learning is faster with the OneCycleLR schedule.\nHowever, training can easily\
      \ be continued from a checkpoint for more\nepochs than the initial command specified\
      \ when using ClippedAdam. On\nthe other hand, if using the OneCycleLR schedule\
      \ with 150 epochs\nspecified, it is not possible to pick up from that final\
      \ checkpoint\nand continue training until 250 epochs.\n"
    info: null
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "boolean_true"
    name: "--debug"
    description: "Including the flag --debug will log extra messages useful for debugging.\n"
    info: null
    direction: "input"
  - type: "boolean_true"
    name: "--cuda"
    description: "Including the flag --cuda will run the inference on a\nGPU.\n"
    info: null
    direction: "input"
resources:
- type: "python_script"
  path: "script.py"
  is_executable: true
- type: "file"
  path: "setup_logger.py"
- type: "file"
  path: "nextflow_labels.config"
  dest: "nextflow_labels.config"
description: "Eliminating technical artifacts from high-throughput single-cell RNA\
  \ sequencing data.\n\nThis module removes counts due to ambient RNA molecules and\
  \ random barcode swapping from (raw) UMI-based scRNA-seq count matrices. \nAt the\
  \ moment, only the count matrices produced by the CellRanger count pipeline are supported.\
  \ Support for additional tools and protocols \nwill be added in the future. A quick\
  \ start tutorial can be found here.\n\nFleming et al. 2022, bioRxiv.\n"
test_resources:
- type: "python_script"
  path: "test.py"
  is_executable: true
- type: "file"
  path: "pbmc_1k_protein_v3_filtered_feature_bc_matrix.h5mu"
info: null
status: "enabled"
scope:
  image: "public"
  target: "public"
license: "MIT"
links:
  repository: "https://github.com/openpipelines-bio/openpipeline"
  docker_registry: "ghcr.io"
runners:
- type: "executable"
  id: "executable"
  docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
  id: "nextflow"
  directives:
    label:
    - "midcpu"
    - "midmem"
    - "gpu"
    tag: "$id"
  auto:
    simplifyInput: true
    simplifyOutput: false
    transcript: false
    publish: false
  config:
    labels:
      mem1gb: "memory = 1000000000.B"
      mem2gb: "memory = 2000000000.B"
      mem5gb: "memory = 5000000000.B"
      mem10gb: "memory = 10000000000.B"
      mem20gb: "memory = 20000000000.B"
      mem50gb: "memory = 50000000000.B"
      mem100gb: "memory = 100000000000.B"
      mem200gb: "memory = 200000000000.B"
      mem500gb: "memory = 500000000000.B"
      mem1tb: "memory = 1000000000000.B"
      mem2tb: "memory = 2000000000000.B"
      mem5tb: "memory = 5000000000000.B"
      mem10tb: "memory = 10000000000000.B"
      mem20tb: "memory = 20000000000000.B"
      mem50tb: "memory = 50000000000000.B"
      mem100tb: "memory = 100000000000000.B"
      mem200tb: "memory = 200000000000000.B"
      mem500tb: "memory = 500000000000000.B"
      mem1gib: "memory = 1073741824.B"
      mem2gib: "memory = 2147483648.B"
      mem4gib: "memory = 4294967296.B"
      mem8gib: "memory = 8589934592.B"
      mem16gib: "memory = 17179869184.B"
      mem32gib: "memory = 34359738368.B"
      mem64gib: "memory = 68719476736.B"
      mem128gib: "memory = 137438953472.B"
      mem256gib: "memory = 274877906944.B"
      mem512gib: "memory = 549755813888.B"
      mem1tib: "memory = 1099511627776.B"
      mem2tib: "memory = 2199023255552.B"
      mem4tib: "memory = 4398046511104.B"
      mem8tib: "memory = 8796093022208.B"
      mem16tib: "memory = 17592186044416.B"
      mem32tib: "memory = 35184372088832.B"
      mem64tib: "memory = 70368744177664.B"
      mem128tib: "memory = 140737488355328.B"
      mem256tib: "memory = 281474976710656.B"
      mem512tib: "memory = 562949953421312.B"
      cpu1: "cpus = 1"
      cpu2: "cpus = 2"
      cpu5: "cpus = 5"
      cpu10: "cpus = 10"
      cpu20: "cpus = 20"
      cpu50: "cpus = 50"
      cpu100: "cpus = 100"
      cpu200: "cpus = 200"
      cpu500: "cpus = 500"
      cpu1000: "cpus = 1000"
    script:
    - "includeConfig(\"nextflow_labels.config\")"
  debug: false
  container: "docker"
engines:
- type: "docker"
  id: "docker"
  image: "nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu22.04"
  target_registry: "images.viash-hub.com"
  target_tag: "v4.0.0"
  namespace_separator: "/"
  setup:
  - type: "docker"
    run:
    - "apt update && DEBIAN_FRONTEND=noninteractive apt install -y make build-essential\
      \ libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget ca-certificates\
      \ curl llvm libncurses5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev\
      \ liblzma-dev mecab-ipadic-utf8 git \\\n&& curl https://pyenv.run | bash \\\n\
      && pyenv update \\\n&& pyenv install $PYTHON_VERSION \\\n&& pyenv global $PYTHON_VERSION\
      \ \\\n&& apt-get clean\n"
    env:
    - "PYENV_ROOT=\"/root/.pyenv\""
    - "PATH=\"$PYENV_ROOT/shims:$PYENV_ROOT/bin:$PATH\""
    - "PYTHON_VERSION=3.7.16"
  - type: "python"
    user: false
    packages:
    - "lxml~=4.8.0"
    - "mudata~=0.2.1"
    - "cellbender~=0.3.0"
    upgrade: true
  entrypoint: []
  cmd: null
- type: "native"
  id: "native"
build_info:
  config: "src/correction/cellbender_remove_background/config.vsh.yaml"
  runner: "nextflow"
  engine: "docker|native"
  output: "target/nextflow/correction/cellbender_remove_background"
  executable: "target/nextflow/correction/cellbender_remove_background/main.nf"
  viash_version: "0.9.4"
  git_commit: "de02293c9e13198622b988dac952b2c8c70a1e35"
  git_remote: "https://github.com/openpipelines-bio/openpipeline"
package_config:
  name: "openpipeline"
  version: "v4.0.0"
  summary: "Best-practice workflows for single-cell multi-omics analyses.\n"
  description: "OpenPipelines are extensible single cell analysis pipelines for reproducible\
    \ and large-scale single cell processing using [Viash](https://viash.io) and [Nextflow](https://www.nextflow.io/).\n\
    \nIn terms of workflows, the following have been made available, but keep in mind\
    \ that\nindividual tools and functionality can be executed as standalone components\
    \ as well.\n\n * Demultiplexing: conversion of raw sequencing data to FASTQ objects.\n\
    \ * Ingestion: Read mapping and generating a count matrix.\n * Single sample\
    \ processing: cell filtering and doublet detection.\n * Multisample processing:\
    \ Count transformation, normalization, QC metric calculations.\n * Integration:\
    \ Clustering, integration and batch correction using single and multimodal methods.\n\
    \ * Downstream analysis workflows\n"
  info:
    test_resources:
    - type: "s3"
      path: "s3://openpipelines-data"
      dest: "resources_test"
    nextflow_labels_ci:
    - path: "src/workflows/utils/labels_ci.config"
      description: "Adds the correct memory and CPU labels when running on the Viash\
        \ Hub CI."
  viash_version: "0.9.4"
  source: "src"
  target: "target"
  config_mods:
  - ".resources += {path: '/src/workflows/utils/labels.config', dest: 'nextflow_labels.config'}\n\
    .runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
    )'"
  - ".engines += { type: \"native\" }"
  - ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
  - ".engines[.type == 'docker'].target_tag := 'v4.0.0'"
  keywords:
  - "single-cell"
  - "multimodal"
  license: "MIT"
  organization: "vsh"
  links:
    repository: "https://github.com/openpipelines-bio/openpipeline"
    docker_registry: "ghcr.io"
    homepage: "https://openpipelines.bio"
    documentation: "https://openpipelines.bio/fundamentals"
    issue_tracker: "https://github.com/openpipelines-bio/openpipeline/issues"
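The `--final_elbo_fail_fraction`, `--epoch_elbo_fail_fraction`, `--num_training_tries` and `--learning_rate_retry_mult` descriptions in the config above define CellBender's training-failure and retry rules in words. The Python sketch below writes those formulas out, based only on the description text and not on CellBender's source; the function names and the divide-by-zero guards are assumptions of this sketch.

```python
from typing import List, Optional


def final_elbo_failed(initial: float, best: float, final: float,
                      fail_fraction: Optional[float]) -> bool:
    """(best - final) / (best - initial) > FINAL_ELBO_FAIL_FRACTION."""
    if fail_fraction is None:  # unset: never fail on the final ELBO
        return False
    if best == initial:  # this sketch's own guard against dividing by zero
        return False
    return (best - final) / (best - initial) > fail_fraction


def epoch_elbo_failed(initial_train: float, previous: float, current: float,
                      fail_fraction: Optional[float]) -> bool:
    """(previous - current) / (previous - initial_train) > EPOCH_ELBO_FAIL_FRACTION."""
    if fail_fraction is None:  # unset: never fail on a per-epoch drop
        return False
    if previous == initial_train:  # this sketch's own guard against dividing by zero
        return False
    return (previous - current) / (previous - initial_train) > fail_fraction


def attempt_learning_rates(learning_rate: float = 1e-4, retry_mult: float = 0.2,
                           num_training_tries: int = 1) -> List[float]:
    """Learning rate per attempt: multiplied by retry_mult on every retry."""
    return [learning_rate * retry_mult ** attempt for attempt in range(num_training_tries)]
```

ELBO values are typically negative, so with an initial test ELBO of -1000, a best of -200 and a final of -300, the final-ELBO ratio is 100 / 800 = 0.125, which would count as a failure for a fail fraction of 0.1 but not 0.2. With the config defaults (`1.0E-4` and `0.2`) and three tries, the attempts would use roughly 1e-4, 2e-5 and 4e-6.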
File diff suppressed because it is too large
@@ -0,0 +1,125 @@
manifest {
  name = 'correction/cellbender_remove_background'
  mainScript = 'main.nf'
  nextflowVersion = '!>=20.12.1-edge'
  version = 'v4.0.0'
  description = 'Eliminating technical artifacts from high-throughput single-cell RNA sequencing data.\n\nThis module removes counts due to ambient RNA molecules and random barcode swapping from (raw) UMI-based scRNA-seq count matrices. \nAt the moment, only the count matrices produced by the CellRanger count pipeline are supported. Support for additional tools and protocols \nwill be added in the future. A quick start tutorial can be found here.\n\nFleming et al. 2022, bioRxiv.\n'
}

process.container = 'nextflow/bash:latest'

// detect tempdir
tempDir = java.nio.file.Paths.get(
  System.getenv('NXF_TEMP') ?:
  System.getenv('VIASH_TEMP') ?:
  System.getenv('TEMPDIR') ?:
  System.getenv('TMPDIR') ?:
  '/tmp'
).toAbsolutePath()

profiles {
  no_publish {
    process {
      withName: '.*' {
        publishDir = [
          enabled: false
        ]
      }
    }
  }
  mount_temp {
    docker.temp = tempDir
    podman.temp = tempDir
    charliecloud.temp = tempDir
  }
  docker {
    docker.enabled = true
    // docker.userEmulation = true
    singularity.enabled = false
    podman.enabled = false
    shifter.enabled = false
    charliecloud.enabled = false
  }
  singularity {
    singularity.enabled = true
    singularity.autoMounts = true
    docker.enabled = false
    podman.enabled = false
    shifter.enabled = false
    charliecloud.enabled = false
  }
  podman {
    podman.enabled = true
    docker.enabled = false
    singularity.enabled = false
    shifter.enabled = false
    charliecloud.enabled = false
  }
  shifter {
    shifter.enabled = true
    docker.enabled = false
    singularity.enabled = false
    podman.enabled = false
    charliecloud.enabled = false
  }
  charliecloud {
    charliecloud.enabled = true
    docker.enabled = false
    singularity.enabled = false
    podman.enabled = false
    shifter.enabled = false
  }
}

process{
  withLabel: mem1gb { memory = 1000000000.B }
  withLabel: mem2gb { memory = 2000000000.B }
  withLabel: mem5gb { memory = 5000000000.B }
  withLabel: mem10gb { memory = 10000000000.B }
  withLabel: mem20gb { memory = 20000000000.B }
  withLabel: mem50gb { memory = 50000000000.B }
  withLabel: mem100gb { memory = 100000000000.B }
  withLabel: mem200gb { memory = 200000000000.B }
  withLabel: mem500gb { memory = 500000000000.B }
  withLabel: mem1tb { memory = 1000000000000.B }
  withLabel: mem2tb { memory = 2000000000000.B }
  withLabel: mem5tb { memory = 5000000000000.B }
  withLabel: mem10tb { memory = 10000000000000.B }
  withLabel: mem20tb { memory = 20000000000000.B }
  withLabel: mem50tb { memory = 50000000000000.B }
  withLabel: mem100tb { memory = 100000000000000.B }
  withLabel: mem200tb { memory = 200000000000000.B }
  withLabel: mem500tb { memory = 500000000000000.B }
  withLabel: mem1gib { memory = 1073741824.B }
  withLabel: mem2gib { memory = 2147483648.B }
  withLabel: mem4gib { memory = 4294967296.B }
  withLabel: mem8gib { memory = 8589934592.B }
  withLabel: mem16gib { memory = 17179869184.B }
  withLabel: mem32gib { memory = 34359738368.B }
  withLabel: mem64gib { memory = 68719476736.B }
  withLabel: mem128gib { memory = 137438953472.B }
  withLabel: mem256gib { memory = 274877906944.B }
  withLabel: mem512gib { memory = 549755813888.B }
  withLabel: mem1tib { memory = 1099511627776.B }
  withLabel: mem2tib { memory = 2199023255552.B }
  withLabel: mem4tib { memory = 4398046511104.B }
  withLabel: mem8tib { memory = 8796093022208.B }
  withLabel: mem16tib { memory = 17592186044416.B }
  withLabel: mem32tib { memory = 35184372088832.B }
  withLabel: mem64tib { memory = 70368744177664.B }
  withLabel: mem128tib { memory = 140737488355328.B }
  withLabel: mem256tib { memory = 281474976710656.B }
  withLabel: mem512tib { memory = 562949953421312.B }
  withLabel: cpu1 { cpus = 1 }
  withLabel: cpu2 { cpus = 2 }
  withLabel: cpu5 { cpus = 5 }
  withLabel: cpu10 { cpus = 10 }
  withLabel: cpu20 { cpus = 20 }
  withLabel: cpu50 { cpus = 50 }
  withLabel: cpu100 { cpus = 100 }
  withLabel: cpu200 { cpus = 200 }
  withLabel: cpu500 { cpus = 500 }
  withLabel: cpu1000 { cpus = 1000 }
}

includeConfig("nextflow_labels.config")
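The `withLabel` blocks in the `nextflow.config` above mix decimal units (`gb`, `tb`: powers of 10) with binary units (`gib`, `tib`: powers of 2). The Python sketch below reproduces the label-to-directive mapping as a way of checking those byte counts; `label_to_directive` is a hypothetical helper for illustration, not part of viash or Nextflow.

```python
import re

# Decimal (SI) and binary (IEC) byte multipliers used in the label names.
_UNITS = {
    "gb": 10**9,
    "tb": 10**12,
    "gib": 2**30,
    "tib": 2**40,
}


def label_to_directive(label: str) -> str:
    """Translate a memN<unit> or cpuN label into the directive it sets."""
    mem = re.fullmatch(r"mem(\d+)(gb|tb|gib|tib)", label)
    if mem:
        amount, unit = int(mem.group(1)), mem.group(2)
        return f"memory = {amount * _UNITS[unit]}.B"
    cpu = re.fullmatch(r"cpu(\d+)", label)
    if cpu:
        return f"cpus = {int(cpu.group(1))}"
    raise ValueError(f"not a memory/CPU label: {label}")
```

For instance, `mem16gib` maps to 16 * 2**30 = 17179869184 bytes, while `mem10gb` maps to exactly 10**10 bytes. Labels such as `midmem` are not covered by this file; they come from the included `nextflow_labels.config`.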
@@ -0,0 +1,48 @@
process {
  // Default resources for components that hardly do any processing
  memory = { 2.GB * task.attempt }
  cpus = 1

  // Retry for exit codes that have something to do with memory issues
  errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
  maxRetries = 3

  // The memory a task is assigned increases with each attempt
  // uncomment the line below and adjust the value to set a global upper limit on the memory.
  // resourceLimits = [ memory: 240.Gb ]

  // CPU resources
  withLabel: singlecpu { cpus = 1 }
  withLabel: lowcpu { cpus = 4 }
  withLabel: midcpu { cpus = 10 }
  withLabel: highcpu { cpus = 20 }

  // Memory resources
  withLabel: lowmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 4.GB * task.attempt } }
  withLabel: midmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 25.GB * task.attempt } }
  withLabel: highmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 50.GB * task.attempt } }
  withLabel: veryhighmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 75.GB * task.attempt } }

  // Disk space
  withLabel: lowdisk {
    disk = {process.disk ? process.disk : null}
  }
  withLabel: middisk {
    disk = {process.disk ? process.disk : null}
  }
  withLabel: highdisk {
    disk = {process.disk ? process.disk : null}
  }
  withLabel: veryhighdisk {
    disk = {process.disk ? process.disk : null}
  }

  // NOTE: The above labels intentionally do not have an effect by default.
  // The user should set the disk space requirements by adding the following
  // to the compute environment:
  //
  // withLabel: lowdisk { disk = { 20.GB * task.attempt } }
  // withLabel: middisk { disk = { 100.GB * task.attempt } }
  // withLabel: highdisk { disk = { 200.GB * task.attempt } }
  // withLabel: veryhighdisk { disk = { 500.GB * task.attempt } }
}
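In the labels config above, memory grows linearly with `task.attempt`, and tasks are retried only for exit codes 137 to 140 (the Groovy range `137..140` is inclusive), which the comment associates with memory issues. The Python sketch below mirrors those closures for the `lowmem`/`midmem`/`highmem`/`veryhighmem` labels so the resulting requests can be tabulated; the function names and the `limit_gb` parameter (standing in for `resourceLimits.memory`) are hypothetical.

```python
from typing import Optional

# Per-attempt base memory in GB for the memory labels in the config above.
BASE_GB = {"lowmem": 4, "midmem": 25, "highmem": 50, "veryhighmem": 75}


def error_strategy(exit_status: int) -> str:
    """Mirror of `task.exitStatus in 137..140 ? 'retry' : 'terminate'`."""
    return "retry" if 137 <= exit_status <= 140 else "terminate"


def label_memory_gb(label: str, attempt: int, max_retries: int = 3,
                    limit_gb: Optional[float] = None) -> float:
    """Memory requested for `label` on a 1-based attempt number."""
    # The configured limit replaces the request only once attempt >= maxRetries.
    if limit_gb is not None and max_retries and attempt >= max_retries:
        return limit_gb
    return BASE_GB[label] * attempt
```

With the defaults (`maxRetries = 3`, no `resourceLimits`), a `midmem` task would request 25, 50 and 75 GB over its first three attempts. As the closure is written, a configured limit only replaces the request once `task.attempt >= task.maxRetries`, so earlier attempts can still exceed a small limit.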
@@ -0,0 +1,335 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "cellbender_remove_background",
  "description": "Eliminating technical artifacts from high-throughput single-cell RNA sequencing data.\n\nThis module removes counts due to ambient RNA molecules and random barcode swapping from (raw) UMI-based scRNA-seq count matrices. \nAt the moment, only the count matrices produced by the CellRanger count pipeline are supported. Support for additional tools and protocols \nwill be added in the future. A quick start tutorial can be found in the CellBender documentation.\n\nFleming et al. 2022, bioRxiv.\n",
  "type": "object",
  "$defs": {
    "inputs": {
      "title": "Inputs",
      "type": "object",
      "description": "No description",
      "properties": {
        "input": {
          "type": "string",
          "format": "path",
          "exists": true,
          "description": "Input h5mu file",
          "help_text": "Type: `file`, multiple: `False`, required, direction: `input`, example: `\"input.h5mu\"`. "
        },
        "modality": {
          "type": "string",
          "description": "List of modalities to process.",
          "help_text": "Type: `string`, multiple: `False`, default: `\"rna\"`. ",
          "default": "rna"
        }
      }
    },
    "outputs": {
      "title": "Outputs",
      "type": "object",
      "description": "No description",
      "properties": {
        "output": {
          "type": "string",
          "format": "path",
          "description": "Full count matrix as an h5mu file, with background RNA removed",
          "help_text": "Type: `file`, multiple: `False`, required, default: `\"$id.$key.output.h5mu\"`, direction: `output`, example: `\"output.h5mu\"`. ",
          "default": "$id.$key.output.h5mu"
        },
        "layer_output": {
          "type": "string",
          "description": "Output layer",
          "help_text": "Type: `string`, multiple: `False`, default: `\"cellbender_corrected\"`. ",
          "default": "cellbender_corrected"
        },
        "obs_background_fraction": {
          "type": "string",
          "description": "",
          "help_text": "Type: `string`, multiple: `False`, default: `\"cellbender_background_fraction\"`. ",
          "default": "cellbender_background_fraction"
        },
        "obs_cell_probability": {
          "type": "string",
          "description": "",
          "help_text": "Type: `string`, multiple: `False`, default: `\"cellbender_cell_probability\"`. ",
          "default": "cellbender_cell_probability"
        },
        "obs_cell_size": {
          "type": "string",
          "description": "",
          "help_text": "Type: `string`, multiple: `False`, default: `\"cellbender_cell_size\"`. ",
          "default": "cellbender_cell_size"
        },
        "obs_droplet_efficiency": {
          "type": "string",
          "description": "Name of the column in the .obs dataframe to store the droplet efficiencies in.\n",
          "help_text": "Type: `string`, multiple: `False`, default: `\"cellbender_droplet_efficiency\"`. ",
          "default": "cellbender_droplet_efficiency"
        },
        "obs_latent_scale": {
          "type": "string",
          "description": "",
          "help_text": "Type: `string`, multiple: `False`, default: `\"cellbender_latent_scale\"`. ",
          "default": "cellbender_latent_scale"
        },
        "var_ambient_expression": {
          "type": "string",
          "description": "",
          "help_text": "Type: `string`, multiple: `False`, default: `\"cellbender_ambient_expression\"`. ",
          "default": "cellbender_ambient_expression"
        },
        "obsm_gene_expression_encoding": {
          "type": "string",
          "description": "",
          "help_text": "Type: `string`, multiple: `False`, default: `\"cellbender_gene_expression_encoding\"`. ",
          "default": "cellbender_gene_expression_encoding"
        },
        "output_compression": {
          "type": "string",
          "description": "Compression format to use for the output AnnData and/or MuData objects.\nBy default no compression is applied.\n",
          "help_text": "Type: `string`, multiple: `False`, example: `\"gzip\"`, choices: ``gzip`, `lzf``. ",
          "enum": [
            "gzip",
            "lzf"
          ]
        }
      }
    },
    "arguments": {
      "title": "Arguments",
      "type": "object",
      "description": "No description",
      "properties": {
        "expected_cells_from_qc": {
          "type": "boolean",
          "description": "Will use the Cell Ranger QC to determine the estimated number of cells",
          "help_text": "Type: `boolean`, multiple: `False`, default: `false`. ",
          "default": false
        },
        "expected_cells": {
          "type": "integer",
          "description": "Number of cells expected in the dataset (a rough estimate within a factor of 2 is sufficient).",
          "help_text": "Type: `integer`, multiple: `False`, example: `1000`. "
        },
        "total_droplets_included": {
          "type": "integer",
          "description": "The number of droplets from the rank-ordered UMI plot\nthat will have their cell probabilities inferred as an\noutput",
          "help_text": "Type: `integer`, multiple: `False`, example: `25000`. "
        },
        "force_cell_umi_prior": {
          "type": "integer",
          "description": "Ignore CellBender's heuristic prior estimation, and use this prior for UMI counts in cells.",
          "help_text": "Type: `integer`, multiple: `False`. "
        },
        "force_empty_umi_prior": {
          "type": "integer",
          "description": "Ignore CellBender's heuristic prior estimation, and use this prior for UMI counts in empty droplets.",
          "help_text": "Type: `integer`, multiple: `False`. "
        },
        "model": {
          "type": "string",
          "description": "Which model is being used for count data.\n\n* 'naive' subtracts the estimated ambient profile.\n* 'simple' does not model either ambient RNA or random barcode swapping (for debugging purposes -- not recommended).\n* 'ambient' assumes background RNA is incorporated into droplets.\n* 'swapping' assumes background RNA comes from random barcode swapping (via PCR chimeras).\n* 'full' uses a combined ambient and swapping model.\n",
          "help_text": "Type: `string`, multiple: `False`, default: `\"full\"`, choices: ``naive`, `simple`, `ambient`, `swapping`, `full``. ",
          "enum": [
            "naive",
            "simple",
            "ambient",
            "swapping",
            "full"
          ],
          "default": "full"
        },
        "epochs": {
          "type": "integer",
          "description": "Number of epochs to train.",
          "help_text": "Type: `integer`, multiple: `False`, default: `150`. ",
          "default": 150
        },
        "low_count_threshold": {
          "type": "integer",
          "description": "Droplets with UMI counts below this number are completely \nexcluded from the analysis",
          "help_text": "Type: `integer`, multiple: `False`, default: `5`. ",
          "default": 5
        },
        "z_dim": {
          "type": "integer",
          "description": "Dimension of latent variable z.\n",
          "help_text": "Type: `integer`, multiple: `False`, default: `64`. ",
          "default": 64
        },
        "z_layers": {
          "type": "array",
          "items": {
            "type": "integer"
          },
          "description": "Dimension of hidden layers in the encoder for z.\n",
          "help_text": "Type: `integer`, multiple: `True`, default: `[512]`. ",
          "default": [
            512
          ]
        },
        "training_fraction": {
          "type": "number",
          "description": "Training detail: the fraction of the data used for training.\nThe rest is never seen by the inference algorithm",
          "help_text": "Type: `double`, multiple: `False`, default: `0.9`. ",
          "default": 0.9
        },
        "empty_drop_training_fraction": {
          "type": "number",
          "description": "Training detail: the fraction of the training data each epoch that \nis drawn (randomly sampled) from surely empty droplets.\n",
          "help_text": "Type: `double`, multiple: `False`, default: `0.2`. ",
          "default": 0.2
        },
        "ignore_features": {
          "type": "array",
          "items": {
            "type": "integer"
          },
          "description": "Integer indices of features to ignore entirely",
          "help_text": "Type: `integer`, multiple: `True`. "
        },
        "fpr": {
          "type": "array",
          "items": {
            "type": "number"
          },
          "description": "Target 'delta' false positive rate in [0, 1)",
          "help_text": "Type: `double`, multiple: `True`, default: `[0.01]`. ",
          "default": [
            0.01
          ]
        },
        "exclude_feature_types": {
          "type": "array",
          "items": {
            "type": "string"
          },
          "description": "Feature types to ignore during the analysis",
          "help_text": "Type: `string`, multiple: `True`. "
        },
        "projected_ambient_count_threshold": {
          "type": "number",
          "description": "Controls how many features are included in the analysis, which\ncan lead to a large speedup",
          "help_text": "Type: `double`, multiple: `False`, default: `0.1`. ",
          "default": 0.1
        },
        "learning_rate": {
          "type": "number",
          "description": "Training detail: lower learning rate for inference.\nA OneCycle learning rate schedule is used, where the\nupper learning rate is ten times this value",
          "help_text": "Type: `double`, multiple: `False`, default: `1.0E-4`. ",
          "default": 0.00010
        },
        "final_elbo_fail_fraction": {
          "type": "number",
          "description": "Training is considered to have failed if \n(best_test_ELBO - final_test_ELBO)/(best_test_ELBO - initial_test_ELBO) > FINAL_ELBO_FAIL_FRACTION.\nTraining will automatically re-run if --num-training-tries > 1.\nBy default, will not fail training based on final_training_ELBO.\n",
          "help_text": "Type: `double`, multiple: `False`. "
        },
        "epoch_elbo_fail_fraction": {
          "type": "number",
          "description": "Training is considered to have failed if \n(previous_epoch_test_ELBO - current_epoch_test_ELBO)/(previous_epoch_test_ELBO - initial_train_ELBO) > EPOCH_ELBO_FAIL_FRACTION.\nTraining will automatically re-run if --num-training-tries > 1.\nBy default, will not fail training based on epoch_training_ELBO.\n",
          "help_text": "Type: `double`, multiple: `False`. "
        },
        "num_training_tries": {
          "type": "integer",
          "description": "Number of times to attempt to train the model",
          "help_text": "Type: `integer`, multiple: `False`, default: `1`. ",
          "default": 1
        },
        "learning_rate_retry_mult": {
          "type": "number",
          "description": "Learning rate is multiplied by this amount each time a new training\nattempt is made",
          "help_text": "Type: `double`, multiple: `False`, default: `0.2`. ",
          "default": 0.2
        },
        "posterior_batch_size": {
          "type": "integer",
          "description": "Training detail: size of batches when creating the posterior.\nReduce this to avoid running out of GPU memory creating the posterior\n(will be slower).\n",
          "help_text": "Type: `integer`, multiple: `False`, default: `128`. ",
          "default": 128
        },
        "posterior_regulation": {
          "type": "string",
          "description": "Posterior regularization method",
          "help_text": "Type: `string`, multiple: `False`, choices: ``PRq`, `PRmu`, `PRmu_gene``. ",
          "enum": [
            "PRq",
            "PRmu",
            "PRmu_gene"
          ]
        },
        "alpha": {
          "type": "number",
          "description": "Tunable parameter alpha for the PRq posterior regularization method\n(not normally used: see documentation).\n",
          "help_text": "Type: `double`, multiple: `False`. "
        },
        "q": {
          "type": "number",
          "description": "Tunable parameter q for the CDF threshold estimation method (not\nnormally used: see documentation).\n",
          "help_text": "Type: `double`, multiple: `False`. "
        },
        "estimator": {
          "type": "string",
          "description": "Output denoised count estimation method",
          "help_text": "Type: `string`, multiple: `False`, default: `\"mckp\"`, choices: ``map`, `mean`, `cdf`, `sample`, `mckp``. ",
          "enum": [
            "map",
            "mean",
            "cdf",
            "sample",
            "mckp"
          ],
          "default": "mckp"
        },
        "estimator_multiple_cpu": {
          "type": "boolean",
          "description": "Including the flag --estimator-multiple-cpu will use more than one\nCPU to compute the MCKP output count estimator in parallel (does nothing\nfor other estimators).\n",
          "help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
          "default": false
        },
        "constant_learning_rate": {
          "type": "boolean",
          "description": "Including the flag --constant-learning-rate will use the ClippedAdam\noptimizer instead of the OneCycleLR learning rate schedule, which is\nthe default",
          "help_text": "Type: `boolean`, multiple: `False`. "
        },
        "debug": {
          "type": "boolean",
          "description": "Including the flag --debug will log extra messages useful for debugging.\n",
          "help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
          "default": false
        },
        "cuda": {
          "type": "boolean",
          "description": "Including the flag --cuda will run the inference on a\nGPU.\n",
          "help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
          "default": false
        }
      }
    },
    "nextflow input-output arguments": {
      "title": "Nextflow input-output arguments",
      "type": "object",
      "description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
      "properties": {
        "publish_dir": {
          "type": "string",
          "description": "Path to an output directory.",
          "help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
        }
      }
    }
  },
  "allOf": [
    {
      "$ref": "#/$defs/inputs"
    },
    {
      "$ref": "#/$defs/outputs"
    },
    {
      "$ref": "#/$defs/arguments"
    },
    {
      "$ref": "#/$defs/nextflow input-output arguments"
    }
  ]
}
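The `final_elbo_fail_fraction`, `num_training_tries` and `learning_rate_retry_mult` descriptions in this schema together describe a retry rule for CellBender training. The sketch below restates that rule in plain Python so the arithmetic is easy to check. It is based only on the schema text, not on CellBender's source; the function names and the handling of a run that never improves on its initial ELBO are assumptions.

```python
from typing import List, Optional


def final_elbo_failed(initial_test_elbo: float, best_test_elbo: float,
                      final_test_elbo: float,
                      fail_fraction: Optional[float]) -> bool:
    """Check the final_elbo_fail_fraction criterion described in the schema.

    Training counts as failed when
    (best - final) / (best - initial) > fail_fraction.
    """
    if fail_fraction is None:
        # Schema default: never fail training on this criterion.
        return False
    improvement = best_test_elbo - initial_test_elbo
    if improvement <= 0:
        # Assumption (not from the schema): a run that never improved is a failure.
        return True
    return (best_test_elbo - final_test_elbo) / improvement > fail_fraction


def retry_learning_rates(learning_rate: float, retry_mult: float,
                         num_training_tries: int) -> List[float]:
    """Learning rate per attempt: multiplied by retry_mult for each new attempt."""
    return [learning_rate * retry_mult ** attempt
            for attempt in range(num_training_tries)]
```

With the schema defaults (`learning_rate` 1e-4, `learning_rate_retry_mult` 0.2) and three tries, the attempts would use roughly 1e-4, 2e-5 and 4e-6.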
@@ -0,0 +1,12 @@
def setup_logger():
    import logging
    from sys import stdout

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    console_handler = logging.StreamHandler(stdout)
    logFormatter = logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
    console_handler.setFormatter(logFormatter)
    logger.addHandler(console_handler)

    return logger
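As written, `setup_logger()` attaches a new `StreamHandler` to the root logger every time it is called, so calling it twice (for example from two helper modules) prints every message twice. A minimal idempotent variant is sketched below. The marker attribute `_setup_logger_handler` is a hypothetical name used only to recognise the handler this function installed.

```python
import logging
from sys import stdout


def setup_logger() -> logging.Logger:
    """Configure the root logger once; repeated calls do not add handlers."""
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    # Only attach the stdout handler if a previous call has not already done so.
    already_configured = any(
        getattr(handler, "_setup_logger_handler", False)
        for handler in logger.handlers
    )
    if not already_configured:
        console_handler = logging.StreamHandler(stdout)
        console_handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
        )
        console_handler._setup_logger_handler = True  # hypothetical marker
        logger.addHandler(console_handler)
    return logger
```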
@@ -0,0 +1,274 @@
name: "add_id"
namespace: "metadata"
version: "v4.0.0"
authors:
- name: "Dries Schaumont"
  roles:
  - "maintainer"
  info:
    role: "Core Team Member"
    links:
      email: "dries@data-intuitive.com"
      github: "DriesSchaumont"
      orcid: "0000-0002-4389-0440"
      linkedin: "dries-schaumont"
    organizations:
    - name: "Data Intuitive"
      href: "https://www.data-intuitive.com"
      role: "Data Scientist"
argument_groups:
- name: "Arguments"
  arguments:
  - type: "file"
    name: "--input"
    alternatives:
    - "-i"
    description: "Path to the input .h5mu."
    info: null
    example:
    - "sample_path"
    must_exist: true
    create_parent: true
    required: true
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--input_id"
    description: "The input id."
    info: null
    required: true
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "string"
    name: "--obs_output"
    description: "Name of the .obs column where to store the id."
    info: null
    default:
    - "sample_id"
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "file"
    name: "--output"
    alternatives:
    - "-o"
    description: "Name of output MuData file.\n"
    info: null
    example:
    - "output.h5mu"
    must_exist: true
    create_parent: true
    required: false
    direction: "output"
    multiple: false
    multiple_sep: ";"
  - type: "boolean_true"
    name: "--make_observation_keys_unique"
    description: "Join the id to the .obs index (.obs_names)."
    info: null
    direction: "input"
  - type: "string"
    name: "--output_compression"
    description: "Compression format to use for the output AnnData and/or MuData objects.\n\
      By default no compression is applied.\n"
    info: null
    example:
    - "gzip"
    required: false
    choices:
    - "gzip"
    - "lzf"
    direction: "input"
    multiple: false
    multiple_sep: ";"
resources:
- type: "python_script"
  path: "script.py"
  is_executable: true
- type: "file"
  path: "setup_logger.py"
- type: "file"
  path: "nextflow_labels.config"
  dest: "nextflow_labels.config"
description: "Add id of .obs. Also allows to make .obs_names (the .obs index) unique\
  \ \nby prefixing the values with a unique id per .h5mu file.\n"
test_resources:
- type: "python_script"
  path: "test.py"
  is_executable: true
- type: "file"
  path: "e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu"
info: null
status: "enabled"
scope:
  image: "public"
  target: "public"
license: "MIT"
links:
  repository: "https://github.com/openpipelines-bio/openpipeline"
  docker_registry: "ghcr.io"
runners:
- type: "executable"
  id: "executable"
  docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
  id: "nextflow"
  directives:
    label:
    - "singlecpu"
    - "lowmem"
    tag: "$id"
  auto:
    simplifyInput: true
    simplifyOutput: false
    transcript: false
    publish: false
  config:
    labels:
      mem1gb: "memory = 1000000000.B"
      mem2gb: "memory = 2000000000.B"
      mem5gb: "memory = 5000000000.B"
      mem10gb: "memory = 10000000000.B"
      mem20gb: "memory = 20000000000.B"
      mem50gb: "memory = 50000000000.B"
      mem100gb: "memory = 100000000000.B"
      mem200gb: "memory = 200000000000.B"
      mem500gb: "memory = 500000000000.B"
      mem1tb: "memory = 1000000000000.B"
      mem2tb: "memory = 2000000000000.B"
      mem5tb: "memory = 5000000000000.B"
      mem10tb: "memory = 10000000000000.B"
      mem20tb: "memory = 20000000000000.B"
      mem50tb: "memory = 50000000000000.B"
      mem100tb: "memory = 100000000000000.B"
      mem200tb: "memory = 200000000000000.B"
      mem500tb: "memory = 500000000000000.B"
      mem1gib: "memory = 1073741824.B"
      mem2gib: "memory = 2147483648.B"
      mem4gib: "memory = 4294967296.B"
      mem8gib: "memory = 8589934592.B"
      mem16gib: "memory = 17179869184.B"
      mem32gib: "memory = 34359738368.B"
      mem64gib: "memory = 68719476736.B"
      mem128gib: "memory = 137438953472.B"
      mem256gib: "memory = 274877906944.B"
      mem512gib: "memory = 549755813888.B"
      mem1tib: "memory = 1099511627776.B"
      mem2tib: "memory = 2199023255552.B"
      mem4tib: "memory = 4398046511104.B"
      mem8tib: "memory = 8796093022208.B"
      mem16tib: "memory = 17592186044416.B"
      mem32tib: "memory = 35184372088832.B"
      mem64tib: "memory = 70368744177664.B"
      mem128tib: "memory = 140737488355328.B"
      mem256tib: "memory = 281474976710656.B"
      mem512tib: "memory = 562949953421312.B"
      cpu1: "cpus = 1"
      cpu2: "cpus = 2"
      cpu5: "cpus = 5"
      cpu10: "cpus = 10"
      cpu20: "cpus = 20"
      cpu50: "cpus = 50"
      cpu100: "cpus = 100"
      cpu200: "cpus = 200"
      cpu500: "cpus = 500"
      cpu1000: "cpus = 1000"
    script:
    - "includeConfig(\"nextflow_labels.config\")"
  debug: false
  container: "docker"
engines:
- type: "docker"
  id: "docker"
  image: "python:3.11-slim"
  target_registry: "images.viash-hub.com"
  target_tag: "v4.0.0"
  namespace_separator: "/"
  setup:
  - type: "apt"
    packages:
    - "procps"
    interactive: false
  - type: "python"
    user: false
    packages:
    - "anndata~=0.12.7"
    - "awkward"
    - "mudata~=0.3.2"
    script:
    - "exec(\"try:\\n import zarr; from importlib.metadata import version\\nexcept\
      \ ModuleNotFoundError:\\n exit(0)\\nelse: assert int(version(\\\"zarr\\\"\
      ).partition(\\\".\\\")[0]) > 2\")"
    upgrade: true
  test_setup:
  - type: "apt"
    packages:
    - "git"
    interactive: false
  - type: "python"
    user: false
    packages:
    - "viashpy==0.8.0"
    github:
    - "openpipelines-bio/core#subdirectory=packages/python/openpipeline_testutils"
    upgrade: true
  entrypoint: []
  cmd: null
- type: "native"
  id: "native"
build_info:
  config: "src/metadata/add_id/config.vsh.yaml"
  runner: "nextflow"
  engine: "docker|native"
  output: "target/nextflow/metadata/add_id"
  executable: "target/nextflow/metadata/add_id/main.nf"
  viash_version: "0.9.4"
  git_commit: "de02293c9e13198622b988dac952b2c8c70a1e35"
  git_remote: "https://github.com/openpipelines-bio/openpipeline"
package_config:
  name: "openpipeline"
  version: "v4.0.0"
  summary: "Best-practice workflows for single-cell multi-omics analyses.\n"
  description: "OpenPipelines are extensible single cell analysis pipelines for reproducible\
    \ and large-scale single cell processing using [Viash](https://viash.io) and [Nextflow](https://www.nextflow.io/).\n\
    \nIn terms of workflows, the following has been made available, but keep in mind\
    \ that\nindividual tools and functionality can be executed as standalone components\
    \ as well.\n\n * Demultiplexing: conversion of raw sequencing data to FASTQ objects.\n\
    \ * Ingestion: Read mapping and generating a count matrix.\n * Single sample\
    \ processing: cell filtering and doublet detection.\n * Multisample processing:\
    \ Count transformation, normalization, QC metric calculations.\n * Integration:\
    \ Clustering, integration and batch correction using single and multimodal methods.\n\
    \ * Downstream analysis workflows\n"
  info:
    test_resources:
    - type: "s3"
      path: "s3://openpipelines-data"
      dest: "resources_test"
    nextflow_labels_ci:
    - path: "src/workflows/utils/labels_ci.config"
      description: "Adds the correct memory and CPU labels when running on the Viash\
        \ Hub CI."
  viash_version: "0.9.4"
  source: "src"
  target: "target"
  config_mods:
  - ".resources += {path: '/src/workflows/utils/labels.config', dest: 'nextflow_labels.config'}\n\
    .runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
    )'"
  - ".engines += { type: \"native\" }"
  - ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
  - ".engines[.type == 'docker'].target_tag := 'v4.0.0'"
  keywords:
  - "single-cell"
  - "multimodal"
  license: "MIT"
  organization: "vsh"
  links:
    repository: "https://github.com/openpipelines-bio/openpipeline"
    docker_registry: "ghcr.io"
    homepage: "https://openpipelines.bio"
    documentation: "https://openpipelines.bio/fundamentals"
    issue_tracker: "https://github.com/openpipelines-bio/openpipeline/issues"
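The `script` entry of the Docker `python` setup step packs a version check into an escaped one-liner: if `zarr` is importable, its major version must be greater than 2 (otherwise the `assert` fails, presumably aborting the image build); if `zarr` is absent, the check passes. The same logic, unrolled for readability, might look like the sketch below. Unlike the original, it looks the version up through `importlib.metadata` instead of importing `zarr`, and both function names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_zarr_version() -> Optional[str]:
    """Return the installed zarr version string, or None if zarr is not installed."""
    try:
        return version("zarr")
    except PackageNotFoundError:
        return None


def zarr_major_is_supported(zarr_version: Optional[str]) -> bool:
    """Mirror the build check: absent zarr passes; otherwise require major > 2."""
    if zarr_version is None:
        return True
    major = zarr_version.partition(".")[0]
    return int(major) > 2
```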
File diff suppressed because it is too large
@@ -0,0 +1,126 @@
manifest {
  name = 'metadata/add_id'
  mainScript = 'main.nf'
  nextflowVersion = '!>=20.12.1-edge'
  version = 'v4.0.0'
  description = 'Add id of .obs. Also allows to make .obs_names (the .obs index) unique \nby prefixing the values with a unique id per .h5mu file.\n'
  author = 'Dries Schaumont'
}

process.container = 'nextflow/bash:latest'

// detect tempdir
tempDir = java.nio.file.Paths.get(
  System.getenv('NXF_TEMP') ?:
  System.getenv('VIASH_TEMP') ?:
  System.getenv('TEMPDIR') ?:
  System.getenv('TMPDIR') ?:
  '/tmp'
).toAbsolutePath()

profiles {
  no_publish {
    process {
      withName: '.*' {
        publishDir = [
          enabled: false
        ]
      }
    }
  }
  mount_temp {
    docker.temp = tempDir
    podman.temp = tempDir
    charliecloud.temp = tempDir
  }
  docker {
    docker.enabled = true
    // docker.userEmulation = true
    singularity.enabled = false
    podman.enabled = false
    shifter.enabled = false
    charliecloud.enabled = false
  }
  singularity {
    singularity.enabled = true
    singularity.autoMounts = true
    docker.enabled = false
    podman.enabled = false
    shifter.enabled = false
    charliecloud.enabled = false
  }
  podman {
    podman.enabled = true
    docker.enabled = false
    singularity.enabled = false
    shifter.enabled = false
    charliecloud.enabled = false
  }
  shifter {
    shifter.enabled = true
    docker.enabled = false
    singularity.enabled = false
    podman.enabled = false
    charliecloud.enabled = false
  }
  charliecloud {
    charliecloud.enabled = true
    docker.enabled = false
    singularity.enabled = false
    podman.enabled = false
    shifter.enabled = false
  }
}

process{
  withLabel: mem1gb { memory = 1000000000.B }
  withLabel: mem2gb { memory = 2000000000.B }
  withLabel: mem5gb { memory = 5000000000.B }
  withLabel: mem10gb { memory = 10000000000.B }
  withLabel: mem20gb { memory = 20000000000.B }
  withLabel: mem50gb { memory = 50000000000.B }
  withLabel: mem100gb { memory = 100000000000.B }
  withLabel: mem200gb { memory = 200000000000.B }
  withLabel: mem500gb { memory = 500000000000.B }
  withLabel: mem1tb { memory = 1000000000000.B }
  withLabel: mem2tb { memory = 2000000000000.B }
  withLabel: mem5tb { memory = 5000000000000.B }
  withLabel: mem10tb { memory = 10000000000000.B }
  withLabel: mem20tb { memory = 20000000000000.B }
  withLabel: mem50tb { memory = 50000000000000.B }
  withLabel: mem100tb { memory = 100000000000000.B }
  withLabel: mem200tb { memory = 200000000000000.B }
  withLabel: mem500tb { memory = 500000000000000.B }
  withLabel: mem1gib { memory = 1073741824.B }
  withLabel: mem2gib { memory = 2147483648.B }
  withLabel: mem4gib { memory = 4294967296.B }
  withLabel: mem8gib { memory = 8589934592.B }
  withLabel: mem16gib { memory = 17179869184.B }
  withLabel: mem32gib { memory = 34359738368.B }
  withLabel: mem64gib { memory = 68719476736.B }
  withLabel: mem128gib { memory = 137438953472.B }
  withLabel: mem256gib { memory = 274877906944.B }
  withLabel: mem512gib { memory = 549755813888.B }
  withLabel: mem1tib { memory = 1099511627776.B }
  withLabel: mem2tib { memory = 2199023255552.B }
  withLabel: mem4tib { memory = 4398046511104.B }
  withLabel: mem8tib { memory = 8796093022208.B }
  withLabel: mem16tib { memory = 17592186044416.B }
  withLabel: mem32tib { memory = 35184372088832.B }
  withLabel: mem64tib { memory = 70368744177664.B }
  withLabel: mem128tib { memory = 140737488355328.B }
  withLabel: mem256tib { memory = 281474976710656.B }
  withLabel: mem512tib { memory = 562949953421312.B }
  withLabel: cpu1 { cpus = 1 }
  withLabel: cpu2 { cpus = 2 }
  withLabel: cpu5 { cpus = 5 }
  withLabel: cpu10 { cpus = 10 }
  withLabel: cpu20 { cpus = 20 }
  withLabel: cpu50 { cpus = 50 }
  withLabel: cpu100 { cpus = 100 }
  withLabel: cpu200 { cpus = 200 }
  withLabel: cpu500 { cpus = 500 }
  withLabel: cpu1000 { cpus = 1000 }
}

includeConfig("nextflow_labels.config")
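The `tempDir` expression picks the first temporary-directory environment variable that is set and non-empty (Groovy's `?:` operator treats an empty string as false) and falls back to `/tmp`; the `mount_temp` profile then hands this directory to Docker, Podman and Charliecloud. A Python restatement of that lookup order is sketched below. `resolve_temp_dir` is a hypothetical helper that accepts an explicit environment mapping so it can be tested without touching `os.environ`.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Same precedence as the Groovy elvis chain in nextflow.config.
TEMP_ENV_VARS = ("NXF_TEMP", "VIASH_TEMP", "TEMPDIR", "TMPDIR")


def resolve_temp_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the first non-empty temp dir variable as an absolute path, else /tmp."""
    env = os.environ if env is None else env
    for name in TEMP_ENV_VARS:
        value = env.get(name)
        if value:  # unset and empty values are both skipped, as with Groovy's ?:
            return Path(value).absolute()
    return Path("/tmp").absolute()
```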
@@ -0,0 +1,48 @@
process {
  // Default resources for components that hardly do any processing
  memory = { 2.GB * task.attempt }
  cpus = 1

  // Retry for exit codes that have something to do with memory issues
  errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
  maxRetries = 3

  // The memory a task is assigned increases with each attempt.
  // Uncomment the line below and adjust the value to set a global upper limit on the memory.
  // resourceLimits = [ memory: 240.GB ]

  // CPU resources
  withLabel: singlecpu { cpus = 1 }
  withLabel: lowcpu { cpus = 4 }
  withLabel: midcpu { cpus = 10 }
  withLabel: highcpu { cpus = 20 }

  // Memory resources
  withLabel: lowmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 4.GB * task.attempt } }
  withLabel: midmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 25.GB * task.attempt } }
  withLabel: highmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 50.GB * task.attempt } }
  withLabel: veryhighmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 75.GB * task.attempt } }

  // Disk space
  withLabel: lowdisk {
    disk = {process.disk ? process.disk : null}
  }
  withLabel: middisk {
    disk = {process.disk ? process.disk : null}
  }
  withLabel: highdisk {
    disk = {process.disk ? process.disk : null}
  }
  withLabel: veryhighdisk {
    disk = {process.disk ? process.disk : null}
  }

  // NOTE: The above labels intentionally do not have an effect by default.
  // The user should set the disk space requirements by adding the following
  // to the compute environment:
  //
  // withLabel: lowdisk { disk = { 20.GB * task.attempt } }
  // withLabel: middisk { disk = { 100.GB * task.attempt } }
  // withLabel: highdisk { disk = { 200.GB * task.attempt } }
  // withLabel: veryhighdisk { disk = { 500.GB * task.attempt } }
}
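The `errorStrategy` and the memory-label closures above interact. A task that exits with a status from 137 to 140 is retried, up to `maxRetries` times, and each retry requests more memory. The exception is when a global `resourceLimits.memory` is configured and the attempt number has reached `maxRetries`: in that case the limit itself is requested. The sketch below reproduces that arithmetic for the `lowmem` label (`base_gb=4`); `midmem`, `highmem` and `veryhighmem` would use 25, 50 and 75. The function names are hypothetical, and memory is kept as plain GB numbers rather than Nextflow `MemoryUnit` values.

```python
from typing import Optional


def memory_request_gb(attempt: int, max_retries: Optional[int] = 3,
                      memory_limit_gb: Optional[float] = None,
                      base_gb: float = 4.0) -> float:
    """Memory requested on a given (1-based) attempt, following the label closure."""
    # Once a global limit is set and the final retries are reached, ask for the limit.
    if memory_limit_gb and max_retries and attempt >= max_retries:
        return memory_limit_gb
    # Otherwise scale the base request linearly with the attempt number.
    return base_gb * attempt


def should_retry(exit_status: int) -> bool:
    """errorStrategy: 'retry' for exit codes 137..140 (inclusive), else 'terminate'."""
    return 137 <= exit_status <= 140
```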
@@ -0,0 +1,75 @@
|
||||
{
|
||||
"$schema": "https://json-schema.org/draft/2020-12/schema",
|
||||
"title": "add_id",
|
||||
"description": "Add id of .obs. Also allows to make .obs_names (the .obs index) unique \nby prefixing the values with an unique id per .h5mu file.\n",
|
||||
"type": "object",
|
||||
"$defs": {
|
||||
"arguments": {
|
||||
"title": "Arguments",
|
||||
"type": "object",
|
||||
"description": "No description",
|
||||
"properties": {
|
||||
"input": {
|
||||
"type": "string",
|
||||
"format": "path",
|
||||
"exists": true,
|
||||
"description": "Path to the input .h5mu.",
|
||||
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`, example: `\"sample_path\"`. "
|
||||
},
|
||||
"input_id": {
|
||||
"type": "string",
|
||||
"description": "The input id.",
|
||||
"help_text": "Type: `string`, multiple: `False`, required. "
|
||||
},
|
||||
"obs_output": {
|
||||
"type": "string",
|
||||
"description": "Name of the .obs column where to store the id.",
|
||||
"help_text": "Type: `string`, multiple: `False`, default: `\"sample_id\"`. ",
|
||||
"default": "sample_id"
|
||||
},
|
||||
"output": {
|
||||
"type": "string",
|
||||
"format": "path",
|
||||
"description": "Name of output MuData file.\n",
|
||||
"help_text": "Type: `file`, multiple: `False`, default: `\"$id.$key.output.h5mu\"`, direction: `output`, example: `\"output.h5mu\"`. ",
|
||||
"default": "$id.$key.output.h5mu"
|
||||
},
|
||||
"make_observation_keys_unique": {
|
||||
"type": "boolean",
|
||||
"description": "Join the id to the .obs index (.obs_names).",
|
||||
"help_text": "Type: `boolean_true`, multiple: `False`, default: `false`. ",
|
||||
"default": false
|
||||
},
|
||||
"output_compression": {
|
||||
"type": "string",
|
||||
"description": "Compression format to use for the output AnnData and/or Mudata objects.\nBy default no compression is applied.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`, example: `\"gzip\"`, choices: ``gzip`, `lzf``. ",
|
||||
"enum": [
|
||||
"gzip",
|
||||
"lzf"
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
"nextflow input-output arguments": {
|
||||
"title": "Nextflow input-output arguments",
|
||||
"type": "object",
|
||||
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
|
||||
"properties": {
|
||||
"publish_dir": {
|
||||
"type": "string",
|
||||
"description": "Path to an output directory.",
|
||||
"help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"allOf": [
|
||||
{
|
||||
"$ref": "#/$defs/arguments"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/nextflow input-output arguments"
|
||||
}
|
||||
]
|
||||
}
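The schema above declares `input_id`, `obs_output` (default `sample_id`) and `make_observation_keys_unique` ("Join the id to the .obs index"). The following is a minimal pure-Python sketch of what these arguments describe. It uses plain lists and dicts in place of an AnnData `.obs` frame. The function name `add_sample_id`, the `_` separator and the choice to prefix rather than suffix the id are assumptions made for illustration. They are not taken from the component's script.

```python
from typing import List, Tuple


def add_sample_id(
    obs_names: List[str],
    obs_columns: dict,
    input_id: str,
    obs_output: str = "sample_id",
    make_observation_keys_unique: bool = False,
    separator: str = "_",  # hypothetical: the real separator is not documented here
) -> Tuple[List[str], dict]:
    """Store `input_id` in an .obs-like column and optionally join it to the obs names."""
    columns = dict(obs_columns)  # shallow copy so the caller's dict is left untouched
    columns[obs_output] = [input_id] * len(obs_names)
    if make_observation_keys_unique:
        # Assumption: the id is prepended; the real component may order this differently
        obs_names = [f"{input_id}{separator}{name}" for name in obs_names]
    return list(obs_names), columns


names, cols = add_sample_id(["AAAC", "AAAG"], {}, "s1", make_observation_keys_unique=True)
```

Joining the id to the observation names matters when several samples are later concatenated, because barcodes such as `AAAC` can otherwise collide across samples.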
|
||||
@@ -0,0 +1,12 @@
|
||||
def setup_logger():
    import logging
    from sys import stdout

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    console_handler = logging.StreamHandler(stdout)
    logFormatter = logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
    console_handler.setFormatter(logFormatter)
    logger.addHandler(console_handler)

    return logger
|
||||
@@ -0,0 +1,330 @@
|
||||
name: "grep_annotation_column"
|
||||
namespace: "metadata"
|
||||
version: "v4.0.0"
|
||||
authors:
|
||||
- name: "Dries Schaumont"
|
||||
roles:
|
||||
- "maintainer"
|
||||
info:
|
||||
role: "Core Team Member"
|
||||
links:
|
||||
email: "dries@data-intuitive.com"
|
||||
github: "DriesSchaumont"
|
||||
orcid: "0000-0002-4389-0440"
|
||||
linkedin: "dries-schaumont"
|
||||
organizations:
|
||||
- name: "Data Intuitive"
|
||||
href: "https://www.data-intuitive.com"
|
||||
role: "Data Scientist"
|
||||
argument_groups:
|
||||
- name: "Inputs"
|
||||
description: "Arguments related to the input dataset."
|
||||
arguments:
|
||||
- type: "file"
|
||||
name: "--input"
|
||||
alternatives:
|
||||
- "-i"
|
||||
description: "Path to the input .h5mu."
|
||||
info: null
|
||||
example:
|
||||
- "sample_path"
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--input_column"
|
||||
description: "Column to query. If not specified, use .var_names or .obs_names,\
|
||||
\ depending on the value of --matrix"
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--input_layer"
|
||||
description: "Input data to use when calculating fraction of observations that\
|
||||
\ match with the query. \nOnly used when --output_fraction_column is provided.\
|
||||
\ If not specified, .X is used.\n"
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--modality"
|
||||
description: "Which modality to get the annotation matrix from.\n"
|
||||
info: null
|
||||
example:
|
||||
- "rna"
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--matrix"
|
||||
description: "Matrix to fetch the column from that will be searched."
|
||||
info: null
|
||||
example:
|
||||
- "var"
|
||||
required: false
|
||||
choices:
|
||||
- "var"
|
||||
- "obs"
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- name: "Outputs"
|
||||
description: "Arguments related to how the output will be written."
|
||||
arguments:
|
||||
- type: "file"
|
||||
name: "--output"
|
||||
alternatives:
|
||||
- "-o"
|
||||
description: "Location of the output MuData file.\n"
|
||||
info: null
|
||||
example:
|
||||
- "output.h5mu"
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: false
|
||||
direction: "output"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--output_match_column"
|
||||
description: "Name of the column to write the result to."
|
||||
info: null
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--output_fraction_column"
|
||||
description: "For the opposite axis, name of the column to write the fraction\
|
||||
\ of\nobservations that match the pattern.\n"
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--output_compression"
|
||||
description: "Compression format to use for the output AnnData and/or Mudata objects.\n\
|
||||
By default no compression is applied.\n"
|
||||
info: null
|
||||
example:
|
||||
- "gzip"
|
||||
required: false
|
||||
choices:
|
||||
- "gzip"
|
||||
- "lzf"
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- name: "Query options"
|
||||
description: "Options related to the query"
|
||||
arguments:
|
||||
- type: "string"
|
||||
name: "--regex_pattern"
|
||||
description: "Regex to use to match with the input column."
|
||||
info: null
|
||||
example:
|
||||
- "^[mM][tT]-"
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
resources:
|
||||
- type: "python_script"
|
||||
path: "script.py"
|
||||
is_executable: true
|
||||
- type: "file"
|
||||
path: "setup_logger.py"
|
||||
- type: "file"
|
||||
path: "compress_h5mu.py"
|
||||
- type: "file"
|
||||
path: "nextflow_labels.config"
|
||||
dest: "nextflow_labels.config"
|
||||
description: "Perform a regex lookup on a column from the annotation matrices .obs\
|
||||
\ or .var.\nThe annotation matrix can originate from either a modality, or all modalities\
|
||||
\ (global .var or .obs).\n"
|
||||
test_resources:
|
||||
- type: "python_script"
|
||||
path: "test.py"
|
||||
is_executable: true
|
||||
- type: "file"
|
||||
path: "e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu"
|
||||
info: null
|
||||
status: "enabled"
|
||||
scope:
|
||||
image: "public"
|
||||
target: "public"
|
||||
license: "MIT"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline"
|
||||
docker_registry: "ghcr.io"
|
||||
runners:
|
||||
- type: "executable"
|
||||
id: "executable"
|
||||
docker_setup_strategy: "ifneedbepullelsecachedbuild"
|
||||
- type: "nextflow"
|
||||
id: "nextflow"
|
||||
directives:
|
||||
label:
|
||||
- "singlecpu"
|
||||
- "lowmem"
|
||||
tag: "$id"
|
||||
auto:
|
||||
simplifyInput: true
|
||||
simplifyOutput: false
|
||||
transcript: false
|
||||
publish: false
|
||||
config:
|
||||
labels:
|
||||
mem1gb: "memory = 1000000000.B"
|
||||
mem2gb: "memory = 2000000000.B"
|
||||
mem5gb: "memory = 5000000000.B"
|
||||
mem10gb: "memory = 10000000000.B"
|
||||
mem20gb: "memory = 20000000000.B"
|
||||
mem50gb: "memory = 50000000000.B"
|
||||
mem100gb: "memory = 100000000000.B"
|
||||
mem200gb: "memory = 200000000000.B"
|
||||
mem500gb: "memory = 500000000000.B"
|
||||
mem1tb: "memory = 1000000000000.B"
|
||||
mem2tb: "memory = 2000000000000.B"
|
||||
mem5tb: "memory = 5000000000000.B"
|
||||
mem10tb: "memory = 10000000000000.B"
|
||||
mem20tb: "memory = 20000000000000.B"
|
||||
mem50tb: "memory = 50000000000000.B"
|
||||
mem100tb: "memory = 100000000000000.B"
|
||||
mem200tb: "memory = 200000000000000.B"
|
||||
mem500tb: "memory = 500000000000000.B"
|
||||
mem1gib: "memory = 1073741824.B"
|
||||
mem2gib: "memory = 2147483648.B"
|
||||
mem4gib: "memory = 4294967296.B"
|
||||
mem8gib: "memory = 8589934592.B"
|
||||
mem16gib: "memory = 17179869184.B"
|
||||
mem32gib: "memory = 34359738368.B"
|
||||
mem64gib: "memory = 68719476736.B"
|
||||
mem128gib: "memory = 137438953472.B"
|
||||
mem256gib: "memory = 274877906944.B"
|
||||
mem512gib: "memory = 549755813888.B"
|
||||
mem1tib: "memory = 1099511627776.B"
|
||||
mem2tib: "memory = 2199023255552.B"
|
||||
mem4tib: "memory = 4398046511104.B"
|
||||
mem8tib: "memory = 8796093022208.B"
|
||||
mem16tib: "memory = 17592186044416.B"
|
||||
mem32tib: "memory = 35184372088832.B"
|
||||
mem64tib: "memory = 70368744177664.B"
|
||||
mem128tib: "memory = 140737488355328.B"
|
||||
mem256tib: "memory = 281474976710656.B"
|
||||
mem512tib: "memory = 562949953421312.B"
|
||||
cpu1: "cpus = 1"
|
||||
cpu2: "cpus = 2"
|
||||
cpu5: "cpus = 5"
|
||||
cpu10: "cpus = 10"
|
||||
cpu20: "cpus = 20"
|
||||
cpu50: "cpus = 50"
|
||||
cpu100: "cpus = 100"
|
||||
cpu200: "cpus = 200"
|
||||
cpu500: "cpus = 500"
|
||||
cpu1000: "cpus = 1000"
|
||||
script:
|
||||
- "includeConfig(\"nextflow_labels.config\")"
|
||||
debug: false
|
||||
container: "docker"
|
||||
engines:
|
||||
- type: "docker"
|
||||
id: "docker"
|
||||
image: "python:3.11-slim"
|
||||
target_registry: "images.viash-hub.com"
|
||||
target_tag: "v4.0.0"
|
||||
namespace_separator: "/"
|
||||
setup:
|
||||
- type: "apt"
|
||||
packages:
|
||||
- "procps"
|
||||
interactive: false
|
||||
- type: "python"
|
||||
user: false
|
||||
packages:
|
||||
- "anndata~=0.12.7"
|
||||
- "awkward"
|
||||
- "mudata~=0.3.2"
|
||||
script:
|
||||
- "exec(\"try:\\n import zarr; from importlib.metadata import version\\nexcept\
|
||||
\ ModuleNotFoundError:\\n exit(0)\\nelse: assert int(version(\\\"zarr\\\"\
|
||||
).partition(\\\".\\\")[0]) > 2\")"
|
||||
upgrade: true
|
||||
test_setup:
|
||||
- type: "apt"
|
||||
packages:
|
||||
- "git"
|
||||
interactive: false
|
||||
- type: "python"
|
||||
user: false
|
||||
packages:
|
||||
- "viashpy==0.8.0"
|
||||
github:
|
||||
- "openpipelines-bio/core#subdirectory=packages/python/openpipeline_testutils"
|
||||
upgrade: true
|
||||
entrypoint: []
|
||||
cmd: null
|
||||
- type: "native"
|
||||
id: "native"
|
||||
build_info:
|
||||
config: "src/metadata/grep_annotation_column/config.vsh.yaml"
|
||||
runner: "nextflow"
|
||||
engine: "docker|native"
|
||||
output: "target/nextflow/metadata/grep_annotation_column"
|
||||
executable: "target/nextflow/metadata/grep_annotation_column/main.nf"
|
||||
viash_version: "0.9.4"
|
||||
git_commit: "de02293c9e13198622b988dac952b2c8c70a1e35"
|
||||
git_remote: "https://github.com/openpipelines-bio/openpipeline"
|
||||
package_config:
|
||||
name: "openpipeline"
|
||||
version: "v4.0.0"
|
||||
summary: "Best-practice workflows for single-cell multi-omics analyses.\n"
|
||||
description: "OpenPipelines are extensible single cell analysis pipelines for reproducible\
|
||||
\ and large-scale single cell processing using [Viash](https://viash.io) and [Nextflow](https://www.nextflow.io/).\n\
|
||||
\nIn terms of workflows, the following has been made available, but keep in mind\
|
||||
\ that\nindividual tools and functionality can be executed as standalone components\
|
||||
\ as well.\n\n * Demultiplexing: conversion of raw sequencing data to FASTQ objects.\n\
|
||||
\ * Ingestion: Read mapping and generating a count matrix.\n * Single sample\
|
||||
\ processing: cell filtering and doublet detection.\n * Multisample processing:\
|
||||
\ Count transformation, normalization, QC metric calculations.\n * Integration:\
|
||||
\ Clustering, integration and batch correction using single and multimodal methods.\n\
|
||||
\ * Downstream analysis workflows\n"
|
||||
info:
|
||||
test_resources:
|
||||
- type: "s3"
|
||||
path: "s3://openpipelines-data"
|
||||
dest: "resources_test"
|
||||
nextflow_labels_ci:
|
||||
- path: "src/workflows/utils/labels_ci.config"
|
||||
description: "Adds the correct memory and CPU labels when running on the Viash\
|
||||
\ Hub CI."
|
||||
viash_version: "0.9.4"
|
||||
source: "src"
|
||||
target: "target"
|
||||
config_mods:
|
||||
- ".resources += {path: '/src/workflows/utils/labels.config', dest: 'nextflow_labels.config'}\n\
|
||||
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
|
||||
)'"
|
||||
- ".engines += { type: \"native\" }"
|
||||
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
|
||||
- ".engines[.type == 'docker'].target_tag := 'v4.0.0'"
|
||||
keywords:
|
||||
- "single-cell"
|
||||
- "multimodal"
|
||||
license: "MIT"
|
||||
organization: "vsh"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline"
|
||||
docker_registry: "ghcr.io"
|
||||
homepage: "https://openpipelines.bio"
|
||||
documentation: "https://openpipelines.bio/fundamentals"
|
||||
issue_tracker: "https://github.com/openpipelines-bio/openpipeline/issues"
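The `grep_annotation_column` config above describes a regex lookup on an annotation column (for example var names matched against `^[mM][tT]-`), plus an optional fraction column on the opposite axis. The following is a minimal stdlib-only sketch of that idea, using lists instead of AnnData matrices. It assumes that the fraction is computed per observation as the counts in matching vars divided by that observation's total counts. The function name and the `None` result for empty observations are illustrative choices, not the component's actual behaviour.

```python
import re
from typing import List, Optional, Sequence, Tuple


def grep_annotation_column(
    var_names: Sequence[str],
    counts: Sequence[Sequence[float]],
    regex_pattern: str,
) -> Tuple[List[bool], List[Optional[float]]]:
    """Match `regex_pattern` against each var name and compute, per observation,
    the fraction of its total counts that falls in the matching vars."""
    pattern = re.compile(regex_pattern)
    # re.search matches anywhere in the string (anchors like ^ still apply)
    matches = [pattern.search(name) is not None for name in var_names]
    fractions: List[Optional[float]] = []
    for row in counts:  # one row per observation (cell)
        total = sum(row)
        matched = sum(value for value, hit in zip(row, matches) if hit)
        # Assumption: report None instead of dividing by zero for empty observations
        fractions.append(matched / total if total else None)
    return matches, fractions


mask, frac = grep_annotation_column(
    ["mt-Co1", "Actb", "MT-Nd1", "Gapdh"],
    [[2, 6, 2, 0], [0, 0, 0, 0]],
    "^[mM][tT]-",
)
# mask → [True, False, True, False]; frac → [0.4, None]
```

The match mask corresponds to `--output_match_column` on the queried axis. The per-row fractions correspond to `--output_fraction_column` on the opposite axis, which is how a mitochondrial-read percentage per cell is usually derived.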
|
||||
@@ -0,0 +1,87 @@
|
||||
import shutil
from anndata import AnnData
from mudata import write_h5ad
from h5py import File as H5File
from h5py import Group, Dataset
from pathlib import Path
from typing import Union, Literal
from functools import partial


def compress_h5mu(
    input_path: Union[str, Path],
    output_path: Union[str, Path],
    compression: Union[Literal["gzip"], Literal["lzf"]],
):
    input_path, output_path = str(input_path), str(output_path)

    def copy_attributes(in_object, out_object):
        for key, value in in_object.attrs.items():
            out_object.attrs[key] = value

    def visit_path(
        output_h5: H5File,
        compression: Union[Literal["gzip"], Literal["lzf"]],
        name: str,
        object: Union[Group, Dataset],
    ):
        if isinstance(object, Group):
            new_group = output_h5.create_group(name)
            copy_attributes(object, new_group)
        elif isinstance(object, Dataset):
            # Compression only works for non-scalar Dataset objects.
            # Scalar objects don't have a shape defined.
            if not object.compression and object.shape not in [None, ()]:
                new_dataset = output_h5.create_dataset(
                    name, data=object, compression=compression
                )
                copy_attributes(object, new_dataset)
            else:
                output_h5.copy(object, name)
        else:
            raise NotImplementedError(
                f"Could not copy element {name}, "
                f"type has not been implemented yet: {type(object)}"
            )

    with (
        H5File(input_path, "r") as input_h5,
        H5File(output_path, "w", userblock_size=512) as output_h5,
    ):
        copy_attributes(input_h5, output_h5)
        input_h5.visititems(partial(visit_path, output_h5, compression))

    with open(input_path, "rb") as input_bytes:
        # MuData puts metadata like this in the first 512 bytes (the HDF5 userblock):
        # MuData (format-version=0.1.0;creator=muon;creator-version=0.2.0)
        # See mudata/_core/io.py, read_h5mu() function.
        # Only the first 100 bytes are read; the header shown above fits within them.
        starting_metadata = input_bytes.read(100)
        # The metadata is padded with extra null bytes up until 512 bytes
        truncate_location = starting_metadata.find(b"\x00")
        # find() returns -1 when no null byte is present; slicing with -1
        # would silently drop the last byte, so only truncate on a real match
        if truncate_location != -1:
            starting_metadata = starting_metadata[:truncate_location]
    with open(output_path, "br+") as f:
        nbytes = f.write(starting_metadata)
        f.write(b"\0" * (512 - nbytes))


def write_h5ad_to_h5mu_with_compression(
    output_file: Union[str, Path],
    h5mu: Union[str, Path],
    modality_name: str,
    modality_data: AnnData,
    output_compression=None,
):
    output_file = Path(output_file)
    h5mu = Path(h5mu)
    output_file_uncompressed = (
        output_file.with_name(output_file.stem + "_uncompressed.h5mu")
        if output_compression
        else output_file
    )
    shutil.copyfile(h5mu, output_file_uncompressed)
    write_h5ad(filename=output_file_uncompressed, mod=modality_name, data=modality_data)
    if output_compression:
        compress_h5mu(
            output_file_uncompressed, output_file, compression=output_compression
        )
        # Only remove the temporary file; without compression it *is* the output
        output_file_uncompressed.unlink()
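The last step of `compress_h5mu` copies MuData's text header from the input's first bytes into the output's 512-byte userblock and pads the rest with NUL bytes. The following is a stdlib-only sketch of that byte-level step, run against in-memory buffers instead of real `.h5mu` files. The helper name `copy_mudata_header` and its parameters are hypothetical. The 512-byte userblock and the 100-byte probe mirror the code above.

```python
import io
from typing import BinaryIO


def copy_mudata_header(source: BinaryIO, target: BinaryIO,
                       userblock_size: int = 512, probe: int = 100) -> bytes:
    """Copy the NUL-terminated text header at the start of `source` into the first
    `userblock_size` bytes of `target`, padding the remainder with NUL bytes."""
    source.seek(0)
    header = source.read(probe)
    end = header.find(b"\x00")
    if end != -1:  # keep everything up to the first NUL byte
        header = header[:end]
    target.seek(0)  # the userblock sits at the very start of the HDF5 file
    written = target.write(header)
    target.write(b"\x00" * (userblock_size - written))
    return header


src = io.BytesIO(b"MuData (format-version=0.1.0)" + b"\x00" * 483 + b"HDF")
dst = io.BytesIO(b"\xff" * 600)  # stands in for the freshly written compressed file
header = copy_mudata_header(src, dst)
```

Writing in place with `br+` (here, `BytesIO` positioned at 0) overwrites only the userblock, leaving the HDF5 data that starts at byte 512 untouched.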
|
||||
File diff suppressed because it is too large
@@ -0,0 +1,126 @@
|
||||
manifest {
|
||||
name = 'metadata/grep_annotation_column'
|
||||
mainScript = 'main.nf'
|
||||
nextflowVersion = '!>=20.12.1-edge'
|
||||
version = 'v4.0.0'
|
||||
description = 'Perform a regex lookup on a column from the annotation matrices .obs or .var.\nThe annotation matrix can originate from either a modality, or all modalities (global .var or .obs).\n'
|
||||
author = 'Dries Schaumont'
|
||||
}
|
||||
|
||||
process.container = 'nextflow/bash:latest'
|
||||
|
||||
// detect tempdir
|
||||
tempDir = java.nio.file.Paths.get(
|
||||
System.getenv('NXF_TEMP') ?:
|
||||
System.getenv('VIASH_TEMP') ?:
|
||||
System.getenv('TEMPDIR') ?:
|
||||
System.getenv('TMPDIR') ?:
|
||||
'/tmp'
|
||||
).toAbsolutePath()
|
||||
|
||||
profiles {
|
||||
no_publish {
|
||||
process {
|
||||
withName: '.*' {
|
||||
publishDir = [
|
||||
enabled: false
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
mount_temp {
|
||||
docker.temp = tempDir
|
||||
podman.temp = tempDir
|
||||
charliecloud.temp = tempDir
|
||||
}
|
||||
docker {
|
||||
docker.enabled = true
|
||||
// docker.userEmulation = true
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
singularity {
|
||||
singularity.enabled = true
|
||||
singularity.autoMounts = true
|
||||
docker.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
podman {
|
||||
podman.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
shifter {
|
||||
shifter.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
charliecloud {
|
||||
charliecloud.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
}
|
||||
}
|
||||
|
||||
process{
|
||||
withLabel: mem1gb { memory = 1000000000.B }
|
||||
withLabel: mem2gb { memory = 2000000000.B }
|
||||
withLabel: mem5gb { memory = 5000000000.B }
|
||||
withLabel: mem10gb { memory = 10000000000.B }
|
||||
withLabel: mem20gb { memory = 20000000000.B }
|
||||
withLabel: mem50gb { memory = 50000000000.B }
|
||||
withLabel: mem100gb { memory = 100000000000.B }
|
||||
withLabel: mem200gb { memory = 200000000000.B }
|
||||
withLabel: mem500gb { memory = 500000000000.B }
|
||||
withLabel: mem1tb { memory = 1000000000000.B }
|
||||
withLabel: mem2tb { memory = 2000000000000.B }
|
||||
withLabel: mem5tb { memory = 5000000000000.B }
|
||||
withLabel: mem10tb { memory = 10000000000000.B }
|
||||
withLabel: mem20tb { memory = 20000000000000.B }
|
||||
withLabel: mem50tb { memory = 50000000000000.B }
|
||||
withLabel: mem100tb { memory = 100000000000000.B }
|
||||
withLabel: mem200tb { memory = 200000000000000.B }
|
||||
withLabel: mem500tb { memory = 500000000000000.B }
|
||||
withLabel: mem1gib { memory = 1073741824.B }
|
||||
withLabel: mem2gib { memory = 2147483648.B }
|
||||
withLabel: mem4gib { memory = 4294967296.B }
|
||||
withLabel: mem8gib { memory = 8589934592.B }
|
||||
withLabel: mem16gib { memory = 17179869184.B }
|
||||
withLabel: mem32gib { memory = 34359738368.B }
|
||||
withLabel: mem64gib { memory = 68719476736.B }
|
||||
withLabel: mem128gib { memory = 137438953472.B }
|
||||
withLabel: mem256gib { memory = 274877906944.B }
|
||||
withLabel: mem512gib { memory = 549755813888.B }
|
||||
withLabel: mem1tib { memory = 1099511627776.B }
|
||||
withLabel: mem2tib { memory = 2199023255552.B }
|
||||
withLabel: mem4tib { memory = 4398046511104.B }
|
||||
withLabel: mem8tib { memory = 8796093022208.B }
|
||||
withLabel: mem16tib { memory = 17592186044416.B }
|
||||
withLabel: mem32tib { memory = 35184372088832.B }
|
||||
withLabel: mem64tib { memory = 70368744177664.B }
|
||||
withLabel: mem128tib { memory = 140737488355328.B }
|
||||
withLabel: mem256tib { memory = 281474976710656.B }
|
||||
withLabel: mem512tib { memory = 562949953421312.B }
|
||||
withLabel: cpu1 { cpus = 1 }
|
||||
withLabel: cpu2 { cpus = 2 }
|
||||
withLabel: cpu5 { cpus = 5 }
|
||||
withLabel: cpu10 { cpus = 10 }
|
||||
withLabel: cpu20 { cpus = 20 }
|
||||
withLabel: cpu50 { cpus = 50 }
|
||||
withLabel: cpu100 { cpus = 100 }
|
||||
withLabel: cpu200 { cpus = 200 }
|
||||
withLabel: cpu500 { cpus = 500 }
|
||||
withLabel: cpu1000 { cpus = 1000 }
|
||||
}
|
||||
|
||||
includeConfig("nextflow_labels.config")
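The label names above encode their sizes. Labels ending in `gb`/`tb` use decimal units (`mem1gb` = `1000000000.B`), while labels ending in `gib`/`tib` use binary units (`mem1gib` = `1073741824.B`). The following is a small sketch that derives the byte counts from a label name and can be used to cross-check the table. The helper `label_to_bytes` is hypothetical and is not part of the pipeline.

```python
import re

# Decimal (SI) and binary (IEC) multipliers used by the label names above
_UNITS = {
    "gb": 10**9,
    "tb": 10**12,
    "gib": 2**30,
    "tib": 2**40,
}


def label_to_bytes(label: str) -> int:
    """Convert a memory label such as 'mem16gib' into a byte count."""
    match = re.fullmatch(r"mem(\d+)(gib|tib|gb|tb)", label)
    if match is None:
        raise ValueError(f"not a memory label: {label!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS[unit]


print(label_to_bytes("mem16gib"))  # 17179869184, matching withLabel: mem16gib
```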
|
||||
@@ -0,0 +1,48 @@
|
||||
process {
|
||||
// Default resources for components that hardly do any processing
|
||||
memory = { 2.GB * task.attempt }
|
||||
cpus = 1
|
||||
|
||||
// Retry for exit codes that have something to do with memory issues
|
||||
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
|
||||
maxRetries = 3
|
||||
|
||||
// The memory a task is assigned increases with each attempt
|
||||
// uncomment the line below and adjust the value to set a global upper limit on the memory.
|
||||
// resourceLimits = [ memory: 240.Gb ]
|
||||
|
||||
// CPU resources
|
||||
withLabel: singlecpu { cpus = 1 }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midcpu { cpus = 10 }
|
||||
withLabel: highcpu { cpus = 20 }
|
||||
|
||||
// Memory resources
|
||||
withLabel: lowmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 4.GB * task.attempt } }
|
||||
withLabel: midmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 25.GB * task.attempt } }
|
||||
withLabel: highmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 50.GB * task.attempt } }
|
||||
withLabel: veryhighmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 75.GB * task.attempt } }
|
||||
|
||||
// Disk space
|
||||
withLabel: lowdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: middisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: highdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: veryhighdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
|
||||
// NOTE: The above labels intentionally do not have an effect by default.
|
||||
// The user should set the disk space requirements by adding the following
|
||||
// to the compute environment:
|
||||
//
|
||||
// withLabel: lowdisk { disk = { 20.GB * task.attempt } }
|
||||
// withLabel: middisk { disk = { 100.GB * task.attempt } }
|
||||
// withLabel: highdisk { disk = { 200.GB * task.attempt } }
|
||||
// withLabel: veryhighdisk { disk = { 500.GB * task.attempt } }
|
||||
}
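The retry settings above combine two rules. `errorStrategy` retries only for exit codes `137..140` (a Groovy range, inclusive at both ends). The `*mem` labels then scale memory linearly with `task.attempt`, and jump to `resourceLimits.memory` once the attempt reaches `maxRetries` if such a limit is set. The following is a sketch of the `lowmem` rule in plain Python, working in GB to avoid unit conversions. The function names are hypothetical. The cap on the number of retries (`maxRetries = 3`) is enforced by Nextflow itself and is not modelled here.

```python
from typing import Optional


def lowmem_request_gb(attempt: int, max_retries: Optional[int],
                      limit_gb: Optional[float], base_gb: float = 4) -> float:
    """Mirror the `lowmem` closure: memory grows linearly with the attempt number,
    but switches to the configured limit once the last retry is reached."""
    if limit_gb and max_retries and attempt >= max_retries:
        return limit_gb
    return base_gb * attempt


def should_retry(exit_status: int) -> bool:
    """Mirror errorStrategy: retry only for exit codes 137 through 140 (inclusive)."""
    return 137 <= exit_status <= 140


# With resourceLimits = [ memory: 240.Gb ] uncommented: 4, 8, then 240 GB
requests = [lowmem_request_gb(a, 3, 240) for a in (1, 2, 3)]
```

Exit code 137 is what a process typically reports when it is killed with SIGKILL, for example by an out-of-memory killer. This is why a memory-scaling retry is tied to that range.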
|
||||
@@ -0,0 +1,117 @@
|
||||
{
|
||||
"$schema": "https://json-schema.org/draft/2020-12/schema",
|
||||
"title": "grep_annotation_column",
|
||||
"description": "Perform a regex lookup on a column from the annotation matrices .obs or .var.\nThe annotation matrix can originate from either a modality, or all modalities (global .var or .obs).\n",
|
||||
"type": "object",
|
||||
"$defs": {
|
||||
"inputs": {
|
||||
"title": "Inputs",
|
||||
"type": "object",
|
||||
"description": "Arguments related to the input dataset.",
|
||||
"properties": {
|
||||
"input": {
|
||||
"type": "string",
|
||||
"format": "path",
|
||||
"exists": true,
|
||||
"description": "Path to the input .h5mu.",
|
||||
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`, example: `\"sample_path\"`. "
|
||||
},
|
||||
"input_column": {
|
||||
"type": "string",
|
||||
"description": "Column to query",
|
||||
"help_text": "Type: `string`, multiple: `False`. "
|
||||
},
|
||||
"input_layer": {
|
||||
"type": "string",
|
||||
"description": "Input data to use when calculating fraction of observations that match with the query",
|
||||
"help_text": "Type: `string`, multiple: `False`. "
|
||||
},
|
||||
"modality": {
|
||||
"type": "string",
|
||||
"description": "Which modality to get the annotation matrix from.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`, required, example: `\"rna\"`. "
|
||||
},
|
||||
"matrix": {
|
||||
"type": "string",
|
||||
"description": "Matrix to fetch the column from that will be searched.",
|
||||
"help_text": "Type: `string`, multiple: `False`, example: `\"var\"`, choices: ``var`, `obs``. ",
|
||||
"enum": [
|
||||
"var",
|
||||
"obs"
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": {
|
||||
"title": "Outputs",
|
||||
"type": "object",
|
||||
"description": "Arguments related to how the output will be written.",
|
||||
"properties": {
|
||||
"output": {
|
||||
"type": "string",
|
||||
"format": "path",
|
||||
"description": "Location of the output MuData file.\n",
|
||||
"help_text": "Type: `file`, multiple: `False`, default: `\"$id.$key.output.h5mu\"`, direction: `output`, example: `\"output.h5mu\"`. ",
|
||||
"default": "$id.$key.output.h5mu"
|
||||
},
|
||||
"output_match_column": {
|
||||
"type": "string",
|
||||
"description": "Name of the column to write the result to.",
|
||||
"help_text": "Type: `string`, multiple: `False`, required. "
|
||||
},
|
||||
"output_fraction_column": {
|
||||
"type": "string",
|
||||
"description": "For the opposite axis, name of the column to write the fraction of\nobservations that match the pattern.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`. "
|
||||
},
|
||||
"output_compression": {
|
||||
"type": "string",
|
||||
"description": "Compression format to use for the output AnnData and/or Mudata objects.\nBy default no compression is applied.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`, example: `\"gzip\"`, choices: ``gzip`, `lzf``. ",
|
||||
"enum": [
|
||||
"gzip",
|
||||
"lzf"
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
"query options": {
|
||||
"title": "Query options",
|
||||
"type": "object",
|
||||
"description": "Options related to the query",
|
||||
"properties": {
|
||||
"regex_pattern": {
|
||||
"type": "string",
|
||||
"description": "Regex to use to match with the input column.",
|
||||
"help_text": "Type: `string`, multiple: `False`, required, example: `\"^[mM][tT]-\"`. "
|
||||
}
|
||||
}
|
||||
},
|
||||
"nextflow input-output arguments": {
|
||||
"title": "Nextflow input-output arguments",
|
||||
"type": "object",
|
||||
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
|
||||
"properties": {
|
||||
"publish_dir": {
|
||||
"type": "string",
|
||||
"description": "Path to an output directory.",
|
||||
"help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"allOf": [
|
||||
{
|
||||
"$ref": "#/$defs/inputs"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/outputs"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/query options"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/nextflow input-output arguments"
|
||||
}
|
||||
]
|
||||
}
|
||||
@@ -0,0 +1,12 @@
|
||||
def setup_logger():
    import logging
    from sys import stdout

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    console_handler = logging.StreamHandler(stdout)
    logFormatter = logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
    console_handler.setFormatter(logFormatter)
    logger.addHandler(console_handler)

    return logger
|
||||
@@ -0,0 +1,390 @@
|
||||
name: "calculate_qc_metrics"
|
||||
namespace: "qc"
|
||||
version: "v4.0.0"
|
||||
authors:
|
||||
- name: "Dries Schaumont"
|
||||
roles:
|
||||
- "author"
|
||||
info:
|
||||
role: "Core Team Member"
|
||||
links:
|
||||
email: "dries@data-intuitive.com"
|
||||
github: "DriesSchaumont"
|
||||
orcid: "0000-0002-4389-0440"
|
||||
linkedin: "dries-schaumont"
|
||||
organizations:
|
||||
- name: "Data Intuitive"
|
||||
href: "https://www.data-intuitive.com"
|
||||
role: "Data Scientist"
|
||||
argument_groups:
|
||||
- name: "Inputs"
|
||||
arguments:
|
||||
- type: "file"
|
||||
name: "--input"
|
||||
description: "Input h5mu file"
|
||||
info: null
|
||||
example:
|
||||
- "input.h5mu"
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--modality"
|
||||
description: "Which modality from the input MuData file to process. \n"
|
||||
info: null
|
||||
default:
|
||||
- "rna"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--layer"
|
||||
description: "Layer from modality to use as input data. If not provided the .X\
|
||||
\ attribute is used.\n"
|
||||
info: null
|
||||
example:
|
||||
- "raw_counts"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- name: "Metrics added to .obs"
|
||||
arguments:
|
||||
- type: "string"
|
||||
name: "--var_qc_metrics"
|
||||
description: "Keys to select a boolean (containing only True or False) column\
|
||||
\ from .var.\nFor each cell, calculate the proportion of total values for genes\
|
||||
\ which are labeled 'True', \ncompared to the total sum of the values for all\
|
||||
\ genes.\n"
|
||||
info: null
|
||||
example:
|
||||
- "ercc,highly_variable,mitochondrial"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: true
|
||||
multiple_sep: ";"
|
||||
- type: "boolean"
|
||||
name: "--var_qc_metrics_fill_na_value"
|
||||
description: "Fill any 'NA' values found in the columns specified with --var_qc_metrics\
|
||||
\ with 'True' or 'False'.\n"
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "integer"
|
||||
name: "--top_n_vars"
|
||||
description: "Number of top vars to be used to calculate cumulative proportions.\n\
|
||||
If not specified, proportions are not calculated. `--top_n_vars 20;50` finds\n\
|
||||
cumulative proportion to the 20th and 50th most expressed vars.\n"
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: true
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--output_obs_num_nonzero_vars"
|
||||
description: "Name of column in .obs describing, for each observation, the number\
|
||||
\ of stored values\n(including explicit zeroes). In other words, the name of\
|
||||
\ the column that counts\nfor each row the number of columns that contain data.\n"
|
||||
info: null
|
||||
default:
|
||||
- "num_nonzero_vars"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--output_obs_total_counts_vars"
|
||||
description: "Name of the column in .obs describing, for each observation (row),\n\
|
||||
the sum of the stored values in the columns.\n"
|
||||
info: null
|
||||
default:
|
||||
- "total_counts"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- name: "Metrics added to .var"
|
||||
arguments:
|
||||
- type: "string"
|
||||
name: "--output_var_num_nonzero_obs"
|
||||
description: "Name of column describing, for each feature, the number of stored\
|
||||
\ values\n(including explicit zeroes). In other words, the name of the column\
|
||||
\ that counts\nfor each column the number of rows that contain data.\n"
|
||||
info: null
|
||||
default:
|
||||
- "num_nonzero_obs"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--output_var_total_counts_obs"
|
||||
description: "Name of the column in .var describing, for each feature (column),\n\
|
||||
the sum of the stored values in the rows.\n"
|
||||
info: null
|
||||
default:
|
||||
- "total_counts"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--output_var_obs_mean"
|
||||
description: "Name of the column in .var providing the mean of the values in each column.\n"
|
||||
info: null
|
||||
default:
|
||||
- "obs_mean"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--output_var_pct_dropout"
|
||||
description: "Name of the column in .var providing, for each feature, the percentage of\nobservations in which the feature does not appear (i.e. is missing). Percentage-based\ncounterpart of `--output_var_num_nonzero_obs`.\n"
|
||||
info: null
|
||||
default:
|
||||
- "pct_dropout"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- name: "Outputs"
|
||||
arguments:
|
||||
- type: "file"
|
||||
name: "--output"
|
||||
description: "Output h5mu file."
|
||||
info: null
|
||||
example:
|
||||
- "output.h5mu"
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: false
|
||||
direction: "output"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--output_compression"
|
||||
description: "Compression format to use for the output AnnData and/or Mudata objects.\n\
|
||||
By default no compression is applied.\n"
|
||||
info: null
|
||||
example:
|
||||
- "gzip"
|
||||
required: false
|
||||
choices:
|
||||
- "gzip"
|
||||
- "lzf"
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
resources:
|
||||
- type: "python_script"
|
||||
path: "script.py"
|
||||
is_executable: true
|
||||
- type: "file"
|
||||
path: "setup_logger.py"
|
||||
- type: "file"
|
||||
path: "compress_h5mu.py"
|
||||
- type: "file"
|
||||
path: "nextflow_labels.config"
|
||||
dest: "nextflow_labels.config"
|
||||
description: "Add basic quality control metrics to an .h5mu file.\n\nThe metrics are\
|
||||
\ comparable to what scanpy.pp.calculate_qc_metrics outputs,\nalthough they have\
|
||||
\ slightly different names:\n\nVar metrics (name in this component -> name in scanpy):\n\
|
||||
\ - pct_dropout -> pct_dropout_by_{expr_type}\n - num_nonzero_obs -> n_cells_by_{expr_type}\n\
|
||||
\ - obs_mean -> mean_{expr_type}\n - total_counts -> total_{expr_type}\n\n Obs\
|
||||
\ metrics:\n - num_nonzero_vars -> n_genes_by_{expr_type}\n - pct_{var_qc_metrics}\
|
||||
\ -> pct_{expr_type}_{qc_var}\n - total_counts_{var_qc_metrics} -> total_{expr_type}_{qc_var}\n\
|
||||
\ - pct_of_counts_in_top_{top_n_vars}_vars -> pct_{expr_type}_in_top_{n}_{var_type}\n\
|
||||
\ - total_counts -> total_{expr_type}\n \n"
|
||||
test_resources:
|
||||
- type: "python_script"
|
||||
path: "test.py"
|
||||
is_executable: true
|
||||
- type: "file"
|
||||
path: "pbmc_1k_protein_v3_filtered_feature_bc_matrix.h5mu"
|
||||
info: null
|
||||
status: "enabled"
|
||||
scope:
|
||||
image: "public"
|
||||
target: "public"
|
||||
license: "MIT"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline"
|
||||
docker_registry: "ghcr.io"
|
||||
runners:
|
||||
- type: "executable"
|
||||
id: "executable"
|
||||
docker_setup_strategy: "ifneedbepullelsecachedbuild"
|
||||
- type: "nextflow"
|
||||
id: "nextflow"
|
||||
directives:
|
||||
label:
|
||||
- "singlecpu"
|
||||
- "midmem"
|
||||
tag: "$id"
|
||||
auto:
|
||||
simplifyInput: true
|
||||
simplifyOutput: false
|
||||
transcript: false
|
||||
publish: false
|
||||
config:
|
||||
labels:
|
||||
mem1gb: "memory = 1000000000.B"
|
||||
mem2gb: "memory = 2000000000.B"
|
||||
mem5gb: "memory = 5000000000.B"
|
||||
mem10gb: "memory = 10000000000.B"
|
||||
mem20gb: "memory = 20000000000.B"
|
||||
mem50gb: "memory = 50000000000.B"
|
||||
mem100gb: "memory = 100000000000.B"
|
||||
mem200gb: "memory = 200000000000.B"
|
||||
mem500gb: "memory = 500000000000.B"
|
||||
mem1tb: "memory = 1000000000000.B"
|
||||
mem2tb: "memory = 2000000000000.B"
|
||||
mem5tb: "memory = 5000000000000.B"
|
||||
mem10tb: "memory = 10000000000000.B"
|
||||
mem20tb: "memory = 20000000000000.B"
|
||||
mem50tb: "memory = 50000000000000.B"
|
||||
mem100tb: "memory = 100000000000000.B"
|
||||
mem200tb: "memory = 200000000000000.B"
|
||||
mem500tb: "memory = 500000000000000.B"
|
||||
mem1gib: "memory = 1073741824.B"
|
||||
mem2gib: "memory = 2147483648.B"
|
||||
mem4gib: "memory = 4294967296.B"
|
||||
mem8gib: "memory = 8589934592.B"
|
||||
mem16gib: "memory = 17179869184.B"
|
||||
mem32gib: "memory = 34359738368.B"
|
||||
mem64gib: "memory = 68719476736.B"
|
||||
mem128gib: "memory = 137438953472.B"
|
||||
mem256gib: "memory = 274877906944.B"
|
||||
mem512gib: "memory = 549755813888.B"
|
||||
mem1tib: "memory = 1099511627776.B"
|
||||
mem2tib: "memory = 2199023255552.B"
|
||||
mem4tib: "memory = 4398046511104.B"
|
||||
mem8tib: "memory = 8796093022208.B"
|
||||
mem16tib: "memory = 17592186044416.B"
|
||||
mem32tib: "memory = 35184372088832.B"
|
||||
mem64tib: "memory = 70368744177664.B"
|
||||
mem128tib: "memory = 140737488355328.B"
|
||||
mem256tib: "memory = 281474976710656.B"
|
||||
mem512tib: "memory = 562949953421312.B"
|
||||
cpu1: "cpus = 1"
|
||||
cpu2: "cpus = 2"
|
||||
cpu5: "cpus = 5"
|
||||
cpu10: "cpus = 10"
|
||||
cpu20: "cpus = 20"
|
||||
cpu50: "cpus = 50"
|
||||
cpu100: "cpus = 100"
|
||||
cpu200: "cpus = 200"
|
||||
cpu500: "cpus = 500"
|
||||
cpu1000: "cpus = 1000"
|
||||
script:
|
||||
- "includeConfig(\"nextflow_labels.config\")"
|
||||
debug: false
|
||||
container: "docker"
|
||||
engines:
|
||||
- type: "docker"
|
||||
id: "docker"
|
||||
image: "python:3.11-slim"
|
||||
target_registry: "images.viash-hub.com"
|
||||
target_tag: "v4.0.0"
|
||||
namespace_separator: "/"
|
||||
setup:
|
||||
- type: "apt"
|
||||
packages:
|
||||
- "procps"
|
||||
interactive: false
|
||||
- type: "python"
|
||||
user: false
|
||||
packages:
|
||||
- "anndata~=0.12.7"
|
||||
- "awkward"
|
||||
- "mudata~=0.3.2"
|
||||
- "scipy"
|
||||
script:
|
||||
- "exec(\"try:\\n import zarr; from importlib.metadata import version\\nexcept\
|
||||
\ ModuleNotFoundError:\\n exit(0)\\nelse: assert int(version(\\\"zarr\\\"\
|
||||
).partition(\\\".\\\")[0]) > 2\")"
|
||||
upgrade: true
|
||||
test_setup:
|
||||
- type: "apt"
|
||||
packages:
|
||||
- "git"
|
||||
interactive: false
|
||||
- type: "python"
|
||||
user: false
|
||||
packages:
|
||||
- "viashpy==0.8.0"
|
||||
github:
|
||||
- "openpipelines-bio/core#subdirectory=packages/python/openpipeline_testutils"
|
||||
upgrade: true
|
||||
- type: "python"
|
||||
user: false
|
||||
packages:
|
||||
- "scanpy"
|
||||
upgrade: true
|
||||
entrypoint: []
|
||||
cmd: null
|
||||
- type: "native"
|
||||
id: "native"
|
||||
build_info:
|
||||
config: "src/qc/calculate_qc_metrics/config.vsh.yaml"
|
||||
runner: "nextflow"
|
||||
engine: "docker|native"
|
||||
output: "target/nextflow/qc/calculate_qc_metrics"
|
||||
executable: "target/nextflow/qc/calculate_qc_metrics/main.nf"
|
||||
viash_version: "0.9.4"
|
||||
git_commit: "de02293c9e13198622b988dac952b2c8c70a1e35"
|
||||
git_remote: "https://github.com/openpipelines-bio/openpipeline"
|
||||
package_config:
|
||||
name: "openpipeline"
|
||||
version: "v4.0.0"
|
||||
summary: "Best-practice workflows for single-cell multi-omics analyses.\n"
|
||||
description: "OpenPipelines are extensible single cell analysis pipelines for reproducible\
|
||||
\ and large-scale single cell processing using [Viash](https://viash.io) and [Nextflow](https://www.nextflow.io/).\n\
|
||||
\nIn terms of workflows, the following has been made available, but keep in mind\
|
||||
\ that\nindividual tools and functionality can be executed as standalone components\
|
||||
\ as well.\n\n * Demultiplexing: conversion of raw sequencing data to FASTQ objects.\n\
|
||||
\ * Ingestion: Read mapping and generating a count matrix.\n * Single sample\
|
||||
\ processing: cell filtering and doublet detection.\n * Multisample processing:\
|
||||
\ Count transformation, normalization, QC metric calculations.\n * Integration:\
|
||||
\ Clustering, integration and batch correction using single and multimodal methods.\n\
|
||||
\ * Downstream analysis workflows\n"
|
||||
info:
|
||||
test_resources:
|
||||
- type: "s3"
|
||||
path: "s3://openpipelines-data"
|
||||
dest: "resources_test"
|
||||
nextflow_labels_ci:
|
||||
- path: "src/workflows/utils/labels_ci.config"
|
||||
description: "Adds the correct memory and CPU labels when running on the Viash\
|
||||
\ Hub CI."
|
||||
viash_version: "0.9.4"
|
||||
source: "src"
|
||||
target: "target"
|
||||
config_mods:
|
||||
- ".resources += {path: '/src/workflows/utils/labels.config', dest: 'nextflow_labels.config'}\n\
|
||||
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
|
||||
)'"
|
||||
- ".engines += { type: \"native\" }"
|
||||
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
|
||||
- ".engines[.type == 'docker'].target_tag := 'v4.0.0'"
|
||||
keywords:
|
||||
- "single-cell"
|
||||
- "multimodal"
|
||||
license: "MIT"
|
||||
organization: "vsh"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline"
|
||||
docker_registry: "ghcr.io"
|
||||
homepage: "https://openpipelines.bio"
|
||||
documentation: "https://openpipelines.bio/fundamentals"
|
||||
issue_tracker: "https://github.com/openpipelines-bio/openpipeline/issues"
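The metric names listed in the component description map onto simple row and column reductions of the count matrix. The sketch below approximates them with NumPy on a small dense matrix; it is not the component's `script.py` (whose contents are not part of this diff), and the helper `qc_metrics_sketch` is hypothetical. Assumptions: percentages are on a 0 to 100 scale as in scanpy, `obs_mean` is the per-feature mean (matching the scanpy name `mean_{expr_type}`), and "stored values" are approximated by nonzero entries, since a dense array has no explicit zeroes.

```python
import numpy as np


def qc_metrics_sketch(X, var_flags=None, top_n_vars=(2,)):
    """Hypothetical helper: per-observation (row) and per-feature (column)
    QC metrics for a dense count matrix X of shape (n_obs, n_vars)."""
    X = np.asarray(X, dtype=float)
    n_obs, n_vars = X.shape
    nonzero = X != 0  # dense stand-in for "stored values"

    obs = {
        "num_nonzero_vars": nonzero.sum(axis=1),
        "total_counts": X.sum(axis=1),
    }
    var = {
        "num_nonzero_obs": nonzero.sum(axis=0),
        "total_counts": X.sum(axis=0),
        "obs_mean": X.mean(axis=0),
        "pct_dropout": 100.0 * (1 - nonzero.sum(axis=0) / n_obs),
    }

    # pct_{flag}: share of each row's total coming from features flagged True in .var
    for name, mask in (var_flags or {}).items():
        mask = np.asarray(mask, dtype=bool)
        flagged = X[:, mask].sum(axis=1)
        obs[f"total_counts_{name}"] = flagged
        with np.errstate(divide="ignore", invalid="ignore"):
            obs[f"pct_{name}"] = 100.0 * flagged / obs["total_counts"]

    # pct_of_counts_in_top_{n}_vars: share of each row's total held by its n largest values
    sorted_desc = -np.sort(-X, axis=1)
    for n in top_n_vars:
        top = sorted_desc[:, : min(n, n_vars)].sum(axis=1)
        with np.errstate(divide="ignore", invalid="ignore"):
            obs[f"pct_of_counts_in_top_{n}_vars"] = 100.0 * top / obs["total_counts"]

    return obs, var
```

For the matrix `[[1, 0, 3], [0, 0, 2]]` with only the first feature flagged (for example as mitochondrial), the flagged share per cell is 25% and 0%, and with `top_n_vars=(1,)` the top-1 share is 75% and 100%.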
|
||||
@@ -0,0 +1,87 @@
|
||||
import shutil
|
||||
from anndata import AnnData
|
||||
from mudata import write_h5ad
|
||||
from h5py import File as H5File
|
||||
from h5py import Group, Dataset
|
||||
from pathlib import Path
|
||||
from typing import Union, Literal
|
||||
from functools import partial
|
||||
|
||||
|
||||
def compress_h5mu(
|
||||
input_path: Union[str, Path],
|
||||
output_path: Union[str, Path],
|
||||
compression: Union[Literal["gzip"], Literal["lzf"]],
|
||||
):
|
||||
input_path, output_path = str(input_path), str(output_path)
|
||||
|
||||
def copy_attributes(in_object, out_object):
|
||||
for key, value in in_object.attrs.items():
|
||||
out_object.attrs[key] = value
|
||||
|
||||
def visit_path(
|
||||
output_h5: H5File,
|
||||
compression: Union[Literal["gzip"], Literal["lzf"]],
|
||||
name: str,
|
||||
object: Union[Group, Dataset],
|
||||
):
|
||||
if isinstance(object, Group):
|
||||
new_group = output_h5.create_group(name)
|
||||
copy_attributes(object, new_group)
|
||||
elif isinstance(object, Dataset):
|
||||
# Compression only works for non-scalar Dataset objects
|
||||
# Scalar datasets have shape () and empty (null) datasets have shape None
|
||||
if not object.compression and object.shape not in [None, ()]:
|
||||
new_dataset = output_h5.create_dataset(
|
||||
name, data=object, compression=compression
|
||||
)
|
||||
copy_attributes(object, new_dataset)
|
||||
else:
|
||||
output_h5.copy(object, name)
|
||||
else:
|
||||
raise NotImplementedError(
|
||||
f"Could not copy element {name}, "
|
||||
f"type has not been implemented yet: {type(object)}"
|
||||
)
|
||||
|
||||
with (
|
||||
H5File(input_path, "r") as input_h5,
|
||||
H5File(output_path, "w", userblock_size=512) as output_h5,
|
||||
):
|
||||
copy_attributes(input_h5, output_h5)
|
||||
input_h5.visititems(partial(visit_path, output_h5, compression))
|
||||
|
||||
with open(input_path, "rb") as input_bytes:
|
||||
# Mudata puts metadata like this in the first 512 bytes:
|
||||
# MuData (format-version=0.1.0;creator=muon;creator-version=0.2.0)
|
||||
# See mudata/_core/io.py, read_h5mu() function
|
||||
starting_metadata = input_bytes.read(100)
|
||||
# The metadata is padded with extra null bytes up until 512 bytes
|
||||
# Keep everything before the first null byte (the whole chunk if there is none)
|
||||
starting_metadata = starting_metadata.split(b"\x00", 1)[0]
|
||||
with open(output_path, "br+") as f:
|
||||
nbytes = f.write(starting_metadata)
|
||||
f.write(b"\0" * (512 - nbytes))
|
||||
|
||||
|
||||
def write_h5ad_to_h5mu_with_compression(
|
||||
output_file: Union[str, Path],
|
||||
h5mu: Union[str, Path],
|
||||
modality_name: str,
|
||||
modality_data: AnnData,
|
||||
output_compression=None,
|
||||
):
|
||||
output_file = Path(output_file)
|
||||
h5mu = Path(h5mu)
|
||||
output_file_uncompressed = (
|
||||
output_file.with_name(output_file.stem + "_uncompressed.h5mu")
|
||||
if output_compression
|
||||
else output_file
|
||||
)
|
||||
shutil.copyfile(h5mu, output_file_uncompressed)
|
||||
write_h5ad(filename=output_file_uncompressed, mod=modality_name, data=modality_data)
|
||||
if output_compression:
|
||||
compress_h5mu(
|
||||
output_file_uncompressed, output_file, compression=output_compression
|
||||
)
|
||||
output_file_uncompressed.unlink()
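`compress_h5mu` rewrites every dataset into a fresh HDF5 file created with a 512-byte userblock and then copies the MuData signature back into that userblock, padding it with null bytes. The byte handling can be isolated as in the following stdlib-only sketch. The function names and the `USERBLOCK_SIZE` constant are hypothetical; the sketch also keeps the whole chunk when the first bytes contain no null byte, and refuses headers that would not fit.

```python
USERBLOCK_SIZE = 512  # same value as userblock_size=512 passed to h5py's File


def extract_mudata_header(first_bytes: bytes) -> bytes:
    """Return the signature at the start of an .h5mu userblock,
    i.e. everything before the first null padding byte."""
    # split() returns the whole input as element 0 when no null byte is present
    return first_bytes.split(b"\x00", 1)[0]


def padded_userblock(header: bytes, size: int = USERBLOCK_SIZE) -> bytes:
    """Pad the header with null bytes so it exactly fills the userblock."""
    if len(header) > size:
        raise ValueError(f"header of {len(header)} bytes does not fit in {size} bytes")
    return header + b"\x00" * (size - len(header))
```

For example, reading the first 100 bytes of a file whose userblock starts with `MuData (format-version=0.1.0;creator=muon;creator-version=0.2.0)` yields exactly that signature, and `padded_userblock` turns it back into a 512-byte block.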
|
||||
File diff suppressed because it is too large
@@ -0,0 +1,126 @@
|
||||
manifest {
|
||||
name = 'qc/calculate_qc_metrics'
|
||||
mainScript = 'main.nf'
|
||||
nextflowVersion = '!>=20.12.1-edge'
|
||||
version = 'v4.0.0'
|
||||
description = 'Add basic quality control metrics to an .h5mu file.\n\nThe metrics are comparable to what scanpy.pp.calculate_qc_metrics outputs,\nalthough they have slightly different names:\n\nVar metrics (name in this component -> name in scanpy):\n - pct_dropout -> pct_dropout_by_{expr_type}\n - num_nonzero_obs -> n_cells_by_{expr_type}\n - obs_mean -> mean_{expr_type}\n - total_counts -> total_{expr_type}\n\n Obs metrics:\n - num_nonzero_vars -> n_genes_by_{expr_type}\n - pct_{var_qc_metrics} -> pct_{expr_type}_{qc_var}\n - total_counts_{var_qc_metrics} -> total_{expr_type}_{qc_var}\n - pct_of_counts_in_top_{top_n_vars}_vars -> pct_{expr_type}_in_top_{n}_{var_type}\n - total_counts -> total_{expr_type}\n \n'
|
||||
author = 'Dries Schaumont'
|
||||
}
|
||||
|
||||
process.container = 'nextflow/bash:latest'
|
||||
|
||||
// detect tempdir
|
||||
tempDir = java.nio.file.Paths.get(
|
||||
System.getenv('NXF_TEMP') ?:
|
||||
System.getenv('VIASH_TEMP') ?:
|
||||
System.getenv('TEMPDIR') ?:
|
||||
System.getenv('TMPDIR') ?:
|
||||
'/tmp'
|
||||
).toAbsolutePath()
|
||||
|
||||
profiles {
|
||||
no_publish {
|
||||
process {
|
||||
withName: '.*' {
|
||||
publishDir = [
|
||||
enabled: false
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
mount_temp {
|
||||
docker.temp = tempDir
|
||||
podman.temp = tempDir
|
||||
charliecloud.temp = tempDir
|
||||
}
|
||||
docker {
|
||||
docker.enabled = true
|
||||
// docker.userEmulation = true
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
singularity {
|
||||
singularity.enabled = true
|
||||
singularity.autoMounts = true
|
||||
docker.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
podman {
|
||||
podman.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
shifter {
|
||||
shifter.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
charliecloud {
|
||||
charliecloud.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
}
|
||||
}
|
||||
|
||||
process{
|
||||
withLabel: mem1gb { memory = 1000000000.B }
|
||||
withLabel: mem2gb { memory = 2000000000.B }
|
||||
withLabel: mem5gb { memory = 5000000000.B }
|
||||
withLabel: mem10gb { memory = 10000000000.B }
|
||||
withLabel: mem20gb { memory = 20000000000.B }
|
||||
withLabel: mem50gb { memory = 50000000000.B }
|
||||
withLabel: mem100gb { memory = 100000000000.B }
|
||||
withLabel: mem200gb { memory = 200000000000.B }
|
||||
withLabel: mem500gb { memory = 500000000000.B }
|
||||
withLabel: mem1tb { memory = 1000000000000.B }
|
||||
withLabel: mem2tb { memory = 2000000000000.B }
|
||||
withLabel: mem5tb { memory = 5000000000000.B }
|
||||
withLabel: mem10tb { memory = 10000000000000.B }
|
||||
withLabel: mem20tb { memory = 20000000000000.B }
|
||||
withLabel: mem50tb { memory = 50000000000000.B }
|
||||
withLabel: mem100tb { memory = 100000000000000.B }
|
||||
withLabel: mem200tb { memory = 200000000000000.B }
|
||||
withLabel: mem500tb { memory = 500000000000000.B }
|
||||
withLabel: mem1gib { memory = 1073741824.B }
|
||||
withLabel: mem2gib { memory = 2147483648.B }
|
||||
withLabel: mem4gib { memory = 4294967296.B }
|
||||
withLabel: mem8gib { memory = 8589934592.B }
|
||||
withLabel: mem16gib { memory = 17179869184.B }
|
||||
withLabel: mem32gib { memory = 34359738368.B }
|
||||
withLabel: mem64gib { memory = 68719476736.B }
|
||||
withLabel: mem128gib { memory = 137438953472.B }
|
||||
withLabel: mem256gib { memory = 274877906944.B }
|
||||
withLabel: mem512gib { memory = 549755813888.B }
|
||||
withLabel: mem1tib { memory = 1099511627776.B }
|
||||
withLabel: mem2tib { memory = 2199023255552.B }
|
||||
withLabel: mem4tib { memory = 4398046511104.B }
|
||||
withLabel: mem8tib { memory = 8796093022208.B }
|
||||
withLabel: mem16tib { memory = 17592186044416.B }
|
||||
withLabel: mem32tib { memory = 35184372088832.B }
|
||||
withLabel: mem64tib { memory = 70368744177664.B }
|
||||
withLabel: mem128tib { memory = 140737488355328.B }
|
||||
withLabel: mem256tib { memory = 281474976710656.B }
|
||||
withLabel: mem512tib { memory = 562949953421312.B }
|
||||
withLabel: cpu1 { cpus = 1 }
|
||||
withLabel: cpu2 { cpus = 2 }
|
||||
withLabel: cpu5 { cpus = 5 }
|
||||
withLabel: cpu10 { cpus = 10 }
|
||||
withLabel: cpu20 { cpus = 20 }
|
||||
withLabel: cpu50 { cpus = 50 }
|
||||
withLabel: cpu100 { cpus = 100 }
|
||||
withLabel: cpu200 { cpus = 200 }
|
||||
withLabel: cpu500 { cpus = 500 }
|
||||
withLabel: cpu1000 { cpus = 1000 }
|
||||
}
|
||||
|
||||
includeConfig("nextflow_labels.config")
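The `withLabel` table above mixes decimal sizes (`mem*gb`, `mem*tb`, powers of 10) with binary sizes (`mem*gib`, `mem*tib`, powers of 2), all written out in bytes. A small generator makes that pattern explicit. This is a hypothetical sketch for checking the values, not how the config is necessarily produced.

```python
DECIMAL_SIZES = (1, 2, 5, 10, 20, 50, 100, 200, 500)
BINARY_SIZES = (1, 2, 4, 8, 16, 32, 64, 128, 256, 512)


def memory_labels() -> dict:
    """Map each memory label name to its size in bytes."""
    labels = {}
    for n in DECIMAL_SIZES:
        labels[f"mem{n}gb"] = n * 10**9   # decimal gigabytes
        labels[f"mem{n}tb"] = n * 10**12  # decimal terabytes
    for n in BINARY_SIZES:
        labels[f"mem{n}gib"] = n * 2**30  # binary gibibytes
        labels[f"mem{n}tib"] = n * 2**40  # binary tebibytes
    return labels


def to_nextflow(labels: dict) -> str:
    """Render the labels as withLabel selectors in Nextflow config syntax."""
    return "\n".join(
        f"withLabel: {name} {{ memory = {size}.B }}" for name, size in labels.items()
    )
```

The practical difference grows with size: `mem1tb` is 10^12 bytes while `mem1tib` is 2^40 bytes, roughly 10% more.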
|
||||
@@ -0,0 +1,48 @@
|
||||
process {
|
||||
// Default resources for components that hardly do any processing
|
||||
memory = { 2.GB * task.attempt }
|
||||
cpus = 1
|
||||
|
||||
// Retry for exit codes that have something to do with memory issues
|
||||
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
|
||||
maxRetries = 3
|
||||
|
||||
// The memory a task is assigned increases with each attempt
|
||||
// uncomment the line below and adjust the value to set a global upper limit on the memory.
|
||||
// resourceLimits = [ memory: 240.Gb ]
|
||||
|
||||
// CPU resources
|
||||
withLabel: singlecpu { cpus = 1 }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midcpu { cpus = 10 }
|
||||
withLabel: highcpu { cpus = 20 }
|
||||
|
||||
// Memory resources
|
||||
withLabel: lowmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 4.GB * task.attempt } }
|
||||
withLabel: midmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 25.GB * task.attempt } }
|
||||
withLabel: highmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 50.GB * task.attempt } }
|
||||
withLabel: veryhighmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 75.GB * task.attempt } }
|
||||
|
||||
// Disk space
|
||||
withLabel: lowdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: middisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: highdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: veryhighdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
|
||||
// NOTE: The above labels intentionally do not have an effect by default.
|
||||
// The user should set the disk space requirements by adding the following
|
||||
// to the compute environment:
|
||||
//
|
||||
// withLabel: lowdisk { disk = { 20.GB * task.attempt } }
|
||||
// withLabel: middisk { disk = { 100.GB * task.attempt } }
|
||||
// withLabel: highdisk { disk = { 200.GB * task.attempt } }
|
||||
// withLabel: veryhighdisk { disk = { 500.GB * task.attempt } }
|
||||
}
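The retry and memory rules in this config combine as follows: exit codes 137 to 140 (137 usually means the process was killed, often for running out of memory) trigger a retry, and each memory label requests its base amount multiplied by `task.attempt`, jumping to the memory `resourceLimits` value once the attempt reaches `maxRetries`. The Python sketch below mirrors those closures under stated assumptions: attempts are 1-based, values are in GB units rather than bytes, and the function names are hypothetical. Nextflow versions that support `resourceLimits` also clamp requests to it on their own; that is not modelled here.

```python
# Groovy's 137..140 is an inclusive range; Python's range() excludes the stop value.
RETRY_EXIT_CODES = range(137, 141)

# Base memory per attempt for each label, in GB (values copied from the config above).
LABEL_BASE_GB = {"lowmem": 4, "midmem": 25, "highmem": 50, "veryhighmem": 75}


def error_strategy(exit_status: int) -> str:
    """Mirror of: task.exitStatus in 137..140 ? 'retry' : 'terminate'."""
    return "retry" if exit_status in RETRY_EXIT_CODES else "terminate"


def memory_gb(label: str, attempt: int, max_retries: int = 3, limit_gb=None) -> int:
    """Memory requested (in GB) for a labelled task on a given 1-based attempt.

    Once attempt >= max_retries and a memory limit is configured, the limit is
    requested directly; otherwise the request scales linearly with the attempt.
    """
    if limit_gb and max_retries and attempt >= max_retries:
        return limit_gb
    return LABEL_BASE_GB[label] * attempt
```

With `maxRetries = 3` a task can run up to four times, so without a limit a `midmem` task requests 25, 50, 75 and finally 100 GB; with the commented-out 240 GB limit enabled, the third and fourth attempts would request 240 GB.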
|
||||
@@ -0,0 +1,156 @@
|
||||
{
|
||||
"$schema": "https://json-schema.org/draft/2020-12/schema",
|
||||
"title": "calculate_qc_metrics",
|
||||
"description": "Add basic quality control metrics to an .h5mu file.\n\nThe metrics are comparable to what scanpy.pp.calculate_qc_metrics outputs,\nalthough they have slightly different names:\n\nVar metrics (name in this component -> name in scanpy):\n - pct_dropout -> pct_dropout_by_{expr_type}\n - num_nonzero_obs -> n_cells_by_{expr_type}\n - obs_mean -> mean_{expr_type}\n - total_counts -> total_{expr_type}\n\n Obs metrics:\n - num_nonzero_vars -> n_genes_by_{expr_type}\n - pct_{var_qc_metrics} -> pct_{expr_type}_{qc_var}\n - total_counts_{var_qc_metrics} -> total_{expr_type}_{qc_var}\n - pct_of_counts_in_top_{top_n_vars}_vars -> pct_{expr_type}_in_top_{n}_{var_type}\n - total_counts -> total_{expr_type}\n \n",
|
||||
"type": "object",
|
||||
"$defs": {
|
||||
"inputs": {
|
||||
"title": "Inputs",
|
||||
"type": "object",
|
||||
"description": "No description",
|
||||
"properties": {
|
||||
"input": {
|
||||
"type": "string",
|
||||
"format": "path",
|
||||
"exists": true,
|
||||
"description": "Input h5mu file",
|
||||
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`, example: `\"input.h5mu\"`. "
|
||||
},
|
||||
"modality": {
|
||||
"type": "string",
|
||||
"description": "Which modality from the input MuData file to process",
|
||||
"help_text": "Type: `string`, multiple: `False`, default: `\"rna\"`. ",
|
||||
"default": "rna"
|
||||
},
|
||||
"layer": {
|
||||
"type": "string",
|
||||
"description": "Layer from modality to use as input data",
|
||||
"help_text": "Type: `string`, multiple: `False`, example: `\"raw_counts\"`. "
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": {
|
||||
"title": "Outputs",
|
||||
"type": "object",
|
||||
"description": "No description",
|
||||
"properties": {
|
||||
"output": {
|
||||
"type": "string",
|
||||
"format": "path",
|
||||
"description": "Output h5mu file.",
|
||||
"help_text": "Type: `file`, multiple: `False`, default: `\"$id.$key.output.h5mu\"`, direction: `output`, example: `\"output.h5mu\"`. ",
|
||||
"default": "$id.$key.output.h5mu"
|
||||
},
|
||||
"output_compression": {
|
||||
"type": "string",
|
||||
"description": "Compression format to use for the output AnnData and/or Mudata objects.\nBy default no compression is applied.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`, example: `\"gzip\"`, choices: `gzip`, `lzf`. ",
|
||||
"enum": [
|
||||
"gzip",
|
||||
"lzf"
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
"metrics added to .obs": {
|
||||
"title": "Metrics added to .obs",
|
||||
"type": "object",
|
||||
"description": "No description",
|
||||
"properties": {
|
||||
"var_qc_metrics": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "string"
|
||||
},
|
||||
"description": "Keys to select a boolean (containing only True or False) column from .var.\nFor each cell, calculate the proportion of total values for genes which are labeled 'True', \ncompared to the total sum of the values for all genes.\n",
|
||||
"help_text": "Type: `string`, multiple: `True`, example: `[\"ercc,highly_variable,mitochondrial\"]`. "
|
||||
},
|
||||
"var_qc_metrics_fill_na_value": {
|
||||
"type": "boolean",
|
||||
"description": "Fill any 'NA' values found in the columns specified with --var_qc_metrics with this boolean value ('True' or 'False').\n",
|
||||
"help_text": "Type: `boolean`, multiple: `False`. "
|
||||
},
|
||||
"top_n_vars": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "integer"
|
||||
},
|
||||
"description": "Number of top vars to be used to calculate cumulative proportions.\nIf not specified, proportions are not calculated",
|
||||
"help_text": "Type: `integer`, multiple: `True`. "
|
||||
},
|
||||
"output_obs_num_nonzero_vars": {
|
||||
"type": "string",
|
||||
"description": "Name of column in .obs describing, for each observation, the number of stored values\n(including explicit zeroes)",
|
||||
"help_text": "Type: `string`, multiple: `False`, default: `\"num_nonzero_vars\"`. ",
|
||||
"default": "num_nonzero_vars"
|
||||
},
|
||||
"output_obs_total_counts_vars": {
|
||||
"type": "string",
|
||||
"description": "Name of the column for .obs describing, for each observation (row),\nthe sum of the stored values in the columns.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`, default: `\"total_counts\"`. ",
|
||||
"default": "total_counts"
|
||||
}
|
||||
}
|
||||
},
|
||||
"metrics added to .var": {
|
||||
"title": "Metrics added to .var",
|
||||
"type": "object",
|
||||
"description": "No description",
|
||||
"properties": {
|
||||
"output_var_num_nonzero_obs": {
|
||||
"type": "string",
|
||||
"description": "Name of column describing, for each feature, the number of stored values\n(including explicit zeroes)",
|
||||
"help_text": "Type: `string`, multiple: `False`, default: `\"num_nonzero_obs\"`. ",
|
||||
"default": "num_nonzero_obs"
|
||||
},
|
||||
"output_var_total_counts_obs": {
|
||||
"type": "string",
|
||||
"description": "Name of the column in .var describing, for each feature (column),\nthe sum of the stored values in the rows.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`, default: `\"total_counts\"`. ",
|
||||
"default": "total_counts"
|
||||
},
|
||||
"output_var_obs_mean": {
|
||||
"type": "string",
|
||||
"description": "Name of the column in .var providing, for each feature (column), the mean of the values in the rows.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`, default: `\"obs_mean\"`. ",
|
||||
"default": "obs_mean"
|
||||
},
|
||||
"output_var_pct_dropout": {
|
||||
"type": "string",
|
||||
"description": "Name of the column in .var providing, for each feature, the percentage of\nobservations in which the feature does not appear (i.e. is missing).",
|
||||
"help_text": "Type: `string`, multiple: `False`, default: `\"pct_dropout\"`. ",
|
||||
"default": "pct_dropout"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nextflow input-output arguments": {
|
||||
"title": "Nextflow input-output arguments",
|
||||
"type": "object",
|
||||
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
|
||||
"properties": {
|
||||
"publish_dir": {
|
||||
"type": "string",
|
||||
"description": "Path to an output directory.",
|
||||
"help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"allOf": [
|
||||
{
|
||||
"$ref": "#/$defs/inputs"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/outputs"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/metrics added to .obs"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/metrics added to .var"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/nextflow input-output arguments"
|
||||
}
|
||||
]
|
||||
}
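The schema splits the parameters into argument groups under `$defs` and stitches them together with a top-level `allOf` of local `$ref`s. A minimal sketch of walking that structure, for example to list the defaults or to check `enum` choices such as `output_compression`, is shown below. The function names are hypothetical, and the lookup is deliberately naive: group names like `metrics added to .obs` are used verbatim, without JSON Pointer unescaping or percent-decoding; a full validator library would handle those cases.

```python
LOCAL_DEFS_PREFIX = "#/$defs/"


def collect_defaults(schema: dict) -> dict:
    """Gather {parameter: default} from every argument group referenced in allOf."""
    defaults = {}
    for entry in schema.get("allOf", []):
        ref = entry["$ref"]
        if not ref.startswith(LOCAL_DEFS_PREFIX):
            raise ValueError(f"only local $defs references are handled: {ref}")
        # Naive lookup: the group name after the prefix is used as a plain dict key.
        group = schema["$defs"][ref[len(LOCAL_DEFS_PREFIX):]]
        for name, prop in group.get("properties", {}).items():
            if "default" in prop:
                defaults[name] = prop["default"]
    return defaults


def check_choices(schema: dict, params: dict) -> list:
    """Return an error message for each param whose value is not in its enum."""
    errors = []
    for group in schema.get("$defs", {}).values():
        for name, prop in group.get("properties", {}).items():
            if name in params and "enum" in prop and params[name] not in prop["enum"]:
                errors.append(f"{name}: {params[name]!r} not in {prop['enum']}")
    return errors
```

Applied to this schema (loaded with `json.load`), `collect_defaults` would report, among others, `modality: "rna"` and `output: "$id.$key.output.h5mu"`, while `check_choices` would flag an `output_compression` value other than `gzip` or `lzf`.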
|
||||
@@ -0,0 +1,12 @@
|
||||
def setup_logger():
|
||||
import logging
|
||||
from sys import stdout
|
||||
|
||||
logger = logging.getLogger()
|
||||
logger.setLevel(logging.INFO)
|
||||
console_handler = logging.StreamHandler(stdout)
|
||||
logFormatter = logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
|
||||
console_handler.setFormatter(logFormatter)
|
||||
logger.addHandler(console_handler)
|
||||
|
||||
return logger
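`setup_logger()` attaches a new stdout handler to the root logger on every call, so calling it twice in one process prints each message twice. A variant that tags its handler and only adds it once is sketched below; `setup_logger_once` and the `HANDLER_NAME` marker are hypothetical names, and the format string is copied from the original.

```python
import logging
from sys import stdout

HANDLER_NAME = "setup_logger_stdout"  # hypothetical marker used to recognise our handler


def setup_logger_once(level: int = logging.INFO) -> logging.Logger:
    """Like setup_logger(), but safe to call more than once: the stdout handler
    is attached to the root logger only if it is not already there."""
    logger = logging.getLogger()
    logger.setLevel(level)
    if not any(h.get_name() == HANDLER_NAME for h in logger.handlers):
        handler = logging.StreamHandler(stdout)
        handler.set_name(HANDLER_NAME)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Naming the handler, rather than checking for any `StreamHandler`, leaves handlers installed by other code (for example a test runner) untouched.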
|
||||
@@ -0,0 +1,415 @@
|
||||
name: "qc"
|
||||
namespace: "workflows/qc"
|
||||
version: "v4.0.0"
|
||||
authors:
|
||||
- name: "Dries Schaumont"
|
||||
roles:
|
||||
- "author"
|
||||
- "maintainer"
|
||||
info:
|
||||
role: "Core Team Member"
|
||||
links:
|
||||
email: "dries@data-intuitive.com"
|
||||
github: "DriesSchaumont"
|
||||
orcid: "0000-0002-4389-0440"
|
||||
linkedin: "dries-schaumont"
|
||||
organizations:
|
||||
- name: "Data Intuitive"
|
||||
href: "https://www.data-intuitive.com"
|
||||
role: "Data Scientist"
|
||||
argument_groups:
|
||||
- name: "Inputs"
|
||||
arguments:
|
||||
- type: "string"
|
||||
name: "--id"
|
||||
description: "ID of the sample."
|
||||
info: null
|
||||
example:
|
||||
- "foo"
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "file"
|
||||
name: "--input"
|
||||
alternatives:
|
||||
- "-i"
|
||||
description: "Path to the sample."
|
||||
info: null
|
||||
example:
|
||||
- "input.h5mu"
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--modality"
|
||||
description: "Which modality to process."
|
||||
info: null
|
||||
default:
|
||||
- "rna"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--layer"
|
||||
description: "Layer to calculate qc metrics for."
|
||||
info: null
|
||||
example:
|
||||
- "raw_counts"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- name: "Mitochondrial & Ribosomal Gene Detection"
|
||||
arguments:
|
||||
- type: "string"
|
||||
name: "--var_gene_names"
|
||||
description: ".var column name to be used to detect mitochondrial/ribosomal genes\
|
||||
\ instead of .var_names (default if not set).\nGene names matching with the\
|
||||
\ regex value from --mitochondrial_gene_regex or --ribosomal_gene_regex will\
|
||||
\ be \nidentified as mitochondrial or ribosomal genes, respectively.\n"
|
||||
info: null
|
||||
example:
|
||||
- "gene_symbol"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--var_name_mitochondrial_genes"
|
||||
description: "In which .var slot to store a boolean array corresponding to the mitochondrial\
|
||||
\ genes.\n"
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--obs_name_mitochondrial_fraction"
|
||||
description: ".obs slot to store the fraction of reads found to be mitochondrial.\
|
||||
\ Defaults to 'fraction_' suffixed by the value of --var_name_mitochondrial_genes\n"
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--mitochondrial_gene_regex"
|
||||
description: "Regex string that identifies mitochondrial genes from --var_gene_names.\n\
|
||||
By default will detect human and mouse mitochondrial genes from a gene symbol.\n"
|
||||
info: null
|
||||
default:
|
||||
- "^[mM][tT]-"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--var_name_ribosomal_genes"
|
||||
description: "In which .var slot to store a boolean array corresponding to the ribosomal\
|
||||
\ genes.\n"
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--obs_name_ribosomal_fraction"
|
||||
description: "When specified, write the fraction of counts originating from ribosomal\
|
||||
\ genes \n(based on --ribosomal_gene_regex) to an .obs column with the specified\
|
||||
\ name.\nRequires --var_name_ribosomal_genes.\n"
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--ribosomal_gene_regex"
|
||||
description: "Regex string that identifies ribosomal genes from --var_gene_names.\n\
|
||||
By default will detect human and mouse ribosomal genes from a gene symbol.\n"
|
||||
info: null
|
||||
default:
|
||||
- "^[Mm]?[Rr][Pp][LlSs]"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- name: "QC metrics calculation options"
|
||||
arguments:
|
||||
- type: "string"
|
||||
name: "--var_qc_metrics"
|
||||
description: "Keys to select a boolean (containing only True or False) column\
|
||||
\ from .var.\nFor each cell, calculate the proportion of total values for genes\
|
||||
\ which are labeled 'True', \ncompared to the total sum of the values for all\
|
||||
\ genes. Defaults to the value from\n--var_name_mitochondrial_genes.\n"
|
||||
info: null
|
||||
example:
|
||||
- "ercc,highly_variable"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: true
|
||||
multiple_sep: ","
|
||||
- type: "integer"
|
||||
name: "--top_n_vars"
|
||||
description: "Number of top vars to be used to calculate cumulative proportions.\n\
|
||||
If not specified, proportions are not calculated. `--top_n_vars 20,50` finds\n\
|
||||
the cumulative proportions up to the 20th and 50th most expressed vars.\n"
|
||||
info: null
|
||||
default:
|
||||
- 50
|
||||
- 100
|
||||
- 200
|
||||
- 500
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: true
|
||||
multiple_sep: ","
|
||||
- type: "string"
|
||||
name: "--output_obs_num_nonzero_vars"
|
||||
description: "Name of column in .obs describing, for each observation, the number\
|
||||
\ of stored values\n(including explicit zeroes). In other words, the name of\
|
||||
\ the column that counts\nfor each row the number of columns that contain data.\n"
|
||||
info: null
|
||||
default:
|
||||
- "num_nonzero_vars"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--output_obs_total_counts_vars"
|
||||
description: "Name of the column in .obs describing, for each observation (row),\n\
|
||||
the sum of the stored values in the columns.\n"
|
||||
info: null
|
||||
default:
|
||||
- "total_counts"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--output_var_num_nonzero_obs"
|
||||
description: "Name of the column in .var describing, for each feature, the number of stored\
|
||||
\ values\n(including explicit zeroes). In other words, the name of the column\
|
||||
\ that counts\nfor each column the number of rows that contain data.\n"
|
||||
info: null
|
||||
default:
|
||||
- "num_nonzero_obs"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--output_var_total_counts_obs"
|
||||
description: "Name of the column in .var describing, for each feature (column),\n\
|
||||
the sum of the stored values in the rows.\n"
|
||||
info: null
|
||||
default:
|
||||
- "total_counts"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--output_var_obs_mean"
|
||||
description: "Name of the column in .var providing, for each feature (column), the mean of the values across all observations (rows).\n"
|
||||
info: null
|
||||
default:
|
||||
- "obs_mean"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--output_var_pct_dropout"
|
||||
description: "Name of the column in .var providing, for each feature, the percentage of\nobservations in which the feature does not appear (i.e. is missing). Percentage-based\ncounterpart of `--output_var_num_nonzero_obs`.\n"
|
||||
info: null
|
||||
default:
|
||||
- "pct_dropout"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- name: "Outputs"
|
||||
arguments:
|
||||
- type: "file"
|
||||
name: "--output"
|
||||
description: "Destination path to the output."
|
||||
info: null
|
||||
example:
|
||||
- "output.h5mu"
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: true
|
||||
direction: "output"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
resources:
|
||||
- type: "nextflow_script"
|
||||
path: "main.nf"
|
||||
is_executable: true
|
||||
entrypoint: "run_wf"
|
||||
- type: "file"
|
||||
path: "utils"
|
||||
- type: "file"
|
||||
path: "nextflow_labels.config"
|
||||
dest: "nextflow_labels.config"
|
||||
description: "A pipeline to add basic QC statistics to a MuData object."
|
||||
test_resources:
|
||||
- type: "nextflow_script"
|
||||
path: "test.nf"
|
||||
is_executable: true
|
||||
entrypoint: "test_wf"
|
||||
- type: "file"
|
||||
path: "concat_test_data"
|
||||
- type: "file"
|
||||
path: "pbmc_1k_protein_v3"
|
||||
info:
|
||||
test_dependencies:
|
||||
- name: "qc_test"
|
||||
namespace: "test_workflows/qc"
|
||||
status: "enabled"
|
||||
scope:
|
||||
image: "public"
|
||||
target: "public"
|
||||
dependencies:
|
||||
- name: "metadata/grep_annotation_column"
|
||||
repository:
|
||||
type: "local"
|
||||
- name: "qc/calculate_qc_metrics"
|
||||
repository:
|
||||
type: "local"
|
||||
license: "MIT"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline"
|
||||
docker_registry: "ghcr.io"
|
||||
runners:
|
||||
- type: "nextflow"
|
||||
id: "nextflow"
|
||||
directives:
|
||||
tag: "$id"
|
||||
auto:
|
||||
simplifyInput: true
|
||||
simplifyOutput: false
|
||||
transcript: false
|
||||
publish: false
|
||||
config:
|
||||
labels:
|
||||
mem1gb: "memory = 1000000000.B"
|
||||
mem2gb: "memory = 2000000000.B"
|
||||
mem5gb: "memory = 5000000000.B"
|
||||
mem10gb: "memory = 10000000000.B"
|
||||
mem20gb: "memory = 20000000000.B"
|
||||
mem50gb: "memory = 50000000000.B"
|
||||
mem100gb: "memory = 100000000000.B"
|
||||
mem200gb: "memory = 200000000000.B"
|
||||
mem500gb: "memory = 500000000000.B"
|
||||
mem1tb: "memory = 1000000000000.B"
|
||||
mem2tb: "memory = 2000000000000.B"
|
||||
mem5tb: "memory = 5000000000000.B"
|
||||
mem10tb: "memory = 10000000000000.B"
|
||||
mem20tb: "memory = 20000000000000.B"
|
||||
mem50tb: "memory = 50000000000000.B"
|
||||
mem100tb: "memory = 100000000000000.B"
|
||||
mem200tb: "memory = 200000000000000.B"
|
||||
mem500tb: "memory = 500000000000000.B"
|
||||
mem1gib: "memory = 1073741824.B"
|
||||
mem2gib: "memory = 2147483648.B"
|
||||
mem4gib: "memory = 4294967296.B"
|
||||
mem8gib: "memory = 8589934592.B"
|
||||
mem16gib: "memory = 17179869184.B"
|
||||
mem32gib: "memory = 34359738368.B"
|
||||
mem64gib: "memory = 68719476736.B"
|
||||
mem128gib: "memory = 137438953472.B"
|
||||
mem256gib: "memory = 274877906944.B"
|
||||
mem512gib: "memory = 549755813888.B"
|
||||
mem1tib: "memory = 1099511627776.B"
|
||||
mem2tib: "memory = 2199023255552.B"
|
||||
mem4tib: "memory = 4398046511104.B"
|
||||
mem8tib: "memory = 8796093022208.B"
|
||||
mem16tib: "memory = 17592186044416.B"
|
||||
mem32tib: "memory = 35184372088832.B"
|
||||
mem64tib: "memory = 70368744177664.B"
|
||||
mem128tib: "memory = 140737488355328.B"
|
||||
mem256tib: "memory = 281474976710656.B"
|
||||
mem512tib: "memory = 562949953421312.B"
|
||||
cpu1: "cpus = 1"
|
||||
cpu2: "cpus = 2"
|
||||
cpu5: "cpus = 5"
|
||||
cpu10: "cpus = 10"
|
||||
cpu20: "cpus = 20"
|
||||
cpu50: "cpus = 50"
|
||||
cpu100: "cpus = 100"
|
||||
cpu200: "cpus = 200"
|
||||
cpu500: "cpus = 500"
|
||||
cpu1000: "cpus = 1000"
|
||||
script:
|
||||
- "includeConfig(\"nextflow_labels.config\")"
|
||||
debug: false
|
||||
container: "docker"
|
||||
engines:
|
||||
- type: "native"
|
||||
id: "native"
|
||||
build_info:
|
||||
config: "src/workflows/qc/qc/config.vsh.yaml"
|
||||
runner: "nextflow"
|
||||
engine: "native"
|
||||
output: "target/nextflow/workflows/qc/qc"
|
||||
executable: "target/nextflow/workflows/qc/qc/main.nf"
|
||||
viash_version: "0.9.4"
|
||||
git_commit: "de02293c9e13198622b988dac952b2c8c70a1e35"
|
||||
git_remote: "https://github.com/openpipelines-bio/openpipeline"
|
||||
dependencies:
|
||||
- "target/nextflow/metadata/grep_annotation_column"
|
||||
- "target/nextflow/qc/calculate_qc_metrics"
|
||||
package_config:
|
||||
name: "openpipeline"
|
||||
version: "v4.0.0"
|
||||
summary: "Best-practice workflows for single-cell multi-omics analyses.\n"
|
||||
description: "OpenPipelines are extensible single cell analysis pipelines for reproducible\
|
||||
\ and large-scale single cell processing using [Viash](https://viash.io) and [Nextflow](https://www.nextflow.io/).\n\
|
||||
\nIn terms of workflows, the following have been made available, but keep in mind\
|
||||
\ that\nindividual tools and functionality can be executed as standalone components\
|
||||
\ as well.\n\n * Demultiplexing: conversion of raw sequencing data to FASTQ objects.\n\
|
||||
\ * Ingestion: Read mapping and generating a count matrix.\n * Single sample\
|
||||
\ processing: cell filtering and doublet detection.\n * Multisample processing:\
|
||||
\ Count transformation, normalization, QC metric calculations.\n * Integration:\
|
||||
\ Clustering, integration and batch correction using single and multimodal methods.\n\
|
||||
\ * Downstream analysis workflows\n"
|
||||
info:
|
||||
test_resources:
|
||||
- type: "s3"
|
||||
path: "s3://openpipelines-data"
|
||||
dest: "resources_test"
|
||||
nextflow_labels_ci:
|
||||
- path: "src/workflows/utils/labels_ci.config"
|
||||
description: "Adds the correct memory and CPU labels when running on the Viash\
|
||||
\ Hub CI."
|
||||
viash_version: "0.9.4"
|
||||
source: "src"
|
||||
target: "target"
|
||||
config_mods:
|
||||
- ".resources += {path: '/src/workflows/utils/labels.config', dest: 'nextflow_labels.config'}\n\
|
||||
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
|
||||
)'"
|
||||
- ".engines += { type: \"native\" }"
|
||||
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
|
||||
- ".engines[.type == 'docker'].target_tag := 'v4.0.0'"
|
||||
keywords:
|
||||
- "single-cell"
|
||||
- "multimodal"
|
||||
license: "MIT"
|
||||
organization: "vsh"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline"
|
||||
docker_registry: "ghcr.io"
|
||||
homepage: "https://openpipelines.bio"
|
||||
documentation: "https://openpipelines.bio/fundamentals"
|
||||
issue_tracker: "https://github.com/openpipelines-bio/openpipeline/issues"
|
||||
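The mitochondrial and ribosomal detection arguments in the config above amount to two steps: match gene names against a regex, then sum the flagged counts per cell. The sketch below shows both steps with plain Python lists instead of MuData. The helper names `flag_genes` and `fraction_per_cell` are hypothetical. The `metadata/grep_annotation_column` dependency probably does the real matching, and its details may differ, for example in which regex match function it uses.

```python
import re

# Default patterns, copied from --mitochondrial_gene_regex and
# --ribosomal_gene_regex in the config above.
MITO_REGEX = r"^[mM][tT]-"
RIBO_REGEX = r"^[Mm]?[Rr][Pp][LlSs]"


def flag_genes(gene_names: list[str], pattern: str) -> list[bool]:
    """One boolean per gene: True when the gene name matches `pattern`.

    This is the kind of array that --var_name_mitochondrial_genes or
    --var_name_ribosomal_genes would store in .var.
    """
    regex = re.compile(pattern)
    return [regex.search(name) is not None for name in gene_names]


def fraction_per_cell(counts: list[list[float]], flags: list[bool]) -> list[float]:
    """Per cell (row): counts of flagged genes divided by the row total."""
    fractions = []
    for row in counts:
        total = sum(row)
        flagged = sum(value for value, keep in zip(row, flags) if keep)
        # Cells without counts get 0.0 instead of a ZeroDivisionError (an assumption).
        fractions.append(flagged / total if total else 0.0)
    return fractions


if __name__ == "__main__":
    genes = ["MT-CO1", "mt-Nd1", "RPL13", "Rps6", "MRPL12", "ACTB"]
    counts = [[10, 5, 20, 5, 0, 60], [0, 0, 50, 0, 0, 50]]
    print(fraction_per_cell(counts, flag_genes(genes, MITO_REGEX)))
    print(fraction_per_cell(counts, flag_genes(genes, RIBO_REGEX)))
```

The default ribosomal pattern starts with an optional `[Mm]?`. It therefore also flags mitochondrial ribosomal protein genes such as `MRPL12`.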
File diff suppressed because it is too large
@@ -0,0 +1,126 @@
|
||||
manifest {
|
||||
name = 'workflows/qc/qc'
|
||||
mainScript = 'main.nf'
|
||||
nextflowVersion = '!>=20.12.1-edge'
|
||||
version = 'v4.0.0'
|
||||
description = 'A pipeline to add basic QC statistics to a MuData object.'
|
||||
author = 'Dries Schaumont'
|
||||
}
|
||||
|
||||
process.container = 'nextflow/bash:latest'
|
||||
|
||||
// detect tempdir
|
||||
tempDir = java.nio.file.Paths.get(
|
||||
System.getenv('NXF_TEMP') ?:
|
||||
System.getenv('VIASH_TEMP') ?:
|
||||
System.getenv('TEMPDIR') ?:
|
||||
System.getenv('TMPDIR') ?:
|
||||
'/tmp'
|
||||
).toAbsolutePath()
|
||||
|
||||
profiles {
|
||||
no_publish {
|
||||
process {
|
||||
withName: '.*' {
|
||||
publishDir = [
|
||||
enabled: false
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
mount_temp {
|
||||
docker.temp = tempDir
|
||||
podman.temp = tempDir
|
||||
charliecloud.temp = tempDir
|
||||
}
|
||||
docker {
|
||||
docker.enabled = true
|
||||
// docker.userEmulation = true
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
singularity {
|
||||
singularity.enabled = true
|
||||
singularity.autoMounts = true
|
||||
docker.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
podman {
|
||||
podman.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
shifter {
|
||||
shifter.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
charliecloud {
|
||||
charliecloud.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
}
|
||||
}
|
||||
|
||||
process{
|
||||
withLabel: mem1gb { memory = 1000000000.B }
|
||||
withLabel: mem2gb { memory = 2000000000.B }
|
||||
withLabel: mem5gb { memory = 5000000000.B }
|
||||
withLabel: mem10gb { memory = 10000000000.B }
|
||||
withLabel: mem20gb { memory = 20000000000.B }
|
||||
withLabel: mem50gb { memory = 50000000000.B }
|
||||
withLabel: mem100gb { memory = 100000000000.B }
|
||||
withLabel: mem200gb { memory = 200000000000.B }
|
||||
withLabel: mem500gb { memory = 500000000000.B }
|
||||
withLabel: mem1tb { memory = 1000000000000.B }
|
||||
withLabel: mem2tb { memory = 2000000000000.B }
|
||||
withLabel: mem5tb { memory = 5000000000000.B }
|
||||
withLabel: mem10tb { memory = 10000000000000.B }
|
||||
withLabel: mem20tb { memory = 20000000000000.B }
|
||||
withLabel: mem50tb { memory = 50000000000000.B }
|
||||
withLabel: mem100tb { memory = 100000000000000.B }
|
||||
withLabel: mem200tb { memory = 200000000000000.B }
|
||||
withLabel: mem500tb { memory = 500000000000000.B }
|
||||
withLabel: mem1gib { memory = 1073741824.B }
|
||||
withLabel: mem2gib { memory = 2147483648.B }
|
||||
withLabel: mem4gib { memory = 4294967296.B }
|
||||
withLabel: mem8gib { memory = 8589934592.B }
|
||||
withLabel: mem16gib { memory = 17179869184.B }
|
||||
withLabel: mem32gib { memory = 34359738368.B }
|
||||
withLabel: mem64gib { memory = 68719476736.B }
|
||||
withLabel: mem128gib { memory = 137438953472.B }
|
||||
withLabel: mem256gib { memory = 274877906944.B }
|
||||
withLabel: mem512gib { memory = 549755813888.B }
|
||||
withLabel: mem1tib { memory = 1099511627776.B }
|
||||
withLabel: mem2tib { memory = 2199023255552.B }
|
||||
withLabel: mem4tib { memory = 4398046511104.B }
|
||||
withLabel: mem8tib { memory = 8796093022208.B }
|
||||
withLabel: mem16tib { memory = 17592186044416.B }
|
||||
withLabel: mem32tib { memory = 35184372088832.B }
|
||||
withLabel: mem64tib { memory = 70368744177664.B }
|
||||
withLabel: mem128tib { memory = 140737488355328.B }
|
||||
withLabel: mem256tib { memory = 281474976710656.B }
|
||||
withLabel: mem512tib { memory = 562949953421312.B }
|
||||
withLabel: cpu1 { cpus = 1 }
|
||||
withLabel: cpu2 { cpus = 2 }
|
||||
withLabel: cpu5 { cpus = 5 }
|
||||
withLabel: cpu10 { cpus = 10 }
|
||||
withLabel: cpu20 { cpus = 20 }
|
||||
withLabel: cpu50 { cpus = 50 }
|
||||
withLabel: cpu100 { cpus = 100 }
|
||||
withLabel: cpu200 { cpus = 200 }
|
||||
withLabel: cpu500 { cpus = 500 }
|
||||
withLabel: cpu1000 { cpus = 1000 }
|
||||
}
|
||||
|
||||
includeConfig("nextflow_labels.config")
|
||||
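The `mem*` labels above hard-code byte counts. The `gb`/`tb` labels use decimal units (10^9 and 10^12 bytes), while the `gib`/`tib` labels use binary units (2^30 and 2^40 bytes). The hypothetical Python helper below is not part of Viash or Nextflow. It recomputes the byte value for a label name such as `mem16gib` or `mem5tb` and renders the matching `withLabel` line, which allows a quick cross-check.

```python
import re

# Decimal units for gb/tb labels, binary units for gib/tib labels.
UNIT_BYTES = {"gb": 10**9, "tb": 10**12, "gib": 2**30, "tib": 2**40}


def label_bytes(label: str) -> int:
    """Turn a memory label name such as 'mem16gib' into a byte count."""
    match = re.fullmatch(r"mem(\d+)(gb|tb|gib|tib)", label)
    if match is None:
        raise ValueError(f"not a memory label: {label!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * UNIT_BYTES[unit]


def label_directive(label: str) -> str:
    """Render the process-scope line used in the config above."""
    return f"withLabel: {label} {{ memory = {label_bytes(label)}.B }}"


if __name__ == "__main__":
    for name in ("mem2gb", "mem16gib", "mem512tib"):
        print(label_directive(name))
```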
@@ -0,0 +1,48 @@
|
||||
process {
|
||||
// Default resources for components that hardly do any processing
|
||||
memory = { 2.GB * task.attempt }
|
||||
cpus = 1
|
||||
|
||||
// Retry for exit codes that have something to do with memory issues
|
||||
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
|
||||
maxRetries = 3
|
||||
|
||||
// The memory a task is assigned increases with each attempt
|
||||
// uncomment the line below and adjust the value to set a global upper limit on the memory.
|
||||
// resourceLimits = [ memory: 240.Gb ]
|
||||
|
||||
// CPU resources
|
||||
withLabel: singlecpu { cpus = 1 }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midcpu { cpus = 10 }
|
||||
withLabel: highcpu { cpus = 20 }
|
||||
|
||||
// Memory resources
|
||||
withLabel: lowmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 4.GB * task.attempt } }
|
||||
withLabel: midmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 25.GB * task.attempt } }
|
||||
withLabel: highmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 50.GB * task.attempt } }
|
||||
withLabel: veryhighmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 75.GB * task.attempt } }
|
||||
|
||||
// Disk space
|
||||
withLabel: lowdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: middisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: highdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: veryhighdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
|
||||
// NOTE: The above labels intentionally do not have an effect by default.
|
||||
// The user should set the disk space requirements by adding the following
|
||||
// to the compute environment:
|
||||
//
|
||||
// withLabel: lowdisk { disk = { 20.GB * task.attempt } }
|
||||
// withLabel: middisk { disk = { 100.GB * task.attempt } }
|
||||
// withLabel: highdisk { disk = { 200.GB * task.attempt } }
|
||||
// withLabel: veryhighdisk { disk = { 500.GB * task.attempt } }
|
||||
}
|
||||
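The `*mem` labels above scale memory linearly with `task.attempt`. When `resourceLimits.memory` is set, they switch to that limit once `task.attempt` reaches `maxRetries`. Failed tasks are only retried for exit codes 137 to 140. The Python functions below mirror those closures so the per-attempt values can be checked by hand. They work in whole Nextflow `GB` units, and the function names are illustrative only.

```python
from typing import Optional

# Base memory per label (Nextflow GB units), taken from the config above.
BASE_GB = {"lowmem": 4, "midmem": 25, "highmem": 50, "veryhighmem": 75}


def label_memory_gb(label: str, attempt: int, max_retries: int = 3,
                    limit_gb: Optional[int] = None) -> int:
    """Memory requested for `label` on a 1-based attempt number."""
    # Same condition as the Groovy closure: a limit is set, maxRetries is
    # non-zero, and this attempt number is at least maxRetries.
    if limit_gb and max_retries and attempt >= max_retries:
        return limit_gb
    return BASE_GB[label] * attempt


def error_strategy(exit_status: int) -> str:
    """Retry on exit codes 137..140 (inclusive, like a Groovy range), else stop."""
    return "retry" if 137 <= exit_status <= 140 else "terminate"


if __name__ == "__main__":
    # maxRetries = 3 allows up to four attempts in total.
    print([label_memory_gb("midmem", a, limit_gb=60) for a in (1, 2, 3, 4)])
```

With `maxRetries = 3`, a task can run up to four times. If a limit is configured, attempts 3 and 4 both request it. Earlier attempts are not capped by the closure itself.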
@@ -0,0 +1,190 @@
|
||||
{
|
||||
"$schema": "https://json-schema.org/draft/2020-12/schema",
|
||||
"title": "qc",
|
||||
"description": "A pipeline to add basic QC statistics to a MuData object.",
|
||||
"type": "object",
|
||||
"$defs": {
|
||||
"inputs": {
|
||||
"title": "Inputs",
|
||||
"type": "object",
|
||||
"description": "No description",
|
||||
"properties": {
|
||||
"id": {
|
||||
"type": "string",
|
||||
"description": "ID of the sample.",
|
||||
"help_text": "Type: `string`, multiple: `False`, required, example: `\"foo\"`. "
|
||||
},
|
||||
"input": {
|
||||
"type": "string",
|
||||
"format": "path",
|
||||
"exists": true,
|
||||
"description": "Path to the sample.",
|
||||
"help_text": "Type: `file`, multiple: `False`, required, direction: `input`, example: `\"input.h5mu\"`. "
|
||||
},
|
||||
"modality": {
|
||||
"type": "string",
|
||||
"description": "Which modality to process.",
|
||||
"help_text": "Type: `string`, multiple: `False`, default: `\"rna\"`. ",
|
||||
"default": "rna"
|
||||
},
|
||||
"layer": {
|
||||
"type": "string",
|
||||
"description": "Layer to calculate qc metrics for.",
|
||||
"help_text": "Type: `string`, multiple: `False`, example: `\"raw_counts\"`. "
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": {
|
||||
"title": "Outputs",
|
||||
"type": "object",
|
||||
"description": "No description",
|
||||
"properties": {
|
||||
"output": {
|
||||
"type": "string",
|
||||
"format": "path",
|
||||
"description": "Destination path to the output.",
|
||||
"help_text": "Type: `file`, multiple: `False`, required, default: `\"$id.$key.output.h5mu\"`, direction: `output`, example: `\"output.h5mu\"`. ",
|
||||
"default": "$id.$key.output.h5mu"
|
||||
}
|
||||
}
|
||||
},
|
||||
"mitochondrial & ribosomal gene detection": {
|
||||
"title": "Mitochondrial & Ribosomal Gene Detection",
|
||||
"type": "object",
|
||||
"description": "No description",
|
||||
"properties": {
|
||||
"var_gene_names": {
|
||||
"type": "string",
|
||||
"description": ".var column name to be used to detect mitochondrial/ribosomal genes instead of .var_names (default if not set).\nGene names matching with the regex value from --mitochondrial_gene_regex or --ribosomal_gene_regex will be \nidentified as mitochondrial or ribosomal genes, respectively.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`, example: `\"gene_symbol\"`. "
|
||||
},
|
||||
"var_name_mitochondrial_genes": {
|
||||
"type": "string",
|
||||
"description": "In which .var slot to store a boolean array corresponding to the mitochondrial genes.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`. "
|
||||
},
|
||||
"obs_name_mitochondrial_fraction": {
|
||||
"type": "string",
|
||||
"description": ".obs slot to store the fraction of reads found to be mitochondrial.",
|
||||
"help_text": "Type: `string`, multiple: `False`. "
|
||||
},
|
||||
"mitochondrial_gene_regex": {
|
||||
"type": "string",
|
||||
"description": "Regex string that identifies mitochondrial genes from --var_gene_names.\nBy default will detect human and mouse mitochondrial genes from a gene symbol.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`, default: `\"^[mM][tT]-\"`. ",
|
||||
"default": "^[mM][tT]-"
|
||||
},
|
||||
"var_name_ribosomal_genes": {
|
||||
"type": "string",
|
||||
"description": "In which .var slot to store a boolean array corresponding to the ribosomal genes.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`. "
|
||||
},
|
||||
"obs_name_ribosomal_fraction": {
|
||||
"type": "string",
|
||||
"description": "When specified, write the fraction of counts originating from ribosomal genes \n(based on --ribosomal_gene_regex) to an .obs column with the specified name.\nRequires --var_name_ribosomal_genes.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`. "
|
||||
},
|
||||
"ribosomal_gene_regex": {
|
||||
"type": "string",
|
||||
"description": "Regex string that identifies ribosomal genes from --var_gene_names.\nBy default will detect human and mouse ribosomal genes from a gene symbol.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`, default: `\"^[Mm]?[Rr][Pp][LlSs]\"`. ",
|
||||
"default": "^[Mm]?[Rr][Pp][LlSs]"
|
||||
}
|
||||
}
|
||||
},
|
||||
"qc metrics calculation options": {
|
||||
"title": "QC metrics calculation options",
|
||||
"type": "object",
|
||||
"description": "No description",
|
||||
"properties": {
|
||||
"var_qc_metrics": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "string"
|
||||
},
|
||||
"description": "Keys to select a boolean (containing only True or False) column from .var.\nFor each cell, calculate the proportion of total values for genes which are labeled 'True', \ncompared to the total sum of the values for all genes. Defaults to the value from\n--var_name_mitochondrial_genes.\n",
|
||||
"help_text": "Type: `string`, multiple: `True`, example: `[\"ercc,highly_variable\"]`. "
|
||||
},
|
||||
"top_n_vars": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "integer"
|
||||
},
|
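||||
"description": "Number of top vars to be used to calculate cumulative proportions.\nIf not specified, proportions are not calculated. `--top_n_vars 20,50` finds\nthe cumulative proportions up to the 20th and 50th most expressed vars.\n",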
|
||||
"help_text": "Type: `integer`, multiple: `True`, default: `[50,100,200,500]`. ",
|
||||
"default": [
|
||||
50,
|
||||
100,
|
||||
200,
|
||||
500
|
||||
]
|
||||
},
|
||||
"output_obs_num_nonzero_vars": {
|
||||
"type": "string",
|
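||||
"description": "Name of column in .obs describing, for each observation, the number of stored values\n(including explicit zeroes). In other words, the name of the column that counts\nfor each row the number of columns that contain data.\n",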
|
||||
"help_text": "Type: `string`, multiple: `False`, default: `\"num_nonzero_vars\"`. ",
|
||||
"default": "num_nonzero_vars"
|
||||
},
|
||||
"output_obs_total_counts_vars": {
|
||||
"type": "string",
|
||||
"description": "Name of the column in .obs describing, for each observation (row),\nthe sum of the stored values in the columns.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`, default: `\"total_counts\"`. ",
|
||||
"default": "total_counts"
|
||||
},
|
||||
"output_var_num_nonzero_obs": {
|
||||
"type": "string",
|
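||||
"description": "Name of the column in .var describing, for each feature, the number of stored values\n(including explicit zeroes). In other words, the name of the column that counts\nfor each column the number of rows that contain data.\n",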
|
||||
"help_text": "Type: `string`, multiple: `False`, default: `\"num_nonzero_obs\"`. ",
|
||||
"default": "num_nonzero_obs"
|
||||
},
|
||||
"output_var_total_counts_obs": {
|
||||
"type": "string",
|
||||
"description": "Name of the column in .var describing, for each feature (column),\nthe sum of the stored values in the rows.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`, default: `\"total_counts\"`. ",
|
||||
"default": "total_counts"
|
||||
},
|
||||
"output_var_obs_mean": {
|
||||
"type": "string",
|
||||
"description": "Name of the column in .var providing, for each feature (column), the mean of the values across all observations (rows).\n",
|
||||
"help_text": "Type: `string`, multiple: `False`, default: `\"obs_mean\"`. ",
|
||||
"default": "obs_mean"
|
||||
},
|
||||
"output_var_pct_dropout": {
|
||||
"type": "string",
|
||||
"description": "Name of the column in .var providing, for each feature, the percentage of\nobservations in which the feature does not appear (i.e. is missing). Percentage-based\ncounterpart of `--output_var_num_nonzero_obs`.\n",
|
||||
"help_text": "Type: `string`, multiple: `False`, default: `\"pct_dropout\"`. ",
|
||||
"default": "pct_dropout"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nextflow input-output arguments": {
|
||||
"title": "Nextflow input-output arguments",
|
||||
"type": "object",
|
||||
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
|
||||
"properties": {
|
||||
"publish_dir": {
|
||||
"type": "string",
|
||||
"description": "Path to an output directory.",
|
||||
"help_text": "Type: `string`, multiple: `False`, required, example: `\"output/\"`. "
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"allOf": [
|
||||
{
|
||||
"$ref": "#/$defs/inputs"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/outputs"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/mitochondrial & ribosomal gene detection"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/qc metrics calculation options"
|
||||
},
|
||||
{
|
||||
"$ref": "#/$defs/nextflow input-output arguments"
|
||||
}
|
||||
]
|
||||
}
|
||||
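The QC metric arguments in the schema above distinguish stored values (which include explicit zeros) from counts. The minimal sketch below computes those metrics on a toy sparse matrix, stored as one `{gene_index: value}` dict per cell. The schema's default column names (`num_nonzero_vars`, `total_counts`, `num_nonzero_obs`, `obs_mean`, `pct_dropout`) serve as keys. This is not the `qc/calculate_qc_metrics` implementation. Three choices are assumptions: computing `obs_mean` per feature over all observations, deriving `pct_dropout` from the stored-value counts, and reporting top-n proportions as fractions under a hypothetical `top_{n}_fraction` key.

```python
Row = dict[int, float]  # gene index -> stored value (explicit zeros allowed)


def qc_metrics(rows: list[Row], n_vars: int, top_n_vars: list[int]) -> dict:
    """Compute toy .obs and .var QC columns for a sparse cells x genes matrix."""
    n_obs = len(rows)
    obs = {
        # Stored entries per cell, explicit zeros included.
        "num_nonzero_vars": [len(row) for row in rows],
        "total_counts": [sum(row.values()) for row in rows],
    }

    num_nonzero_obs = [0] * n_vars
    total_counts = [0.0] * n_vars
    for row in rows:
        for j, value in row.items():
            num_nonzero_obs[j] += 1
            total_counts[j] += value

    var = {
        "num_nonzero_obs": num_nonzero_obs,
        "total_counts": total_counts,
        "obs_mean": [total / n_obs for total in total_counts],
        "pct_dropout": [100.0 * (1 - k / n_obs) for k in num_nonzero_obs],
    }

    # Fraction of each cell's counts held by its n most expressed vars.
    for n in top_n_vars:
        obs[f"top_{n}_fraction"] = [
            sum(sorted(row.values(), reverse=True)[:n]) / total if total else 0.0
            for row, total in zip(rows, obs["total_counts"])
        ]
    return {"obs": obs, "var": var}


if __name__ == "__main__":
    cells = [{0: 3.0, 1: 0.0, 2: 1.0}, {2: 4.0}]
    print(qc_metrics(cells, n_vars=3, top_n_vars=[1, 2]))
```

In this example, gene 1 has only an explicit zero. It still counts as stored for `num_nonzero_obs` and `num_nonzero_vars`. This is the distinction the "including explicit zeroes" wording points at.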
@@ -0,0 +1 @@
|
||||
process.errorStrategy = 'ignore'
|
||||
@@ -0,0 +1,36 @@
|
||||
profiles {
|
||||
|
||||
// detect tempdir
|
||||
tempDir = java.nio.file.Paths.get(
|
||||
System.getenv('NXF_TEMP') ?:
|
||||
System.getenv('VIASH_TEMP') ?:
|
||||
System.getenv('TEMPDIR') ?:
|
||||
System.getenv('TMPDIR') ?:
|
||||
'/tmp'
|
||||
).toAbsolutePath()
|
||||
|
||||
mount_temp {
|
||||
docker.temp = tempDir
|
||||
podman.temp = tempDir
|
||||
charliecloud.temp = tempDir
|
||||
}
|
||||
|
||||
no_publish {
|
||||
process {
|
||||
withName: '.*' {
|
||||
publishDir = [
|
||||
enabled: false
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
docker {
|
||||
docker.enabled = true
|
||||
// docker.userEmulation = true
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
}
|
||||
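The profiles above resolve `tempDir` from the first non-empty environment variable among `NXF_TEMP`, `VIASH_TEMP`, `TEMPDIR` and `TMPDIR`, and fall back to `/tmp`. The `mount_temp` profile then mounts that directory for Docker, Podman and Charliecloud. The Python sketch below reproduces the same lookup order so the precedence is easy to test. The function name is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Checked in this order, like the chain of Groovy `?:` operators above.
TEMP_ENV_VARS = ("NXF_TEMP", "VIASH_TEMP", "TEMPDIR", "TMPDIR")


def detect_temp_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the temp directory the config would pick for `env`."""
    env = os.environ if env is None else env
    for name in TEMP_ENV_VARS:
        value = env.get(name)
        # Groovy's `?:` treats null and empty strings as false, so skip both.
        if value:
            return Path(value).absolute()
    return Path("/tmp").absolute()


if __name__ == "__main__":
    print(detect_temp_dir())
```

As with Groovy's `?:`, a variable that is set but empty is skipped rather than used.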
@@ -0,0 +1,48 @@
|
||||
process {
|
||||
// Default resources for components that hardly do any processing
|
||||
memory = { 2.GB * task.attempt }
|
||||
cpus = 1
|
||||
|
||||
// Retry for exit codes that have something to do with memory issues
|
||||
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
|
||||
maxRetries = 3
|
||||
|
||||
// The memory a task is assigned increases with each attempt
|
||||
// uncomment the line below and adjust the value to set a global upper limit on the memory.
|
||||
// resourceLimits = [ memory: 240.Gb ]
|
||||
|
||||
// CPU resources
|
||||
withLabel: singlecpu { cpus = 1 }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midcpu { cpus = 10 }
|
||||
withLabel: highcpu { cpus = 20 }
|
||||
|
||||
// Memory resources
|
||||
withLabel: lowmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 4.GB * task.attempt } }
|
||||
withLabel: midmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 25.GB * task.attempt } }
|
||||
withLabel: highmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 50.GB * task.attempt } }
|
||||
withLabel: veryhighmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 75.GB * task.attempt } }
|
||||
|
||||
// Disk space
|
||||
withLabel: lowdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: middisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: highdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: veryhighdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
|
||||
// NOTE: The above labels intentionally do not have an effect by default.
|
||||
// The user should set the disk space requirements by adding the following
|
||||
// to the compute environment:
|
||||
//
|
||||
// withLabel: lowdisk { disk = { 20.GB * task.attempt } }
|
||||
// withLabel: middisk { disk = { 100.GB * task.attempt } }
|
||||
// withLabel: highdisk { disk = { 200.GB * task.attempt } }
|
||||
// withLabel: veryhighdisk { disk = { 500.GB * task.attempt } }
|
||||
}
|
||||
@@ -0,0 +1,33 @@
|
||||
process {
|
||||
withLabel: lowmem { memory = 13.Gb }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midmem { memory = 13.Gb }
|
||||
withLabel: midcpu { cpus = 4 }
|
||||
withLabel: highmem { memory = 13.Gb }
|
||||
withLabel: highcpu { cpus = 4 }
|
||||
withLabel: veryhighmem { memory = 13.Gb }
|
||||
withLabel: lowdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: middisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: highdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: veryhighdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
}
|
||||
|
||||
env.NUMBA_CACHE_DIR = '/tmp'
|
||||
|
||||
trace {
|
||||
enabled = true
|
||||
overwrite = true
|
||||
}
|
||||
dag {
|
||||
overwrite = true
|
||||
}
|
||||
|
||||
process.maxForks = 1
|
||||
@@ -0,0 +1,336 @@
|
||||
name: "move_mudata_obs_to_tiledb"
|
||||
namespace: "tiledb"
|
||||
version: "v4.0.4"
|
||||
authors:
|
||||
- name: "Dries Schaumont"
|
||||
roles:
|
||||
- "author"
|
||||
- "maintainer"
|
||||
info:
|
||||
role: "Core Team Member"
|
||||
links:
|
||||
email: "dries@data-intuitive.com"
|
||||
github: "DriesSchaumont"
|
||||
orcid: "0000-0002-4389-0440"
|
||||
linkedin: "dries-schaumont"
|
||||
organizations:
|
||||
- name: "Data Intuitive"
|
||||
href: "https://www.data-intuitive.com"
|
||||
role: "Data Scientist"
|
||||
argument_groups:
|
||||
- name: "Input database"
|
||||
description: "Open a TileDB-SOMA database by URI or as a local directory."
|
||||
arguments:
|
||||
- type: "string"
|
||||
name: "--input_uri"
|
||||
description: "A URI pointing to a TileDB-SOMA database. Mutually exclusive with\
|
||||
\ 'input_dir'"
|
||||
info: null
|
||||
example:
|
||||
- "s3://bucket/path"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "file"
|
||||
name: "--input_dir"
|
||||
description: "Path to a TileDB-SOMA database as a local directory"
|
||||
info: null
|
||||
example:
|
||||
- "./tiledb_database"
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--s3_region"
|
||||
description: "Region where the TileDB-SOMA database is hosted.\n"
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--endpoint"
|
||||
description: "Custom endpoint to use to connect to S3\n"
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "boolean"
|
||||
name: "--s3_no_sign_request"
|
||||
description: "Do not sign S3 requests. Credentials will not be loaded if this\
|
||||
\ argument is provided.\n"
|
||||
info: null
|
||||
default:
|
||||
- false
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--output_modality"
|
||||
description: "TileDB-SOMA measurement to add the output to.\n"
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--obs_index_name_input"
|
||||
description: "Name of the index that is used to describe the cells (observations).\n"
|
||||
info: null
|
||||
default:
|
||||
- "cell_id"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- name: "MuData input"
|
||||
arguments:
|
||||
- type: "file"
|
||||
name: "--input_mudata"
|
||||
description: "MuData object to take the columns from. The observations and their\
|
||||
\ order should\nmatch between the database and the input modality.\n"
|
||||
info: null
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--modality"
|
||||
description: "Modality to take the .obs columns from.\n"
|
||||
info: null
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--obs_input"
|
||||
description: "Columns from .obs to copy. The keys should not be present yet in\
|
||||
\ the database.\n"
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: true
|
||||
multiple_sep: ";"
|
||||
- name: "TileDB-SOMA output"
|
||||
arguments:
|
||||
- type: "file"
|
||||
name: "--output_tiledb"
|
||||
description: "Output to a directory instead of adding to the existing database.\n"
|
||||
info: null
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: false
|
||||
direction: "output"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
resources:
|
||||
- type: "python_script"
|
||||
path: "script.py"
|
||||
is_executable: true
|
||||
- type: "file"
|
||||
path: "setup_logger.py"
|
||||
- type: "file"
|
||||
path: "nextflow_labels.config"
|
||||
dest: "nextflow_labels.config"
|
||||
description: "Move .obs columns from a MuData modality to an existing tileDB database.\n\
|
||||
The .obs keys should not exist in the database yet, and the observations from the\
|
||||
\ modality and \ntheir order should match with what is already present in the tiledb\
|
||||
\ database.\n"
|
||||
test_resources:
|
||||
- type: "python_script"
|
||||
path: "test.py"
|
||||
is_executable: true
|
||||
- type: "file"
|
||||
path: "tiledb"
|
||||
- type: "file"
|
||||
path: "pbmc_1k_protein_v3"
|
||||
info: null
|
||||
status: "enabled"
|
||||
scope:
|
||||
image: "private"
|
||||
target: "private"
|
||||
license: "MIT"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline"
|
||||
docker_registry: "ghcr.io"
|
||||
runners:
|
||||
- type: "executable"
|
||||
id: "executable"
|
||||
docker_setup_strategy: "ifneedbepullelsecachedbuild"
|
||||
docker_run_args:
|
||||
- "--env"
|
||||
- "AWS_ACCESS_KEY_ID"
|
||||
- "--env"
|
||||
- "AWS_SECRET_ACCESS_KEY"
|
||||
- "--env"
|
||||
- "AWS_DEFAULT_REGION"
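# `--env NAME` without `=value` copies NAME from the host environment into the
# container, so S3 credentials reach this component only if they are exported
# on the host before the executable runs, e.g. (placeholder values):
#   export AWS_ACCESS_KEY_ID=<key-id> AWS_SECRET_ACCESS_KEY=<secret> AWS_DEFAULT_REGION=<region>
# For public buckets, --s3_no_sign_request skips loading credentials altogether.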
|
||||
- type: "nextflow"
|
||||
id: "nextflow"
|
||||
directives:
|
||||
label:
|
||||
- "highmem"
|
||||
- "midcpu"
|
||||
tag: "$id"
|
||||
auto:
|
||||
simplifyInput: true
|
||||
simplifyOutput: false
|
||||
transcript: false
|
||||
publish: false
|
||||
config:
|
||||
labels:
|
||||
mem1gb: "memory = 1000000000.B"
|
||||
mem2gb: "memory = 2000000000.B"
|
||||
mem5gb: "memory = 5000000000.B"
|
||||
mem10gb: "memory = 10000000000.B"
|
||||
mem20gb: "memory = 20000000000.B"
|
||||
mem50gb: "memory = 50000000000.B"
|
||||
mem100gb: "memory = 100000000000.B"
|
||||
mem200gb: "memory = 200000000000.B"
|
||||
mem500gb: "memory = 500000000000.B"
|
||||
mem1tb: "memory = 1000000000000.B"
|
||||
mem2tb: "memory = 2000000000000.B"
|
||||
mem5tb: "memory = 5000000000000.B"
|
||||
mem10tb: "memory = 10000000000000.B"
|
||||
mem20tb: "memory = 20000000000000.B"
|
||||
mem50tb: "memory = 50000000000000.B"
|
||||
mem100tb: "memory = 100000000000000.B"
|
||||
mem200tb: "memory = 200000000000000.B"
|
||||
mem500tb: "memory = 500000000000000.B"
|
||||
mem1gib: "memory = 1073741824.B"
|
||||
mem2gib: "memory = 2147483648.B"
|
||||
mem4gib: "memory = 4294967296.B"
|
||||
mem8gib: "memory = 8589934592.B"
|
||||
mem16gib: "memory = 17179869184.B"
|
||||
mem32gib: "memory = 34359738368.B"
|
||||
mem64gib: "memory = 68719476736.B"
|
||||
mem128gib: "memory = 137438953472.B"
|
||||
mem256gib: "memory = 274877906944.B"
|
||||
mem512gib: "memory = 549755813888.B"
|
||||
mem1tib: "memory = 1099511627776.B"
|
||||
mem2tib: "memory = 2199023255552.B"
|
||||
mem4tib: "memory = 4398046511104.B"
|
||||
mem8tib: "memory = 8796093022208.B"
|
||||
mem16tib: "memory = 17592186044416.B"
|
||||
mem32tib: "memory = 35184372088832.B"
|
||||
mem64tib: "memory = 70368744177664.B"
|
||||
mem128tib: "memory = 140737488355328.B"
|
||||
mem256tib: "memory = 281474976710656.B"
|
||||
mem512tib: "memory = 562949953421312.B"
|
||||
cpu1: "cpus = 1"
|
||||
cpu2: "cpus = 2"
|
||||
cpu5: "cpus = 5"
|
||||
cpu10: "cpus = 10"
|
||||
cpu20: "cpus = 20"
|
||||
cpu50: "cpus = 50"
|
||||
cpu100: "cpus = 100"
|
||||
cpu200: "cpus = 200"
|
||||
cpu500: "cpus = 500"
|
||||
cpu1000: "cpus = 1000"
|
||||
script:
|
||||
- "includeConfig(\"nextflow_labels.config\")"
|
||||
debug: false
|
||||
container: "docker"
|
||||
engines:
|
||||
- type: "docker"
|
||||
id: "docker"
|
||||
image: "python:3.12"
|
||||
target_registry: "images.viash-hub.com"
|
||||
target_tag: "v4.0.4"
|
||||
namespace_separator: "/"
|
||||
setup:
|
||||
- type: "python"
|
||||
user: false
|
||||
packages:
|
||||
- "anndata~=0.12.7"
|
||||
- "awkward"
|
||||
- "mudata~=0.3.2"
|
||||
- "tiledbsoma"
|
||||
- "boto3"
|
||||
- "awscli"
|
||||
script:
|
||||
- "exec(\"try:\\n import zarr; from importlib.metadata import version\\nexcept\
|
||||
\ ModuleNotFoundError:\\n exit(0)\\nelse: assert int(version(\\\"zarr\\\"\
|
||||
).partition(\\\".\\\")[0]) > 2\")"
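# Decoded, the escaped one-liner above runs this Python during the image build:
#   try:
#       import zarr; from importlib.metadata import version
#   except ModuleNotFoundError:
#       exit(0)
#   else: assert int(version("zarr").partition(".")[0]) > 2
# In other words, zarr is optional, but if it is installed its major version
# must be 3 or newer, otherwise the build step fails.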
|
||||
upgrade: true
|
||||
test_setup:
|
||||
- type: "apt"
|
||||
packages:
|
||||
- "git"
|
||||
interactive: false
|
||||
- type: "python"
|
||||
user: false
|
||||
packages:
|
||||
- "viashpy==0.8.0"
|
||||
github:
|
||||
- "openpipelines-bio/core#subdirectory=packages/python/openpipeline_testutils"
|
||||
upgrade: true
|
||||
- type: "python"
|
||||
user: false
|
||||
packages:
|
||||
- "moto[server]"
|
||||
upgrade: true
|
||||
entrypoint: []
|
||||
cmd: null
|
||||
- type: "native"
|
||||
id: "native"
|
||||
build_info:
|
||||
config: "src/tiledb/move_mudata_obs_to_tiledb/config.vsh.yaml"
|
||||
runner: "nextflow"
|
||||
engine: "docker|native"
|
||||
output: "target/_private/nextflow/tiledb/move_mudata_obs_to_tiledb"
|
||||
executable: "target/_private/nextflow/tiledb/move_mudata_obs_to_tiledb/main.nf"
|
||||
viash_version: "0.9.4"
|
||||
git_commit: "fb7dc76676aa63d06ae1421bbdd6312ad4f67312"
|
||||
git_remote: "https://github.com/openpipelines-bio/openpipeline"
|
||||
package_config:
|
||||
name: "openpipeline"
|
||||
version: "v4.0.4"
|
||||
summary: "Best-practice workflows for single-cell multi-omics analyses.\n"
|
||||
description: "OpenPipelines are extensible single cell analysis pipelines for reproducible\
|
||||
\ and large-scale single cell processing using [Viash](https://viash.io) and [Nextflow](https://www.nextflow.io/).\n\
|
||||
\nIn terms of workflows, the following are available, but keep in mind\
|
||||
\ that\nindividual tools and functionality can be executed as standalone components\
|
||||
\ as well.\n\n * Demultiplexing: conversion of raw sequencing data to FASTQ objects.\n\
|
||||
\ * Ingestion: Read mapping and generating a count matrix.\n * Single sample\
|
||||
\ processing: cell filtering and doublet detection.\n * Multisample processing:\
|
||||
\ Count transformation, normalization, QC metric calculations.\n * Integration:\
|
||||
\ Clustering, integration and batch correction using single and multimodal methods.\n\
|
||||
\ * Downstream analysis workflows\n"
|
||||
info:
|
||||
test_resources:
|
||||
- type: "s3"
|
||||
path: "s3://openpipelines-data"
|
||||
dest: "resources_test"
|
||||
nextflow_labels_ci:
|
||||
- path: "src/workflows/utils/labels_ci.config"
|
||||
description: "Adds the correct memory and CPU labels when running on the Viash\
|
||||
\ Hub CI."
|
||||
viash_version: "0.9.4"
|
||||
source: "src"
|
||||
target: "target"
|
||||
config_mods:
|
||||
- ".resources += {path: '/src/workflows/utils/labels.config', dest: 'nextflow_labels.config'}\n\
|
||||
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
|
||||
)'"
|
||||
- ".engines += { type: \"native\" }"
|
||||
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
|
||||
- ".engines[.type == 'docker'].target_tag := 'v4.0.4'"
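# In words, the config mods above do the following:
# 1. Add the shared labels config as the `nextflow_labels.config` resource and
#    include it from the Nextflow runner config.
# 2. Add a `native` engine next to docker.
# 3. Point the docker target registry at images.viash-hub.com with tag v4.0.4.
# This is why the component configs above list those same resources, engines,
# registry and tag.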
|
||||
keywords:
|
||||
- "single-cell"
|
||||
- "multimodal"
|
||||
license: "MIT"
|
||||
organization: "vsh"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline"
|
||||
docker_registry: "ghcr.io"
|
||||
homepage: "https://openpipelines.bio"
|
||||
documentation: "https://openpipelines.bio/fundamentals"
|
||||
issue_tracker: "https://github.com/openpipelines-bio/openpipeline/issues"
|
||||
File diff suppressed because it is too large
@@ -0,0 +1,126 @@
|
||||
manifest {
|
||||
name = 'tiledb/move_mudata_obs_to_tiledb'
|
||||
mainScript = 'main.nf'
|
||||
nextflowVersion = '!>=20.12.1-edge'
|
||||
version = 'v4.0.4'
|
||||
description = 'Move .obs columns from a MuData modality to an existing tileDB database.\nThe .obs keys should not exist in the database yet, and the observations from the modality and \ntheir order should match with what is already present in the tiledb database.\n'
|
||||
author = 'Dries Schaumont'
|
||||
}
|
||||
|
||||
process.container = 'nextflow/bash:latest'
|
||||
|
||||
// detect tempdir
|
||||
tempDir = java.nio.file.Paths.get(
|
||||
System.getenv('NXF_TEMP') ?:
|
||||
System.getenv('VIASH_TEMP') ?:
|
||||
System.getenv('TEMPDIR') ?:
|
||||
System.getenv('TMPDIR') ?:
|
||||
'/tmp'
|
||||
).toAbsolutePath()
|
||||
|
||||
profiles {
|
||||
no_publish {
|
||||
process {
|
||||
withName: '.*' {
|
||||
publishDir = [
|
||||
enabled: false
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
mount_temp {
|
||||
docker.temp = tempDir
|
||||
podman.temp = tempDir
|
||||
charliecloud.temp = tempDir
|
||||
}
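// Usage sketch: combine `mount_temp` with a container profile to mount the
// detected temp directory as /tmp inside the container. The directory is the
// first of NXF_TEMP, VIASH_TEMP, TEMPDIR or TMPDIR that is set, else /tmp.
// For example (/scratch/tmp is a placeholder path):
//   NXF_TEMP=/scratch/tmp nextflow run main.nf -profile docker,mount_temp <component arguments>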
|
||||
docker {
|
||||
docker.enabled = true
|
||||
// docker.userEmulation = true
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
singularity {
|
||||
singularity.enabled = true
|
||||
singularity.autoMounts = true
|
||||
docker.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
podman {
|
||||
podman.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
shifter {
|
||||
shifter.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
charliecloud {
|
||||
charliecloud.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
}
|
||||
}
|
||||
|
||||
process{
|
||||
withLabel: mem1gb { memory = 1000000000.B }
|
||||
withLabel: mem2gb { memory = 2000000000.B }
|
||||
withLabel: mem5gb { memory = 5000000000.B }
|
||||
withLabel: mem10gb { memory = 10000000000.B }
|
||||
withLabel: mem20gb { memory = 20000000000.B }
|
||||
withLabel: mem50gb { memory = 50000000000.B }
|
||||
withLabel: mem100gb { memory = 100000000000.B }
|
||||
withLabel: mem200gb { memory = 200000000000.B }
|
||||
withLabel: mem500gb { memory = 500000000000.B }
|
||||
withLabel: mem1tb { memory = 1000000000000.B }
|
||||
withLabel: mem2tb { memory = 2000000000000.B }
|
||||
withLabel: mem5tb { memory = 5000000000000.B }
|
||||
withLabel: mem10tb { memory = 10000000000000.B }
|
||||
withLabel: mem20tb { memory = 20000000000000.B }
|
||||
withLabel: mem50tb { memory = 50000000000000.B }
|
||||
withLabel: mem100tb { memory = 100000000000000.B }
|
||||
withLabel: mem200tb { memory = 200000000000000.B }
|
||||
withLabel: mem500tb { memory = 500000000000000.B }
|
||||
withLabel: mem1gib { memory = 1073741824.B }
|
||||
withLabel: mem2gib { memory = 2147483648.B }
|
||||
withLabel: mem4gib { memory = 4294967296.B }
|
||||
withLabel: mem8gib { memory = 8589934592.B }
|
||||
withLabel: mem16gib { memory = 17179869184.B }
|
||||
withLabel: mem32gib { memory = 34359738368.B }
|
||||
withLabel: mem64gib { memory = 68719476736.B }
|
||||
withLabel: mem128gib { memory = 137438953472.B }
|
||||
withLabel: mem256gib { memory = 274877906944.B }
|
||||
withLabel: mem512gib { memory = 549755813888.B }
|
||||
withLabel: mem1tib { memory = 1099511627776.B }
|
||||
withLabel: mem2tib { memory = 2199023255552.B }
|
||||
withLabel: mem4tib { memory = 4398046511104.B }
|
||||
withLabel: mem8tib { memory = 8796093022208.B }
|
||||
withLabel: mem16tib { memory = 17592186044416.B }
|
||||
withLabel: mem32tib { memory = 35184372088832.B }
|
||||
withLabel: mem64tib { memory = 70368744177664.B }
|
||||
withLabel: mem128tib { memory = 140737488355328.B }
|
||||
withLabel: mem256tib { memory = 281474976710656.B }
|
||||
withLabel: mem512tib { memory = 562949953421312.B }
|
||||
withLabel: cpu1 { cpus = 1 }
|
||||
withLabel: cpu2 { cpus = 2 }
|
||||
withLabel: cpu5 { cpus = 5 }
|
||||
withLabel: cpu10 { cpus = 10 }
|
||||
withLabel: cpu20 { cpus = 20 }
|
||||
withLabel: cpu50 { cpus = 50 }
|
||||
withLabel: cpu100 { cpus = 100 }
|
||||
withLabel: cpu200 { cpus = 200 }
|
||||
withLabel: cpu500 { cpus = 500 }
|
||||
withLabel: cpu1000 { cpus = 1000 }
|
||||
}
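// The memNgb/memNtb labels use decimal units (1 GB = 10^9 bytes), whereas the
// memNgib/memNtib labels use binary units (1 GiB = 2^30 = 1073741824 bytes).
// For example, mem4gib requests 4 * 1073741824 = 4294967296 bytes.
// The labels attached to this component (highmem, midcpu) are not in this list.
// They are defined in nextflow_labels.config, which is included below.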
|
||||
|
||||
includeConfig("nextflow_labels.config")
|
||||
@@ -0,0 +1,48 @@
|
||||
process {
|
||||
// Default resources for components that hardly do any processing
|
||||
memory = { 2.GB * task.attempt }
|
||||
cpus = 1
|
||||
|
||||
// Retry when the exit code points to a memory issue (e.g. 137 = killed by SIGKILL, typically by the OOM killer)
|
||||
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
|
||||
maxRetries = 3
|
||||
|
||||
// The memory a task is assigned increases with each attempt
|
||||
// uncomment the line below and adjust the value to set a global upper limit on the memory.
|
||||
// resourceLimits = [ memory: 240.GB ]
|
||||
|
||||
// CPU resources
|
||||
withLabel: singlecpu { cpus = 1 }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midcpu { cpus = 10 }
|
||||
withLabel: highcpu { cpus = 20 }
|
||||
|
||||
// Memory resources
|
||||
withLabel: lowmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 4.GB * task.attempt } }
|
||||
withLabel: midmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 25.GB * task.attempt } }
|
||||
withLabel: highmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 50.GB * task.attempt } }
|
||||
withLabel: veryhighmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 75.GB * task.attempt } }
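// Worked example, assuming maxRetries = 3 as set above, an exit status in 137..140
// on every failed attempt (any other exit status terminates the run), and that
// task.resourceLimits reflects the resourceLimits setting. A `highmem` task
// (such as this component) then requests:
//   resourceLimits unset:                 attempt 1: 50.GB, 2: 100.GB, 3: 150.GB, 4: 200.GB
//   resourceLimits = [ memory: 240.GB ]:  attempt 1: 50.GB, 2: 100.GB, 3 and 4: 240.GB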
|
||||
|
||||
// Disk space
|
||||
withLabel: lowdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: middisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: highdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: veryhighdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
|
||||
// NOTE: The above labels intentionally do not have an effect by default.
|
||||
// The user should set the disk space requirements by adding the following
|
||||
// to the compute environment:
|
||||
//
|
||||
// withLabel: lowdisk { disk = { 20.GB * task.attempt } }
|
||||
// withLabel: middisk { disk = { 100.GB * task.attempt } }
|
||||
// withLabel: highdisk { disk = { 200.GB * task.attempt } }
|
||||
// withLabel: veryhighdisk { disk = { 500.GB * task.attempt } }
|
||||
}
|
||||
@@ -0,0 +1,12 @@
|
||||
def setup_logger():
|
||||
import logging
|
||||
from sys import stdout
|
||||
|
||||
logger = logging.getLogger()
|
||||
logger.setLevel(logging.INFO)
|
||||
console_handler = logging.StreamHandler(stdout)
|
||||
logFormatter = logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
|
||||
console_handler.setFormatter(logFormatter)
|
||||
logger.addHandler(console_handler)
|
||||
|
||||
return logger
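# Usage sketch, assuming the calling script can import setup_logger.py from the
# same directory:
#   from setup_logger import setup_logger
#   logger = setup_logger()
#   logger.info("Reading MuData input")
# Each call attaches a new StreamHandler to the root logger. Calling
# setup_logger() more than once therefore prints every message once per call.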
|
||||
@@ -0,0 +1,313 @@
|
||||
name: "move_mudata_obsm_to_tiledb"
|
||||
namespace: "tiledb"
|
||||
version: "v4.0.4"
|
||||
authors:
|
||||
- name: "Dries Schaumont"
|
||||
roles:
|
||||
- "author"
|
||||
- "maintainer"
|
||||
info:
|
||||
role: "Core Team Member"
|
||||
links:
|
||||
email: "dries@data-intuitive.com"
|
||||
github: "DriesSchaumont"
|
||||
orcid: "0000-0002-4389-0440"
|
||||
linkedin: "dries-schaumont"
|
||||
organizations:
|
||||
- name: "Data Intuitive"
|
||||
href: "https://www.data-intuitive.com"
|
||||
role: "Data Scientist"
|
||||
argument_groups:
|
||||
- name: "Input database"
|
||||
description: "Open a tileDB-SOMA database by URI."
|
||||
arguments:
|
||||
- type: "string"
|
||||
name: "--input_uri"
|
||||
description: "A URI pointing to a TileDB-SOMA database."
|
||||
info: null
|
||||
example:
|
||||
- "s3://bucket/path"
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--s3_region"
|
||||
description: "Region where the TileDB-SOMA database is hosted.\n"
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--endpoint"
|
||||
description: "Custom endpoint to use when connecting to S3.\n"
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "boolean"
|
||||
name: "--s3_no_sign_request"
|
||||
description: "Do not sign S3 requests. Credentials will not be loaded if this\
|
||||
\ argument is provided.\n"
|
||||
info: null
|
||||
default:
|
||||
- false
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--output_modality"
|
||||
description: "TileDB-SOMA measurement to add the output to.\n"
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- name: "MuData input"
|
||||
arguments:
|
||||
- type: "file"
|
||||
name: "--input_mudata"
|
||||
description: "MuData object to take the .obsm items from. The observations and their\
|
||||
\ order should\nmatch between the database and the input modality.\n"
|
||||
info: null
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--modality"
|
||||
description: "Modality to take the .obsm items from.\n"
|
||||
info: null
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--obsm_input"
|
||||
description: "Keys from .obsm to copy. The keys should not be present yet in the\
|
||||
\ database.\n"
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: true
|
||||
multiple_sep: ";"
|
||||
- name: "TileDB-SOMA output"
|
||||
arguments:
|
||||
- type: "file"
|
||||
name: "--output_tiledb"
|
||||
description: "Output to a directory instead of adding to the existing database.\n"
|
||||
info: null
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: false
|
||||
direction: "output"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
resources:
|
||||
- type: "python_script"
|
||||
path: "script.py"
|
||||
is_executable: true
|
||||
- type: "file"
|
||||
path: "setup_logger.py"
|
||||
- type: "file"
|
||||
path: "nextflow_labels.config"
|
||||
dest: "nextflow_labels.config"
|
||||
description: "Move .obsm items from a MuData modality to an existing tileDB database.\n\
|
||||
The .obsm keys should not exist in the database yet, and the observations from the\
|
||||
\ modality and \ntheir order should match with what is already present in the tiledb\
|
||||
\ database.\n"
|
||||
test_resources:
|
||||
- type: "python_script"
|
||||
path: "test.py"
|
||||
is_executable: true
|
||||
- type: "file"
|
||||
path: "tiledb"
|
||||
- type: "file"
|
||||
path: "pbmc_1k_protein_v3"
|
||||
info: null
|
||||
status: "enabled"
|
||||
scope:
|
||||
image: "private"
|
||||
target: "private"
|
||||
license: "MIT"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline"
|
||||
docker_registry: "ghcr.io"
|
||||
runners:
|
||||
- type: "executable"
|
||||
id: "executable"
|
||||
docker_setup_strategy: "ifneedbepullelsecachedbuild"
|
||||
docker_run_args:
|
||||
- "--env"
|
||||
- "AWS_ACCESS_KEY_ID"
|
||||
- "--env"
|
||||
- "AWS_SECRET_ACCESS_KEY"
|
||||
- "--env"
|
||||
- "AWS_DEFAULT_REGION"
|
||||
- type: "nextflow"
|
||||
id: "nextflow"
|
||||
directives:
|
||||
label:
|
||||
- "highmem"
|
||||
- "midcpu"
|
||||
tag: "$id"
|
||||
auto:
|
||||
simplifyInput: true
|
||||
simplifyOutput: false
|
||||
transcript: false
|
||||
publish: false
|
||||
config:
|
||||
labels:
|
||||
mem1gb: "memory = 1000000000.B"
|
||||
mem2gb: "memory = 2000000000.B"
|
||||
mem5gb: "memory = 5000000000.B"
|
||||
mem10gb: "memory = 10000000000.B"
|
||||
mem20gb: "memory = 20000000000.B"
|
||||
mem50gb: "memory = 50000000000.B"
|
||||
mem100gb: "memory = 100000000000.B"
|
||||
mem200gb: "memory = 200000000000.B"
|
||||
mem500gb: "memory = 500000000000.B"
|
||||
mem1tb: "memory = 1000000000000.B"
|
||||
mem2tb: "memory = 2000000000000.B"
|
||||
mem5tb: "memory = 5000000000000.B"
|
||||
mem10tb: "memory = 10000000000000.B"
|
||||
mem20tb: "memory = 20000000000000.B"
|
||||
mem50tb: "memory = 50000000000000.B"
|
||||
mem100tb: "memory = 100000000000000.B"
|
||||
mem200tb: "memory = 200000000000000.B"
|
||||
mem500tb: "memory = 500000000000000.B"
|
||||
mem1gib: "memory = 1073741824.B"
|
||||
mem2gib: "memory = 2147483648.B"
|
||||
mem4gib: "memory = 4294967296.B"
|
||||
mem8gib: "memory = 8589934592.B"
|
||||
mem16gib: "memory = 17179869184.B"
|
||||
mem32gib: "memory = 34359738368.B"
|
||||
mem64gib: "memory = 68719476736.B"
|
||||
mem128gib: "memory = 137438953472.B"
|
||||
mem256gib: "memory = 274877906944.B"
|
||||
mem512gib: "memory = 549755813888.B"
|
||||
mem1tib: "memory = 1099511627776.B"
|
||||
mem2tib: "memory = 2199023255552.B"
|
||||
mem4tib: "memory = 4398046511104.B"
|
||||
mem8tib: "memory = 8796093022208.B"
|
||||
mem16tib: "memory = 17592186044416.B"
|
||||
mem32tib: "memory = 35184372088832.B"
|
||||
mem64tib: "memory = 70368744177664.B"
|
||||
mem128tib: "memory = 140737488355328.B"
|
||||
mem256tib: "memory = 281474976710656.B"
|
||||
mem512tib: "memory = 562949953421312.B"
|
||||
cpu1: "cpus = 1"
|
||||
cpu2: "cpus = 2"
|
||||
cpu5: "cpus = 5"
|
||||
cpu10: "cpus = 10"
|
||||
cpu20: "cpus = 20"
|
||||
cpu50: "cpus = 50"
|
||||
cpu100: "cpus = 100"
|
||||
cpu200: "cpus = 200"
|
||||
cpu500: "cpus = 500"
|
||||
cpu1000: "cpus = 1000"
|
||||
script:
|
||||
- "includeConfig(\"nextflow_labels.config\")"
|
||||
debug: false
|
||||
container: "docker"
|
||||
engines:
|
||||
- type: "docker"
|
||||
id: "docker"
|
||||
image: "python:3.12"
|
||||
target_registry: "images.viash-hub.com"
|
||||
target_tag: "v4.0.4"
|
||||
namespace_separator: "/"
|
||||
setup:
|
||||
- type: "python"
|
||||
user: false
|
||||
packages:
|
||||
- "anndata~=0.12.7"
|
||||
- "awkward"
|
||||
- "mudata~=0.3.2"
|
||||
- "tiledbsoma"
|
||||
- "boto3"
|
||||
- "awscli"
|
||||
script:
|
||||
- "exec(\"try:\\n import zarr; from importlib.metadata import version\\nexcept\
|
||||
\ ModuleNotFoundError:\\n exit(0)\\nelse: assert int(version(\\\"zarr\\\"\
|
||||
).partition(\\\".\\\")[0]) > 2\")"
|
||||
upgrade: true
|
||||
test_setup:
|
||||
- type: "apt"
|
||||
packages:
|
||||
- "git"
|
||||
interactive: false
|
||||
- type: "python"
|
||||
user: false
|
||||
packages:
|
||||
- "viashpy==0.8.0"
|
||||
github:
|
||||
- "openpipelines-bio/core#subdirectory=packages/python/openpipeline_testutils"
|
||||
upgrade: true
|
||||
- type: "python"
|
||||
user: false
|
||||
packages:
|
||||
- "moto[server]"
|
||||
upgrade: true
|
||||
entrypoint: []
|
||||
cmd: null
|
||||
- type: "native"
|
||||
id: "native"
|
||||
build_info:
|
||||
config: "src/tiledb/move_mudata_obsm_to_tiledb/config.vsh.yaml"
|
||||
runner: "nextflow"
|
||||
engine: "docker|native"
|
||||
output: "target/_private/nextflow/tiledb/move_mudata_obsm_to_tiledb"
|
||||
executable: "target/_private/nextflow/tiledb/move_mudata_obsm_to_tiledb/main.nf"
|
||||
viash_version: "0.9.4"
|
||||
git_commit: "fb7dc76676aa63d06ae1421bbdd6312ad4f67312"
|
||||
git_remote: "https://github.com/openpipelines-bio/openpipeline"
|
||||
package_config:
|
||||
name: "openpipeline"
|
||||
version: "v4.0.4"
|
||||
summary: "Best-practice workflows for single-cell multi-omics analyses.\n"
|
||||
description: "OpenPipelines are extensible single cell analysis pipelines for reproducible\
|
||||
\ and large-scale single cell processing using [Viash](https://viash.io) and [Nextflow](https://www.nextflow.io/).\n\
|
||||
\nIn terms of workflows, the following are available, but keep in mind\
|
||||
\ that\nindividual tools and functionality can be executed as standalone components\
|
||||
\ as well.\n\n * Demultiplexing: conversion of raw sequencing data to FASTQ objects.\n\
|
||||
\ * Ingestion: Read mapping and generating a count matrix.\n * Single sample\
|
||||
\ processing: cell filtering and doublet detection.\n * Multisample processing:\
|
||||
\ Count transformation, normalization, QC metric calculations.\n * Integration:\
|
||||
\ Clustering, integration and batch correction using single and multimodal methods.\n\
|
||||
\ * Downstream analysis workflows\n"
|
||||
info:
|
||||
test_resources:
|
||||
- type: "s3"
|
||||
path: "s3://openpipelines-data"
|
||||
dest: "resources_test"
|
||||
nextflow_labels_ci:
|
||||
- path: "src/workflows/utils/labels_ci.config"
|
||||
description: "Adds the correct memory and CPU labels when running on the Viash\
|
||||
\ Hub CI."
|
||||
viash_version: "0.9.4"
|
||||
source: "src"
|
||||
target: "target"
|
||||
config_mods:
|
||||
- ".resources += {path: '/src/workflows/utils/labels.config', dest: 'nextflow_labels.config'}\n\
|
||||
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
|
||||
)'"
|
||||
- ".engines += { type: \"native\" }"
|
||||
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
|
||||
- ".engines[.type == 'docker'].target_tag := 'v4.0.4'"
|
||||
keywords:
|
||||
- "single-cell"
|
||||
- "multimodal"
|
||||
license: "MIT"
|
||||
organization: "vsh"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline"
|
||||
docker_registry: "ghcr.io"
|
||||
homepage: "https://openpipelines.bio"
|
||||
documentation: "https://openpipelines.bio/fundamentals"
|
||||
issue_tracker: "https://github.com/openpipelines-bio/openpipeline/issues"
|
||||
File diff suppressed because it is too large
@@ -0,0 +1,126 @@
|
||||
manifest {
|
||||
name = 'tiledb/move_mudata_obsm_to_tiledb'
|
||||
mainScript = 'main.nf'
|
||||
nextflowVersion = '!>=20.12.1-edge'
|
||||
version = 'v4.0.4'
|
||||
description = 'Move .obsm items from a MuData modality to an existing tileDB database.\nThe .obsm keys should not exist in the database yet, and the observations from the modality and \ntheir order should match with what is already present in the tiledb database.\n'
|
||||
author = 'Dries Schaumont'
|
||||
}
|
||||
|
||||
process.container = 'nextflow/bash:latest'
|
||||
|
||||
// detect tempdir
|
||||
tempDir = java.nio.file.Paths.get(
|
||||
System.getenv('NXF_TEMP') ?:
|
||||
System.getenv('VIASH_TEMP') ?:
|
||||
System.getenv('TEMPDIR') ?:
|
||||
System.getenv('TMPDIR') ?:
|
||||
'/tmp'
|
||||
).toAbsolutePath()
|
||||
|
||||
profiles {
|
||||
no_publish {
|
||||
process {
|
||||
withName: '.*' {
|
||||
publishDir = [
|
||||
enabled: false
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
mount_temp {
|
||||
docker.temp = tempDir
|
||||
podman.temp = tempDir
|
||||
charliecloud.temp = tempDir
|
||||
}
|
||||
docker {
|
||||
docker.enabled = true
|
||||
// docker.userEmulation = true
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
singularity {
|
||||
singularity.enabled = true
|
||||
singularity.autoMounts = true
|
||||
docker.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
podman {
|
||||
podman.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
shifter {
|
||||
shifter.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
charliecloud {
|
||||
charliecloud.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
}
|
||||
}
|
||||
|
||||
process{
|
||||
withLabel: mem1gb { memory = 1000000000.B }
|
||||
withLabel: mem2gb { memory = 2000000000.B }
|
||||
withLabel: mem5gb { memory = 5000000000.B }
|
||||
withLabel: mem10gb { memory = 10000000000.B }
|
||||
withLabel: mem20gb { memory = 20000000000.B }
|
||||
withLabel: mem50gb { memory = 50000000000.B }
|
||||
withLabel: mem100gb { memory = 100000000000.B }
|
||||
withLabel: mem200gb { memory = 200000000000.B }
|
||||
withLabel: mem500gb { memory = 500000000000.B }
|
||||
withLabel: mem1tb { memory = 1000000000000.B }
|
||||
withLabel: mem2tb { memory = 2000000000000.B }
|
||||
withLabel: mem5tb { memory = 5000000000000.B }
|
||||
withLabel: mem10tb { memory = 10000000000000.B }
|
||||
withLabel: mem20tb { memory = 20000000000000.B }
|
||||
withLabel: mem50tb { memory = 50000000000000.B }
|
||||
withLabel: mem100tb { memory = 100000000000000.B }
|
||||
withLabel: mem200tb { memory = 200000000000000.B }
|
||||
withLabel: mem500tb { memory = 500000000000000.B }
|
||||
withLabel: mem1gib { memory = 1073741824.B }
|
||||
withLabel: mem2gib { memory = 2147483648.B }
|
||||
withLabel: mem4gib { memory = 4294967296.B }
|
||||
withLabel: mem8gib { memory = 8589934592.B }
|
||||
withLabel: mem16gib { memory = 17179869184.B }
|
||||
withLabel: mem32gib { memory = 34359738368.B }
|
||||
withLabel: mem64gib { memory = 68719476736.B }
|
||||
withLabel: mem128gib { memory = 137438953472.B }
|
||||
withLabel: mem256gib { memory = 274877906944.B }
|
||||
withLabel: mem512gib { memory = 549755813888.B }
|
||||
withLabel: mem1tib { memory = 1099511627776.B }
|
||||
withLabel: mem2tib { memory = 2199023255552.B }
|
||||
withLabel: mem4tib { memory = 4398046511104.B }
|
||||
withLabel: mem8tib { memory = 8796093022208.B }
|
||||
withLabel: mem16tib { memory = 17592186044416.B }
|
||||
withLabel: mem32tib { memory = 35184372088832.B }
|
||||
withLabel: mem64tib { memory = 70368744177664.B }
|
||||
withLabel: mem128tib { memory = 140737488355328.B }
|
||||
withLabel: mem256tib { memory = 281474976710656.B }
|
||||
withLabel: mem512tib { memory = 562949953421312.B }
|
||||
withLabel: cpu1 { cpus = 1 }
|
||||
withLabel: cpu2 { cpus = 2 }
|
||||
withLabel: cpu5 { cpus = 5 }
|
||||
withLabel: cpu10 { cpus = 10 }
|
||||
withLabel: cpu20 { cpus = 20 }
|
||||
withLabel: cpu50 { cpus = 50 }
|
||||
withLabel: cpu100 { cpus = 100 }
|
||||
withLabel: cpu200 { cpus = 200 }
|
||||
withLabel: cpu500 { cpus = 500 }
|
||||
withLabel: cpu1000 { cpus = 1000 }
|
||||
}
|
||||
|
||||
includeConfig("nextflow_labels.config")
|
||||
@@ -0,0 +1,48 @@
|
||||
process {
|
||||
// Default resources for components that hardly do any processing
|
||||
memory = { 2.GB * task.attempt }
|
||||
cpus = 1
|
||||
|
||||
// Retry when the exit code points to a memory issue (e.g. 137 = killed by SIGKILL, typically by the OOM killer)
|
||||
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
|
||||
maxRetries = 3
|
||||
|
||||
// The memory a task is assigned increases with each attempt
|
||||
// uncomment the line below and adjust the value to set a global upper limit on the memory.
|
||||
// resourceLimits = [ memory: 240.Gb ]
|
||||
|
||||
// CPU resources
|
||||
withLabel: singlecpu { cpus = 1 }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midcpu { cpus = 10 }
|
||||
withLabel: highcpu { cpus = 20 }
|
||||
|
||||
// Memory resources
|
||||
withLabel: lowmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 4.GB * task.attempt } }
|
||||
withLabel: midmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 25.GB * task.attempt } }
|
||||
withLabel: highmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 50.GB * task.attempt } }
|
||||
withLabel: veryhighmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 75.GB * task.attempt } }
|
||||
|
||||
// Disk space
|
||||
withLabel: lowdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: middisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: highdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: veryhighdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
|
||||
// NOTE: The above labels intentionally do not have an effect by default.
|
||||
// The user should set the disk space requirements by adding the following
|
||||
// to the compute environment:
|
||||
//
|
||||
// withLabel: lowdisk { disk = { 20.GB * task.attempt } }
|
||||
// withLabel: middisk { disk = { 100.GB * task.attempt } }
|
||||
// withLabel: highdisk { disk = { 200.GB * task.attempt } }
|
||||
// withLabel: veryhighdisk { disk = { 500.GB * task.attempt } }
|
||||
}
|
||||
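The retry behaviour of this file comes from three settings working together. `errorStrategy` retries only when the exit status is in the inclusive range 137..140. For example, 137 is 128 + 9, meaning the task received SIGKILL, which is commonly what an out-of-memory kill looks like. `maxRetries = 3` limits the number of retries, and each `*mem` label closure computes the memory for the current attempt. The Python sketch below mirrors those closures so the resulting memory schedule can be checked by hand. It assumes memory in abstract GB units and uses `limit_gb` to stand in for `resourceLimits.memory`. The function names are hypothetical.

```python
def error_strategy(exit_status: int) -> str:
    """Mirror of `task.exitStatus in 137..140 ? 'retry' : 'terminate'`."""
    # Groovy's 137..140 is an inclusive range.
    return "retry" if 137 <= exit_status <= 140 else "terminate"


def label_memory_gb(base_gb, attempt, max_retries=3, limit_gb=None):
    """Memory (GB) requested by lowmem/midmem/highmem/veryhighmem on a given attempt."""
    # Once the attempt number reaches maxRetries and a global limit is set,
    # the closure jumps straight to that limit instead of scaling further.
    if limit_gb is not None and max_retries and attempt >= max_retries:
        return limit_gb
    # Otherwise the request grows linearly with the attempt number.
    return base_gb * attempt
```

For example, `midmem` (25 GB base) with a 240 GB limit requests 25, 50, 240 and 240 GB on attempts 1 to 4. Without a limit, the third attempt requests 75 GB.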
@@ -0,0 +1,12 @@
|
||||
def setup_logger():
|
||||
import logging
|
||||
from sys import stdout
|
||||
|
||||
logger = logging.getLogger()
|
||||
logger.setLevel(logging.INFO)
|
||||
console_handler = logging.StreamHandler(stdout)
|
||||
logFormatter = logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
|
||||
console_handler.setFormatter(logFormatter)
|
||||
logger.addHandler(console_handler)
|
||||
|
||||
return logger
|
||||
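Each call to `setup_logger` attaches a new stdout handler to the root logger. If it is called twice, every message is printed twice. The sketch below is a hypothetical variant that adds the handler only once. It keeps the original format string and level. Detecting an existing handler by checking whether its stream is `stdout` is an assumption of this sketch, not how the upstream helper behaves.

```python
import logging
from sys import stdout


def setup_logger():
    """Configure the root logger to print INFO and above to stdout (safe to call repeatedly)."""
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    # Skip adding a second handler if a stdout StreamHandler is already attached.
    already_attached = any(
        isinstance(h, logging.StreamHandler) and h.stream is stdout
        for h in logger.handlers
    )
    if not already_attached:
        console_handler = logging.StreamHandler(stdout)
        console_handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
        )
        logger.addHandler(console_handler)
    return logger
```

Usage is unchanged: `logger = setup_logger()` followed by `logger.info("Reading %s", path)`.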
@@ -0,0 +1,233 @@
|
||||
name: "split_modalities"
|
||||
namespace: "workflows/multiomics"
|
||||
version: "v4.0.4"
|
||||
authors:
|
||||
- name: "Dries Schaumont"
|
||||
roles:
|
||||
- "author"
|
||||
- "maintainer"
|
||||
info:
|
||||
role: "Core Team Member"
|
||||
links:
|
||||
email: "dries@data-intuitive.com"
|
||||
github: "DriesSchaumont"
|
||||
orcid: "0000-0002-4389-0440"
|
||||
linkedin: "dries-schaumont"
|
||||
organizations:
|
||||
- name: "Data Intuitive"
|
||||
href: "https://www.data-intuitive.com"
|
||||
role: "Data Scientist"
|
||||
argument_groups:
|
||||
- name: "Inputs"
|
||||
arguments:
|
||||
- type: "string"
|
||||
name: "--id"
|
||||
description: "ID of the sample."
|
||||
info: null
|
||||
example:
|
||||
- "foo"
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "file"
|
||||
name: "--input"
|
||||
alternatives:
|
||||
- "-i"
|
||||
description: "Path to the sample."
|
||||
info: null
|
||||
example:
|
||||
- "input.h5mu"
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- name: "Outputs"
|
||||
arguments:
|
||||
- type: "file"
|
||||
name: "--output"
|
||||
alternatives:
|
||||
- "-o"
|
||||
description: "Output directory containing multiple h5mu files."
|
||||
info: null
|
||||
example:
|
||||
- "/path/to/output"
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: true
|
||||
direction: "output"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "file"
|
||||
name: "--output_types"
|
||||
description: "A csv containing the base filename and modality type per output\
|
||||
\ file."
|
||||
info: null
|
||||
example:
|
||||
- "types.csv"
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: true
|
||||
direction: "output"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
resources:
|
||||
- type: "nextflow_script"
|
||||
path: "main.nf"
|
||||
is_executable: true
|
||||
entrypoint: "run_wf"
|
||||
- type: "file"
|
||||
path: "utils"
|
||||
- type: "file"
|
||||
path: "nextflow_labels.config"
|
||||
dest: "nextflow_labels.config"
|
||||
description: "A pipeline to split a multimodal mudata file into several unimodal\
|
||||
\ mudata files."
|
||||
test_resources:
|
||||
- type: "nextflow_script"
|
||||
path: "test.nf"
|
||||
is_executable: true
|
||||
entrypoint: "test_wf"
|
||||
- type: "file"
|
||||
path: "pbmc_1k_protein_v3_filtered_feature_bc_matrix.h5mu"
|
||||
info:
|
||||
test_dependencies:
|
||||
- name: "split_modalities_test"
|
||||
namespace: "test_workflows/multiomics"
|
||||
status: "enabled"
|
||||
scope:
|
||||
image: "private"
|
||||
target: "private"
|
||||
dependencies:
|
||||
- name: "dataflow/split_modalities"
|
||||
alias: "split_modalities_component"
|
||||
repository:
|
||||
type: "local"
|
||||
license: "MIT"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline"
|
||||
docker_registry: "ghcr.io"
|
||||
runners:
|
||||
- type: "nextflow"
|
||||
id: "nextflow"
|
||||
directives:
|
||||
tag: "$id"
|
||||
auto:
|
||||
simplifyInput: true
|
||||
simplifyOutput: false
|
||||
transcript: false
|
||||
publish: false
|
||||
config:
|
||||
labels:
|
||||
mem1gb: "memory = 1000000000.B"
|
||||
mem2gb: "memory = 2000000000.B"
|
||||
mem5gb: "memory = 5000000000.B"
|
||||
mem10gb: "memory = 10000000000.B"
|
||||
mem20gb: "memory = 20000000000.B"
|
||||
mem50gb: "memory = 50000000000.B"
|
||||
mem100gb: "memory = 100000000000.B"
|
||||
mem200gb: "memory = 200000000000.B"
|
||||
mem500gb: "memory = 500000000000.B"
|
||||
mem1tb: "memory = 1000000000000.B"
|
||||
mem2tb: "memory = 2000000000000.B"
|
||||
mem5tb: "memory = 5000000000000.B"
|
||||
mem10tb: "memory = 10000000000000.B"
|
||||
mem20tb: "memory = 20000000000000.B"
|
||||
mem50tb: "memory = 50000000000000.B"
|
||||
mem100tb: "memory = 100000000000000.B"
|
||||
mem200tb: "memory = 200000000000000.B"
|
||||
mem500tb: "memory = 500000000000000.B"
|
||||
mem1gib: "memory = 1073741824.B"
|
||||
mem2gib: "memory = 2147483648.B"
|
||||
mem4gib: "memory = 4294967296.B"
|
||||
mem8gib: "memory = 8589934592.B"
|
||||
mem16gib: "memory = 17179869184.B"
|
||||
mem32gib: "memory = 34359738368.B"
|
||||
mem64gib: "memory = 68719476736.B"
|
||||
mem128gib: "memory = 137438953472.B"
|
||||
mem256gib: "memory = 274877906944.B"
|
||||
mem512gib: "memory = 549755813888.B"
|
||||
mem1tib: "memory = 1099511627776.B"
|
||||
mem2tib: "memory = 2199023255552.B"
|
||||
mem4tib: "memory = 4398046511104.B"
|
||||
mem8tib: "memory = 8796093022208.B"
|
||||
mem16tib: "memory = 17592186044416.B"
|
||||
mem32tib: "memory = 35184372088832.B"
|
||||
mem64tib: "memory = 70368744177664.B"
|
||||
mem128tib: "memory = 140737488355328.B"
|
||||
mem256tib: "memory = 281474976710656.B"
|
||||
mem512tib: "memory = 562949953421312.B"
|
||||
cpu1: "cpus = 1"
|
||||
cpu2: "cpus = 2"
|
||||
cpu5: "cpus = 5"
|
||||
cpu10: "cpus = 10"
|
||||
cpu20: "cpus = 20"
|
||||
cpu50: "cpus = 50"
|
||||
cpu100: "cpus = 100"
|
||||
cpu200: "cpus = 200"
|
||||
cpu500: "cpus = 500"
|
||||
cpu1000: "cpus = 1000"
|
||||
script:
|
||||
- "includeConfig(\"nextflow_labels.config\")"
|
||||
debug: false
|
||||
container: "docker"
|
||||
engines:
|
||||
- type: "native"
|
||||
id: "native"
|
||||
build_info:
|
||||
config: "src/workflows/multiomics/split_modalities/config.vsh.yaml"
|
||||
runner: "nextflow"
|
||||
engine: "native"
|
||||
output: "target/_private/nextflow/workflows/multiomics/split_modalities"
|
||||
executable: "target/_private/nextflow/workflows/multiomics/split_modalities/main.nf"
|
||||
viash_version: "0.9.4"
|
||||
git_commit: "fb7dc76676aa63d06ae1421bbdd6312ad4f67312"
|
||||
git_remote: "https://github.com/openpipelines-bio/openpipeline"
|
||||
dependencies:
|
||||
- "target/nextflow/dataflow/split_modalities"
|
||||
package_config:
|
||||
name: "openpipeline"
|
||||
version: "v4.0.4"
|
||||
summary: "Best-practice workflows for single-cell multi-omics analyses.\n"
|
||||
description: "OpenPipelines are extensible single cell analysis pipelines for reproducible\
|
||||
\ and large-scale single cell processing using [Viash](https://viash.io) and [Nextflow](https://www.nextflow.io/).\n\
|
||||
\nIn terms of workflows, the following has been made available, but keep in mind\
|
||||
\ that\nindividual tools and functionality can be executed as standalone components\
|
||||
\ as well.\n\n * Demultiplexing: conversion of raw sequencing data to FASTQ objects.\n\
|
||||
\ * Ingestion: Read mapping and generating a count matrix.\n * Single sample\
|
||||
\ processing: cell filtering and doublet detection.\n * Multisample processing:\
|
||||
\ Count transformation, normalization, QC metric calculations.\n * Integration:\
|
||||
\ Clustering, integration and batch correction using single and multimodal methods.\n\
|
||||
\ * Downstream analysis workflows\n"
|
||||
info:
|
||||
test_resources:
|
||||
- type: "s3"
|
||||
path: "s3://openpipelines-data"
|
||||
dest: "resources_test"
|
||||
nextflow_labels_ci:
|
||||
- path: "src/workflows/utils/labels_ci.config"
|
||||
description: "Adds the correct memory and CPU labels when running on the Viash\
|
||||
\ Hub CI."
|
||||
viash_version: "0.9.4"
|
||||
source: "src"
|
||||
target: "target"
|
||||
config_mods:
|
||||
- ".resources += {path: '/src/workflows/utils/labels.config', dest: 'nextflow_labels.config'}\n\
|
||||
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
|
||||
)'"
|
||||
- ".engines += { type: \"native\" }"
|
||||
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
|
||||
- ".engines[.type == 'docker'].target_tag := 'v4.0.4'"
|
||||
keywords:
|
||||
- "single-cell"
|
||||
- "multimodal"
|
||||
license: "MIT"
|
||||
organization: "vsh"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline"
|
||||
docker_registry: "ghcr.io"
|
||||
homepage: "https://openpipelines.bio"
|
||||
documentation: "https://openpipelines.bio/fundamentals"
|
||||
issue_tracker: "https://github.com/openpipelines-bio/openpipeline/issues"
|
||||
@@ -0,0 +1,126 @@
|
||||
manifest {
|
||||
name = 'workflows/multiomics/split_modalities'
|
||||
mainScript = 'main.nf'
|
||||
nextflowVersion = '!>=20.12.1-edge'
|
||||
version = 'v4.0.4'
|
||||
description = 'A pipeline to split a multimodal mudata file into several unimodal mudata files.'
|
||||
author = 'Dries Schaumont'
|
||||
}
|
||||
|
||||
process.container = 'nextflow/bash:latest'
|
||||
|
||||
// detect tempdir
|
||||
tempDir = java.nio.file.Paths.get(
|
||||
System.getenv('NXF_TEMP') ?:
|
||||
System.getenv('VIASH_TEMP') ?:
|
||||
System.getenv('TEMPDIR') ?:
|
||||
System.getenv('TMPDIR') ?:
|
||||
'/tmp'
|
||||
).toAbsolutePath()
|
||||
|
||||
profiles {
|
||||
no_publish {
|
||||
process {
|
||||
withName: '.*' {
|
||||
publishDir = [
|
||||
enabled: false
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
mount_temp {
|
||||
docker.temp = tempDir
|
||||
podman.temp = tempDir
|
||||
charliecloud.temp = tempDir
|
||||
}
|
||||
docker {
|
||||
docker.enabled = true
|
||||
// docker.userEmulation = true
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
singularity {
|
||||
singularity.enabled = true
|
||||
singularity.autoMounts = true
|
||||
docker.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
podman {
|
||||
podman.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
shifter {
|
||||
shifter.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
charliecloud {
|
||||
charliecloud.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
}
|
||||
}
|
||||
|
||||
process{
|
||||
withLabel: mem1gb { memory = 1000000000.B }
|
||||
withLabel: mem2gb { memory = 2000000000.B }
|
||||
withLabel: mem5gb { memory = 5000000000.B }
|
||||
withLabel: mem10gb { memory = 10000000000.B }
|
||||
withLabel: mem20gb { memory = 20000000000.B }
|
||||
withLabel: mem50gb { memory = 50000000000.B }
|
||||
withLabel: mem100gb { memory = 100000000000.B }
|
||||
withLabel: mem200gb { memory = 200000000000.B }
|
||||
withLabel: mem500gb { memory = 500000000000.B }
|
||||
withLabel: mem1tb { memory = 1000000000000.B }
|
||||
withLabel: mem2tb { memory = 2000000000000.B }
|
||||
withLabel: mem5tb { memory = 5000000000000.B }
|
||||
withLabel: mem10tb { memory = 10000000000000.B }
|
||||
withLabel: mem20tb { memory = 20000000000000.B }
|
||||
withLabel: mem50tb { memory = 50000000000000.B }
|
||||
withLabel: mem100tb { memory = 100000000000000.B }
|
||||
withLabel: mem200tb { memory = 200000000000000.B }
|
||||
withLabel: mem500tb { memory = 500000000000000.B }
|
||||
withLabel: mem1gib { memory = 1073741824.B }
|
||||
withLabel: mem2gib { memory = 2147483648.B }
|
||||
withLabel: mem4gib { memory = 4294967296.B }
|
||||
withLabel: mem8gib { memory = 8589934592.B }
|
||||
withLabel: mem16gib { memory = 17179869184.B }
|
||||
withLabel: mem32gib { memory = 34359738368.B }
|
||||
withLabel: mem64gib { memory = 68719476736.B }
|
||||
withLabel: mem128gib { memory = 137438953472.B }
|
||||
withLabel: mem256gib { memory = 274877906944.B }
|
||||
withLabel: mem512gib { memory = 549755813888.B }
|
||||
withLabel: mem1tib { memory = 1099511627776.B }
|
||||
withLabel: mem2tib { memory = 2199023255552.B }
|
||||
withLabel: mem4tib { memory = 4398046511104.B }
|
||||
withLabel: mem8tib { memory = 8796093022208.B }
|
||||
withLabel: mem16tib { memory = 17592186044416.B }
|
||||
withLabel: mem32tib { memory = 35184372088832.B }
|
||||
withLabel: mem64tib { memory = 70368744177664.B }
|
||||
withLabel: mem128tib { memory = 140737488355328.B }
|
||||
withLabel: mem256tib { memory = 281474976710656.B }
|
||||
withLabel: mem512tib { memory = 562949953421312.B }
|
||||
withLabel: cpu1 { cpus = 1 }
|
||||
withLabel: cpu2 { cpus = 2 }
|
||||
withLabel: cpu5 { cpus = 5 }
|
||||
withLabel: cpu10 { cpus = 10 }
|
||||
withLabel: cpu20 { cpus = 20 }
|
||||
withLabel: cpu50 { cpus = 50 }
|
||||
withLabel: cpu100 { cpus = 100 }
|
||||
withLabel: cpu200 { cpus = 200 }
|
||||
withLabel: cpu500 { cpus = 500 }
|
||||
withLabel: cpu1000 { cpus = 1000 }
|
||||
}
|
||||
|
||||
includeConfig("nextflow_labels.config")
|
||||
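The `tempDir` block in this config picks the first non-empty value among `NXF_TEMP`, `VIASH_TEMP`, `TEMPDIR` and `TMPDIR`, and falls back to `/tmp` if none is set. The `mount_temp` profile then mounts that directory into Docker, Podman or Charliecloud containers. The Python sketch below applies the same lookup order to make the precedence explicit. `detect_temp_dir` is a hypothetical helper. It treats empty strings as unset, which mirrors Groovy's truthiness for the `?:` operator.

```python
import os
from pathlib import Path


def detect_temp_dir(environ=None):
    """Return the temp dir Nextflow would pick, using the same precedence as the config."""
    env = os.environ if environ is None else environ
    raw = (
        env.get("NXF_TEMP")
        or env.get("VIASH_TEMP")
        or env.get("TEMPDIR")
        or env.get("TMPDIR")
        or "/tmp"
    )
    # Like Java's toAbsolutePath(): make absolute without resolving symlinks.
    return Path(raw).absolute()
```

Profiles are combined at run time with a comma-separated list, for example `-profile docker,mount_temp`.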
@@ -0,0 +1,48 @@
|
||||
process {
|
||||
// Default resources for components that hardly do any processing
|
||||
memory = { 2.GB * task.attempt }
|
||||
cpus = 1
|
||||
|
||||
// Retry for exit codes that have something to do with memory issues
|
||||
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
|
||||
maxRetries = 3
|
||||
|
||||
// The memory a task is assigned increases with each attempt
|
||||
// uncomment the line below and adjust the value to set a global upper limit on the memory.
|
||||
// resourceLimits = [ memory: 240.Gb ]
|
||||
|
||||
// CPU resources
|
||||
withLabel: singlecpu { cpus = 1 }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midcpu { cpus = 10 }
|
||||
withLabel: highcpu { cpus = 20 }
|
||||
|
||||
// Memory resources
|
||||
withLabel: lowmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 4.GB * task.attempt } }
|
||||
withLabel: midmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 25.GB * task.attempt } }
|
||||
withLabel: highmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 50.GB * task.attempt } }
|
||||
withLabel: veryhighmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 75.GB * task.attempt } }
|
||||
|
||||
// Disk space
|
||||
withLabel: lowdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: middisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: highdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: veryhighdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
|
||||
// NOTE: The above labels intentionally do not have an effect by default.
|
||||
// The user should set the disk space requirements by adding the following
|
||||
// to the compute environment:
|
||||
//
|
||||
// withLabel: lowdisk { disk = { 20.GB * task.attempt } }
|
||||
// withLabel: middisk { disk = { 100.GB * task.attempt } }
|
||||
// withLabel: highdisk { disk = { 200.GB * task.attempt } }
|
||||
// withLabel: veryhighdisk { disk = { 500.GB * task.attempt } }
|
||||
}
|
||||
@@ -0,0 +1 @@
|
||||
process.errorStrategy = 'ignore'
|
||||
@@ -0,0 +1,36 @@
|
||||
profiles {
|
||||
|
||||
// detect tempdir
|
||||
tempDir = java.nio.file.Paths.get(
|
||||
System.getenv('NXF_TEMP') ?:
|
||||
System.getenv('VIASH_TEMP') ?:
|
||||
System.getenv('TEMPDIR') ?:
|
||||
System.getenv('TMPDIR') ?:
|
||||
'/tmp'
|
||||
).toAbsolutePath()
|
||||
|
||||
mount_temp {
|
||||
docker.temp = tempDir
|
||||
podman.temp = tempDir
|
||||
charliecloud.temp = tempDir
|
||||
}
|
||||
|
||||
no_publish {
|
||||
process {
|
||||
withName: '.*' {
|
||||
publishDir = [
|
||||
enabled: false
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
docker {
|
||||
docker.enabled = true
|
||||
// docker.userEmulation = true
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,48 @@
|
||||
process {
|
||||
// Default resources for components that hardly do any processing
|
||||
memory = { 2.GB * task.attempt }
|
||||
cpus = 1
|
||||
|
||||
// Retry for exit codes that have something to do with memory issues
|
||||
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
|
||||
maxRetries = 3
|
||||
|
||||
// The memory a task is assigned increases with each attempt
|
||||
// uncomment the line below and adjust the value to set a global upper limit on the memory.
|
||||
// resourceLimits = [ memory: 240.Gb ]
|
||||
|
||||
// CPU resources
|
||||
withLabel: singlecpu { cpus = 1 }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midcpu { cpus = 10 }
|
||||
withLabel: highcpu { cpus = 20 }
|
||||
|
||||
// Memory resources
|
||||
withLabel: lowmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 4.GB * task.attempt } }
|
||||
withLabel: midmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 25.GB * task.attempt } }
|
||||
withLabel: highmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 50.GB * task.attempt } }
|
||||
withLabel: veryhighmem { memory = { task?.resourceLimits?.memory && task?.maxRetries && task.attempt >= task.maxRetries ? task.resourceLimits.memory : 75.GB * task.attempt } }
|
||||
|
||||
// Disk space
|
||||
withLabel: lowdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: middisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: highdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: veryhighdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
|
||||
// NOTE: The above labels intentionally do not have an effect by default.
|
||||
// The user should set the disk space requirements by adding the following
|
||||
// to the compute environment:
|
||||
//
|
||||
// withLabel: lowdisk { disk = { 20.GB * task.attempt } }
|
||||
// withLabel: middisk { disk = { 100.GB * task.attempt } }
|
||||
// withLabel: highdisk { disk = { 200.GB * task.attempt } }
|
||||
// withLabel: veryhighdisk { disk = { 500.GB * task.attempt } }
|
||||
}
|
||||
@@ -0,0 +1,33 @@
|
||||
process {
|
||||
withLabel: lowmem { memory = 13.Gb }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midmem { memory = 13.Gb }
|
||||
withLabel: midcpu { cpus = 4 }
|
||||
withLabel: highmem { memory = 13.Gb }
|
||||
withLabel: highcpu { cpus = 4 }
|
||||
withLabel: veryhighmem { memory = 13.Gb }
|
||||
withLabel: lowdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: middisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: highdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: veryhighdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
}
|
||||
|
||||
env.NUMBA_CACHE_DIR = '/tmp'
|
||||
|
||||
trace {
|
||||
enabled = true
|
||||
overwrite = true
|
||||
}
|
||||
dag {
|
||||
overwrite = true
|
||||
}
|
||||
|
||||
process.maxForks = 1
|
||||
@@ -0,0 +1,250 @@
|
||||
name: "log_normalize"
|
||||
namespace: "workflows/rna"
|
||||
version: "v4.0.4"
|
||||
authors:
|
||||
- name: "Dries Schaumont"
|
||||
roles:
|
||||
- "author"
|
||||
info:
|
||||
role: "Core Team Member"
|
||||
links:
|
||||
email: "dries@data-intuitive.com"
|
||||
github: "DriesSchaumont"
|
||||
orcid: "0000-0002-4389-0440"
|
||||
linkedin: "dries-schaumont"
|
||||
organizations:
|
||||
- name: "Data Intuitive"
|
||||
href: "https://www.data-intuitive.com"
|
||||
role: "Data Scientist"
|
||||
argument_groups:
|
||||
- name: "Inputs"
|
||||
arguments:
|
||||
- type: "file"
|
||||
name: "--input"
|
||||
description: "MuData file to transform."
|
||||
info: null
|
||||
example:
|
||||
- "dataset.h5mu"
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--modality"
|
||||
description: "Modality to process."
|
||||
info: null
|
||||
default:
|
||||
- "rna"
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "string"
|
||||
name: "--layer"
|
||||
description: "Input layer containing raw counts. If not specified, .X is used."
|
||||
info: null
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- name: "Transformation options"
|
||||
arguments:
|
||||
- type: "integer"
|
||||
name: "--target_sum"
|
||||
description: "Normalize total counts to the specified amount. If not set, after\
|
||||
\ normalization each observation (cell) \nwill have a total count equal to the\
|
||||
\ median of total counts for observations (cells) before normalization.\n"
|
||||
info: null
|
||||
required: false
|
||||
min: 1
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- name: "Output slots"
|
||||
arguments:
|
||||
- type: "string"
|
||||
name: "--output_layer"
|
||||
description: "Layer to write the log-transformed counts to.\n"
|
||||
info: null
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- name: "Output"
|
||||
arguments:
|
||||
- type: "file"
|
||||
name: "--output"
|
||||
description: "Destination path to the output."
|
||||
info: null
|
||||
example:
|
||||
- "output.h5mu"
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: true
|
||||
direction: "output"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
resources:
|
||||
- type: "nextflow_script"
|
||||
path: "main.nf"
|
||||
is_executable: true
|
||||
entrypoint: "run_wf"
|
||||
- type: "file"
|
||||
path: "utils"
|
||||
- type: "file"
|
||||
path: "nextflow_labels.config"
|
||||
dest: "nextflow_labels.config"
|
||||
description: "Performs normalization and subsequent log-transformation of raw count\
|
||||
\ data."
|
||||
test_resources:
|
||||
- type: "nextflow_script"
|
||||
path: "test.nf"
|
||||
is_executable: true
|
||||
entrypoint: "test_wf"
|
||||
- type: "file"
|
||||
path: "pbmc_1k_protein_v3"
|
||||
info: null
|
||||
status: "enabled"
|
||||
scope:
|
||||
image: "private"
|
||||
target: "private"
|
||||
dependencies:
|
||||
- name: "transform/normalize_total"
|
||||
repository:
|
||||
type: "local"
|
||||
- name: "transform/log1p"
|
||||
repository:
|
||||
type: "local"
|
||||
- name: "transform/delete_layer"
|
||||
repository:
|
||||
type: "local"
|
||||
license: "MIT"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline"
|
||||
docker_registry: "ghcr.io"
|
||||
runners:
|
||||
- type: "nextflow"
|
||||
id: "nextflow"
|
||||
directives:
|
||||
tag: "$id"
|
||||
auto:
|
||||
simplifyInput: true
|
||||
simplifyOutput: false
|
||||
transcript: false
|
||||
publish: false
|
||||
config:
|
||||
labels:
|
||||
mem1gb: "memory = 1000000000.B"
|
||||
mem2gb: "memory = 2000000000.B"
|
||||
mem5gb: "memory = 5000000000.B"
|
||||
mem10gb: "memory = 10000000000.B"
|
||||
mem20gb: "memory = 20000000000.B"
|
||||
mem50gb: "memory = 50000000000.B"
|
||||
mem100gb: "memory = 100000000000.B"
|
||||
mem200gb: "memory = 200000000000.B"
|
||||
mem500gb: "memory = 500000000000.B"
|
||||
mem1tb: "memory = 1000000000000.B"
|
||||
mem2tb: "memory = 2000000000000.B"
|
||||
mem5tb: "memory = 5000000000000.B"
|
||||
mem10tb: "memory = 10000000000000.B"
|
||||
mem20tb: "memory = 20000000000000.B"
|
||||
mem50tb: "memory = 50000000000000.B"
|
||||
mem100tb: "memory = 100000000000000.B"
|
||||
mem200tb: "memory = 200000000000000.B"
|
||||
mem500tb: "memory = 500000000000000.B"
|
||||
mem1gib: "memory = 1073741824.B"
|
||||
mem2gib: "memory = 2147483648.B"
|
||||
mem4gib: "memory = 4294967296.B"
|
||||
mem8gib: "memory = 8589934592.B"
|
||||
mem16gib: "memory = 17179869184.B"
|
||||
mem32gib: "memory = 34359738368.B"
|
||||
mem64gib: "memory = 68719476736.B"
|
||||
mem128gib: "memory = 137438953472.B"
|
||||
mem256gib: "memory = 274877906944.B"
|
||||
mem512gib: "memory = 549755813888.B"
|
||||
mem1tib: "memory = 1099511627776.B"
|
||||
mem2tib: "memory = 2199023255552.B"
|
||||
mem4tib: "memory = 4398046511104.B"
|
||||
mem8tib: "memory = 8796093022208.B"
|
||||
mem16tib: "memory = 17592186044416.B"
|
||||
mem32tib: "memory = 35184372088832.B"
|
||||
mem64tib: "memory = 70368744177664.B"
|
||||
mem128tib: "memory = 140737488355328.B"
|
||||
mem256tib: "memory = 281474976710656.B"
|
||||
mem512tib: "memory = 562949953421312.B"
|
||||
cpu1: "cpus = 1"
|
||||
cpu2: "cpus = 2"
|
||||
cpu5: "cpus = 5"
|
||||
cpu10: "cpus = 10"
|
||||
cpu20: "cpus = 20"
|
||||
cpu50: "cpus = 50"
|
||||
cpu100: "cpus = 100"
|
||||
cpu200: "cpus = 200"
|
||||
cpu500: "cpus = 500"
|
||||
cpu1000: "cpus = 1000"
|
||||
script:
|
||||
- "includeConfig(\"nextflow_labels.config\")"
|
||||
debug: false
|
||||
container: "docker"
|
||||
engines:
|
||||
- type: "native"
|
||||
id: "native"
|
||||
build_info:
|
||||
config: "src/workflows/rna/log_normalize/config.vsh.yaml"
|
||||
runner: "nextflow"
|
||||
engine: "native"
|
||||
output: "target/_private/nextflow/workflows/rna/log_normalize"
|
||||
executable: "target/_private/nextflow/workflows/rna/log_normalize/main.nf"
|
||||
viash_version: "0.9.4"
|
||||
git_commit: "fb7dc76676aa63d06ae1421bbdd6312ad4f67312"
|
||||
git_remote: "https://github.com/openpipelines-bio/openpipeline"
|
||||
dependencies:
|
||||
- "target/nextflow/transform/normalize_total"
|
||||
- "target/nextflow/transform/log1p"
|
||||
- "target/nextflow/transform/delete_layer"
|
||||
package_config:
|
||||
name: "openpipeline"
|
||||
version: "v4.0.4"
|
||||
summary: "Best-practice workflows for single-cell multi-omics analyses.\n"
|
||||
description: "OpenPipelines are extensible single cell analysis pipelines for reproducible\
|
||||
\ and large-scale single cell processing using [Viash](https://viash.io) and [Nextflow](https://www.nextflow.io/).\n\
|
||||
\nIn terms of workflows, the following has been made available, but keep in mind\
|
||||
\ that\nindividual tools and functionality can be executed as standalone components\
|
||||
\ as well.\n\n * Demultiplexing: conversion of raw sequencing data to FASTQ objects.\n\
|
||||
\ * Ingestion: Read mapping and generating a count matrix.\n * Single sample\
|
||||
\ processing: cell filtering and doublet detection.\n * Multisample processing:\
|
||||
\ Count transformation, normalization, QC metric calculations.\n * Integration:\
|
||||
\ Clustering, integration and batch correction using single and multimodal methods.\n\
|
||||
\ * Downstream analysis workflows\n"
|
||||
info:
|
||||
test_resources:
|
||||
- type: "s3"
|
||||
path: "s3://openpipelines-data"
|
||||
dest: "resources_test"
|
||||
nextflow_labels_ci:
|
||||
- path: "src/workflows/utils/labels_ci.config"
|
||||
description: "Adds the correct memory and CPU labels when running on the Viash\
|
||||
\ Hub CI."
|
||||
viash_version: "0.9.4"
|
||||
source: "src"
|
||||
target: "target"
|
||||
config_mods:
|
||||
- ".resources += {path: '/src/workflows/utils/labels.config', dest: 'nextflow_labels.config'}\n\
|
||||
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
|
||||
)'"
|
||||
- ".engines += { type: \"native\" }"
|
||||
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
|
||||
- ".engines[.type == 'docker'].target_tag := 'v4.0.4'"
|
||||
keywords:
|
||||
- "single-cell"
|
||||
- "multimodal"
|
||||
license: "MIT"
|
||||
organization: "vsh"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline"
|
||||
docker_registry: "ghcr.io"
|
||||
homepage: "https://openpipelines.bio"
|
||||
documentation: "https://openpipelines.bio/fundamentals"
|
||||
issue_tracker: "https://github.com/openpipelines-bio/openpipeline/issues"
|
||||
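The dependencies of the `log_normalize` workflow include `transform/normalize_total` and `transform/log1p`. According to the `--target_sum` description, each cell's counts are scaled so that they sum to `target_sum`. If `target_sum` is unset, they are scaled to the median of the per-cell totals. The result is then log-transformed with log(1 + x). The pure-Python sketch below works through that arithmetic on a tiny dense matrix. It is not the components' implementation. Skipping all-zero cells, both when scaling and when taking the median, is an assumption made here to avoid division by zero.

```python
import math
from statistics import median


def log_normalize(counts, target_sum=None):
    """Scale each row (cell) to a common total, then apply log1p element-wise."""
    totals = [sum(row) for row in counts]
    if target_sum is None:
        # Default target: median of the non-zero per-cell totals (zero handling is an assumption).
        target_sum = median(t for t in totals if t > 0)
    normalized = []
    for row, total in zip(counts, totals):
        # All-zero cells stay at zero instead of dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(x * scale) for x in row])
    return normalized
```

With `target_sum=10`, the cell `[1, 3]` becomes `[2.5, 7.5]` before the log step. Applying `math.expm1` to the output recovers a row sum of 10.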
@@ -0,0 +1,126 @@
manifest {
  name = 'workflows/rna/log_normalize'
  mainScript = 'main.nf'
  nextflowVersion = '!>=20.12.1-edge'
  version = 'v4.0.4'
  description = 'Performs normalization and subsequent log-transformation of raw count data.'
  author = 'Dries Schaumont'
}

process.container = 'nextflow/bash:latest'

// detect tempdir
tempDir = java.nio.file.Paths.get(
  System.getenv('NXF_TEMP') ?:
    System.getenv('VIASH_TEMP') ?:
    System.getenv('TEMPDIR') ?:
    System.getenv('TMPDIR') ?:
    '/tmp'
).toAbsolutePath()

profiles {
  no_publish {
    process {
      withName: '.*' {
        publishDir = [
          enabled: false
        ]
      }
    }
  }
  mount_temp {
    docker.temp = tempDir
    podman.temp = tempDir
    charliecloud.temp = tempDir
  }
  docker {
    docker.enabled = true
    // docker.userEmulation = true
    singularity.enabled = false
    podman.enabled = false
    shifter.enabled = false
    charliecloud.enabled = false
  }
  singularity {
    singularity.enabled = true
    singularity.autoMounts = true
    docker.enabled = false
    podman.enabled = false
    shifter.enabled = false
    charliecloud.enabled = false
  }
  podman {
    podman.enabled = true
    docker.enabled = false
    singularity.enabled = false
    shifter.enabled = false
    charliecloud.enabled = false
  }
  shifter {
    shifter.enabled = true
    docker.enabled = false
    singularity.enabled = false
    podman.enabled = false
    charliecloud.enabled = false
  }
  charliecloud {
    charliecloud.enabled = true
    docker.enabled = false
    singularity.enabled = false
    podman.enabled = false
    shifter.enabled = false
  }
}

process{
  withLabel: mem1gb { memory = 1000000000.B }
  withLabel: mem2gb { memory = 2000000000.B }
  withLabel: mem5gb { memory = 5000000000.B }
  withLabel: mem10gb { memory = 10000000000.B }
  withLabel: mem20gb { memory = 20000000000.B }
  withLabel: mem50gb { memory = 50000000000.B }
  withLabel: mem100gb { memory = 100000000000.B }
  withLabel: mem200gb { memory = 200000000000.B }
  withLabel: mem500gb { memory = 500000000000.B }
  withLabel: mem1tb { memory = 1000000000000.B }
  withLabel: mem2tb { memory = 2000000000000.B }
  withLabel: mem5tb { memory = 5000000000000.B }
  withLabel: mem10tb { memory = 10000000000000.B }
  withLabel: mem20tb { memory = 20000000000000.B }
  withLabel: mem50tb { memory = 50000000000000.B }
  withLabel: mem100tb { memory = 100000000000000.B }
  withLabel: mem200tb { memory = 200000000000000.B }
  withLabel: mem500tb { memory = 500000000000000.B }
  withLabel: mem1gib { memory = 1073741824.B }
  withLabel: mem2gib { memory = 2147483648.B }
  withLabel: mem4gib { memory = 4294967296.B }
  withLabel: mem8gib { memory = 8589934592.B }
  withLabel: mem16gib { memory = 17179869184.B }
  withLabel: mem32gib { memory = 34359738368.B }
  withLabel: mem64gib { memory = 68719476736.B }
  withLabel: mem128gib { memory = 137438953472.B }
  withLabel: mem256gib { memory = 274877906944.B }
  withLabel: mem512gib { memory = 549755813888.B }
  withLabel: mem1tib { memory = 1099511627776.B }
  withLabel: mem2tib { memory = 2199023255552.B }
  withLabel: mem4tib { memory = 4398046511104.B }
  withLabel: mem8tib { memory = 8796093022208.B }
  withLabel: mem16tib { memory = 17592186044416.B }
  withLabel: mem32tib { memory = 35184372088832.B }
  withLabel: mem64tib { memory = 70368744177664.B }
  withLabel: mem128tib { memory = 140737488355328.B }
  withLabel: mem256tib { memory = 281474976710656.B }
  withLabel: mem512tib { memory = 562949953421312.B }
  withLabel: cpu1 { cpus = 1 }
  withLabel: cpu2 { cpus = 2 }
  withLabel: cpu5 { cpus = 5 }
  withLabel: cpu10 { cpus = 10 }
  withLabel: cpu20 { cpus = 20 }
  withLabel: cpu50 { cpus = 50 }
  withLabel: cpu100 { cpus = 100 }
  withLabel: cpu200 { cpus = 200 }
  withLabel: cpu500 { cpus = 500 }
  withLabel: cpu1000 { cpus = 1000 }
}

includeConfig("nextflow_labels.config")
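The `tempDir` expression in the `nextflow.config` above takes the first of `NXF_TEMP`, `VIASH_TEMP`, `TEMPDIR` and `TMPDIR` that is set to a non-empty value (Groovy's Elvis operator `?:` treats both `null` and the empty string as false), falls back to `/tmp` otherwise, and then makes the path absolute; the `mount_temp` profile mounts that directory as the container temp dir. The Python sketch below mirrors that resolution order so you can check which directory would be picked in a given environment. The helper name `resolve_temp_dir` is hypothetical and is not part of Nextflow or Viash.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Candidate variables, in the order the Nextflow config checks them.
TEMP_VARS = ("NXF_TEMP", "VIASH_TEMP", "TEMPDIR", "TMPDIR")


def resolve_temp_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper mirroring the Elvis chain used for tempDir.

    Unset and empty variables are skipped (both are falsy for `?:`),
    and '/tmp' is the final fallback. The result is made absolute but,
    like Java's Path.toAbsolutePath(), it is not normalized.
    """
    env = os.environ if env is None else env
    for name in TEMP_VARS:
        value = env.get(name)
        if value:  # skips None (unset) and "" (empty)
            return Path(value).absolute()
    return Path("/tmp").absolute()


if __name__ == "__main__":
    # Show what the current shell environment would resolve to.
    print(resolve_temp_dir())
```

For example, with only `TMPDIR=/scratch` set the sketch resolves to `/scratch`, and an empty `VIASH_TEMP` is skipped in favour of the next non-empty variable, matching the Groovy semantics.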
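The `withLabel: mem…` selectors in the `nextflow.config` above mix two unit systems: labels ending in `gb`/`tb` use decimal multiples (1 GB = 10^9 bytes, 1 TB = 10^12 bytes), while labels ending in `gib`/`tib` use binary multiples (1 GiB = 2^30 bytes, 1 TiB = 2^40 bytes). For instance, `mem16gib` is 16 × 2^30 = 17179869184 bytes, roughly 7% more than 16 × 10^9. The Python sketch below converts a label name into the byte count written in the config, which can help when checking or extending the table; `label_to_bytes` and its regular expression are hypothetical helpers, not part of Nextflow or Viash.

```python
import re

# Byte multipliers for the unit suffixes used by the mem* labels.
UNITS = {
    "gb": 10**9,    # decimal gigabyte
    "tb": 10**12,   # decimal terabyte
    "gib": 2**30,   # binary gibibyte
    "tib": 2**40,   # binary tebibyte
}

# e.g. "mem16gib" -> amount "16", unit "gib"
LABEL_RE = re.compile(r"^mem(\d+)(gb|tb|gib|tib)$")


def label_to_bytes(label: str) -> int:
    """Hypothetical helper: translate a memory label into bytes."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not a memory label: {label!r}")
    amount, unit = match.groups()
    return int(amount) * UNITS[unit]


if __name__ == "__main__":
    for label in ("mem1gb", "mem16gib", "mem5tb", "mem512tib"):
        print(f"{label}: {label_to_bytes(label)} B")
```

Each result can be compared against the corresponding `memory = ….B` line; for example `mem512tib` gives 562949953421312, the value used in the config.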
Some files were not shown because too many files have changed in this diff