Build branch openpipeline_spatial/fix-unit-tests with version fix-unit-tests to openpipeline_spatial on branch fix-unit-tests (1bcdc62)
Build pipeline: openpipelines-bio.openpipeline-spatial.fix-unit-tests-cpsxf
Source commit: 1bcdc62a71
Source message: update changelog
63
.gitignore
vendored
Normal file
@@ -0,0 +1,63 @@
# IDEs and editors
/.idea
.project
.classpath
*.launch
.settings/
.vscode

# Temp
gitignore
test_results

# System Files
.DS_Store
Thumbs.db

# Nextflow
work
.nextflow*

# viash
check_results/
out/
output*
output_log/
resources_test
/viash_tools/
/test/

# jupyter notebook
/.ipynb_checkpoints/
*.ipynb

# compress
/__MACOSX/

# python
*__pycache__*

# Python virtual environments
.venv

# temporary files related
temp

# NextFlow
work/
.nextflow.log
.nextflow*
out/
trace*.txt

# Macos
.DS_Store

# vscode
.vscode/launch.json
.vscode/settings.json

# linting
renv.lock
.Rprofile
renv/
24
.pre-commit-config.yaml
Normal file
@@ -0,0 +1,24 @@

repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    # Ruff version.
    rev: v0.14.0
    hooks:
      - id: ruff-check
        args: [ --fix ]
      - id: ruff-format
  - repo: local
    hooks:
      - id: run_styler
        name: run_styler
        language: r
        description: style files with {styler}
        entry: "Rscript -e 'styler::style_file(commandArgs(TRUE))'"
        files: '(\.[rR]profile|\.[rR]|\.[rR]md|\.[rR]nw|\.[qQ]md)$'
        additional_dependencies:
          - styler
          - knitr
  - repo: https://github.com/lorenzwalthert/precommit
    rev: v0.4.3.9015
    hooks:
      - id: lintr
63
CHANGELOG.md
Normal file
@@ -0,0 +1,63 @@
# openpipeline_spatial 0.2.0

## NEW FUNCTIONALITY

* `neighbors/spatial_neighborhood_graph`: Calculate the spatial neighborhood graph (PR #29).

* `convert/from_spaceranger_to_h5mu`: Added converter component to convert Spaceranger output to H5MU files (PR #33).

* `workflows/ingestion/spaceranger_mapping`: Added a workflow to ingest Visium data using Spaceranger and convert the count matrix to an H5MU file (PR #33).

## MINOR CHANGES

* Add `scope` to component and workflow configurations (PR #22).

* `convert/from_xenium_to_spatialexperiment`: Add arrow with zstd codec support to handle I/O of zstd-compressed Xenium parquet files (PR #30).

* `mapping/spaceranger_count`: Allow providing individual FASTQ files instead of directories (PR #32).

* Bump anndata to 0.12.7 and mudata to 0.3.2 (PR #34).

* Bump spatialdata to 0.6.1 and spatialdata-io to 0.5.1 (PR #24, #34).

* Bump squidpy to 1.7.0 (PR #36).

## BUG FIXES

* `convert/from_cosmx_to_h5mu`: Fixed an issue where parent directories of the CosMx output bundle were duplicated when reading in data (PR #25).

* `mapping/spaceranger_count`: Fixed an issue where long temporary folder paths caused write failures (PR #31).

# openpipeline_spatial 0.1.1

## MINOR CHANGES

* Add a README (PR #21).

## NEW FUNCTIONALITY

* `convert`: Updated multiple components to accept spatial output bundles in .zip format (for CosMx, Xenium and Aviti) as input (PR #19, PR #20).

* `convert/from_cosmx_to_h5mu`: Updated component to handle CosMx output bundles generated with AtoMx SIP versions < v1.3.2 (PR #25).

# openpipeline_spatial 0.1.0

## NEW FUNCTIONALITY

* `filter/subset_cosmx`: Added a component to subset CosMx data (PR #3, PR #9).

* `convert/from_cosmx_to_h5mu`: Added converter component for CosMx data (PR #3, PR #9).

* `mapping/spaceranger_count`: Added a spaceranger count component (PR #2).

* `convert/from_spatialdata_to_h5mu`, `convert/from_xenium_to_spatialdata`, `convert/from_xenium_to_h5mu`: Added converter components for Xenium data (PR #1, #10).

* `convert/from_xenium_to_spatialexperiment`, `convert/from_cosmx_to_spatialexperiment`: Added converter components for Xenium or CosMx data to SpatialExperiment objects (PR #9).

* `convert/from_cells2stats_to_h5mu`: Added a component to convert data resulting from Aviti Teton sequencers processed by Cells2Stats into an H5MU file (PR #15).

* `workflows/qc/qc`: Added a pipeline for calculating QC metrics of spatial omics samples (PR #5).

* `workflows/multiomics/spatial_process_samples`: Added a pipeline to pre-process multiple spatial omics samples (PR #7).

* `convert/from_h5mu_to_spatialexperiment`: Added converter component for H5MU data to SpatialExperiment objects (PR #15).
21
LICENSE
Normal file
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 openpipelines-bio

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
55
README.md
Normal file
@@ -0,0 +1,55 @@
# OpenPipeline Spatial

Extensible spatial single cell analysis pipelines for reproducible and large-scale spatial single cell processing using Viash and Nextflow.

OpenPipeline Spatial extends the [OpenPipeline](https://github.com/openpipelines-bio/openpipeline/) ecosystem with specialized workflows and components for spatial transcriptomics analysis. It provides standardized, reproducible pipelines that are technology-agnostic and can be used for processing spatial omics data from various technologies and platforms.

[Viash Hub](https://www.viash-hub.com/packages/openpipeline_spatial)
[GitHub](https://github.com/openpipelines-bio/openpipeline_spatial)
[License](https://github.com/openpipelines-bio/openpipeline_spatial/blob/main/LICENSE)
[Issues](https://github.com/openpipelines-bio/openpipeline_spatial/issues)
[Viash](https://viash.io)

## Functionality

OpenPipeline Spatial executes a list of predefined tasks specifically designed for spatial omics data. These discrete steps are also provided as standalone components that can be executed individually with a standardized interface.

The following spatial-specific workflows are provided:

- [Ingestion](https://www.viash-hub.com/packages/openpipeline_spatial/latest/components?search=mapping): Although many technologies generate count matrices on-instrument, functionality is provided for mapping and quantifying 10X Visium data.
- [Interoperability](https://www.viash-hub.com/packages/openpipeline_spatial/latest/components?search=convert): To make sure all spatial workflows are technology-agnostic, functionality is provided to convert count matrices from different technologies (e.g. Xenium, CosMx, AtoMx, Aviti) into a common format (H5MU). In addition, functionality is provided to convert between various spatial data formats (e.g. Seurat, SpatialExperiment, MuData, SpatialData).
- [QC](https://www.viash-hub.com/packages/openpipeline_spatial/latest/components?search=spatial_qc): Calculation of comprehensive quality control metrics.
- [Sample Processing](https://www.viash-hub.com/packages/openpipeline_spatial/latest/components?search=spatial_process_samples): Batch processing of multiple spatial samples, including count-based filtering, normalisation and dimensionality reduction.
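
As a rough illustration of the interoperability step, the sketch below maps a technology to the converter component that produces H5MU output. The component names are taken from the changelog; the lookup table, its keys, the `converter_config` helper and the `src/<component>/config.vsh.yaml` path layout for every entry are assumptions made for this example and are not part of the package interface.

```python
# Hypothetical lookup table: the component names appear in the changelog,
# but the technology keys and this mapping are illustrative assumptions.
CONVERTERS = {
    "xenium": "convert/from_xenium_to_h5mu",
    "cosmx": "convert/from_cosmx_to_h5mu",
    "aviti": "convert/from_cells2stats_to_h5mu",
    "visium": "convert/from_spaceranger_to_h5mu",
}


def converter_config(technology: str) -> str:
    """Return the assumed Viash config path of a technology's H5MU converter."""
    key = technology.strip().lower()
    if key not in CONVERTERS:
        raise ValueError(f"No H5MU converter known for {technology!r}")
    # Assumed layout: src/<namespace>/<component>/config.vsh.yaml
    return f"src/{CONVERTERS[key]}/config.vsh.yaml"


if __name__ == "__main__":
    for tech in CONVERTERS:
        print(tech, "->", converter_config(tech))
```

Because every converter writes the same H5MU format, downstream steps would not need to know which technology produced the data.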

## Extended functionality

While this package provides only spatial-specific functionality, it is designed to work seamlessly with the core [OpenPipeline package](https://github.com/openpipelines-bio/openpipeline/). This means that all core OpenPipeline workflows and components can be used in conjunction with the spatial-specific ones. For example, the [**integration**](https://www.viash-hub.com/packages/openpipeline/latest/components?search=workflows%2Fintegration) and [**cell type annotation**](https://www.viash-hub.com/packages/openpipeline/latest/components?search=workflows%2Fannotation) workflows can be applied to spatial data after it has been processed using the spatial-specific workflows.

```mermaid
flowchart LR
  demultiplexing["Step 1: Ingestion"]
  ingestion["Step 2: QC"]
  process_samples["Step 3: Process Samples"]
  integration["Step 4: Integration"]
  downstream["Step 5: Downstream Analysis"]
  demultiplexing-->ingestion-->process_samples-->integration-->downstream
```

## Execution via CLI or Seqera Cloud

The openpipeline_spatial package is available via [Viash
Hub](https://www.viash-hub.com/packages/openpipeline_spatial/latest/), where
you can find instructions on how to run the end-to-end workflow as
well as individual subworkflows or components.

It’s possible to run the workflow directly from Seqera Cloud. The necessary Nextflow schema files have been [built and provided with the workflows](https://packages.viash-hub.com/vsh/openpipeline_spatial/-/tree/build/main/target/nextflow?ref_type=heads) in order to use the form-based input. However, Seqera Cloud cannot handle multiple-value parameters for batch processing of multiple samples. Therefore, it is better to launch the workflow on Seqera Cloud through Viash Hub as well.

* Navigate to the [Viash Hub package page](https://www.viash-hub.com/packages/openpipeline_spatial/latest/), select the workflow you want to launch and click the `launch` button.
* Select the execution environment of choice (e.g. `Seqera Cloud`, `CLI` or `Executable`).
* Fill in the form with the required parameters and launch the workflow.
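
For local runs from a checkout of the repository, the test resource scripts invoke components as `viash run <config> -- --<argument> <value>`. The sketch below assembles such a command in Python. The `build_viash_command` helper is hypothetical, and the paths are illustrative values modelled on `resources_test_scripts/aviti_teton_tiny.sh`; the instructions on Viash Hub remain the reference for how to launch workflows.

```python
import shlex


def build_viash_command(config_path, **params):
    """Assemble a `viash run` command line (hypothetical helper).

    Everything after the bare `--` is passed to the component itself,
    mirroring the calls in the resources_test_scripts directory.
    """
    cmd = ["viash", "run", config_path, "--"]
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    return cmd


# Illustrative paths; adjust them to your own checkout and data.
cmd = build_viash_command(
    "src/convert/from_cells2stats_to_h5mu/config.vsh.yaml",
    input="resources_test/aviti/teton_cells2stats_tiny",
    output="resources_test/aviti/aviti_teton_tiny.h5mu",
    output_compression="gzip",
)

# Print a copy-pasteable shell command instead of executing it.
print(shlex.join(cmd))
```

Returning the command as a list keeps the arguments unambiguous if it is later passed to `subprocess.run`, while `shlex.join` is only used here for display.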

## Support
For issues specific to spatial analysis, please use the [GitHub issues tracker](https://github.com/openpipelines-bio/openpipeline_spatial/issues). For general OpenPipeline questions, refer to the main [OpenPipeline documentation](https://openpipelines.bio/).
22
_viash.yaml
Normal file
@@ -0,0 +1,22 @@
viash_version: 0.9.4
source: src
target: target
name: openpipeline_spatial
organization: vsh
links:
  repository: https://github.com/openpipelines-bio/openpipeline_spatial
  docker_registry: ghcr.io
repositories:
  - name: openpipeline
    repo: openpipeline
    type: vsh
    tag: v3.0.0
info:
  test_resources:
    - type: s3
      path: s3://openpipelines-bio/openpipeline_spatial/resources_test
      dest: resources_test
config_mods: |-
  .resources += {path: '/src/workflows/utils/labels.config', dest: 'nextflow_labels.config'}
  .runners[.type == 'nextflow'].config.script := 'includeConfig("nextflow_labels.config")'
version: fix-unit-tests
0
nextflow.config
Normal file
116
resources_test_scripts/aviti_teton_tiny.sh
Normal file
@@ -0,0 +1,116 @@
#!/bin/bash

set -eo pipefail

# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)

# ensure that the command below is run from the root of the repository
cd "$REPO_ROOT"

ID=aviti
DIR=resources_test/$ID/
OUT=$DIR/teton_cells2stats_tiny/

# Create directories
[ -d "$DIR" ] || mkdir -p "$DIR"
[ -d "$OUT" ] || mkdir -p "$OUT"

echo "> Downloading Aviti Teton data"
wget "https://go.elementbiosciences.com/l/938263/28kddnj7/d59cp" -O "${DIR}/PLUT-0105.tar.gz"
tar -xzf "${DIR}/PLUT-0105.tar.gz" -C "$DIR"
rm "${DIR}/PLUT-0105.tar.gz"

echo "> Processing and subsetting Aviti Teton data"
python <<HEREDOC
import os
import shutil
import pandas as pd
import glob
import json

src_dir = "${DIR}/PLUT-0105"
dest_dir = "${OUT}"
subset_image_dirs = False
wells_to_keep = ["A1"]
max_cells_per_well = 1000

os.makedirs(dest_dir, exist_ok=True)

print(f"Processing data from {src_dir} to {dest_dir}")

# Copy images
if subset_image_dirs:
    image_dirs = ["CellSegmentation", "Projection"]
    for image_dir in image_dirs:
        image_dir_path = os.path.join(src_dir, image_dir)
        if not os.path.exists(image_dir_path):
            print(f"Warning: Image directory not found: {image_dir_path}")
            continue
        if not os.path.isdir(image_dir_path):
            print(f"Warning: Path exists but is not a directory: {image_dir_path}")
            continue
        print(f"Processing image directory: {image_dir}")

        for well in wells_to_keep:
            dest_path = f"{dest_dir}/{image_dir}/Well{well}"
            os.makedirs(dest_path, exist_ok=True)
            src_path = glob.glob(os.path.join(src_dir, image_dir, f"Well{well}"))
            if len(src_path) != 1:
                print(f"Warning: Expected 1 path for Well{well}, found {len(src_path)}")
                continue
            shutil.copytree(src_path[0], os.path.join(dest_path), dirs_exist_ok=True)

# Copy count matrix
src_path = os.path.join(src_dir, "Cytoprofiling", "Instrument", "RawCellStats.parquet")
if os.path.exists(src_path):
    print(f"Processing count matrix: {src_path}")
    df = pd.read_parquet(src_path)
    print(f"Original data: {len(df)} rows")

    # Filter by wells
    df = df[df["Well"].isin(wells_to_keep)]
    print(f"After well filtering: {len(df)} rows")

    if max_cells_per_well:
        # Limit the number of cells per well (first N rows of each well)
        df = df.groupby("Well", observed=True).head(max_cells_per_well)
        print(f"After cell limit: {len(df)} rows")

    dest_path = os.path.join(dest_dir, "Cytoprofiling", "Instrument")
    os.makedirs(dest_path, exist_ok=True)
    dest_file = os.path.join(dest_path, "RawCellStats.parquet")
    df.to_parquet(dest_file, engine="pyarrow")
    print(f"Saved processed count matrix to {dest_file}")
else:
    print(f"Warning: Count matrix not found at {src_path}")

# Copy Panel Metadata
panel_src_path = os.path.join(src_dir, "Panel.json")
if os.path.exists(panel_src_path):
    panel_dest_path = os.path.join(dest_dir, "Panel.json")
    shutil.copy2(panel_src_path, panel_dest_path)
    print("Copied Panel.json")
else:
    print(f"Warning: Panel.json not found at {panel_src_path}")
print("Processing complete!")
HEREDOC

echo "> Removing original aviti_teton folder"
rm -rf "$DIR/PLUT-0105"

echo "> Aviti Teton tiny dataset created successfully at $OUT"

viash run src/convert/from_cells2stats_to_h5mu/config.vsh.yaml -- \
  --input "${OUT}" \
  --output "$DIR/aviti_teton_tiny.h5mu" \
  --output_compression "gzip"

echo "> Conversion to H5MU complete"

aws s3 sync \
  --profile di \
  "$DIR" \
  s3://openpipelines-bio/openpipeline_spatial/resources_test/aviti \
  --delete \
  --dryrun
52
resources_test_scripts/cosmx_tiny.sh
Executable file
@@ -0,0 +1,52 @@
#!/bin/bash

set -eo pipefail

# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)

# ensure that the command below is run from the root of the repository
cd "$REPO_ROOT"

DIR="resources_test/cosmx"
ID="Lung5_Rep2"
OUT="$DIR/$ID/"

# create tempdir
MY_TEMP="${VIASH_TEMP:-/tmp}"
TMPDIR=$(mktemp -d "$MY_TEMP/$ID-XXXXXX")
function clean_up {
  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
}
trap clean_up EXIT

if [ ! -d "$OUT" ]; then
  flat_dataset="https://nanostring-public-share.s3.us-west-2.amazonaws.com/SMI-Compressed/Lung5_Rep2/Lung5_Rep2+SMI+Flat+data.tar.gz"
  wget "$flat_dataset" -O "$TMPDIR/Lung5_Rep2.tar.gz"
  mkdir -p "$TMPDIR/Lung5_Rep2"
  tar -xzf "$TMPDIR/Lung5_Rep2.tar.gz" -C "$TMPDIR/Lung5_Rep2"
  mkdir -p "$OUT"
  mv "$TMPDIR/Lung5_Rep2/Lung5_Rep2/Lung5_Rep2-Flat_files_and_images/"* "$OUT/"
fi

viash run src/filter/subset_cosmx/config.vsh.yaml -- \
  --input "$OUT" \
  --num_fovs 3 \
  --subset_transcripts_file True \
  --subset_polygons_file False \
  --output "${DIR}/${ID}_tiny"

viash run src/convert/from_cosmx_to_h5mu/config.vsh.yaml -- \
  --input "${DIR}/${ID}_tiny" \
  --output "$DIR/${ID}_tiny.h5mu" \
  --output_compression "gzip"

rm -rf "$OUT"

# Sync to S3
aws s3 sync \
  --profile di \
  "$DIR" \
  s3://openpipelines-bio/openpipeline_spatial/resources_test/cosmx \
  --delete \
  --dryrun
19
resources_test_scripts/reference_tiny.sh
Executable file
@@ -0,0 +1,19 @@
#!/bin/bash

set -eo pipefail

# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)

# ensure that the command below is run from the root of the repository
cd "$REPO_ROOT"
DIR="resources_test/GRCh38"

mkdir -p "$DIR"

aws s3 sync \
  --profile di \
  s3://openpipelines-bio/openpipeline_spatial/resources_test/GRCh38 \
  "$DIR" \
  --delete \
  --dryrun
55
resources_test_scripts/visium_tiny.sh
Normal file
@@ -0,0 +1,55 @@
#!/bin/bash

set -eo pipefail

# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)

# Define absolute directory path
DIR="$REPO_ROOT/resources_test/visium"

# from https://www.10xgenomics.com/resources/datasets/human-ovarian-cancer-1-standard
mkdir -p "$DIR"

# Input Files - download to the specific directory
curl -o "$DIR/Visium_FFPE_Human_Ovarian_Cancer_fastqs.tar" https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Ovarian_Cancer/Visium_FFPE_Human_Ovarian_Cancer_fastqs.tar
curl -o "$DIR/Visium_FFPE_Human_Ovarian_Cancer_image.jpg" https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Ovarian_Cancer/Visium_FFPE_Human_Ovarian_Cancer_image.jpg
curl -o "$DIR/Visium_FFPE_Human_Ovarian_Cancer_probe_set.csv" https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Ovarian_Cancer/Visium_FFPE_Human_Ovarian_Cancer_probe_set.csv

# Extract in the specific directory
tar xvf "$DIR/Visium_FFPE_Human_Ovarian_Cancer_fastqs.tar" -C "$DIR"

# Create subsampled dataset with ImageMagick
# https://imagemagick.org/index.php
mkdir -p "$DIR/Visium_FFPE_Human_Ovarian_Cancer_tiny"
convert "$DIR/Visium_FFPE_Human_Ovarian_Cancer_image.jpg" -resize 2000x2000 "$DIR/Visium_FFPE_Human_Ovarian_Cancer_image_tiny.jpg"
for f in "$DIR"/Visium_FFPE_Human_Ovarian_Cancer_fastqs/*L001*R*; do
  gzip -cdf "$f" | head -n 40000 | gzip -c > "$DIR/Visium_FFPE_Human_Ovarian_Cancer_tiny/$(basename "$f")";
done

echo "> Downloading and subsampling of datasets complete"

# Run spaceranger
viash run src/mapping/spaceranger_count/config.vsh.yaml -- \
  --input "$DIR/Visium_FFPE_Human_Ovarian_Cancer_tiny" \
  --gex_reference "$REPO_ROOT/resources_test/GRCh38/" \
  --probe_set "$DIR/Visium_FFPE_Human_Ovarian_Cancer_probe_set.csv" \
  --image "$DIR/Visium_FFPE_Human_Ovarian_Cancer_image_tiny.jpg" \
  --slide "V10L13-020" \
  --area "D1" \
  --create_bam "false" \
  --output "Visium_FFPE_Human_Ovarian_Cancer_tiny_spaceranger"

mv "Visium_FFPE_Human_Ovarian_Cancer_tiny_spaceranger" "$DIR/"
echo "> Running spaceranger complete"

rm -rf "$DIR/Visium_FFPE_Human_Ovarian_Cancer_fastqs"
rm -f "$DIR/Visium_FFPE_Human_Ovarian_Cancer_image.jpg"

aws s3 sync \
  --profile di \
  --exclude "*.yaml" \
  "$DIR" \
  s3://openpipelines-bio/openpipeline_spatial/resources_test/visium \
  --delete \
  --dryrun
44
resources_test_scripts/xenium_tiny.sh
Executable file
@@ -0,0 +1,44 @@
#!/bin/bash

set -eo pipefail

# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)

# Define absolute directory paths
DIR="$REPO_ROOT/resources_test/xenium"
ID="xenium_tiny"
OUT="$DIR/$ID"

# create tempdir
MY_TEMP="${VIASH_TEMP:-/tmp}"
TMPDIR=$(mktemp -d "$MY_TEMP/$ID-XXXXXX")
function clean_up {
  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
}
trap clean_up EXIT

if [ ! -d "$OUT" ]; then
  tiny_dataset="https://raw.githubusercontent.com/nf-core/test-datasets/spatialxe/Xenium_Prime_Mouse_Ileum_tiny_outs.zip"
  wget "$tiny_dataset" -O "$TMPDIR/xenium_tiny.zip"

  unzip -q "$TMPDIR/xenium_tiny.zip" -d "$TMPDIR/xenium_tiny"
  mkdir -p "$OUT"
  mv "$TMPDIR/xenium_tiny/Xenium_Prime_Mouse_Ileum_tiny_outs/"* "$OUT/"
fi

viash run "$REPO_ROOT/src/convert/from_xenium_to_spatialdata/config.vsh.yaml" -- \
  --input "$OUT" \
  --output "$DIR/$ID.zarr"

viash run "$REPO_ROOT/src/convert/from_spatialdata_to_h5mu/config.vsh.yaml" -- \
  --input "$DIR/$ID.zarr" \
  --output "$DIR/$ID.h5mu"

# Sync to S3
aws s3 sync \
  --profile di \
  "$DIR" \
  s3://openpipelines-bio/openpipeline_spatial/resources_test/xenium \
  --delete \
  --dryrun
43
ruff.toml
Normal file
@@ -0,0 +1,43 @@
# Exclude a variety of commonly ignored directories.
exclude = [
    ".git",
    ".pyenv",
    ".pytest_cache",
    ".ruff_cache",
    ".venv",
    ".vscode",
    "__pypackages__",
    "_build",
    "build",
    "dist",
    "node_modules",
    "site-packages",
]

builtins = ["meta"]




[format]
# Like Black, use double quotes for strings.
quote-style = "double"

# Like Black, indent with spaces, rather than tabs.
indent-style = "space"

# Like Black, respect magic trailing commas.
skip-magic-trailing-comma = false

# Like Black, automatically detect the appropriate line ending.
line-ending = "auto"

[lint.flake8-pytest-style]
fixture-parentheses = false
mark-parentheses = false

[lint]
ignore = [
    # module level import not at top of file
    "E402"
]
11
src/authors/dorien_roosen.yaml
Normal file
@@ -0,0 +1,11 @@
name: Dorien Roosen
info:
  role: Core Team Member
  links:
    email: dorien@data-intuitive.com
    github: dorien-er
    linkedin: dorien-roosen
  organizations:
    - name: Data Intuitive
      href: https://www.data-intuitive.com
      role: Data Scientist
12
src/authors/dries_schaumont.yaml
Normal file
@@ -0,0 +1,12 @@
name: Dries Schaumont
info:
  role: Core Team Member
  links:
    email: dries@data-intuitive.com
    github: DriesSchaumont
    orcid: "0000-0002-4389-0440"
    linkedin: dries-schaumont
  organizations:
    - name: Data Intuitive
      href: https://www.data-intuitive.com
      role: Data Scientist
11
src/authors/jakub_majercik.yaml
Normal file
@@ -0,0 +1,11 @@
name: Jakub Majercik
info:
  role: Contributor
  links:
    email: jakub@data-intuitive.com
    github: jakubmajercik
    linkedin: jakubmajercik
  organizations:
    - name: Data Intuitive
      href: https://www.data-intuitive.com
      role: Bioinformatics Engineer
15
src/authors/robrecht_cannoodt.yaml
Normal file
@@ -0,0 +1,15 @@
name: Robrecht Cannoodt
info:
  role: Core Team Member
  links:
    email: robrecht@data-intuitive.com
    github: rcannood
    orcid: "0000-0003-3641-729X"
    linkedin: robrechtcannoodt
  organizations:
    - name: Data Intuitive
      href: https://www.data-intuitive.com
      role: Data Science Engineer
    - name: Open Problems
      href: https://openproblems.bio
      role: Core Member
6
src/authors/weiwei_schultz.yaml
Normal file
@@ -0,0 +1,6 @@
name: Weiwei Schultz
info:
  role: Contributor
  organizations:
    - name: Janssen R&D US
      role: Associate Director Data Sciences
9
src/base/h5_compression_argument.yaml
Normal file
@@ -0,0 +1,9 @@
arguments:
  - name: "--output_compression"
    description: |
      Compression format to use for the output AnnData and/or Mudata objects.
      By default no compression is applied.
    type: string
    choices: ["gzip", "lzf"]
    required: false
    example: "gzip"
3
src/base/requirements/anndata.yaml
Normal file
@@ -0,0 +1,3 @@
packages:
  - anndata~=0.12.7
  - awkward
5
src/base/requirements/anndata_mudata.yaml
Normal file
@@ -0,0 +1,5 @@
__merge__: [/src/base/requirements/anndata.yaml, .]
packages:
  - mudata~=0.3.2
script: |
  exec("try:\n import zarr; from importlib.metadata import version\nexcept ModuleNotFoundError:\n exit(0)\nelse: assert int(version(\"zarr\").partition(\".\")[0]) > 2")
2
src/base/requirements/openpipeline_testutils.yaml
Normal file
@@ -0,0 +1,2 @@
github:
  - openpipelines-bio/core#subdirectory=packages/python/openpipeline_testutils
8
src/base/requirements/python_test_setup.yaml
Normal file
@@ -0,0 +1,8 @@
test_setup:
  - type: apt
    packages:
      - git
  - type: python
    __merge__:
      - /src/base/requirements/viashpy.yaml
      - /src/base/requirements/openpipeline_testutils.yaml
2
src/base/requirements/scanpy.yaml
Normal file
@@ -0,0 +1,2 @@
packages:
  - scanpy~=1.10.4
3
src/base/requirements/spatialdata-io.yaml
Normal file
@@ -0,0 +1,3 @@
packages:
  - spatialdata-io~=0.5.1
__merge__: [ ., /src/base/requirements/spatialdata.yaml ]
3
src/base/requirements/spatialdata.yaml
Normal file
@@ -0,0 +1,3 @@
packages:
  - spatialdata~=0.6.1
  - pyarrow~=18.0.0
4
src/base/requirements/squidpy.yaml
Normal file
@@ -0,0 +1,4 @@
__merge__: [/src/base/requirements/spatialdata.yaml, .]
packages:
  - squidpy~=1.7.0
__merge__: [/src/base/requirements/scanpy.yaml, .]
10
src/base/requirements/testworkflows_setup.yaml
Normal file
@@ -0,0 +1,10 @@
setup:
  - type: apt
    packages:
      - procps
      - git
  - type: python
    __merge__:
      - /src/base/requirements/anndata_mudata.yaml
      - /src/base/requirements/openpipeline_testutils.yaml
      - /src/base/requirements/viashpy.yaml
2
src/base/requirements/viashpy.yaml
Normal file
@@ -0,0 +1,2 @@
packages:
  - viashpy==0.9.0
137
src/convert/from_cells2stats_to_h5mu/config.vsh.yaml
Normal file
137
src/convert/from_cells2stats_to_h5mu/config.vsh.yaml
Normal file
@@ -0,0 +1,137 @@
|
||||
name: from_cells2stats_to_h5mu
|
||||
namespace: convert
|
||||
scope: public
|
||||
description: |
|
||||
Convert spatial data resulting from Aviti Teton sequencers that have been processed by the Element Biosciences cells2stats workflow to H5MU format.
|
||||
|
||||
This component processes cells2stats count matrices to create a standardized H5MU file for downstream analysis.
|
||||
|
||||
The component reads:
|
||||
- Parquet file containing the count matrix and metadata
|
||||
- Panel.json with target and batch information
|
||||
|
||||
And outputs an H5MU file with:
|
||||
- Count data as the main .X matrix
|
||||
- Spatial coordinates in obsm
|
||||
- Cell Paint intensities in obsm (optional)
|
||||
- Nuclear count data as a layer (optional)
|
||||
- CellProfiler morphology metrics in obsm (optional)
|
||||
- Unassigned targets in obsm (optional)
|
||||
|
||||
authors:
|
||||
- __merge__: /src/authors/dorien_roosen.yaml
|
||||
roles: [ maintainer ]
|
||||
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
- name: --input
|
||||
type: file
|
||||
direction: input
|
||||
required: true
|
||||
description: |
|
||||
Path to the cells2stats output bundle.
|
||||
Expected folder structure (showing required files only):
|
||||
├── Cytoprofiling/
|
||||
│ └── Instrument/
|
||||
│ └── RawCellStats.parquet
|
||||
└── Panel.json
|
||||
example: path/to/aviti_output/
|
||||
|
||||
- name: Outputs
|
||||
arguments:
|
||||
- name: --output
|
||||
type: file
|
||||
direction: output
|
||||
required: true
|
||||
description: Output H5MU file path.
|
||||
example: output.h5mu
|
||||
__merge__: [., /src/base/h5_compression_argument.yaml]
|
||||
|
||||
- name: Options
|
||||
arguments:
|
||||
- name: --modality
|
||||
type: string
|
||||
default: rna
|
||||
description: The modality to which the processed data will be written to in the H5MU file.
|
||||
- name: --obsm_coordinates
|
||||
type: string
|
||||
description: |
|
||||
Key name to store the spatial coordinates (in pixels) in obsm.
|
||||
If micrometer coordinates (Xum, Yum) are present in the input, they will be stored under {obsm_coordinates}_um.
|
||||
The column names will be stored in uns.
|
||||
default: spatial
|
||||
- name: --layer_nuclear_counts
|
||||
type: string
|
||||
description: |
|
||||
Name for nuclear counts layer. If specified, nuclear count data
|
||||
will be stored as a separate layer in the AnnData object.
|
||||
example: nuclear_counts
|
||||
- name: --obsm_cell_paint
|
||||
type: string
|
||||
description: |
|
||||
Key name for storing Cell Paint target intensities in obsm.
|
||||
If provided, Cell Paint target intensity data will be stored as a separate matrix in the obsm field.
|
||||
The column names will be stored in uns.
|
||||
example: cell_paint
|
||||
- name: --obsm_cell_paint_nuclear
|
||||
type: string
|
||||
description: |
|
||||
Key name for storing Nuclear Cell Paint target intensities in obsm.
|
||||
If provided, Nuclear Cell Paint target intensity data will be stored as a separate matrix in the obsm field.
|
||||
The column names will be stored in uns.
|
||||
example: cell_paint_nuclear
|
||||
- name: --obsm_cell_profiler
|
||||
type: string
|
||||
description: |
|
||||
Key name for storing CellProfiler morphology metrics in obsm.
|
||||
If provided, CellProfiler morphology metrics will be stored as a separate matrix in the obsm field.
|
||||
The column names will be stored in uns.
|
||||
example: cell_profiler
|
||||
- name: --obsm_unassigned_targets
|
||||
type: string
|
||||
description: |
|
||||
Key name for storing any unassigned target data in obsm.
|
||||
If provided, unassigned target data will be stored as a separate matrix in the obsm field.
|
||||
The column names will be stored in uns.
|
||||
example: unassigned_targets
|
||||
|
||||
resources:
|
||||
- type: python_script
|
||||
path: script.py
|
||||
- path: /src/utils/setup_logger.py
|
||||
- path: /src/utils/unzip_archived_folder.py
|
||||
|
||||
test_resources:
|
||||
- type: python_script
|
||||
path: test.py
|
||||
- path: /resources_test/aviti/
|
||||
|
||||
engines:
|
||||
- type: docker
|
||||
image: python:3.13-slim
|
||||
setup:
|
||||
- type: apt
|
||||
packages:
|
||||
- procps
|
||||
- build-essential
|
||||
- zlib1g-dev
|
||||
- git
|
||||
- type: python
|
||||
__merge__: [/src/base/requirements/anndata_mudata.yaml, .]
|
||||
packages: [ pyarrow ]
|
||||
# Windows explorer uses DEFLATE64 compression for large ZIP files,
|
||||
# which is not supported by Python's standard library zipfile module
|
||||
git: [ https://codeberg.org/miurahr/zipfile-inflate64.git@v0.2 ]
|
||||
test_setup:
|
||||
- type: apt
|
||||
packages:
|
||||
- zip
|
||||
- type: python
|
||||
__merge__: [ /src/base/requirements/viashpy.yaml, .]
|
||||
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
directives:
|
||||
label: [lowmem, lowcpu]
|
||||
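As a reading aid for the `--obsm_coordinates` option described in the config above: a minimal sketch, in plain Python, of how the pixel and micrometer keys could be derived from the option value. The helper name `coordinate_keys` is hypothetical and not part of the component; only the `{obsm_coordinates}_um` naming and the `X`, `Y`, `Xum`, `Yum` column names are taken from the source.

```python
def coordinate_keys(obsm_coordinates, available_columns):
    """Map obsm keys to the coordinate columns that would be stored under them."""
    candidates = {
        obsm_coordinates: ["X", "Y"],  # pixel coordinates
        f"{obsm_coordinates}_um": ["Xum", "Yum"],  # micrometer coordinates
    }
    # A key is only kept when all of its columns are available in the input.
    return {
        key: cols
        for key, cols in candidates.items()
        if all(col in available_columns for col in cols)
    }


full = coordinate_keys("spatial", ["X", "Y", "Xum", "Yum"])
pixels_only = coordinate_keys("coords", ["X", "Y"])
```

With the default value `spatial` and all four columns present, both `spatial` and `spatial_um` are produced; when the micrometer columns are missing, only the pixel key remains.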
310
src/convert/from_cells2stats_to_h5mu/script.py
Normal file
@@ -0,0 +1,310 @@
|
||||
import sys
|
||||
from pathlib import Path
|
||||
import scipy.sparse as sp
|
||||
import pandas as pd
|
||||
import mudata as mu
|
||||
import anndata as ad
|
||||
import re
|
||||
import json
|
||||
import zipfile_inflate64 as zipfile
|
||||
import os
|
||||
|
||||
## VIASH START
|
||||
par = {
|
||||
"input": "./resources_test/aviti/aviti_teton_tiny_2",
|
||||
"modality": "rna",
|
||||
"output": "aviti_tiny_test.h5mu",
|
||||
"output_compression": "gzip",
|
||||
"layer_nuclear_counts": "nuclear_counts",
|
||||
"obsm_coordinates": "spatial",
|
||||
"obsm_cell_paint": "cell_paint",
|
||||
"obsm_cell_paint_nuclear": "cell_paint_nuclear",
|
||||
"obsm_cell_profiler": "cell_profiler",
|
||||
"obsm_unassigned_targets": "unassigned_targets",
|
||||
}
|
||||
meta = {"resources_dir": "src/utils"}
|
||||
## VIASH END
|
||||
|
||||
sys.path.append(meta["resources_dir"])
|
||||
from setup_logger import setup_logger
|
||||
from unzip_archived_folder import extract_selected_files_from_zip
|
||||
|
||||
|
||||
logger = setup_logger()
|
||||
|
||||
|
||||
def assert_matching_order(var_names, count_columns, split_pattern=None):
|
||||
for var, col in zip(var_names, count_columns):
|
||||
count_var = col if not split_pattern else col.replace(split_pattern, "")
|
||||
assert var == count_var, f"Orders do not match: {var} != {count_var}"
|
||||
|
||||
|
||||
def categorize_columns(column_list, target_panel):
|
||||
# Extract imaging and barcoding information from Panel.json
|
||||
imaging_batches = [tube["BatchName"] for tube in target_panel["ImagingPrimerTubes"]]
|
||||
barcoding_batches = [
|
||||
tube["BatchName"] for tube in target_panel["BarcodingPrimerTubes"]
|
||||
]
|
||||
|
||||
# Extract target information
|
||||
cellpaint_targets = [target["Target"] for target in target_panel["ImagingTargets"]]
|
||||
barcoding_targets = [
|
||||
target["Target"] for target in target_panel["BarcodingTargets"]
|
||||
]
|
||||
|
||||
# METADATA (for .obs and .obsm)
|
||||
# Fixed columns
|
||||
columns_fixed = [
|
||||
"Area",
|
||||
"AreaUm",
|
||||
"Cell",
|
||||
"NuclearArea",
|
||||
"NuclearAreaUm",
|
||||
"Tile",
|
||||
"Well",
|
||||
"WellLabel",
|
||||
]
|
||||
obs_columns_fixed = [col for col in columns_fixed if col in column_list]
|
||||
|
||||
# Coordinate columns
|
||||
coordinate_columns = ["X", "Y", "Xum", "Yum"]
|
||||
obsm_coordinate_columns = [col for col in coordinate_columns if col in column_list]
|
||||
|
||||
# Cell Paint target intensity columns (format: {cell_paint_target}.{batch})
|
||||
cell_paint_columns = [
|
||||
col
|
||||
for col in column_list
|
||||
if any(
|
||||
col.startswith(f"{target}.") and col.endswith(f".{batch}")
|
||||
for target in cellpaint_targets
|
||||
for batch in imaging_batches
|
||||
)
|
||||
]
|
||||
|
||||
# Cell Paint nuclear target intensity columns (format: {cell_paint_target}_Nuclear.{batch})
|
||||
cell_paint_nuclear_columns = [
|
||||
col
|
||||
for col in column_list
|
||||
if any(
|
||||
col.startswith(f"{target}_Nuclear") and col.endswith(f".{batch}")
|
||||
for target in cellpaint_targets
|
||||
for batch in imaging_batches
|
||||
)
|
||||
]
|
||||
|
||||
# CellProfiler morphology metrics
|
||||
morphology_patterns = [
|
||||
r"^AreaShape_",
|
||||
r"^Granularity_",
|
||||
r"^Texture_",
|
||||
r"^Intensity_",
|
||||
r"^Location_",
|
||||
r"^RadialDistribution_",
|
||||
]
|
||||
cell_profiler_columns = [
|
||||
col
|
||||
for col in column_list
|
||||
for pattern in morphology_patterns
|
||||
if re.match(pattern, col)
|
||||
]
|
||||
|
||||
# COUNT MATRICES (for .X and layers)
|
||||
# Feature Count Matrix - barcoding targets (format: {target}.{batch})
|
||||
# Includes cellular and nuclear counts
|
||||
count_columns = [
|
||||
col
|
||||
for col in column_list
|
||||
if any(
|
||||
col.startswith(f"{target}.") and col.endswith(f".{batch}")
|
||||
for target in barcoding_targets
|
||||
for batch in barcoding_batches
|
||||
)
|
||||
]
|
||||
|
||||
# Nuclear Feature Count Matrix - barcoding targets (format: {target}_Nuclear.{batch})
|
||||
# Includes only nuclear counts
|
||||
nuclear_count_columns = [
|
||||
col
|
||||
for col in column_list
|
||||
if any(
|
||||
col.startswith(f"{target}_Nuclear") and col.endswith(f".{batch}")
|
||||
for target in barcoding_targets
|
||||
for batch in barcoding_batches
|
||||
)
|
||||
]
|
||||
|
||||
# Unassigned columns (format: {Unassigned_*.*})
|
||||
unassigned_columns = [col for col in column_list if col.startswith("Unassigned")]
|
||||
|
||||
# Make sure all columns have been categorized and have expected sizes
|
||||
assert len(count_columns) == len(nuclear_count_columns), (
|
||||
"Cellular and nuclear count columns do not match."
|
||||
)
|
||||
all_categorized_columns = (
|
||||
obs_columns_fixed
|
||||
+ obsm_coordinate_columns
|
||||
+ cell_paint_columns
|
||||
+ cell_paint_nuclear_columns
|
||||
+ cell_profiler_columns
|
||||
+ count_columns
|
||||
+ nuclear_count_columns
|
||||
+ unassigned_columns
|
||||
)
|
||||
assert len(column_list) == len(all_categorized_columns), (
|
||||
"Column categorization incomplete."
|
||||
)
|
||||
|
||||
return (
|
||||
obs_columns_fixed,
|
||||
obsm_coordinate_columns,
|
||||
cell_paint_columns,
|
||||
cell_paint_nuclear_columns,
|
||||
cell_profiler_columns,
|
||||
count_columns,
|
||||
nuclear_count_columns,
|
||||
unassigned_columns,
|
||||
)
|
||||
|
||||
|
||||
def retrieve_input_data(cells2stats_output_bundle):
|
||||
# Expected folder structure (showing only relevant files):
|
||||
# ├── Cytoprofiling/
|
||||
# │ └── Instrument/
|
||||
# │ └── RawCellStats.parquet
|
||||
# └── Panel.json
|
||||
|
||||
required_file_patterns = {
|
||||
"target_panel": "**/Panel.json",
|
||||
"count_matrix": "**/Cytoprofiling/Instrument/RawCellStats.parquet",
|
||||
}
|
||||
|
||||
if zipfile.is_zipfile(cells2stats_output_bundle):
|
||||
cells2stats_output_bundle = extract_selected_files_from_zip(
|
||||
cells2stats_output_bundle, members=required_file_patterns.values()
|
||||
)
|
||||
else:
|
||||
cells2stats_output_bundle = Path(cells2stats_output_bundle)
|
||||
|
||||
assert os.path.isdir(cells2stats_output_bundle), (
|
||||
"Input is expected to be a (compressed) directory."
|
||||
)
|
||||
|
||||
input_data = {}
|
||||
for key, pattern in required_file_patterns.items():
|
||||
file = list(cells2stats_output_bundle.glob(pattern))
|
||||
assert len(file) == 1, (
|
||||
f"Expected exactly one file matching pattern {pattern}, found {len(file)}."
|
||||
)
|
||||
input_data[key] = file[0]
|
||||
|
||||
return input_data
|
||||
|
||||
|
||||
def main():
|
||||
logger.info("Reading input data...")
|
||||
input_data = retrieve_input_data(par["input"])
|
||||
with open(input_data["target_panel"], "r") as f:
|
||||
target_panel = json.load(f)
|
||||
df = pd.read_parquet(input_data["count_matrix"], engine="pyarrow")
|
||||
df_columns = df.columns.tolist()
|
||||
|
||||
logger.info("Categorizing input data...")
|
||||
(
|
||||
obs_columns_fixed,
|
||||
coordinate_columns,
|
||||
cell_paint_columns,
|
||||
cell_paint_nuclear_columns,
|
||||
cell_profiler_columns,
|
||||
count_columns,
|
||||
nuclear_count_columns,
|
||||
unassigned_columns,
|
||||
) = categorize_columns(df_columns, target_panel)
|
||||
|
||||
df = df.set_index(df["Cell"].astype(str), drop=False)
|
||||
df.index.name = None
|
||||
|
||||
# var and obs names
|
||||
var_columns = list(count_columns)
|
||||
obs_columns = df["Cell"].astype(str).tolist()
|
||||
|
||||
# Count matrix
|
||||
logger.info("Creating count matrix...")
|
||||
count_df = df[count_columns].copy()
|
||||
count_matrix_sparse = sp.csr_matrix(count_df.values)
|
||||
|
||||
# Obs field
|
||||
logger.info(f"Creating obs field with columns {obs_columns_fixed}")
|
||||
obs_df = df[obs_columns_fixed].copy()
|
||||
|
||||
# Var field
|
||||
var_df = pd.DataFrame(index=pd.Index(var_columns, dtype=str))
|
||||
targets, batches = zip(*(c.rsplit(".", 1) for c in var_columns))
|
||||
var_df["target"] = targets
|
||||
var_df["batch"] = batches
|
||||
|
||||
# Create AnnData object
|
||||
logger.info("Creating AnnData object...")
|
||||
adata = ad.AnnData(
|
||||
X=count_matrix_sparse,
|
||||
obs=obs_df,
|
||||
var=var_df,
|
||||
)
|
||||
adata.obs_names = pd.Index(obs_columns, dtype=str)
|
||||
adata.var_names = pd.Index(var_columns, dtype=str)
|
||||
|
||||
# Spatial coordinates
|
||||
coordinate_sets = {
|
||||
par["obsm_coordinates"]: ["X", "Y"],
|
||||
f"{par['obsm_coordinates']}_um": ["Xum", "Yum"],
|
||||
}
|
||||
|
||||
for obsm_key, coord_cols in coordinate_sets.items():
|
||||
if all(col in coordinate_columns for col in coord_cols):
|
||||
coordinates = df[coord_cols].copy()
|
||||
adata.obsm[obsm_key] = coordinates.values
|
||||
adata.uns[obsm_key] = coord_cols
|
||||
logger.info(f"Added {obsm_key} coordinates ({coord_cols}) to obsm")
|
||||
else:
|
||||
missing_cols = [col for col in coord_cols if col not in coordinate_columns]
|
||||
logger.warning(
|
||||
f"Skipping {obsm_key}: missing coordinate columns {missing_cols}"
|
||||
)
|
||||
|
||||
# Add (optional) .obsm fields
|
||||
if par["obsm_cell_paint"]:
|
||||
logger.info(f"Adding {par['obsm_cell_paint']} to obsm")
|
||||
adata.obsm[par["obsm_cell_paint"]] = df[cell_paint_columns].copy()
|
||||
adata.uns[par["obsm_cell_paint"]] = cell_paint_columns
|
||||
if par["obsm_cell_paint_nuclear"]:
|
||||
logger.info(f"Adding {par['obsm_cell_paint_nuclear']} to obsm")
|
||||
adata.obsm[par["obsm_cell_paint_nuclear"]] = df[
|
||||
cell_paint_nuclear_columns
|
||||
].copy()
|
||||
adata.uns[par["obsm_cell_paint_nuclear"]] = cell_paint_nuclear_columns
|
||||
if par["obsm_cell_profiler"]:
|
||||
logger.info(f"Adding {par['obsm_cell_profiler']} to obsm")
|
||||
adata.obsm[par["obsm_cell_profiler"]] = df[cell_profiler_columns].copy()
|
||||
adata.uns[par["obsm_cell_profiler"]] = cell_profiler_columns
|
||||
if par["obsm_unassigned_targets"]:
|
||||
logger.info(f"Adding {par['obsm_unassigned_targets']} to obsm")
|
||||
adata.obsm[par["obsm_unassigned_targets"]] = df[unassigned_columns].copy()
|
||||
adata.uns[par["obsm_unassigned_targets"]] = unassigned_columns
|
||||
|
||||
# Add (optional) nuclear count layer
|
||||
if par["layer_nuclear_counts"]:
|
||||
assert_matching_order(
|
||||
var_columns, nuclear_count_columns, split_pattern="_Nuclear"
|
||||
)
|
||||
logger.info(f"Adding {par['layer_nuclear_counts']} to layers")
|
||||
nuclear_count_df = df[nuclear_count_columns].copy()
|
||||
nuclear_count_matrix_sparse = sp.csr_matrix(nuclear_count_df.values)
|
||||
adata.layers[par["layer_nuclear_counts"]] = nuclear_count_matrix_sparse
|
||||
|
||||
# Write output MuData
|
||||
logger.info("Writing MuData object...")
|
||||
mdata = mu.MuData({par["modality"]: adata})
|
||||
mdata.write_h5mu(par["output"], compression=par["output_compression"])
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
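As a reading aid for `categorize_columns` and the nuclear layer check in the script above: a minimal sketch, in plain Python, of how `{target}.{batch}` and `{target}_Nuclear.{batch}` columns can be separated and aligned. The panel contents and column names are made up for illustration, the `select` helper is hypothetical, and exact name matching is used for brevity where the script uses `startswith`/`endswith`.

```python
# Hypothetical panel and column names; only the naming scheme mirrors script.py.
targets = ["ACTB", "GAPDH"]
batches = ["B01"]
columns = ["Cell", "ACTB.B01", "GAPDH.B01", "ACTB_Nuclear.B01", "GAPDH_Nuclear.B01"]


def select(columns, targets, batches, suffix=""):
    """Keep columns named {target}{suffix}.{batch}, preserving input order."""
    return [
        col
        for col in columns
        if any(col == f"{t}{suffix}.{b}" for t in targets for b in batches)
    ]


count_columns = select(columns, targets, batches)
nuclear_columns = select(columns, targets, batches, suffix="_Nuclear")

# Split "ACTB.B01" into ("ACTB", "B01") on the last dot, as the script does for var.
split_names = [tuple(col.rsplit(".", 1)) for col in count_columns]

# The nuclear layer is only valid if its columns line up with the count columns.
aligned = [c.replace("_Nuclear", "") for c in nuclear_columns] == count_columns
```

Preserving the input column order in both selections is what makes the order comparison meaningful: the nuclear layer must have the same shape and feature order as `.X`.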
158
src/convert/from_cells2stats_to_h5mu/test.py
Normal file
@@ -0,0 +1,158 @@
|
||||
import pytest
|
||||
import sys
|
||||
import mudata as mu
|
||||
import subprocess
|
||||
|
||||
## VIASH START
|
||||
meta = {
|
||||
"executable": "./target/executable/convert/from_cells2stats_to_h5mu/from_cells2stats_to_h5mu",
|
||||
"resources_dir": "resources_test/aviti/",
|
||||
}
|
||||
## VIASH END
|
||||
|
||||
input = f"{meta['resources_dir']}/aviti/teton_cells2stats_tiny/"
|
||||
|
||||
|
||||
def test_simple_execution(run_component, tmp_path):
|
||||
output = tmp_path / "aviti.h5mu"
|
||||
|
||||
# run component
|
||||
run_component(
|
||||
["--input", input, "--output", str(output), "--output_compression", "gzip"]
|
||||
)
|
||||
|
||||
assert output.is_file(), "output file was not created"
|
||||
|
||||
mdata = mu.read_h5mu(output)
|
||||
assert list(mdata.mod.keys()) == ["rna"], "Expected modality rna"
|
||||
adata = mdata.mod["rna"]
|
||||
|
||||
assert adata.X.dtype.kind == "f"
|
||||
expected_obs_keys = [
|
||||
"AreaUm",
|
||||
"Area",
|
||||
"Tile",
|
||||
"WellLabel",
|
||||
"Well",
|
||||
"Cell",
|
||||
"NuclearAreaUm",
|
||||
"NuclearArea",
|
||||
]
|
||||
assert all([obs in expected_obs_keys for obs in adata.obs.columns])
|
||||
obs_counts = ["Area", "Cell", "NuclearArea"]
|
||||
assert all([adata.obs[obs].dtype.kind == "u" for obs in obs_counts])
|
||||
obs_areas = ["AreaUm", "NuclearAreaUm"]
|
||||
assert all([adata.obs[obs].dtype.kind == "f" for obs in obs_areas])
|
||||
obs_categories = ["Tile", "WellLabel", "Well"]
|
||||
assert all([adata.obs[obs].dtype.kind == "O" for obs in obs_categories])
|
||||
|
||||
expected_obsm_keys = ["spatial", "spatial_um"]
|
||||
assert list(adata.obsm.keys()) == expected_obsm_keys
|
||||
assert list(adata.uns.keys()) == expected_obsm_keys
|
||||
assert all(adata.obsm[obsm].dtype.kind == "f" for obsm in expected_obsm_keys)
|
||||
|
||||
|
||||
def test_compressed_input(run_component, tmp_path):
|
||||
output = tmp_path / "aviti.h5mu"
|
||||
zipped_input = tmp_path / "aviti.zip"
|
||||
|
||||
subprocess.run(
|
||||
["zip", "-r", str(zipped_input), "aviti"], cwd=meta["resources_dir"], check=True
|
||||
)
|
||||
|
||||
# run component
|
||||
run_component(
|
||||
[
|
||||
"--input",
|
||||
zipped_input,
|
||||
"--output",
|
||||
str(output),
|
||||
"--output_compression",
|
||||
"gzip",
|
||||
]
|
||||
)
|
||||
|
||||
assert output.is_file(), "output file was not created"
|
||||
|
||||
mdata = mu.read_h5mu(output)
|
||||
assert list(mdata.mod.keys()) == ["rna"], "Expected modality rna"
|
||||
adata = mdata.mod["rna"]
|
||||
|
||||
assert adata.X.dtype.kind == "f"
|
||||
expected_obs_keys = [
|
||||
"AreaUm",
|
||||
"Area",
|
||||
"Tile",
|
||||
"WellLabel",
|
||||
"Well",
|
||||
"Cell",
|
||||
"NuclearAreaUm",
|
||||
"NuclearArea",
|
||||
]
|
||||
assert all([obs in expected_obs_keys for obs in adata.obs.columns])
|
||||
obs_counts = ["Area", "Cell", "NuclearArea"]
|
||||
assert all([adata.obs[obs].dtype.kind == "u" for obs in obs_counts])
|
||||
obs_areas = ["AreaUm", "NuclearAreaUm"]
|
||||
assert all([adata.obs[obs].dtype.kind == "f" for obs in obs_areas])
|
||||
obs_categories = ["Tile", "WellLabel", "Well"]
|
||||
assert all([adata.obs[obs].dtype.kind == "O" for obs in obs_categories])
|
||||
|
||||
expected_obsm_keys = ["spatial", "spatial_um"]
|
||||
assert list(adata.obsm.keys()) == expected_obsm_keys
|
||||
assert list(adata.uns.keys()) == expected_obsm_keys
|
||||
assert all(adata.obsm[obsm].dtype.kind == "f" for obsm in expected_obsm_keys)
|
||||
|
||||
|
||||
def test_extended_parameters(run_component, tmp_path):
|
||||
output = tmp_path / "aviti_ext.h5mu"
|
||||
|
||||
# run component
|
||||
run_component(
|
||||
[
|
||||
"--input",
|
||||
input,
|
||||
"--modality",
|
||||
"mod1",
|
||||
"--output",
|
||||
str(output),
|
||||
"--layer_nuclear_counts",
|
||||
"nuclear_counts",
|
||||
"--obsm_coordinates",
|
||||
"coords",
|
||||
"--obsm_cell_paint",
|
||||
"cell_paint",
|
||||
"--obsm_cell_paint_nuclear",
|
||||
"cell_paint_nuclear",
|
||||
"--obsm_cell_profiler",
|
||||
"cell_profiler",
|
||||
"--obsm_unassigned_targets",
|
||||
"unassigned_targets",
|
||||
"--output_compression",
|
||||
"gzip",
|
||||
]
|
||||
)
|
||||
|
||||
assert output.is_file(), "output file was not created"
|
||||
|
||||
mdata = mu.read_h5mu(output)
|
||||
assert list(mdata.mod.keys()) == ["mod1"]
|
||||
adata = mdata.mod["mod1"]
|
||||
|
||||
assert list(adata.layers) == ["nuclear_counts"]
|
||||
assert adata.layers["nuclear_counts"].dtype.kind == "f"
|
||||
|
||||
expected_obsm_keys = [
|
||||
"cell_paint",
|
||||
"cell_paint_nuclear",
|
||||
"cell_profiler",
|
||||
"coords",
|
||||
"coords_um",
|
||||
"unassigned_targets",
|
||||
]
|
||||
|
||||
assert list(adata.uns.keys()) == expected_obsm_keys
|
||||
assert list(adata.obsm.keys()) == expected_obsm_keys
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(pytest.main([__file__]))
|
||||
75
src/convert/from_cosmx_to_h5mu/config.vsh.yaml
Normal file
@@ -0,0 +1,75 @@
|
||||
name: "from_cosmx_to_h5mu"
|
||||
namespace: "convert"
|
||||
scope: "public"
|
||||
description: |
|
||||
Converts the output from a NanoString CosMx experiment into a MuData object.
|
||||
- `<dataset_id>_exprMat_file.csv`: File containing the counts.
|
||||
- `<dataset_id>_metadata_file.csv`: File containing the spatial coordinates and additional cell-level metadata.
|
||||
- `<dataset_id>_fov_positions_file.csv`: File containing the coordinates of all the fields of view.
|
||||
In addition to reading the regular Nanostring output, it loads CellComposite and CellLabels directories, if present,
|
||||
containing the images.
|
||||
|
||||
authors:
|
||||
- __merge__: /src/authors/dorien_roosen.yaml
|
||||
roles: [ maintainer ]
|
||||
- __merge__: /src/authors/weiwei_schultz.yaml
|
||||
roles: [ contributor ]
|
||||
arguments:
|
||||
- name: "--input"
|
||||
alternatives: ["-i"]
|
||||
type: file
|
||||
description: Input folder. Must contain the output from a NanoString CosMx run.
|
||||
example: cosmx_data
|
||||
direction: input
|
||||
required: true
|
||||
- name: "--modality"
|
||||
type: string
|
||||
default: rna
|
||||
- name: "--output"
|
||||
alternatives: ["-o"]
|
||||
type: file
|
||||
description: The output h5mu file.
|
||||
example: "output.h5mu"
|
||||
direction: output
|
||||
- name: "--output_compression"
|
||||
type: string
|
||||
choices: ["gzip", "lzf"]
|
||||
required: false
|
||||
example: "gzip"
|
||||
|
||||
resources:
|
||||
- type: python_script
|
||||
path: script.py
|
||||
- path: /src/utils/setup_logger.py
|
||||
- path: /src/utils/unzip_archived_folder.py
|
||||
|
||||
test_resources:
|
||||
- type: python_script
|
||||
path: test.py
|
||||
- path: /resources_test/cosmx/Lung5_Rep2_tiny/
|
||||
engines:
|
||||
- type: docker
|
||||
image: python:3.12-slim
|
||||
setup:
|
||||
- type: apt
|
||||
packages:
|
||||
- procps
|
||||
- build-essential
|
||||
- zlib1g-dev
|
||||
- git
|
||||
- type: python
|
||||
__merge__: [/src/base/requirements/anndata_mudata.yaml, /src/base/requirements/squidpy.yaml, .]
|
||||
packages: [ pyarrow ]
|
||||
# Windows explorer uses DEFLATE64 compression for large ZIP files,
|
||||
# which is not supported by Python's standard library zipfile module
|
||||
git: [ https://codeberg.org/miurahr/zipfile-inflate64.git@v0.2 ]
|
||||
test_setup:
|
||||
- type: apt
|
||||
packages:
|
||||
- zip
|
||||
__merge__: [ /src/base/requirements/python_test_setup.yaml, . ]
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
directives:
|
||||
label: [lowmem, singlecpu]
|
||||
98
src/convert/from_cosmx_to_h5mu/script.py
Normal file
@@ -0,0 +1,98 @@
|
||||
import sys
|
||||
import os
|
||||
import squidpy as sq
|
||||
import mudata as mu
|
||||
import zipfile_inflate64 as zipfile
|
||||
from pathlib import Path
|
||||
|
||||
## VIASH START
|
||||
par = {
|
||||
"input": "./resources_test/cosmx/Lung5_Rep2_tiny",
|
||||
"output": "./resources_test/cosmx/Lung5_Rep2_tiny.h5mu",
|
||||
"modality": "rna",
|
||||
"output_compression": None,
|
||||
}
|
||||
meta = {"resources_dir": "src/utils"}
|
||||
## VIASH END
|
||||
|
||||
sys.path.append(meta["resources_dir"])
|
||||
from setup_logger import setup_logger
|
||||
from unzip_archived_folder import extract_selected_files_from_zip
|
||||
|
||||
logger = setup_logger()
|
||||
|
||||
|
||||
def retrieve_input_data(cosmx_output_bundle):
|
||||
# Expected folder structure (showing only relevant files):
|
||||
# ├── *_exprMat_file.csv
|
||||
# ├── *_fov_positions_file.csv
|
||||
# └── *_metadata_file.csv
|
||||
|
||||
required_file_patterns = {
|
||||
"counts_file": "**/*exprMat_file.csv",
|
||||
"fov_file": "**/*fov_positions_file.csv",
|
||||
"meta_file": "**/*metadata_file.csv",
|
||||
}
|
||||
if zipfile.is_zipfile(cosmx_output_bundle):
|
||||
cosmx_output_bundle = extract_selected_files_from_zip(
|
||||
cosmx_output_bundle, members=required_file_patterns.values()
|
||||
)
|
||||
else:
|
||||
cosmx_output_bundle = Path(cosmx_output_bundle)
|
||||
|
||||
assert os.path.isdir(cosmx_output_bundle), (
|
||||
"Input is expected to be a (compressed) directory."
|
||||
)
|
||||
|
||||
input_data = {}
|
||||
for key, pattern in required_file_patterns.items():
|
||||
file = list(cosmx_output_bundle.glob(pattern))
|
||||
assert len(file) == 1, f"Expected one file for {key}, found {len(file)}."
|
||||
input_data[key] = file[0]
|
||||
|
||||
parent_dirs = {file.parent for file in input_data.values()}
|
||||
assert len(parent_dirs) == 1, (
"Input files are expected to be in the same directory. "
|
||||
f"Found files in {len(parent_dirs)} different directories: {parent_dirs}"
|
||||
)
|
||||
|
||||
return input_data
|
||||
|
||||
|
||||
def main():
|
||||
logger.info("Reading in CosMx data...")
|
||||
input_files = retrieve_input_data(par["input"])
|
||||
|
||||
try:
|
||||
adata = sq.read.nanostring(
|
||||
path=input_files["counts_file"].parent,
|
||||
counts_file=input_files["counts_file"].name,
|
||||
meta_file=input_files["meta_file"].name,
|
||||
fov_file=input_files["fov_file"].name,
|
||||
)
|
||||
except ValueError as e:
|
||||
if "Index fov invalid" in str(e):
|
||||
# CosMx experiments processed with AtoMx SIP <v1.3.2 have an 'FOV' index column in fov_file
|
||||
# see https://nanostring-biostats.github.io/CosMx-Analysis-Scratch-Space/posts/flat-file-exports/flat-files-compare.html
|
||||
import pandas as pd
|
||||
|
||||
df = pd.read_csv(input_files["fov_file"])
|
||||
df.rename(columns={"FOV": "fov"}, inplace=True)
|
||||
df.to_csv(input_files["fov_file"], index=False)
|
||||
|
||||
adata = sq.read.nanostring(
|
||||
path=input_files["counts_file"].parent,
|
||||
counts_file=input_files["counts_file"].name,
|
||||
meta_file=input_files["meta_file"].name,
|
||||
fov_file=input_files["fov_file"].name,
|
||||
)
|
||||
else:
|
||||
raise
|
||||
|
||||
logger.info("Writing output MuData object...")
|
||||
mdata = mu.MuData({par["modality"]: adata})
|
||||
mdata.write_h5mu(par["output"], compression=par["output_compression"])
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
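As a reading aid for `retrieve_input_data` in the script above: a minimal sketch of the glob-based file discovery, using only the standard library. The helper name `find_required_files`, the `run` subfolder, and the `S1_` file prefix are hypothetical; the glob patterns are the ones from the script. The sketch raises `ValueError` instead of using `assert`, which is a design choice of this example and not what the component does.

```python
import tempfile
from pathlib import Path


def find_required_files(bundle_dir, patterns):
    """Return {key: path}, requiring exactly one match per glob pattern."""
    found = {}
    for key, pattern in patterns.items():
        matches = sorted(Path(bundle_dir).glob(pattern))
        if len(matches) != 1:
            raise ValueError(f"Expected one file for {key}, found {len(matches)}.")
        found[key] = matches[0]
    return found


patterns = {
    "counts_file": "**/*exprMat_file.csv",
    "fov_file": "**/*fov_positions_file.csv",
    "meta_file": "**/*metadata_file.csv",
}

with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "run"  # hypothetical nested folder inside the bundle
    run_dir.mkdir()
    for name in (
        "S1_exprMat_file.csv",
        "S1_fov_positions_file.csv",
        "S1_metadata_file.csv",
    ):
        (run_dir / name).touch()
    found = find_required_files(tmp, patterns)
    found_names = {key: path.name for key, path in found.items()}
    # All three files must share one parent directory, as the script requires.
    same_parent = len({path.parent for path in found.values()}) == 1
```

The leading `**/` in each pattern lets the files sit at any depth below the bundle root, which is why the same-directory check is needed afterwards.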
130
src/convert/from_cosmx_to_h5mu/test.py
Normal file
@@ -0,0 +1,130 @@
|
||||
import pytest
|
||||
import sys
|
||||
import mudata as mu
|
||||
import subprocess
|
||||
import pandas as pd
|
||||
|
||||
|
||||
def test_simple_execution(run_component, tmp_path):
|
||||
output = tmp_path / "cosmx_tiny.h5mu"
|
||||
|
||||
run_component(
|
||||
[
|
||||
"--input",
|
||||
meta["resources_dir"] + "/Lung5_Rep2_tiny",
|
||||
"--output",
|
||||
output,
|
||||
]
|
||||
)
|
||||
assert output.is_file(), "output file was not created"
|
||||
|
||||
mdata = mu.read_h5mu(output)
|
||||
assert list(mdata.mod.keys()) == ["rna"], "Expected modality rna"
|
||||
|
||||
adata = mdata.mod["rna"]
|
||||
|
||||
assert list(adata.obs.keys()) == [
|
||||
"fov",
|
||||
"Area",
|
||||
"AspectRatio",
|
||||
"CenterX_global_px",
|
||||
"CenterY_global_px",
|
||||
"Width",
|
||||
"Height",
|
||||
"Mean.MembraneStain",
|
||||
"Max.MembraneStain",
|
||||
"Mean.PanCK",
|
||||
"Max.PanCK",
|
||||
"Mean.CD45",
|
||||
"Max.CD45",
|
||||
"Mean.CD3",
|
||||
"Max.CD3",
|
||||
"Mean.DAPI",
|
||||
"Max.DAPI",
|
||||
"cell_ID",
|
||||
]
|
||||
|
||||
assert list(adata.uns.keys()) == ["spatial"]
|
||||
assert list(adata.obsm.keys()) == ["spatial", "spatial_fov"]
|
||||
|
||||
assert adata.obsm["spatial"].dtype == "int"
|
||||
assert adata.obsm["spatial_fov"].dtype == "float"
|
||||
|
||||
|
||||
def test_compressed_input(run_component, tmp_path):
|
||||
output = tmp_path / "cosmx_tiny.h5mu"
|
||||
zipped_input = tmp_path / "Lung5_Rep2_tiny.zip"
|
||||
|
||||
subprocess.run(
|
||||
["zip", "-r", str(zipped_input), "Lung5_Rep2_tiny"],
|
||||
cwd=meta["resources_dir"],
|
||||
check=True,
|
||||
)
|
||||
|
||||
run_component(
|
||||
[
|
||||
"--input",
|
||||
zipped_input,
|
||||
"--output",
|
||||
output,
|
||||
]
|
||||
)
|
||||
assert output.is_file(), "output file was not created"
|
||||
|
||||
mdata = mu.read_h5mu(output)
|
||||
assert list(mdata.mod.keys()) == ["rna"], "Expected modality rna"
|
||||
|
||||
adata = mdata.mod["rna"]
|
||||
|
||||
assert list(adata.obs.keys()) == [
|
||||
"fov",
|
||||
"Area",
|
||||
"AspectRatio",
|
||||
"CenterX_global_px",
|
||||
"CenterY_global_px",
|
||||
"Width",
|
||||
"Height",
|
||||
"Mean.MembraneStain",
|
||||
"Max.MembraneStain",
|
||||
"Mean.PanCK",
|
||||
"Max.PanCK",
|
||||
"Mean.CD45",
|
||||
"Max.CD45",
|
||||
"Mean.CD3",
|
||||
"Max.CD3",
|
||||
"Mean.DAPI",
|
||||
"Max.DAPI",
|
||||
"cell_ID",
|
||||
]
|
||||
|
||||
assert list(adata.uns.keys()) == ["spatial"]
|
||||
assert list(adata.obsm.keys()) == ["spatial", "spatial_fov"]
|
||||
|
||||
assert adata.obsm["spatial"].dtype == "int"
|
||||
assert adata.obsm["spatial_fov"].dtype == "float"
|
||||
|
||||
|
||||
def test_legacy_atomx_input(run_component, tmp_path):
|
||||
output = tmp_path / "cosmx_tiny.h5mu"
|
||||
|
||||
# mimic legacy AtoMx SIP output structure
|
||||
fov_file = (
|
||||
meta["resources_dir"] + "/Lung5_Rep2_tiny/Lung5_Rep2_fov_positions_file.csv"
|
||||
)
|
||||
df = pd.read_csv(fov_file)
|
||||
df.rename(columns={"fov": "FOV"}, inplace=True)
|
||||
df.to_csv(fov_file, index=False)
|
||||
|
||||
run_component(
|
||||
[
|
||||
"--input",
|
||||
meta["resources_dir"] + "/Lung5_Rep2_tiny",
|
||||
"--output",
|
||||
output,
|
||||
]
|
||||
)
|
||||
assert output.is_file(), "output file was not created"
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(pytest.main([__file__]))
|
||||
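As a reading aid for the legacy AtoMx handling exercised by `test_legacy_atomx_input`: a minimal sketch of the `FOV` to `fov` header rename, written with the standard library `csv` module instead of pandas so that it runs without dependencies. The function name `normalize_fov_header` and the other column names and values are hypothetical, and the input is assumed to be non-empty.

```python
import csv
import io


def normalize_fov_header(csv_text):
    """Rename a legacy 'FOV' header column to 'fov'; leave other columns untouched."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = ["fov" if name == "FOV" else name for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


legacy = "FOV,x_global_px,y_global_px\n1,100,200\n"  # made-up column names and values
normalized = normalize_fov_header(legacy)
```

Unlike this sketch, both the component and the test rewrite the FOV positions file in place, so the test resource is modified on disk once the legacy test has run.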
83
src/convert/from_cosmx_to_spatialexperiment/config.vsh.yaml
Normal file
@@ -0,0 +1,83 @@
|
||||
name: "from_cosmx_to_spatialexperiment"
|
||||
namespace: "convert"
|
||||
scope: "public"
|
||||
description: |
|
||||
Creates a SpatialExperiment object from a Nanostring CosMx spatial gene expression output directory and saves it as an RDS file.
|
||||
The constructor assumes the downloaded unzipped CosMx Folder has the following structure:
|
||||
|
||||
Mandatory files
|
||||
· | — *_exprMat_file.csv
|
||||
· | — *_metadata_file.csv
|
||||
Optional files, by default added to the metadata() as a list of paths (will be converted to parquet):
|
||||
· | — *_fov_positions_file.csv
|
||||
· | — *_tx_file.csv
|
||||
· | — *_polygons.csv
|
||||
|
||||
authors:
|
||||
- __merge__: /src/authors/dorien_roosen.yaml
|
||||
roles: [ author, maintainer ]
|
||||
arguments:
|
||||
- name: "--input"
|
||||
alternatives: ["-i"]
|
||||
type: file
|
||||
description: Input CosMx directory
|
||||
direction: input
|
||||
required: true
|
||||
example: path/to/cosmx_bundle
|
||||
- name: "--add_tx_path"
|
||||
type: boolean
|
||||
default: true
|
||||
description: |
|
||||
Whether to add the transcript file path to the metadata.
|
||||
If True, the `*_tx_file.csv` file will be converted to .parquet and added to the metadata.
|
||||
- name: "--add_polygon_path"
|
||||
type: boolean
|
||||
default: true
|
||||
description: |
|
||||
Whether to add polygon path to the metadata.
|
||||
If True, the `*_polygons.csv` file will be converted to .parquet and added to the metadata.
|
||||
- name: "--add_fov_positions"
|
||||
type: boolean
|
||||
default: true
|
||||
description: |
|
||||
Whether to add fov positions to the metadata.
|
||||
If True, the `*_fov_positions_file.csv` file will be added to the metadata.
|
||||
- name: "--alternative_experiment_features"
|
||||
type: string
|
||||
multiple: true
|
||||
description: Feature names containing these strings will be moved to altExps(sxe) slots as separate SpatialExperiment objects.
|
||||
default: [NegPrb, Negative, SystemControl, FalseCode]
|
||||
- name: "--output"
|
||||
alternatives: ["-o"]
|
||||
type: file
|
||||
description: Output SpatialExperiment file
|
||||
direction: output
|
||||
required: true
|
||||
example: output.rds
|
||||
resources:
|
||||
- type: r_script
|
||||
path: script.R
|
||||
- path: /src/utils/unzip_archived_folder.R
|
||||
test_resources:
|
||||
- type: r_script
|
||||
path: test.R
|
||||
- path: /resources_test/cosmx/Lung5_Rep2_tiny
|
||||
engines:
|
||||
- type: docker
|
||||
image: rocker/r2u:24.04
|
||||
setup:
|
||||
- type: apt
|
||||
packages:
|
||||
- libhdf5-dev
|
||||
- libgeos-dev
|
||||
- type: r
|
||||
bioc: [ SpatialExperimentIO ]
|
||||
test_setup:
|
||||
- type: r
|
||||
cran: [ testthat ]
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
directives:
|
||||
label: [lowmem, singlecpu]
|
||||
62
src/convert/from_cosmx_to_spatialexperiment/script.R
Normal file
@@ -0,0 +1,62 @@
|
||||
library(SpatialExperimentIO)
|
||||
|
||||
### VIASH START
|
||||
par <- list(
|
||||
input = "resources_test/cosmx/test2.zip",
|
||||
add_tx_path = TRUE,
|
||||
add_polygon_path = FALSE,
|
||||
add_fov_positions = TRUE,
|
||||
alternative_experiment_features = c(
|
||||
"NegPrb", "Negative", "SystemControl", "FalseCode"
|
||||
),
|
||||
output = "spe_cosmx_test.rds"
|
||||
)
|
||||
meta <- list(
|
||||
resources_dir = "src/utils/"
|
||||
)
|
||||
### VIASH END
|
||||
|
||||
source(paste0(meta$resources_dir, "/unzip_archived_folder.R"))
|
||||
|
||||
cat("Reading input data...\n")
|
||||
if (tools::file_ext(par$input) == "zip") {
|
||||
expected_file_patterns <- c(
|
||||
"*.csv",
|
||||
"*.parquet"
|
||||
)
|
||||
tmp_dir <- extract_selected_files(
|
||||
par$input,
|
||||
members = expected_file_patterns
|
||||
)
|
||||
cosmx_output_bundle <- file.path(
|
||||
tmp_dir,
|
||||
tools::file_path_sans_ext(basename(par$input))
|
||||
)
|
||||
} else {
|
||||
cosmx_output_bundle <- par$input
|
||||
}
|
||||
|
||||
cat("Setting parameters...\n")
|
||||
add_parquet_paths <- par$add_polygon_path || par$add_tx_path
|
||||
|
||||
cat("Converting to SpatialExperiment...\n")
|
||||
spe <- readCosmxSXE(
|
||||
dirName = cosmx_output_bundle,
|
||||
returnType = "SPE",
|
||||
countMatPattern = "exprMat_file.csv",
|
||||
metaDataPattern = "metadata_file.csv",
|
||||
coordNames = c("CenterX_global_px", "CenterY_global_px"),
|
||||
addFovPos = par$add_fov_positions,
|
||||
fovPosPattern = "fov_positions_file.csv",
|
||||
addParquetPaths = add_parquet_paths,
|
||||
addPolygon = par$add_polygon_path,
|
||||
addTx = par$add_tx_path,
|
||||
altExps = par$alternative_experiment_features
|
||||
)
|
||||
|
||||
cat("Saving output...\n")
|
||||
saveRDS(spe, file = par$output)
|
||||
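The zip branch in `script.R` above relies on `extract_selected_files` from `unzip_archived_folder.R`, which is not part of this diff. As a rough illustration of the idea only, here is a minimal Python sketch that extracts the archive members matching a set of glob patterns and then resolves the bundle folder the same way the R script does. The function names `extract_selected_files` and `resolve_bundle_dir` below are hypothetical Python stand-ins; the real R helper may behave differently (for example in how it matches patterns).

```python
import fnmatch
import tempfile
import zipfile
from pathlib import Path


def extract_selected_files(archive_path, members):
    """Extract only the archive entries whose base name matches a glob pattern.

    Hypothetical analogue of the R helper; not the actual implementation.
    """
    tmp_dir = Path(tempfile.mkdtemp())
    with zipfile.ZipFile(archive_path) as archive:
        selected = [
            name
            for name in archive.namelist()
            if any(fnmatch.fnmatch(Path(name).name, pattern) for pattern in members)
        ]
        archive.extractall(tmp_dir, members=selected)
    return tmp_dir


def resolve_bundle_dir(input_path, patterns=("*.csv", "*.parquet")):
    """Return the folder to read: the unpacked folder for a .zip, else the input."""
    input_path = Path(input_path)
    if input_path.suffix == ".zip":
        tmp_dir = extract_selected_files(input_path, patterns)
        # Assumes the archive contains a top-level folder named after the archive,
        # as file.path(tmp_dir, file_path_sans_ext(basename(par$input))) does in script.R.
        return tmp_dir / input_path.stem
    return input_path
```

Note that this sketch, like the R script, assumes the archive wraps its files in a folder with the same name as the archive itself. An archive created with a different top-level folder would resolve to a path that does not exist.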
182
src/convert/from_cosmx_to_spatialexperiment/test.R
Normal file
@@ -0,0 +1,182 @@
|
||||
library(testthat, warn.conflicts = FALSE)
|
||||
library(SpatialExperimentIO)
|
||||
library(SpatialExperiment)
|
||||
|
||||
## VIASH START
|
||||
meta <- list(
|
||||
executable = "./from_cosmx_to_spatialexperiment",
|
||||
resources_dir = "resources_test/cosmx/",
|
||||
name = "from_cosmx_to_spatialexperiment"
|
||||
)
|
||||
## VIASH END
|
||||
|
||||
cat("> Checking simple execution\n")
|
||||
|
||||
spe <- paste0(
|
||||
meta[["resources_dir"]],
|
||||
"/Lung5_Rep2_tiny"
|
||||
)
|
||||
out_rds <- "output.rds"
|
||||
|
||||
cat("> Running ", meta[["name"]], "\n", sep = "")
|
||||
out <- processx::run(
|
||||
meta[["executable"]],
|
||||
c(
|
||||
"--input", spe,
|
||||
"--add_tx_path", TRUE,
|
||||
"--add_polygon_path", FALSE,
|
||||
"--output", out_rds
|
||||
)
|
||||
)
|
||||
|
||||
cat("> Checking whether output file exists\n")
|
||||
expect_equal(out$status, 0)
|
||||
expect_true(file.exists(out_rds))
|
||||
|
||||
cat("> Reading output file\n")
|
||||
obj <- readRDS(file = out_rds)
|
||||
|
||||
cat("> Checking whether SpatialExperiment object is in the right format\n")
|
||||
# Object type
|
||||
expect_is(obj, "SpatialExperiment")
|
||||
# Assay structure
|
||||
expect_equal(names(slot(obj, "assays")), "counts")
|
||||
# Spatial coordinates
|
||||
expect_equal(
|
||||
spatialCoordsNames(obj),
|
||||
c("CenterX_global_px", "CenterY_global_px")
|
||||
)
|
||||
# Alternative experiments
|
||||
expect_equal(altExpNames(obj), c("NegPrb"))
|
||||
# Metadata components
|
||||
expect_named(
|
||||
metadata(obj),
|
||||
c("fov_positions", "transcripts"),
|
||||
ignore.order = TRUE
|
||||
)
|
||||
# Parquet paths
|
||||
expect_true(grepl("\\.parquet$", metadata(obj)[["transcripts"]]))
|
||||
# Dimensions
|
||||
input <- readCosmxSXE(
|
||||
dirName = spe,
|
||||
addParquetPaths = FALSE,
|
||||
returnType = "SPE"
|
||||
)
|
||||
|
||||
dim_rds <- dim(obj)
|
||||
dim_input <- dim(input)
|
||||
|
||||
expect_equal(dim_rds, dim_input)
|
||||
|
||||
|
||||
cat("> Checking execution with compressed input\n")
|
||||
|
||||
spe <- paste0(meta[["resources_dir"]], "/Lung5_Rep2_tiny")
|
||||
out_rds <- "output.rds"
|
||||
|
||||
create_folder_archive <- function(
|
||||
folder_path,
|
||||
archive = "Lung5_Rep2_tiny.zip"
|
||||
) {
|
||||
old_wd <- getwd()
|
||||
on.exit(setwd(old_wd))
|
||||
setwd(meta$resources_dir)
|
||||
system2("zip", c("-r", archive, "Lung5_Rep2_tiny"))
|
||||
paste0(meta$resources_dir, "/", archive)
|
||||
}
|
||||
|
||||
zipped_spe <- create_folder_archive(spe)
|
||||
|
||||
cat("> Running ", meta[["name"]], "\n", sep = "")
|
||||
out <- processx::run(
|
||||
meta[["executable"]],
|
||||
c(
|
||||
"--input", zipped_spe,
|
||||
"--add_tx_path", TRUE,
|
||||
"--add_polygon_path", FALSE,
|
||||
"--output", out_rds
|
||||
)
|
||||
)
|
||||
|
||||
cat("> Checking whether output file exists\n")
|
||||
expect_equal(out$status, 0)
|
||||
expect_true(file.exists(out_rds))
|
||||
|
||||
cat("> Reading output file\n")
|
||||
obj <- readRDS(file = out_rds)
|
||||
|
||||
cat("> Checking whether SpatialExperiment object is in the right format\n")
|
||||
# Object type
|
||||
expect_is(obj, "SpatialExperiment")
|
||||
# Assay structure
|
||||
expect_equal(names(slot(obj, "assays")), "counts")
|
||||
# Spatial coordinates
|
||||
expect_equal(
|
||||
spatialCoordsNames(obj),
|
||||
c("CenterX_global_px", "CenterY_global_px")
|
||||
)
|
||||
# Alternative experiments
|
||||
expect_equal(altExpNames(obj), c("NegPrb"))
|
||||
# Metadata components
|
||||
expect_named(
|
||||
metadata(obj),
|
||||
c("fov_positions", "transcripts"),
|
||||
ignore.order = TRUE
|
||||
)
|
||||
# Parquet paths
|
||||
expect_true(grepl("\\.parquet$", metadata(obj)[["transcripts"]]))
|
||||
# Dimensions
|
||||
input <- readCosmxSXE(
|
||||
dirName = spe,
|
||||
addParquetPaths = FALSE,
|
||||
returnType = "SPE"
|
||||
)
|
||||
|
||||
dim_rds <- dim(obj)
|
||||
dim_input <- dim(input)
|
||||
|
||||
expect_equal(dim_rds, dim_input)
|
||||
|
||||
|
||||
cat("> Checking parameter functionality\n")
|
||||
|
||||
out_rds_ext <- "output_ext.rds"
|
||||
|
||||
cat("> Running ", meta[["name"]], "\n", sep = "")
|
||||
out_ext <- processx::run(
|
||||
meta[["executable"]],
|
||||
c(
|
||||
"--input", spe,
|
||||
"--add_fov_positions", FALSE,
|
||||
"--add_tx_path", FALSE,
|
||||
"--add_polygon_path", FALSE,
|
||||
"--alternative_experiment_features", c("Negative"),
|
||||
"--output", out_rds_ext
|
||||
)
|
||||
)
|
||||
|
||||
cat("> Checking whether output file exists\n")
|
||||
expect_equal(out_ext$status, 0)
|
||||
expect_true(file.exists(out_rds_ext))
|
||||
|
||||
cat("> Reading output file\n")
|
||||
obj_ext <- readRDS(file = out_rds_ext)
|
||||
|
||||
cat("> Checking whether SpatialExperiment object is in the right format\n")
|
||||
# Object type
|
||||
expect_is(obj_ext, "SpatialExperiment")
|
||||
# Assay structure
|
||||
expect_equal(names(slot(obj_ext, "assays")), "counts")
|
||||
# Spatial coordinates
|
||||
expect_equal(
|
||||
spatialCoordsNames(obj_ext),
|
||||
c("CenterX_global_px", "CenterY_global_px")
|
||||
)
|
||||
# Alternative experiments
|
||||
expect_length(altExpNames(obj_ext), 0)
|
||||
# Metadata components
|
||||
expect_length(metadata(obj_ext), 0)
|
||||
|
||||
dim_rds_ext <- dim(obj_ext)
|
||||
expect_true(identical(dim_rds_ext[2], dim_input[2]))
|
||||
expect_false(identical(dim_rds_ext[1], dim_input[1]))
|
||||
77
src/convert/from_h5mu_to_spatialexperiment/config.vsh.yaml
Normal file
@@ -0,0 +1,77 @@
|
||||
name: "from_h5mu_to_spatialexperiment"
|
||||
namespace: "convert"
|
||||
scope: "public"
|
||||
description: |
|
||||
Converts an h5mu file into a SpatialExperiment object.
|
||||
authors:
|
||||
- __merge__: /src/authors/dorien_roosen.yaml
|
||||
roles: [ author ]
|
||||
arguments:
|
||||
- name: "--input"
|
||||
alternatives: ["-i"]
|
||||
type: file
|
||||
description: Input h5mu file
|
||||
direction: input
|
||||
required: true
|
||||
example: input.h5mu
|
||||
- name: "--modality"
|
||||
type: string
|
||||
required: true
|
||||
default: "rna"
|
||||
description: Name of the modality to be converted.
|
||||
- name: "--obsm_spatial_coordinates"
|
||||
type: string
|
||||
required: false
|
||||
description: |
|
||||
Key in the .obsm field that contains the spatial coordinates.
|
||||
Will be mapped to spatialCoords in the SpatialExperiment object.
|
||||
- name: "--output"
|
||||
alternatives: ["-o"]
|
||||
type: file
|
||||
description: Output SpatialExperiment file
|
||||
direction: output
|
||||
required: true
|
||||
example: output.rds
|
||||
resources:
|
||||
- type: r_script
|
||||
path: script.R
|
||||
test_resources:
|
||||
- type: r_script
|
||||
path: test.R
|
||||
- path: /resources_test/aviti/aviti_teton_tiny.h5mu
|
||||
- path: /resources_test/cosmx/Lung5_Rep2_tiny.h5mu
|
||||
- path: /resources_test/xenium/xenium_tiny.h5mu
|
||||
|
||||
engines:
|
||||
- type: docker
|
||||
image: rocker/r2u:24.04
|
||||
setup:
|
||||
- type: apt
|
||||
packages:
|
||||
- libhdf5-dev
|
||||
- libgeos-dev
|
||||
- type: r
|
||||
cran: [ hdf5r, SpatialExperiment ]
|
||||
github: scverse/anndataR@36f3caad9a7f360165c1510bbe0c62657580415a
|
||||
test_setup:
|
||||
- type: docker
|
||||
env:
|
||||
- RETICULATE_PYTHON=/usr/bin/python
|
||||
- PIP_BREAK_SYSTEM_PACKAGES=1
|
||||
- type: apt
|
||||
packages:
|
||||
- python3
|
||||
- python3-pip
|
||||
- python3-dev
|
||||
- python-is-python3
|
||||
- type: r
|
||||
cran: [ reticulate, testthat ]
|
||||
- type: python
|
||||
user: true
|
||||
__merge__: /src/base/requirements/anndata_mudata.yaml
|
||||
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
directives:
|
||||
label: [lowmem, singlecpu]
|
||||
113
src/convert/from_h5mu_to_spatialexperiment/script.R
Normal file
@@ -0,0 +1,113 @@
|
||||
library(SpatialExperiment)
|
||||
library(SingleCellExperiment)
|
||||
library(hdf5r)
|
||||
library(Matrix)
|
||||
|
||||
## VIASH START
|
||||
par <- list(
|
||||
input = "resources_test/xenium/xenium_tiny.h5mu",
|
||||
output = "xenium_test.rds",
|
||||
modality = "rna",
|
||||
obsm_spatial_coordinates = "spatial"
|
||||
)
|
||||
## VIASH END
|
||||
|
||||
|
||||
h5mu_to_h5ad <- function(h5mu_path, modality_name) {
|
||||
tmp_path <- tempfile(fileext = ".h5ad")
|
||||
mod_location <- paste("mod", modality_name, sep = "/")
|
||||
h5src <- hdf5r::H5File$new(h5mu_path, "r")
|
||||
h5dest <- hdf5r::H5File$new(tmp_path, "w")
|
||||
# Copy over the child objects and the child attributes from root
|
||||
# Root cannot be copied directly because it always exists and
|
||||
# copying does not allow overwriting.
|
||||
children <- hdf5r::list.objects(h5src,
|
||||
path = mod_location,
|
||||
full.names = FALSE, recursive = FALSE
|
||||
)
|
||||
for (child in children) {
|
||||
h5dest$obj_copy_from(
|
||||
h5src, paste(mod_location, child, sep = "/"),
|
||||
paste0("/", child)
|
||||
)
|
||||
}
|
||||
# Also copy the root attributes
|
||||
root_attrs <- hdf5r::h5attr_names(x = h5src)
|
||||
for (attr in root_attrs) {
|
||||
h5a <- h5src$attr_open(attr_name = attr)
|
||||
robj <- h5a$read()
|
||||
h5dest$create_attr_by_name(
|
||||
attr_name = attr,
|
||||
obj_name = ".",
|
||||
robj = robj,
|
||||
space = h5a$get_space(),
|
||||
dtype = h5a$get_type()
|
||||
)
|
||||
}
|
||||
h5src$close()
|
||||
h5dest$close()
|
||||
|
||||
tmp_path
|
||||
}
|
||||
|
||||
read_spatial_coordinates <- function(sce, spatial_coordinates_name) {
|
||||
# Check if the specified spatial coordinates exist in reducedDims
|
||||
reduced_dims <- SingleCellExperiment::reducedDims(sce)
|
||||
if (spatial_coordinates_name %in% names(reduced_dims)) {
|
||||
spatial_coords <- reduced_dims[[spatial_coordinates_name]]
|
||||
if (ncol(spatial_coords) != 2) {
|
||||
stop(
|
||||
"Spatial coordinates must have 2 columns, but found ",
|
||||
ncol(spatial_coords), " columns"
|
||||
)
|
||||
}
|
||||
# Set proper column names for spatial coordinates
|
||||
colnames(spatial_coords) <- c("x", "y")
|
||||
} else {
|
||||
warning(
|
||||
"Spatial coordinates '", spatial_coordinates_name,
|
||||
"' not found in reducedDims. Available dimensions: ",
|
||||
paste(names(reduced_dims), collapse = ", ")
|
||||
)
|
||||
spatial_coords <- NULL
|
||||
}
|
||||
spatial_coords
|
||||
}
|
||||
|
||||
main <- function() {
|
||||
# Convert to AnnData
|
||||
cat("Converting H5MU file to H5AD...\n")
|
||||
h5file <- h5mu_to_h5ad(par$input, par$modality)
|
||||
|
||||
# Convert to SingleCellExperiment
|
||||
cat("Converting to SingleCellExperiment...\n")
|
||||
sce <- anndataR::read_h5ad(h5file, as = "SingleCellExperiment")
|
||||
|
||||
# Extract spatial coordinates if specified
|
||||
if (
|
||||
!is.null(par$obsm_spatial_coordinates) &&
|
||||
length(par$obsm_spatial_coordinates) > 0
|
||||
) {
|
||||
cat("Reading in spatial coordinates...\n")
|
||||
spatial_coords <- read_spatial_coordinates(
|
||||
sce, par$obsm_spatial_coordinates
|
||||
)
|
||||
SingleCellExperiment::reducedDims(sce)[[
|
||||
par$obsm_spatial_coordinates
|
||||
]] <- NULL
|
||||
} else {
|
||||
spatial_coords <- NULL
|
||||
}
|
||||
|
||||
# Converting SingleCellExperiment to SpatialExperiment
|
||||
cat("Converting to SpatialExperiment...\n")
|
||||
spe <- as(sce, "SpatialExperiment")
|
||||
SpatialExperiment::spatialCoords(spe) <- spatial_coords
|
||||
|
||||
# Saving SpatialExperiment object
|
||||
cat("Saving SpatialExperiment object to:", par$output, "\n")
|
||||
saveRDS(spe, file = par$output, compress = FALSE)
|
||||
}
|
||||
|
||||
main()
|
||||
475
src/convert/from_h5mu_to_spatialexperiment/test.R
Normal file
@@ -0,0 +1,475 @@
|
||||
library(testthat)
|
||||
library(SpatialExperiment)
|
||||
library(SingleCellExperiment)
|
||||
library(hdf5r)
|
||||
library(Matrix)
|
||||
library(reticulate)
|
||||
|
||||
mu <- reticulate::import("mudata")
|
||||
ad <- reticulate::import("anndata")
|
||||
|
||||
## VIASH START
|
||||
meta <- list(
|
||||
resources_dir = "resources_test"
|
||||
)
|
||||
## VIASH END
|
||||
|
||||
# Helper function to create mock H5MU test data
|
||||
create_mock_h5mu <- function(path) {
|
||||
n_obs <- 5
|
||||
n_var_mod1 <- 4
|
||||
n_var_mod2 <- 3
|
||||
|
||||
# ============== MOD1 MODALITY ==============
|
||||
|
||||
mod1_x_data <- matrix(c(
|
||||
1, 2, 3, 0,
|
||||
4, 5, 6, 2,
|
||||
0, 1, 2, 3,
|
||||
2, 0, 1, 4,
|
||||
1, 3, 0, 2
|
||||
), nrow = n_obs, ncol = n_var_mod1, byrow = TRUE)
|
||||
|
||||
|
||||
# Create obs dataframe
|
||||
mod1_obs <- data.frame(
|
||||
Obs1 = c("A", "B", "A", "C", "B"),
|
||||
Obs2 = c(0.9, 0.8, 0.95, 0.7, 0.85),
|
||||
Obs3 = c(FALSE, FALSE, TRUE, FALSE, FALSE),
|
||||
row.names = paste0("cell_", 1:n_obs),
|
||||
stringsAsFactors = FALSE
|
||||
)
|
||||
# Create var dataframe
|
||||
mod1_var <- data.frame(
|
||||
Feat1 = c("A", "B", "C", "D"),
|
||||
Feat2 = c(TRUE, FALSE, TRUE, FALSE),
|
||||
Feat3 = c(1.6, 2.2, 1.2, 1.8),
|
||||
row.names = paste0("gene_", 1:n_var_mod1),
|
||||
stringsAsFactors = FALSE
|
||||
)
|
||||
|
||||
# Create layers
|
||||
mod1_layers <- list(
|
||||
counts = mod1_x_data * 2
|
||||
)
|
||||
|
||||
# Create obsm
|
||||
obsm_1 <- matrix(c(
|
||||
100.5, 200.3,
|
||||
150.2, 180.7,
|
||||
120.8, 220.1,
|
||||
180.4, 160.9,
|
||||
200.1, 190.5
|
||||
), nrow = n_obs, ncol = 2, byrow = TRUE)
|
||||
|
||||
obsm_2 <- matrix(c(
|
||||
-1.2, 0.8, 0.3,
|
||||
1.1, -0.5, -0.2,
|
||||
0.3, 1.2, 0.7,
|
||||
-0.8, -0.3, 1.1,
|
||||
0.9, 0.2, -0.9
|
||||
), nrow = n_obs, ncol = 3, byrow = TRUE)
|
||||
|
||||
mod1_obsm <- list(
|
||||
Obsm1 = obsm_1,
|
||||
Obsm2 = obsm_2
|
||||
)
|
||||
|
||||
# Create uns (unstructured metadata)
|
||||
mod1_uns <- list(
|
||||
experiment_info = "metadata"
|
||||
)
|
||||
|
||||
# Create AnnData object for mod1 using the Python anndata package (via reticulate)
|
||||
ad_mod1 <- ad$AnnData(
|
||||
X = mod1_x_data,
|
||||
obs = mod1_obs,
|
||||
var = mod1_var,
|
||||
layers = mod1_layers,
|
||||
obsm = mod1_obsm,
|
||||
uns = mod1_uns
|
||||
)
|
||||
|
||||
# ============== MOD2 MODALITY ==============
|
||||
|
||||
# Create expression matrix
|
||||
mod2_x_data <- matrix(c(
|
||||
10, 20, 15,
|
||||
25, 30, 18,
|
||||
12, 22, 20,
|
||||
18, 25, 12,
|
||||
20, 28, 16
|
||||
), nrow = n_obs, ncol = n_var_mod2, byrow = TRUE)
|
||||
|
||||
# Create obs dataframe
|
||||
mod2_obs <- data.frame(
|
||||
Obs = c("C", "D", "C", "E", "D"),
|
||||
row.names = paste0("cell_", 1:n_obs),
|
||||
stringsAsFactors = FALSE
|
||||
)
|
||||
|
||||
# Create var dataframe
|
||||
mod2_var <- data.frame(
|
||||
Feat = c("d", "e", "g"),
|
||||
row.names = paste0("protein_", 1:n_var_mod2),
|
||||
stringsAsFactors = FALSE
|
||||
)
|
||||
|
||||
# Create AnnData object for mod2
|
||||
ad_mod2 <- ad$AnnData(
|
||||
X = mod2_x_data,
|
||||
obs = mod2_obs,
|
||||
var = mod2_var
|
||||
)
|
||||
|
||||
# ============== CREATE MUDATA ==============
|
||||
|
||||
# Create MuData object using reticulate
|
||||
mdata <- mu$MuData(list(
|
||||
mod1 = ad_mod1,
|
||||
mod2 = ad_mod2
|
||||
))
|
||||
|
||||
# Write Mudata to path
|
||||
mdata$write_h5mu(path)
|
||||
path
|
||||
}
|
||||
|
||||
# Main test
|
||||
test_simple_execution <- function() {
|
||||
cat("> > Testing Simple Conversion\n")
|
||||
cat("> Creating mock H5MU file\n")
|
||||
|
||||
# Create mock H5MU file
|
||||
test_h5mu <- tempfile(fileext = ".h5mu")
|
||||
create_mock_h5mu(test_h5mu)
|
||||
|
||||
# Output file
|
||||
out_rds <- tempfile(fileext = ".rds")
|
||||
|
||||
# Run conversion
|
||||
cat("> Running conversion\n")
|
||||
out <- processx::run(
|
||||
meta[["executable"]],
|
||||
c(
|
||||
"--input", test_h5mu,
|
||||
"--modality", "mod1",
|
||||
"--output", out_rds,
|
||||
"--obsm_spatial_coordinates", "Obsm1"
|
||||
)
|
||||
)
|
||||
|
||||
cat("> Checking execution status\n")
|
||||
testthat::expect_equal(out$status, 0)
|
||||
testthat::expect_true(file.exists(out_rds))
|
||||
|
||||
cat("> Reading output file\n")
|
||||
spe <- readRDS(file = out_rds)
|
||||
testthat::expect_s4_class(spe, "SpatialExperiment")
|
||||
|
||||
cat("> Opening input file for comparison\n")
|
||||
mod1 <- mu$read_h5ad(test_h5mu, mod = "mod1")
|
||||
|
||||
cat("> Testing dimensions\n")
|
||||
dim_spe <- dim(spe)
|
||||
dim_h5mu <- dim(mod1$X)
|
||||
|
||||
testthat::expect_equal(dim_spe[1], dim_h5mu[2])
|
||||
testthat::expect_equal(dim_spe[2], dim_h5mu[1])
|
||||
testthat::expect_equal(nrow(spe), 4)
|
||||
testthat::expect_equal(ncol(spe), 5)
|
||||
|
||||
cat("> Testing colData (obs) transfer and data types\n")
|
||||
col_data <- SummarizedExperiment::colData(spe)
|
||||
coldata_cols <- colnames(col_data)
|
||||
obs_cols <- colnames(mod1$obs)
|
||||
testthat::expect_true(all(obs_cols %in% coldata_cols))
|
||||
|
||||
# Test data types in colData
|
||||
testthat::expect_true(is.factor(col_data$Obs1))
|
||||
testthat::expect_true(is.numeric(col_data$Obs2))
|
||||
testthat::expect_true(is.logical(col_data$Obs3))
|
||||
|
||||
cat("> Testing rowData (var) transfer and data types\n")
|
||||
row_data <- SummarizedExperiment::rowData(spe)
|
||||
row_names <- colnames(row_data)
|
||||
var_cols <- colnames(mod1$var)
|
||||
testthat::expect_true(all(var_cols %in% row_names))
|
||||
|
||||
# Test data types in rowData
|
||||
testthat::expect_true(is.character(row_data$Feat1))
|
||||
testthat::expect_true(is.logical(row_data$Feat2))
|
||||
testthat::expect_true(is.numeric(row_data$Feat3))
|
||||
|
||||
cat("> Testing spatialCoords\n")
|
||||
spatial_coords <- SpatialExperiment::spatialCoords(spe)
|
||||
testthat::expect_false(is.null(spatial_coords))
|
||||
testthat::expect_equal(ncol(spatial_coords), 2)
|
||||
testthat::expect_equal(nrow(spatial_coords), ncol(spe))
|
||||
testthat::expect_identical(colnames(spatial_coords), c("x", "y"))
|
||||
|
||||
# Test spatial coordinate data types and values
|
||||
testthat::expect_true(is.numeric(spatial_coords[, "x"]))
|
||||
testthat::expect_true(is.numeric(spatial_coords[, "y"]))
|
||||
|
||||
# Compare with original spatial coordinates
|
||||
original_spatial <- mod1$obsm[["Obsm1"]]
|
||||
testthat::expect_equal(
|
||||
as.numeric(original_spatial),
|
||||
as.numeric(spatial_coords)
|
||||
)
|
||||
|
||||
cat("> Testing assay data\n")
|
||||
counts_matrix <- SummarizedExperiment::assays(spe)[["counts"]]
|
||||
testthat::expect_true(is(counts_matrix, "Matrix") || is.matrix(counts_matrix))
|
||||
testthat::expect_true(all(counts_matrix >= 0))
|
||||
testthat::expect_equal(dim(counts_matrix), c(4, 5))
|
||||
|
||||
cat("> Testing reducedDims\n")
|
||||
# Obsm1 should not be in reducedDims since it was moved to spatialCoords
|
||||
red_dims <- SingleCellExperiment::reducedDims(spe)
|
||||
testthat::expect_false(is.null(red_dims))
|
||||
testthat::expect_equal(names(red_dims), c("Obsm2"))
|
||||
testthat::expect_equal(dim(red_dims$Obsm2), c(5, 3))
|
||||
testthat::expect_true(is.numeric(red_dims$Obsm2))
|
||||
|
||||
# Compare with original Obsm2 values
|
||||
original_dimred <- mod1$obsm[["Obsm2"]]
|
||||
testthat::expect_equal(
|
||||
as.numeric(red_dims$Obsm2),
|
||||
as.numeric(original_dimred)
|
||||
)
|
||||
|
||||
# Clean up
|
||||
unlink(c(test_h5mu, out_rds))
|
||||
}
|
||||
|
||||
test_xenium_execution <- function() {
|
||||
cat("> > Testing Xenium Conversion\n")
|
||||
xenium_h5mu <- paste0(
|
||||
meta[["resources_dir"]],
|
||||
"/xenium_tiny.h5mu"
|
||||
)
|
||||
|
||||
# Output file
|
||||
out_rds <- tempfile(fileext = ".rds")
|
||||
|
||||
# Run conversion
|
||||
cat("> Running conversion\n")
|
||||
out <- processx::run(
|
||||
meta[["executable"]],
|
||||
c(
|
||||
"--input", xenium_h5mu,
|
||||
"--modality", "rna",
|
||||
"--output", out_rds,
|
||||
"--obsm_spatial_coordinates", "spatial"
|
||||
)
|
||||
)
|
||||
|
||||
cat("> Checking execution status\n")
|
||||
testthat::expect_equal(out$status, 0)
|
||||
testthat::expect_true(file.exists(out_rds))
|
||||
|
||||
cat("> Reading output file\n")
|
||||
xenium_spe <- readRDS(file = out_rds)
|
||||
testthat::expect_s4_class(xenium_spe, "SpatialExperiment")
|
||||
|
||||
cat("> Opening input file for comparison\n")
|
||||
rna_mod <- mu$read_h5ad(xenium_h5mu, mod = "rna")
|
||||
|
||||
cat("> Testing dimensions\n")
|
||||
dim_spe <- dim(xenium_spe)
|
||||
dim_h5mu <- dim(rna_mod$X)
|
||||
|
||||
testthat::expect_equal(dim_spe[1], dim_h5mu[2])
|
||||
testthat::expect_equal(dim_spe[2], dim_h5mu[1])
|
||||
|
||||
cat("> Testing colData (obs) transfer and data types\n")
|
||||
col_data <- SummarizedExperiment::colData(xenium_spe)
|
||||
coldata_cols <- colnames(col_data)
|
||||
obs_cols <- colnames(rna_mod$obs)
|
||||
testthat::expect_true(all(obs_cols %in% coldata_cols))
|
||||
|
||||
cat("> Testing rowData (var) transfer and data types\n")
|
||||
row_data <- SummarizedExperiment::rowData(xenium_spe)
|
||||
row_names <- colnames(row_data)
|
||||
var_cols <- colnames(rna_mod$var)
|
||||
testthat::expect_true(all(var_cols %in% row_names))
|
||||
|
||||
cat("> Testing spatialCoords\n")
|
||||
spatial_coords <- SpatialExperiment::spatialCoords(xenium_spe)
|
||||
testthat::expect_false(is.null(spatial_coords))
|
||||
testthat::expect_equal(ncol(spatial_coords), 2)
|
||||
testthat::expect_equal(nrow(spatial_coords), ncol(xenium_spe))
|
||||
testthat::expect_identical(colnames(spatial_coords), c("x", "y"))
|
||||
|
||||
# Test spatial coordinate data types and values
|
||||
testthat::expect_true(is.numeric(spatial_coords[, "x"]))
|
||||
testthat::expect_true(is.numeric(spatial_coords[, "y"]))
|
||||
|
||||
# Compare with original spatial coordinates
|
||||
original_spatial <- rna_mod$obsm[["spatial"]]
|
||||
testthat::expect_equal(
|
||||
as.numeric(original_spatial),
|
||||
as.numeric(spatial_coords)
|
||||
)
|
||||
|
||||
# Clean up
|
||||
unlink(out_rds)
|
||||
}
|
||||
|
||||
test_aviti_execution <- function() {
|
||||
cat("> > Testing Aviti Conversion\n")
|
||||
aviti_h5mu <- paste0(
|
||||
meta[["resources_dir"]],
|
||||
"/aviti_teton_tiny.h5mu"
|
||||
)
|
||||
|
||||
# Output file
|
||||
out_rds <- tempfile(fileext = ".rds")
|
||||
|
||||
# Run conversion
|
||||
cat("> Running conversion\n")
|
||||
out <- processx::run(
|
||||
meta[["executable"]],
|
||||
c(
|
||||
"--input", aviti_h5mu,
|
||||
"--modality", "rna",
|
||||
"--output", out_rds,
|
||||
"--obsm_spatial_coordinates", "spatial"
|
||||
)
|
||||
)
|
||||
|
||||
cat("> Checking execution status\n")
|
||||
testthat::expect_equal(out$status, 0)
|
||||
testthat::expect_true(file.exists(out_rds))
|
||||
|
||||
cat("> Reading output file\n")
|
||||
aviti_spe <- readRDS(file = out_rds)
|
||||
testthat::expect_s4_class(aviti_spe, "SpatialExperiment")
|
||||
|
||||
cat("> Opening input file for comparison\n")
|
||||
rna_mod <- mu$read_h5ad(aviti_h5mu, mod = "rna")
|
||||
|
||||
cat("> Testing dimensions\n")
|
||||
dim_spe <- dim(aviti_spe)
|
||||
dim_h5mu <- dim(rna_mod$X)
|
||||
|
||||
testthat::expect_equal(dim_spe[1], dim_h5mu[2])
|
||||
testthat::expect_equal(dim_spe[2], dim_h5mu[1])
|
||||
|
||||
cat("> Testing colData (obs) transfer and data types\n")
|
||||
col_data <- SummarizedExperiment::colData(aviti_spe)
|
||||
coldata_cols <- colnames(col_data)
|
||||
obs_cols <- colnames(rna_mod$obs)
|
||||
testthat::expect_true(all(obs_cols %in% coldata_cols))
|
||||
|
||||
cat("> Testing rowData (var) transfer and data types\n")
|
||||
row_data <- SummarizedExperiment::rowData(aviti_spe)
|
||||
row_names <- colnames(row_data)
|
||||
var_cols <- colnames(rna_mod$var)
|
||||
testthat::expect_true(all(var_cols %in% row_names))
|
||||
|
||||
cat("> Testing spatialCoords\n")
|
||||
spatial_coords <- SpatialExperiment::spatialCoords(aviti_spe)
|
||||
testthat::expect_false(is.null(spatial_coords))
|
||||
testthat::expect_equal(ncol(spatial_coords), 2)
|
||||
testthat::expect_equal(nrow(spatial_coords), ncol(aviti_spe))
|
||||
testthat::expect_identical(colnames(spatial_coords), c("x", "y"))
|
||||
|
||||
# Test spatial coordinate data types and values
|
||||
testthat::expect_true(is.numeric(spatial_coords[, "x"]))
|
||||
testthat::expect_true(is.numeric(spatial_coords[, "y"]))
|
||||
|
||||
# Compare with original spatial coordinates
|
||||
original_spatial <- rna_mod$obsm[["spatial"]]
|
||||
testthat::expect_equal(
|
||||
as.numeric(original_spatial),
|
||||
as.numeric(spatial_coords)
|
||||
)
|
||||
|
||||
# Clean up
|
||||
unlink(out_rds)
|
||||
}
|
||||
|
||||
test_cosmx_execution <- function() {
|
||||
cat("> > Testing CosMx Conversion\n")
|
||||
cosmx_h5mu <- paste0(
|
||||
meta[["resources_dir"]],
|
||||
"/Lung5_Rep2_tiny.h5mu"
|
||||
)
|
||||
|
||||
# Output file
|
||||
out_rds <- tempfile(fileext = ".rds")
|
||||
|
||||
# Run conversion
|
||||
cat("> Running conversion\n")
|
||||
out <- processx::run(
|
||||
meta[["executable"]],
|
||||
c(
|
||||
"--input", cosmx_h5mu,
|
||||
"--modality", "rna",
|
||||
"--output", out_rds,
|
||||
"--obsm_spatial_coordinates", "spatial"
|
||||
)
|
||||
)
|
||||
|
||||
cat("> Checking execution status\n")
|
||||
testthat::expect_equal(out$status, 0)
|
||||
testthat::expect_true(file.exists(out_rds))
|
||||
|
||||
cat("> Reading output file\n")
|
||||
cosmx_spe <- readRDS(file = out_rds)
|
||||
testthat::expect_s4_class(cosmx_spe, "SpatialExperiment")
|
||||
|
||||
cat("> Opening input file for comparison\n")
|
||||
rna_mod <- mu$read_h5ad(cosmx_h5mu, mod = "rna")
|
||||
|
||||
cat("> Testing dimensions\n")
|
||||
dim_spe <- dim(cosmx_spe)
|
||||
dim_h5mu <- dim(rna_mod$X)
|
||||
|
||||
testthat::expect_equal(dim_spe[1], dim_h5mu[2])
|
||||
testthat::expect_equal(dim_spe[2], dim_h5mu[1])
|
||||
|
||||
cat("> Testing colData (obs) transfer and data types\n")
|
||||
col_data <- SummarizedExperiment::colData(cosmx_spe)
|
||||
coldata_cols <- colnames(col_data)
|
||||
obs_cols <- colnames(rna_mod$obs)
|
||||
testthat::expect_true(all(obs_cols %in% coldata_cols))
|
||||
|
||||
cat("> Testing rowData (var) transfer and data types\n")
|
||||
row_data <- SummarizedExperiment::rowData(cosmx_spe)
|
||||
row_names <- colnames(row_data)
|
||||
var_cols <- colnames(rna_mod$var)
|
||||
testthat::expect_true(all(var_cols %in% row_names))
|
||||
|
||||
cat("> Testing spatialCoords\n")
|
||||
spatial_coords <- SpatialExperiment::spatialCoords(cosmx_spe)
|
||||
testthat::expect_false(is.null(spatial_coords))
|
||||
testthat::expect_equal(ncol(spatial_coords), 2)
|
||||
testthat::expect_equal(nrow(spatial_coords), ncol(cosmx_spe))
|
||||
testthat::expect_identical(colnames(spatial_coords), c("x", "y"))
|
||||
|
||||
# Test spatial coordinate data types and values
|
||||
testthat::expect_true(is.numeric(spatial_coords[, "x"]))
|
||||
testthat::expect_true(is.numeric(spatial_coords[, "y"]))
|
||||
|
||||
# Compare with original spatial coordinates
|
||||
original_spatial <- rna_mod$obsm[["spatial"]]
|
||||
testthat::expect_equal(
|
||||
as.numeric(original_spatial),
|
||||
as.numeric(spatial_coords)
|
||||
)
|
||||
|
||||
# Clean up
|
||||
unlink(out_rds)
|
||||
}
|
||||
|
||||
cat("Starting tests...\n")
|
||||
test_simple_execution()
|
||||
test_xenium_execution()
|
||||
test_aviti_execution()
|
||||
test_cosmx_execution()
|
||||
|
||||
cat("All tests completed!\n")
|
||||
90
src/convert/from_spaceranger_to_h5mu/config.vsh.yaml
Normal file
@@ -0,0 +1,90 @@
|
||||
name: "from_spaceranger_to_h5mu"
|
||||
namespace: "convert"
|
||||
scope: "public"
|
||||
description: |
|
||||
Converts the output bundle from spaceranger into an h5mu file.
|
||||
authors:
|
||||
- __merge__: /src/authors/dorien_roosen.yaml
|
||||
roles: [ maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
- name: "--input"
|
||||
alternatives: ["-i"]
|
||||
type: file
|
||||
description: |
|
||||
Path to the output bundle produced by Space Ranger.

The component reads the following files from the bundle:
- filtered_feature_bc_matrix.h5 or raw_feature_bc_matrix.h5 (selected with --output_type)
- tissue_positions.csv with the spatial coordinates
- metrics_summary.csv with QC metrics (if any)
- probe_set.csv with probe set information (if any)
|
||||
example: spaceranger_output
|
||||
direction: input
|
||||
required: true
|
||||
- name: Outputs
|
||||
arguments:
|
||||
- name: "--output"
|
||||
type: file
|
||||
description: Output h5mu file.
|
||||
example: output.h5mu
|
||||
direction: output
|
||||
- name: "--modality"
|
||||
type: string
|
||||
description: Name of the modality under which to store the data.
|
||||
default: "rna"
|
||||
- name: "--uns_metrics"
|
||||
type: string
|
||||
description: Name of the .uns slot under which to store QC metrics (if any).
|
||||
default: "metrics_spaceranger"
|
||||
- name: "--uns_probe_set"
|
||||
type: string
|
||||
description: Name of the .uns slot under which to store probe set information (if any).
|
||||
default: "probe_set"
|
||||
- name: "--obsm_coordinates"
|
||||
type: string
|
||||
description: Name of the .obsm slot under which to store the cell centroid coordinates.
|
||||
default: "spatial"
|
||||
- name: "--output_type"
|
||||
type: string
|
||||
description: "Which Spaceranger output to use for converting to h5mu."
|
||||
choices: [ raw, filtered ]
|
||||
default: filtered
|
||||
- name: "--output_compression"
|
||||
type: string
|
||||
description: Compression to use when writing the h5mu file.
|
||||
choices: [ gzip, lzf ]
|
||||
|
||||
resources:
|
||||
- type: python_script
|
||||
path: script.py
|
||||
- path: /src/utils/setup_logger.py
|
||||
test_resources:
|
||||
- type: python_script
|
||||
path: test.py
|
||||
- path: /resources_test/visium/Visium_FFPE_Human_Ovarian_Cancer_tiny_spaceranger
|
||||
|
||||
engines:
|
||||
- type: docker
|
||||
image: python:3.12-slim
|
||||
setup:
|
||||
- type: apt
|
||||
packages:
|
||||
- procps
|
||||
- type: python
|
||||
__merge__: [/src/base/requirements/anndata_mudata.yaml, /src/base/requirements/scanpy.yaml, .]
|
||||
__merge__: [ /src/base/requirements/python_test_setup.yaml, .]
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
directives:
|
||||
label: [lowmem, singlecpu]
|
||||
134
src/convert/from_spaceranger_to_h5mu/script.py
Normal file
@@ -0,0 +1,134 @@
|
||||
from pathlib import Path
|
||||
import mudata
|
||||
import scanpy as sc
|
||||
import sys
|
||||
import pandas as pd
|
||||
|
||||
## VIASH START
|
||||
par = {
|
||||
"input": "resources_test/visium/Visium_FFPE_Human_Ovarian_Cancer_tiny_spaceranger",
|
||||
"modality": "rna",
|
||||
"uns_metrics": "metrics_spaceranger",
|
||||
"uns_probe_set": "probe_set",
|
||||
"obsm_coordinates": "spatial",
|
||||
"output": "foo.h5mu",
|
||||
"min_genes": None,
|
||||
"min_counts": None,
|
||||
"output_compression": "gzip",
|
||||
"output_type": "filtered",
|
||||
}
|
||||
meta = {"resources_dir": "src/utils"}
|
||||
## VIASH END
|
||||
|
||||
sys.path.append(meta["resources_dir"])
|
||||
from setup_logger import setup_logger
|
||||
|
||||
logger = setup_logger()
|
||||
|
||||
|
||||
def retrieve_input_data(spaceranger_output_bundle, input_type="filtered"):
|
||||
# Expected folder structure (showing only relevant files):
|
||||
# ├── spatial/
|
||||
# │ └── tissue_positions.csv
|
||||
# ├── filtered_feature_bc_matrix.h5 OR raw_feature_bc_matrix.h5
|
||||
# ├── metrics_summary.csv
|
||||
# └── probe_set.csv
|
||||
|
||||
matrix_pattern = (
|
||||
"**/filtered_feature_bc_matrix.h5"
|
||||
if input_type == "filtered"
|
||||
else "**/raw_feature_bc_matrix.h5"
|
||||
)
|
||||
spaceranger_file_patterns = {
|
||||
"count_matrix": matrix_pattern,
|
||||
"metrics_summary": "**/metrics_summary.csv",
|
||||
"probe_set": "**/probe_set.csv",
|
||||
"spatial_coords": "**/spatial/tissue_positions.csv",
|
||||
}
|
||||
|
||||
spaceranger_output_bundle = Path(spaceranger_output_bundle)
|
||||
|
||||
spaceranger_files = {}
|
||||
|
||||
for key, pattern in spaceranger_file_patterns.items():
|
||||
file = list(spaceranger_output_bundle.glob(pattern))
|
||||
assert len(file) == 1, (
|
||||
f"Expected exactly one file for pattern '{pattern}', found {len(file)}."
|
||||
)
|
||||
spaceranger_files[key] = file[0]
|
||||
|
||||
return spaceranger_files
|
||||
|
||||
|
||||
def main():
|
||||
spaceranger_files = retrieve_input_data(par["input"], input_type=par["output_type"])
|
||||
|
||||
logger.info("Reading count matrix...")
|
||||
adata = sc.read_10x_h5(spaceranger_files["count_matrix"], gex_only=False)
|
||||
|
||||
# set the gene ids as var_names
|
||||
logger.info("Renaming var columns")
|
||||
adata.var = adata.var.rename_axis("gene_symbol").reset_index().set_index("gene_ids")
|
||||
|
||||
if par["uns_metrics"]:
|
||||
logger.info("Reading metrics summary file...")
|
||||
metrics_summary = pd.read_csv(
|
||||
spaceranger_files["metrics_summary"],
|
||||
decimal=".",
|
||||
quotechar='"',
|
||||
thousands=",",
|
||||
)
|
||||
|
||||
logger.info("Storing metrics summary in .uns slot...")
|
||||
adata.uns[par["uns_metrics"]] = metrics_summary
|
||||
|
||||
if par["uns_probe_set"]:
|
||||
logger.info("Reading probe set file...")
|
||||
|
||||
def read_hash_metadata(path):
|
||||
meta = {}
|
||||
with open(path, "r", encoding="utf-8") as f:
|
||||
for i, line in enumerate(f):
|
||||
if not line.startswith("#"):
|
||||
break
|
||||
line = line[1:].strip()
|
||||
if "=" in line:
|
||||
k, v = line.split("=", 1)
|
||||
meta[k.strip()] = v.strip()
|
||||
return meta
|
||||
|
||||
meta = read_hash_metadata(spaceranger_files["probe_set"])
|
||||
probe_set = pd.read_csv(spaceranger_files["probe_set"], comment="#")
|
||||
|
||||
logger.info("Storing probe set in .uns slot...")
|
||||
adata.uns[par["uns_probe_set"]] = probe_set
|
||||
adata.uns[par["uns_probe_set"] + "_meta"] = meta
|
||||
|
||||
logger.info("Reading spatial coordinates...")
|
||||
spatial_coords = pd.read_csv(
|
||||
spaceranger_files["spatial_coords"], decimal=".", thousands=","
|
||||
)
|
||||
|
||||
spatial_coords_aligned = spatial_coords.set_index("barcode").reindex(
|
||||
adata.obs_names
|
||||
)
|
||||
logger.info("Storing spatial coordinates in .obsm slot...")
|
||||
adata.obsm[par["obsm_coordinates"]] = spatial_coords_aligned[
|
||||
["pxl_col_in_fullres", "pxl_row_in_fullres"]
|
||||
].to_numpy()
|
||||
|
||||
# generate output
|
||||
logger.info("Convert to mudata")
|
||||
mdata = mudata.MuData({par["modality"]: adata})
|
||||
|
||||
# override root .obs and .uns
|
||||
mdata.obs = adata.obs
|
||||
mdata.uns = adata.uns
|
||||
|
||||
# write output
|
||||
logger.info("Writing %s", par["output"])
|
||||
mdata.write_h5mu(par["output"], compression=par["output_compression"])
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
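The nested `read_hash_metadata` helper in the script above collects `#key=value` pairs from the leading comment lines of `probe_set.csv` before pandas reads the rest with `comment="#"`. The following is a minimal standalone sketch of that parsing step, not the component's code: it takes any iterable of lines instead of a path, and the file contents are made up for illustration.

```python
import io


def read_hash_metadata(lines):
    """Collect `#key=value` pairs from the leading comment lines of a CSV."""
    meta = {}
    for line in lines:
        if not line.startswith("#"):
            break  # the header block ends at the first non-comment line
        key, sep, value = line[1:].strip().partition("=")
        if sep:  # ignore comment lines that contain no '='
            meta[key.strip()] = value.strip()
    return meta


# Hypothetical file contents, for illustration only.
example = io.StringIO(
    "#panel_name=Example Panel\n"
    "#probe_set_file_format=3.0\n"
    "gene_id,probe_seq,probe_id\n"
    "GENE1,ACGT,GENE1|probe1\n"
)
probe_set_meta = read_hash_metadata(example)
print(probe_set_meta)
```

Using `str.partition` splits on the first `=` only, so values that themselves contain `=` are kept intact, which matches the `split("=", 1)` call in the script.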
44
src/convert/from_spaceranger_to_h5mu/test.py
Normal file
@@ -0,0 +1,44 @@
|
||||
import pytest
|
||||
import sys
|
||||
import mudata as mu
|
||||
|
||||
## VIASH START
|
||||
meta = {
|
||||
"executable": "./target/executable/convert/from_spaceranger_to_h5mu/from_spaceranger_to_h5mu",
|
||||
"resources_dir": "resources_test/",
|
||||
"config": "src/convert/from_spaceranger_to_h5mu/config.vsh.yaml",
|
||||
}
|
||||
## VIASH END
|
||||
|
||||
input = f"{meta['resources_dir']}/Visium_FFPE_Human_Ovarian_Cancer_tiny_spaceranger"
|
||||
|
||||
|
||||
def test_simple_execution(run_component, tmp_path):
|
||||
output = tmp_path / "spaceranger.h5mu"
|
||||
|
||||
# run component
|
||||
run_component(
|
||||
["--input", input, "--output", str(output), "--output_compression", "gzip"]
|
||||
)
|
||||
|
||||
assert output.is_file(), "output file was not created"
|
||||
|
||||
mdata = mu.read_h5mu(output)
|
||||
assert list(mdata.mod.keys()) == ["rna"], "Expected modality rna"
|
||||
adata = mdata.mod["rna"]
|
||||
|
||||
assert list(adata.uns.keys()) == [
|
||||
"metrics_spaceranger",
|
||||
"probe_set",
|
||||
"probe_set_meta",
|
||||
]
|
||||
assert list(adata.obsm.keys()) == ["spatial"]
|
||||
assert list(adata.var.keys()) == ["gene_symbol", "feature_types", "genome"]
|
||||
|
||||
assert adata.X.dtype.kind == "f"
|
||||
assert all(adata.var["feature_types"] == "Gene Expression")
|
||||
assert adata.obsm["spatial"].dtype == "float"
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(pytest.main([__file__]))
|
||||
56
src/convert/from_spatialdata_to_h5mu/config.vsh.yaml
Normal file
@@ -0,0 +1,56 @@
|
||||
name: "from_spatialdata_to_h5mu"
|
||||
namespace: "convert"
|
||||
scope: "public"
|
||||
description: |
|
||||
Reads in the Tables field stored in a SpatialData object and converts it to an h5mu file.
|
||||
|
||||
authors:
|
||||
- __merge__: /src/authors/dorien_roosen.yaml
|
||||
roles: [ maintainer ]
|
||||
- __merge__: /src/authors/weiwei_schultz.yaml
|
||||
roles: [ contributor ]
|
||||
arguments:
|
||||
- name: "--input"
|
||||
alternatives: ["-i"]
|
||||
type: file
|
||||
description: Input zarr folder where the SpatialData object is stored.
|
||||
example: input.zarr
|
||||
direction: input
|
||||
required: true
|
||||
- name: "--modality"
|
||||
type: string
|
||||
default: rna
|
||||
- name: "--output"
|
||||
alternatives: ["-o"]
|
||||
type: file
|
||||
description: The output h5mu file.
|
||||
example: "output.h5mu"
|
||||
direction: output
|
||||
- name: "--output_compression"
|
||||
type: string
|
||||
choices: ["gzip", "lzf"]
|
||||
required: false
|
||||
example: "gzip"
|
||||
resources:
|
||||
- type: python_script
|
||||
path: script.py
|
||||
- path: /src/utils/setup_logger.py
|
||||
test_resources:
|
||||
- type: python_script
|
||||
path: test.py
|
||||
- path: /resources_test/xenium/xenium_tiny.zarr
|
||||
engines:
|
||||
- type: docker
|
||||
image: python:3.12-slim
|
||||
setup:
|
||||
- type: apt
|
||||
packages:
|
||||
- procps
|
||||
- type: python
|
||||
__merge__: [/src/base/requirements/anndata_mudata.yaml, /src/base/requirements/spatialdata.yaml]
|
||||
__merge__: [ /src/base/requirements/python_test_setup.yaml, .]
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
directives:
|
||||
label: [lowmem, singlecpu]
|
||||
28
src/convert/from_spatialdata_to_h5mu/script.py
Normal file
@@ -0,0 +1,28 @@
|
||||
import sys
|
||||
import spatialdata as sd
|
||||
import mudata as mu
|
||||
|
||||
## VIASH START
|
||||
par = {
|
||||
"input": "./resources_test/xenium/xenium_tiny.zarr",
|
||||
"output": "./resources_test/xenium/xenium_tiny.h5mu",
|
||||
"modality": "rna",
|
||||
"output_compression": None,
|
||||
}
|
||||
meta = {"resources_dir": "src/utils"}
|
||||
## VIASH END
|
||||
|
||||
sys.path.append(meta["resources_dir"])
|
||||
from setup_logger import setup_logger
|
||||
|
||||
logger = setup_logger()
|
||||
|
||||
logger.info("Reading in SpatialData object...")
|
||||
sdata = sd.read_zarr(par["input"])
|
||||
|
||||
logger.info("Fetching AnnData table from SpatialData object...")
|
||||
adata = sdata.tables["table"]
|
||||
|
||||
logger.info("Writing output MuData object...")
|
||||
mdata = mu.MuData({par["modality"]: adata})
|
||||
mdata.write_h5mu(par["output"], compression=par["output_compression"])
|
||||
52
src/convert/from_spatialdata_to_h5mu/test.py
Normal file
@@ -0,0 +1,52 @@
|
||||
import pytest
|
||||
import sys
|
||||
import mudata as mu
|
||||
|
||||
|
||||
def test_simple_execution(run_component, tmp_path):
|
||||
output = tmp_path / "output.h5mu"
|
||||
|
||||
run_component(
|
||||
[
|
||||
"--input",
|
||||
meta["resources_dir"] + "/xenium_tiny.zarr",
|
||||
"--output",
|
||||
output,
|
||||
]
|
||||
)
|
||||
assert output.is_file(), "output file was not created"
|
||||
|
||||
mdata = mu.read_h5mu(output)
|
||||
assert list(mdata.mod.keys()) == ["rna"], "Expected modality rna"
|
||||
|
||||
adata = mdata.mod["rna"]
|
||||
|
||||
# TODO: update what is checked here when spatialdata from other experimental set-ups are tested (e.g. cosmx, visium)
|
||||
assert list(adata.obs.keys()) == [
|
||||
"cell_id",
|
||||
"transcript_counts",
|
||||
"control_probe_counts",
|
||||
"genomic_control_counts",
|
||||
"control_codeword_counts",
|
||||
"unassigned_codeword_counts",
|
||||
"deprecated_codeword_counts",
|
||||
"total_counts",
|
||||
"cell_area",
|
||||
"nucleus_area",
|
||||
"nucleus_count",
|
||||
"segmentation_method",
|
||||
"region",
|
||||
"z_level",
|
||||
"cell_labels",
|
||||
]
|
||||
|
||||
assert list(adata.uns.keys()) == ["spatialdata_attrs"]
|
||||
assert list(adata.obsm.keys()) == ["spatial"]
|
||||
assert list(adata.var.keys()) == ["gene_ids", "feature_types", "genome"]
|
||||
|
||||
assert all(adata.var["feature_types"] == "Gene Expression")
|
||||
assert adata.obsm["spatial"].dtype == "float"
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(pytest.main([__file__]))
|
||||
78
src/convert/from_xenium_to_h5mu/config.vsh.yaml
Normal file
@@ -0,0 +1,78 @@
|
||||
name: "from_xenium_to_h5mu"
|
||||
namespace: "convert"
|
||||
scope: "public"
|
||||
description: |
|
||||
Converts the output from Xenium to a single .h5mu file, where the count matrix is written to the `rna` modality.
|
||||
The following files are expected to be present in the Xenium output bundle:
|
||||
├── cell_feature_matrix.h5
|
||||
├── cells.parquet
|
||||
├── experiment.xenium
|
||||
└── metrics_summary.csv
|
||||
|
||||
authors:
|
||||
- __merge__: /src/authors/dorien_roosen.yaml
|
||||
roles: [ maintainer ]
|
||||
arguments:
|
||||
- name: "--input"
|
||||
alternatives: ["-i"]
|
||||
type: file
|
||||
description: Input folder. Must contain the output from a Xenium run.
|
||||
example: xenium_output_bundle
|
||||
direction: input
|
||||
required: true
|
||||
- name: "--output"
|
||||
alternatives: ["-o"]
|
||||
type: file
|
||||
description: Output .h5mu file.
|
||||
example: "xenium.h5mu"
|
||||
direction: output
|
||||
- name: "--obsm_coordinates"
|
||||
type: string
|
||||
description: Name of the .obsm slot under which to store the cell centroid coordinates.
|
||||
default: "spatial"
|
||||
- name: "--uns_experiment"
|
||||
type: string
|
||||
description: Name of the .uns slot under which to store the Xenium experiment specifications.
|
||||
default: "xenium_experiment"
|
||||
- name: "--uns_metrics"
|
||||
type: string
|
||||
description: Name of the .uns slot under which to store the summary QC metrics.
|
||||
default: "xenium_metrics"
|
||||
__merge__: [., /src/base/h5_compression_argument.yaml]
|
||||
|
||||
resources:
|
||||
- type: python_script
|
||||
path: script.py
|
||||
- path: /src/utils/setup_logger.py
|
||||
- path: /src/utils/unzip_archived_folder.py
|
||||
test_resources:
|
||||
- type: python_script
|
||||
path: test.py
|
||||
- path: /resources_test/xenium/xenium_tiny
|
||||
engines:
|
||||
- type: docker
|
||||
image: python:3.12-slim
|
||||
setup:
|
||||
- type: apt
|
||||
packages:
|
||||
- procps
|
||||
- build-essential
|
||||
- zlib1g-dev
|
||||
- git
|
||||
- type: python
|
||||
__merge__: [/src/base/requirements/anndata_mudata.yaml, /src/base/requirements/scanpy.yaml, .]
|
||||
packages: [ pyarrow ]
|
||||
# Windows explorer uses DEFLATE64 compression for large ZIP files,
|
||||
# which is not supported by Python's standard library zipfile module
|
||||
git: [ https://codeberg.org/miurahr/zipfile-inflate64.git@v0.2 ]
|
||||
test_setup:
|
||||
- type: apt
|
||||
packages:
|
||||
- zip
|
||||
- type: python
|
||||
__merge__: [ /src/base/requirements/viashpy.yaml, .]
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
directives:
|
||||
label: [lowmem, singlecpu]
|
||||
100
src/convert/from_xenium_to_h5mu/script.py
Normal file
@@ -0,0 +1,100 @@
|
||||
import sys
|
||||
from pathlib import Path
|
||||
import scanpy as sc
|
||||
import pandas as pd
|
||||
import mudata as mu
|
||||
import zipfile_inflate64 as zipfile
|
||||
import json
|
||||
import os
|
||||
|
||||
## VIASH START
|
||||
par = {
|
||||
"input": "test/xenium_tiny.zip",
|
||||
"output": "xenium_tiny_test.h5mu",
|
||||
"output_compression": "gzip",
|
||||
"obsm_coordinates": "spatial",
|
||||
"uns_experiment": "xenium_experiment",
|
||||
"uns_metrics": "xenium_metrics",
|
||||
}
|
||||
meta = {"resources_dir": "src/utils"}
|
||||
## VIASH END
|
||||
|
||||
sys.path.append(meta["resources_dir"])
|
||||
from setup_logger import setup_logger
|
||||
from unzip_archived_folder import extract_selected_files_from_zip
|
||||
|
||||
logger = setup_logger()
|
||||
|
||||
|
||||
def _retrieve_input_data(xenium_output_bundle):
|
||||
# Expected folder structure (showing only relevant files):
|
||||
# ├── cell_feature_matrix.h5
|
||||
# ├── cells.parquet
|
||||
# ├── experiment.xenium
|
||||
# └── metrics_summary.csv
|
||||
|
||||
required_file_patterns = {
|
||||
"count_matrix": "**/cell_feature_matrix.h5",
|
||||
"cells_metadata": "**/cells.parquet",
|
||||
"experiment": "**/experiment.xenium",
|
||||
"metrics_summary": "**/metrics_summary.csv",
|
||||
}
|
||||
|
||||
if zipfile.is_zipfile(xenium_output_bundle):
|
||||
xenium_output_bundle = extract_selected_files_from_zip(
|
||||
xenium_output_bundle,
|
||||
members=list(required_file_patterns.values()),
|
||||
)
|
||||
else:
|
||||
xenium_output_bundle = Path(xenium_output_bundle)
|
||||
|
||||
assert os.path.isdir(xenium_output_bundle), (
|
||||
"Input is expected to be a (compressed) directory."
|
||||
)
|
||||
|
||||
input_data = {}
|
||||
for key, pattern in required_file_patterns.items():
|
||||
file = list(xenium_output_bundle.glob(pattern))
|
||||
assert len(file) == 1, (
|
||||
f"Expected exactly one file matching pattern {pattern}, found {len(file)}."
|
||||
)
|
||||
input_data[key] = file[0]
|
||||
|
||||
return input_data
|
||||
|
||||
|
||||
def _format_cell_id_column(cell_id_column: pd.Series) -> pd.Series:
|
||||
"""Convert cell IDs to string format, decoding bytes if necessary."""
|
||||
return cell_id_column.apply(
|
||||
lambda x: x.decode("utf-8") if isinstance(x, bytes) else str(x)
|
||||
)
|
||||
|
||||
|
||||
# Read data from Xenium output bundle
|
||||
logger.info("Reading input data...")
|
||||
|
||||
input_data = _retrieve_input_data(par["input"])
|
||||
|
||||
adata = sc.read_10x_h5(input_data["count_matrix"])
|
||||
metadata = pd.read_parquet(input_data["cells_metadata"], engine="pyarrow")
|
||||
with open(input_data["experiment"], "r") as f:
|
||||
specs = json.load(f)
|
||||
metrics_summary = pd.read_csv(
|
||||
input_data["metrics_summary"], decimal=".", quotechar='"', thousands=","
|
||||
)
|
||||
|
||||
# Extract and format required columns
|
||||
cell_ids = _format_cell_id_column(metadata["cell_id"])
|
||||
coordinates = metadata[["x_centroid", "y_centroid"]].to_numpy()
|
||||
metadata.drop(["cell_id", "x_centroid", "y_centroid"], axis=1, inplace=True)
|
||||
|
||||
# Update AnnData with metadata
|
||||
adata.obs = metadata
|
||||
adata.obs_names = cell_ids
|
||||
adata.obsm[par["obsm_coordinates"]] = coordinates
|
||||
adata.uns[par["uns_experiment"]] = specs
|
||||
adata.uns[par["uns_metrics"]] = metrics_summary
|
||||
|
||||
# Write output MuData
|
||||
mdata = mu.MuData({"rna": adata})
|
||||
mdata.write_h5mu(par["output"], compression=par["output_compression"])
|
||||
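`_retrieve_input_data` above resolves each required file by globbing the bundle directory and asserting that exactly one match exists. Plain `assert` statements are skipped when Python runs with `-O`, so one possible variant raises explicit exceptions instead. This is a sketch under that assumption: the helper name `find_required_files` and the throwaway bundle layout are hypothetical.

```python
from pathlib import Path
import tempfile


def find_required_files(bundle_dir, patterns):
    """Map each key to the single file under bundle_dir matching its glob pattern."""
    bundle_dir = Path(bundle_dir)
    if not bundle_dir.is_dir():
        raise NotADirectoryError(f"{bundle_dir} is not a directory")
    found = {}
    for key, pattern in patterns.items():
        matches = sorted(bundle_dir.glob(pattern))
        if len(matches) != 1:
            raise ValueError(
                f"Expected exactly one file matching {pattern!r}, found {len(matches)}."
            )
        found[key] = matches[0]
    return found


# Build a throwaway bundle (hypothetical layout) to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp) / "outs"
    bundle.mkdir()
    (bundle / "cells.parquet").touch()
    (bundle / "metrics_summary.csv").touch()
    files = find_required_files(
        bundle,
        {
            "cells_metadata": "**/cells.parquet",
            "metrics_summary": "**/metrics_summary.csv",
        },
    )
    found_names = {key: path.name for key, path in files.items()}
    try:
        find_required_files(bundle, {"experiment": "**/experiment.xenium"})
        missing_error = None
    except ValueError as err:
        missing_error = str(err)
```

With `Path.glob`, the `**/` prefix also matches files directly inside the bundle directory, so the same patterns work for flat and nested bundles.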
161
src/convert/from_xenium_to_h5mu/test.py
Normal file
@@ -0,0 +1,161 @@
|
||||
import pytest
|
||||
import sys
|
||||
import subprocess
|
||||
import mudata as mu
|
||||
|
||||
## VIASH START
|
||||
meta = {
|
||||
"executable": "./target/executable/convert/from_xenium_to_h5mu/from_xenium_to_h5mu",
|
||||
"resources_dir": "resources_test/",
|
||||
"config": "src/convert/from_xenium_to_h5mu/config.vsh.yaml",
|
||||
}
|
||||
## VIASH END
|
||||
|
||||
input = f"{meta['resources_dir']}/xenium_tiny"
|
||||
|
||||
|
||||
def test_simple_execution(run_component, tmp_path):
|
||||
output = tmp_path / "xenium.h5mu"
|
||||
|
||||
# run component
|
||||
run_component(
|
||||
["--input", input, "--output", str(output), "--output_compression", "gzip"]
|
||||
)
|
||||
|
||||
assert output.is_file(), "output file was not created"
|
||||
|
||||
mdata = mu.read_h5mu(output)
|
||||
assert list(mdata.mod.keys()) == ["rna"], "Expected modality rna"
|
||||
adata = mdata.mod["rna"]
|
||||
|
||||
assert list(adata.obs.keys()) == [
|
||||
"transcript_counts",
|
||||
"control_probe_counts",
|
||||
"genomic_control_counts",
|
||||
"control_codeword_counts",
|
||||
"unassigned_codeword_counts",
|
||||
"deprecated_codeword_counts",
|
||||
"total_counts",
|
||||
"cell_area",
|
||||
"nucleus_area",
|
||||
"nucleus_count",
|
||||
"segmentation_method",
|
||||
]
|
||||
|
||||
assert list(adata.uns.keys()) == ["xenium_experiment", "xenium_metrics"]
|
||||
assert list(adata.obsm.keys()) == ["spatial"]
|
||||
assert list(adata.var.keys()) == ["gene_ids", "feature_types", "genome"]
|
||||
|
||||
assert adata.X.dtype.kind == "f"
|
||||
assert all(adata.var["feature_types"] == "Gene Expression")
|
||||
assert adata.obsm["spatial"].dtype == "float"
|
||||
obs_counts = [
|
||||
"transcript_counts",
|
||||
"control_probe_counts",
|
||||
"genomic_control_counts",
|
||||
"unassigned_codeword_counts",
|
||||
"deprecated_codeword_counts",
|
||||
"total_counts",
|
||||
"nucleus_count",
|
||||
]
|
||||
assert all([adata.obs[obs].dtype == "int" for obs in obs_counts])
|
||||
obs_areas = ["cell_area", "nucleus_area"]
|
||||
assert all([adata.obs[obs].dtype == "float" for obs in obs_areas])
|
||||
|
||||
|
||||
def test_compressed_input(run_component, tmp_path):
|
||||
output = tmp_path / "xenium.h5mu"
|
||||
zipped_input = tmp_path / "xenium_tiny.zip"
|
||||
|
||||
subprocess.run(
|
||||
["zip", "-r", str(zipped_input), "xenium_tiny"],
|
||||
cwd=meta["resources_dir"],
|
||||
check=True,
|
||||
)
|
||||
|
||||
# run component
|
||||
run_component(
|
||||
[
|
||||
"--input",
|
||||
zipped_input,
|
||||
"--output",
|
||||
str(output),
|
||||
"--output_compression",
|
||||
"gzip",
|
||||
]
|
||||
)
|
||||
|
||||
assert output.is_file(), "output file was not created"
|
||||
|
||||
mdata = mu.read_h5mu(output)
|
||||
assert list(mdata.mod.keys()) == ["rna"], "Expected modality rna"
|
||||
adata = mdata.mod["rna"]
|
||||
|
||||
assert list(adata.obs.keys()) == [
|
||||
"transcript_counts",
|
||||
"control_probe_counts",
|
||||
"genomic_control_counts",
|
||||
"control_codeword_counts",
|
||||
"unassigned_codeword_counts",
|
||||
"deprecated_codeword_counts",
|
||||
"total_counts",
|
||||
"cell_area",
|
||||
"nucleus_area",
|
||||
"nucleus_count",
|
||||
"segmentation_method",
|
||||
]
|
||||
|
||||
assert list(adata.uns.keys()) == ["xenium_experiment", "xenium_metrics"]
|
||||
assert list(adata.obsm.keys()) == ["spatial"]
|
||||
assert list(adata.var.keys()) == ["gene_ids", "feature_types", "genome"]
|
||||
|
||||
assert adata.X.dtype.kind == "f"
|
||||
assert all(adata.var["feature_types"] == "Gene Expression")
|
||||
assert adata.obsm["spatial"].dtype == "float"
|
||||
obs_counts = [
|
||||
"transcript_counts",
|
||||
"control_probe_counts",
|
||||
"genomic_control_counts",
|
||||
"unassigned_codeword_counts",
|
||||
"deprecated_codeword_counts",
|
||||
"total_counts",
|
||||
"nucleus_count",
|
||||
]
|
||||
assert all([adata.obs[obs].dtype == "int" for obs in obs_counts])
|
||||
obs_areas = ["cell_area", "nucleus_area"]
|
||||
assert all([adata.obs[obs].dtype == "float" for obs in obs_areas])
|
||||
|
||||
|
||||
def test_rename_fields(run_component, tmp_path):
|
||||
output = tmp_path / "xenium.h5mu"
|
||||
|
||||
# run component
|
||||
run_component(
|
||||
[
|
||||
"--input",
|
||||
input,
|
||||
"--output",
|
||||
str(output),
|
||||
"--obsm_coordinates",
|
||||
"test_coord",
|
||||
"--uns_experiment",
|
||||
"test_experiment",
|
||||
"--uns_metrics",
|
||||
"test_metrics",
|
||||
"--output_compression",
|
||||
"gzip",
|
||||
]
|
||||
)
|
||||
|
||||
assert output.is_file(), "output file was not created"
|
||||
|
||||
mdata = mu.read_h5mu(output)
|
||||
assert list(mdata.mod.keys()) == ["rna"]
|
||||
adata = mdata.mod["rna"]
|
||||
|
||||
assert list(adata.uns.keys()) == ["test_experiment", "test_metrics"]
|
||||
assert list(adata.obsm.keys()) == ["test_coord"]
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(pytest.main([__file__]))
|
||||
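`test_compressed_input` above shells out to `zip -r`, which is why the config installs the `zip` apt package in `test_setup`. As a possible alternative, the archive could be built with the standard library. The sketch below assumes a made-up `xenium_tiny` directory with a single placeholder file; it is not part of the repository's tests.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder resources directory, standing in for meta["resources_dir"].
    resources_dir = Path(tmp) / "resources"
    (resources_dir / "xenium_tiny").mkdir(parents=True)
    (resources_dir / "xenium_tiny" / "experiment.xenium").write_text("{}")

    # Similar in spirit to running `zip -r xenium_tiny.zip xenium_tiny`
    # from inside resources_dir.
    archive = shutil.make_archive(
        str(Path(tmp) / "xenium_tiny"),  # archive path without the .zip suffix
        "zip",
        root_dir=str(resources_dir),
        base_dir="xenium_tiny",
    )
    with zipfile.ZipFile(archive) as zf:
        archive_members = sorted(zf.namelist())
    archive_name = Path(archive).name
```

Writing the archive outside `root_dir` keeps it from being picked up while the directory is being walked.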
110
src/convert/from_xenium_to_spatialdata/config.vsh.yaml
Normal file
@@ -0,0 +1,110 @@
|
||||
name: "from_xenium_to_spatialdata"
|
||||
namespace: "convert"
|
||||
scope: "public"
|
||||
description: |
|
||||
Converts a 10x Genomics Xenium dataset into a SpatialData object.
|
||||
By default, the following files will be converted:
|
||||
- `experiment.xenium`: File containing specifications.
|
||||
- `nucleus_boundaries.parquet`: Polygons of nucleus boundaries.
|
||||
- `cell_boundaries.parquet`: Polygons of cell boundaries.
|
||||
- `transcripts.parquet`: File containing transcripts.
|
||||
- `cell_feature_matrix.h5`: File containing cell feature matrix.
|
||||
- `cells.parquet`: File containing cell metadata.
|
||||
- `morphology_mip.ome.tif`: File containing morphology mip.
|
||||
- `morphology_focus.ome.tif`: File containing morphology focus.
|
||||
authors:
|
||||
- __merge__: /src/authors/dorien_roosen.yaml
|
||||
roles: [ maintainer ]
|
||||
- __merge__: /src/authors/weiwei_schultz.yaml
|
||||
roles: [ contributor ]
|
||||
arguments:
|
||||
- name: "--input"
|
||||
alternatives: ["-i"]
|
||||
type: file
|
||||
description: Input folder. Must contain the output from a Xenium run.
|
||||
example: xenium_data
|
||||
direction: input
|
||||
required: true
|
||||
- name: "--output"
|
||||
alternatives: ["-o"]
|
||||
type: file
|
||||
description: Zarr directory where the SpatialData object will be stored.
|
||||
example: "xenium_data.zarr"
|
||||
direction: output
|
||||
- name: "--cells_boundaries"
|
||||
type: boolean
|
||||
default: True
|
||||
description: Whether to read cell boundaries (polygons).
|
||||
- name: "--nucleus_boundaries"
|
||||
type: boolean
|
||||
default: True
|
||||
description: Whether to read nucleus boundaries (polygons).
|
||||
- name: "--cells_as_circles"
|
||||
type: boolean_true
|
||||
description: Whether to also read cells as circles (the center and radius of each circle are computed from the corresponding cell labels).
|
||||
- name: "--cells_labels"
|
||||
type: boolean
|
||||
default: True
|
||||
description: Whether to read cell labels (raster). The polygonal version of the cell labels is simplified for visualization purposes, and using the raster version is recommended for analysis.
|
||||
- name: "--transcripts"
|
||||
type: boolean
|
||||
default: True
|
||||
description: Whether to read transcripts.
|
||||
- name: "--nucleus_labels"
|
||||
type: boolean
|
||||
default: True
|
||||
description: Whether to read nucleus labels (raster). The polygonal version of the nucleus labels is simplified for visualization purposes, and using the raster version is recommended for analysis.
|
||||
- name: "--morphology_mip"
|
||||
type: boolean
|
||||
default: True
|
||||
description: Whether to read the morphology mip image (available in versions < 2.0.0).
|
||||
- name: "--morphology_focus"
|
||||
type: boolean
|
||||
default: True
|
||||
description: Whether to read the morphology focus image.
|
||||
- name: "--aligned_images"
|
||||
type: boolean
|
||||
default: True
|
||||
description: Whether to also parse, when available, additional H&E or IF aligned images. For more control over the aligned images being read, in particular, to specify the axes of the aligned images, please set this parameter to False and use the xenium_aligned_image function directly.
|
||||
- name: "--cells_table"
|
||||
type: boolean
|
||||
default: True
|
||||
description: Whether to read the cell annotations in the AnnData table.
|
||||
- name: "--n_jobs"
|
||||
type: integer
|
||||
default: 1
|
||||
|
||||
resources:
|
||||
- type: python_script
|
||||
path: script.py
|
||||
- path: /src/utils/setup_logger.py
|
||||
- path: /src/utils/unzip_archived_folder.py
|
||||
test_resources:
|
||||
- type: python_script
|
||||
path: test.py
|
||||
- path: /resources_test/xenium/xenium_tiny/
|
||||
engines:
|
||||
- type: docker
|
||||
image: python:3.12-slim
|
||||
setup:
|
||||
- type: apt
|
||||
packages:
|
||||
- procps
|
||||
- build-essential
|
||||
- zlib1g-dev
|
||||
- git
|
||||
- type: python
|
||||
# Windows explorer uses DEFLATE64 compression for large ZIP files,
|
||||
# which is not supported by Python's standard library zipfile module
|
||||
git: [ https://codeberg.org/miurahr/zipfile-inflate64.git@v0.2 ]
|
||||
__merge__: [ /src/base/requirements/spatialdata-io.yaml, . ]
|
||||
test_setup:
|
||||
- type: apt
|
||||
packages:
|
||||
- zip
|
||||
__merge__: [ /src/base/requirements/python_test_setup.yaml, .]
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
directives:
|
||||
label: [lowmem, singlecpu]
|
||||
65
src/convert/from_xenium_to_spatialdata/script.py
Normal file
@@ -0,0 +1,65 @@
|
||||
import sys
|
||||
from spatialdata_io import xenium
|
||||
import zipfile_inflate64 as zipfile
|
||||
from pathlib import Path
|
||||
|
||||
## VIASH START
|
||||
par = {
|
||||
"input": "resources_test/xenium/xenium_tiny",
|
||||
"output": "./test/xenium_tiny.zarr",
|
||||
"cells_boundaries": True,
|
||||
"nucleus_boundaries": True,
|
||||
"cells_as_circles": None,
|
||||
"cells_labels": True,
|
||||
"nucleus_labels": True,
|
||||
"transcripts": True,
|
||||
"morphology_mip": True,
|
||||
"morphology_focus": True,
|
||||
"aligned_images": True,
|
||||
"cells_table": True,
|
||||
"n_jobs": 1,
|
||||
}
|
||||
meta = {"resources_dir": "src/utils"}
|
||||
## VIASH END
|
||||
|
||||
sys.path.append(meta["resources_dir"])
|
||||
from setup_logger import setup_logger
|
||||
from unzip_archived_folder import unzip_archived_folder
|
||||
|
||||
logger = setup_logger()
|
||||
|
||||
logger.info("Reading in Xenium data...")
|
||||
|
||||
if zipfile.is_zipfile(par["input"]):
|
||||
required_file_patterns = [
|
||||
"**/experiment.xenium",
|
||||
"**/nucleus_boundaries.parquet",
|
||||
"**/cell_boundaries.parquet",
|
||||
"**/transcripts.parquet",
|
||||
"**/cell_feature_matrix.h5",
|
||||
"**/cells.parquet",
|
||||
"**/morphology_mip.ome.tif",
|
||||
"**/morphology_focus.ome.tif",
|
||||
]
|
||||
xenium_output_bundle = unzip_archived_folder(par["input"])
|
||||
else:
|
||||
xenium_output_bundle = Path(par["input"])
|
||||
|
||||
sdata = xenium(
|
||||
xenium_output_bundle,
|
||||
cells_boundaries=par["cells_boundaries"],
|
||||
nucleus_boundaries=par["nucleus_boundaries"],
|
||||
cells_as_circles=par["cells_as_circles"],
|
||||
cells_labels=par["cells_labels"],
|
||||
nucleus_labels=par["nucleus_labels"],
|
||||
transcripts=par["transcripts"],
|
||||
morphology_mip=par["morphology_mip"], # only available in version < 2.0.0
|
||||
morphology_focus=par["morphology_focus"],
|
||||
aligned_images=par["aligned_images"],
|
||||
cells_table=par["cells_table"],
|
||||
n_jobs=par["n_jobs"],
|
||||
)
|
||||
|
||||
|
||||
logger.info("Writing out SpatialData object to Zarr...")
|
||||
sdata.write(par["output"], overwrite=True)
|
||||
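In the script above, `required_file_patterns` is defined in the zip branch but does not appear to be passed on, since `unzip_archived_folder` receives only the input path. If selective extraction is the intent, it could look roughly like the sketch below. The helper name `extract_matching_members` is hypothetical and is not the repository's `extract_selected_files_from_zip`. The sketch also uses the standard library `zipfile`, which cannot read DEFLATE64 archives, whereas the component imports `zipfile_inflate64` for that case.

```python
import fnmatch
import tempfile
import zipfile
from pathlib import Path, PurePosixPath


def extract_matching_members(zip_path, patterns, target_dir):
    """Extract only members whose file name matches the last part of a pattern."""
    # "**/cells.parquet" is reduced to "cells.parquet" and compared to each
    # member's base name, so files at any depth in the archive can match.
    wanted = [PurePosixPath(pattern).name for pattern in patterns]
    with zipfile.ZipFile(zip_path) as zf:
        selected = [
            name
            for name in zf.namelist()
            if not name.endswith("/")  # skip directory entries
            and any(fnmatch.fnmatch(PurePosixPath(name).name, w) for w in wanted)
        ]
        zf.extractall(target_dir, members=selected)
    return sorted(selected)


# Throwaway archive with made-up contents, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "bundle.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("xenium_tiny/experiment.xenium", "{}")
        zf.writestr("xenium_tiny/cells.parquet", "")
        zf.writestr("xenium_tiny/analysis/clusters.csv", "a,b\n")
    out_dir = Path(tmp) / "extracted"
    extracted = extract_matching_members(
        archive, ["**/experiment.xenium", "**/cells.parquet"], out_dir
    )
    extracted_on_disk = sorted(
        p.relative_to(out_dir).as_posix() for p in out_dir.rglob("*") if p.is_file()
    )
```

Matching on the base name sidesteps the difference between `Path.glob` and `fnmatch`, where a literal `**/` prefix would require at least one directory level in the member name.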
73
src/convert/from_xenium_to_spatialdata/test.py
Normal file
@@ -0,0 +1,73 @@
|
||||
import pytest
|
||||
import os
|
||||
import sys
|
||||
import spatialdata as sd
|
||||
import subprocess
|
||||
|
||||
|
||||
def test_simple_execution(run_component, tmp_path):
|
||||
output_sd_path = tmp_path / "sd"
|
||||
|
||||
run_component(
|
||||
[
|
||||
"--input",
|
||||
meta["resources_dir"] + "/xenium_tiny",
|
||||
"--output",
|
||||
output_sd_path,
|
||||
]
|
||||
)
|
||||
|
||||
assert os.path.exists(output_sd_path), "Output zarr folder was not created"
|
||||
|
||||
sdata = sd.read_zarr(output_sd_path)
|
||||
assert isinstance(sdata, sd.SpatialData), (
|
||||
"the generated output is not a SpatialData object"
|
||||
)
|
||||
|
||||
assert os.path.exists(output_sd_path / "images"), "images folder was not created"
|
||||
assert os.path.exists(output_sd_path / "labels"), "labels folder was not created"
|
||||
assert os.path.exists(output_sd_path / "points"), "points folder was not created"
|
||||
assert os.path.exists(output_sd_path / "shapes"), "shapes folder was not created"
|
||||
assert os.path.exists(output_sd_path / "tables"), "tables folder was not created"
|
||||
assert (output_sd_path / "zarr.json").is_file(), (
|
||||
"zarr metadata file was not created"
|
||||
)
|
||||
|
||||
|
||||
def test_compressed_input(run_component, tmp_path):
|
||||
output_sd_path = tmp_path / "sd"
|
||||
zipped_input = tmp_path / "xenium_tiny.zip"
|
||||
|
||||
subprocess.run(
|
||||
["zip", "-r", str(zipped_input), "xenium_tiny"],
|
||||
cwd=meta["resources_dir"],
|
||||
check=True,
|
||||
)
|
||||
run_component(
|
||||
[
|
||||
"--input",
|
||||
zipped_input,
|
||||
"--output",
|
||||
output_sd_path,
|
||||
]
|
||||
)
|
||||
|
||||
assert os.path.exists(output_sd_path), "Output zarr folder was not created"
|
||||
|
||||
sdata = sd.read_zarr(output_sd_path)
|
||||
assert isinstance(sdata, sd.SpatialData), (
|
||||
"the generated output is not a SpatialData object"
|
||||
)
|
||||
|
||||
assert os.path.exists(output_sd_path / "images"), "images folder was not created"
|
||||
assert os.path.exists(output_sd_path / "labels"), "labels folder was not created"
|
||||
assert os.path.exists(output_sd_path / "points"), "points folder was not created"
|
||||
assert os.path.exists(output_sd_path / "shapes"), "shapes folder was not created"
|
||||
assert os.path.exists(output_sd_path / "tables"), "tables folder was not created"
|
||||
assert (output_sd_path / "zarr.json").is_file(), (
|
||||
"zarr metadata file was not created"
|
||||
)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(pytest.main([__file__]))
|
||||
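The two tests above repeat the same layout assertions on the output store. A minimal sketch of how those checks could be collected in one helper follows. The names `EXPECTED_ELEMENTS` and `missing_zarr_entries` are hypothetical and not part of the component; the list of entries simply mirrors the assertions in the test file.

```python
from pathlib import Path

# Element folders the tests above expect inside the SpatialData Zarr store.
EXPECTED_ELEMENTS = ("images", "labels", "points", "shapes", "tables")


def missing_zarr_entries(store: Path) -> list[str]:
    """Return the names of expected entries that are absent from `store`."""
    missing = [name for name in EXPECTED_ELEMENTS if not (store / name).is_dir()]
    # The tests also require the top-level Zarr metadata file.
    if not (store / "zarr.json").is_file():
        missing.append("zarr.json")
    return missing
```

A test could then assert `missing_zarr_entries(output_sd_path) == []`, which reports every missing entry at once and avoids copy-paste mistakes in the per-folder messages.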
80
src/convert/from_xenium_to_spatialexperiment/config.vsh.yaml
Normal file
@@ -0,0 +1,80 @@
|
||||
name: "from_xenium_to_spatialexperiment"
|
||||
namespace: "convert"
|
||||
scope: "public"
|
||||
description: |
|
||||
Creates a SpatialExperiment object from the downloaded unzipped Xenium Output Bundle directory
|
||||
for 10x Genomics Xenium spatial gene expression data, and saves it as an RDS file.
|
||||
The constructor assumes the downloaded unzipped Xenium Output Bundle has the following structure:
|
||||
|
||||
Mandatory files
|
||||
· | — cell_feature_matrix.h5
|
||||
· | — cells.parquet
|
||||
Optional files, by default added to the metadata() as a list of paths (will be converted to parquet):
|
||||
· | — transcripts.parquet
|
||||
· | — cell_boundaries.parquet
|
||||
· | — nucleus_boundaries.parquet
|
||||
· | — experiment.xenium
|
||||
|
||||
authors:
|
||||
- __merge__: /src/authors/dorien_roosen.yaml
|
||||
roles: [ author, maintainer ]
|
||||
arguments:
|
||||
- name: "--input"
|
||||
alternatives: ["-i"]
|
||||
type: file
|
||||
description: Input Xenium Output Bundle
|
||||
direction: input
|
||||
required: true
|
||||
example: path/to/xenium_bundle
|
||||
- name: "--add_experiment_xenium"
|
||||
type: boolean
|
||||
default: true
|
||||
description: Whether to add the experiment.xenium parameters to the metadata.
|
||||
- name: "--add_parquet_paths"
|
||||
type: boolean
|
||||
default: true
|
||||
description: |
|
||||
Whether to add parquet paths to the metadata.
|
||||
If True, `transcripts.parquet`, `cell_boundaries.parquet`, `nucleus_boundaries.parquet` will be added to the metadata.
|
||||
- name: "--alternative_experiment_features"
|
||||
type: string
|
||||
multiple: true
|
||||
description: Feature names containing these strings will be moved to altExps(sxe) slots as separate SpatialExperiment objects.
|
||||
default: [NegControlProbe, UnassignedCodeword, NegControlCodeword, antisense, BLANK]
|
||||
- name: "--output"
|
||||
alternatives: ["-o"]
|
||||
type: file
|
||||
description: Output SpatialExperiment file
|
||||
direction: output
|
||||
required: true
|
||||
example: output.rds
|
||||
resources:
|
||||
- type: r_script
|
||||
path: script.R
|
||||
- path: /src/utils/unzip_archived_folder.R
|
||||
test_resources:
|
||||
- type: r_script
|
||||
path: test.R
|
||||
- path: /resources_test/xenium/xenium_tiny
|
||||
engines:
|
||||
- type: docker
|
||||
image: rocker/r2u:24.04
|
||||
setup:
|
||||
- type: apt
|
||||
packages:
|
||||
- libhdf5-dev
|
||||
- libgeos-dev
|
||||
- type: docker
|
||||
env: ["LIBARROW_MINIMAL=false"]
|
||||
- type: r
|
||||
script: 'install.packages("arrow", type = "source")'
|
||||
- type: r
|
||||
bioc: [ SpatialExperimentIO ]
|
||||
test_setup:
|
||||
- type: r
|
||||
cran: [ testthat ]
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
directives:
|
||||
label: [lowmem, singlecpu]
|
||||
53
src/convert/from_xenium_to_spatialexperiment/script.R
Normal file
@@ -0,0 +1,53 @@
|
||||
library(SpatialExperimentIO)
|
||||
|
||||
### VIASH START
|
||||
par <- list(
|
||||
input = "resources_test/xenium/temp_dir.zip",
|
||||
add_experiment_xenium = TRUE,
|
||||
add_parquet_paths = TRUE,
|
||||
alternative_experiment_features = c(
|
||||
"NegControlProbe", "UnassignedCodeword",
|
||||
"NegControlCodeword", "antisense", "BLANK"
|
||||
),
|
||||
output = "spe_test.rds"
|
||||
)
|
||||
meta <- list(
|
||||
resources_dir = "src/utils/"
|
||||
)
|
||||
### VIASH END
|
||||
|
||||
source(paste0(meta$resources_dir, "/unzip_archived_folder.R"))
|
||||
|
||||
cat("Reading input data...\n")
|
||||
if (tools::file_ext(par$input) == "zip") {
|
||||
required_file_patterns <- c(
|
||||
"**/cell_feature_matrix.h5",
|
||||
"**/*.parquet",
|
||||
"**/experiment.xenium"
|
||||
)
|
||||
tmp_dir <- extract_selected_files(
|
||||
par$input,
|
||||
members = required_file_patterns
|
||||
)
|
||||
xenium_output_bundle <- file.path(
|
||||
tmp_dir,
|
||||
tools::file_path_sans_ext(basename(par$input))
|
||||
)
|
||||
} else {
|
||||
xenium_output_bundle <- par$input
|
||||
}
|
||||
|
||||
cat("Converting to SpatialExperiment...\n")
|
||||
spe <- readXeniumSXE(
|
||||
dirName = xenium_output_bundle,
|
||||
returnType = "SPE",
|
||||
countMatPattern = "cell_feature_matrix.h5",
|
||||
metaDataPattern = "cells.parquet",
|
||||
coordNames = c("x_centroid", "y_centroid"),
|
||||
addExperimentXenium = par$add_experiment_xenium,
|
||||
addParquetPaths = par$add_parquet_paths,
|
||||
altExps = par$alternative_experiment_features
|
||||
)
|
||||
|
||||
cat("Saving output...\n")
|
||||
saveRDS(spe, file = par$output)
|
||||
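The script above handles zipped input by extracting only the files it needs and then assuming the bundle sits in a folder named after the archive. The same idea can be sketched in Python with the standard library. This is an illustration under stated assumptions, not a port of `unzip_archived_folder.R`, whose implementation is not shown here: `extract_bundle` and `PATTERNS` are hypothetical names, and `fnmatch` patterns are used in place of the `**/` globs in the R script.

```python
import fnmatch
import tempfile
import zipfile
from pathlib import Path

# fnmatch's "*" also matches "/", so these cover files at any depth.
PATTERNS = ("*/cell_feature_matrix.h5", "*/*.parquet", "*/experiment.xenium")


def extract_bundle(archive: str, patterns=PATTERNS) -> Path:
    """Extract members matching `patterns` and return the bundle directory."""
    tmp_dir = Path(tempfile.mkdtemp())
    with zipfile.ZipFile(archive) as zf:
        members = [
            name
            for name in zf.namelist()
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns)
        ]
        zf.extractall(tmp_dir, members=members)
    # Assumption: the archive holds one top-level folder named after the archive,
    # e.g. xenium_tiny.zip contains xenium_tiny/.
    return tmp_dir / Path(archive).stem
```

The naming assumption matters: an archive whose top-level folder does not match the archive name would yield a path that does not exist.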
182
src/convert/from_xenium_to_spatialexperiment/test.R
Normal file
@@ -0,0 +1,182 @@
|
||||
library(testthat, warn.conflicts = FALSE)
|
||||
library(SpatialExperimentIO)
|
||||
library(SpatialExperiment)
|
||||
|
||||
## VIASH START
|
||||
meta <- list(
|
||||
executable = "./from_xenium_to_spatialexperiment",
|
||||
resources_dir = "resources_test/xenium",
|
||||
name = "from_xenium_to_spatialexperiment"
|
||||
)
|
||||
## VIASH END
|
||||
|
||||
cat("> Checking simple execution\n")
|
||||
|
||||
spe <- paste0(
|
||||
meta[["resources_dir"]],
|
||||
"/xenium_tiny"
|
||||
)
|
||||
out_rds <- "output.rds"
|
||||
|
||||
cat("> Running ", meta[["name"]], "\n", sep = "")
|
||||
out <- processx::run(
|
||||
meta[["executable"]],
|
||||
c(
|
||||
"--input", spe,
|
||||
"--output", out_rds
|
||||
)
|
||||
)
|
||||
|
||||
cat("> Checking whether output file exists\n")
|
||||
expect_equal(out$status, 0)
|
||||
expect_true(file.exists(out_rds))
|
||||
|
||||
cat("> Reading output file\n")
|
||||
obj <- readRDS(file = out_rds)
|
||||
|
||||
cat("> Checking whether SpatialExperiment object is in the right format\n")
|
||||
# Object type
|
||||
expect_is(obj, "SpatialExperiment")
|
||||
# Assay structure
|
||||
expect_equal(names(slot(obj, "assays")), "counts")
|
||||
# Spatial coordinates
|
||||
expect_equal(spatialCoordsNames(obj), c("x_centroid", "y_centroid"))
|
||||
# Alternative experiments
|
||||
expect_equal(
|
||||
altExpNames(obj),
|
||||
c("NegControlProbe", "UnassignedCodeword", "NegControlCodeword")
|
||||
)
|
||||
# Metadata components
|
||||
metadata_components <- c(
|
||||
"experiment.xenium", "transcripts", "cell_boundaries", "nucleus_boundaries"
|
||||
)
|
||||
expect_named(
|
||||
metadata(obj),
|
||||
metadata_components,
|
||||
ignore.order = TRUE
|
||||
)
|
||||
# Parquet paths
|
||||
parquet_components <- c("transcripts", "cell_boundaries", "nucleus_boundaries")
|
||||
for (component in parquet_components) {
|
||||
expect_true(grepl("\\.parquet$", metadata(obj)[[component]]))
|
||||
}
|
||||
# Dimensions
|
||||
input <- readXeniumSXE(
|
||||
dirName = spe,
|
||||
returnType = "SPE"
|
||||
)
|
||||
dim_rds <- dim(obj)
|
||||
dim_input <- dim(input)
|
||||
|
||||
expect_equal(dim_rds, dim_input)
|
||||
|
||||
|
||||
cat("> Checking execution with compressed input\n")
|
||||
|
||||
spe <- paste0(
|
||||
meta[["resources_dir"]],
|
||||
"/xenium_tiny"
|
||||
)
|
||||
out_rds <- "output.rds"
|
||||
|
||||
create_folder_archive <- function(folder_path, archive = "xenium_tiny.zip") {
|
||||
old_wd <- getwd()
|
||||
on.exit(setwd(old_wd))
|
||||
setwd(meta$resources_dir)
|
||||
system2("zip", c("-r", archive, "xenium_tiny"))
|
||||
paste0(meta$resources_dir, "/", archive)
|
||||
}
|
||||
|
||||
zipped_spe <- create_folder_archive(spe)
|
||||
|
||||
cat("> Running ", meta[["name"]], "\n", sep = "")
|
||||
out <- processx::run(
|
||||
meta[["executable"]],
|
||||
c(
|
||||
"--input", zipped_spe,
|
||||
"--output", out_rds
|
||||
)
|
||||
)
|
||||
|
||||
cat("> Checking whether output file exists\n")
|
||||
expect_equal(out$status, 0)
|
||||
expect_true(file.exists(out_rds))
|
||||
|
||||
cat("> Reading output file\n")
|
||||
obj <- readRDS(file = out_rds)
|
||||
|
||||
cat("> Checking whether SpatialExperiment object is in the right format\n")
|
||||
# Object type
|
||||
expect_is(obj, "SpatialExperiment")
|
||||
# Assay structure
|
||||
expect_equal(names(slot(obj, "assays")), "counts")
|
||||
# Spatial coordinates
|
||||
expect_equal(spatialCoordsNames(obj), c("x_centroid", "y_centroid"))
|
||||
# Alternative experiments
|
||||
expect_equal(
|
||||
altExpNames(obj),
|
||||
c("NegControlProbe", "UnassignedCodeword", "NegControlCodeword")
|
||||
)
|
||||
# Metadata components
|
||||
metadata_components <- c(
|
||||
"experiment.xenium", "transcripts", "cell_boundaries", "nucleus_boundaries"
|
||||
)
|
||||
expect_named(
|
||||
metadata(obj),
|
||||
metadata_components,
|
||||
ignore.order = TRUE
|
||||
)
|
||||
# Parquet paths
|
||||
parquet_components <- c("transcripts", "cell_boundaries", "nucleus_boundaries")
|
||||
for (component in parquet_components) {
|
||||
expect_true(grepl("\\.parquet$", metadata(obj)[[component]]))
|
||||
}
|
||||
# Dimensions
|
||||
input <- readXeniumSXE(
|
||||
dirName = spe,
|
||||
returnType = "SPE"
|
||||
)
|
||||
dim_rds <- dim(obj)
|
||||
dim_input <- dim(input)
|
||||
|
||||
expect_equal(dim_rds, dim_input)
|
||||
|
||||
|
||||
cat("> Checking parameter functionality\n")
|
||||
|
||||
out_rds_ext <- "output_ext.rds"
|
||||
|
||||
cat("> Running ", meta[["name"]], "\n", sep = "")
|
||||
out_ext <- processx::run(
|
||||
meta[["executable"]],
|
||||
c(
|
||||
"--input", spe,
|
||||
"--add_experiment_xenium", FALSE,
|
||||
"--add_parquet_paths", FALSE,
|
||||
"--alternative_experiment_features", c("NegControlProbe"),
|
||||
"--output", out_rds_ext
|
||||
)
|
||||
)
|
||||
|
||||
cat("> Checking whether output file exists\n")
|
||||
expect_equal(out_ext$status, 0)
|
||||
expect_true(file.exists(out_rds_ext))
|
||||
|
||||
cat("> Reading output file\n")
|
||||
obj_ext <- readRDS(file = out_rds_ext)
|
||||
|
||||
cat("> Checking whether SpatialExperiment object is in the right format\n")
|
||||
# Object type
|
||||
expect_is(obj_ext, "SpatialExperiment")
|
||||
# Assay structure
|
||||
expect_equal(names(slot(obj_ext, "assays")), "counts")
|
||||
# Spatial coordinates
|
||||
expect_equal(spatialCoordsNames(obj_ext), c("x_centroid", "y_centroid"))
|
||||
# Alternative experiments
|
||||
expect_equal(altExpNames(obj_ext), c("NegControlProbe"))
|
||||
# Metadata components
|
||||
expect_true(length(metadata(obj_ext)) == 0)
|
||||
|
||||
dim_rds_ext <- dim(obj_ext)
|
||||
expect_true(identical(dim_rds_ext[2], dim_input[2]))
|
||||
expect_false(identical(dim_rds_ext[1], dim_input[1]))
|
||||
72
src/filter/subset_cosmx/config.vsh.yaml
Normal file
@@ -0,0 +1,72 @@
|
||||
name: "subset_cosmx"
|
||||
scope: "private"
|
||||
namespace: "filter"
|
||||
description: |
|
||||
Filters the output from a NanoString CosMx experiment to keep only a subset of the fields of view.
|
||||
Expected input folder structure:
|
||||
path/to/dataset/
|
||||
├── CellComposite/
|
||||
├── CellLabels/
|
||||
├── CellOverlay/
|
||||
├── CompartmentLabels/
|
||||
├── <dataset_id>_exprMat_file.csv
|
||||
├── <dataset_id>_fov_positions_file.csv
|
||||
├── <dataset_id>_metadata_file.csv
|
||||
└── <dataset_id>_tx_file.csv
|
||||
|
||||
authors:
|
||||
- __merge__: /src/authors/dorien_roosen.yaml
|
||||
roles: [ maintainer ]
|
||||
- __merge__: /src/authors/weiwei_schultz.yaml
|
||||
roles: [ contributor ]
|
||||
arguments:
|
||||
- name: "--input"
|
||||
alternatives: ["-i"]
|
||||
type: file
|
||||
description: Input folder. Must contain the output from a NanoString CosMx run.
|
||||
example: cosmx_data
|
||||
direction: input
|
||||
required: true
|
||||
- name: "--num_fovs"
|
||||
type: integer
|
||||
required: true
|
||||
description: Number of fields of view to keep. Only the first <num_fovs> fields of view are kept.
|
||||
- name: "--subset_transcripts_file"
|
||||
type: boolean
|
||||
default: true
|
||||
description: Whether to subset the <dataset_id>_tx_file.csv file.
|
||||
- name: "--subset_polygons_file"
|
||||
type: boolean
|
||||
default: true
|
||||
description: Whether to subset the <dataset_id>_polygons.csv file.
|
||||
- name: "--output"
|
||||
alternatives: ["-o"]
|
||||
type: file
|
||||
description: The directory where the subset data will be stored.
|
||||
example: "cosmx_data_tiny"
|
||||
direction: output
|
||||
|
||||
|
||||
resources:
|
||||
- type: python_script
|
||||
path: script.py
|
||||
- path: /src/utils/setup_logger.py
|
||||
test_resources:
|
||||
- type: python_script
|
||||
path: test.py
|
||||
- path: /resources_test/cosmx/Lung5_Rep2_tiny/
|
||||
engines:
|
||||
- type: docker
|
||||
image: python:3.12-slim
|
||||
setup:
|
||||
- type: apt
|
||||
packages:
|
||||
- procps
|
||||
- type: python
|
||||
__merge__: [ /src/base/requirements/squidpy.yaml ]
|
||||
__merge__: [ /src/base/requirements/python_test_setup.yaml, .]
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
directives:
|
||||
label: [lowmem, singlecpu]
|
||||
69
src/filter/subset_cosmx/script.py
Normal file
@@ -0,0 +1,69 @@
|
||||
import os
|
||||
import shutil
|
||||
import pandas as pd
|
||||
import glob
|
||||
import sys
|
||||
|
||||
|
||||
## VIASH START
|
||||
par = {
|
||||
"input": "./resources_test/cosmx/Lung5_Rep2",
|
||||
"output": "./resources_test/cosmx/Lung5_Rep2_tiny/",
|
||||
"subset_transcripts_file": True,
|
||||
"subset_polygons_file": False,
|
||||
"num_fovs": 5,
|
||||
}
|
||||
meta = {"resources_dir": "src/utils"}
|
||||
## VIASH END
|
||||
|
||||
|
||||
sys.path.append(meta["resources_dir"])
|
||||
from setup_logger import setup_logger
|
||||
|
||||
logger = setup_logger()
|
||||
|
||||
|
||||
def find_matrix_file(suffix):
|
||||
pattern = os.path.join(par["input"], f"*{suffix}")
|
||||
files = glob.glob(pattern)
|
||||
assert len(files) == 1, (
|
||||
f"Only one file matching pattern {pattern} should be present"
|
||||
)
|
||||
return files[0]
|
||||
|
||||
|
||||
kept_fovs = list(range(1, par["num_fovs"] + 1))
|
||||
|
||||
os.makedirs(par["output"], exist_ok=True)
|
||||
|
||||
# Images
|
||||
image_dirs = ["CellComposite", "CellLabels", "CellOverlay", "CompartmentLabels"]
|
||||
|
||||
for image_dir in image_dirs:
|
||||
logger.info(f"Subsetting {image_dir}, keeping fovs {kept_fovs}")
|
||||
os.makedirs(f"{par['output']}/{image_dir}", exist_ok=True)
|
||||
for fov in kept_fovs:
|
||||
fov_str = f"{image_dir}_F{fov:03d}.*"
|
||||
|
||||
file_path = glob.glob(os.path.join(par["input"], image_dir, fov_str))
|
||||
assert len(file_path) == 1, f"Expected exactly one file matching {fov_str}"
|
||||
shutil.copy2(file_path[0], os.path.join(par["output"], image_dir))
|
||||
|
||||
# Matrices
|
||||
counts_file = find_matrix_file("exprMat_file.csv")
|
||||
fov_file = find_matrix_file("fov_positions_file.csv")
|
||||
meta_file = find_matrix_file("metadata_file.csv")
|
||||
|
||||
matrices = [counts_file, fov_file, meta_file]
|
||||
if par["subset_transcripts_file"]:
|
||||
tx_file = find_matrix_file("tx_file.csv")
|
||||
matrices.append(tx_file)
|
||||
if par["subset_polygons_file"]:
|
||||
polygons_file = find_matrix_file("polygons.csv")
|
||||
matrices.append(polygons_file)
|
||||
|
||||
for matrix in matrices:
|
||||
logger.info(f"Subsetting {matrix}, keeping fovs {kept_fovs}")
|
||||
data = pd.read_csv(matrix)
|
||||
data_tiny = data[data["fov"].isin(kept_fovs)]
|
||||
data_tiny.to_csv(os.path.join(par["output"], os.path.basename(matrix)), index=False)
|
||||
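The subsetting logic in the script above comes down to two rules: keep rows whose `fov` value lies in `1..num_fovs`, and look up images by a zero-padded FOV suffix. A dependency-free sketch of both follows. It uses the standard `csv` module instead of pandas purely so it runs anywhere; `subset_rows_by_fov` and `image_glob` are hypothetical helper names, not functions from the component.

```python
import csv
import io


def subset_rows_by_fov(csv_text: str, num_fovs: int) -> str:
    """Keep only the rows whose `fov` column is in 1..num_fovs."""
    kept_fovs = set(range(1, num_fovs + 1))
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if int(row["fov"]) in kept_fovs:
            writer.writerow(row)
    return out.getvalue()


def image_glob(image_dir: str, fov: int) -> str:
    """Glob pattern for the image of one field of view, e.g. CellLabels_F007.*"""
    return f"{image_dir}_F{fov:03d}.*"
```

Because the kept set is always `1..num_fovs`, a dataset whose FOV numbering starts elsewhere would produce empty outputs, which is worth keeping in mind when choosing test data.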
48
src/filter/subset_cosmx/test.py
Normal file
@@ -0,0 +1,48 @@
|
||||
import os
|
||||
import sys
|
||||
import pytest
|
||||
import pandas as pd
|
||||
|
||||
|
||||
def test_simple_execution(run_component, tmp_path):
|
||||
output_path = tmp_path / "output"
|
||||
dataset_id = "Lung5_Rep2"
|
||||
run_component(
|
||||
[
|
||||
"--input",
|
||||
meta["resources_dir"] + "/Lung5_Rep2_tiny",
|
||||
"--subset_transcripts_file",
|
||||
"True",
|
||||
"--subset_polygons_file",
|
||||
"False",
|
||||
"--num_fovs",
|
||||
"2",
|
||||
"--output",
|
||||
output_path,
|
||||
]
|
||||
)
|
||||
|
||||
assert os.path.exists(output_path), "Output folder was not created"
|
||||
|
||||
counts_file = output_path / f"{dataset_id}_exprMat_file.csv"
|
||||
fov_file = output_path / f"{dataset_id}_fov_positions_file.csv"
|
||||
meta_file = output_path / f"{dataset_id}_metadata_file.csv"
|
||||
tx_file = output_path / f"{dataset_id}_tx_file.csv"
|
||||
|
||||
matrices = [counts_file, fov_file, meta_file, tx_file]
|
||||
images = ["CellComposite", "CellLabels", "CellOverlay", "CompartmentLabels"]
|
||||
|
||||
for image in images:
|
||||
assert os.path.exists(output_path / image), f"{image} folder was not created"
|
||||
assert len(os.listdir(output_path / image)) == 2, (
|
||||
f"{image} folder should contain 2 files"
|
||||
)
|
||||
|
||||
for matrix in matrices:
|
||||
assert os.path.exists(matrix), f"{matrix} file was not created"
|
||||
data = pd.read_csv(matrix)
|
||||
assert data["fov"].nunique() == 2, f"{matrix} should contain 2 fovs"
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(pytest.main([__file__]))
|
||||
212
src/mapping/spaceranger_count/config.vsh.yaml
Normal file
@@ -0,0 +1,212 @@
|
||||
name: spaceranger_count
|
||||
namespace: mapping
|
||||
scope: public
|
||||
description: Count gene expression and protein expression reads from a single capture area.
|
||||
keywords: [spaceranger]
|
||||
links:
|
||||
documentation: https://www.10xgenomics.com/support/software/space-ranger/latest/analysis/running-pipelines/space-ranger-count
|
||||
authors:
|
||||
- __merge__: /src/authors/jakub_majercik.yaml
|
||||
roles: [ author ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
- type: file
|
||||
name: --gex_reference
|
||||
required: true
|
||||
description: Path of folder containing 10x-compatible reference
|
||||
example: "/path/to/refdata-gex-GRCh38-2020-A"
|
||||
|
||||
- type: file
|
||||
name: --input
|
||||
required: true
|
||||
multiple: true
|
||||
description: |
|
||||
The fastq.gz files to align. Can also be a single directory containing fastq.gz files.
|
||||
|
||||
Individual FASTQ files should follow the naming convention of 10x Genomics:
|
||||
[Sample Name]_S[Sample Number]_L[Lane Number]_[Read Type]_001.fastq.gz
|
||||
|
||||
Where:
|
||||
[Sample Name] is the name assigned during sample preparation/sequencing
|
||||
S[Sample Number] is the sample index (usually S1, S2, etc.)
|
||||
L[Lane Number] identifies the sequencing lane (L001, L002, etc.)
|
||||
|
||||
[Read Type] will be one of:
|
||||
R1 - Read 1 (contains the spatial barcode and UMI)
|
||||
R2 - Read 2 (contains the actual cDNA sequence)
|
||||
|
||||
example: [ "sample_S1_L001_R1_001.fastq.gz", "sample_S1_L001_R2_001.fastq.gz" ]
|
||||
|
||||
- type: file
|
||||
name: --probe_set
|
||||
required: true
|
||||
description: CSV file specifying the probe set used
|
||||
example: "Visium_Human_Transcriptome_Probe_Set_v2.0_GRCh38-2020-A.csv"
|
||||
|
||||
- type: file
|
||||
name: --cytaimage
|
||||
required: false
|
||||
description: |
|
||||
Brightfield image generated by the CytAssist instrument.
|
||||
When using the CytAssist workflow, either this or --image must be provided.
|
||||
example: "cyta_image.tif"
|
||||
|
||||
- type: file
|
||||
name: --image
|
||||
required: false
|
||||
description: |
|
||||
H&E or fluorescence microscope image in TIFF or JPG format.
|
||||
Required for the standard Visium workflow, optional when using --cytaimage for the CytAssist workflow.
|
||||
example: "brightfield.tif"
|
||||
|
||||
- name: Outputs
|
||||
arguments:
|
||||
- type: file
|
||||
name: --output
|
||||
required: true
|
||||
direction: output
|
||||
description: The folder to store the alignment results
|
||||
example: "/path/to/output"
|
||||
|
||||
- name: Slide Information
|
||||
arguments:
|
||||
- type: string
|
||||
name: --slide
|
||||
description: Visium slide serial number (e.g., 'V10J25-015')
|
||||
required: false
|
||||
example: "V10J25-015"
|
||||
|
||||
- type: string
|
||||
name: --area
|
||||
description: Visium capture area identifier (e.g., 'A1')
|
||||
required: false
|
||||
example: "A1"
|
||||
|
||||
- type: string
|
||||
name: --unknown_slide
|
||||
description: |
|
||||
Use this option if the slide serial number and area were entered incorrectly on the CytAssist
|
||||
instrument and the correct values are unknown. Not compatible with --slide, --area, or
|
||||
--slidefile options.
|
||||
required: false
|
||||
choices: [visium-1, visium-2, visium-2-large, visium-hd]
|
||||
|
||||
- type: file
|
||||
name: --slidefile
|
||||
description: Slide design file for offline use
|
||||
required: false
|
||||
example: "slide_design.gpr"
|
||||
|
||||
- type: boolean_true
|
||||
name: --override_id
|
||||
description: Overrides the slide serial number and capture area provided in the CytAssist image metadata
|
||||
|
||||
- name: Image Options
|
||||
arguments:
|
||||
- type: file
|
||||
name: --darkimage
|
||||
description: Multi-channel, dark-background fluorescence image
|
||||
required: false
|
||||
example: "fluorescence.tif"
|
||||
|
||||
- type: file
|
||||
name: --colorizedimage
|
||||
description: Color image representing pre-colored dark-background fluorescence images
|
||||
required: false
|
||||
example: "colored_fluorescence.tif"
|
||||
|
||||
- type: integer
|
||||
name: --dapi_index
|
||||
description: Index of DAPI channel (1-indexed) of fluorescence image
|
||||
required: false
|
||||
example: 1
|
||||
min: 1
|
||||
|
||||
- type: double
|
||||
name: --image_scale
|
||||
description: Microns per microscope image pixel
|
||||
required: false
|
||||
example: 0.65
|
||||
min: 0.01
|
||||
max: 10
|
||||
|
||||
- type: boolean
|
||||
name: --reorient_images
|
||||
default: true
|
||||
description: Whether to rotate and mirror image to align fiducial pattern
|
||||
|
||||
- name: Processing Options
|
||||
arguments:
|
||||
- type: boolean
|
||||
name: --create_bam
|
||||
required: true
|
||||
description: Enable or disable BAM file generation
|
||||
default: true
|
||||
|
||||
- type: boolean_true
|
||||
name: --nosecondary
|
||||
description: Disable secondary analysis (e.g., clustering)
|
||||
|
||||
- type: integer
|
||||
name: --r1_length
|
||||
required: false
|
||||
description: Hard trim the input Read 1 to this length before analysis
|
||||
min: 1
|
||||
|
||||
- type: integer
|
||||
name: --r2_length
|
||||
required: false
|
||||
description: Hard trim the input Read 2 to this length before analysis
|
||||
min: 1
|
||||
|
||||
- type: boolean
|
||||
name: --filter_probes
|
||||
default: true
|
||||
description: Whether to filter the probe set using the "included" column
|
||||
|
||||
- type: integer
|
||||
name: --custom_bin_size
|
||||
description: Bin Visium HD data to the specified size in microns (4-100, even values only) in addition to the standard bin sizes (2 µm, 8 µm, 16 µm)
|
||||
min: 4
|
||||
max: 100
|
||||
|
||||
- name: Input Selection
|
||||
arguments:
|
||||
- type: string
|
||||
name: --project
|
||||
required: false
|
||||
description: Project folder name within mkfastq output
|
||||
|
||||
- type: string
|
||||
name: --sample
|
||||
required: false
|
||||
description: Prefix of FASTQ filenames to select
|
||||
|
||||
- type: integer
|
||||
name: --lanes
|
||||
multiple: true
|
||||
required: false
|
||||
description: Only use FASTQs from selected lanes
|
||||
example: [1,2,3]
|
||||
|
||||
resources:
|
||||
- type: bash_script
|
||||
path: script.sh
|
||||
test_resources:
|
||||
- type: python_script
|
||||
path: test.py
|
||||
- path: /resources_test/visium
|
||||
- path: /resources_test/GRCh38
|
||||
engines:
|
||||
- type: docker
|
||||
image: ghcr.io/data-intuitive/spaceranger:3.1
|
||||
setup:
|
||||
- type: docker
|
||||
run: |
|
||||
DEBIAN_FRONTEND=noninteractive apt update && \
|
||||
apt upgrade -y && apt install -y procps && rm -rf /var/lib/apt/lists/*
|
||||
__merge__: [ /src/base/requirements/python_test_setup.yaml, .]
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
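The `--input` description above spells out the FASTQ naming convention expected by the component. A short sketch of how such names could be validated and split into their parts follows. `FASTQ_NAME` and `parse_fastq_name` are hypothetical names, and the pattern only accepts the `R1` and `R2` read types listed in the description.

```python
import re

# [Sample Name]_S[Sample Number]_L[Lane Number]_[Read Type]_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>R1|R2)_001\.fastq\.gz$"
)


def parse_fastq_name(filename: str) -> dict:
    """Split a 10x-style FASTQ file name into its components."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        raise ValueError(f"{filename} does not follow the 10x naming convention")
    fields = match.groupdict()
    # Numeric parts are easier to compare as integers (L001 -> 1).
    fields["sample_number"] = int(fields["sample_number"])
    fields["lane"] = int(fields["lane"])
    return fields
```

Running a check like this before launching the pipeline gives an early, readable error for misnamed files instead of a failure inside the aligner.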
100
src/mapping/spaceranger_count/script.sh
Normal file
@@ -0,0 +1,100 @@
|
||||
#!/bin/bash
|
||||
|
||||
set -eo pipefail
|
||||
|
||||
## VIASH START
|
||||
par_input='resources_test/visium/Visium_FFPE_Human_Ovarian_Cancer_fastqs'
|
||||
par_image='resources_test/visium/Visium_FFPE_Human_Ovarian_Cancer_image.jpg'
|
||||
par_output='spaceranger_test'
|
||||
par_gex_reference='resources_test/GRCh38'
|
||||
par_probe_set='resources_test/visium/Visium_FFPE_Human_Ovarian_Cancer_probe_set.csv'
|
||||
par_slide='V10L13-020'
|
||||
par_area='D1'
|
||||
par_create_bam='false'
|
||||
## VIASH END
|
||||
|
||||
unset_if_false=(
|
||||
par_override_id
|
||||
par_nosecondary
|
||||
)
|
||||
|
||||
for par in "${unset_if_false[@]}"; do
|
||||
test_val="${!par}"
|
||||
[[ "$test_val" == "false" ]] && unset $par
|
||||
done
|
||||
|
||||
# Make sure paths are absolute
|
||||
par_gex_reference=$(realpath "$par_gex_reference")
|
||||
par_output=$(realpath "$par_output")
|
||||
par_probe_set=$(realpath "$par_probe_set")
|
||||
[[ -n "${par_image:-}" ]] && par_image=$(realpath "$par_image")
|
||||
[[ -n "${par_cytaimage:-}" ]] && par_cytaimage=$(realpath "$par_cytaimage")
|
||||
|
||||
# create temporary directory
|
||||
tmpdir=$(mktemp -d "$meta_temp_dir/$meta_name-XXXXXXXX")
|
||||
function clean_up {
|
||||
rm -rf "$tmpdir"
|
||||
}
|
||||
trap clean_up EXIT
|
||||
|
||||
# process inputs
|
||||
# for every fastq file found, make a symlink into the tempdir
|
||||
fastq_dir="$tmpdir/fastqs"
|
||||
mkdir -p "$fastq_dir"
|
||||
IFS=";"
|
||||
for var in $par_input; do
|
||||
unset IFS
|
||||
abs_path=$(realpath "$var")
|
||||
if [ -d "$abs_path" ]; then
|
||||
find "$abs_path" -name '*.fastq.gz' -exec ln -s {} "$fastq_dir" \;
|
||||
else
|
||||
ln -s "$abs_path" "$fastq_dir"
|
||||
fi
|
||||
done
|
||||
|
||||
# process reference
|
||||
if file $par_gex_reference | grep -q 'gzip compressed data'; then
|
||||
echo "Untarring genome"
|
||||
reference_dir="$tmpdir/reference"
|
||||
mkdir -p "$reference_dir"
|
||||
tar -xvf "$par_gex_reference" -C "$reference_dir" --strip-components=1
|
||||
par_gex_reference="$reference_dir"
|
||||
fi
|
||||
|
||||
# cd into tempdir
|
||||
cd "$tmpdir"
|
||||
|
||||
temp_id="spaceranger_run"
|
||||
|
||||
spaceranger count \
|
||||
--id="$temp_id" \
|
||||
--fastqs="$fastq_dir" \
|
||||
--transcriptome="$par_gex_reference" \
|
||||
${par_probe_set:+--probe-set="$par_probe_set"} \
|
||||
${par_cytaimage:+--cytaimage="$par_cytaimage"} \
|
||||
${par_image:+--image="$par_image"} \
|
||||
${par_slide:+--slide="$par_slide"} \
|
||||
${par_area:+--area="$par_area"} \
|
||||
${par_unknown_slide:+--unknown-slide="$par_unknown_slide"} \
|
||||
${par_slidefile:+--slidefile="$par_slidefile"} \
|
||||
${par_override_id:+--override-id} \
|
||||
${par_darkimage:+--darkimage="$par_darkimage"} \
|
||||
${par_colorizedimage:+--colorizedimage="$par_colorizedimage"} \
|
||||
${par_dapi_index:+--dapi-index="$par_dapi_index"} \
|
||||
${par_image_scale:+--image-scale="$par_image_scale"} \
|
||||
${par_reorient_images:+--reorient-images="$par_reorient_images"} \
|
||||
${par_create_bam:+--create-bam="$par_create_bam"} \
|
||||
${par_nosecondary:+--nosecondary} \
|
||||
${par_r1_length:+--r1-length="$par_r1_length"} \
|
||||
${par_r2_length:+--r2-length="$par_r2_length"} \
|
||||
${par_filter_probes:+--filter-probes="$par_filter_probes"} \
|
||||
${par_custom_bin_size:+--custom-bin-size="$par_custom_bin_size"} \
|
||||
${par_project:+--project="$par_project"} \
|
||||
${par_sample:+--sample="$par_sample"} \
|
||||
${par_lanes:+--lanes="$par_lanes"} \
|
||||
${meta_cpus:+--localcores="$meta_cpus"} \
|
||||
${meta_memory_gb:+--localmem=$(($meta_memory_gb-2))}
|
||||
|
||||
mkdir -p "$par_output"
|
||||
mv -f "$temp_id"/outs/* "$par_output"/
|
||||
rm -rf "$temp_id"/outs
|
||||
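The `spaceranger count` call above relies on the `${var:+--flag="$var"}` expansion, which emits a flag only when the variable is set and non-empty, and it reserves 2 GB of memory headroom via `--localmem`. The same pattern can be sketched in Python. `build_optional_args` and `BARE_FLAGS` are hypothetical names used for illustration; the flag spellings follow the script above.

```python
# Parameters passed as bare switches (boolean_true in the config above).
BARE_FLAGS = {"override_id", "nosecondary"}


def build_optional_args(par: dict, cpus=None, memory_gb=None) -> list:
    """Turn set parameters into CLI flags, skipping unset or empty ones."""
    args = []
    for name, value in par.items():
        if value is None or value == "":
            continue  # mirrors ${var:+...}: nothing is emitted when unset
        flag = "--" + name.replace("_", "-")
        if name in BARE_FLAGS:
            if value:
                args.append(flag)
        else:
            if isinstance(value, bool):
                value = str(value).lower()  # spaceranger expects true/false
            args.append(f"{flag}={value}")
    if cpus is not None:
        args.append(f"--localcores={cpus}")
    if memory_gb is not None:
        # Same headroom as the script: leave 2 GB for everything else.
        args.append(f"--localmem={memory_gb - 2}")
    return args
```

Building the argument list this way keeps optional flags out of the command entirely when they are not provided, so the tool's own defaults apply.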
138
src/mapping/spaceranger_count/test.py
Normal file
@@ -0,0 +1,138 @@
|
||||
import sys
|
||||
import pytest
|
||||
from pathlib import Path
|
||||
|
||||
## VIASH START
|
||||
meta = {"name": "spaceranger_count", "resources_dir": "resources_test"}
|
||||
## VIASH END
|
||||
|
||||
input = meta["resources_dir"] + "/visium/Visium_FFPE_Human_Ovarian_Cancer_tiny/"
|
||||
probe_set = (
|
||||
meta["resources_dir"] + "/visium/Visium_FFPE_Human_Ovarian_Cancer_probe_set.csv"
|
||||
)
|
||||
image = (
|
||||
meta["resources_dir"] + "/visium/Visium_FFPE_Human_Ovarian_Cancer_image_tiny.jpg"
|
||||
)
|
||||
reference = meta["resources_dir"] + "/GRCh38"
|
||||
|
||||
|
||||
def test_simple_execution(run_component, random_path):
|
||||
output = random_path()
|
||||
run_component(
|
||||
[
|
||||
"--input",
|
||||
input,
|
||||
"--gex_reference",
|
||||
reference,
|
||||
"--probe_set",
|
||||
probe_set,
|
||||
"--image",
|
||||
image,
|
||||
"--area",
|
||||
"D1",
|
||||
"--slide",
|
||||
"V10L13-020",
|
||||
"--create_bam",
|
||||
"false",
|
||||
"--output",
|
||||
output,
|
||||
]
|
||||
)
|
||||
|
||||
assert (output / "filtered_feature_bc_matrix.h5").is_file(), (
|
||||
"No filtered .h5 count matrix was created."
|
||||
)
|
||||
|
||||
assert (output / "raw_feature_bc_matrix.h5").is_file(), (
|
||||
"No raw .h5 count matrix was created."
|
||||
)
|
||||
|
||||
assert (output / "metrics_summary.csv").is_file(), "No metrics summary was created."
|
||||
|
||||
assert (output / "web_summary.html").is_file(), "No web summary was created."
|
||||
|
||||
|
||||
def test_with_fastqs(run_component, random_path):
|
||||
output = random_path()
|
||||
run_component(
|
||||
[
|
||||
"--input",
|
||||
Path(input + "/Visium_FFPE_Human_Ovarian_Cancer_S1_L001_R1_001.fastq.gz"),
|
||||
"--input",
|
||||
Path(input + "/Visium_FFPE_Human_Ovarian_Cancer_S1_L001_R2_001.fastq.gz"),
|
||||
"--gex_reference",
|
||||
reference,
|
||||
"--probe_set",
|
||||
probe_set,
|
||||
"--image",
|
||||
image,
|
||||
"--area",
|
||||
"D1",
|
||||
"--slide",
|
||||
"V10L13-020",
|
||||
"--create_bam",
|
||||
"false",
|
||||
"--output",
|
||||
output,
|
||||
]
|
||||
)
|
||||
|
||||
assert (output / "filtered_feature_bc_matrix.h5").is_file(), (
|
||||
"No filtered .h5 count matrix was created."
|
||||
)
|
||||
|
||||
assert (output / "raw_feature_bc_matrix.h5").is_file(), (
|
||||
"No raw .h5 count matrix was created."
|
||||
)
|
||||
|
||||
assert (output / "metrics_summary.csv").is_file(), "No metrics summary was created."
|
||||
|
||||
assert (output / "web_summary.html").is_file(), "No web summary was created."
|
||||
|
||||
|
||||
def test_with_optional_params(run_component, random_path):
|
||||
output = random_path()
|
||||
run_component(
|
||||
[
|
||||
"--input",
|
||||
input,
|
||||
"--gex_reference",
|
||||
reference,
|
||||
"--probe_set",
|
||||
probe_set,
|
||||
"--image",
|
||||
image,
|
||||
"--area",
|
||||
"D1",
|
||||
"--slide",
|
||||
"V10L13-020",
|
||||
"--nosecondary",
|
||||
"true",
|
||||
"--r1_length",
|
||||
"100",
|
||||
"--r2_length",
|
||||
"100",
|
||||
"--filter_probes",
|
||||
"false",
|
||||
"--create_bam",
|
||||
"true",
|
||||
"--output",
|
||||
output,
|
||||
]
|
||||
)
|
||||
|
||||
assert (output / "filtered_feature_bc_matrix.h5").is_file(), (
|
||||
"No filtered .h5 count matrix was created."
|
||||
)
|
||||
|
||||
assert (output / "raw_feature_bc_matrix.h5").is_file(), (
|
||||
"No raw .h5 count matrix was created."
|
||||
)
|
||||
|
||||
assert (output / "metrics_summary.csv").is_file(), "No metrics summary was created."
|
||||
|
||||
assert (output / "web_summary.html").is_file(), "No web summary was created."
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(pytest.main([__file__]))
|
||||
89
src/neighbors/spatial_neighborhood_graph/config.vsh.yaml
Normal file
@@ -0,0 +1,89 @@
|
||||
name: spatial_neighborhood_graph
|
||||
namespace: neighbors
|
||||
scope: public
|
||||
description: Calculates a spatial neighborhood graph based on the spatial coordinates.
|
||||
|
||||
authors:
|
||||
- __merge__: /src/authors/dorien_roosen.yaml
|
||||
roles: [ maintainer ]
|
||||
|
||||
argument_groups:
|
||||
- name: "Inputs"
|
||||
arguments:
|
||||
- name: "--input"
|
||||
type: file
|
||||
required: true
|
||||
description: Input H5MU file
|
||||
- name: "--modality"
|
||||
description: |
|
||||
Which modality from the input MuData file to process.
|
||||
type: string
|
||||
default: "rna"
|
||||
required: false
|
||||
- name: "--input_obsm_spatial_coords"
|
||||
type: string
|
||||
default: "spatial"
|
||||
description: "Key in adata.obsm where spatial coordinates are stored"
|
||||
|
||||
- name: "Spatial Neighbors Calculation"
|
||||
arguments:
|
||||
- name: "--coord_type"
|
||||
type: string
|
||||
default: "generic"
|
||||
choices: ["generic", "grid"]
|
||||
description: |
|
||||
Type of coordinate system. Valid options are:
|
||||
`grid` - grid coordinates. Only relevant for lattice-based layouts (e.g. Visium). Builds a graph assuming a regular grid topology, where neighbors are defined by grid adjacency rather than distance.
|
||||
`generic` - Generic coordinates. Recommended for imaging-based data. This option is appropriate for irregularly spaced points, such as cell centroids or spot coordinates from imaging-based assays (e.g. Xenium, CosMx). Setting this option avoids grid-assumption artifacts and ensures each node has spatial neighbors as defined by distance.
|
||||
- name: "--n_spatial_neighbors"
|
||||
type: integer
|
||||
default: 6
|
||||
min: 1
|
||||
description: |
|
||||
Depending on `--coord_type`:
|
||||
`grid` - number of neighboring tiles.
|
||||
`generic` - number of nearest neighbors for non-grid data. Only used when `--delaunay False`.
|
||||
- name: "--delaunay"
|
||||
type: boolean
|
||||
default: false
|
||||
description: |
|
||||
Whether to use Delaunay triangulation to determine the spatial neighborhood graph.
|
||||
Only used when `--coord_type generic`.
|
||||
|
||||
- name: Outputs
|
||||
arguments:
|
||||
- name: --output
|
||||
type: file
|
||||
direction: output
|
||||
required: true
|
||||
description: Output H5MU file path.
|
||||
example: output.h5mu
|
||||
__merge__: [., /src/base/h5_compression_argument.yaml]
|
||||
|
||||
resources:
|
||||
- type: python_script
|
||||
path: script.py
|
||||
- path: /src/utils/setup_logger.py
|
||||
|
||||
test_resources:
|
||||
- type: python_script
|
||||
path: test.py
|
||||
- path: /resources_test/xenium/xenium_tiny.h5mu
|
||||
|
||||
engines:
|
||||
- type: docker
|
||||
image: python:3.12-slim
|
||||
setup:
|
||||
- type: apt
|
||||
packages:
|
||||
- procps
|
||||
- type: python
|
||||
__merge__: [ /src/base/requirements/squidpy.yaml, /src/base/requirements/anndata_mudata.yaml, . ]
|
||||
__merge__: [ /src/base/requirements/python_test_setup.yaml, .]
|
||||
|
||||
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
directives:
|
||||
label: [lowcpu, midmem, middisk]
|
||||
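As an illustration of what `--coord_type generic` with `--n_spatial_neighbors` means when `--delaunay` is false, here is a minimal pure-Python sketch of a k-nearest-neighbour graph over 2D coordinates. It is not how squidpy computes the graph; the helper `knn_graph` and the sample coordinates are hypothetical and only meant to show the idea.

```python
import math


def knn_graph(coords, k):
    """Return {index: [indices of the k nearest other points]} by Euclidean distance."""
    graph = {}
    for i, p in enumerate(coords):
        # Distance from point i to every other point, paired with that point's index.
        dists = [(math.dist(p, q), j) for j, q in enumerate(coords) if j != i]
        # Sort by distance (ties broken by index) and keep the k closest.
        graph[i] = [j for _, j in sorted(dists)[:k]]
    return graph


# Hypothetical cell centroids: three close together and one far away.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
graph = knn_graph(coords, k=2)
print(graph)
```

Every point gets exactly `k` neighbours regardless of spacing, but the relation is not necessarily mutual: the distant point lists two neighbours, while none of the other points list it in return.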
48
src/neighbors/spatial_neighborhood_graph/script.py
Normal file
@@ -0,0 +1,48 @@
|
||||
import sys
|
||||
import squidpy as sq
|
||||
import mudata as mu
|
||||
|
||||
## VIASH START
|
||||
par = {
|
||||
# Inputs
|
||||
"input": "resources_test/cosmx/Lung5_Rep2_tiny.h5mu",
|
||||
"modality": "rna",
|
||||
"input_obsm_spatial_coords": "spatial",
|
||||
## Spatial neighbor calculation
|
||||
"n_spatial_neighbors": 4,
|
||||
"coord_type": "generic",
|
||||
"delaunay": False,
|
||||
"output": "foo.h5mu",
"output_compression": None,
|
||||
}
|
||||
|
||||
meta = {"resources_dir": "src/utils/"}
|
||||
## VIASH END
|
||||
|
||||
sys.path.append(meta["resources_dir"])
|
||||
from setup_logger import setup_logger
|
||||
|
||||
logger = setup_logger()
|
||||
|
||||
## Read in data
|
||||
adata = mu.read_h5ad(par["input"], mod=par["modality"])
|
||||
|
||||
## Compute spatial neighbor graph
|
||||
logger.info("Computing spatial neighbor graph...")
|
||||
sq.gr.spatial_neighbors(
|
||||
adata,
|
||||
coord_type=par["coord_type"],
|
||||
spatial_key=par["input_obsm_spatial_coords"],
|
||||
n_neighs=par["n_spatial_neighbors"],
|
||||
delaunay=par["delaunay"],
|
||||
)
|
||||
|
||||
# Making the connectivity matrix symmetric
|
||||
logger.info("Making the connectivity matrix symmetric...")
|
||||
adata.obsp["spatial_connectivities"] = adata.obsp["spatial_connectivities"].maximum(
|
||||
adata.obsp["spatial_connectivities"].T
|
||||
)
|
||||
|
||||
## Save data
|
||||
logger.info("Saving output data...")
|
||||
mdata = mu.MuData({par["modality"]: adata})
|
||||
mdata.write_h5mu(par["output"], compression=par["output_compression"])
|
||||
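The symmetrization step in script.py takes the element-wise maximum of the connectivity matrix and its transpose, so an edge is kept if either endpoint lists the other as a neighbour. A minimal sketch of that operation on a small dense matrix follows; the script itself calls `.maximum()` on the matrix stored in `adata.obsp`, whereas `symmetrize` and the example matrix here are hypothetical stand-ins in plain Python.

```python
def symmetrize(matrix):
    """Element-wise maximum of a square matrix and its transpose."""
    n = len(matrix)
    return [[max(matrix[i][j], matrix[j][i]) for j in range(n)] for i in range(n)]


# Hypothetical directed 2-nearest-neighbour connectivities for four points:
# row i has a 1 in column j when j is one of i's nearest neighbours.
connectivities = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],  # point 3 lists 1 and 2, but neither lists point 3
]
symmetric = symmetrize(connectivities)
is_symmetric = all(
    symmetric[i][j] == symmetric[j][i] for i in range(4) for j in range(4)
)
```

After symmetrization, points 1 and 2 gain an edge to point 3, so some nodes can end up with more than the requested number of neighbours.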
45
src/neighbors/spatial_neighborhood_graph/test.py
Normal file
@@ -0,0 +1,45 @@
|
||||
import pytest
|
||||
import mudata as mu
|
||||
import sys
|
||||
|
||||
## VIASH START
|
||||
meta = {
|
||||
"executable": "./target/executable/neighbors/spatial_neighborhood_graph/spatial_neighborhood_graph",
|
||||
}
|
||||
## VIASH END
|
||||
|
||||
input_xenium = f"{meta['resources_dir']}/xenium_tiny.h5mu"
|
||||
input_cosmx = f"{meta['resources_dir']}/Lung5_Rep2_tiny.h5mu"
|
||||
|
||||
|
||||
def test_simple_execution_xenium(run_component, tmp_path):
|
||||
output = tmp_path / "nc_xenium.h5mu"
|
||||
|
||||
# run component
|
||||
run_component(
|
||||
[
|
||||
"--input",
|
||||
input_xenium,
|
||||
"--output",
|
||||
str(output),
|
||||
"--output_compression",
|
||||
"gzip",
|
||||
]
|
||||
)
|
||||
|
||||
assert output.is_file(), "output file was not created"
|
||||
mdata = mu.read_h5mu(output)
|
||||
assert list(mdata.mod.keys()) == ["rna"], "Expected modality rna"
|
||||
adata = mdata.mod["rna"]
|
||||
|
||||
expected_obsp_keys = ["spatial_connectivities", "spatial_distances"]
|
||||
assert all([obsp in adata.obsp.keys() for obsp in expected_obsp_keys]), (
|
||||
"Not all expected obsp keys found"
|
||||
)
|
||||
assert all(adata.obsp[obsp].dtype.kind == "f" for obsp in expected_obsp_keys), (
|
||||
"Expected obsp matrices to be float type"
|
||||
)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(pytest.main([__file__]))
|
||||
87
src/utils/compress_h5mu.py
Normal file
@@ -0,0 +1,87 @@
|
||||
import shutil
|
||||
from anndata import AnnData
|
||||
from mudata import write_h5ad
|
||||
from h5py import File as H5File
|
||||
from h5py import Group, Dataset
|
||||
from pathlib import Path
|
||||
from typing import Union, Literal
|
||||
from functools import partial
|
||||
|
||||
|
||||
def compress_h5mu(
|
||||
input_path: Union[str, Path],
|
||||
output_path: Union[str, Path],
|
||||
compression: Union[Literal["gzip"], Literal["lzf"]],
|
||||
):
|
||||
input_path, output_path = str(input_path), str(output_path)
|
||||
|
||||
def copy_attributes(in_object, out_object):
|
||||
for key, value in in_object.attrs.items():
|
||||
out_object.attrs[key] = value
|
||||
|
||||
def visit_path(
|
||||
output_h5: H5File,
|
||||
compression: Union[Literal["gzip"], Literal["lzf"]],
|
||||
name: str,
|
||||
object: Union[Group, Dataset],
|
||||
):
|
||||
if isinstance(object, Group):
|
||||
new_group = output_h5.create_group(name)
|
||||
copy_attributes(object, new_group)
|
||||
elif isinstance(object, Dataset):
|
||||
# Compression only works for non-scalar Dataset objects
|
||||
# Scalar objects don't have a shape defined
|
||||
if not object.compression and object.shape not in [None, ()]:
|
||||
new_dataset = output_h5.create_dataset(
|
||||
name, data=object, compression=compression
|
||||
)
|
||||
copy_attributes(object, new_dataset)
|
||||
else:
|
||||
output_h5.copy(object, name)
|
||||
else:
|
||||
raise NotImplementedError(
|
||||
f"Could not copy element {name}, "
|
||||
f"type has not been implemented yet: {type(object)}"
|
||||
)
|
||||
|
||||
with (
|
||||
H5File(input_path, "r") as input_h5,
|
||||
H5File(output_path, "w", userblock_size=512) as output_h5,
|
||||
):
|
||||
copy_attributes(input_h5, output_h5)
|
||||
input_h5.visititems(partial(visit_path, output_h5, compression))
|
||||
|
||||
with open(input_path, "rb") as input_bytes:
|
||||
# Mudata puts metadata like this in the first 512 bytes:
|
||||
# MuData (format-version=0.1.0;creator=muon;creator-version=0.2.0)
|
||||
# See mudata/_core/io.py, read_h5mu() function
|
||||
starting_metadata = input_bytes.read(100)
|
||||
# The metadata is padded with extra null bytes up until 512 bytes
|
||||
truncate_location = starting_metadata.find(b"\x00")
|
||||
starting_metadata = starting_metadata[:truncate_location]
|
||||
with open(output_path, "br+") as f:
|
||||
nbytes = f.write(starting_metadata)
|
||||
f.write(b"\0" * (512 - nbytes))
|
||||
|
||||
|
||||
def write_h5ad_to_h5mu_with_compression(
|
||||
output_file: Union[str, Path],
|
||||
h5mu: Union[str, Path],
|
||||
modality_name: str,
|
||||
modality_data: AnnData,
|
||||
output_compression=None,
|
||||
):
|
||||
output_file = Path(output_file)
|
||||
h5mu = Path(h5mu)
|
||||
output_file_uncompressed = (
|
||||
output_file.with_name(output_file.stem + "_uncompressed.h5mu")
|
||||
if output_compression
|
||||
else output_file
|
||||
)
|
||||
shutil.copyfile(h5mu, output_file_uncompressed)
|
||||
write_h5ad(filename=output_file_uncompressed, mod=modality_name, data=modality_data)
|
||||
if output_compression:
|
||||
compress_h5mu(
|
||||
output_file_uncompressed, output_file, compression=output_compression
|
||||
)
|
||||
output_file_uncompressed.unlink()
|
||||
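The last step of `compress_h5mu` copies the MuData signature from the start of the input file into the 512-byte user block of the output file. The sketch below isolates that byte handling, assuming the signature is followed by null padding as the comments in the function describe; `build_userblock` is a hypothetical helper, not part of the module. Unlike the original slice, it keeps the full input when no null byte is found, because `bytes.find` returns -1 in that case and slicing with -1 would drop the last byte.

```python
USERBLOCK_SIZE = 512  # size of the HDF5 user block reserved in the output file


def build_userblock(first_bytes: bytes, size: int = USERBLOCK_SIZE) -> bytes:
    """Keep the text before the first null byte and pad it with nulls to `size` bytes."""
    end = first_bytes.find(b"\x00")
    # find() returns -1 when there is no null byte; keep everything in that case.
    header = first_bytes if end == -1 else first_bytes[:end]
    if len(header) > size:
        raise ValueError("header does not fit in the user block")
    return header + b"\x00" * (size - len(header))


# Hypothetical first bytes of an .h5mu file: a signature followed by null padding.
signature = b"MuData (format-version=0.1.0;creator=muon;creator-version=0.2.0)"
first_bytes = (signature + b"\x00" * 512)[:100]
block = build_userblock(first_bytes)
```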
12
src/utils/setup_logger.py
Normal file
@@ -0,0 +1,12 @@
|
||||
def setup_logger():
|
||||
import logging
|
||||
from sys import stdout
|
||||
|
||||
logger = logging.getLogger()
|
||||
logger.setLevel(logging.INFO)
|
||||
console_handler = logging.StreamHandler(stdout)
|
||||
logFormatter = logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
|
||||
console_handler.setFormatter(logFormatter)
|
||||
logger.addHandler(console_handler)
|
||||
|
||||
return logger
|
||||
22
src/utils/unzip_archived_folder.R
Normal file
@@ -0,0 +1,22 @@
|
||||
extract_selected_files <- function(zip_path, members) {
|
||||
# Create a temporary directory for extraction
|
||||
temp_dir <- tempfile("unzip_dir_")
|
||||
dir.create(temp_dir)
|
||||
|
||||
# List all files in the archive
|
||||
all_files <- utils::unzip(zip_path, list = TRUE)$Name
|
||||
|
||||
# Find files matching any of the glob patterns in 'members'
|
||||
selected <- unique(unlist(
|
||||
lapply(members, function(pattern) {
|
||||
regex <- glob2rx(pattern)
|
||||
grep(regex, all_files, value = TRUE)
|
||||
})
|
||||
))
|
||||
|
||||
# Extract only the selected files
|
||||
utils::unzip(zip_path, files = selected, exdir = temp_dir)
|
||||
|
||||
# Return the path to the extracted folder
|
||||
file.path(temp_dir)
|
||||
}
|
||||
50
src/utils/unzip_archived_folder.py
Normal file
@@ -0,0 +1,50 @@
|
||||
import fnmatch
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
from typing import Union
|
||||
import zipfile_inflate64 as zipfile
|
||||
|
||||
|
||||
def unzip_archived_folder(archived_folder: Union[str, Path]) -> Union[str, Path]:
|
||||
"""
|
||||
Extracts a ZIP archive to a temporary directory and returns the path to the extracted folder.
|
||||
|
||||
Args:
|
||||
archived_folder (Union[str, Path]): Path to the ZIP archive.
|
||||
|
||||
Returns:
|
||||
extracted_path (Union[str, Path]): Path to the extracted folder inside the temporary directory.
|
||||
"""
|
||||
|
||||
temp_dir = Path(tempfile.mkdtemp())
|
||||
with zipfile.ZipFile(archived_folder, "r") as archive:
|
||||
archive.extractall(temp_dir)
|
||||
|
||||
return temp_dir / Path(archived_folder).stem
|
||||
|
||||
|
||||
def extract_selected_files_from_zip(
|
||||
zip_path: Union[str, Path], members: list[Union[str, Path]]
|
||||
) -> Union[str, Path]:
|
||||
"""
|
||||
Extracts selected files (supports glob patterns) from a ZIP archive to a temporary directory.
|
||||
|
||||
Args:
|
||||
zip_path (Union[str, Path]): Path to the ZIP archive.
|
||||
members (list[str]): List of file paths or glob patterns within the archive to extract.
|
||||
|
||||
Returns:
|
||||
Path: Path to the extraction directory.
|
||||
"""
|
||||
|
||||
temp_dir = Path(tempfile.mkdtemp())
|
||||
|
||||
with zipfile.ZipFile(zip_path, "r") as archive:
|
||||
all_files = archive.namelist()
|
||||
selected = set()
|
||||
for pattern in members:
|
||||
selected.update(fnmatch.filter(all_files, str(pattern)))
|
||||
for member in selected:
|
||||
archive.extract(member, temp_dir)
|
||||
|
||||
return temp_dir
|
||||
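The glob-based selection in `extract_selected_files_from_zip` can be tried out on a small in-memory archive. The sketch below is an assumption-laden variant: it uses the standard-library `zipfile` module instead of `zipfile_inflate64`, extracts into a caller-supplied directory, and the helper name `extract_matching` and the file names are hypothetical.

```python
import fnmatch
import io
import tempfile
import zipfile
from pathlib import Path


def extract_matching(zip_file, patterns, target_dir):
    """Extract archive members whose names match any glob pattern; return sorted names."""
    with zipfile.ZipFile(zip_file, "r") as archive:
        names = archive.namelist()
        selected = set()
        for pattern in patterns:
            selected.update(fnmatch.filter(names, str(pattern)))
        for member in selected:
            archive.extract(member, target_dir)
    return sorted(selected)


# Build a small in-memory archive with hypothetical file names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("run/cells.parquet", "cells")
    archive.writestr("run/transcripts.parquet", "transcripts")
    archive.writestr("run/notes.txt", "notes")
buffer.seek(0)

with tempfile.TemporaryDirectory() as tmp:
    extracted = extract_matching(buffer, ["run/*.parquet"], tmp)
    on_disk = sorted(p.name for p in Path(tmp).rglob("*") if p.is_file())
```

Note that `fnmatch` lets `*` match path separators as well, so a pattern such as `*.parquet` would also select members in nested folders.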
211
src/workflows/ingestion/spaceranger_mapping/config.vsh.yaml
Normal file
@@ -0,0 +1,211 @@
|
||||
name: "spaceranger_mapping"
|
||||
namespace: "workflows/ingestion"
|
||||
scope: "public"
|
||||
description: "A pipeline for running SpaceRanger mapping."
|
||||
info:
|
||||
name: SpaceRanger mapping
|
||||
test_dependencies:
|
||||
- name: spaceranger_mapping_test
|
||||
namespace: test_workflows/ingestion
|
||||
authors:
|
||||
- __merge__: /src/authors/dorien_roosen.yaml
|
||||
roles: [ maintainer ]
|
||||
- __merge__: /src/authors/weiwei_schultz.yaml
|
||||
roles: [ contributor ]
|
||||
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
- name: "--id"
|
||||
required: true
|
||||
type: string
|
||||
description: ID of the sample.
|
||||
example: foo
|
||||
- name: --input
|
||||
type: file
|
||||
required: true
|
||||
multiple: true
|
||||
description: |
|
||||
The fastq.gz files to align. Can also be a single directory containing fastq.gz files.
|
||||
|
||||
Individual FASTQ files should follow the naming convention of 10x Genomics:
|
||||
[Sample Name]_S[Sample Number]_L[Lane Number]_[Read Type]_001.fastq.gz
|
||||
|
||||
Where:
|
||||
[Sample Name] is the name assigned during sample preparation/sequencing
|
||||
S[Sample Number] is the sample index (usually S1, S2, etc.)
|
||||
L[Lane Number] identifies the sequencing lane (L001, L002, etc.)
|
||||
|
||||
[Read Type] will be one of:
|
||||
R1 - Read 1 (contains the spatial barcode and UMI)
|
||||
R2 - Read 2 (contains the actual cDNA sequence)
|
||||
|
||||
example: [ "sample_S1_L001_R1_001.fastq.gz", "sample_S1_L001_R2_001.fastq.gz" ]
|
||||
- name: --gex_reference
|
||||
type: file
|
||||
required: true
|
||||
description: Path of folder containing 10x-compatible reference
|
||||
example: "/path/to/refdata-gex-GRCh38-2020-A"
|
||||
- name: --probe_set
|
||||
type: file
|
||||
required: true
|
||||
description: CSV file specifying the probe set used
|
||||
example: "Visium_Human_Transcriptome_Probe_Set_v2.0_GRCh38-2020-A.csv"
|
||||
- name: --cytaimage
|
||||
type: file
|
||||
required: false
|
||||
description: |
|
||||
Brightfield image generated by the CytAssist instrument.
|
||||
When using CytAssist workflow, either this or --image must be provided.
|
||||
example: "cyta_image.tif"
|
||||
- name: --image
|
||||
type: file
|
||||
required: false
|
||||
description: |
|
||||
H&E or fluorescence microscope image in TIFF or JPG format.
|
||||
Required for standard Visium workflow, optional when using --cytaimage for CytAssist workflow.
|
||||
example: "brightfield.tif"
|
||||
|
||||
- name: Outputs
|
||||
arguments:
|
||||
- name: "--output_raw"
|
||||
type: file
|
||||
direction: output
|
||||
description: "Location where the output folder from Space Ranger will be stored."
|
||||
required: true
|
||||
example: output_dir/
|
||||
- name: "--output_h5mu"
|
||||
type: file
|
||||
direction: output
|
||||
description: "The output from Space Ranger, converted to h5mu."
|
||||
required: true
|
||||
example: output.h5mu
|
||||
- name: "--output_type"
|
||||
type: string
|
||||
description: "Which Space Ranger output to use for converting to h5mu."
|
||||
choices: [ raw, filtered ]
|
||||
default: raw
|
||||
- name: "--uns_metrics"
|
||||
type: string
|
||||
description: Name of the .uns slot under which to store QC metrics (if any).
|
||||
default: "metrics_summary"
|
||||
- name: "--uns_probe_set"
|
||||
type: string
|
||||
description: Name of the .uns slot under which to store probe set information (if any).
|
||||
default: "probe_set"
|
||||
- name: "--obsm_coordinates"
|
||||
type: string
|
||||
description: Name of the .obsm slot under which to store the cell centroid coordinates.
|
||||
default: "spatial"
|
||||
- name: "--output_compression"
|
||||
type: string
|
||||
description: Compression to use when writing the h5mu file.
|
||||
choices: [ gzip, lzf ]
|
||||
|
||||
- name: Image Options
|
||||
arguments:
|
||||
- name: --darkimage
|
||||
type: file
|
||||
description: Multi-channel, dark-background fluorescence image
|
||||
required: false
|
||||
example: "fluorescence.tif"
|
||||
- name: --colorizedimage
|
||||
type: file
|
||||
description: Color image representing pre-colored dark-background fluorescence images
|
||||
required: false
|
||||
example: "colored_fluorescence.tif"
|
||||
- name: --dapi_index
|
||||
type: integer
|
||||
description: Index of DAPI channel (1-indexed) of fluorescence image
|
||||
required: false
|
||||
example: 1
|
||||
min: 1
|
||||
- name: --image_scale
|
||||
type: double
|
||||
description: Microns per microscope image pixel
|
||||
required: false
|
||||
example: 0.65
|
||||
min: 0.01
|
||||
max: 10
|
||||
- name: --reorient_images
|
||||
type: boolean
|
||||
default: true
|
||||
description: Whether to rotate and mirror image to align fiducial pattern
|
||||
|
||||
- name: Slide Information
|
||||
arguments:
|
||||
- name: --slide
|
||||
type: string
|
||||
description: Visium slide serial number (e.g., 'V10J25-015')
|
||||
required: false
|
||||
example: "V10J25-015"
|
||||
- name: --area
|
||||
type: string
|
||||
description: Visium capture area identifier (e.g., 'A1')
|
||||
required: false
|
||||
example: "A1"
|
||||
- name: --unknown_slide
|
||||
type: string
|
||||
description: |
|
||||
Use this option if the slide serial number and area were entered incorrectly on the CytAssist
|
||||
instrument and the correct values are unknown. Not compatible with --slide, --area, or
|
||||
--slidefile options.
|
||||
required: false
|
||||
choices: [visium-1, visium-2, visium-2-large, visium-hd]
|
||||
- name: --slidefile
|
||||
type: file
|
||||
description: Slide design file for offline use
|
||||
required: false
|
||||
example: "slide_design.gpr"
|
||||
- name: --override_id
|
||||
type: boolean_true
|
||||
description: Overrides the slide serial number and capture area provided in the CytAssist image metadata
|
||||
|
||||
- name: SpaceRanger arguments
|
||||
arguments:
|
||||
- name: --create_bam
|
||||
type: boolean
|
||||
required: true
|
||||
description: Enable or disable BAM file generation
|
||||
default: true
|
||||
- name: --nosecondary
|
||||
type: boolean_true
|
||||
description: Disable secondary analysis (e.g., clustering)
|
||||
- name: --r1_length
|
||||
type: integer
|
||||
required: false
|
||||
description: Hard trim the input Read 1 to this length before analysis
|
||||
min: 1
|
||||
- name: --r2_length
|
||||
type: integer
|
||||
required: false
|
||||
description: Hard trim the input Read 2 to this length before analysis
|
||||
min: 1
|
||||
- name: --filter_probes
|
||||
type: boolean
|
||||
default: true
|
||||
description: Whether to filter the probe set using the "included" column
|
||||
- name: --custom_bin_size
|
||||
type: integer
|
||||
description: Bin Visium HD data to specified size in microns (4-100, even values only) in addition to the standard binning size (2 µm, 8 µm, 16 µm)
|
||||
min: 4
|
||||
max: 100
|
||||
|
||||
dependencies:
|
||||
- name: mapping/spaceranger_count
|
||||
- name: convert/from_spaceranger_to_h5mu
|
||||
|
||||
resources:
|
||||
- type: nextflow_script
|
||||
path: main.nf
|
||||
entrypoint: run_wf
|
||||
- type: file
|
||||
path: /src/workflows/utils/
|
||||
test_resources:
|
||||
- type: nextflow_script
|
||||
path: test.nf
|
||||
entrypoint: test_wf
|
||||
- path: /resources_test/visium
|
||||
- path: /resources_test/GRCh38
|
||||
runners:
|
||||
- type: nextflow
|
||||
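The `--input` description above spells out the 10x Genomics FASTQ naming convention. As a rough sketch, a file name can be checked against that convention with a regular expression. The pattern below assumes a three-digit lane number and only the R1/R2 read types named in the description (index reads are not handled); `FASTQ_NAME` and `parse_fastq_name` are hypothetical names, not part of the workflow.

```python
import re

# Pattern for [Sample Name]_S[Sample Number]_L[Lane Number]_[Read Type]_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>R[12])_001\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Return the name parts as a dict, or None if the name does not follow the convention."""
    match = FASTQ_NAME.match(filename)
    return match.groupdict() if match else None


parts = parse_fastq_name("Visium_FFPE_Human_Ovarian_Cancer_S1_L001_R1_001.fastq.gz")
```

A check like this could be run on the input files before launching the workflow, to catch misnamed files early.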
15
src/workflows/ingestion/spaceranger_mapping/integration_test.sh
Executable file
@@ -0,0 +1,15 @@
|
||||
#!/bin/bash
|
||||
|
||||
# get the root of the directory
|
||||
REPO_ROOT=$(git rev-parse --show-toplevel)
|
||||
|
||||
# ensure that the command below is run from the root of the repository
|
||||
cd "$REPO_ROOT"
|
||||
|
||||
nextflow \
|
||||
run . \
|
||||
-main-script src/workflows/ingestion/spaceranger_mapping/test.nf \
|
||||
-entry test_wf \
|
||||
-profile docker \
|
||||
-c src/workflows/utils/labels_ci.config \
|
||||
-c src/workflows/utils/integration_tests.config
|
||||
57
src/workflows/ingestion/spaceranger_mapping/main.nf
Normal file
@@ -0,0 +1,57 @@
|
||||
workflow run_wf {
|
||||
take:
|
||||
input_ch
|
||||
|
||||
main:
|
||||
output_ch = input_ch
|
||||
| spaceranger_count.run(
|
||||
fromState: { id, state -> [
|
||||
"input": state.input,
|
||||
"gex_reference": state.gex_reference,
|
||||
"probe_set": state.probe_set,
|
||||
"cytaimage": state.cytaimage,
|
||||
"image": state.image,
|
||||
"slide": state.slide,
|
||||
"area": state.area,
|
||||
"unknown_slide": state.unknown_slide,
|
||||
"slidefile": state.slidefile,
|
||||
"override_id": state.override_id,
|
||||
"darkimage": state.darkimage,
|
||||
"colorizedimage": state.colorizedimage,
|
||||
"dapi_index": state.dapi_index,
|
||||
"image_scale": state.image_scale,
|
||||
"reorient_images": state.reorient_images,
|
||||
"create_bam": state.create_bam,
|
||||
"nosecondary": state.nosecondary,
|
||||
"r1_length": state.r1_length,
|
||||
"r2_length": state.r2_length,
|
||||
"filter_probes": state.filter_probes,
|
||||
"custom_bin_size": state.custom_bin_size,
|
||||
"output": state.output_raw,
|
||||
]},
|
||||
toState: [
|
||||
"input": "output",
|
||||
"output_raw": "output"
|
||||
]
|
||||
)
|
||||
// convert to h5mu
|
||||
| from_spaceranger_to_h5mu.run(
|
||||
fromState: {id, state ->
|
||||
[
|
||||
"input": state.input,
|
||||
"output_compression": state.output_compression,
|
||||
"output": state.output_h5mu,
|
||||
"uns_metrics": state.uns_metrics,
|
||||
"uns_probe_set": state.uns_probe_set,
|
||||
"obsm_coordinates": state.obsm_coordinates,
|
||||
"output_type": state.output_type,
|
||||
]
|
||||
},
|
||||
toState: ["output_h5mu": "output"]
|
||||
)
|
||||
| setState(["output_raw", "output_h5mu"])
|
||||
|
||||
emit:
|
||||
output_ch
|
||||
}
|
||||
10
src/workflows/ingestion/spaceranger_mapping/nextflow.config
Normal file
@@ -0,0 +1,10 @@
|
||||
manifest {
|
||||
nextflowVersion = '!>=20.12.1-edge'
|
||||
}
|
||||
|
||||
params {
|
||||
rootDir = java.nio.file.Paths.get("$projectDir/../../../../").toAbsolutePath().normalize().toString()
|
||||
}
|
||||
|
||||
// include common settings
|
||||
includeConfig("${params.rootDir}/src/workflows/utils/labels.config")
|
||||
42
src/workflows/ingestion/spaceranger_mapping/test.nf
Normal file
@@ -0,0 +1,42 @@
|
||||
nextflow.enable.dsl=2
|
||||
|
||||
include { spaceranger_mapping } from params.rootDir + "/target/nextflow/workflows/ingestion/spaceranger_mapping/main.nf"
|
||||
include { spaceranger_mapping_test } from params.rootDir + "/target/_test/nextflow/test_workflows/ingestion/spaceranger_mapping_test/main.nf"
|
||||
|
||||
params.resources_test = params.rootDir + "/resources_test"
|
||||
|
||||
workflow test_wf {
|
||||
|
||||
resources_test = file(params.resources_test)
|
||||
|
||||
output_ch = Channel.fromList([
|
||||
[
|
||||
id: "foo",
|
||||
input: resources_test.resolve("visium/Visium_FFPE_Human_Ovarian_Cancer_tiny"),
|
||||
gex_reference: resources_test.resolve("GRCh38"),
|
||||
image: resources_test.resolve("visium/Visium_FFPE_Human_Ovarian_Cancer_image_tiny.jpg"),
|
||||
probe_set: resources_test.resolve("visium/Visium_FFPE_Human_Ovarian_Cancer_probe_set.csv"),
|
||||
create_bam: "false",
|
||||
slide: "V10L13-020",
|
||||
area: "D1",
|
||||
output_type: "filtered",
|
||||
]
|
||||
])
|
||||
| map{ state -> [state.id, state] }
|
||||
| spaceranger_mapping
|
||||
| view { output ->
|
||||
assert output.size() == 2 : "outputs should contain two elements; [id, out]"
|
||||
assert output[1] instanceof Map : "Output should be a Map."
|
||||
"Output: $output"
|
||||
}
|
||||
|
||||
| spaceranger_mapping_test.run(
|
||||
fromState: ["input": "output_h5mu"]
|
||||
)
|
||||
|
||||
| toSortedList()
|
||||
| map { output_list ->
|
||||
assert output_list.size() == 1 : "output channel should contain one event"
|
||||
assert output_list[0][0] == "foo" : "Output ID should be same as input ID"
|
||||
}
|
||||
}
|
||||
312
src/workflows/multiomics/spatial_process_samples/config.vsh.yaml
Normal file
@@ -0,0 +1,312 @@
|
||||
name: "spatial_process_samples"
|
||||
namespace: "workflows/multiomics"
|
||||
scope: "public"
|
||||
description: "A pipeline to pre-process multiple spatial omics samples."
|
||||
authors:
|
||||
- __merge__: /src/authors/dries_schaumont.yaml
|
||||
roles: [ author, maintainer ]
|
||||
- __merge__: /src/authors/dorien_roosen.yaml
|
||||
roles: [ contributor ]
|
||||
- __merge__: /src/authors/weiwei_schultz.yaml
|
||||
roles: [ contributor ]
|
||||
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
- name: "--id"
|
||||
required: true
|
||||
type: string
|
||||
description: ID of the sample.
|
||||
example: foo
|
||||
- name: "--input"
|
||||
alternatives: [-i]
|
||||
description: Path to the sample.
|
||||
required: true
|
||||
example: input.h5mu
|
||||
type: file
|
||||
- name: "--rna_layer"
|
||||
type: string
|
||||
description: "Input layer for the gene expression modality. If not specified, .X is used."
|
||||
required: false
|
||||
- name: "--prot_layer"
|
||||
type: string
|
||||
description: "Input layer for the antibody capture modality. If not specified, .X is used."
|
||||
required: false
|
||||
|
||||
- name: "Outputs"
|
||||
arguments:
|
||||
- name: "--output"
|
||||
type: file
|
||||
required: true
|
||||
direction: output
|
||||
description: Destination path to the output.
|
||||
example: output.h5mu
|
||||
|
||||
- name: "Sample ID options"
|
||||
description: |
|
||||
Options for adding the id to .obs on the MuData object. Having a sample
|
||||
id present is a requirement of several components in this pipeline.
|
||||
arguments:
|
||||
- name: "--add_id_to_obs"
|
||||
description: "Add the value passed with --id to .obs."
|
||||
type: boolean
|
||||
default: true
|
||||
- name: --add_id_obs_output
|
||||
description: |
|
||||
.obs column to add the sample IDs to. Required and only used when
|
||||
--add_id_to_obs is set to 'true'
|
||||
type: string
|
||||
default: "sample_id"
|
||||
- name: "--add_id_make_observation_keys_unique"
|
||||
type: boolean
|
||||
description: |
|
||||
Join the id to the .obs index (.obs_names).
|
||||
Only used when --add_id_to_obs is set to 'true'.
|
||||
default: true
|
||||
|
||||
- name: "RNA filtering options"
|
||||
arguments:
|
||||
- name: "--rna_min_counts"
|
||||
example: 200
|
||||
min: 1
|
||||
type: integer
|
||||
description: Minimum number of counts captured per cell.
|
||||
- name: "--rna_max_counts"
|
||||
example: 5000000
|
||||
min: 1
|
||||
type: integer
|
||||
description: Maximum number of counts captured per cell.
|
||||
- name: "--rna_min_genes_per_cell"
|
||||
type: integer
|
||||
min: 1
|
||||
example: 200
|
||||
description: Minimum of non-zero values per cell.
|
||||
- name: "--rna_max_genes_per_cell"
|
||||
example: 1500000
|
||||
min: 1
|
||||
type: integer
|
||||
description: Maximum of non-zero values per cell.
|
||||
- name: "--rna_min_cells_per_gene"
|
||||
example: 3
|
||||
min: 1
|
||||
type: integer
|
||||
description: Minimum of non-zero values per gene.
|
||||
- name: "--rna_min_fraction_mito"
|
||||
example: 0
|
||||
min: 0
|
||||
max: 1
|
||||
type: double
|
||||
description: Minimum fraction of UMIs that are mitochondrial.
|
||||
- name: "--rna_max_fraction_mito"
|
||||
type: double
|
||||
min: 0
|
||||
max: 1
|
||||
example: 0.2
|
||||
description: Maximum fraction of UMIs that are mitochondrial.
|
||||
- name: "--rna_min_fraction_ribo"
|
||||
example: 0
|
||||
min: 0
|
||||
max: 1
|
||||
type: double
|
||||
description: Minimum fraction of UMIs that are ribosomal.
|
||||
- name: "--rna_max_fraction_ribo"
|
||||
type: double
|
||||
min: 0
|
||||
max: 1
|
||||
example: 0.2
|
||||
description: Maximum fraction of UMIs that are ribosomal.
|
||||
|
||||
- name: "Protein filtering options"
|
||||
arguments:
|
||||
- name: "--prot_min_counts"
|
||||
description: Minimum number of counts per cell.
|
||||
type: integer
|
||||
min: 1
|
||||
example: 3
|
||||
- name: "--prot_max_counts"
|
||||
description: Maximum number of counts per cell.
|
||||
type: integer
|
||||
min: 1
|
||||
example: 5000000
|
||||
- name: "--prot_min_proteins_per_cell"
|
||||
type: integer
|
||||
min: 1
|
||||
example: 200
|
||||
description: Minimum of non-zero values per cell.
|
||||
- name: "--prot_max_proteins_per_cell"
|
||||
description: Maximum of non-zero values per cell.
|
||||
type: integer
|
||||
min: 1
|
||||
example: 100000000
|
||||
- name: "--prot_min_cells_per_protein"
|
||||
example: 3
|
||||
min: 1
|
||||
type: integer
|
||||
description: Minimum of non-zero values per protein.
|
||||
|
||||
- name: "Highly variable features detection"
|
||||
arguments:
|
||||
- name: "--highly_variable_features_var_output"
|
||||
alternatives: ["--filter_with_hvg_var_output"]
|
||||
required: false
|
||||
type: string
|
||||
default: "filter_with_hvg"
|
||||
description: In which .var slot to store a boolean array corresponding to the highly variable genes.
|
||||
- name: "--highly_variable_features_obs_batch_key"
|
||||
alternatives: ["--filter_with_hvg_obs_batch_key"]
|
||||
type: string
|
||||
default: "sample_id"
|
||||
required: false
|
||||
description: |
|
||||
If specified, highly-variable genes are selected within each batch separately and merged. This simple
|
||||
process avoids the selection of batch-specific genes and acts as a lightweight batch correction method.
|
||||
- name: "Mitochondrial & Ribosomal Gene Detection"
|
||||
arguments:
|
||||
- name: "--var_gene_names"
|
||||
required: false
|
||||
example: "gene_symbol"
|
||||
type: string
|
||||
description: |
|
||||
.var column name to be used to detect mitochondrial/ribosomal genes instead of .var_names (default if not set).
|
||||
Gene names matching with the regex value from --mitochondrial_gene_regex or --ribosomal_gene_regex will be
|
||||
identified as mitochondrial or ribosomal genes, respectively.
|
||||
- name: "--var_name_mitochondrial_genes"
|
||||
type: string
|
||||
required: false
|
||||
description: |
|
||||
In which .var slot to store a boolean array corresponding to the mitochondrial genes.
|
||||
- name: "--obs_name_mitochondrial_fraction"
|
||||
type: string
|
||||
required: false
|
||||
description: |
|
||||
When specified, write the fraction of counts originating from mitochondrial genes
|
||||
(based on --mitochondrial_gene_regex) to an .obs column with the specified name.
|
||||
Requires --var_name_mitochondrial_genes.
|
||||
- name: --mitochondrial_gene_regex
|
||||
type: string
|
||||
description: |
|
||||
Regex string that identifies mitochondrial genes from --var_gene_names.
|
||||
By default will detect human and mouse mitochondrial genes from a gene symbol.
|
||||
required: false
|
||||
default: "^[mM][tT]-"
|
||||
- name: "--var_name_ribosomal_genes"
|
||||
type: string
|
||||
required: false
|
||||
description: |
|
||||
In which .var slot to store a boolean array corresponding to the ribosomal genes.
|
||||
- name: "--obs_name_ribosomal_fraction"
|
||||
type: string
|
||||
required: false
|
||||
description: |
|
||||
When specified, write the fraction of counts originating from ribosomal genes
|
||||
(based on --ribosomal_gene_regex) to an .obs column with the specified name.
|
||||
Requires --var_name_ribosomal_genes.
|
||||
- name: --ribosomal_gene_regex
|
||||
type: string
|
||||
description: |
|
||||
Regex string that identifies ribosomal genes from --var_gene_names.
|
||||
By default will detect human and mouse ribosomal genes from a gene symbol.
|
||||
required: false
|
||||
default: "^[Mm]?[Rr][Pp][LlSs]"
|
||||
|
||||
- name: "QC metrics calculation options"
|
||||
arguments:
|
||||
- name: "--var_qc_metrics"
|
||||
description: |
|
||||
Keys to select a boolean (containing only True or False) column from .var.
|
||||
For each cell, calculate the proportion of total values for genes which are labeled 'True',
|
||||
compared to the total sum of the values for all genes. Defaults to the combined values specified for
|
||||
--var_name_mitochondrial_genes and --highly_variable_features_var_output.
|
||||
type: string
|
||||
multiple: True
|
||||
multiple_sep: ','
|
||||
required: false
|
||||
example: "ercc,highly_variable"
|
||||
- name: "--top_n_vars"
|
||||
type: integer
|
||||
description: |
|
||||
Number of top vars to be used to calculate cumulative proportions.
|
||||
If not specified, proportions are not calculated. `--top_n_vars 20,50` finds
|
||||
cumulative proportion to the 20th and 50th most expressed vars.
|
||||
multiple: true
|
||||
multiple_sep: ','
|
||||
required: false
|
||||
default: [50, 100, 200, 500]
|
||||
|
||||
- name: "PCA options"
|
||||
arguments:
|
||||
- name: "--pca_overwrite"
|
||||
type: boolean_true
|
||||
description: "Allow overwriting slots for PCA output."
|
||||
|
||||
- name: "CLR options"
|
||||
arguments:
|
||||
- name: "--clr_axis"
|
||||
type: integer
|
||||
description: "Axis to perform the CLR transformation on."
|
||||
default: 0
|
||||
required: false
|
||||
|
||||
- name: "RNA Scaling options"
|
||||
description: |
|
||||
Options for enabling scaling of the log-normalized data to unit variance and zero mean.
|
||||
The scaled data will be written to a different layer, and representations with reduced dimensions
|
||||
will be created and stored in addition to the non-scaled data.
|
||||
arguments:
|
||||
- name: "--rna_enable_scaling"
|
||||
description: "Enable scaling for the RNA modality."
|
||||
type: boolean_true
|
||||
- name: "--rna_scaling_output_layer"
|
||||
type: string
|
||||
default: "scaled"
|
||||
description: "Output layer where the scaled log-normalized data will be stored."
|
||||
- name: "--rna_scaling_pca_obsm_output"
|
||||
type: string
|
||||
description: |
|
||||
Name of the .obsm key where the PCA representation of the log-normalized
|
||||
and scaled data is stored.
|
||||
default: "scaled_pca"
|
||||
- name: "--rna_scaling_pca_loadings_varm_output"
|
||||
type: string
|
||||
description: |
|
||||
Name of the .varm key where the PCA loadings of the log-normalized and scaled
|
||||
data are stored.
|
||||
default: "scaled_pca_loadings"
|
||||
- name: "--rna_scaling_pca_variance_uns_output"
|
||||
type: string
|
||||
description: |
|
||||
Name of the .uns key where the variance and variance ratio will be stored as a map.
|
||||
The map will contain two keys: variance and variance_ratio respectively.
|
||||
default: "scaled_pca_variance"
|
||||
- name: "--rna_scaling_umap_obsm_output"
|
||||
type: string
|
||||
description:
|
||||
Name of the .obsm key where the UMAP representation of the log-normalized
|
||||
and scaled data is stored.
|
||||
default: "scaled_umap"
|
||||
- name: "--rna_scaling_max_value"
|
||||
description: "Clip (truncate) data to this value after scaling. If not specified, do not clip."
|
||||
required: false
|
||||
type: double
|
||||
- name: "--rna_scaling_zero_center"
|
||||
type: boolean_false
|
||||
description: "If set, omit zero-centering of variables, which allows sparse input to be handled efficiently."
|
||||
|
||||
dependencies:
|
||||
- name: workflows/multiomics/process_samples
|
||||
alias: spatial_sample_processing
|
||||
repository: openpipeline
|
||||
|
||||
resources:
|
||||
- type: nextflow_script
|
||||
path: main.nf
|
||||
entrypoint: run_wf
|
||||
|
||||
test_resources:
|
||||
- type: nextflow_script
|
||||
path: test.nf
|
||||
entrypoint: test_wf
|
||||
- path: /resources_test/xenium/xenium_tiny.h5mu
|
||||
|
||||
runners:
|
||||
- type: nextflow
|
||||
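The default patterns for --mitochondrial_gene_regex and --ribosomal_gene_regex in the config above can be tried out in isolation. The following is a minimal Python sketch, not the component's actual implementation: the helper names `flag_genes` and `fraction_of_counts`, the gene symbols, and the counts are hypothetical, and reporting a fraction of 0 for an empty cell is an assumption.

```python
import re

# Default patterns copied from the argument definitions above.
MITOCHONDRIAL_GENE_REGEX = r"^[mM][tT]-"
RIBOSOMAL_GENE_REGEX = r"^[Mm]?[Rr][Pp][LlSs]"


def flag_genes(gene_names, pattern):
    """Return one boolean per gene name: True when the pattern matches at the start."""
    compiled = re.compile(pattern)
    return [compiled.match(name) is not None for name in gene_names]


def fraction_of_counts(counts, flags):
    """Fraction of one cell's total counts that comes from the flagged genes."""
    total = sum(counts)
    if total == 0:
        return 0.0  # assumption: an empty cell gets 0 rather than NaN
    return sum(c for c, flagged in zip(counts, flags) if flagged) / total


# Hypothetical gene symbols and counts for a single cell (illustration only).
genes = ["MT-CO1", "mt-Nd1", "RPL13", "Mrps18a", "ACTB", "MTOR"]
cell_counts = [10, 5, 20, 5, 55, 5]

mito_flags = flag_genes(genes, MITOCHONDRIAL_GENE_REGEX)
ribo_flags = flag_genes(genes, RIBOSOMAL_GENE_REGEX)
mito_fraction = fraction_of_counts(cell_counts, mito_flags)
ribo_fraction = fraction_of_counts(cell_counts, ribo_flags)
```

In this sketch `MTOR` is not flagged as mitochondrial because the pattern requires the hyphen, while `Mrps18a` is flagged by the ribosomal pattern because of its optional leading `M`.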
17
src/workflows/multiomics/spatial_process_samples/integration_test.sh
Executable file
@@ -0,0 +1,17 @@
|
||||
#!/bin/bash
|
||||
|
||||
set -eo pipefail
|
||||
|
||||
# get the root of the directory
|
||||
REPO_ROOT=$(git rev-parse --show-toplevel)
|
||||
|
||||
# ensure that the command below is run from the root of the repository
|
||||
cd "$REPO_ROOT"
|
||||
|
||||
nextflow \
|
||||
run . \
|
||||
-main-script src/workflows/multiomics/spatial_process_samples/test.nf \
|
||||
-entry test_wf \
|
||||
-profile docker,no_publish \
|
||||
-c src/workflows/utils/labels_ci.config \
|
||||
-c src/workflows/utils/integration_tests.config
|
||||
77
src/workflows/multiomics/spatial_process_samples/main.nf
Normal file
@@ -0,0 +1,77 @@
|
||||
workflow run_wf {
|
||||
take:
|
||||
input_ch
|
||||
|
||||
main:
|
||||
output_ch = input_ch
|
||||
| map { id, state ->
|
||||
def new_state = [
|
||||
state.id,
|
||||
state + ["_meta": ["join_id": id], "workflow_output": state.output]
|
||||
]
|
||||
new_state
|
||||
}
|
||||
| spatial_sample_processing.run(
|
||||
fromState: { id, state -> [
|
||||
"id": id,
|
||||
"input": state.input,
|
||||
"rna_layer": state.rna_layer,
|
||||
"prot_layer": state.prot_layer,
|
||||
"add_id_to_obs": state.add_id_to_obs,
|
||||
"add_id_obs_output": state.add_id_obs_output,
|
||||
"add_id_make_observation_keys_unique": state.add_id_make_observation_keys_unique,
|
||||
"rna_min_counts": state.rna_min_counts,
|
||||
"rna_max_counts": state.rna_max_counts,
|
||||
"rna_min_genes_per_cell": state.rna_min_genes_per_cell,
|
||||
"rna_max_genes_per_cell": state.rna_max_genes_per_cell,
|
||||
"rna_min_cells_per_gene": state.rna_min_cells_per_gene,
|
||||
"rna_min_fraction_mito": state.rna_min_fraction_mito,
|
||||
"rna_max_fraction_mito": state.rna_max_fraction_mito,
|
||||
"rna_min_fraction_ribo": state.rna_min_fraction_ribo,
|
||||
"rna_max_fraction_ribo": state.rna_max_fraction_ribo,
|
||||
"prot_min_counts": state.prot_min_counts,
|
||||
"prot_max_counts": state.prot_max_counts,
|
||||
"prot_min_proteins_per_cell": state.prot_min_proteins_per_cell,
|
||||
"prot_max_proteins_per_cell": state.prot_max_proteins_per_cell,
|
||||
"prot_min_cells_per_protein": state.prot_min_cells_per_protein,
|
||||
"highly_variable_features_var_output": state.highly_variable_features_var_output,
|
||||
"highly_variable_features_obs_batch_key": state.highly_variable_features_obs_batch_key,
|
||||
"var_gene_names": state.var_gene_names,
|
||||
"var_name_mitochondrial_genes": state.var_name_mitochondrial_genes,
|
||||
"obs_name_mitochondrial_fraction": state.obs_name_mitochondrial_fraction,
|
||||
"mitochondrial_gene_regex": state.mitochondrial_gene_regex,
|
||||
"var_name_ribosomal_genes": state.var_name_ribosomal_genes,
|
||||
"obs_name_ribosomal_fraction": state.obs_name_ribosomal_fraction,
|
||||
"ribosomal_gene_regex": state.ribosomal_gene_regex,
|
||||
"var_qc_metrics": state.var_qc_metrics,
|
||||
"top_n_vars": state.top_n_vars,
|
||||
"pca_overwrite": state.pca_overwrite,
|
||||
"clr_axis": state.clr_axis,
|
||||
"rna_enable_scaling": state.rna_enable_scaling,
|
||||
"rna_scaling_output_layer": state.rna_scaling_output_layer,
|
||||
"rna_scaling_pca_obsm_output": state.rna_scaling_pca_obsm_output,
|
||||
"rna_scaling_pca_loadings_varm_output": state.rna_scaling_pca_loadings_varm_output,
|
||||
"rna_scaling_pca_variance_uns_output": state.rna_scaling_pca_variance_uns_output,
|
||||
"rna_scaling_umap_obsm_output": state.rna_scaling_umap_obsm_output,
|
||||
"rna_scaling_max_value": state.rna_scaling_max_value,
|
||||
"rna_scaling_zero_center": state.rna_scaling_zero_center,
|
||||
"output": state.workflow_output
|
||||
]},
|
||||
args: [
|
||||
"skip_scrublet_doublet_detection": "true",
|
||||
],
|
||||
toState: [
|
||||
"output": "output"
|
||||
]
|
||||
)
|
||||
|
||||
| setState(
|
||||
[
|
||||
"_meta": "_meta",
|
||||
"output": "output"
|
||||
]
|
||||
)
|
||||
|
||||
emit:
|
||||
output_ch
|
||||
}
|
||||
@@ -0,0 +1,10 @@
|
||||
manifest {
|
||||
nextflowVersion = '!>=20.12.1-edge'
|
||||
}
|
||||
|
||||
params {
|
||||
rootDir = java.nio.file.Paths.get("$projectDir/../../../../").toAbsolutePath().normalize().toString()
|
||||
}
|
||||
|
||||
// include common settings
|
||||
includeConfig("${params.rootDir}/src/workflows/utils/labels.config")
|
||||
33
src/workflows/multiomics/spatial_process_samples/test.nf
Normal file
@@ -0,0 +1,33 @@
|
||||
nextflow.enable.dsl=2
|
||||
targetDir = params.rootDir + "/target/nextflow"
|
||||
|
||||
include { spatial_process_samples } from targetDir + "/workflows/multiomics/spatial_process_samples/main.nf"
|
||||
|
||||
params.resources_test = params.rootDir + "/resources_test"
|
||||
|
||||
workflow test_wf {
|
||||
|
||||
resources_test = file(params.resources_test)
|
||||
|
||||
output_ch = Channel.fromList([
|
||||
[
|
||||
id: "xenium",
|
||||
input: resources_test.resolve("xenium/xenium_tiny.h5mu"),
|
||||
publish_dir: "foo/",
|
||||
output: "test.h5mu",
|
||||
]
|
||||
])
|
||||
| map{ state -> [state.id, state] }
|
||||
| spatial_process_samples
|
||||
| view { output ->
|
||||
assert output.size() == 2 : "outputs should contain two elements; [id, file]"
|
||||
assert output[1].output.toString().endsWith("test.h5mu") : "Output file should be a h5mu file. Found: ${output[1].output}"
|
||||
"Output: $output"
|
||||
}
|
||||
| toSortedList()
|
||||
| map { output_list ->
|
||||
assert output_list.size() == 1 : "output channel should contain one event"
|
||||
assert output_list[0][0] == "merged" : "Output ID should be 'merged'"
|
||||
}
|
||||
|
||||
}
|
||||
174
src/workflows/qc/spatial_qc/config.vsh.yaml
Normal file
@@ -0,0 +1,174 @@
|
||||
name: "spatial_qc"
|
||||
namespace: "workflows/qc"
|
||||
scope: "public"
|
||||
description: "A pipeline to add basic qc statistics to a MuData containing spatial data."
|
||||
authors:
|
||||
- __merge__: /src/authors/dries_schaumont.yaml
|
||||
roles: [ author, maintainer ]
|
||||
- __merge__: /src/authors/dorien_roosen.yaml
|
||||
roles: [ contributor ]
|
||||
- __merge__: /src/authors/weiwei_schultz.yaml
|
||||
roles: [ contributor ]
|
||||
info:
|
||||
test_dependencies:
|
||||
- name: qc_test
|
||||
namespace: test_workflows/qc
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
- name: "--id"
|
||||
required: true
|
||||
type: string
|
||||
description: ID of the sample.
|
||||
example: foo
|
||||
- name: "--input"
|
||||
alternatives: [-i]
|
||||
description: Path to the sample.
|
||||
required: true
|
||||
example: input.h5mu
|
||||
type: file
|
||||
- name: "--modality"
|
||||
description: Which modality to process.
|
||||
type: string
|
||||
default: "rna"
|
||||
required: false
|
||||
- name: "--layer"
|
||||
description: "Use specified layer for calculation of qc metrics. If not specified, adata.X is used."
|
||||
type: string
|
||||
example: "raw_counts"
|
||||
required: false
|
||||
- name: "Mitochondrial & Ribosomal Gene Detection"
|
||||
arguments:
|
||||
- name: "--var_gene_names"
|
||||
required: false
|
||||
example: "gene_symbol"
|
||||
type: string
|
||||
description: |
|
||||
.var column name to be used to detect mitochondrial/ribosomal genes instead of .var_names (default if not set).
|
||||
Gene names matching with the regex value from --mitochondrial_gene_regex or --ribosomal_gene_regex will be
|
||||
identified as mitochondrial or ribosomal genes, respectively.
|
||||
- name: "--var_name_mitochondrial_genes"
|
||||
type: string
|
||||
required: false
|
||||
description: |
|
||||
In which .var slot to store a boolean array corresponding to the mitochondrial genes.
|
||||
- name: "--obs_name_mitochondrial_fraction"
|
||||
type: string
|
||||
required: false
|
||||
description: |
|
||||
.obs slot to store the fraction of reads found to be mitochondrial. Defaults to 'fraction_' followed by the value of --var_name_mitochondrial_genes.
|
||||
- name: --mitochondrial_gene_regex
|
||||
type: string
|
||||
description: |
|
||||
Regex string that identifies mitochondrial genes from --var_gene_names.
|
||||
By default will detect human and mouse mitochondrial genes from a gene symbol.
|
||||
required: false
|
||||
default: "^[mM][tT]-"
|
||||
- name: "--var_name_ribosomal_genes"
|
||||
type: string
|
||||
required: false
|
||||
description: |
|
||||
In which .var slot to store a boolean array corresponding to the ribosomal genes.
|
||||
- name: "--obs_name_ribosomal_fraction"
|
||||
type: string
|
||||
required: false
|
||||
description: |
|
||||
When specified, write the fraction of counts originating from ribosomal genes
|
||||
(based on --ribosomal_gene_regex) to an .obs column with the specified name.
|
||||
Requires --var_name_ribosomal_genes.
|
||||
- name: --ribosomal_gene_regex
|
||||
type: string
|
||||
description: |
|
||||
Regex string that identifies ribosomal genes from --var_gene_names.
|
||||
By default will detect human and mouse ribosomal genes from a gene symbol.
|
||||
required: false
|
||||
default: "^[Mm]?[Rr][Pp][LlSs]"
|
||||
- name: "QC metrics calculation options"
|
||||
arguments:
|
||||
- name: "--var_qc_metrics"
|
||||
description: |
|
||||
Keys to select a boolean (containing only True or False) column from .var.
|
||||
For each cell, calculate the proportion of total values for genes which are labeled 'True',
|
||||
compared to the total sum of the values for all genes. Defaults to the value from
|
||||
--var_name_mitochondrial_genes.
|
||||
type: string
|
||||
multiple: True
|
||||
multiple_sep: ','
|
||||
required: false
|
||||
example: "ercc,highly_variable"
|
||||
- name: "--top_n_vars"
|
||||
type: integer
|
||||
description: |
|
||||
Number of top vars to be used to calculate cumulative proportions.
|
||||
If not specified, proportions are not calculated. `--top_n_vars 20,50` finds
|
||||
cumulative proportion to the 20th and 50th most expressed vars.
|
||||
multiple: true
|
||||
multiple_sep: ','
|
||||
required: false
|
||||
default: [50, 100, 200, 500]
|
||||
- name: "--output_obs_num_nonzero_vars"
|
||||
description: |
|
||||
Name of column in .obs describing, for each observation, the number of stored values
|
||||
(including explicit zeroes). In other words, the name of the column that counts
|
||||
for each row the number of columns that contain data.
|
||||
type: string
|
||||
required: false
|
||||
default: "num_nonzero_vars"
|
||||
- name: "--output_obs_total_counts_vars"
|
||||
description: |
|
||||
Name of the column for .obs describing, for each observation (row),
|
||||
the sum of the stored values in the columns.
|
||||
type: string
|
||||
required: false
|
||||
default: total_counts
|
||||
- name: "--output_var_num_nonzero_obs"
|
||||
description: |
|
||||
Name of column describing, for each feature, the number of stored values
|
||||
(including explicit zeroes). In other words, the name of the column that counts
|
||||
for each column the number of rows that contain data.
|
||||
type: string
|
||||
required: false
|
||||
default: "num_nonzero_obs"
|
||||
- name: "--output_var_total_counts_obs"
|
||||
description: |
|
||||
Name of the column in .var describing, for each feature (column),
|
||||
the sum of the stored values in the rows.
|
||||
type: string
|
||||
required: false
|
||||
default: total_counts
|
||||
- name: "--output_var_obs_mean"
|
||||
type: string
|
||||
description: |
|
||||
Name of the column in .var providing, for each feature, the mean of its values across observations.
|
||||
default: "obs_mean"
|
||||
required: false
|
||||
- name: "--output_var_pct_dropout"
|
||||
type: string
|
||||
default: "pct_dropout"
|
||||
description: |
|
||||
Name of the column in .var providing, for each feature, the percentage of
|
||||
observations the feature does not appear on (i.e. is missing). Same as `--output_var_num_nonzero_obs`
|
||||
but percentage based.
|
||||
- name: "Outputs"
|
||||
arguments:
|
||||
- name: "--output"
|
||||
type: file
|
||||
required: true
|
||||
direction: output
|
||||
description: Destination path to the output.
|
||||
example: output.h5mu
|
||||
dependencies:
|
||||
- name: workflows/qc/qc
|
||||
alias: spatial_qc_workflow
|
||||
repository: openpipeline
|
||||
resources:
|
||||
- type: nextflow_script
|
||||
path: main.nf
|
||||
entrypoint: run_wf
|
||||
test_resources:
|
||||
- type: nextflow_script
|
||||
path: test.nf
|
||||
entrypoint: test_wf
|
||||
- path: /resources_test/xenium/xenium_tiny.h5mu
|
||||
runners:
|
||||
- type: nextflow
|
||||
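The --top_n_vars argument above asks for the cumulative proportion of counts held by the n most expressed vars of each observation. A minimal Python sketch of that calculation for a single observation follows. It is an illustration under stated assumptions, not the component's code: the function name `cumulative_top_n_proportions` and the example counts are hypothetical, and whether the real output is a fraction or a percentage is not stated in this config.

```python
def cumulative_top_n_proportions(counts, top_n_vars):
    """For one observation, map each n to the share of total counts in its n largest values."""
    total = sum(counts)
    ranked = sorted(counts, reverse=True)  # most expressed vars first
    proportions = {}
    for n in top_n_vars:
        if total == 0:
            proportions[n] = 0.0  # assumption: report 0 for an empty observation
        else:
            proportions[n] = sum(ranked[:n]) / total
    return proportions


# Hypothetical counts for one cell across six vars (illustration only).
cell_counts = [40, 30, 20, 10, 0, 0]
result = cumulative_top_n_proportions(cell_counts, [1, 2, 4])
```

With the default `[50, 100, 200, 500]`, any n at or above the number of vars with counts simply yields the full total, because the slice `ranked[:n]` stops at the end of the list.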
15
src/workflows/qc/spatial_qc/integration_test.sh
Normal file
@@ -0,0 +1,15 @@
|
||||
#!/bin/bash
|
||||
|
||||
# get the root of the directory
|
||||
REPO_ROOT=$(git rev-parse --show-toplevel)
|
||||
|
||||
# ensure that the command below is run from the root of the repository
|
||||
cd "$REPO_ROOT"
|
||||
|
||||
nextflow \
|
||||
run . \
|
||||
-main-script src/workflows/qc/spatial_qc/test.nf \
|
||||
-entry test_wf \
|
||||
-profile docker,no_publish \
|
||||
-c src/workflows/utils/labels_ci.config \
|
||||
-c src/workflows/utils/integration_tests.config
|
||||
38
src/workflows/qc/spatial_qc/main.nf
Normal file
@@ -0,0 +1,38 @@
|
||||
workflow run_wf {
|
||||
take:
|
||||
input_ch
|
||||
|
||||
main:
|
||||
output_ch = input_ch
|
||||
| spatial_qc_workflow.run(
|
||||
fromState: { id, state -> [
|
||||
"id": id,
|
||||
"input": state.input,
|
||||
"modality": state.modality,
|
||||
"layer": state.layer,
|
||||
"var_gene_names": state.var_gene_names,
|
||||
"var_name_mitochondrial_genes": state.var_name_mitochondrial_genes,
|
||||
"obs_name_mitochondrial_fraction": state.obs_name_mitochondrial_fraction,
|
||||
"mitochondrial_gene_regex": state.mitochondrial_gene_regex,
|
||||
"var_name_ribosomal_genes": state.var_name_ribosomal_genes,
|
||||
"obs_name_ribosomal_fraction": state.obs_name_ribosomal_fraction,
|
||||
"ribosomal_gene_regex": state.ribosomal_gene_regex,
|
||||
"var_qc_metrics": state.var_qc_metrics,
|
||||
"top_n_vars": state.top_n_vars,
|
||||
"output_obs_num_nonzero_vars": state.output_obs_num_nonzero_vars,
|
||||
"output_obs_total_counts_vars": state.output_obs_total_counts_vars,
|
||||
"output_var_num_nonzero_obs": state.output_var_num_nonzero_obs,
|
||||
"output_var_total_counts_obs": state.output_var_total_counts_obs,
|
||||
"output_var_obs_mean": state.output_var_obs_mean,
|
||||
"output_var_pct_dropout": state.output_var_pct_dropout
|
||||
]},
|
||||
toState: [
|
||||
"output": "output"
|
||||
]
|
||||
)
|
||||
|
||||
| setState(["output"])
|
||||
|
||||
emit:
|
||||
output_ch
|
||||
}
|
||||
10
src/workflows/qc/spatial_qc/nextflow.config
Normal file
@@ -0,0 +1,10 @@
|
||||
manifest {
|
||||
nextflowVersion = '!>=20.12.1-edge'
|
||||
}
|
||||
|
||||
params {
|
||||
rootDir = java.nio.file.Paths.get("$projectDir/../../../../").toAbsolutePath().normalize().toString()
|
||||
}
|
||||
|
||||
// include common settings
|
||||
includeConfig("${params.rootDir}/src/workflows/utils/labels.config")
|
||||
40
src/workflows/qc/spatial_qc/test.nf
Normal file
@@ -0,0 +1,40 @@
|
||||
nextflow.enable.dsl=2
|
||||
|
||||
include { spatial_qc } from params.rootDir + "/target/nextflow/workflows/qc/spatial_qc/main.nf"
|
||||
|
||||
params.resources_test = params.rootDir + "/resources_test"
|
||||
|
||||
workflow test_wf {
|
||||
|
||||
resources_test = file(params.resources_test)
|
||||
|
||||
output_ch =
|
||||
Channel.fromList([
|
||||
[
|
||||
id: "xenium_test",
|
||||
input: resources_test.resolve("xenium/xenium_tiny.h5mu"),
|
||||
var_name_mitochondrial_genes: "mitochondrial",
|
||||
var_name_ribosomal_genes: "ribosomal",
|
||||
]
|
||||
])
|
||||
| map { state -> [state.id, state] }
|
||||
| spatial_qc.run(
|
||||
toState: { id, output, state -> output + [og_input: state.input] }
|
||||
)
|
||||
|
||||
| view { output ->
|
||||
assert output.size() == 2 : "Outputs should contain two elements; [id, state]"
|
||||
|
||||
// check id
|
||||
def id = output[0]
|
||||
assert id.endsWith("_test")
|
||||
|
||||
// check output
|
||||
def state = output[1]
|
||||
assert state instanceof Map : "State should be a map. Found: ${state}"
|
||||
assert state.containsKey("output") : "Output should contain key 'output'."
|
||||
assert state.output.isFile() : "'output' should be a file."
|
||||
assert state.output.toString().endsWith(".h5mu") : "Output file should end with '.h5mu'. Found: ${state.output}"
|
||||
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,25 @@
|
||||
name: "spaceranger_mapping_test"
|
||||
namespace: "test_workflows/ingestion"
|
||||
scope: "test"
|
||||
description: "This component tests the output of the integration test of the spaceranger mapping workflow."
|
||||
authors:
|
||||
- __merge__: /src/authors/dorien_roosen.yaml
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
- name: "--input"
|
||||
type: file
|
||||
required: true
|
||||
description: Path to h5mu output.
|
||||
example: foo.final.h5mu
|
||||
resources:
|
||||
- type: python_script
|
||||
path: script.py
|
||||
- path: /src/utils/setup_logger.py
|
||||
engines:
|
||||
- type: docker
|
||||
image: python:3.12-slim
|
||||
__merge__: /src/base/requirements/testworkflows_setup.yaml
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
@@ -0,0 +1,32 @@
|
||||
from mudata import read_h5mu
|
||||
import sys
|
||||
import pytest
|
||||
|
||||
##VIASH START
|
||||
par = {"input": "input.h5mu"}
|
||||
|
||||
meta = {"resources_dir": "resources_test"}
|
||||
##VIASH END
|
||||
|
||||
|
||||
def test_run():
|
||||
input_mudata = read_h5mu(par["input"])
|
||||
expected_var_columns = ["gene_symbol", "feature_types", "genome"]
|
||||
|
||||
assert list(input_mudata.mod.keys()) == ["rna"], (
|
||||
"Input should contain rna modality."
|
||||
)
|
||||
assert list(input_mudata.var.columns) == expected_var_columns, (
|
||||
f"Input var columns should be: {expected_var_columns}."
|
||||
)
|
||||
assert list(input_mudata.mod["rna"].var.columns) == expected_var_columns, (
|
||||
f"Input mod['rna'] var columns should be: {expected_var_columns}."
|
||||
)
|
||||
|
||||
assert list(input_mudata.mod["rna"].obsm.keys()) == ["spatial"], (
|
||||
"Input mod['rna'] obsm should contain spatial column."
|
||||
)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(pytest.main([__file__, "--import-mode=importlib"]))
|
||||
36
src/workflows/utils/integration_tests.config
Normal file
@@ -0,0 +1,36 @@
|
||||
profiles {
|
||||
|
||||
// detect tempdir
|
||||
tempDir = java.nio.file.Paths.get(
|
||||
System.getenv('NXF_TEMP') ?:
|
||||
System.getenv('VIASH_TEMP') ?:
|
||||
System.getenv('TEMPDIR') ?:
|
||||
System.getenv('TMPDIR') ?:
|
||||
'/tmp'
|
||||
).toAbsolutePath()
|
||||
|
||||
mount_temp {
|
||||
docker.temp = tempDir
|
||||
podman.temp = tempDir
|
||||
charliecloud.temp = tempDir
|
||||
}
|
||||
|
||||
no_publish {
|
||||
process {
|
||||
withName: '.*' {
|
||||
publishDir = [
|
||||
enabled: false
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
docker {
|
||||
docker.enabled = true
|
||||
// docker.userEmulation = true
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
}
|
||||
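The `tempDir` detection in the config above walks a fixed list of environment variables and falls back to `/tmp`. The same fallback chain can be sketched in Python as follows. This is only an illustration: the function name `detect_temp_dir` is hypothetical, and it relies on Python's `or` skipping unset and empty values in the same way as the Groovy `?:` operator used in the config.

```python
import os
from pathlib import Path


def detect_temp_dir(env):
    """Return the first non-empty temp directory candidate as an absolute path."""
    candidate = (
        env.get("NXF_TEMP")
        or env.get("VIASH_TEMP")
        or env.get("TEMPDIR")
        or env.get("TMPDIR")
        or "/tmp"  # final fallback, as in the config
    )
    return Path(candidate).absolute()


# Resolve against the real environment of the current process.
current_temp_dir = detect_temp_dir(os.environ)
```

Passing the environment as an argument rather than reading `os.environ` inside the function keeps the lookup order easy to check with a plain dictionary.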
68
src/workflows/utils/labels.config
Normal file
@@ -0,0 +1,68 @@
|
||||
process {
|
||||
// Default resources for components that hardly do any processing
|
||||
memory = { 2.GB * task.attempt }
|
||||
cpus = 1
|
||||
|
||||
// Retry for exit codes that have something to do with memory issues
|
||||
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
|
||||
maxRetries = 3
|
||||
maxMemory = null
|
||||
|
||||
// CPU resources
|
||||
withLabel: singlecpu { cpus = 1 }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midcpu { cpus = 10 }
|
||||
withLabel: highcpu { cpus = 20 }
|
||||
|
||||
// Memory resources
|
||||
withLabel: lowmem { memory = { get_memory( 50.GB * task.attempt ) } }
|
||||
withLabel: midmem { memory = { get_memory( 50.GB * task.attempt ) } }
|
||||
withLabel: highmem { memory = { get_memory( 50.GB * task.attempt ) } }
|
||||
withLabel: veryhighmem { memory = { get_memory( 75.GB * task.attempt ) } }
|
||||
|
||||
// Disk space
|
||||
// Nextflow apparently can't handle empty directives, i.e.
|
||||
// withLabel: lowdisk {}
|
||||
// so for that reason we have to add a dummy directive
|
||||
withLabel: lowdisk {
|
||||
dummyDirective = "dummyValue"
|
||||
}
|
||||
withLabel: middisk {
|
||||
dummyDirective = "dummyValue"
|
||||
}
|
||||
withLabel: highdisk {
|
||||
dummyDirective = "dummyValue"
|
||||
}
|
||||
withLabel: veryhighdisk {
|
||||
dummyDirective = "dummyValue"
|
||||
}
|
||||
// NOTE: The above labels intentionally do not have an effect by default.
|
||||
// The user should set the disk space requirements by adding the following
|
||||
// to the compute environment:
|
||||
//
|
||||
// withLabel: lowdisk { disk = { 20.GB * task.attempt } }
|
||||
// withLabel: middisk { disk = { 100.GB * task.attempt } }
|
||||
// withLabel: highdisk { disk = { 200.GB * task.attempt } }
|
||||
// withLabel: veryhighdisk { disk = { 500.GB * task.attempt } }
|
||||
}
|
||||
|
||||
def get_memory(to_compare) {
|
||||
if (!process.containsKey("maxMemory") || !process.maxMemory) {
|
||||
return to_compare
|
||||
}
|
||||
|
||||
try {
|
||||
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
|
||||
return process.maxMemory
|
||||
}
|
||||
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
|
||||
return process.maxMemory as nextflow.util.MemoryUnit
|
||||
}
|
||||
else {
|
||||
return to_compare
|
||||
}
|
||||
} catch (all) {
|
||||
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
|
||||
System.exit(1)
|
||||
}
|
||||
}
|
||||
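The decision logic of `get_memory` above can be followed more easily outside of Nextflow. The Python sketch below mirrors its branches using plain numbers in GB. It is an approximation under stated assumptions: the parameter names are hypothetical, the over-limit branch is assumed to be meant to return `process.maxMemory`, and the 25 GB cap and 3 retries in the example are illustrative values rather than settings taken from this file's defaults.

```python
def get_memory(requested_gb, attempt, max_memory_gb=None, max_retries=None):
    """Mirror of the config helper: cap a memory request at an optional maximum."""
    # No cap configured: the request is used unchanged.
    if not max_memory_gb:
        return requested_gb
    # Last allowed attempt: go straight to the maximum.
    if max_retries and attempt == max_retries:
        return max_memory_gb
    # Request exceeds the cap: clamp it (assumed intent of the original branch).
    if requested_gb > max_memory_gb:
        return max_memory_gb
    return requested_gb


# Illustrative values: a 20 GB label that scales with the attempt number,
# capped at 25 GB with at most 3 retries.
schedule = [
    get_memory(20 * attempt, attempt, max_memory_gb=25, max_retries=3)
    for attempt in (1, 2, 3)
]
```

Without a cap, as with `maxMemory = null` in the process block above, the request simply grows with `task.attempt`.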
105
src/workflows/utils/labels_ci.config
Normal file
@@ -0,0 +1,105 @@
|
||||
process {
|
||||
withLabel: lowmem { memory = 13.Gb }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midmem { memory = 13.Gb }
|
||||
withLabel: midcpu { cpus = 4 }
|
||||
withLabel: highmem { memory = 13.Gb }
|
||||
withLabel: highcpu { cpus = 4 }
|
||||
withLabel: veryhighmem { memory = 13.Gb }
|
||||
withLabel: lowdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: middisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: highdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
withLabel: veryhighdisk {
|
||||
disk = {process.disk ? process.disk : null}
|
||||
}
|
||||
}
|
||||
|
||||
env.NUMBA_CACHE_DIR = '/tmp'
|
||||
|
||||
trace {
|
||||
enabled = true
|
||||
overwrite = true
|
||||
}
|
||||
dag {
|
||||
overwrite = true
|
||||
}
|
||||
|
||||
process.maxForks = 1
|
||||
|
||||
profiles {
|
||||
// detect tempdir
|
||||
tempDir = java.nio.file.Paths.get(
|
||||
System.getenv('NXF_TEMP') ?:
|
||||
System.getenv('VIASH_TEMP') ?:
|
||||
System.getenv('TEMPDIR') ?:
|
||||
System.getenv('TMPDIR') ?:
|
||||
'/tmp'
|
||||
).toAbsolutePath()
|
||||
|
||||
mount_temp {
|
||||
docker.temp = tempDir
|
||||
podman.temp = tempDir
|
||||
charliecloud.temp = tempDir
|
||||
}
|
||||
|
||||
no_publish {
|
||||
process {
|
||||
withName: '.*' {
|
||||
publishDir = [
|
||||
enabled: false
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
docker {
|
||||
docker.fixOwnership = true
|
||||
docker.enabled = true
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
|
||||
local {
|
||||
// This config is for local processing.
|
||||
process {
|
||||
maxMemory = 25.GB
|
||||
withLabel: verylowcpu { cpus = 2 }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midcpu { cpus = 6 }
|
||||
withLabel: highcpu { cpus = 12 }
|
||||
|
||||
withLabel: lowmem { memory = { get_memory( 8.GB * task.attempt ) } }
|
||||
withLabel: midmem { memory = { get_memory( 12.GB * task.attempt ) } }
|
||||
withLabel: highmem { memory = { get_memory( 20.GB * task.attempt ) } }
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
def get_memory(to_compare) {
|
||||
if (!process.containsKey("maxMemory") || !process.maxMemory) {
|
||||
return to_compare
|
||||
}
|
||||
|
||||
try {
|
||||
if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
|
||||
return process.maxMemory
|
||||
}
|
||||
else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
|
||||
return process.maxMemory as nextflow.util.MemoryUnit
|
||||
}
|
||||
else {
|
||||
return to_compare
|
||||
}
|
||||
} catch (all) {
|
||||
println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
|
||||
System.exit(1)
|
||||
}
|
||||
}
|
||||
0
target/.build.yaml
Normal file
258
target/_private/executable/filter/subset_cosmx/.config.vsh.yaml
Normal file
@@ -0,0 +1,258 @@
|
||||
name: "subset_cosmx"
|
||||
namespace: "filter"
|
||||
version: "fix-unit-tests"
|
||||
authors:
|
||||
- name: "Dorien Roosen"
|
||||
roles:
|
||||
- "maintainer"
|
||||
info:
|
||||
role: "Core Team Member"
|
||||
links:
|
||||
email: "dorien@data-intuitive.com"
|
||||
github: "dorien-er"
|
||||
linkedin: "dorien-roosen"
|
||||
organizations:
|
||||
- name: "Data Intuitive"
|
||||
href: "https://www.data-intuitive.com"
|
||||
role: "Data Scientist"
|
||||
- name: "Weiwei Schultz"
|
||||
roles:
|
||||
- "contributor"
|
||||
info:
|
||||
role: "Contributor"
|
||||
organizations:
|
||||
- name: "Janssen R&D US"
|
||||
role: "Associate Director Data Sciences"
|
||||
argument_groups:
|
||||
- name: "Arguments"
|
||||
arguments:
|
||||
- type: "file"
|
||||
name: "--input"
|
||||
alternatives:
|
||||
- "-i"
|
||||
description: "Input folder. Must contain the output from a NanoString CosMx run."
|
||||
info: null
|
||||
example:
|
||||
- "cosmx_data"
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "integer"
|
||||
name: "--num_fovs"
|
||||
description: "Number of fields of view to keep. Will keep only the first <num_fovs>\
|
||||
\ fields of view."
|
||||
info: null
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "boolean"
|
||||
name: "--subset_transcripts_file"
|
||||
description: "Whether to subset the <dataset_id>_tx_file.csv file."
|
||||
info: null
|
||||
default:
|
||||
- true
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "boolean"
|
||||
name: "--subset_polygons_file"
|
||||
description: "Whether to subset the <dataset_id>_polygons.csv file."
|
||||
info: null
|
||||
default:
|
||||
- true
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "file"
|
||||
name: "--output"
|
||||
alternatives:
|
||||
- "-o"
|
||||
description: "The directory where the subset data will be stored."
|
||||
info: null
|
||||
example:
|
||||
- "cosmx_data_tiny"
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: false
|
||||
direction: "output"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
resources:
|
||||
- type: "python_script"
|
||||
path: "script.py"
|
||||
is_executable: true
|
||||
- type: "file"
|
||||
path: "setup_logger.py"
|
||||
- type: "file"
|
||||
path: "nextflow_labels.config"
|
||||
dest: "nextflow_labels.config"
|
||||
description: "Filters the output from a NanoString experiment to keep only a subset\
|
||||
\ of the fields of view.\nExpected input folder structure:\npath/to/dataset/\n \
|
||||
\ ├── CellComposite/\n ├── CellLabels/\n ├── CellOverlay/\n ├── CompartmentLabels/\n\
|
||||
\ ├── <dataset_id>_exprMat_file.csv\n ├── <dataset_id>_fov_positions_file.csv\n\
|
||||
\ ├── <dataset_id>_metadata_file.csv\n └── <dataset_id>_tx_file.csv \n"
|
||||
test_resources:
|
||||
- type: "python_script"
|
||||
path: "test.py"
|
||||
is_executable: true
|
||||
- type: "file"
|
||||
path: "Lung5_Rep2_tiny"
|
||||
info: null
|
||||
status: "enabled"
|
||||
scope:
|
||||
image: "private"
|
||||
target: "private"
|
||||
repositories:
|
||||
- type: "vsh"
|
||||
name: "openpipeline"
|
||||
repo: "openpipeline"
|
||||
tag: "v3.0.0"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline_spatial"
|
||||
docker_registry: "ghcr.io"
|
||||
runners:
|
||||
- type: "executable"
|
||||
id: "executable"
|
||||
docker_setup_strategy: "ifneedbepullelsecachedbuild"
|
||||
- type: "nextflow"
|
||||
id: "nextflow"
|
||||
directives:
|
||||
label:
|
||||
- "lowmem"
|
||||
- "singlecpu"
|
||||
tag: "$id"
|
||||
auto:
|
||||
simplifyInput: true
|
||||
simplifyOutput: false
|
||||
transcript: false
|
||||
publish: false
|
||||
config:
|
||||
labels:
|
||||
mem1gb: "memory = 1000000000.B"
|
||||
mem2gb: "memory = 2000000000.B"
|
||||
mem5gb: "memory = 5000000000.B"
|
||||
mem10gb: "memory = 10000000000.B"
|
||||
mem20gb: "memory = 20000000000.B"
|
||||
mem50gb: "memory = 50000000000.B"
|
||||
mem100gb: "memory = 100000000000.B"
|
||||
mem200gb: "memory = 200000000000.B"
|
||||
mem500gb: "memory = 500000000000.B"
|
||||
mem1tb: "memory = 1000000000000.B"
|
||||
mem2tb: "memory = 2000000000000.B"
|
||||
mem5tb: "memory = 5000000000000.B"
|
||||
mem10tb: "memory = 10000000000000.B"
|
||||
mem20tb: "memory = 20000000000000.B"
|
||||
mem50tb: "memory = 50000000000000.B"
|
||||
mem100tb: "memory = 100000000000000.B"
|
||||
mem200tb: "memory = 200000000000000.B"
|
||||
mem500tb: "memory = 500000000000000.B"
|
||||
mem1gib: "memory = 1073741824.B"
|
||||
mem2gib: "memory = 2147483648.B"
|
||||
mem4gib: "memory = 4294967296.B"
|
||||
mem8gib: "memory = 8589934592.B"
|
||||
mem16gib: "memory = 17179869184.B"
|
||||
mem32gib: "memory = 34359738368.B"
|
||||
mem64gib: "memory = 68719476736.B"
|
||||
mem128gib: "memory = 137438953472.B"
|
||||
mem256gib: "memory = 274877906944.B"
|
||||
mem512gib: "memory = 549755813888.B"
|
||||
mem1tib: "memory = 1099511627776.B"
|
||||
mem2tib: "memory = 2199023255552.B"
|
||||
mem4tib: "memory = 4398046511104.B"
|
||||
mem8tib: "memory = 8796093022208.B"
|
||||
mem16tib: "memory = 17592186044416.B"
|
||||
mem32tib: "memory = 35184372088832.B"
|
||||
mem64tib: "memory = 70368744177664.B"
|
||||
mem128tib: "memory = 140737488355328.B"
|
||||
mem256tib: "memory = 281474976710656.B"
|
||||
mem512tib: "memory = 562949953421312.B"
|
||||
cpu1: "cpus = 1"
|
||||
cpu2: "cpus = 2"
|
||||
cpu5: "cpus = 5"
|
||||
cpu10: "cpus = 10"
|
||||
cpu20: "cpus = 20"
|
||||
cpu50: "cpus = 50"
|
||||
cpu100: "cpus = 100"
|
||||
cpu200: "cpus = 200"
|
||||
cpu500: "cpus = 500"
|
||||
cpu1000: "cpus = 1000"
|
||||
script:
|
||||
- "includeConfig(\"nextflow_labels.config\")"
|
||||
debug: false
|
||||
container: "docker"
|
||||
engines:
|
||||
- type: "docker"
|
||||
id: "docker"
|
||||
image: "python:3.12-slim"
|
||||
target_registry: "images.viash-hub.com"
|
||||
target_tag: "fix-unit-tests"
|
||||
namespace_separator: "/"
|
||||
setup:
|
||||
- type: "apt"
|
||||
packages:
|
||||
- "procps"
|
||||
interactive: false
|
||||
- type: "python"
|
||||
user: false
|
||||
packages:
|
||||
- "scanpy~=1.10.4"
|
||||
- "squidpy~=1.7.0"
|
||||
upgrade: true
|
||||
test_setup:
|
||||
- type: "apt"
|
||||
packages:
|
||||
- "git"
|
||||
interactive: false
|
||||
- type: "python"
|
||||
user: false
|
||||
packages:
|
||||
- "viashpy==0.9.0"
|
||||
github:
|
||||
- "openpipelines-bio/core#subdirectory=packages/python/openpipeline_testutils"
|
||||
upgrade: true
|
||||
entrypoint: []
|
||||
cmd: null
|
||||
- type: "native"
|
||||
id: "native"
|
||||
build_info:
|
||||
config: "src/filter/subset_cosmx/config.vsh.yaml"
|
||||
runner: "executable"
|
||||
engine: "docker|native"
|
||||
output: "target/_private/executable/filter/subset_cosmx"
|
||||
executable: "target/_private/executable/filter/subset_cosmx/subset_cosmx"
|
||||
viash_version: "0.9.4"
|
||||
git_commit: "1bcdc62a71a2599e892078235df32ea3f72fec39"
|
||||
git_remote: "https://github.com/openpipelines-bio/openpipeline_spatial"
|
||||
package_config:
|
||||
name: "openpipeline_spatial"
|
||||
version: "fix-unit-tests"
|
||||
info:
|
||||
test_resources:
|
||||
- type: "s3"
|
||||
path: "s3://openpipelines-bio/openpipeline_spatial/resources_test"
|
||||
dest: "resources_test"
|
||||
repositories:
|
||||
- type: "vsh"
|
||||
name: "openpipeline"
|
||||
repo: "openpipeline"
|
||||
tag: "v3.0.0"
|
||||
viash_version: "0.9.4"
|
||||
source: "src"
|
||||
target: "target"
|
||||
config_mods:
|
||||
- ".resources += {path: '/src/workflows/utils/labels.config', dest: 'nextflow_labels.config'}\n\
|
||||
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
|
||||
)'"
|
||||
- ".engines += { type: \"native\" }"
|
||||
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
|
||||
- ".engines[.type == 'docker'].target_tag := 'fix-unit-tests'"
|
||||
organization: "vsh"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline_spatial"
|
||||
docker_registry: "ghcr.io"
|
||||
@@ -0,0 +1,68 @@
|
||||
process {
|
||||
// Default resources for components that hardly do any processing
|
||||
memory = { 2.GB * task.attempt }
|
||||
cpus = 1
|
||||
|
||||
// Retry for exit codes that have something to do with memory issues
|
||||
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
|
||||
maxRetries = 3
|
||||
maxMemory = null
|
||||
|
||||
// CPU resources
|
||||
withLabel: singlecpu { cpus = 1 }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midcpu { cpus = 10 }
|
||||
withLabel: highcpu { cpus = 20 }
|
||||
|
||||
// Memory resources
|
||||
withLabel: lowmem { memory = { get_memory( 50.GB * task.attempt ) } }
|
||||
withLabel: midmem { memory = { get_memory( 50.GB * task.attempt ) } }
|
||||
withLabel: highmem { memory = { get_memory( 50.GB * task.attempt ) } }
|
||||
withLabel: veryhighmem { memory = { get_memory( 75.GB * task.attempt ) } }
|
||||
|
||||
// Disk space
|
||||
// Nextflow apparently can't handle empty directives, i.e.
|
||||
// withLabel: lowdisk {}
|
||||
// so for that reason we have to add a dummy directive
|
||||
withLabel: lowdisk {
|
||||
dummyDirective = "dummyValue"
|
||||
}
|
||||
withLabel: middisk {
|
||||
dummyDirective = "dummyValue"
|
||||
}
|
||||
withLabel: highdisk {
|
||||
dummyDirective = "dummyValue"
|
||||
}
|
||||
withLabel: veryhighdisk {
|
||||
dummyDirective = "dummyValue"
|
||||
}
|
||||
// NOTE: The above labels intentionally do not have an effect by default.
|
||||
// The user should set the disk space requirements by adding the following
|
||||
// to the compute environment:
|
||||
//
|
||||
// withLabel: lowdisk { disk = { 20.GB * task.attempt } }
|
||||
// withLabel: middisk { disk = { 100.GB * task.attempt } }
|
||||
// withLabel: highdisk { disk = { 200.GB * task.attempt } }
|
||||
// withLabel: veryhighdisk { disk = { 500.GB * task.attempt } }
|
||||
}
|
||||
|
||||
def get_memory(to_compare) {
  if (!process.containsKey("maxMemory") || !process.maxMemory) {
    return to_compare
  }

  try {
    if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
      return process.maxMemory
    }
    else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
      // Cap the request at the configured maximum.
      return process.maxMemory as nextflow.util.MemoryUnit
    }
    else {
      return to_compare
    }
  } catch (all) {
    println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
    System.exit(1)
  }
}
|
||||
@@ -0,0 +1,12 @@
|
||||
def setup_logger():
    import logging
    from sys import stdout

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    console_handler = logging.StreamHandler(stdout)
    logFormatter = logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
    console_handler.setFormatter(logFormatter)
    logger.addHandler(console_handler)

    return logger
|
||||
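A minimal usage sketch for the `setup_logger` helper, in Python. The helper body is restated so the block is self-contained, and the log message is hypothetical. One property worth noting: the helper configures the root logger (`logging.getLogger()` with no name), so calling it more than once attaches an additional handler each time.

```python
import logging
from sys import stdout


def setup_logger():
    """Mirror of the helper in the diff: INFO-level root logger writing to stdout."""
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    console_handler = logging.StreamHandler(stdout)
    console_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
    )
    logger.addHandler(console_handler)
    return logger


if __name__ == "__main__":
    logger = setup_logger()
    # Hypothetical message; the real script's log lines are not shown in this diff.
    logger.info("Subsetting to the first %d fields of view", 2)
    # A second call returns the same root logger but attaches one more handler,
    # so each record would then be emitted once per attached handler.
    again = setup_logger()
    print(again is logger, len(logger.handlers))
```

Calling the helper once per process, near the top of the script, avoids duplicated log lines.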
1274
target/_private/executable/filter/subset_cosmx/subset_cosmx
Executable file
File diff suppressed because it is too large
258
target/_private/nextflow/filter/subset_cosmx/.config.vsh.yaml
Normal file
@@ -0,0 +1,258 @@
|
||||
name: "subset_cosmx"
|
||||
namespace: "filter"
|
||||
version: "fix-unit-tests"
|
||||
authors:
|
||||
- name: "Dorien Roosen"
|
||||
roles:
|
||||
- "maintainer"
|
||||
info:
|
||||
role: "Core Team Member"
|
||||
links:
|
||||
email: "dorien@data-intuitive.com"
|
||||
github: "dorien-er"
|
||||
linkedin: "dorien-roosen"
|
||||
organizations:
|
||||
- name: "Data Intuitive"
|
||||
href: "https://www.data-intuitive.com"
|
||||
role: "Data Scientist"
|
||||
- name: "Weiwei Schultz"
|
||||
roles:
|
||||
- "contributor"
|
||||
info:
|
||||
role: "Contributor"
|
||||
organizations:
|
||||
- name: "Janssen R&D US"
|
||||
role: "Associate Director Data Sciences"
|
||||
argument_groups:
|
||||
- name: "Arguments"
|
||||
arguments:
|
||||
- type: "file"
|
||||
name: "--input"
|
||||
alternatives:
|
||||
- "-i"
|
||||
description: "Input folder. Must contain the output from a NanoString CosMx run."
|
||||
info: null
|
||||
example:
|
||||
- "cosmx_data"
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "integer"
|
||||
name: "--num_fovs"
|
||||
description: "Number of fields of view to keep. Will keep only the first <num_fovs>\
|
||||
\ fields of view."
|
||||
info: null
|
||||
required: true
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "boolean"
|
||||
name: "--subset_transcripts_file"
|
||||
description: "Whether to subset the <dataset_id>_tx_file.csv file."
|
||||
info: null
|
||||
default:
|
||||
- true
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "boolean"
|
||||
name: "--subset_polygons_file"
|
||||
description: "Whether to subset the <dataset_id>_polygons.csv file."
|
||||
info: null
|
||||
default:
|
||||
- true
|
||||
required: false
|
||||
direction: "input"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
- type: "file"
|
||||
name: "--output"
|
||||
alternatives:
|
||||
- "-o"
|
||||
description: "The directory where the subset data will be stored."
|
||||
info: null
|
||||
example:
|
||||
- "cosmx_data_tiny"
|
||||
must_exist: true
|
||||
create_parent: true
|
||||
required: false
|
||||
direction: "output"
|
||||
multiple: false
|
||||
multiple_sep: ";"
|
||||
resources:
|
||||
- type: "python_script"
|
||||
path: "script.py"
|
||||
is_executable: true
|
||||
- type: "file"
|
||||
path: "setup_logger.py"
|
||||
- type: "file"
|
||||
path: "nextflow_labels.config"
|
||||
dest: "nextflow_labels.config"
|
||||
description: "Filters the output from a NanoString experiment to keep only a subset\
|
||||
\ of the fields of view.\nExpected input folder structure:\npath/to/dataset/\n \
|
||||
\ ├── CellComposite/\n ├── CellLabels/\n ├── CellOverlay/\n ├── CompartmentLabels/\n\
|
||||
\ ├── <dataset_id>_exprMat_file.csv\n ├── <dataset_id>_fov_positions_file.csv\n\
|
||||
\ ├── <dataset_id>_metadata_file.csv\n └── <dataset_id>_tx_file.csv \n"
|
||||
test_resources:
|
||||
- type: "python_script"
|
||||
path: "test.py"
|
||||
is_executable: true
|
||||
- type: "file"
|
||||
path: "Lung5_Rep2_tiny"
|
||||
info: null
|
||||
status: "enabled"
|
||||
scope:
|
||||
image: "private"
|
||||
target: "private"
|
||||
repositories:
|
||||
- type: "vsh"
|
||||
name: "openpipeline"
|
||||
repo: "openpipeline"
|
||||
tag: "v3.0.0"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline_spatial"
|
||||
docker_registry: "ghcr.io"
|
||||
runners:
|
||||
- type: "executable"
|
||||
id: "executable"
|
||||
docker_setup_strategy: "ifneedbepullelsecachedbuild"
|
||||
- type: "nextflow"
|
||||
id: "nextflow"
|
||||
directives:
|
||||
label:
|
||||
- "lowmem"
|
||||
- "singlecpu"
|
||||
tag: "$id"
|
||||
auto:
|
||||
simplifyInput: true
|
||||
simplifyOutput: false
|
||||
transcript: false
|
||||
publish: false
|
||||
config:
|
||||
labels:
|
||||
mem1gb: "memory = 1000000000.B"
|
||||
mem2gb: "memory = 2000000000.B"
|
||||
mem5gb: "memory = 5000000000.B"
|
||||
mem10gb: "memory = 10000000000.B"
|
||||
mem20gb: "memory = 20000000000.B"
|
||||
mem50gb: "memory = 50000000000.B"
|
||||
mem100gb: "memory = 100000000000.B"
|
||||
mem200gb: "memory = 200000000000.B"
|
||||
mem500gb: "memory = 500000000000.B"
|
||||
mem1tb: "memory = 1000000000000.B"
|
||||
mem2tb: "memory = 2000000000000.B"
|
||||
mem5tb: "memory = 5000000000000.B"
|
||||
mem10tb: "memory = 10000000000000.B"
|
||||
mem20tb: "memory = 20000000000000.B"
|
||||
mem50tb: "memory = 50000000000000.B"
|
||||
mem100tb: "memory = 100000000000000.B"
|
||||
mem200tb: "memory = 200000000000000.B"
|
||||
mem500tb: "memory = 500000000000000.B"
|
||||
mem1gib: "memory = 1073741824.B"
|
||||
mem2gib: "memory = 2147483648.B"
|
||||
mem4gib: "memory = 4294967296.B"
|
||||
mem8gib: "memory = 8589934592.B"
|
||||
mem16gib: "memory = 17179869184.B"
|
||||
mem32gib: "memory = 34359738368.B"
|
||||
mem64gib: "memory = 68719476736.B"
|
||||
mem128gib: "memory = 137438953472.B"
|
||||
mem256gib: "memory = 274877906944.B"
|
||||
mem512gib: "memory = 549755813888.B"
|
||||
mem1tib: "memory = 1099511627776.B"
|
||||
mem2tib: "memory = 2199023255552.B"
|
||||
mem4tib: "memory = 4398046511104.B"
|
||||
mem8tib: "memory = 8796093022208.B"
|
||||
mem16tib: "memory = 17592186044416.B"
|
||||
mem32tib: "memory = 35184372088832.B"
|
||||
mem64tib: "memory = 70368744177664.B"
|
||||
mem128tib: "memory = 140737488355328.B"
|
||||
mem256tib: "memory = 281474976710656.B"
|
||||
mem512tib: "memory = 562949953421312.B"
|
||||
cpu1: "cpus = 1"
|
||||
cpu2: "cpus = 2"
|
||||
cpu5: "cpus = 5"
|
||||
cpu10: "cpus = 10"
|
||||
cpu20: "cpus = 20"
|
||||
cpu50: "cpus = 50"
|
||||
cpu100: "cpus = 100"
|
||||
cpu200: "cpus = 200"
|
||||
cpu500: "cpus = 500"
|
||||
cpu1000: "cpus = 1000"
|
||||
script:
|
||||
- "includeConfig(\"nextflow_labels.config\")"
|
||||
debug: false
|
||||
container: "docker"
|
||||
engines:
|
||||
- type: "docker"
|
||||
id: "docker"
|
||||
image: "python:3.12-slim"
|
||||
target_registry: "images.viash-hub.com"
|
||||
target_tag: "fix-unit-tests"
|
||||
namespace_separator: "/"
|
||||
setup:
|
||||
- type: "apt"
|
||||
packages:
|
||||
- "procps"
|
||||
interactive: false
|
||||
- type: "python"
|
||||
user: false
|
||||
packages:
|
||||
- "scanpy~=1.10.4"
|
||||
- "squidpy~=1.7.0"
|
||||
upgrade: true
|
||||
test_setup:
|
||||
- type: "apt"
|
||||
packages:
|
||||
- "git"
|
||||
interactive: false
|
||||
- type: "python"
|
||||
user: false
|
||||
packages:
|
||||
- "viashpy==0.9.0"
|
||||
github:
|
||||
- "openpipelines-bio/core#subdirectory=packages/python/openpipeline_testutils"
|
||||
upgrade: true
|
||||
entrypoint: []
|
||||
cmd: null
|
||||
- type: "native"
|
||||
id: "native"
|
||||
build_info:
|
||||
config: "src/filter/subset_cosmx/config.vsh.yaml"
|
||||
runner: "nextflow"
|
||||
engine: "docker|native"
|
||||
output: "target/_private/nextflow/filter/subset_cosmx"
|
||||
executable: "target/_private/nextflow/filter/subset_cosmx/main.nf"
|
||||
viash_version: "0.9.4"
|
||||
git_commit: "1bcdc62a71a2599e892078235df32ea3f72fec39"
|
||||
git_remote: "https://github.com/openpipelines-bio/openpipeline_spatial"
|
||||
package_config:
|
||||
name: "openpipeline_spatial"
|
||||
version: "fix-unit-tests"
|
||||
info:
|
||||
test_resources:
|
||||
- type: "s3"
|
||||
path: "s3://openpipelines-bio/openpipeline_spatial/resources_test"
|
||||
dest: "resources_test"
|
||||
repositories:
|
||||
- type: "vsh"
|
||||
name: "openpipeline"
|
||||
repo: "openpipeline"
|
||||
tag: "v3.0.0"
|
||||
viash_version: "0.9.4"
|
||||
source: "src"
|
||||
target: "target"
|
||||
config_mods:
|
||||
- ".resources += {path: '/src/workflows/utils/labels.config', dest: 'nextflow_labels.config'}\n\
|
||||
.runners[.type == 'nextflow'].config.script := 'includeConfig(\"nextflow_labels.config\"\
|
||||
)'"
|
||||
- ".engines += { type: \"native\" }"
|
||||
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
|
||||
- ".engines[.type == 'docker'].target_tag := 'fix-unit-tests'"
|
||||
organization: "vsh"
|
||||
links:
|
||||
repository: "https://github.com/openpipelines-bio/openpipeline_spatial"
|
||||
docker_registry: "ghcr.io"
|
||||
3984
target/_private/nextflow/filter/subset_cosmx/main.nf
Normal file
File diff suppressed because it is too large
126
target/_private/nextflow/filter/subset_cosmx/nextflow.config
Normal file
@@ -0,0 +1,126 @@
|
||||
manifest {
|
||||
name = 'filter/subset_cosmx'
|
||||
mainScript = 'main.nf'
|
||||
nextflowVersion = '!>=20.12.1-edge'
|
||||
version = 'fix-unit-tests'
|
||||
description = 'Filters the output from a NanoString experiment to keep only a subset of the fields of view.\nExpected input folder structure:\npath/to/dataset/\n ├── CellComposite/\n ├── CellLabels/\n ├── CellOverlay/\n ├── CompartmentLabels/\n ├── <dataset_id>_exprMat_file.csv\n ├── <dataset_id>_fov_positions_file.csv\n ├── <dataset_id>_metadata_file.csv\n └── <dataset_id>_tx_file.csv \n'
|
||||
author = 'Dorien Roosen, Weiwei Schultz'
|
||||
}
|
||||
|
||||
process.container = 'nextflow/bash:latest'
|
||||
|
||||
// detect tempdir
|
||||
tempDir = java.nio.file.Paths.get(
|
||||
System.getenv('NXF_TEMP') ?:
|
||||
System.getenv('VIASH_TEMP') ?:
|
||||
System.getenv('TEMPDIR') ?:
|
||||
System.getenv('TMPDIR') ?:
|
||||
'/tmp'
|
||||
).toAbsolutePath()
|
||||
|
||||
profiles {
|
||||
no_publish {
|
||||
process {
|
||||
withName: '.*' {
|
||||
publishDir = [
|
||||
enabled: false
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
mount_temp {
|
||||
docker.temp = tempDir
|
||||
podman.temp = tempDir
|
||||
charliecloud.temp = tempDir
|
||||
}
|
||||
docker {
|
||||
docker.enabled = true
|
||||
// docker.userEmulation = true
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
singularity {
|
||||
singularity.enabled = true
|
||||
singularity.autoMounts = true
|
||||
docker.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
podman {
|
||||
podman.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
shifter.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
shifter {
|
||||
shifter.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
charliecloud.enabled = false
|
||||
}
|
||||
charliecloud {
|
||||
charliecloud.enabled = true
|
||||
docker.enabled = false
|
||||
singularity.enabled = false
|
||||
podman.enabled = false
|
||||
shifter.enabled = false
|
||||
}
|
||||
}
|
||||
|
||||
process{
|
||||
withLabel: mem1gb { memory = 1000000000.B }
|
||||
withLabel: mem2gb { memory = 2000000000.B }
|
||||
withLabel: mem5gb { memory = 5000000000.B }
|
||||
withLabel: mem10gb { memory = 10000000000.B }
|
||||
withLabel: mem20gb { memory = 20000000000.B }
|
||||
withLabel: mem50gb { memory = 50000000000.B }
|
||||
withLabel: mem100gb { memory = 100000000000.B }
|
||||
withLabel: mem200gb { memory = 200000000000.B }
|
||||
withLabel: mem500gb { memory = 500000000000.B }
|
||||
withLabel: mem1tb { memory = 1000000000000.B }
|
||||
withLabel: mem2tb { memory = 2000000000000.B }
|
||||
withLabel: mem5tb { memory = 5000000000000.B }
|
||||
withLabel: mem10tb { memory = 10000000000000.B }
|
||||
withLabel: mem20tb { memory = 20000000000000.B }
|
||||
withLabel: mem50tb { memory = 50000000000000.B }
|
||||
withLabel: mem100tb { memory = 100000000000000.B }
|
||||
withLabel: mem200tb { memory = 200000000000000.B }
|
||||
withLabel: mem500tb { memory = 500000000000000.B }
|
||||
withLabel: mem1gib { memory = 1073741824.B }
|
||||
withLabel: mem2gib { memory = 2147483648.B }
|
||||
withLabel: mem4gib { memory = 4294967296.B }
|
||||
withLabel: mem8gib { memory = 8589934592.B }
|
||||
withLabel: mem16gib { memory = 17179869184.B }
|
||||
withLabel: mem32gib { memory = 34359738368.B }
|
||||
withLabel: mem64gib { memory = 68719476736.B }
|
||||
withLabel: mem128gib { memory = 137438953472.B }
|
||||
withLabel: mem256gib { memory = 274877906944.B }
|
||||
withLabel: mem512gib { memory = 549755813888.B }
|
||||
withLabel: mem1tib { memory = 1099511627776.B }
|
||||
withLabel: mem2tib { memory = 2199023255552.B }
|
||||
withLabel: mem4tib { memory = 4398046511104.B }
|
||||
withLabel: mem8tib { memory = 8796093022208.B }
|
||||
withLabel: mem16tib { memory = 17592186044416.B }
|
||||
withLabel: mem32tib { memory = 35184372088832.B }
|
||||
withLabel: mem64tib { memory = 70368744177664.B }
|
||||
withLabel: mem128tib { memory = 140737488355328.B }
|
||||
withLabel: mem256tib { memory = 281474976710656.B }
|
||||
withLabel: mem512tib { memory = 562949953421312.B }
|
||||
withLabel: cpu1 { cpus = 1 }
|
||||
withLabel: cpu2 { cpus = 2 }
|
||||
withLabel: cpu5 { cpus = 5 }
|
||||
withLabel: cpu10 { cpus = 10 }
|
||||
withLabel: cpu20 { cpus = 20 }
|
||||
withLabel: cpu50 { cpus = 50 }
|
||||
withLabel: cpu100 { cpus = 100 }
|
||||
withLabel: cpu200 { cpus = 200 }
|
||||
withLabel: cpu500 { cpus = 500 }
|
||||
withLabel: cpu1000 { cpus = 1000 }
|
||||
}
|
||||
|
||||
includeConfig("nextflow_labels.config")
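The `mem*` labels above come in two families: decimal (`gb`, `tb`, powers of 10) and binary (`gib`, `tib`, powers of 2). The byte counts can be checked with a short Python sketch; the function name `label_bytes` and the parsing approach are assumptions made for illustration, not how viash generates the labels.

```python
import re

# Bytes per unit: decimal units use powers of 10, binary units powers of 2.
UNIT_BYTES = {
    "gb": 10**9,
    "tb": 10**12,
    "gib": 2**30,
    "tib": 2**40,
}


def label_bytes(label: str) -> int:
    """Return the byte count implied by a label such as 'mem50gb' or 'mem16gib'."""
    match = re.fullmatch(r"mem(\d+)(gib|tib|gb|tb)", label)
    if match is None:
        raise ValueError(f"not a memory label: {label!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * UNIT_BYTES[unit]


if __name__ == "__main__":
    for label in ("mem1gb", "mem1gib", "mem50gb", "mem16gib", "mem512tib"):
        print(label, label_bytes(label))
```

For example, `mem1gb` corresponds to 1000000000 bytes while `mem1gib` corresponds to 1073741824 bytes, matching the values in the `process` block.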
|
||||
@@ -0,0 +1,68 @@
|
||||
process {
|
||||
// Default resources for components that hardly do any processing
|
||||
memory = { 2.GB * task.attempt }
|
||||
cpus = 1
|
||||
|
||||
// Retry for exit codes that have something to do with memory issues
|
||||
errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
|
||||
maxRetries = 3
|
||||
maxMemory = null
|
||||
|
||||
// CPU resources
|
||||
withLabel: singlecpu { cpus = 1 }
|
||||
withLabel: lowcpu { cpus = 4 }
|
||||
withLabel: midcpu { cpus = 10 }
|
||||
withLabel: highcpu { cpus = 20 }
|
||||
|
||||
// Memory resources
|
||||
withLabel: lowmem { memory = { get_memory( 50.GB * task.attempt ) } }
|
||||
withLabel: midmem { memory = { get_memory( 50.GB * task.attempt ) } }
|
||||
withLabel: highmem { memory = { get_memory( 50.GB * task.attempt ) } }
|
||||
withLabel: veryhighmem { memory = { get_memory( 75.GB * task.attempt ) } }
|
||||
|
||||
// Disk space
|
||||
// Nextflow apparently can't handle empty directives, i.e.
|
||||
// withLabel: lowdisk {}
|
||||
// so for that reason we have to add a dummy directive
|
||||
withLabel: lowdisk {
|
||||
dummyDirective = "dummyValue"
|
||||
}
|
||||
withLabel: middisk {
|
||||
dummyDirective = "dummyValue"
|
||||
}
|
||||
withLabel: highdisk {
|
||||
dummyDirective = "dummyValue"
|
||||
}
|
||||
withLabel: veryhighdisk {
|
||||
dummyDirective = "dummyValue"
|
||||
}
|
||||
// NOTE: The above labels intentionally do not have an effect by default.
|
||||
// The user should set the disk space requirements by adding the following
|
||||
// to the compute environment:
|
||||
//
|
||||
// withLabel: lowdisk { disk = { 20.GB * task.attempt } }
|
||||
// withLabel: middisk { disk = { 100.GB * task.attempt } }
|
||||
// withLabel: highdisk { disk = { 200.GB * task.attempt } }
|
||||
// withLabel: veryhighdisk { disk = { 500.GB * task.attempt } }
|
||||
}
|
||||
|
||||
def get_memory(to_compare) {
  if (!process.containsKey("maxMemory") || !process.maxMemory) {
    return to_compare
  }

  try {
    if (process.containsKey("maxRetries") && process.maxRetries && task.attempt == (process.maxRetries as int)) {
      return process.maxMemory
    }
    else if (to_compare.compareTo(process.maxMemory as nextflow.util.MemoryUnit) == 1) {
      // Cap the request at the configured maximum.
      return process.maxMemory as nextflow.util.MemoryUnit
    }
    else {
      return to_compare
    }
  } catch (all) {
    println "Error processing memory resources. Please check that process.maxMemory '${process.maxMemory}' and process.maxRetries '${process.maxRetries}' are valid!"
    System.exit(1)
  }
}
|
||||
12
target/_private/nextflow/filter/subset_cosmx/setup_logger.py
Normal file
@@ -0,0 +1,12 @@
|
||||
def setup_logger():
    import logging
    from sys import stdout

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    console_handler = logging.StreamHandler(stdout)
    logFormatter = logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
    console_handler.setFormatter(logFormatter)
    logger.addHandler(console_handler)

    return logger
|
||||
Some files were not shown because too many files have changed in this diff