# HT-RNAseq - A pipeline for processing high-throughput RNA-seq data
## Introduction
__TODO__: Add a description of the pipeline here.
## Test data
As test data, we use [a DRUGseq dataset](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE176150) from the [NCBI Sequence Read Archive](https://www.ncbi.nlm.nih.gov/sra).
The original data has been (partly) subsampled to reduce the test runtime. We used [seqtk](https://github.com/lh3/seqtk) for this with a seed of 1, e.g.:
```bash
# seqtk writes uncompressed FASTQ to stdout, so compress it to match the .gz extension
seqtk sample -s1 orig/SRR14730302/VH02001614_S8_R1_001.fastq.gz 10000 | gzip > 10k/SRR14730302/VH02001614_S8_R1_001.fastq.gz
```
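Because the data is paired-end, both mates of a run should be sampled with the same seed so that the same read pairs are kept. The following is a minimal sketch of how that could be scripted, not necessarily how the test data was produced. The helper name `subsample_pair` and the `SEQTK` override variable are hypothetical; the directory layout and file naming are assumed to follow the example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: subsample both mates of one run with the same seed.
# seqtk keeps matching records in R1 and R2 only when the seed is identical.
subsample_pair() {
  local run_dir="$1" out_dir="$2" n_reads="$3" seed="${4:-1}"
  local fq
  mkdir -p "$out_dir"
  for fq in "$run_dir"/*_R1_001.fastq.gz "$run_dir"/*_R2_001.fastq.gz; do
    # Skip patterns that did not match any file.
    [ -e "$fq" ] || continue
    # seqtk writes plain FASTQ to stdout, so recompress it explicitly.
    # SEQTK is an assumed override for the seqtk executable (default: seqtk).
    "${SEQTK:-seqtk}" sample -s"$seed" "$fq" "$n_reads" \
      | gzip > "$out_dir/$(basename "$fq")"
  done
}

# Example (not executed here):
# subsample_pair orig/SRR14730302 10k/SRR14730302 10000
```

Passing the seed as an optional fourth argument keeps the default in line with the seed of 1 used above while still allowing a different subsample to be drawn.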
The data is available at `gs://viash-hub-test-data/htrnaseq/v1/`:
```
gcstree -f viash-hub-test-data/htrnaseq/v1/
viash-hub-test-data
└── htrnaseq
    └── v1
        ├── [ 48] 2-wells.fasta
        ├── [465.3K] GSE176150_metadata.csv
        ├── 100k
        │   ├── SRR14730301
        │   │   ├── [8.5M] VH02001612_S9_R1_001.fastq
        │   │   └── [14.9M] VH02001612_S9_R2_001.fastq
        │   └── SRR14730302
        │       ├── [8.5M] VH02001614_S8_R1_001.fastq.gz
        │       └── [14.9M] VH02001614_S8_R2_001.fastq.gz
        ├── 10k
        │   ├── SRR14730301
        │   │   ├── [845.4K] VH02001612_S9_R1_001.fastq
        │   │   └── [1.5M] VH02001612_S9_R2_001.fastq
        │   └── SRR14730302
        │       ├── [845.3K] VH02001614_S8_R1_001.fastq.gz
        │       └── [1.5M] VH02001614_S8_R2_001.fastq.gz
        └── orig
            ├── [20.4G] SRR14730301
            │   └── [20.4G] SRR14730301
            ├── SRR14730301
            │   ├── [9.1G] VH02001612_S9_R1_001.fastq.gz
            │   └── [22.0G] VH02001612_S9_R2_001.fastq.gz
            ├── [16.9G] SRR14730302
            │   └── [16.9G] SRR14730302
            ├── SRR14730302
            │   ├── [7.6G] VH02001614_S8_R1_001.fastq.gz
            │   └── [18.0G] VH02001614_S8_R2_001.fastq.gz
            ├── [18.0G] SRR14730303
            │   └── [18.0G] SRR14730303
            ├── SRR14730303
            │   ├── [8.1G] VH02001618_S7_R1_001.fastq.gz
            │   └── [19.2G] VH02001618_S7_R2_001.fastq.gz
            ├── [16.5G] SRR14730304
            │   └── [16.5G] SRR14730304
            ├── SRR14730304
            │   ├── [7.5G] VH02001700_S6_R1_001.fastq.gz
            │   └── [17.8G] VH02001700_S6_R2_001.fastq.gz
            ├── [19.0G] SRR14730305
            │   └── [19.0G] SRR14730305
            ├── SRR14730305
            │   ├── [8.4G] VH02001702_S5_R1_001.fastq.gz
            │   └── [20.6G] VH02001702_S5_R2_001.fastq.gz
            ├── [14.6G] SRR14730306
            │   └── [14.6G] SRR14730306
            ├── SRR14730306
            │   ├── [6.6G] VH02001704_S4_R1_001.fastq.gz
            │   └── [16.0G] VH02001704_S4_R2_001.fastq.gz
            ├── [21.5G] SRR14730307
            │   └── [21.5G] SRR14730307
            ├── SRR14730307
            │   ├── [9.6G] VH02001708_S3_R1_001.fastq.gz
            │   └── [23.2G] VH02001708_S3_R2_001.fastq.gz
            ├── [20.7G] SRR14730308
            │   └── [20.7G] SRR14730308
            ├── SRR14730308
            │   ├── [9.3G] VH02001710_S2_R1_001.fastq.gz
            │   └── [22.1G] VH02001710_S2_R2_001.fastq.gz
            ├── [15.8G] SRR14730309
            │   └── [15.8G] SRR14730309
            └── SRR14730309
                ├── [7.2G] VH02001712_S1_R1_001.fastq.gz
                └── [16.9G] VH02001712_S1_R2_001.fastq.gz

18 directories, 37 files
```
The `orig` directory contains the original FASTQ files. The `10k` and `100k` directories contain subsamples of 10,000 and 100,000 reads, respectively, for two of the runs (`SRR14730301` and `SRR14730302`).
The `2-wells.fasta` file contains the barcodes for 2 wells.
## Test run
The pipeline can be run by creating a `params.yaml` file like this:
```yaml
param_list:
  - input_r1: "gs://viash-hub-test-data/htrnaseq/v1/100k/SRR14730301/VH02001612_S9_R1_001.fastq"
    input_r2: "gs://viash-hub-test-data/htrnaseq/v1/100k/SRR14730301/VH02001612_S9_R2_001.fastq"
    genomeDir: "gs://viash-hub-test-data/htrnaseq/v1/genomeDir/gencode.v41.star.sparse"
    barcodesFasta: "gs://viash-hub-test-data/htrnaseq/v1/2-wells.fasta"
    id: sample_one
  - input_r1: "gs://viash-hub-test-data/htrnaseq/v1/100k/SRR14730302/VH02001614_S8_R1_001.fastq"
    input_r2: "gs://viash-hub-test-data/htrnaseq/v1/100k/SRR14730302/VH02001614_S8_R2_001.fastq"
    genomeDir: "gs://viash-hub-test-data/htrnaseq/v1/genomeDir/gencode.v41.star.sparse"
    barcodesFasta: "gs://viash-hub-test-data/htrnaseq/v1/2-wells.fasta"
    id: sample_two
```
and then:
```bash
viash ns build --setup cb
nextflow run . -main-script target/nextflow/workflows/htrnaseq/main.nf \
  -profile docker \
  -c target/nextflow/workflows/htrnaseq/nextflow.config \
  -params-file params.yaml \
  -resume \
  --publish_dir output
```
Alternatively, run `src/workflows/htrnaseq/integration_test.sh`.
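To try a different subset (for example the smaller `10k` files) or another set of runs, the `params.yaml` file can be generated instead of written by hand. The sketch below is only an illustration: the helper name `make_params` and its `id=run/prefix` argument format are hypothetical, and it assumes uncompressed `.fastq` inputs and the same `genomeDir` and `barcodesFasta` locations as in the example above.

```bash
#!/usr/bin/env bash
# Assumed bucket root, taken from the paths used in this README.
BASE="gs://viash-hub-test-data/htrnaseq/v1"

# Hypothetical helper: print a param_list with one entry per "id=run/prefix"
# argument, e.g. sample_one=SRR14730301/VH02001612_S9.
make_params() {
  local subset="$1"
  shift
  local spec id prefix
  echo "param_list:"
  for spec in "$@"; do
    id="${spec%%=*}"      # text before the first "="
    prefix="${spec#*=}"   # text after the first "="
    # Assumes uncompressed .fastq inputs; change the suffix for .fastq.gz files.
    cat <<EOF
  - input_r1: "$BASE/$subset/${prefix}_R1_001.fastq"
    input_r2: "$BASE/$subset/${prefix}_R2_001.fastq"
    genomeDir: "$BASE/genomeDir/gencode.v41.star.sparse"
    barcodesFasta: "$BASE/2-wells.fasta"
    id: $id
EOF
  done
}

# Example (not executed here):
# make_params 10k sample_one=SRR14730301/VH02001612_S9 > params.yaml
```

The generated file can then be passed to `nextflow run` with `-params-file params.yaml` as shown above.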
## Special Thanks
Developed in collaboration with Data Intuitive and Open Analytics.