Files
htrnaseq/README.md
CI 044a3af7a9 Build branch main with version main (21831c2)
Build pipeline: viash-hub.htrnaseq.main-n8w4c

Source commit: 21831c2104

Source message: Fix well_demultiplex test
2024-08-29 08:10:20 +00:00

125 lines
5.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# HT-RNAseq - A pipeline for processing high-throughput RNA-seq data
## Introduction
__TODO__: Add a description of the pipeline here.
## Test data
As test data, we use [a DRUGseq dataset](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE176150) from the [NCBI Sequence Read Archive](https://www.ncbi.nlm.nih.gov/sra).
The original data has been (partly) subsampled to reduce the test runtime. We used [seqtk](https://github.com/lh3/seqtk) for this with a seed of 1, e.g.:
```bash
seqtk sample -s1 orig/SRR14730302/VH02001614_S8_R1_001.fastq.gz 10000 > 10k/SRR14730302/VH02001614_S8_R1_001.fastq.gz
```
The data is available at: `gs://viash-hub-test-data/htrnaseq/v1/`:
```
gcstree -f viash-hub-test-data/htrnaseq/v1/
viash-hub-test-data
└── htrnaseq
└── v1
├── [ 48] 2-wells.fasta
├── [465.3K] GSE176150_metadata.csv
├── 100k
│ ├── SRR14730301
│ │ ├── [8.5M] VH02001612_S9_R1_001.fastq
│ │ └── [14.9M] VH02001612_S9_R2_001.fastq
│ └── SRR14730302
│ ├── [8.5M] VH02001614_S8_R1_001.fastq.gz
│ └── [14.9M] VH02001614_S8_R2_001.fastq.gz
├── 10k
│ ├── SRR14730301
│ │ ├── [845.4K] VH02001612_S9_R1_001.fastq
│ │ └── [1.5M] VH02001612_S9_R2_001.fastq
│ └── SRR14730302
│ ├── [845.3K] VH02001614_S8_R1_001.fastq.gz
│ └── [1.5M] VH02001614_S8_R2_001.fastq.gz
└── orig
├── [20.4G] SRR14730301
│ └── [20.4G] SRR14730301
├── SRR14730301
│ ├── [9.1G] VH02001612_S9_R1_001.fastq.gz
│ └── [22.0G] VH02001612_S9_R2_001.fastq.gz
├── [16.9G] SRR14730302
│ └── [16.9G] SRR14730302
├── SRR14730302
│ ├── [7.6G] VH02001614_S8_R1_001.fastq.gz
│ └── [18.0G] VH02001614_S8_R2_001.fastq.gz
├── [18.0G] SRR14730303
│ └── [18.0G] SRR14730303
├── SRR14730303
│ ├── [8.1G] VH02001618_S7_R1_001.fastq.gz
│ └── [19.2G] VH02001618_S7_R2_001.fastq.gz
├── [16.5G] SRR14730304
│ └── [16.5G] SRR14730304
├── SRR14730304
│ ├── [7.5G] VH02001700_S6_R1_001.fastq.gz
│ └── [17.8G] VH02001700_S6_R2_001.fastq.gz
├── [19.0G] SRR14730305
│ └── [19.0G] SRR14730305
├── SRR14730305
│ ├── [8.4G] VH02001702_S5_R1_001.fastq.gz
│ └── [20.6G] VH02001702_S5_R2_001.fastq.gz
├── [14.6G] SRR14730306
│ └── [14.6G] SRR14730306
├── SRR14730306
│ ├── [6.6G] VH02001704_S4_R1_001.fastq.gz
│ └── [16.0G] VH02001704_S4_R2_001.fastq.gz
├── [21.5G] SRR14730307
│ └── [21.5G] SRR14730307
├── SRR14730307
│ ├── [9.6G] VH02001708_S3_R1_001.fastq.gz
│ └── [23.2G] VH02001708_S3_R2_001.fastq.gz
├── [20.7G] SRR14730308
│ └── [20.7G] SRR14730308
├── SRR14730308
│ ├── [9.3G] VH02001710_S2_R1_001.fastq.gz
│ └── [22.1G] VH02001710_S2_R2_001.fastq.gz
├── [15.8G] SRR14730309
│ └── [15.8G] SRR14730309
└── SRR14730309
├── [7.2G] VH02001712_S1_R1_001.fastq.gz
└── [16.9G] VH02001712_S1_R2_001.fastq.gz
18 directories, 37 files
```
The `orig` directory contains the original fastq files. The fastq files are available for 10k and 100k subsamples in the `10k` and `100k` directories, respectively.
The `2-wells.fasta` file contains the barcodes for 2 wells.
## Test run
The pipeline can be run by creating a `params.yaml` file like this:
```yaml
param_list:
- input_r1: "gs://viash-hub-test-data/htrnaseq/v1/100k/SRR14730301/VH02001612_S9_R1_001.fastq"
input_r2: "gs://viash-hub-test-data/htrnaseq/v1/100k/SRR14730301/VH02001612_S9_R2_001.fastq"
genomeDir: "gs://viash-hub-test-data/htrnaseq/v1/genomeDir/gencode.v41.star.sparse"
barcodesFasta: "gs://viash-hub-test-data/htrnaseq/v1/2-wells.fasta"
id: sample_one
- input_r1: "gs://viash-hub-test-data/htrnaseq/v1/100k/SRR14730302/VH02001614_S8_R1_001.fastq"
input_r2: "gs://viash-hub-test-data/htrnaseq/v1/100k/SRR14730302/VH02001614_S8_R2_001.fastq"
genomeDir: "gs://viash-hub-test-data/htrnaseq/v1/genomeDir/gencode.v41.star.sparse"
barcodesFasta: "gs://viash-hub-test-data/htrnaseq/v1/2-wells.fasta"
id: sample_two
```
and then:
```bash
viash ns build --setup cb
nextflow run . -main-script target/nextflow/workflows/htrnaseq/main.nf \
-profile docker \
-c target/nextflow/workflows/htrnaseq/nextflow.config \
-params-file params.yaml \
-resume \
--publish_dir output
```
Or, by running `src/workflows/htrnaseq/integration_test.sh`.