# HT-RNAseq - A pipeline for processing high-throughput RNA-seq data

## Introduction

__TODO__: Add a description of the pipeline here.

## Test data

As test data, we use [a DRUGseq dataset](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE176150) from the [NCBI Sequence Read Archive](https://www.ncbi.nlm.nih.gov/sra).

Part of the original data has been subsampled to reduce the test runtime. We used [seqtk](https://github.com/lh3/seqtk) for this, with a seed of 1 (`-s1`), e.g.:

```bash
seqtk sample -s1 orig/SRR14730302/VH02001614_S8_R1_001.fastq.gz 10000 > 10k/SRR14730302/VH02001614_S8_R1_001.fastq.gz
```
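
Because the reads are paired, both mates of a sample have to be subsampled with the same seed so that the same read pairs are kept. The following is a minimal sketch of how this could be scripted for one sample. The `subsample_pair` helper and the `DRY_RUN` switch are hypothetical and not part of this repository. Unlike the command above, the sketch pipes the output through `gzip`, since seqtk writes uncompressed FASTQ to stdout.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: subsample both mates of one sample with the same seed.
# Usage: subsample_pair <sample_dir> <file_prefix> <n_reads> <out_label>
subsample_pair() {
  local sample="$1" prefix="$2" n="$3" label="$4"
  local mate src dst
  for mate in R1 R2; do
    src="orig/${sample}/${prefix}_${mate}_001.fastq.gz"
    dst="${label}/${sample}/${prefix}_${mate}_001.fastq.gz"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Only print the command that would be run.
      echo "seqtk sample -s1 ${src} ${n} | gzip > ${dst}"
    else
      mkdir -p "$(dirname "${dst}")"
      # Same seed (-s1) for R1 and R2 keeps the mates in sync.
      seqtk sample -s1 "${src}" "${n}" | gzip > "${dst}"
    fi
  done
}

# Example: print the commands for the 10k subset of one sample.
DRY_RUN=1 subsample_pair SRR14730302 VH02001614_S8 10000 10k
```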

The data is available at `gs://viash-hub-test-data/htrnaseq/v1/`:

```
❯ gcstree -f viash-hub-test-data/htrnaseq/v1/
viash-hub-test-data
└── htrnaseq
    └── v1
        ├── [ 48] 2-wells.fasta
        ├── [465.3K] GSE176150_metadata.csv
        ├── 100k
        │   ├── SRR14730301
        │   │   ├── [8.5M] VH02001612_S9_R1_001.fastq
        │   │   └── [14.9M] VH02001612_S9_R2_001.fastq
        │   └── SRR14730302
        │       ├── [8.5M] VH02001614_S8_R1_001.fastq.gz
        │       └── [14.9M] VH02001614_S8_R2_001.fastq.gz
        ├── 10k
        │   ├── SRR14730301
        │   │   ├── [845.4K] VH02001612_S9_R1_001.fastq
        │   │   └── [1.5M] VH02001612_S9_R2_001.fastq
        │   └── SRR14730302
        │       ├── [845.3K] VH02001614_S8_R1_001.fastq.gz
        │       └── [1.5M] VH02001614_S8_R2_001.fastq.gz
        └── orig
            ├── [20.4G] SRR14730301
            │   └── [20.4G] SRR14730301
            ├── SRR14730301
            │   ├── [9.1G] VH02001612_S9_R1_001.fastq.gz
            │   └── [22.0G] VH02001612_S9_R2_001.fastq.gz
            ├── [16.9G] SRR14730302
            │   └── [16.9G] SRR14730302
            ├── SRR14730302
            │   ├── [7.6G] VH02001614_S8_R1_001.fastq.gz
            │   └── [18.0G] VH02001614_S8_R2_001.fastq.gz
            ├── [18.0G] SRR14730303
            │   └── [18.0G] SRR14730303
            ├── SRR14730303
            │   ├── [8.1G] VH02001618_S7_R1_001.fastq.gz
            │   └── [19.2G] VH02001618_S7_R2_001.fastq.gz
            ├── [16.5G] SRR14730304
            │   └── [16.5G] SRR14730304
            ├── SRR14730304
            │   ├── [7.5G] VH02001700_S6_R1_001.fastq.gz
            │   └── [17.8G] VH02001700_S6_R2_001.fastq.gz
            ├── [19.0G] SRR14730305
            │   └── [19.0G] SRR14730305
            ├── SRR14730305
            │   ├── [8.4G] VH02001702_S5_R1_001.fastq.gz
            │   └── [20.6G] VH02001702_S5_R2_001.fastq.gz
            ├── [14.6G] SRR14730306
            │   └── [14.6G] SRR14730306
            ├── SRR14730306
            │   ├── [6.6G] VH02001704_S4_R1_001.fastq.gz
            │   └── [16.0G] VH02001704_S4_R2_001.fastq.gz
            ├── [21.5G] SRR14730307
            │   └── [21.5G] SRR14730307
            ├── SRR14730307
            │   ├── [9.6G] VH02001708_S3_R1_001.fastq.gz
            │   └── [23.2G] VH02001708_S3_R2_001.fastq.gz
            ├── [20.7G] SRR14730308
            │   └── [20.7G] SRR14730308
            ├── SRR14730308
            │   ├── [9.3G] VH02001710_S2_R1_001.fastq.gz
            │   └── [22.1G] VH02001710_S2_R2_001.fastq.gz
            ├── [15.8G] SRR14730309
            │   └── [15.8G] SRR14730309
            └── SRR14730309
                ├── [7.2G] VH02001712_S1_R1_001.fastq.gz
                └── [16.9G] VH02001712_S1_R2_001.fastq.gz

18 directories, 37 files
```
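
To work with the data locally, one of the subsets in the listing above could be copied with `gsutil`. This is only a sketch: the `fetch_test_data` helper, the `DRY_RUN` switch and the default `testData` destination are hypothetical names, and it assumes that `gsutil` is installed and has read access to the bucket.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy one subset (10k, 100k or orig) to a local directory.
# Usage: fetch_test_data <subset> [dest_dir]
fetch_test_data() {
  local subset="$1" dest="${2:-testData}"
  local src="gs://viash-hub-test-data/htrnaseq/v1/${subset}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only print the command instead of running it.
    echo "gsutil -m cp -r ${src} ${dest}/"
  else
    mkdir -p "${dest}"
    # -m copies in parallel, -r recurses into the sample directories.
    gsutil -m cp -r "${src}" "${dest}/"
  fi
}

# Example: show the command for the small 10k subset.
DRY_RUN=1 fetch_test_data 10k
```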

The `orig` directory contains the original FASTQ files. The `10k` and `100k` directories contain the corresponding 10k and 100k subsamples.

The `2-wells.fasta` file contains the barcodes for two wells.

## Test run

The pipeline can be run by creating a `params.yaml` file like this:

```yaml
param_list:
  - input_r1: "gs://viash-hub-test-data/htrnaseq/v1/100k/SRR14730301/VH02001612_S9_R1_001.fastq"
    input_r2: "gs://viash-hub-test-data/htrnaseq/v1/100k/SRR14730301/VH02001612_S9_R2_001.fastq"
    genomeDir: "gs://viash-hub-test-data/htrnaseq/v1/genomeDir/gencode.v41.star.sparse"
    barcodesFasta: "gs://viash-hub-test-data/htrnaseq/v1/2-wells.fasta"
    id: sample_one
  - input_r1: "gs://viash-hub-test-data/htrnaseq/v1/100k/SRR14730302/VH02001614_S8_R1_001.fastq"
    input_r2: "gs://viash-hub-test-data/htrnaseq/v1/100k/SRR14730302/VH02001614_S8_R2_001.fastq"
    genomeDir: "gs://viash-hub-test-data/htrnaseq/v1/genomeDir/gencode.v41.star.sparse"
    barcodesFasta: "gs://viash-hub-test-data/htrnaseq/v1/2-wells.fasta"
    id: sample_two
```
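
Before launching a run, it can help to confirm that every remote path referenced in `params.yaml` actually exists, for example to catch a `.fastq` versus `.fastq.gz` suffix mismatch early. The sketch below is one possible way to do that. The helpers `list_remote_paths` and `check_remote_paths` are hypothetical and not part of this repository, and the check itself assumes `gsutil` is installed and has read access to the bucket.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print every distinct gs:// path found in a params file.
list_remote_paths() {
  grep -o 'gs://[^"]*' "$1" | sort -u
}

# Hypothetical helper: report paths that `gsutil ls` cannot find.
# Returns non-zero if at least one path is missing.
check_remote_paths() {
  local missing=0 path
  while IFS= read -r path; do
    # `gsutil ls` works for both objects and directory-like prefixes.
    if ! gsutil ls "${path}" > /dev/null 2>&1 < /dev/null; then
      echo "missing: ${path}" >&2
      missing=1
    fi
  done < <(list_remote_paths "$1")
  return "${missing}"
}

# Example (needs gsutil and bucket access):
#   check_remote_paths params.yaml && echo "all inputs found"
```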

Then build the components with Viash and run the workflow with Nextflow:

```bash
viash ns build --setup cb
nextflow run . -main-script target/nextflow/workflows/htrnaseq/main.nf \
  -profile docker \
  -c target/nextflow/workflows/htrnaseq/nextflow.config \
  -params-file params.yaml \
  -resume \
  --publish_dir output
```

Alternatively, run the integration test script `src/workflows/htrnaseq/integration_test.sh`.

# Special Thanks

Developed in collaboration with Data Intuitive and Open Analytics.