# HT-RNAseq - A pipeline for processing high-throughput RNA-seq data
## Introduction
__TODO__: Add a description of the pipeline here.
## Test data
As test data, we use [a DRUG-seq dataset](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE176150) (GEO accession GSE176150), whose reads are hosted on the [NCBI Sequence Read Archive](https://www.ncbi.nlm.nih.gov/sra).
The original data has been (partly) subsampled to reduce the test runtime. We used [seqtk](https://github.com/lh3/seqtk) for this, with a seed of 1.
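As a rough sketch of what such a subsampling step can look like, the hypothetical helper below runs `seqtk sample` with seed 1 on both mates of a read pair. The helper name, the paths, and the read count are assumptions for illustration, not the exact commands used to build the test data. It also assumes that `10k` stands for 10,000 reads.

```shell
# Sketch only: subsample_pair is a hypothetical helper, not part of the pipeline.
# It subsamples one read pair to N reads with seqtk, using seed 1 for both
# mates so that R1 and R2 keep the same reads in the same order.
# The function body runs in a subshell, so its variables stay local.
subsample_pair() (
  n_reads="$1"
  r1="$2"
  r2="$3"
  out_dir="$4"
  mkdir -p "$out_dir"
  for fq in "$r1" "$r2"; do
    # seqtk accepts gzipped FASTQ and writes plain FASTQ to stdout,
    # so the output file name drops a trailing .gz if present.
    out="$out_dir/$(basename "${fq%.gz}")"
    seqtk sample -s1 "$fq" "$n_reads" > "$out"
  done
)

# Example call (assumed paths, left commented out):
# subsample_pair 10000 \
#   orig/SRR14730301/VH02001612_S9_R1_001.fastq.gz \
#   orig/SRR14730301/VH02001612_S9_R2_001.fastq.gz \
#   10k/SRR14730301
```

Using the same seed for both files is what keeps the pairs in sync; different seeds would select different reads from R1 and R2.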
The data is available at `gs://viash-hub-test-data/htrnaseq/v1/`:
```
❯ gcstree -f viash-hub-test-data/htrnaseq/v1/
viash-hub-test-data
└── htrnaseq
    └── v1
        ├── [ 48] 2-wells.fasta
        ├── [465.3K] GSE176150_metadata.csv
        ├── 100k
        │   ├── SRR14730301
        │   │   ├── [8.5M] VH02001612_S9_R1_001.fastq
        │   │   └── [14.9M] VH02001612_S9_R2_001.fastq
        │   └── SRR14730302
        │       ├── [8.5M] VH02001614_S8_R1_001.fastq.gz
        │       └── [14.9M] VH02001614_S8_R2_001.fastq.gz
        ├── 10k
        │   ├── SRR14730301
        │   │   ├── [845.4K] VH02001612_S9_R1_001.fastq
        │   │   └── [1.5M] VH02001612_S9_R2_001.fastq
        │   └── SRR14730302
        │       ├── [845.3K] VH02001614_S8_R1_001.fastq.gz
        │       └── [1.5M] VH02001614_S8_R2_001.fastq.gz
        └── orig
            ├── [20.4G] SRR14730301
            │   └── [20.4G] SRR14730301
            ├── SRR14730301
            │   ├── [9.1G] VH02001612_S9_R1_001.fastq.gz
            │   └── [22.0G] VH02001612_S9_R2_001.fastq.gz
            ├── [16.9G] SRR14730302
            │   └── [16.9G] SRR14730302
            ├── SRR14730302
            │   ├── [7.6G] VH02001614_S8_R1_001.fastq.gz
            │   └── [18.0G] VH02001614_S8_R2_001.fastq.gz
            ├── [18.0G] SRR14730303
            │   └── [18.0G] SRR14730303
            ├── SRR14730303
            │   ├── [8.1G] VH02001618_S7_R1_001.fastq.gz
            │   └── [19.2G] VH02001618_S7_R2_001.fastq.gz
            ├── [16.5G] SRR14730304
            │   └── [16.5G] SRR14730304
            ├── SRR14730304
            │   ├── [7.5G] VH02001700_S6_R1_001.fastq.gz
            │   └── [17.8G] VH02001700_S6_R2_001.fastq.gz
            ├── [19.0G] SRR14730305
            │   └── [19.0G] SRR14730305
            ├── SRR14730305
            │   ├── [8.4G] VH02001702_S5_R1_001.fastq.gz
            │   └── [20.6G] VH02001702_S5_R2_001.fastq.gz
            ├── [14.6G] SRR14730306
            │   └── [14.6G] SRR14730306
            ├── SRR14730306
            │   ├── [6.6G] VH02001704_S4_R1_001.fastq.gz
            │   └── [16.0G] VH02001704_S4_R2_001.fastq.gz
            ├── [21.5G] SRR14730307
            │   └── [21.5G] SRR14730307
            ├── SRR14730307
            │   ├── [9.6G] VH02001708_S3_R1_001.fastq.gz
            │   └── [23.2G] VH02001708_S3_R2_001.fastq.gz
            ├── [20.7G] SRR14730308
            │   └── [20.7G] SRR14730308
            ├── SRR14730308
            │   ├── [9.3G] VH02001710_S2_R1_001.fastq.gz
            │   └── [22.1G] VH02001710_S2_R2_001.fastq.gz
            ├── [15.8G] SRR14730309
            │   └── [15.8G] SRR14730309
            └── SRR14730309
                ├── [7.2G] VH02001712_S1_R1_001.fastq.gz
                └── [16.9G] VH02001712_S1_R2_001.fastq.gz

18 directories, 37 files
```
The `orig` directory contains the original FASTQ files. The `10k` and `100k` directories contain the 10k and 100k subsamples, respectively, for the first two samples (SRR14730301 and SRR14730302).
The `2-wells.fasta` file contains the barcodes for 2 wells.
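
To work with a subsample locally, the bucket contents can be copied with a Google Cloud Storage client. The snippet below is a minimal sketch that assumes `gsutil` is installed and that you have read access to the bucket. The helper name `fetch_test_data` and the default target directory `testData` are hypothetical and not used elsewhere in this repository.

```shell
# Sketch only: fetch_test_data is a hypothetical helper.
# It copies one subset (10k, 100k or orig) of the test data to a local directory.
# The function body runs in a subshell, so its variables stay local.
fetch_test_data() (
  subset="${1:-10k}"      # which subset to fetch; defaults to the smallest one
  dest="${2:-testData}"   # assumed local target directory
  mkdir -p "$dest"
  # -m parallelises the transfer, -r copies the directory recursively.
  gsutil -m cp -r "gs://viash-hub-test-data/htrnaseq/v1/$subset" "$dest/"
)

# Example call (left commented out):
# fetch_test_data 10k testData
```

Note that the `orig` subset is several hundred gigabytes, so the `10k` subset is the practical choice for a quick test.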
## Test run
The pipeline can be run by creating a `params.yaml` file like this: