Build branch main with version main (1e1ffb3)
Build pipeline: vsh-ci-dev-jsbwk
Source commit: 1e1ffb315f
Source message: Merge pull request #17 from viash-hub/add_biobox_modules
- Migrate a number of components to biobox
- Fix tests
- Reduce size of test resources
- Prepare for Viash Hub
136
README.md
Normal file
@@ -0,0 +1,136 @@
# RNAseq.vsh

<!-- README.md is generated by running 'quarto render README.qmd' -->

A version of the [nf-core/rnaseq](https://github.com/nf-core/rnaseq)
pipeline (version 3.14.0) in the [Viash framework](http://www.viash.io).

## Rationale

We stick to the original nf-core pipeline as much as possible. This also
means that we create a sub-workflow for each of the 5 main stages of the
pipeline as depicted in the [README](https://github.com/nf-core/rnaseq).

## Getting started

As test data, we can use the small dataset nf-core provides with [their
`test`
profile](https://github.com/nf-core/test-datasets/blob/rnaseq3/samplesheet/v3.10/samplesheet_test.csv):
<https://github.com/nf-core/test-datasets/tree/rnaseq3/testdata/GSE110004>.

A simple script, `bin/get_minimal_test_data.sh`, fetches those files from
the GitHub repository and stores them under `testData/minimal_test` (the
subdirectory is created to support `full_test` later as well).

A second script, `bin/get_unit_test_data.sh`, fetches additional
resources for unit testing the components. These are stored under
`testData/unit_test_resources`.

To get started, we need to:

1. [Install
   `nextflow`](https://www.nextflow.io/docs/latest/getstarted.html)
   system-wide

2. Fetch the test data:

``` bash
bin/minimal_test.sh
bin/get_minimal_test_data.sh
```

## Running the pipeline

To actually run the pipeline, we first need to build the components and
pipeline:

``` bash
viash ns build --setup cb --parallel
```

Now we can run a workflow, here the pre-processing sub-workflow on a
single FASTQ file, using the command:

``` bash
nextflow run target/nextflow/workflows/pre_processing/main.nf \
  -profile docker \
  --id test \
  --input testData/minimal_test/SRR6357070_1.fastq.gz \
  --publish_dir testData/test_output/
```

Alternatively, we can run the full pipeline with a sample sheet using the
built-in `--param_list` functionality. Read file paths must be specified
relative to the sample sheet’s path:

``` bash
cat > testData/minimal_test/input_fastq/sample_sheet.csv << HERE
id,fastq_1,fastq_2,strandedness
WT_REP1,SRR6357070_1.fastq.gz;SRR6357071_1.fastq.gz,SRR6357070_2.fastq.gz;SRR6357071_2.fastq.gz,reverse
WT_REP2,SRR6357072_1.fastq.gz,SRR6357072_2.fastq.gz,reverse
RAP1_UNINDUCED_REP1,SRR6357073_1.fastq.gz,,reverse
HERE

nextflow run target/nextflow/workflows/rnaseq/main.nf \
  --param_list testData/minimal_test/input_fastq/sample_sheet.csv \
  --publish_dir "test_results/full_pipeline_test" \
  --fasta testData/minimal_test/reference/genome.fasta \
  --gtf testData/minimal_test/reference/genes.gtf.gz \
  --transcript_fasta testData/minimal_test/reference/transcriptome.fasta \
  -profile docker
```

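Because read paths are resolved relative to the sample sheet, a misplaced file is an easy mistake to make. The following is a minimal sketch of a pre-flight check, not part of the repository: the function name `list_missing_reads` is hypothetical, and it assumes the column order shown above (`id,fastq_1,fastq_2,strandedness`) with `;` separating multiple files per sample.

``` bash
# Print "id<TAB>path" for every read file named in a sample sheet that
# does not exist on disk. Paths are resolved relative to the directory
# that contains the sheet. Hypothetical helper, not shipped with the repo.
# The body runs in a subshell "( ... )" so its variables stay local.
list_missing_reads() (
  sheet="$1"
  base=$(dirname "$sheet")
  tab=$(printf '\t')
  # awk skips the header and emits one "id<TAB>file" line per read file.
  # Several files per sample are separated by ';' and fastq_2 may be empty.
  awk -F, 'NR > 1 && $1 != "" {
    n = split($2 ";" $3, files, ";")
    for (i = 1; i <= n; i++)
      if (files[i] != "") print $1 "\t" files[i]
  }' "$sheet" |
  while IFS="$tab" read -r id file; do
    [ -f "$base/$file" ] || printf '%s\t%s\n' "$id" "$base/$file"
  done
)
```

Calling `list_missing_reads testData/minimal_test/input_fastq/sample_sheet.csv` before `nextflow run` should print nothing when every listed file is in place.
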
## Pipeline sub-workflows and components

The pipeline consists of the following sub-workflows, each of which can
be run separately.

1. Prepare genome: This workflow prepares all the reference data
   required for downstream analysis, i.e., it uncompresses the provided
   reference data or generates the required index files (for STAR,
   Salmon, Kallisto, RSEM, BBSplit).

2. Pre-processing: This workflow performs quality control on the input
   reads. It runs FastQC, extracts UMIs, trims adapters, and removes
   ribosomal RNA reads. Adapters can be trimmed using either Trim
   Galore! or fastp (work in progress).

3. Genome alignment and quantification: This workflow performs genome
   alignment using STAR and transcript quantification using Salmon or
   RSEM (using RSEM’s built-in support for STAR; work in progress).
   Alignment sorting and indexing, as well as computation of statistics
   from the BAM files, are performed using Samtools. UMI-based
   deduplication is also performed.

4. Post-processing: This workflow performs duplicate read marking
   (Picard MarkDuplicates), transcript assembly and quantification
   (StringTie), and creation of bigWig coverage files.

5. Pseudo alignment and quantification: This workflow performs pseudo
   alignment and transcript quantification using Salmon or Kallisto.

6. Final QC: This workflow performs extensive quality control (RSeQC,
   dupRadar, Qualimap, Preseq, DESeq2, featureCounts). It presents QC
   for raw reads, alignments, gene biotype, sample similarity, and
   strand specificity (MultiQC).

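To run a sub-workflow on its own, it helps to know which entry points the build produced. The sketch below is an illustration only: the helper name `list_workflows` is hypothetical, and it assumes every built workflow follows the `target/nextflow/workflows/<name>/main.nf` layout seen in the two `nextflow run` commands above.

``` bash
# List the names of built workflows, i.e. the directories under the
# given root that contain a main.nf. Hypothetical helper; the default
# root is an assumption based on the paths used in this README.
# The body runs in a subshell "( ... )" so its variables stay local.
list_workflows() (
  root="${1:-target/nextflow/workflows}"
  for entry in "$root"/*/main.nf; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    [ -f "$entry" ] || continue
    basename "$(dirname "$entry")"
  done
)
```

After `viash ns build`, calling `list_workflows` prints one workflow name per line; each name can then be substituted into `nextflow run target/nextflow/workflows/<name>/main.nf`.
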
## Reusing components from biobox

At the moment, this pipeline makes use of the following components from
[biobox](https://github.com/viash-hub/biobox):

- `gffread`
- `star/star_genome_generate`
- `star/star_align_reads`
- `salmon/salmon_index`
- `salmon/salmon_quant`
- `featurecounts`
- `samtools/samtools_sort`
- `samtools/samtools_index`
- `samtools/samtools_stats`
- `samtools/samtools_flagstat`
- `samtools/samtools_idxstats`
- `multiqc` (work in progress - updating `assets/multiqc_config.yaml`)
- `fastp` (work in progress)
- `rsem/rsem_prepare_reference` (work in progress)
- `rsem/rsem_calculate_expression` (work in progress)