Build branch main with version main (1e1ffb3)
Build pipeline: vsh-ci-dev-jsbwk
Source commit: 1e1ffb315f
Source message: Merge pull request #17 from viash-hub/add_biobox_modules
- Migrate a number of components to biobox
- Fix tests
- Reduce size of test resources
- Prepare for Viash Hub
5
.gitignore
vendored
Normal file
@@ -0,0 +1,5 @@
.nextflow*
work
testData
test_results
target
136
README.md
Normal file
@@ -0,0 +1,136 @@
# RNAseq.vsh

<!-- README.md is generated by running 'quarto render README.qmd' -->

A version of the [nf-core/rnaseq](https://github.com/nf-core/rnaseq)
pipeline (version 3.14.0) in the [Viash framework](http://www.viash.io).

## Rationale

We stick to the original nf-core pipeline as much as possible. This also
means that we create a subworkflow for the 5 main stages of the pipeline
as depicted in the [README](https://github.com/nf-core/rnaseq).

## Getting started

As test data, we can use the small dataset that nf-core provides with
[their `test`
profile](https://github.com/nf-core/test-datasets/blob/rnaseq3/samplesheet/v3.10/samplesheet_test.csv):
<https://github.com/nf-core/test-datasets/tree/rnaseq3/testdata/GSE110004>.

A simple script has been provided to fetch those files from the GitHub
repository and store them under `testData/minimal_test` (the
subdirectory is created to support `full_test` later as well):
`bin/get_minimal_test_data.sh`.

Additionally, a script has been provided to fetch extra resources for
unit testing the components. These will be stored under
`testData/unit_test_resources`: `bin/get_unit_test_data.sh`.

To get started, we need to:

1.  [Install
    `nextflow`](https://www.nextflow.io/docs/latest/getstarted.html)
    system-wide

2.  Fetch the test data:

    ``` bash
    bin/get_minimal_test_data.sh
    ```

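Once the script has finished, a quick sanity check can confirm that the
files used by the commands in the next section are present. This is a
minimal sketch rather than part of the repository: the function name
`check_minimal_test_data` is hypothetical, and the file list is an
assumption based on what `bin/get_minimal_test_data.sh` downloads.

``` bash
# Hypothetical helper (not part of rnaseq.vsh): report which of the minimal
# test files used in this README are missing or empty.
check_minimal_test_data() {
  local base="${1:-testData/minimal_test}"
  local missing=0
  local f
  for f in \
    input_fastq/SRR6357070_1.fastq.gz \
    input_fastq/SRR6357070_2.fastq.gz \
    reference/genome.fasta \
    reference/genes.gtf.gz \
    reference/transcriptome.fasta; do
    # -s is true only for an existing file with a size greater than zero.
    if [ ! -s "$base/$f" ]; then
      echo "missing or empty: $base/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run it from the repository root (it defaults to `testData/minimal_test`);
it returns a non-zero status and names each missing or empty file if the
download was incomplete.
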
## Running the pipeline

To actually run the pipeline, we first need to build the components and
pipeline:

``` bash
viash ns build --setup cb --parallel
```

Now we can run a single sub-workflow, for example pre-processing, on one
FASTQ file:

``` bash
nextflow run target/nextflow/workflows/pre_processing/main.nf \
  -profile docker \
  --id test \
  --fastq_1 testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz \
  --publish_dir testData/test_output/
```

Alternatively, we can run the full pipeline with a sample sheet using
the built-in `--param_list` functionality. Read file paths must be
specified relative to the sample sheet’s location, and multiple FASTQ
files for the same sample are separated by `;`:

``` bash
cat > testData/minimal_test/input_fastq/sample_sheet.csv << HERE
id,fastq_1,fastq_2,strandedness
WT_REP1,SRR6357070_1.fastq.gz;SRR6357071_1.fastq.gz,SRR6357070_2.fastq.gz;SRR6357071_2.fastq.gz,reverse
WT_REP2,SRR6357072_1.fastq.gz,SRR6357072_2.fastq.gz,reverse
RAP1_UNINDUCED_REP1,SRR6357073_1.fastq.gz,,reverse
HERE

nextflow run target/nextflow/workflows/rnaseq/main.nf \
  --param_list testData/minimal_test/input_fastq/sample_sheet.csv \
  --publish_dir "test_results/full_pipeline_test" \
  --fasta testData/minimal_test/reference/genome.fasta \
  --gtf testData/minimal_test/reference/genes.gtf.gz \
  --transcript_fasta testData/minimal_test/reference/transcriptome.fasta \
  -profile docker
```

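Because paths in the sample sheet are resolved relative to the sheet
itself, a typo only shows up once Nextflow starts. A minimal sketch of a
pre-flight check, assuming the layout shown above (a header row, no
quoted fields, files joined with `;`, and an empty `fastq_2` for
single-end samples); `validate_sample_sheet` is a hypothetical helper,
not part of the repository or of Viash.

``` bash
# Hypothetical pre-flight check (not part of rnaseq.vsh): verify that every
# FASTQ file listed in a --param_list sample sheet exists relative to the
# sheet's own directory. Assumes an unquoted CSV with the header
# id,fastq_1,fastq_2,strandedness and several files per column joined by ';'.
validate_sample_sheet() {
  local sheet="$1"
  local dir status header id fq1 fq2 strand f
  dir=$(dirname "$sheet")
  status=0
  {
    # Skip the header row.
    read -r header
    # The "|| [ -n ... ]" part also processes a last row without a newline.
    while IFS=, read -r id fq1 fq2 strand || [ -n "$id" ]; do
      # Split both columns on ';'; an empty fastq_2 yields no words.
      for f in $(printf '%s\n%s\n' "$fq1" "$fq2" | tr ';' '\n'); do
        if [ ! -e "$dir/$f" ]; then
          echo "$id: $dir/$f not found" >&2
          status=1
        fi
      done
    done
  } < "$sheet"
  return "$status"
}
```

For the example above, `validate_sample_sheet
testData/minimal_test/input_fastq/sample_sheet.csv` returns 0 once all
listed FASTQ files have been downloaded. File names containing spaces
are not supported by this sketch.
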
## Pipeline sub-workflows and components

The pipeline has 6 sub-workflows that can be run separately.

1.  Prepare genome: This is a workflow for preparing all the reference
    data required for downstream analysis, i.e., it uncompresses the
    provided reference data or generates the required index files (for
    STAR, Salmon, Kallisto, RSEM, BBSplit).

2.  Pre-processing: This is a workflow for performing quality control on
    the input reads. It performs FastQC, extracts UMIs, trims adapters,
    and removes ribosomal RNA reads. Adapters can be trimmed using
    either Trim Galore! or fastp (work in progress).

3.  Genome alignment and quantification: This is a workflow for
    performing genome alignment using STAR and transcript quantification
    using Salmon or RSEM (via RSEM’s built-in support for STAR; work in
    progress). Alignment sorting and indexing, as well as computation of
    statistics from the BAM files, are performed using Samtools.
    UMI-based deduplication is also performed.

4.  Post-processing: This is a workflow for duplicate read marking
    (Picard MarkDuplicates), transcript assembly and quantification
    (StringTie), and creation of bigWig coverage files.

5.  Pseudo-alignment and quantification: This is a workflow for
    performing pseudo-alignment and transcript quantification using
    Salmon or Kallisto.

6.  Final QC: This is a workflow for performing extensive quality
    control (RSeQC, dupRadar, Qualimap, Preseq, DESeq2, featureCounts).
    It presents QC for raw reads, alignments, gene biotype, sample
    similarity, and strand specificity (MultiQC).

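After `viash ns build`, each sub-workflow ends up in its own directory
under `target/nextflow/workflows/`. Only `pre_processing` and `rnaseq`
are confirmed by the commands above; the directory names of the other
sub-workflows are assumptions, so it is safest to list them first. A
minimal sketch of two helpers for that; `list_subworkflows`,
`run_subworkflow` and the `WORKFLOWS_ROOT`/`NEXTFLOW` overrides are
hypothetical and not part of the repository.

``` bash
# Hypothetical helpers (not part of rnaseq.vsh) around the build layout
# target/nextflow/workflows/<name>/main.nf produced by `viash ns build`.

# Print the name of every built workflow, one per line.
list_subworkflows() {
  local root="${1:-target/nextflow/workflows}"
  local nf
  for nf in "$root"/*/main.nf; do
    # An unmatched glob stays literal, so skip it.
    [ -e "$nf" ] || continue
    basename "$(dirname "$nf")"
  done
}

# Run one workflow with the Docker profile, forwarding all other arguments.
# WORKFLOWS_ROOT and NEXTFLOW are hypothetical overrides for the build
# directory and the nextflow executable.
run_subworkflow() {
  local name="$1"
  shift
  local main="${WORKFLOWS_ROOT:-target/nextflow/workflows}/$name/main.nf"
  if [ ! -f "$main" ]; then
    echo "no built workflow at $main; run 'viash ns build' first" >&2
    return 1
  fi
  "${NEXTFLOW:-nextflow}" run "$main" -profile docker "$@"
}
```

`list_subworkflows` prints the available names, and
`run_subworkflow pre_processing --id test` forwards every argument after
the workflow name to `nextflow run`.
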
## Reusing components from biobox

At the moment, this pipeline makes use of the following components from
[biobox](https://github.com/viash-hub/biobox):

- `gffread`
- `star/star_genome_generate`
- `star/star_align_reads`
- `salmon/salmon_index`
- `salmon/salmon_quant`
- `featurecounts`
- `samtools/samtools_sort`
- `samtools/samtools_index`
- `samtools/samtools_stats`
- `samtools/samtools_flagstat`
- `samtools/samtools_idxstats`
- `multiqc` (work in progress: updating `assets/multiqc_config.yml`)
- `fastp` (work in progress)
- `rsem/rsem_prepare_reference` (work in progress)
- `rsem/rsem_calculate_expression` (work in progress)
107
README.qmd
Normal file
@@ -0,0 +1,107 @@
---
title: RNAseq.vsh
format: gfm
---

<!-- README.md is generated by running 'quarto render README.qmd' -->

```{r, echo = FALSE, message = FALSE, error = FALSE, warning = FALSE}
library(tidyverse)
```

A version of the [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline (version 3.14.0) in the [Viash framework](http://www.viash.io).

## Rationale

We stick to the original nf-core pipeline as much as possible. This also means that we create a subworkflow for the 5 main stages of the pipeline as depicted in the [README](https://github.com/nf-core/rnaseq).

## Getting started

As test data, we can use the small dataset that nf-core provides with [their `test` profile](https://github.com/nf-core/test-datasets/blob/rnaseq3/samplesheet/v3.10/samplesheet_test.csv): <https://github.com/nf-core/test-datasets/tree/rnaseq3/testdata/GSE110004>.

A simple script has been provided to fetch those files from the GitHub repository and store them under `testData/minimal_test` (the subdirectory is created to support `full_test` later as well): `bin/get_minimal_test_data.sh`.

Additionally, a script has been provided to fetch extra resources for unit testing the components. These will be stored under `testData/unit_test_resources`: `bin/get_unit_test_data.sh`.

To get started, we need to:

1. [Install `nextflow`](https://www.nextflow.io/docs/latest/getstarted.html) system-wide

2. Fetch the test data:

   ``` bash
   bin/get_minimal_test_data.sh
   ```

## Running the pipeline

To actually run the pipeline, we first need to build the components and pipeline:

``` bash
viash ns build --setup cb --parallel
```

Now we can run a single sub-workflow, for example pre-processing, on one FASTQ file:

``` bash
nextflow run target/nextflow/workflows/pre_processing/main.nf \
  -profile docker \
  --id test \
  --fastq_1 testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz \
  --publish_dir testData/test_output/
```

Alternatively, we can run the full pipeline with a sample sheet using the built-in `--param_list` functionality. Read file paths must be specified relative to the sample sheet's location, and multiple FASTQ files for the same sample are separated by `;`:

``` bash
cat > testData/minimal_test/input_fastq/sample_sheet.csv << HERE
id,fastq_1,fastq_2,strandedness
WT_REP1,SRR6357070_1.fastq.gz;SRR6357071_1.fastq.gz,SRR6357070_2.fastq.gz;SRR6357071_2.fastq.gz,reverse
WT_REP2,SRR6357072_1.fastq.gz,SRR6357072_2.fastq.gz,reverse
RAP1_UNINDUCED_REP1,SRR6357073_1.fastq.gz,,reverse
HERE

nextflow run target/nextflow/workflows/rnaseq/main.nf \
  --param_list testData/minimal_test/input_fastq/sample_sheet.csv \
  --publish_dir "test_results/full_pipeline_test" \
  --fasta testData/minimal_test/reference/genome.fasta \
  --gtf testData/minimal_test/reference/genes.gtf.gz \
  --transcript_fasta testData/minimal_test/reference/transcriptome.fasta \
  -profile docker
```

## Pipeline sub-workflows and components

The pipeline has 6 sub-workflows that can be run separately.

1. Prepare genome: This is a workflow for preparing all the reference data required for downstream analysis, i.e., it uncompresses the provided reference data or generates the required index files (for STAR, Salmon, Kallisto, RSEM, BBSplit).

2. Pre-processing: This is a workflow for performing quality control on the input reads. It performs FastQC, extracts UMIs, trims adapters, and removes ribosomal RNA reads. Adapters can be trimmed using either Trim Galore! or fastp (work in progress).

3. Genome alignment and quantification: This is a workflow for performing genome alignment using STAR and transcript quantification using Salmon or RSEM (via RSEM's built-in support for STAR; work in progress). Alignment sorting and indexing, as well as computation of statistics from the BAM files, are performed using Samtools. UMI-based deduplication is also performed.

4. Post-processing: This is a workflow for duplicate read marking (Picard MarkDuplicates), transcript assembly and quantification (StringTie), and creation of bigWig coverage files.

5. Pseudo-alignment and quantification: This is a workflow for performing pseudo-alignment and transcript quantification using Salmon or Kallisto.

6. Final QC: This is a workflow for performing extensive quality control (RSeQC, dupRadar, Qualimap, Preseq, DESeq2, featureCounts). It presents QC for raw reads, alignments, gene biotype, sample similarity, and strand specificity (MultiQC).

## Reusing components from biobox

At the moment, this pipeline makes use of the following components from [biobox](https://github.com/viash-hub/biobox):

* `gffread`
* `star/star_genome_generate`
* `star/star_align_reads`
* `salmon/salmon_index`
* `salmon/salmon_quant`
* `featurecounts`
* `samtools/samtools_sort`
* `samtools/samtools_index`
* `samtools/samtools_stats`
* `samtools/samtools_flagstat`
* `samtools/samtools_idxstats`
* `multiqc` (work in progress: updating `assets/multiqc_config.yml`)
* `fastp` (work in progress)
* `rsem/rsem_prepare_reference` (work in progress)
* `rsem/rsem_calculate_expression` (work in progress)
13
_viash.yaml
Normal file
@@ -0,0 +1,13 @@
viash_version: 0.9.0

source: src
target: target

info:
  test_resources:
    - path: gs://viash-hub-test-data/rnaseq/v1
      dest: testData

config_mods: |
  .requirements.commands := ['ps']
  .runners[.type == 'nextflow'].directives.tag := '$id'
25
assets/methods_description_template.yml
Normal file
@@ -0,0 +1,25 @@
id: "rnaseq.vsh-methods-description"
description: "Suggested text and references to use when describing pipeline usage within the methods section of a publication."
section_name: "nf-core/rnaseq Methods Description"
section_href: "https://github.com/nf-core/rnaseq"
plot_type: "html"

data: |
  <h4>Methods</h4>
  <p>Data was processed using rnaseq.vsh, a version of the nf-core/rnaseq (v3.14.0) workflow written using the Viash framework.</p>
  <p>The pipeline was executed with Nextflow v${workflow.nextflow.version} (<a href="https://doi.org/10.1038/nbt.3820">Di Tommaso <em>et al.</em>, 2017</a>) with the following command:</p>
  <pre><code>${workflow.commandLine}</code></pre>
  <h4>References</h4>
  <ul>
    <li>Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. <a href="https://doi.org/10.1038/nbt.3820">https://doi.org/10.1038/nbt.3820</a></li>
    <li>Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276-278. <a href="https://doi.org/10.1038/s41587-020-0439-x">https://doi.org/10.1038/s41587-020-0439-x</a></li>
    <li>VIASH</li>
  </ul>
  <div class="alert alert-info">
    <h5>Notes:</h5>
    <ul>
      ${nodoi_text}
      <li>The command above does not include parameters contained in any configs or profiles that may have been used. Ensure the config file is also uploaded with your publication!</li>
      <li>You should also cite all software used within this run. Check the "Software Versions" section of this report to get version information.</li>
    </ul>
  </div>
11
assets/multiqc/biotypes_header.txt
Normal file
@@ -0,0 +1,11 @@
# id: 'biotype_counts'
# section_name: 'Biotype Counts'
# description: "shows reads overlapping genomic features of different biotypes,
#     counted by <a href='http://bioinf.wehi.edu.au/featureCounts'>featureCounts</a>."
# plot_type: 'bargraph'
# anchor: 'featurecounts_biotype'
# pconfig:
#     id: "featurecounts_biotype_plot"
#     title: "featureCounts: Biotypes"
#     xlab: "# Reads"
#     cpswitch_counts_label: "Number of Reads"
12
assets/multiqc/deseq2_clustering_header.txt
Normal file
@@ -0,0 +1,12 @@
#id: 'deseq2_clustering'
#section_name: 'DESeq2 sample similarity'
#description: "is generated from clustering by Euclidean distances between
#              <a href='https://bioconductor.org/packages/release/bioc/html/DESeq2.html' target='_blank'>DESeq2</a>
#              rlog values for each sample
#              in the <a href='https://github.com/nf-core/rnaseq/blob/master/bin/deseq2_qc.r'><code>deseq2_qc.r</code></a> script."
#plot_type: 'heatmap'
#anchor: 'deseq2_clustering'
#pconfig:
#    title: 'DESeq2: Heatmap of the sample-to-sample distances'
#    xlab: True
#    reverseColors: True
11
assets/multiqc/deseq2_pca_header.txt
Normal file
@@ -0,0 +1,11 @@
#id: 'deseq2_pca'
#section_name: 'DESeq2 PCA plot'
#description: "PCA plot between samples in the experiment.
#              These values are calculated using <a href='https://bioconductor.org/packages/release/bioc/html/DESeq2.html'>DESeq2</a>
#              in the <a href='https://github.com/nf-core/atacseq/blob/master/bin/deseq2_qc.r'><code>deseq2_qc.r</code></a> script."
#plot_type: 'scatter'
#anchor: 'deseq2_pca'
#pconfig:
#    title: 'DESeq2: Principal component plot'
#    xlab: PC1
#    ylab: PC2
167
assets/multiqc_config.yml
Normal file
@@ -0,0 +1,167 @@
report_comment: >
  This report has been generated by the <a href="https://github.com/data-intuitive/rnaseq.vsh">rnaseq.vsh</a>
  analysis pipeline.
report_section_order:
  "rnaseq.vsh-methods-description":
    order: -1000
  software_versions:
    order: -1001
  "rnaseq.vsh-summary":
    order: -1002

export_plots: true

# Run only these modules
run_modules:
  - custom_content
  - fastqc
  - cutadapt
  - fastp
  - sortmerna
  - star
  # - hisat2
  - rsem
  - salmon
  - kallisto
  - samtools
  - picard
  - preseq
  - rseqc
  - qualimap

# Order of modules
top_modules:
  - "fail_trimming"
  - "fail_mapping"
  - "fail_strand"
  - "star_rsem_deseq2_pca"
  - "star_rsem_deseq2_clustering"
  - "star_salmon_deseq2_pca"
  - "star_salmon_deseq2_clustering"
  - "salmon_deseq2_pca"
  - "salmon_deseq2_clustering"
  - "kallisto_deseq2_pca"
  - "kallisto_deseq2_clustering"
  - "biotype_counts"
  - "dupradar"

module_order:
  - fastqc:
      name: "FastQC (raw)"
      info: "This section of the report shows FastQC results before adapter trimming."
      path_filters:
        - "*.read_*.fastqc.zip"
  - cutadapt
  - fastp
  - fastqc:
      name: "FastQC (trimmed)"
      info: "This section of the report shows FastQC results after adapter trimming."
      path_filters:
        - "*.trimgalore.read_*.fastqc.zip"

# Don't show % Dups in the General Stats table (we have this from Picard)
table_columns_visible:
  fastqc:
    percent_duplicates: False

extra_fn_clean_exts:
  - ".salmon_quant"
  - ".mapping_quality"
  - ".genome_sorted"
  - ".MarkDuplicates"
  - ".MarkDuplicates_flagstat"
  - ".MarkDuplicates_stats"
  - ".genome_sorted_MarkDuplicates"
  - ".star_aligned"
  - ".read_1"
  - ".read_2"

# See https://github.com/ewels/MultiQC_TestData/blob/master/data/custom_content/with_config/table_headerconfig/multiqc_config.yaml
custom_data:
  fail_trimming:
    section_name: "WARNING: Fail Trimming Check"
    description: "List of samples that failed the minimum trimmed reads threshold specified via the '--min_trimmed_reads' parameter, and hence were ignored for the downstream processing steps."
    plot_type: "table"
    pconfig:
      id: "fail_trimmed_samples_table"
      table_title: "Samples failed trimming threshold"
      namespace: "Samples failed trimming threshold"
      format: "{:.0f}"
  fail_mapping:
    section_name: "WARNING: Fail Alignment Check"
    description: "List of samples that failed the STAR minimum mapped reads threshold specified via the '--min_mapped_reads' parameter, and hence were ignored for the downstream processing steps."
    plot_type: "table"
    pconfig:
      id: "fail_mapped_samples_table"
      table_title: "Samples failed mapping threshold"
      namespace: "Samples failed mapping threshold"
      format: "{:.2f}"
  fail_strand:
    section_name: "WARNING: Fail Strand Check"
    description: "List of samples that failed the strandedness check between that provided in the samplesheet and calculated by the <a href='http://rseqc.sourceforge.net/#infer-experiment-py'>RSeQC infer_experiment.py</a> tool."
    plot_type: "table"
    pconfig:
      id: "fail_strand_check_table"
      table_title: "Samples failed strandedness check"
      namespace: "Samples failed strandedness check"
      format: "{:.2f}"

# Customise the module search patterns to speed up execution time
#  - Skip module sub-tools that we are not interested in
#  - Replace file-content searching with filename pattern searching
#  - Don't add anything that is the same as the MultiQC default
# See https://multiqc.info/docs/#optimise-file-search-patterns for details
sp:

  fastqc/zip:
    fn: "*.fastqc.zip"

  cutadapt:
    fn: "*.trimming_report.txt"

  fastp:
    fn: "*.fastp.json"

  sortmerna:
    fn: "*sortmerna*.log"

  star:
    fn: "*.star_aligned.log.final.out"

  # hisat2:
  #   fn: "*.hisat2.summary.log"

  salmon/meta:
    fn: "*meta_info.json"

  preseq:
    fn: "*.lc_extrap.txt"

  samtools/stats:
    fn: "*.stats"
  samtools/flagstat:
    fn: "*.flagstat"
  samtools/idxstats:
    fn: "*.idxstats*"

  rseqc/bam_stat:
    fn: "*.mapping_quality.txt"
  rseqc/junction_saturation:
    fn: "*.junction_saturation_plot.r"
  rseqc/junction_annotation:
    fn: "*.junction_annotation.log"
  rseqc/read_duplication_pos:
    fn: "*.duplication_rate_mapping.xls"
  rseqc/read_distribution:
    fn: "*.read_distribution.txt"
  rseqc/infer_experiment:
    fn: "*.strandedness.txt"
  rseqc/inner_distance:
    fn: "*.inner_distance_freq.txt"
  rseqc/tin:
    fn: "*.tin_summary.txt"

  picard/markdups:
    fn: "*.MarkDuplicates.metrics.txt"

skip_versions_section: true
8
assets/rrna-db-defaults.txt
Normal file
@@ -0,0 +1,8 @@
https://raw.githubusercontent.com/biocore/sortmerna/v4.3.4/data/rRNA_databases/rfam-5.8s-database-id98.fasta
https://raw.githubusercontent.com/biocore/sortmerna/v4.3.4/data/rRNA_databases/rfam-5s-database-id98.fasta
https://raw.githubusercontent.com/biocore/sortmerna/v4.3.4/data/rRNA_databases/silva-arc-16s-id95.fasta
https://raw.githubusercontent.com/biocore/sortmerna/v4.3.4/data/rRNA_databases/silva-arc-23s-id98.fasta
https://raw.githubusercontent.com/biocore/sortmerna/v4.3.4/data/rRNA_databases/silva-bac-16s-id90.fasta
https://raw.githubusercontent.com/biocore/sortmerna/v4.3.4/data/rRNA_databases/silva-bac-23s-id98.fasta
https://raw.githubusercontent.com/biocore/sortmerna/v4.3.4/data/rRNA_databases/silva-euk-18s-id95.fasta
https://raw.githubusercontent.com/biocore/sortmerna/v4.3.4/data/rRNA_databases/silva-euk-28s-id98.fasta
105
bin/get_minimal_test_data.sh
Executable file
@@ -0,0 +1,105 @@
#!/bin/bash

CURR=`pwd`

### Get input fastq files for the minimal test

DEST_FASTQ="testData/minimal_test/input_fastq"
mkdir -p $DEST_FASTQ
cd $DEST_FASTQ

echo "Fetching FastQ files..."

# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357070_1.fastq.gz
# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357070_2.fastq.gz

# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357071_1.fastq.gz
# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357071_2.fastq.gz

# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357072_1.fastq.gz
# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357072_2.fastq.gz

# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357073_1.fastq.gz

# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357074_1.fastq.gz

# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357075_1.fastq.gz

# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357076_1.fastq.gz
# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/testdata/GSE110004/SRR6357076_2.fastq.gz

wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357070_1.fastq.gz
wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357070_2.fastq.gz

wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357071_1.fastq.gz
wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357071_2.fastq.gz

wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357072_1.fastq.gz
wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357072_2.fastq.gz

wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357073_1.fastq.gz

wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357074_1.fastq.gz

wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357075_1.fastq.gz

wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357076_1.fastq.gz
wget https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/GSE110004/SRR6357076_2.fastq.gz

cd $CURR

### Get reference files for the minimal test

DEST_REF="testData/minimal_test/reference"
mkdir -p $DEST_REF
cd $DEST_REF

echo "Fetching reference data..."

# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/bbsplit_fasta_list.txt
# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/genes.gff.gz
# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/genes.gtf.gz
# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/genome.fasta
# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/gfp.fa.gz
# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/hisat2.tar.gz
# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/rsem.tar.gz
# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/salmon.tar.gz
# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3/reference/transcriptome.fasta

wget https://raw.githubusercontent.com/nf-core/rnaseq/3.12.0/assets/rrna-db-defaults.txt

wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/genome.fasta
wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/genes.gtf.gz
wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/genes.gff.gz
wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/transcriptome.fasta
wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/gfp.fa.gz

wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/bbsplit_fasta_list.txt
# wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/hisat2.tar.gz
wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/salmon.tar.gz
wget https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/reference/rsem.tar.gz

cd $CURR

NEWDEST1_REF="$CURR/testData/minimal_test/reference/rRNA"
mkdir -p $NEWDEST1_REF
cd $NEWDEST1_REF
for LINE in `cat ../rrna-db-defaults.txt`
do
  wget $LINE
done
cd $CURR
find $NEWDEST1_REF -type f > $DEST_REF/rrna-db-defaults.txt

NEWDEST2_REF="$CURR/testData/minimal_test/reference/bbsplit_fasta"
mkdir -p $NEWDEST2_REF
while IFS=, read -r -a line; do
  url="${line[1]}"
  name="$NEWDEST2_REF/${line[0]}.fa"
  wget $url -O "$name"
  line+=("$name")
  IFS=','
  echo "${line[*]}" >> "$NEWDEST2_REF/tmp.txt"
done < "$DEST_REF/bbsplit_fasta_list.txt"
cut -d',' -f1,3 "$NEWDEST2_REF/tmp.txt" > "$DEST_REF/bbsplit_fasta_list.txt"
rm "$NEWDEST2_REF/tmp.txt"
50
bin/get_unit_test_data.sh
Executable file
@@ -0,0 +1,50 @@
|
||||
#!/bin/bash
|
||||
|
||||
CURR=`pwd`
|
||||
DEST="testData/unit_test_resources"
|
||||
mkdir -p $DEST
|
||||
cd $DEST
|
||||
|
||||
echo "Fetching unit test resources..."
|
||||
|
||||
## UMI_TOOLS
|
||||
# extract
|
||||
wget https://github.com/CGATOxford/UMI-tools/raw/master/tests/slim.fastq.gz
|
||||
wget https://github.com/CGATOxford/UMI-tools/raw/master/tests/scrb_seq_fastq.1.gz
|
||||
wget https://github.com/CGATOxford/UMI-tools/raw/master/tests/scrb_seq_fastq.2.gz
|
||||
# dedup
|
||||
wget https://github.com/CGATOxford/UMI-tools/raw/master/tests/chr19.bam
|
||||
wget https://github.com/CGATOxford/UMI-tools/raw/master/tests/chr19.bam.bai
|
||||
|
||||
# MultiQC
|
||||
wget https://multiqc.info/examples/rna-seq/data.zip
|
||||
|
||||
# dupRadar
|
||||
wget https://github.com/ssayols/dupRadar/raw/master/inst/extdata/genes.gtf
|
||||
wget https://github.com/ssayols/dupRadar/raw/master/inst/extdata/wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2.bam
wget https://github.com/ssayols/dupRadar/raw/master/inst/extdata/wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2.bam.bai


### Resources from https://github.com/snakemake/snakemake-wrappers/tree/master/bio
# DESeq2
wget https://github.com/snakemake/snakemake-wrappers/raw/master/bio/deseq2/deseqdataset/test/dataset/counts.tsv

# preseq lc_extrap
wget https://github.com/snakemake/snakemake-wrappers/raw/master/bio/preseq/lc_extrap/test/samples/a.sorted.bed
wget https://github.com/smithlabcode/preseq/raw/master/data/SRR1106616_5M_subset.bam


### nf-core test datasets
# sarscov2
mkdir -p sarscov2
wget -O sarscov2/genome.sizes https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/genome.sizes
wget -O sarscov2/test.bedgraph https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/illumina/bedgraph/test.bedgraph
wget -O sarscov2/genome.fasta https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/genome.fasta
wget -O sarscov2/genome.fasta.fai https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/genome.fasta.fai
wget -O sarscov2/test.paired_end.sorted.bam https://github.com/nf-core/test-datasets/raw/modules/data/genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam
wget -O sarscov2/test.paired_end.sorted.bam.bai https://github.com/nf-core/test-datasets/raw/modules/data/genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam.bai
wget -O sarscov2/test.bed https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/bed/test.bed
wget -O sarscov2/test.bed12 https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/bed/test.bed12
wget -O sarscov2/genome.gtf https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/genome.gtf

cd "$CURR"
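The download script above calls `wget` once per file and fetches every file again on each run. Below is a minimal sketch of a helper that skips files that are already present and non-empty. The helper name `fetch` is hypothetical and is not part of the repository. It assumes the same `wget -O <dest> <url>` usage as the script.

```shell
#!/bin/bash
set -eo pipefail

# Hypothetical helper (not in the original script): download URL $2 to path $1
# unless a non-empty file already exists at that path.
fetch() {
  local dest="$1" url="$2"
  if [ -s "$dest" ]; then
    echo "Skipping $dest (already present)" >&2
    return 0
  fi
  # Create the target directory (e.g. sarscov2/) if needed
  mkdir -p "$(dirname "$dest")"
  wget -O "$dest" "$url"
}
```

With this helper, a line such as `fetch sarscov2/genome.sizes https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/genome.sizes` would replace the matching `wget -O` call. The `mkdir -p sarscov2` step would then no longer be needed.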
3
main.nf
Normal file
@@ -0,0 +1,3 @@
workflow {
  print("This is a dummy placeholder for pipeline execution. Please use the corresponding nf files for running pipelines.")
}
27
nextflow.config
Normal file
@@ -0,0 +1,27 @@
// template nextflow.config for nested workflows

manifest {
  nextflowVersion = '!>=20.12.1-edge'
}

docker {
  fixOwnership = true
}


// TODO 1: uncomment and adapt `rootDir` according to the relative path within the project
// params {
//   rootDir = "$projectDir/../.."
// }
//
// workflowDir = "${params.rootDir}/workflows"
// targetDir = "${params.rootDir}/target/nextflow"

// TODO 2: insert custom imports here

// TODO 3: uncomment
// docker {
//   runOptions = "-v \$(realpath ${params.rootDir}):\$(realpath ${params.rootDir})"
// }
89
src/bbmap_bbsplit/config.vsh.yaml
Normal file
@@ -0,0 +1,89 @@
name: "bbmap_bbsplit"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/bbmap/bbsplit/main.nf, modules/nf-core/bbmap/bbsplit/meta.yml]
    last_sha: 277bd337739a8b8f753fa7b5eda6743b9b6acb89

description: |
  Split sequencing reads by mapping them to multiple references simultaneously.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--id"
        type: string
        description: Sample ID
      - name: "--paired"
        type: boolean
        default: false
        description: Whether the input consists of paired-end FASTQ files.
      - name: "--input"
        type: file
        multiple: true
        multiple_sep: ","
        description: Input FASTQ files, either one (single-end) or two (paired-end).
        example: sample.fastq
      - name: "--primary_ref"
        type: file
        description: Primary reference FASTA
      - name: "--bbsplit_fasta_list"
        type: file
        description: Path to a comma-separated file with one `name,path` line per non-primary reference genome to filter reads against with BBSplit.
      - name: "--only_build_index"
        type: boolean
        default: false
        description: If true, only build the BBSplit index; if false, map (split) the reads.
      - name: "--built_bbsplit_index"
        type: file
        description: Directory with a pre-built BBSplit index, used instead of the FASTA inputs.

  - name: "Output"
    arguments:
      - name: "--fastq_1"
        type: file
        required: false
        description: Output file for read 1.
        direction: output
        must_exist: false
        default: $id.$key.read_1.fastq
      - name: "--fastq_2"
        type: file
        required: false
        must_exist: false
        description: Output file for read 2.
        direction: output
        default: $id.$key.read_2.fastq
      - name: "--bbsplit_index"
        type: file
        description: Output directory for the BBSplit index files.
        direction: output
        must_exist: false
        default: BBSplit_index

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/genome.fasta
  - path: /testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz
  - path: /testData/minimal_test/input_fastq/SRR6357070_2.fastq.gz
  - path: /testData/minimal_test/reference/bbsplit_fasta/sarscov2.fa
  - path: /testData/minimal_test/reference/bbsplit_fasta/human.fa

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: docker
        run: |
          apt-get update && \
          apt-get install -y build-essential openjdk-17-jdk wget tar && \
          wget --no-check-certificate https://sourceforge.net/projects/bbmap/files/BBMap_39.01.tar.gz && \
          tar xzf BBMap_39.01.tar.gz && \
          cp -r bbmap/* /usr/local/bin
runners:
  - type: executable
  - type: nextflow
65
src/bbmap_bbsplit/script.sh
Executable file
@@ -0,0 +1,65 @@
#!/bin/bash

set -eo pipefail

function clean_up {
    # tmpdir is only set in mapping mode; rm -rf "" is a no-op
    rm -rf "$tmpdir"
}
trap clean_up EXIT

avail_mem=3072

# Parse "name,path" lines into ref_<name>=<path> arguments for bbsplit.sh
# (the "|| [ -n "$name" ]" keeps a last line without a trailing newline)
other_refs=()
if [ ! -d "$par_built_bbsplit_index" ] && [ -f "$par_bbsplit_fasta_list" ]; then
    while IFS="," read -r name path || [ -n "$name" ]; do
        other_refs+=("ref_$name=$path")
    done < "$par_bbsplit_fasta_list"
fi

if [ "$par_only_build_index" == "true" ]; then
    if [ -f "$par_primary_ref" ] && [ ${#other_refs[@]} -gt 0 ]; then
        bbsplit.sh \
            -Xmx${avail_mem}M \
            ref_primary="$par_primary_ref" "${other_refs[@]}" \
            path="$par_bbsplit_index" \
            threads=${meta_cpus:-1}
    else
        echo "ERROR: Please specify as input a primary fasta file along with names and paths to non-primary fasta files." >&2
        exit 1
    fi
else
    IFS="," read -ra input <<< "$par_input"
    tmpdir=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXXXX")
    # Either reuse a pre-built index or build one on the fly from the FASTA files
    index_files=()
    if [ -d "$par_built_bbsplit_index" ]; then
        index_files=("path=$par_built_bbsplit_index")
    elif [ -f "$par_primary_ref" ] && [ ${#other_refs[@]} -gt 0 ]; then
        index_files=("ref_primary=$par_primary_ref" "${other_refs[@]}")
    else
        echo "ERROR: Please either specify a BBSplit index as input or a primary fasta file along with names and paths to non-primary fasta files." >&2
        exit 1
    fi
    if [ "$par_paired" == "true" ]; then
        bbsplit.sh \
            -Xmx${avail_mem}M \
            "${index_files[@]}" \
            threads=${meta_cpus:-1} \
            in="${input[0]}" \
            in2="${input[1]}" \
            basename="${tmpdir}/%_#.fastq" \
            refstats=bbsplit_stats.txt
        # Keep only the reads assigned to the primary reference
        read1=$(find "$tmpdir/" -iname 'primary_1*')
        read2=$(find "$tmpdir/" -iname 'primary_2*')
        cp "$read1" "$par_fastq_1"
        cp "$read2" "$par_fastq_2"
    else
        bbsplit.sh \
            -Xmx${avail_mem}M \
            "${index_files[@]}" \
            threads=${meta_cpus:-1} \
            in="${input[0]}" \
            basename="${tmpdir}/%.fastq" \
            refstats=bbsplit_stats.txt
        read1=$(find "$tmpdir/" -iname 'primary*')
        cp "$read1" "$par_fastq_1"
    fi
fi
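Below is a standalone sketch of the `name,path` parsing that `script.sh` performs on `--bbsplit_fasta_list`. The function name `read_ref_args` is hypothetical. It prints one `ref_<name>=<path>` argument per line, skips blank lines, and still reads a last line that lacks a trailing newline.

```shell
#!/bin/bash
set -eo pipefail

# Hypothetical function: read "name,path" lines from a file and print one
# "ref_<name>=<path>" argument per line, as expected by bbsplit.sh.
read_ref_args() {
  local list_file="$1" name path
  # "|| [ -n "$name" ]" keeps a final line that has no trailing newline
  while IFS="," read -r name path || [ -n "$name" ]; do
    # Skip blank lines so they do not produce a bare "ref_=" argument
    [ -z "$name" ] && continue
    printf 'ref_%s=%s\n' "$name" "$path"
  done < "$list_file"
}
```

To keep paths containing spaces intact, collect the output into an array, e.g. `mapfile -t other_refs < <(read_ref_args bbsplit_fasta_list.txt)`. Then pass it on as `"${other_refs[@]}"`.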
86
src/bbmap_bbsplit/test.sh
Normal file
@@ -0,0 +1,86 @@
#!/bin/bash

echo ">>> Test $meta_functionality_name"

cat > bbsplit_fasta_list.txt << HERE
sarscov2,$meta_resources_dir/sarscov2.fa
human,$meta_resources_dir/human.fa
HERE

echo ">>> Building BBSplit index"
"$meta_executable" \
    --primary_ref "$meta_resources_dir/genome.fasta" \
    --bbsplit_fasta_list "bbsplit_fasta_list.txt" \
    --only_build_index true \
    --bbsplit_index "BBSplit_index"

echo ">>> Check whether output exists"
[ ! -d "BBSplit_index" ] && echo "BBSplit index does not exist!" && exit 1
[ -z "$(ls -A 'BBSplit_index')" ] && echo "BBSplit index is empty!" && exit 1

echo ">>> Splitting reads against the primary and non-primary references"

echo ">>> Testing with single-end reads and primary/non-primary FASTA files"
"$meta_executable" \
    --paired false \
    --input "$meta_resources_dir/SRR6357070_1.fastq.gz" \
    --only_build_index false \
    --primary_ref "$meta_resources_dir/genome.fasta" \
    --bbsplit_fasta_list "bbsplit_fasta_list.txt" \
    --fastq_1 "filtered_SRR6357070_1.fastq.gz"

echo ">>> Check whether output exists"
[ ! -f "filtered_SRR6357070_1.fastq.gz" ] && echo "Filtered reads file does not exist!" && exit 1
[ ! -s "filtered_SRR6357070_1.fastq.gz" ] && echo "Filtered reads file is empty!" && exit 1

rm filtered_SRR6357070_1.fastq.gz

echo ">>> Testing with paired-end reads and primary/non-primary FASTA files"
"$meta_executable" \
    --paired true \
    --input "$meta_resources_dir/SRR6357070_1.fastq.gz,$meta_resources_dir/SRR6357070_2.fastq.gz" \
    --only_build_index false \
    --primary_ref "$meta_resources_dir/genome.fasta" \
    --bbsplit_fasta_list "bbsplit_fasta_list.txt" \
    --fastq_1 "filtered_SRR6357070_1.fastq.gz" \
    --fastq_2 "filtered_SRR6357070_2.fastq.gz"

echo ">>> Check whether output exists"
[ ! -f "filtered_SRR6357070_1.fastq.gz" ] && echo "Filtered read 1 file does not exist!" && exit 1
[ ! -s "filtered_SRR6357070_1.fastq.gz" ] && echo "Filtered read 1 file is empty!" && exit 1
[ ! -f "filtered_SRR6357070_2.fastq.gz" ] && echo "Filtered read 2 file does not exist!" && exit 1
[ ! -s "filtered_SRR6357070_2.fastq.gz" ] && echo "Filtered read 2 file is empty!" && exit 1

rm filtered_SRR6357070_1.fastq.gz filtered_SRR6357070_2.fastq.gz

echo ">>> Testing with single-end reads and BBSplit index"
"$meta_executable" \
    --paired false \
    --input "$meta_resources_dir/SRR6357070_1.fastq.gz" \
    --only_build_index false \
    --built_bbsplit_index "BBSplit_index" \
    --fastq_1 "filtered_SRR6357070_1.fastq.gz"

echo ">>> Check whether output exists"
[ ! -f "filtered_SRR6357070_1.fastq.gz" ] && echo "Filtered reads file does not exist!" && exit 1
[ ! -s "filtered_SRR6357070_1.fastq.gz" ] && echo "Filtered reads file is empty!" && exit 1

echo ">>> Testing with paired-end reads and BBSplit index"
"$meta_executable" \
    --paired true \
    --input "$meta_resources_dir/SRR6357070_1.fastq.gz,$meta_resources_dir/SRR6357070_2.fastq.gz" \
    --only_build_index false \
    --built_bbsplit_index "BBSplit_index" \
    --fastq_1 "filtered_SRR6357070_1.fastq.gz" \
    --fastq_2 "filtered_SRR6357070_2.fastq.gz"

echo ">>> Check whether output exists"
[ ! -f "filtered_SRR6357070_1.fastq.gz" ] && echo "Filtered read 1 file does not exist!" && exit 1
[ ! -s "filtered_SRR6357070_1.fastq.gz" ] && echo "Filtered read 1 file is empty!" && exit 1
[ ! -f "filtered_SRR6357070_2.fastq.gz" ] && echo "Filtered read 2 file does not exist!" && exit 1
[ ! -s "filtered_SRR6357070_2.fastq.gz" ] && echo "Filtered read 2 file is empty!" && exit 1

rm filtered_SRR6357070_1.fastq.gz filtered_SRR6357070_2.fastq.gz

echo "All tests succeeded!"
exit 0
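The test script above repeats the same "does not exist" / "is empty" pair of checks for every output file. Below is a minimal sketch of a helper that folds each pair into one call. The name `assert_nonempty_file` is hypothetical and does not appear in the repository.

```shell
#!/bin/bash

# Hypothetical helper: fail (return 1) with a message if the file is missing
# or empty; succeed silently otherwise.
assert_nonempty_file() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "File '$file' does not exist!" >&2
    return 1
  fi
  if [ ! -s "$file" ]; then
    echo "File '$file' is empty!" >&2
    return 1
  fi
}
```

Each pair of checks would then become, for example, `assert_nonempty_file filtered_SRR6357070_1.fastq.gz || exit 1`.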
56
src/bedtools_genomecov/config.vsh.yaml
Normal file
@@ -0,0 +1,56 @@
name: bedtools_genomecov
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/local/bedtools_genomecov.nf]
    last_sha: 0a1bdcfbb498987643b74e9fccab85ccd9f2a17d
description: Compute BEDGRAPH (-bg) summaries of feature coverage

argument_groups:
  - name: "Input"
    arguments:
      - name: "--strandedness"
        type: string
        choices: ["unstranded", "forward", "reverse", "auto"]
        description: Sample strand-specificity.
      - name: "--bam"
        type: file
        description: Genome BAM file
      - name: "--extra_bedtools_args"
        type: string
        default: ''
        description: Extra arguments passed to both `bedtools genomecov` calls.

  - name: "Output"
    arguments:
      - name: "--bedgraph_forward"
        type: file
        default: $id.forward.bedgraph
        direction: output
      - name: "--bedgraph_reverse"
        type: file
        default: $id.reverse.bedgraph
        direction: output

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/chr19.bam

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: docker
        run: |
          apt-get update && \
          apt-get install -y build-essential wget && \
          wget --no-check-certificate https://github.com/arq5x/bedtools2/releases/download/v2.31.0/bedtools.static && \
          mv bedtools.static /usr/local/bin/bedtools && \
          chmod a+x /usr/local/bin/bedtools
runners:
  - type: executable
  - type: nextflow
25
src/bedtools_genomecov/script.sh
Normal file
@@ -0,0 +1,25 @@
#!/bin/bash

set -eo pipefail

# For reverse-stranded libraries, swap which output receives "+" and "-" strand coverage
prefix_forward="forward"
prefix_reverse="reverse"
if [ "$par_strandedness" == "reverse" ]; then
    prefix_forward="reverse"
    prefix_reverse="forward"
fi

# $par_extra_bedtools_args is left unquoted on purpose so that it splits into separate arguments
bedtools genomecov \
    -ibam "$par_bam" \
    -bg \
    -strand + \
    $par_extra_bedtools_args | bedtools sort > "$prefix_forward.bedGraph"

bedtools genomecov \
    -ibam "$par_bam" \
    -bg \
    -strand - \
    $par_extra_bedtools_args | bedtools sort > "$prefix_reverse.bedGraph"

mv "$prefix_forward.bedGraph" "$par_bedgraph_forward"
mv "$prefix_reverse.bedGraph" "$par_bedgraph_reverse"
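Below is a standalone sketch of the label swap in `script.sh`. It prints which output label receives the `-strand +` coverage and which receives the `-strand -` coverage. The function name `strand_labels` is hypothetical. The swap for `reverse` presumably reflects that, in a reverse-stranded library, reads align to the strand opposite the transcript. Every other value (`unstranded`, `forward`, `auto`) keeps the default order, as in the script.

```shell
#!/bin/bash

# Hypothetical function: print "<label for -strand +> <label for -strand ->"
# for a given --strandedness value, mirroring the if-block in script.sh.
strand_labels() {
  local strandedness="$1"
  if [ "$strandedness" == "reverse" ]; then
    echo "reverse forward"
  else
    echo "forward reverse"
  fi
}
```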
22
src/bedtools_genomecov/test.sh
Normal file
@@ -0,0 +1,22 @@
#!/bin/bash

id="SRR6357070"
echo ">>> Testing $meta_functionality_name"

"$meta_executable" \
    --strandedness unstranded \
    --bam "$meta_resources_dir/chr19.bam" \
    --bedgraph_forward chr19_forward.bedgraph \
    --bedgraph_reverse chr19_reverse.bedgraph

exit_code=$?
[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1

# check whether output exists
[ ! -f "chr19_forward.bedgraph" ] && echo "File 'chr19_forward.bedgraph' does not exist!" && exit 1
[ ! -s "chr19_forward.bedgraph" ] && echo "File 'chr19_forward.bedgraph' is empty!" && exit 1
[ ! -f "chr19_reverse.bedgraph" ] && echo "File 'chr19_reverse.bedgraph' does not exist!" && exit 1
[ ! -s "chr19_reverse.bedgraph" ] && echo "File 'chr19_reverse.bedgraph' is empty!" && exit 1

echo "All tests succeeded!"
exit 0
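The test above only checks that both bedGraph files exist and are non-empty. Below is a sketch of a stricter, hypothetical format check (`is_valid_bedgraph`, not part of the repository). It requires every data line to have four tab-separated fields (chrom, start, end, value) with start < end. It skips optional `track`, `browser` and `#` header lines.

```shell
#!/bin/bash

# Hypothetical check: exit status 0 if every data line of the bedGraph has
# 4 tab-separated fields and a non-empty interval, 1 otherwise.
is_valid_bedgraph() {
  awk -F'\t' '
    $0 ~ /^(track|browser)/ || substr($0, 1, 1) == "#" { next }
    NF != 4 || $2 >= $3 { bad = 1; exit }
    END { exit bad }
  ' "$1"
}
```

In `test.sh`, this could be used as `is_valid_bedgraph chr19_forward.bedgraph || exit 1`.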
54
src/cat_additional_fasta/config.vsh.yaml
Normal file
@@ -0,0 +1,54 @@
name: "cat_additional_fasta"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/local/cat_additional_fasta.nf]
    last_sha: 0a1bdcfbb498987643b74e9fccab85ccd9f2a17d

description: |
  Concatenate an additional FASTA file to the reference FASTA file, and append a matching GTF entry for each of its records to the GTF file.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--fasta"
        type: file
        required: true
        description: Path to FASTA genome file.
      - name: "--gtf"
        type: file
        description: Path to GTF annotation file.
      - name: "--additional_fasta"
        type: file
        description: FASTA file to concatenate to the genome FASTA file, e.g. containing spike-in sequences.
      - name: "--biotype"
        type: string
        description: Name of the GTF attribute (e.g. gene_biotype) that is set to "transgene" for the entries appended to the GTF file.

  - name: "Output"
    arguments:
      - name: "--fasta_output"
        type: file
        direction: output
        description: Concatenated FASTA file.
      - name: "--gtf_output"
        type: file
        direction: output
        description: Concatenated GTF file.

resources:
  - type: python_script
    path: script.py

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/genome.fasta
  - path: /testData/minimal_test/reference/genes.gtf.gz
  - path: /testData/minimal_test/reference/gfp.fa.gz

engines:
  - type: docker
    image: python
runners:
  - type: executable
  - type: nextflow
80
src/cat_additional_fasta/script.py
Normal file
@@ -0,0 +1,80 @@
#!/usr/bin/env python3
"""
Read a custom fasta file and create a custom GTF containing each entry
"""
from itertools import groupby
import os

## VIASH START
par = {
    "fasta": "testData/minimal_test/reference/genome.fasta",
    "gtf": "testData/minimal_test/reference/genes.gtf",
    "additional_fasta": "testData/minimal_test/reference/gfp.fa",
    "biotype": "gene_biotype",
    "fasta_output": "genome_gfp.fasta",
    "gtf_output": "genome_gfp.gtf",
}
meta = {
    "functionality_name": "cat_additional_fasta"
}
## VIASH END


def fasta_iter(fasta_name):
    """
    modified from Brent Pedersen
    Correct Way To Parse A Fasta File In Python
    given a fasta file. yield tuples of header, sequence

    Fasta iterator from https://www.biostars.org/p/710/#120760
    """
    with open(fasta_name) as fh:
        # ditch the boolean (x[0]) and just keep the header or sequence since
        # we know they alternate.
        faiter = (x[1] for x in groupby(fh, lambda line: line[0] == ">"))
        for header in faiter:
            # drop the ">"
            headerStr = next(header)[1:].strip()
            # join all sequence lines to one.
            seq = "".join(s.strip() for s in next(faiter))
            yield (headerStr, seq)


def fasta2gtf(fasta, output, biotype):
    fiter = fasta_iter(fasta)
    # GTF output lines
    lines = []
    attributes = 'exon_id "{name}.1"; exon_number "1";{biotype} gene_id "{name}_gene"; gene_name "{name}_gene"; gene_source "custom"; transcript_id "{name}_gene"; transcript_name "{name}_gene";\n'
    line_template = "{name}\ttransgene\texon\t1\t{length}\t.\t+\t.\t" + attributes
    for ff in fiter:
        name, seq = ff
        # Use first ID as separated by spaces as the "sequence name"
        # (equivalent to "chromosome" in other cases)
        seqname = name.split()[0]
        # Remove all spaces
        name = seqname.replace(" ", "_")
        length = len(seq)
        biotype_attr = ""
        if biotype:
            biotype_attr = f' {biotype} "transgene";'
        line = line_template.format(name=name, length=length, biotype=biotype_attr)
        lines.append(line)
    with open(output, "w") as f:
        f.write("".join(lines))


add_name = os.path.basename(par['additional_fasta'])
output = os.path.splitext(add_name)[0] + ".gtf"
fasta2gtf(par['additional_fasta'], output, par['biotype'])

with open(par['fasta'], 'r') as f1:
    content1 = f1.read()
with open(par['additional_fasta'], 'r') as f2:
    content2 = f2.read()
# Make sure the last genome line is newline-terminated before appending
if content1 and not content1.endswith("\n"):
    content1 += "\n"
with open(par['fasta_output'], 'w') as f_out:
    f_out.write(content1 + content2)

with open(par['gtf'], 'r') as g1:
    g_content1 = g1.read()
with open(output, 'r') as g2:
    g_content2 = g2.read()
if g_content1 and not g_content1.endswith("\n"):
    g_content1 += "\n"
with open(par['gtf_output'], 'w') as g_out:
    g_out.write(g_content1 + g_content2)
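For readers who prefer to stay in the shell, the GTF-building step of `script.py` (`fasta2gtf`) can be approximated with awk. This is a sketch, not a drop-in replacement. The function name `fasta_to_gtf` is hypothetical, the input is assumed to be an uncompressed FASTA, and the optional second argument plays the role of `--biotype`. It writes the same fixed attribute string as `fasta2gtf`, one exon line per record, with the sequence length as the end coordinate.

```shell
#!/bin/bash

# Hypothetical awk port of fasta2gtf: one GTF exon line per FASTA record.
# Usage: fasta_to_gtf <fasta> [biotype_key]
fasta_to_gtf() {
  awk -v biotype="$2" '
    function emit() {
      if (name == "") return
      attr = "exon_id \"" name ".1\"; exon_number \"1\";"
      if (biotype != "") attr = attr " " biotype " \"transgene\";"
      attr = attr " gene_id \"" name "_gene\"; gene_name \"" name "_gene\"; gene_source \"custom\"; transcript_id \"" name "_gene\"; transcript_name \"" name "_gene\";"
      printf "%s\ttransgene\texon\t1\t%d\t.\t+\t.\t%s\n", name, len, attr
    }
    # Header: flush the previous record, keep the first word as the name
    /^>/ { emit(); split(substr($0, 2), f, " "); name = f[1]; len = 0; next }
    # Sequence line: count bases, ignoring whitespace and carriage returns
    { gsub(/[ \t\r]/, ""); len += length($0) }
    END { emit() }
  ' "$1"
}
```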
26
src/cat_additional_fasta/test.sh
Normal file
@@ -0,0 +1,26 @@
#!/bin/bash

echo ">>> Testing $meta_functionality_name"

# Decompress the gzipped test resources in place (genes.gtf.gz -> genes.gtf, gfp.fa.gz -> gfp.fa)
gunzip "$meta_resources_dir/genes.gtf.gz"
gunzip "$meta_resources_dir/gfp.fa.gz"

"$meta_executable" \
    --fasta "$meta_resources_dir/genome.fasta" \
    --gtf "$meta_resources_dir/genes.gtf" \
    --additional_fasta "$meta_resources_dir/gfp.fa" \
    --biotype gene_biotype \
    --fasta_output genome_gfp.fasta \
    --gtf_output genome_gfp.gtf

exit_code=$?
[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1

echo ">>> Checking whether output exists"
[ ! -f "genome_gfp.fasta" ] && echo "File 'genome_gfp.fasta' does not exist!" && exit 1
[ ! -s "genome_gfp.fasta" ] && echo "File 'genome_gfp.fasta' is empty!" && exit 1
[ ! -f "genome_gfp.gtf" ] && echo "File 'genome_gfp.gtf' does not exist!" && exit 1
[ ! -s "genome_gfp.gtf" ] && echo "File 'genome_gfp.gtf' is empty!" && exit 1

echo "All tests succeeded!"
exit 0
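The test above checks only that the outputs exist. Below is a sketch of a hypothetical extra check (`check_fasta_concat`, not in the repository). It verifies that the concatenated FASTA contains exactly as many records as the genome and additional FASTA files together. Records are counted by their `>` header lines.

```shell
#!/bin/bash

# Count FASTA records by header lines. grep -c prints 0 but exits 1 when
# there is no match, so "|| true" keeps the printed count usable.
count_fasta_records() {
  grep -c '^>' "$1" || true
}

# Hypothetical check: succeed only if the combined file has as many records
# as the two inputs together.
check_fasta_concat() {
  local genome="$1" extra="$2" combined="$3"
  local expected=$(( $(count_fasta_records "$genome") + $(count_fasta_records "$extra") ))
  [ "$(count_fasta_records "$combined")" -eq "$expected" ]
}
```

In `test.sh`, this could be used as `check_fasta_concat "$meta_resources_dir/genome.fasta" "$meta_resources_dir/gfp.fa" genome_gfp.fasta || exit 1`.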
54
src/cat_fastq/config.vsh.yaml
Normal file
@@ -0,0 +1,54 @@
name: "cat_fastq"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/cat/fastq/main.nf, modules/nf-core/cat/fastq/meta.yml]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
description: Concatenate multiple fastq files

argument_groups:
  - name: "Input"
    arguments:
      - name: "--read_1"
        type: file
        multiple: true
        multiple_sep: ";"
        description: Read 1 fastq files to be concatenated
      - name: "--read_2"
        type: file
        multiple: true
        multiple_sep: ";"
        description: Read 2 fastq files to be concatenated

  - name: "Output"
    arguments:
      - name: "--fastq_1"
        type: file
        direction: output
        default: $id.read_1.merged.fastq
        description: Concatenated read 1 fastq
      - name: "--fastq_2"
        type: file
        direction: output
        must_exist: false
        default: $id.read_2.merged.fastq
        description: Concatenated read 2 fastq

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz
  - path: /testData/minimal_test/input_fastq/SRR6357071_1.fastq.gz
  - path: /testData/minimal_test/input_fastq/SRR6357070_2.fastq.gz
  - path: /testData/minimal_test/input_fastq/SRR6357071_2.fastq.gz

engines:
  - type: docker
    image: ubuntu:22.04
runners:
  - type: executable
  - type: nextflow
20
src/cat_fastq/script.sh
Normal file
@@ -0,0 +1,20 @@
#!/bin/bash

set -eo pipefail

# Split the ";"-separated input lists into arrays
IFS=";" read -ra read_1 <<< "$par_read_1"
IFS=";" read -ra read_2 <<< "$par_read_2"

# Choose the reader from the extension of the first read 1 file
filename=$(basename -- "${read_1[0]}")
if [ "${filename##*.}" == "gz" ]; then
    command="zcat"
else
    command="cat"
fi

if [ ${#read_1[@]} -gt 0 ]; then
    $command "${read_1[@]}" > "$par_fastq_1"
fi
if [ ${#read_2[@]} -gt 0 ]; then
    $command "${read_2[@]}" > "$par_fastq_2"
fi
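`script.sh` above chooses `zcat` or `cat` once, based on the first read 1 file, and applies that reader to every input. Below is a sketch of a hypothetical per-file variant (`concat_fastq`, not part of the repository). It assumes inputs may mix gzipped and plain FASTQ, and it uses `gzip -dc` as the decompressor.

```shell
#!/bin/bash
set -eo pipefail

# Hypothetical variant: choose the reader per file instead of once for the
# whole list, so gzipped and plain FASTQ inputs can be mixed.
# Usage: concat_fastq <output> <input>...
concat_fastq() {
  local out="$1"; shift
  : > "$out"   # create or truncate the output first
  local f
  for f in "$@"; do
    case "$f" in
      *.gz) gzip -dc -- "$f" >> "$out" ;;
      *)    cat -- "$f" >> "$out" ;;
    esac
  done
}
```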
44
src/cat_fastq/test.sh
Normal file
@@ -0,0 +1,44 @@
#!/bin/bash

echo ">>> Testing $meta_functionality_name"

echo ">>> Testing paired-end read samples with multiple replicates"
"$meta_executable" \
    --read_1 "$meta_resources_dir/SRR6357070_1.fastq.gz;$meta_resources_dir/SRR6357071_1.fastq.gz" \
    --read_2 "$meta_resources_dir/SRR6357070_2.fastq.gz;$meta_resources_dir/SRR6357071_2.fastq.gz" \
    --fastq_1 read_1.merged.fastq \
    --fastq_2 read_2.merged.fastq

echo ">>> Checking whether output exists"
[ ! -f "read_1.merged.fastq" ] && echo "Merged read 1 file does not exist!" && exit 1
[ ! -s "read_1.merged.fastq" ] && echo "Merged read 1 file is empty!" && exit 1
[ ! -f "read_2.merged.fastq" ] && echo "Merged read 2 file does not exist!" && exit 1
[ ! -s "read_2.merged.fastq" ] && echo "Merged read 2 file is empty!" && exit 1

echo ">>> Check number of reads"
# A FASTQ record spans four lines, so reads = lines / 4
rep1_1=$(( $(zcat "$meta_resources_dir/SRR6357070_1.fastq.gz" | wc -l) / 4 ))
rep1_2=$(( $(zcat "$meta_resources_dir/SRR6357070_2.fastq.gz" | wc -l) / 4 ))
rep2_1=$(( $(zcat "$meta_resources_dir/SRR6357071_1.fastq.gz" | wc -l) / 4 ))
rep2_2=$(( $(zcat "$meta_resources_dir/SRR6357071_2.fastq.gz" | wc -l) / 4 ))
merged_1=$(( $(wc -l < read_1.merged.fastq) / 4 ))
merged_2=$(( $(wc -l < read_2.merged.fastq) / 4 ))
if [ $(( rep1_1 + rep2_1 )) -ne "$merged_1" ] || [ $(( rep1_2 + rep2_2 )) -ne "$merged_2" ]; then
    echo "Concatenation unsuccessful!"
    exit 1
fi

rm read_1.merged.fastq read_2.merged.fastq

echo ">>> Testing single-end read samples with multiple replicates"
"$meta_executable" \
    --read_1 "$meta_resources_dir/SRR6357070_1.fastq.gz;$meta_resources_dir/SRR6357071_1.fastq.gz" \
    --fastq_1 read_1.merged.fastq

echo ">>> Checking whether output exists"
[ ! -f "read_1.merged.fastq" ] && echo "Merged read 1 file does not exist!" && exit 1
[ ! -s "read_1.merged.fastq" ] && echo "Merged read 1 file is empty!" && exit 1

echo ">>> Check number of reads"
rep1_1=$(( $(zcat "$meta_resources_dir/SRR6357070_1.fastq.gz" | wc -l) / 4 ))
rep2_1=$(( $(zcat "$meta_resources_dir/SRR6357071_1.fastq.gz" | wc -l) / 4 ))
merged_1=$(( $(wc -l < read_1.merged.fastq) / 4 ))
if [ $(( rep1_1 + rep2_1 )) -ne "$merged_1" ]; then
    echo "Concatenation unsuccessful!"
    exit 1
fi

echo "All tests succeeded!"
exit 0
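The read-count checks above compute "lines / 4" for each file, decompressing the inputs and reading the merged outputs as plain text. Below is a sketch of a hypothetical `count_reads` helper that handles both cases by file extension. It assumes standard four-line FASTQ records, with no line-wrapped sequences.

```shell
#!/bin/bash
set -eo pipefail

# Hypothetical helper: number of reads in a FASTQ file (gzipped or plain),
# assuming four lines per record.
count_reads() {
  local f="$1" lines
  case "$f" in
    *.gz) lines=$(gzip -dc -- "$f" | wc -l) ;;
    *)    lines=$(wc -l < "$f") ;;
  esac
  echo $(( lines / 4 ))
}
```

A check would then read, for example, `[ $(( $(count_reads a_1.fastq.gz) + $(count_reads b_1.fastq.gz) )) -eq "$(count_reads read_1.merged.fastq)" ]`.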
73
src/deseq2_qc/config.vsh.yaml
Normal file
@@ -0,0 +1,73 @@
name: deseq2_qc
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/local/deseq2_qc.nf]
    last_sha: 92b2a7857de1dda9d1c19a088941fc81e2976ff7

description: |
  Run DESeq2, perform PCA, generate heatmaps and scatterplots for samples in the counts files

argument_groups:
  - name: "Input"
    arguments:
      - name: "--counts"
        type: file
        description: Count file matrix where rows are genes and columns are samples
      - name: "--pca_header_multiqc"
        type: file
        default: assets/multiqc/deseq2_pca_header.txt
      - name: "--clustering_header_multiqc"
        type: file
        default: assets/multiqc/deseq2_clustering_header.txt
      - name: "--deseq2_vst"
        type: boolean
        default: true
        description: Use vst transformation instead of rlog with DESeq2
      - name: "--extra_args"
        type: string
        default: "--id_col 1 --sample_suffix '' --outprefix deseq2 --count_col 3"
      - name: "--extra_args2"
        type: string
        default: star_salmon

  - name: "Output"
    arguments:
      - name: "--deseq2_output"
        type: file
        direction: output
        default: deseq2
      - name: "--pca_multiqc"
        type: file
        direction: output
        default: deseq2.pca.vals_mqc.tsv
      - name: "--dists_multiqc"
        type: file
        direction: output
        default: deseq2.sample.dists_mqc.tsv

resources:
  - type: bash_script
    path: script.sh
  # copied from https://github.com/nf-core/rnaseq/blob/3.12.0/bin/deseq2_qc.r
  - path: deseq2_qc.r

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/counts.tsv
  - path: /assets/multiqc/deseq2_pca_header.txt
  - path: /assets/multiqc/deseq2_clustering_header.txt

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [ r-base, libcurl4-openssl-dev, libssl-dev, libxml2-dev ]
      - type: r
        cran: [ optparse, ggplot2, RColorBrewer, pheatmap ]
        bioc: [ DESeq2 ]
        url: https://cran.r-project.org/src/contrib/Archive/matrixStats/matrixStats_1.1.0.tar.gz
runners:
  - type: executable
  - type: nextflow
246
src/deseq2_qc/deseq2_qc.r
Executable file
@@ -0,0 +1,246 @@
#!/usr/bin/env Rscript

################################################
################################################
## REQUIREMENTS ##
################################################
################################################

## PCA, HEATMAP AND SCATTERPLOTS FOR SAMPLES IN COUNTS FILE
## - SAMPLE NAMES HAVE TO END IN e.g. "_R1" REPRESENTING REPLICATE ID. LAST 3 CHARACTERS OF SAMPLE NAME WILL BE TRIMMED TO OBTAIN GROUP ID FOR DESEQ2 COMPARISONS.
## - PACKAGES BELOW NEED TO BE AVAILABLE TO LOAD WHEN RUNNING R

################################################
################################################
## LOAD LIBRARIES ##
################################################
################################################

library(optparse)
library(DESeq2)
library(ggplot2)
library(RColorBrewer)
library(pheatmap)

################################################
################################################
## PARSE COMMAND-LINE PARAMETERS ##
################################################
################################################

option_list <- list(
    make_option(c("-i", "--count_file"), type="character", default=NULL, metavar="path", help="Count file matrix where rows are genes and columns are samples."),
    make_option(c("-f", "--count_col"), type="integer", default=3, metavar="integer", help="First column containing sample count data."),
    make_option(c("-d", "--id_col"), type="integer", default=1, metavar="integer", help="Column containing identifiers to be used."),
    make_option(c("-r", "--sample_suffix"), type="character", default='', metavar="string", help="Suffix to remove after sample name in columns e.g. '.rmDup.bam' if 'DRUG_R1.rmDup.bam'."),
    make_option(c("-p", "--outprefix"), type="character", default='deseq2', metavar="string", help="Output prefix."),
    make_option(c("-v", "--vst"), type="logical", default=FALSE, metavar="boolean", help="Run vst transform instead of rlog."),
    make_option(c("-c", "--cores"), type="integer", default=1, metavar="integer", help="Number of cores."),
    make_option(c("-o", "--outdir"), type="character", default="./", metavar="path", help="Output directory.")
)

opt_parser <- OptionParser(option_list=option_list)
opt <- parse_args(opt_parser)
if (is.null(opt$count_file)){
    print_help(opt_parser)
    stop("Please provide a counts file.", call.=FALSE)
}

################################################
################################################
## READ IN COUNTS FILE ##
################################################
################################################

count.table <- read.delim(file=opt$count_file,header=TRUE, row.names=NULL)
rownames(count.table) <- count.table[,opt$id_col]
count.table <- count.table[,opt$count_col:ncol(count.table),drop=FALSE]
colnames(count.table) <- gsub(opt$sample_suffix,"",colnames(count.table))
colnames(count.table) <- gsub(pattern='\\.$', replacement='', colnames(count.table))

################################################
################################################
## RUN DESEQ2 ##
################################################
################################################

if (file.exists(opt$outdir) == FALSE) {
    dir.create(opt$outdir, recursive=TRUE)
}
setwd(opt$outdir)

samples.vec <- colnames(count.table)
name_components <- strsplit(samples.vec, "_")
n_components <- length(name_components[[1]])
decompose <- n_components!=1 && all(sapply(name_components, length)==n_components)
coldata <- data.frame(samples.vec, sample=samples.vec, row.names=1)
if (decompose) {
    groupings <- as.data.frame(lapply(1:n_components, function(i) sapply(name_components, "[[", i)))
    n_distinct <- sapply(groupings, function(grp) length(unique(grp)))
    groupings <- groupings[n_distinct!=1 & n_distinct!=length(samples.vec)]
    if (ncol(groupings)!=0) {
        names(groupings) <- paste0("Group", 1:ncol(groupings))
        coldata <- cbind(coldata, groupings)
    } else {
        decompose <- FALSE
    }
}

DDSFile <- paste(opt$outprefix,".dds.RData",sep="")

counts <- count.table[,samples.vec,drop=FALSE]
dds <- DESeqDataSetFromMatrix(countData=round(counts), colData=coldata, design=~ 1)
dds <- estimateSizeFactors(dds)
if (min(dim(count.table))<=1) { # No point if only one sample, or one gene
    save(dds,file=DDSFile)
    saveRDS(dds, file=sub("\\.dds\\.RData$", ".rds", DDSFile))
    warning("Not enough samples or genes in counts file for PCA.", call.=FALSE)
    quit(save = "no", status = 0, runLast = FALSE)
}
if (!opt$vst) {
    vst_name <- "rlog"
    rld <- rlog(dds)
} else {
    vst_name <- "vst"
    rld <- varianceStabilizingTransformation(dds)
}

assay(dds, vst_name) <- assay(rld)
save(dds,file=DDSFile)
saveRDS(dds, file=sub("\\.dds\\.RData$", ".rds", DDSFile))

################################################
################################################
## PLOT QC ##
################################################
################################################

##' PCA pre-processor
##'
##' Generate all the necessary information to plot PCA from a DESeq2 object
##' in which an assay containing a variance-stabilised matrix of counts is
##' stored. Copied from DESeq2::plotPCA, but with additional ability to
|
||||
##' say which assay to run the PCA on.
|
||||
##'
|
||||
##' @param object The DESeq2DataSet object.
|
||||
##' @param ntop number of top genes to use for principla components, selected by highest row variance.
|
||||
##' @param assay the name or index of the assay that stores the variance-stabilised data.
|
||||
##' @return A data.frame containing the projected data alongside the grouping columns.
|
||||
##' A 'percentVar' attribute is set which includes the percentage of variation each PC explains,
|
||||
##' and additionally how much the variation within that PC is explained by the grouping variable.
|
||||
##' @author Gavin Kelly
|
||||
plotPCA_vst <- function (object, ntop = 500, assay=length(assays(object))) {
|
||||
rv <- rowVars(assay(object, assay))
|
||||
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
|
||||
pca <- prcomp(t(assay(object, assay)[select, ]), center=TRUE, scale=FALSE)
|
||||
percentVar <- pca$sdev^2/sum(pca$sdev^2)
|
||||
df <- cbind( as.data.frame(colData(object)), pca$x)
|
||||
#Order points so extreme samples are more likely to get label
|
||||
ord <- order(abs(rank(df$PC1)-median(df$PC1)), abs(rank(df$PC2)-median(df$PC2)))
|
||||
df <- df[ord,]
|
||||
attr(df, "percentVar") <- data.frame(PC=seq(along=percentVar), percentVar=100*percentVar)
|
||||
return(df)
|
||||
}
|
||||
|
||||
PlotFile <- paste(opt$outprefix,".plots.pdf",sep="")
|
||||
|
||||
pdf(file=PlotFile, onefile=TRUE, width=7, height=7)
|
||||
## PCA
|
||||
ntop <- c(500, Inf)
|
||||
for (n_top_var in ntop) {
|
||||
pca.data <- plotPCA_vst(dds, assay=vst_name, ntop=n_top_var)
|
||||
percentVar <- round(attr(pca.data, "percentVar")$percentVar)
|
||||
plot_subtitle <- ifelse(n_top_var==Inf, "All genes", paste("Top", n_top_var, "genes"))
|
||||
pl <- ggplot(pca.data, aes(PC1, PC2, label=paste0(" ", sample, " "))) +
|
||||
geom_point() +
|
||||
geom_text(check_overlap=TRUE, vjust=0.5, hjust="inward") +
|
||||
xlab(paste0("PC1: ",percentVar[1],"% variance")) +
|
||||
ylab(paste0("PC2: ",percentVar[2],"% variance")) +
|
||||
labs(title = paste0("First PCs on ", vst_name, "-transformed data"), subtitle = plot_subtitle) +
|
||||
theme(legend.position="top",
|
||||
panel.grid.major = element_blank(),
|
||||
panel.grid.minor = element_blank(),
|
||||
panel.background = element_blank(),
|
||||
panel.border = element_rect(colour = "black", fill=NA, size=1))
|
||||
print(pl)
|
||||
|
||||
if (decompose) {
|
||||
pc_names <- paste0("PC", attr(pca.data, "percentVar")$PC)
|
||||
long_pc <- reshape(pca.data, varying=pc_names, direction="long", sep="", timevar="component", idvar="pcrow")
|
||||
long_pc <- subset(long_pc, component<=5)
|
||||
long_pc_grp <- reshape(long_pc, varying=names(groupings), direction="long", sep="", timevar="grouper")
|
||||
long_pc_grp <- subset(long_pc_grp, grouper<=5)
|
||||
long_pc_grp$component <- paste("PC", long_pc_grp$component)
|
||||
long_pc_grp$grouper <- paste0(long_pc_grp$grouper, c("st","nd","rd","th","th")[long_pc_grp$grouper], " prefix")
|
||||
pl <- ggplot(long_pc_grp, aes(x=Group, y=PC)) +
|
||||
geom_point() +
|
||||
stat_summary(fun=mean, geom="line", aes(group = 1)) +
|
||||
labs(x=NULL, y=NULL, subtitle = plot_subtitle, title="PCs split by sample-name prefixes") +
|
||||
facet_grid(component~grouper, scales="free_x") +
|
||||
scale_x_discrete(guide = guide_axis(n.dodge = 3))
|
||||
print(pl)
|
||||
}
|
||||
} # at end of loop, we'll be using the user-defined ntop if any, else all genes
|
||||
|
||||
## WRITE PC1 vs PC2 VALUES TO FILE
|
||||
pca.vals <- pca.data[,c("PC1","PC2")]
|
||||
colnames(pca.vals) <- paste0(colnames(pca.vals), ": ", percentVar[1:2], '% variance')
|
||||
pca.vals <- cbind(sample = rownames(pca.vals), pca.vals)
|
||||
write.table(pca.vals, file = paste(opt$outprefix, ".pca.vals.txt", sep=""),
|
||||
row.names = FALSE, col.names = TRUE, sep = "\t", quote = TRUE)
|
||||
|
||||
## SAMPLE CORRELATION HEATMAP
|
||||
sampleDists <- dist(t(assay(dds, vst_name)))
|
||||
sampleDistMatrix <- as.matrix(sampleDists)
|
||||
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
|
||||
pheatmap(
|
||||
sampleDistMatrix,
|
||||
clustering_distance_rows=sampleDists,
|
||||
clustering_distance_cols=sampleDists,
|
||||
col=colors,
|
||||
main=paste("Euclidean distance between", vst_name, "of samples")
|
||||
)
|
||||
|
||||
## WRITE SAMPLE DISTANCES TO FILE
|
||||
write.table(cbind(sample = rownames(sampleDistMatrix), sampleDistMatrix),file=paste(opt$outprefix, ".sample.dists.txt", sep=""),
|
||||
row.names=FALSE, col.names=TRUE, sep="\t", quote=FALSE)
|
||||
dev.off()
|
||||
|
||||
################################################
|
||||
################################################
|
||||
## SAVE SIZE FACTORS ##
|
||||
################################################
|
||||
################################################
|
||||
|
||||
SizeFactorsDir <- "size_factors/"
|
||||
if (file.exists(SizeFactorsDir) == FALSE) {
|
||||
dir.create(SizeFactorsDir, recursive=TRUE)
|
||||
}
|
||||
|
||||
NormFactorsFile <- paste(SizeFactorsDir,opt$outprefix, ".size_factors.RData", sep="")
|
||||
|
||||
normFactors <- sizeFactors(dds)
|
||||
save(normFactors, file=NormFactorsFile)
|
||||
|
||||
for (name in names(sizeFactors(dds))) {
|
||||
sizeFactorFile <- paste(SizeFactorsDir,name, ".txt", sep="")
|
||||
write(as.numeric(sizeFactors(dds)[name]), file=sizeFactorFile)
|
||||
}
|
||||
|
||||
################################################
|
||||
################################################
|
||||
## R SESSION INFO ##
|
||||
################################################
|
||||
################################################
|
||||
|
||||
RLogFile <- "R_sessionInfo.log"
|
||||
|
||||
sink(RLogFile)
|
||||
a <- sessionInfo()
|
||||
print(a)
|
||||
sink()
|
||||
|
||||
################################################
|
||||
################################################
|
||||
################################################
|
||||
################################################
|
||||
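deseq2_qc.r only draws the per-prefix PC plots when the sample names split cleanly on `_`: every name must have the same number of components, and a component position becomes a `GroupN` column only if it takes more than one distinct value but is not unique to each sample. A minimal bash sketch of that selection rule, assuming plain sample names without whitespace or trailing underscores; `grouping_positions` is a hypothetical helper, not part of the pipeline:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the 1-based "_"-separated positions that
# deseq2_qc.r would keep as grouping columns for the given sample names.
grouping_positions() {
  local -a samples=("$@")
  local n_samples=${#samples[@]}
  local n_comp s i n_distinct

  # Number of components in the first name; a single component means no split.
  n_comp=$(awk -F_ '{ print NF }' <<< "${samples[0]}")
  if (( n_comp == 1 )); then
    return 0
  fi

  # Every name must have the same number of components.
  for s in "${samples[@]}"; do
    if [[ $(awk -F_ '{ print NF }' <<< "$s") -ne "$n_comp" ]]; then
      return 0
    fi
  done

  # Keep positions that are neither constant nor unique per sample.
  for (( i = 1; i <= n_comp; i++ )); do
    n_distinct=$(printf '%s\n' "${samples[@]}" | cut -d_ -f"$i" | sort -u | wc -l)
    if (( n_distinct != 1 && n_distinct != n_samples )); then
      echo "$i"
    fi
  done
}

grouping_positions ctrl_A_r1 ctrl_B_r2 trt_A_r3 trt_B_r4
```

With these names, position 1 (`ctrl`/`trt`) and position 2 (`A`/`B`) would become `Group1` and `Group2`, while the replicate suffix is dropped because it is unique per sample. The R script additionally limits the faceted plot to the first five groupings and PCs, which the sketch does not model.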
48
src/deseq2_qc/script.sh
Executable file
@@ -0,0 +1,48 @@
#!/bin/bash

set -eo pipefail

# Boolean arguments arrive as the strings "true"/"false"; treat unset as false
if [[ "${par_deseq2_vst:-false}" == "true" ]]; then
    par_extra_args+=" --vst TRUE"
fi

tolower() {
    case $1 in
        *[[:upper:]]*)
            printf "%s\n" "$1" | tr '[:upper:]' '[:lower:]'
            ;;
        *)
            printf "%s\n" "$1"
            ;;
    esac
}

toupper() {
    case $1 in
        *[[:lower:]]*)
            printf "%s\n" "$1" | tr '[:lower:]' '[:upper:]'
            ;;
        *)
            printf "%s\n" "$1"
            ;;
    esac
}

label_lower=$(tolower "$par_extra_args2")
label_upper=$(toupper "$par_extra_args2")

# $par_extra_args is intentionally left unquoted so that it is split into separate arguments
Rscript "$meta_resources_dir/deseq2_qc.r" \
    --count_file "$par_counts" \
    --outdir "$par_deseq2_output" \
    --cores "${meta_cpus:-1}" \
    $par_extra_args

if [ -f "$par_deseq2_output/R_sessionInfo.log" ]; then
    sed "s/deseq2_pca/${label_lower}_deseq2_pca/g" < "$par_pca_header_multiqc" > tmp.txt
    sed -i -e "s/DESeq2 PCA/${label_upper} DESeq2 PCA/g" tmp.txt
    cat tmp.txt "$par_deseq2_output"/*.pca.vals.txt > "$par_pca_multiqc"

    sed "s/deseq2_clustering/${label_lower}_deseq2_clustering/g" < "$par_clustering_header_multiqc" > tmp.txt
    sed -i -e "s/DESeq2 sample/${label_upper} DESeq2 sample/g" tmp.txt
    cat tmp.txt "$par_deseq2_output"/*.sample.dists.txt > "$par_dists_multiqc"
fi
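script.sh prefixes the MultiQC section ids and titles in the PCA and clustering headers with the label from `--extra_args2` (lower case for ids, upper case for titles), presumably so that runs with different labels end up in distinct MultiQC sections. The sketch below reproduces the substitution for the PCA header only; the header text fed in is a made-up example, not the actual content of `deseq2_pca_header.txt`, and `relabel_pca_header` is a hypothetical name:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mirroring the two sed calls in script.sh for the PCA header.
relabel_pca_header() {
  local label=$1 header=$2
  local lower upper
  lower=$(printf '%s' "$label" | tr '[:upper:]' '[:lower:]')
  upper=$(printf '%s' "$label" | tr '[:lower:]' '[:upper:]')
  # The label is interpolated into the sed expressions, so it must not contain "/".
  sed -e "s/deseq2_pca/${lower}_deseq2_pca/g" \
      -e "s/DESeq2 PCA/${upper} DESeq2 PCA/g" <<< "$header"
}

# Example header text (assumed, for illustration only).
relabel_pca_header "Salmon" "#id: 'deseq2_pca'
#section_name: 'DESeq2 PCA'"
```

Because the lower-case pattern is matched case-sensitively, the id substitution never touches the title line, and vice versa.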
28
src/deseq2_qc/test.sh
Normal file
@@ -0,0 +1,28 @@
#!/bin/bash

# Run executable
echo "> Running $meta_functionality_name"

"$meta_executable" \
    --counts "$meta_resources_dir/counts.tsv" \
    --pca_header_multiqc "$meta_resources_dir/deseq2_pca_header.txt" \
    --clustering_header_multiqc "$meta_resources_dir/deseq2_clustering_header.txt" \
    --extra_args "--id_col 1 --sample_suffix '' --outprefix deseq2 --count_col 2" \
    --extra_args2 "test" \
    --deseq2_output "deseq2/" \
    --pca_multiqc pca.vals_mqc.tsv \
    --dists_multiqc sample.dists_mqc.tsv

exit_code=$?
[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1

echo ">> Check whether output exists"

[ ! -d "deseq2" ] && echo "deseq2 was not created" && exit 1
[ -z "$(ls -A 'deseq2')" ] && echo "deseq2 is empty" && exit 1
[ ! -f "pca.vals_mqc.tsv" ] && echo "pca.vals_mqc.tsv was not created" && exit 1
[ ! -s "pca.vals_mqc.tsv" ] && echo "pca.vals_mqc.tsv is empty" && exit 1
[ ! -f "sample.dists_mqc.tsv" ] && echo "sample.dists_mqc.tsv was not created" && exit 1
[ ! -s "sample.dists_mqc.tsv" ] && echo "sample.dists_mqc.tsv is empty" && exit 1

exit 0
118
src/dupradar/config.vsh.yaml
Normal file
@@ -0,0 +1,118 @@
name: "dupradar"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/local/dupradar.nf]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
description: |
  Assessment of duplication rates in RNA-Seq datasets.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--id"
        type: string
        description: Sample ID.

      - name: "--input"
        type: file
        required: true
        description: Path to the input alignment file in BAM format.

      - name: "--gtf_annotation"
        type: file
        required: true
        description: Path to the GTF annotation file.

      - name: "--paired"
        type: boolean
        description: Whether the input alignment file consists of paired-end reads.

      - name: "--strandedness"
        type: string
        required: false
        choices: ["forward", "reverse", "unstranded"]
        default: "unstranded"
        description: Strandedness of the reads in the input BAM file (forward, reverse or unstranded; defaults to unstranded).

  - name: "Output"
    arguments:
      - name: "--output_dupmatrix"
        type: file
        direction: output
        required: false
        must_exist: true
        default: $id.dup_matrix.txt
        description: Path to the output file (txt) with the duplicate tag counts per gene.

      - name: "--output_dup_intercept_mqc"
        type: file
        direction: output
        required: false
        must_exist: true
        default: $id.dup_intercept_mqc.txt
        description: Path to the output file (txt) with the dupRadar intercept value, formatted for MultiQC.

      - name: "--output_duprate_exp_boxplot"
        type: file
        direction: output
        required: false
        must_exist: true
        default: $id.duprate_exp_boxplot.pdf
        description: Path to the output file (pdf) with a box plot of the percentage of duplication by expression.

      - name: "--output_duprate_exp_densplot"
        type: file
        direction: output
        required: false
        must_exist: true
        default: $id.duprate_exp_densityplot.pdf
        description: Path to the output file (pdf) with a 2D density scatter plot of the duplication rate against expression.

      - name: "--output_duprate_exp_denscurve_mqc"
        type: file
        direction: output
        required: false
        must_exist: true
        default: $id.duprate_exp_density_curve_mqc.txt
        description: Path to the output file (txt) with the fitted gene duplication curve, formatted for MultiQC.

      - name: "--output_expression_histogram"
        type: file
        direction: output
        required: false
        must_exist: true
        default: $id.expression_hist.pdf
        description: Path to the output file (pdf) with a histogram of the RPK values per gene.

      - name: "--output_intercept_slope"
        type: file
        direction: output
        required: false
        must_exist: true
        default: $id.intercept_slope.txt
        description: Path to the output file (txt) with the intercept and slope (progression of the duplication rate) of the fitted model.

resources:
  - type: bash_script
    path: script.sh
  # Adapted from https://github.com/nf-core/rnaseq/blob/3.12.0/bin/dupradar.r
  - path: dupradar.r

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2.bam
  - path: /testData/unit_test_resources/genes.gtf

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [ r-base ]
      - type: r
        bioc: [ dupRadar ]
runners:
  - type: executable
  - type: nextflow
154
src/dupradar/dupradar.r
Normal file
@@ -0,0 +1,154 @@
#!/usr/bin/env Rscript

# Command line argument processing
args = commandArgs(trailingOnly=TRUE)
if (length(args) < 6) {
    stop("Usage: dupRadar.r <input.bam> <sample_id> <annotation.gtf> <strandDirection:0=unstranded/1=forward/2=reverse> <paired: true/false> <nbThreads> <R-package-location (optional)>", call.=FALSE)
}

input_bam <- args[1]
output_prefix <- args[2]
annotation_gtf <- args[3]
stranded <- as.numeric(args[4])
paired_end <- args[5] == "true"
threads <- as.numeric(args[6])

bamRegex <- "(.+)\\.bam$"

if(!(grepl(bamRegex, input_bam) && file.exists(input_bam) && (!file.info(input_bam)$isdir))) stop("First argument '<input.bam>' must be an existing file (not a directory) with '.bam' extension...")
if(!(file.exists(annotation_gtf) && (!file.info(annotation_gtf)$isdir))) stop("Third argument '<annotation.gtf>' must be an existing file (and not a directory)...")
if(is.na(stranded) || (!(stranded %in% (0:2)))) stop("Fourth argument <strandDirection> must be a numeric value in 0(unstranded)/1(forward)/2(reverse)...")
if(is.na(threads) || (threads<=0)) stop("Sixth argument <nbThreads> must be a strictly positive numeric value...")

# Debug messages (stderr)
message("Input bam (Arg 1): ", input_bam)
message("Output basename (Arg 2): ", output_prefix)
message("Input gtf (Arg 3): ", annotation_gtf)
message("Strandedness (Arg 4): ", c("unstranded", "forward", "reverse")[stranded+1])
message("paired_end (Arg 5): ", paired_end)
message("Nb threads (Arg 6): ", threads)
message("R package loc. (Arg 7): ", ifelse(length(args) > 6, args[7], "Not specified"))

# Load / install packages
if (length(args) > 6) { .libPaths( c( args[7], .libPaths() ) ) }
if (!require("dupRadar")){
    # biocLite is deprecated; Bioconductor packages are installed via BiocManager
    if (!requireNamespace("BiocManager", quietly=TRUE)) {
        install.packages("BiocManager", repos='http://cloud.r-project.org/')
    }
    BiocManager::install("dupRadar", update=FALSE, ask=FALSE)
    library("dupRadar")
}
# parallel ships with base R, so it only needs to be loaded
library("parallel")

# Duplicate stats
dm <- analyzeDuprates(input_bam, annotation_gtf, stranded, paired_end, threads)
write.table(dm, file=paste(output_prefix, "_dupMatrix.txt", sep=""), quote=FALSE, row.names=FALSE, sep="\t")

# 2D density scatter plot
pdf(paste0(output_prefix, "_duprateExpDens.pdf"))
duprateExpDensPlot(DupMat=dm)
title("Density scatter plot")
mtext(output_prefix, side=3)
dev.off()
fit <- duprateExpFit(DupMat=dm)
cat(
    paste("- dupRadar Int (duprate at low read counts):", fit$intercept),
    paste("- dupRadar Sl (progression of the duplication rate):", fit$slope),
    fill=TRUE, labels=output_prefix,
    file=paste0(output_prefix, "_intercept_slope.txt"), append=FALSE
)

# Create a multiqc file dupInt
sample_name <- gsub("Aligned.sortedByCoord.out.markDups", "", output_prefix, fixed=TRUE)
line="#id: DupInt
#plot_type: 'generalstats'
#pconfig:
#    dupRadar_intercept:
#        title: 'dupInt'
#        namespace: 'DupRadar'
#        description: 'Intercept value from DupRadar'
#        max: 100
#        min: 0
#        scale: 'RdYlGn-rev'
#        format: '{:.2f}%'
Sample dupRadar_intercept"

write(line,file=paste0(output_prefix, "_dup_intercept_mqc.txt"),append=TRUE)
write(paste(sample_name, fit$intercept),file=paste0(output_prefix, "_dup_intercept_mqc.txt"),append=TRUE)

# Get numbers from dupRadar GLM
curve_x <- sort(log10(dm$RPK))
curve_y = 100*predict(fit$glm, data.frame(x=curve_x), type="response")
# Remove all of the infinite values; a logical mask is used because
# indexing with -which(...) would drop every element when nothing matches
finite = is.finite(curve_x)
curve_x = curve_x[finite]
curve_y = curve_y[finite]
# Reduce number of data points
curve_x <- curve_x[seq(1, length(curve_x), 10)]
curve_y <- curve_y[seq(1, length(curve_y), 10)]
# Convert x values back to real counts
curve_x = 10^curve_x
# Write to file
line="#id: dupradar
#section_name: 'DupRadar'
#section_href: 'bioconductor.org/packages/release/bioc/html/dupRadar.html'
#description: \"provides duplication rate quality control for RNA-Seq datasets. Highly expressed genes can be expected to have a lot of duplicate reads, but high numbers of duplicates at low read counts can indicate low library complexity with technical duplication.
#    This plot shows the general linear models - a summary of the gene duplication distributions. \"
#pconfig:
#    title: 'DupRadar General Linear Model'
#    xLog: True
#    xlab: 'expression (reads/kbp)'
#    ylab: '% duplicate reads'
#    ymax: 100
#    ymin: 0
#    tt_label: '<b>{point.x:.1f} reads/kbp</b>: {point.y:,.2f}% duplicates'
#    xPlotLines:
#        - color: 'green'
#          dashStyle: 'LongDash'
#          label:
#              style: {color: 'green'}
#              text: '0.5 RPKM'
#              verticalAlign: 'bottom'
#              y: -65
#          value: 0.5
#          width: 1
#        - color: 'red'
#          dashStyle: 'LongDash'
#          label:
#              style: {color: 'red'}
#              text: '1 read/bp'
#              verticalAlign: 'bottom'
#              y: -65
#          value: 1000
#          width: 1"

write(line,file=paste0(output_prefix, "_duprateExpDensCurve_mqc.txt"),append=TRUE)
write.table(
    cbind(curve_x, curve_y),
    file=paste0(output_prefix, "_duprateExpDensCurve_mqc.txt"),
    quote=FALSE, row.names=FALSE, col.names=FALSE, append=TRUE
)

# Distribution of expression box plot
pdf(paste0(output_prefix, "_duprateExpBoxplot.pdf"))
duprateExpBoxplot(DupMat=dm)
title("Percent Duplication by Expression")
mtext(output_prefix, side=3)
dev.off()

# Distribution of RPK values per gene
pdf(paste0(output_prefix, "_expressionHist.pdf"))
expressionHist(DupMat=dm)
title("Distribution of RPK values per gene")
mtext(output_prefix, side=3)
dev.off()

# Print session info to standard out
print(output_prefix)
citation("dupRadar")
sessionInfo()
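The MultiQC curve written by dupradar.r keeps only genes with a finite log10(RPK), sorts them, keeps every tenth point and then converts back to RPK. Since the back-conversion undoes the log, the retained x values are simply every tenth positive RPK value in ascending order. A minimal sketch of that x-axis thinning with standard tools, assuming one RPK value per input line; `thin_rpk` is a hypothetical name and the fitted y values from the GLM are not reproduced:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: emulate the x-axis thinning of the dupRadar curve.
# Input: one RPK value per line. Output: every 10th positive value, ascending.
thin_rpk() {
  # RPK == 0 gives log10 = -Inf in R, so those genes are dropped first.
  awk '$1 > 0' | sort -g | awk 'NR % 10 == 1'
}

seq 0 25 | thin_rpk
```

Filtering with a predicate, as here and as with a logical mask in R, stays correct when nothing needs removing, which is the case where negative indexing with an empty `which()` result goes wrong.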
28
src/dupradar/script.sh
Normal file
@@ -0,0 +1,28 @@
#!/bin/bash

set -exo pipefail

function num_strandness {
    case "$par_strandedness" in
        unstranded) echo 0 ;;
        forward) echo 1 ;;
        reverse) echo 2 ;;
        *)
            echo "strandedness must be unstranded, forward or reverse." >&2
            exit 1
            ;;
    esac
}

# Assign first: a failing command substitution inside the Rscript argument
# list would not stop the script, but a failing assignment does (set -e)
strand=$(num_strandness)

Rscript "$meta_resources_dir/dupradar.r" \
    "$par_input" \
    "$par_id" \
    "$par_gtf_annotation" \
    "$strand" \
    "${par_paired:-false}" \
    "${meta_cpus:-1}"

mv "$par_id"_dupMatrix.txt "$par_output_dupmatrix"
mv "$par_id"_dup_intercept_mqc.txt "$par_output_dup_intercept_mqc"
mv "$par_id"_duprateExpBoxplot.pdf "$par_output_duprate_exp_boxplot"
mv "$par_id"_duprateExpDens.pdf "$par_output_duprate_exp_densplot"
mv "$par_id"_duprateExpDensCurve_mqc.txt "$par_output_duprate_exp_denscurve_mqc"
mv "$par_id"_expressionHist.pdf "$par_output_expression_histogram"
mv "$par_id"_intercept_slope.txt "$par_output_intercept_slope"
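script.sh turns `--strandedness` into the 0/1/2 code expected by dupradar.r. The sketch below shows the same mapping as a standalone function that reports bad values on stderr and signals failure through its return status, so a caller can write `strand=$(strand_code "$x") || exit 1`; `strand_code` is a hypothetical name:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map a strandedness label to dupradar.r's numeric code.
strand_code() {
  case "$1" in
    unstranded) echo 0 ;;
    forward)    echo 1 ;;
    reverse)    echo 2 ;;
    *)
      echo "strandedness must be unstranded, forward or reverse, got '$1'" >&2
      return 1
      ;;
  esac
}

for label in unstranded forward reverse; do
  echo "$label -> $(strand_code "$label")"
done
```

Writing the error to stderr keeps it out of the captured value, so an invalid label can never be passed on to Rscript as if it were a strand code.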
51
src/dupradar/test.sh
Normal file
@@ -0,0 +1,51 @@
#!/bin/bash

# define input and output for script
input_bam="$meta_resources_dir/wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2.bam"
input_gtf="$meta_resources_dir/genes.gtf"

output_dupmatrix="dup_matrix.txt"
output_dup_intercept_mqc="dup_intercept_mqc.txt"
output_duprate_exp_boxplot="duprate_exp_boxplot.pdf"
output_duprate_exp_densplot="duprate_exp_densityplot.pdf"
output_duprate_exp_denscurve_mqc="duprate_exp_density_curve_mqc.txt"
output_expression_histogram="expression_hist.pdf"
output_intercept_slope="intercept_slope.txt"

# Run executable
echo "> Running $meta_functionality_name for single-end reads."

"$meta_executable" \
    --input "$input_bam" \
    --id "test" \
    --gtf_annotation "$input_gtf" \
    --strandedness "forward" \
    --paired false \
    --output_dupmatrix "$output_dupmatrix" \
    --output_dup_intercept_mqc "$output_dup_intercept_mqc" \
    --output_duprate_exp_boxplot "$output_duprate_exp_boxplot" \
    --output_duprate_exp_densplot "$output_duprate_exp_densplot" \
    --output_duprate_exp_denscurve_mqc "$output_duprate_exp_denscurve_mqc" \
    --output_expression_histogram "$output_expression_histogram" \
    --output_intercept_slope "$output_intercept_slope"

exit_code=$?
[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1

echo ">> asserting output has been created for single-end read input"
[ ! -f "$output_dupmatrix" ] && echo "$output_dupmatrix was not created" && exit 1
[ ! -s "$output_dupmatrix" ] && echo "$output_dupmatrix is empty" && exit 1
[ ! -f "$output_dup_intercept_mqc" ] && echo "$output_dup_intercept_mqc was not created" && exit 1
[ ! -s "$output_dup_intercept_mqc" ] && echo "$output_dup_intercept_mqc is empty" && exit 1
[ ! -f "$output_duprate_exp_boxplot" ] && echo "$output_duprate_exp_boxplot was not created" && exit 1
[ ! -s "$output_duprate_exp_boxplot" ] && echo "$output_duprate_exp_boxplot is empty" && exit 1
[ ! -f "$output_duprate_exp_densplot" ] && echo "$output_duprate_exp_densplot was not created" && exit 1
[ ! -s "$output_duprate_exp_densplot" ] && echo "$output_duprate_exp_densplot is empty" && exit 1
[ ! -f "$output_duprate_exp_denscurve_mqc" ] && echo "$output_duprate_exp_denscurve_mqc was not created" && exit 1
[ ! -s "$output_duprate_exp_denscurve_mqc" ] && echo "$output_duprate_exp_denscurve_mqc is empty" && exit 1
[ ! -f "$output_expression_histogram" ] && echo "$output_expression_histogram was not created" && exit 1
[ ! -s "$output_expression_histogram" ] && echo "$output_expression_histogram is empty" && exit 1
[ ! -f "$output_intercept_slope" ] && echo "$output_intercept_slope was not created" && exit 1
[ ! -s "$output_intercept_slope" ] && echo "$output_intercept_slope is empty" && exit 1

exit 0
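The seven pairs of `[ ! -f ]` / `[ ! -s ]` checks in this test follow a single pattern. One way to factor them, sketched with a hypothetical `assert_nonempty` helper that checks every path it is given and fails on the first missing or empty one:

```bash
#!/usr/bin/env bash

# Hypothetical helper: return 1 on the first path that is missing or empty.
assert_nonempty() {
  local f
  for f in "$@"; do
    if [[ ! -f "$f" ]]; then
      echo "$f was not created" >&2
      return 1
    fi
    if [[ ! -s "$f" ]]; then
      echo "$f is empty" >&2
      return 1
    fi
  done
}

# Usage example in a scratch directory.
workdir=$(mktemp -d)
echo "data" > "$workdir/dup_matrix.txt"
: > "$workdir/intercept_slope.txt"

assert_nonempty "$workdir/dup_matrix.txt" && echo "dup_matrix.txt ok"
assert_nonempty "$workdir/intercept_slope.txt" || echo "intercept_slope.txt rejected"

rm -rf "$workdir"
```

In test.sh the whole block of checks could then become a single call such as `assert_nonempty "$output_dupmatrix" "$output_intercept_slope" ... || exit 1`, keeping the same error messages.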
71
src/fastqc/config.vsh.yaml
Normal file
@@ -0,0 +1,71 @@
name: "fastqc"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/fastqc/main.nf, modules/nf-core/fastqc/meta.yml]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
description: |
  FastQC component, see https://www.bioinformatics.babraham.ac.uk/projects/fastqc/. This component takes one FASTQ file (single-end) or two FASTQ files (paired-end).

argument_groups:
  - name: "Input"
    arguments:
      - name: "--paired"
        type: boolean
        required: false
        default: false
        description: Whether the input consists of paired-end FASTQ files.
      - name: "--input"
        type: file
        required: true
        multiple: true
        multiple_sep: ","
        description: Input FASTQ files, either one (single-end) or two (paired-end).
        example: sample.fastq

  - name: "Output"
    arguments:
      - name: "--fastqc_html_1"
        type: file
        direction: output
        description: FastQC HTML report for read 1.
        default: $id.read_1.fastqc.html
      - name: "--fastqc_html_2"
        type: file
        direction: output
        description: FastQC HTML report for read 2.
        required: false
        must_exist: false
        default: $id.read_2.fastqc.html
      - name: "--fastqc_zip_1"
        type: file
        direction: output
        description: FastQC report archive for read 1.
        default: $id.read_1.fastqc.zip
      - name: "--fastqc_zip_2"
        type: file
        direction: output
        description: FastQC report archive for read 2.
        required: false
        must_exist: false
        default: $id.read_2.fastqc.zip

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz
  - path: /testData/minimal_test/input_fastq/SRR6357070_2.fastq.gz

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [ fastqc ]
runners:
  - type: executable
  - type: nextflow
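A boolean argument such as `--paired` reaches the bash script as the string `true` or `false`, and several scripts in this commit originally tested it with `if $par_paired; then`. That runs the value as a command, which works for those two strings; but in bash an unset or empty variable expands to an empty command, which succeeds, so the branch is taken. A stricter check, sketched as a hypothetical `is_true` helper that treats anything other than the literal `true` as false:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only for the literal string "true".
is_true() {
  [[ "${1:-false}" == "true" ]]
}

# The pitfall: an empty value makes `if $flag` succeed.
flag=""
if $flag; then echo "empty flag taken as true"; fi

# The stricter check treats empty as false.
if is_true "$flag"; then echo "unexpected"; else echo "empty flag treated as false"; fi
```

Giving the argument a `default: false`, as this config does, also avoids the empty case, but the explicit string comparison does not depend on the config being complete.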
39
src/fastqc/script.sh
Normal file
@@ -0,0 +1,39 @@
#!/bin/bash

set -eo pipefail

function clean_up {
    rm -rf "$tmpdir"
}
trap clean_up EXIT

tmpdir=$(mktemp -d "$meta_temp_dir/$meta_name-XXXXXXXX")

IFS="," read -ra input <<< "$par_input"
count=${#input[@]}

if [[ "$par_paired" == "true" ]]; then
    echo "Paired - $count"
    if [ "$count" -ne 2 ]; then
        echo "Paired end input requires two files"
        exit 1
    fi
else
    echo "Not Paired - $count"
    if [ "$count" -ne 1 ]; then
        echo "Single end input requires one file"
        exit 1
    fi
fi

fastqc -o "$tmpdir" "${input[@]}"

file1=$(basename -- "${input[0]}")
read1="${file1%.fastq*}"

# Use if-blocks rather than `[[ -e ... ]] && cp ...`: a failed test on the last
# line would otherwise become the exit status of the whole script
if [[ -e "${tmpdir}/${read1}_fastqc.html" ]]; then
    cp "${tmpdir}/${read1}_fastqc.html" "$par_fastqc_html_1"
fi
if [[ -e "${tmpdir}/${read1}_fastqc.zip" ]]; then
    cp "${tmpdir}/${read1}_fastqc.zip" "$par_fastqc_zip_1"
fi

# Read 2 only exists for paired-end input
if [ "$count" -eq 2 ]; then
    file2=$(basename -- "${input[1]}")
    read2="${file2%.fastq*}"
    if [[ -e "${tmpdir}/${read2}_fastqc.html" ]]; then
        cp "${tmpdir}/${read2}_fastqc.html" "$par_fastqc_html_2"
    fi
    if [[ -e "${tmpdir}/${read2}_fastqc.zip" ]]; then
        cp "${tmpdir}/${read2}_fastqc.zip" "$par_fastqc_zip_2"
    fi
fi
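script.sh locates FastQC's reports by stripping everything from `.fastq` onward from each input's basename and appending `_fastqc.html` / `_fastqc.zip`. That covers `sample.fastq` and `sample.fastq.gz` but not inputs named `*.fq.gz`. A sketch of a stem function that also handles `.fq`, assuming (as the script itself does) that FastQC names its reports `<stem>_fastqc.html` and `<stem>_fastqc.zip`; `fastqc_stem` is a hypothetical name:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: derive the report stem from a FASTQ path.
fastqc_stem() {
  local name
  name=$(basename -- "$1")
  name=${name%.gz}      # drop compression suffixes first
  name=${name%.bz2}
  name=${name%.fastq}   # then the FASTQ extension, long or short form
  name=${name%.fq}
  printf '%s\n' "$name"
}

for f in /data/SRR6357070_1.fastq.gz reads/sample_R2.fq.gz plain.fastq; do
  echo "$f -> $(fastqc_stem "$f")_fastqc.html"
done
```

Each `%` expansion removes its suffix only when present, so the order matters: compression suffixes must go before the FASTQ extension.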
35
src/fastqc/test.sh
Normal file
@@ -0,0 +1,35 @@
#!/bin/bash

echo ">>> Testing $meta_functionality_name"

echo ">>> Testing for paired-end reads"

"$meta_executable" \
    --paired true \
    --input "$meta_resources_dir/SRR6357070_1.fastq.gz,$meta_resources_dir/SRR6357070_2.fastq.gz" \
    --fastqc_html_1 SRR6357070_1.html \
    --fastqc_html_2 SRR6357070_2.html \
    --fastqc_zip_1 SRR6357070_1.zip \
    --fastqc_zip_2 SRR6357070_2.zip

echo ">> Checking if the correct files are present"
[[ ! -f "SRR6357070_1.html" ]] || [[ ! -f "SRR6357070_2.html" ]] && echo "Report file missing" && exit 1
[[ ! -s "SRR6357070_1.html" ]] || [[ ! -s "SRR6357070_2.html" ]] && echo "Report file empty" && exit 1
[[ ! -f "SRR6357070_1.zip" ]] || [[ ! -f "SRR6357070_2.zip" ]] && echo "Zip file missing" && exit 1

rm SRR6357070_1.html SRR6357070_2.html SRR6357070_1.zip SRR6357070_2.zip

echo ">>> Testing for single-end reads"
"$meta_executable" \
    --paired false \
    --input "$meta_resources_dir/SRR6357070_1.fastq.gz" \
    --fastqc_html_1 SRR6357070_1.html \
    --fastqc_zip_1 SRR6357070_1.zip

echo ">> Checking if the correct files are present"
[ ! -f "SRR6357070_1.html" ] && echo "Report file missing" && exit 1
[ ! -s "SRR6357070_1.html" ] && echo "Report file empty" && exit 1
[ ! -f "SRR6357070_1.zip" ] && echo "Zip file missing" && exit 1

echo ">>> Test finished successfully"
exit 0
66
src/fq_subsample/config.vsh.yaml
Normal file
@@ -0,0 +1,66 @@
name: "fq_subsample"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/fq/subsample/main.nf, modules/nf-core/fq/subsample/meta.yml]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
description: |
  fq subsample outputs a subset of records from single or paired FASTQ files. Either --probability (-p) or --record-count (-n) must be passed via --extra_args; also pass a seed (--seed) for reproducible output.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--input"
        type: file
        description: Input FASTQ files to subsample (one for single-end, two for paired-end).
        multiple: true
        multiple_sep: ";"
      - name: "--extra_args"
        type: string
        default: ""
        description: Extra arguments to pass to fq subsample.

  - name: "Output"
    arguments:
      - name: "--output_1"
        type: file
        direction: output
        default: $id.read_1.subsampled.fastq
        description: Subsampled read 1 FASTQ file.
      - name: "--output_2"
        type: file
        must_exist: false
        direction: output
        default: $id.read_2.subsampled.fastq
        description: Subsampled read 2 FASTQ file.

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz
  - path: /testData/minimal_test/input_fastq/SRR6357070_2.fastq.gz

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: docker
        env:
          - TZ=Europe/Brussels
        run: |
          ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone && \
          apt-get update && \
          apt-get install -y --no-install-recommends build-essential git-all curl && \
          curl https://sh.rustup.rs -sSf | sh -s -- -y && \
          . "$HOME/.cargo/env" && \
          git clone --depth 1 --branch v0.12.0 https://github.com/stjude-rust-labs/fq.git && \
          mv fq /usr/local/ && cd /usr/local/fq && \
          cargo install --locked --path . && \
          mv /usr/local/fq/target/release/fq /usr/local/bin/
runners:
  - type: executable
  - type: nextflow
23
src/fq_subsample/script.sh
Normal file
@@ -0,0 +1,23 @@
#!/bin/bash

set -eo pipefail

IFS=";" read -ra input <<< "$par_input"
n_fastq=${#input[@]}

# fq subsample needs to know how many records to keep, so at least one of
# --probability (-p) or --record-count (-n) must be present in --extra_args.
required_args=("-p" "--probability" "-n" "--record-count")
found_required_arg=false
for arg in "${required_args[@]}"; do
  if [[ "$par_extra_args" == *"$arg"* ]]; then
    found_required_arg=true
    break
  fi
done
if [[ "$found_required_arg" == false ]]; then
  echo "FQ/SUBSAMPLE requires either --probability (-p) or --record-count (-n) to be specified with --extra_args"
  exit 1
fi

# $par_extra_args is left unquoted on purpose so it splits into separate options
if [ "$n_fastq" -eq 1 ]; then
  fq subsample $par_extra_args "${input[@]}" --r1-dst "$par_output_1"
elif [ "$n_fastq" -eq 2 ]; then
  fq subsample $par_extra_args "${input[@]}" --r1-dst "$par_output_1" --r2-dst "$par_output_2"
else
  echo "FQ/SUBSAMPLE only accepts 1 or 2 FASTQ files!"
  exit 1
fi
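The guard in this script should stop the run only when `--extra_args` names neither a sampling probability nor a record count. The Python function below is a minimal sketch of that intent and is not part of the component. It compares whole tokens instead of substrings, so a flag such as `--no-foo` cannot be mistaken for `-n`. The function name `has_sampling_option` is hypothetical. Accepting `--record-count=1000` as equivalent to `--record-count 1000` is an assumption about how the options may be spelled.

```python
import shlex

# Options that tell `fq subsample` how many records to keep (list taken from script.sh).
SAMPLING_OPTIONS = {"-p", "--probability", "-n", "--record-count"}


def has_sampling_option(extra_args: str) -> bool:
    """Return True if extra_args names at least one sampling option as a whole token."""
    for token in shlex.split(extra_args):
        # Drop an optional "=value" suffix so "--record-count=1000" also matches.
        name = token.split("=", 1)[0]
        if name in SAMPLING_OPTIONS:
            return True
    return False


print(has_sampling_option("--record-count 1000000 --seed 1"))  # True
print(has_sampling_option("--seed 1"))                          # False
```

Token matching is stricter than the shell's `*"$arg"*` pattern. The trade-off is that it relies on the arguments being shell-quoted sensibly.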
32
src/fq_subsample/test.sh
Normal file
@@ -0,0 +1,32 @@
#!/bin/bash

echo ">>> Testing $meta_functionality_name"

echo ">>> Testing for paired-end reads"
"$meta_executable" \
  --input "$meta_resources_dir/SRR6357070_1.fastq.gz;$meta_resources_dir/SRR6357070_2.fastq.gz" \
  --extra_args '--record-count 1000000 --seed 1' \
  --output_1 SRR6357070_1.subsampled.fastq.gz \
  --output_2 SRR6357070_2.subsampled.fastq.gz

echo ">> Checking if the correct files are present"
[ ! -f "SRR6357070_1.subsampled.fastq.gz" ] && echo "Subsampled FASTQ file for read 1 is missing!" && exit 1
[ ! -s "SRR6357070_1.subsampled.fastq.gz" ] && echo "Subsampled FASTQ file for read 1 is empty!" && exit 1
[ ! -f "SRR6357070_2.subsampled.fastq.gz" ] && echo "Subsampled FASTQ file for read 2 is missing!" && exit 1
[ ! -s "SRR6357070_2.subsampled.fastq.gz" ] && echo "Subsampled FASTQ file for read 2 is empty!" && exit 1

rm SRR6357070_1.subsampled.fastq.gz SRR6357070_2.subsampled.fastq.gz

echo ">>> Testing for single-end reads"
"$meta_executable" \
  --input "$meta_resources_dir/SRR6357070_1.fastq.gz" \
  --extra_args '--record-count 1000000 --seed 1' \
  --output_1 SRR6357070_1.subsampled.fastq.gz

echo ">> Checking if the correct files are present"
[ ! -f "SRR6357070_1.subsampled.fastq.gz" ] && echo "Subsampled FASTQ file is missing!" && exit 1
[ ! -s "SRR6357070_1.subsampled.fastq.gz" ] && echo "Subsampled FASTQ file is empty!" && exit 1

echo ">>> Tests finished successfully"
exit 0
57
src/getchromsizes/config.vsh.yaml
Normal file
@@ -0,0 +1,57 @@
name: "getchromsizes"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/custom/getchromsizes/main.nf, modules/nf-core/custom/getchromsizes/meta.yml]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
description: |
  Generates a tab-separated file of chromosome sizes and a FASTA index file.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--fasta"
        type: file
        description: Genome FASTA file

  - name: "Output"
    arguments:
      - name: "--sizes"
        type: file
        direction: output
        description: File containing chromosome lengths
      - name: "--fai"
        type: file
        description: FASTA index file
        direction: output
      - name: "--gzi" # optional
        type: file
        description: Optional gzip index file for compressed inputs
        direction: output

resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/genome.fasta

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: docker
        run: |
          apt-get update && \
          apt-get install -y autoconf automake make gcc perl zlib1g-dev libbz2-dev liblzma-dev libcurl4-gnutls-dev libssl-dev libncurses5-dev curl bzip2 && \
          curl -fsSL https://github.com/samtools/samtools/releases/download/1.18/samtools-1.18.tar.bz2 -o samtools-1.18.tar.bz2 && \
          tar -xjf samtools-1.18.tar.bz2 && \
          rm samtools-1.18.tar.bz2 && \
          cd samtools-1.18 && \
          ./configure && \
          make && \
          make install
runners:
  - type: executable
  - type: nextflow
9
src/getchromsizes/script.sh
Executable file
@@ -0,0 +1,9 @@
#!/bin/bash

set -eo pipefail

samtools faidx "$par_fasta"
cut -f 1,2 "$par_fasta.fai" > "$par_sizes"
mv "$par_fasta.fai" "$par_fai"

# bgzip-compressed input also yields a .gzi index; keep it only when requested
if [[ -n "$par_gzi" && -f "$par_fasta.gzi" ]]; then
  mv "$par_fasta.gzi" "$par_gzi"
fi
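`samtools faidx` writes one line per sequence, and the first two columns of each line are the sequence name and its length. That is why `cut -f 1,2` yields the sizes file. Below is a minimal Python cross-check. Like samtools, it assumes the name is the header text up to the first whitespace. `chrom_sizes` is a hypothetical helper, not part of the pipeline.

```python
from typing import Iterable, List, Tuple


def chrom_sizes(fasta_lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (name, length) pairs in file order, like `cut -f 1,2 genome.fasta.fai`."""
    sizes: List[Tuple[str, int]] = []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # The sequence name is the header text up to the first whitespace.
            sizes.append((line[1:].split(None, 1)[0], 0))
        elif line and sizes:
            # Sequence lines add to the length of the most recent record.
            name, length = sizes[-1]
            sizes[-1] = (name, length + len(line))
    return sizes


fasta = [">chr1 first contig", "ACGTACGT", "ACG", ">chr2", "TTTT"]
for name, length in chrom_sizes(fasta):
    print(f"{name}\t{length}")
```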
16
src/getchromsizes/test.sh
Normal file
@@ -0,0 +1,16 @@
#!/bin/bash

echo ">>> Testing $meta_functionality_name"
"$meta_executable" \
  --fasta "$meta_resources_dir/genome.fasta" \
  --sizes genome.fasta.sizes \
  --fai genome.fasta.fai

echo ">>> Checking whether output exists"
[ ! -f "genome.fasta.sizes" ] && echo "Chromosome lengths file does not exist!" && exit 1
[ ! -s "genome.fasta.sizes" ] && echo "Chromosome lengths file is empty!" && exit 1
[ ! -f "genome.fasta.fai" ] && echo "FASTA index file does not exist!" && exit 1
[ ! -s "genome.fasta.fai" ] && echo "FASTA index file is empty!" && exit 1

echo "All tests succeeded!"
exit 0
45
src/gtf2bed/config.vsh.yaml
Normal file
@@ -0,0 +1,45 @@
name: "gtf2bed"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/local/gtf2bed.nf]
    last_sha: 0a1bdcfbb498987643b74e9fccab85ccd9f2a17d
description: |
  Creates a BED annotation file from a GTF file.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--gtf"
        type: file
        required: true
        description: A reference file in GTF format.

  - name: "Output"
    arguments:
      - name: "--bed_output"
        type: file
        direction: output
        required: true
        description: BED file resulting from the conversion of the GTF input file.

resources:
  - type: bash_script
    path: script.sh
  # Copied from https://github.com/nf-core/rnaseq/blob/3.12.0/bin/gtf2bed
  - path: gtf2bed.pl

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/genes.gtf.gz

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [perl]
runners:
  - type: executable
  - type: nextflow
122
src/gtf2bed/gtf2bed.pl
Executable file
@@ -0,0 +1,122 @@
#!/usr/bin/env perl

# Copyright (c) 2011 Erik Aronesty (erik@q32.com)
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
#
# ALSO, IT WOULD BE NICE IF YOU LET ME KNOW YOU USED IT.

use Getopt::Long;

my $extended;
GetOptions("x"=>\$extended);

$in = shift @ARGV;

my $in_cmd =($in =~ /\.gz$/ ? "gunzip -c $in|" : $in =~ /\.zip$/ ? "unzip -p $in|" : "$in") || die "Can't open $in: $!\n";
open IN, $in_cmd;

while (<IN>) {
    $gff = 2 if /^##gff-version 2/;
    $gff = 3 if /^##gff-version 3/;
    next if /^#/ && $gff;

    s/\s+$//;
    # 0-chr 1-src 2-feat 3-beg 4-end 5-scor 6-dir 7-fram 8-attr
    my @f = split /\t/;
    if ($gff) {
        # most ver 2's stick gene names in the id field
        ($id) = $f[8]=~ /\bID="([^"]+)"/;
        # most ver 3's stick unquoted names in the name field
        ($id) = $f[8]=~ /\bName=([^";]+)/ if !$id && $gff == 3;
    } else {
        ($id) = $f[8]=~ /transcript_id "([^"]+)"/;
    }

    next unless $id && $f[0];

    if ($f[2] eq 'exon') {
        die "no position at exon on line $." if ! $f[3];
        # gff3 puts :\d in exons sometimes
        $id =~ s/:\d+$// if $gff == 3;
        push @{$exons{$id}}, \@f;
        # save lowest start
        $trans{$id} = \@f if !$trans{$id};
    } elsif ($f[2] eq 'start_codon') {
        #optional, output codon start/stop as "thick" region in bed
        $sc{$id}->[0] = $f[3];
    } elsif ($f[2] eq 'stop_codon') {
        $sc{$id}->[1] = $f[4];
    } elsif ($f[2] eq 'miRNA' ) {
        $trans{$id} = \@f if !$trans{$id};
        push @{$exons{$id}}, \@f;
    }
}

for $id (
    # sort by chr then pos
    sort {
        $trans{$a}->[0] eq $trans{$b}->[0] ?
        $trans{$a}->[3] <=> $trans{$b}->[3] :
        $trans{$a}->[0] cmp $trans{$b}->[0]
    } (keys(%trans)) ) {
    my ($chr, undef, undef, undef, undef, undef, $dir, undef, $attr, undef, $cds, $cde) = @{$trans{$id}};
    my ($cds, $cde);
    ($cds, $cde) = @{$sc{$id}} if $sc{$id};

    # sort by pos
    my @ex = sort {
        $a->[3] <=> $b->[3]
    } @{$exons{$id}};

    my $beg = $ex[0][3];
    my $end = $ex[-1][4];

    if ($dir eq '-') {
        # swap
        $tmp=$cds;
        $cds=$cde;
        $cde=$tmp;
        $cds -= 2 if $cds;
        $cde += 2 if $cde;
    }

    # not specified, just use exons
    $cds = $beg if !$cds;
    $cde = $end if !$cde;

    # adjust start for bed
    --$beg; --$cds;

    my $exn = @ex; # exon count
    my $exst = join ",", map {$_->[3]-$beg-1} @ex; # exon start
    my $exsz = join ",", map {$_->[4]-$_->[3]+1} @ex; # exon size

    my $gene_id;
    my $extend = "";
    if ($extended) {
        ($gene_id) = $attr =~ /gene_name "([^"]+)"/;
        ($gene_id) = $attr =~ /gene_id "([^"]+)"/ unless $gene_id;
        $extend="\t$gene_id";
    }
    # added an extra comma to make it look exactly like ucsc's beds
    print "$chr\t$beg\t$end\t$id\t0\t$dir\t$cds\t$cde\t0\t$exn\t$exsz,\t$exst,$extend\n";
}

close IN;
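At its core, `gtf2bed.pl` performs a coordinate conversion. GTF exons are 1-based and inclusive. BED12 uses a 0-based `chromStart`, an exclusive `chromEnd`, and block starts measured from `chromStart`. The Python sketch below reproduces that arithmetic for a single transcript, under these assumptions:

- The helper `exons_to_bed12` is hypothetical.
- It ignores start and stop codons, so the thick region spans the whole transcript. The Perl script does the same when no codons are annotated.

```python
from typing import List, Tuple


def exons_to_bed12(chrom: str, name: str, strand: str, exons: List[Tuple[int, int]]) -> str:
    """Build one BED12 line from 1-based, inclusive GTF exon coordinates.

    Simplified: thickStart/thickEnd cover the whole transcript (no CDS information).
    """
    ex = sorted(exons)
    chrom_start = ex[0][0] - 1  # BED starts are 0-based
    chrom_end = ex[-1][1]       # BED ends are exclusive, so the 1-based inclusive end is reused
    block_sizes = [end - start + 1 for start, end in ex]
    block_starts = [(start - 1) - chrom_start for start, _ in ex]
    fields = [
        chrom, chrom_start, chrom_end, name, 0, strand,
        chrom_start, chrom_end,                  # thickStart, thickEnd
        0, len(ex),                              # itemRgb, blockCount
        ",".join(map(str, block_sizes)) + ",",   # trailing comma, as in UCSC BED files
        ",".join(map(str, block_starts)) + ",",
    ]
    return "\t".join(map(str, fields))


print(exons_to_bed12("chr1", "tx1", "+", [(31, 40), (11, 20)]))
```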
5
src/gtf2bed/script.sh
Executable file
@@ -0,0 +1,5 @@
#!/bin/bash

set -eo pipefail

perl "$meta_resources_dir/gtf2bed.pl" "$par_gtf" > "$par_bed_output"
15
src/gtf2bed/test.sh
Normal file
@@ -0,0 +1,15 @@
#!/bin/bash

gunzip "$meta_resources_dir/genes.gtf.gz"

echo ">>> Testing $meta_functionality_name"
"$meta_executable" \
  --gtf "$meta_resources_dir/genes.gtf" \
  --bed_output genes.bed

echo ">>> Check whether output exists"
[ ! -f "genes.bed" ] && echo "BED output file does not exist!" && exit 1
[ ! -s "genes.bed" ] && echo "BED output file is empty!" && exit 1

echo "All tests succeeded!"
exit 0
45
src/gtf_filter/config.vsh.yaml
Normal file
@@ -0,0 +1,45 @@
name: "gtf_filter"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/local/gtf_filter.nf]
    last_sha: 1c6012ecbb087014ea4b8f0f3d39b874850277a8
description: |
  Filters a GTF file based on sequence names in a FASTA file.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--fasta"
        type: file
        description: Genome FASTA file
      - name: "--gtf"
        type: file
        description: GTF file
      - name: "--skip_transcript_id_check"
        type: boolean_true
        description: Skip checking for transcript IDs in the GTF file.

  - name: "Output"
    arguments:
      - name: "--filtered_gtf"
        type: file
        direction: output
        description: Filtered GTF file containing only sequences in the FASTA file

resources:
  - type: python_script
    path: script.py

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/genome.fasta
  - path: /testData/minimal_test/reference/genes.gtf.gz

engines:
  - type: docker
    image: python
runners:
  - type: executable
  - type: nextflow
47
src/gtf_filter/script.py
Normal file
@@ -0,0 +1,47 @@
# Adapted from https://github.com/nf-core/rnaseq/blob/3.14.0/bin/filter_gtf.py

import os
import sys
import re
import statistics
from typing import Set


def extract_fasta_seq_names(fasta_name: str) -> Set[str]:
    """Extracts the sequence names from a FASTA file."""
    with open(fasta_name) as fasta:
        return {line[1:].split(None, 1)[0] for line in fasta if line.startswith(">")}


def tab_delimited(file: str) -> float:
    """Check if file is tab-delimited and return median number of tabs."""
    with open(file, "r") as f:
        data = f.read(102400)
        return statistics.median(line.count("\t") for line in data.split("\n"))


def filter_gtf(fasta: str, gtf_in: str, filtered_gtf_out: str, skip_transcript_id_check: bool) -> None:
    """Filter GTF file based on FASTA sequence names."""
    if tab_delimited(gtf_in) != 8:
        raise ValueError("Invalid GTF file: Expected 9 tab-separated columns.")
    seq_names_in_genome = extract_fasta_seq_names(fasta)
    print(f"Extracted chromosome sequence names from {fasta}")
    print("All sequence IDs from FASTA: " + ", ".join(sorted(seq_names_in_genome)))
    seq_names_in_gtf = set()
    try:
        with open(gtf_in) as gtf, open(filtered_gtf_out, "w") as out:
            line_count = 0
            for line in gtf:
                seq_name = line.split("\t")[0]
                seq_names_in_gtf.add(seq_name)  # Add sequence name to the set
                if seq_name in seq_names_in_genome:
                    if skip_transcript_id_check or re.search(r'transcript_id "([^"]+)"', line):
                        out.write(line)
                        line_count += 1
            if line_count == 0:
                raise ValueError("All GTF lines removed by filters")
    except IOError as e:
        # Exit with a non-zero status so the failure is not silently ignored
        sys.exit(f"File operation failed: {e}")

    print("All sequence IDs from GTF: " + ", ".join(sorted(seq_names_in_gtf)))
    print(f"Extracted {line_count} matching lines from {gtf_in} into {filtered_gtf_out}")


filter_gtf(par["fasta"], par["gtf"], par["filtered_gtf"], par["skip_transcript_id_check"])
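`tab_delimited()` validates the GTF by taking the median number of tabs per line over the first 100 KiB, rather than requiring every line to have exactly 8 tabs (9 columns). Below is a small sketch of why this tolerates comment lines and a truncated final line. The helper name `median_tab_count` is hypothetical; its logic mirrors the function above, applied to an in-memory string instead of a file.

```python
import statistics


def median_tab_count(text: str) -> float:
    """Median number of tab characters per line, as computed by tab_delimited()."""
    return statistics.median(line.count("\t") for line in text.split("\n"))


gtf_line = "\t".join(["chr1", "src", "exon", "11", "20", ".", "+", ".", 'transcript_id "tx1";'])
sample = "\n".join(["#!genome-build test", gtf_line, gtf_line, gtf_line])
print(median_tab_count(sample))  # 8.0, so the "!= 8" check in filter_gtf() passes
```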
16
src/gtf_filter/test.sh
Normal file
@@ -0,0 +1,16 @@
#!/bin/bash

gunzip "$meta_resources_dir/genes.gtf.gz"

echo ">>> Testing $meta_functionality_name"
"$meta_executable" \
  --fasta "$meta_resources_dir/genome.fasta" \
  --gtf "$meta_resources_dir/genes.gtf" \
  --filtered_gtf filtered_genes.gtf

echo ">>> Check whether output exists"
[ ! -f "filtered_genes.gtf" ] && echo "Filtered GTF file does not exist!" && exit 1
[ ! -s "filtered_genes.gtf" ] && echo "Filtered GTF file is empty!" && exit 1

echo "All tests succeeded!"
exit 0
42
src/gunzip/config.vsh.yaml
Normal file
@@ -0,0 +1,42 @@
name: "gunzip"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/gunzip/main.nf, modules/nf-core/gunzip/meta.yml]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
description: |
  Uncompress a gzip-compressed file. Input that does not end in .gz is copied to the output unchanged.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--input"
        type: file
        required: true
        description: Path of file to be uncompressed

  - name: "Output"
    arguments:
      - name: "--output"
        type: file
        direction: output
        required: true
        description: Decompressed file.

resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/genes.gff.gz

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [ gzip ]
runners:
  - type: executable
  - type: nextflow
11
src/gunzip/script.sh
Executable file
@@ -0,0 +1,11 @@
#!/bin/bash

set -eo pipefail

filename="$(basename -- "$par_input")"

if [ "${filename##*.}" == "gz" ]; then
  gunzip -c "$par_input" > "$par_output"
else
  cat "$par_input" > "$par_output"
fi
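The script above decides whether to decompress based only on the `.gz` file extension. A different technique is to inspect the two gzip magic bytes (`1f 8b`), which also handles a compressed file that lacks the extension. The Python sketch below shows that alternative; it is not what the component does, and `decompress_or_copy` is a hypothetical name.

```python
import gzip
import os
import shutil
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip member


def decompress_or_copy(src: str, dst: str) -> bool:
    """Write the decompressed contents of src to dst, or copy src as-is if it is not gzip.

    Returns True when src was gzip-compressed.
    """
    with open(src, "rb") as fh:
        is_gzip = fh.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    with opener(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return is_gzip


# Usage: one gzip file and one plain file in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    gz_src = os.path.join(tmp, "genes.gff.gz")
    with gzip.open(gz_src, "wb") as fh:
        fh.write(b"##gff-version 3\n")
    plain_src = os.path.join(tmp, "genes.txt")
    with open(plain_src, "wb") as fh:
        fh.write(b"plain\n")

    was_gzip = decompress_or_copy(gz_src, os.path.join(tmp, "out1"))
    was_plain_gzip = decompress_or_copy(plain_src, os.path.join(tmp, "out2"))
    with open(os.path.join(tmp, "out1"), "rb") as fh:
        gz_content = fh.read()
    with open(os.path.join(tmp, "out2"), "rb") as fh:
        plain_content = fh.read()

print(was_gzip, was_plain_gzip)
```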
22
src/gunzip/test.sh
Normal file
@@ -0,0 +1,22 @@
#!/bin/bash

# define input and output for script
input="$meta_resources_dir/genes.gff.gz"
output="genes.gff"

# run executable and tests
echo "> Running $meta_functionality_name."

"$meta_executable" \
  --input "$input" \
  --output "$output"

exit_code=$?
[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1

echo ">> Checking whether output can be found and has content"

[ ! -f "$output" ] && echo "$output file missing" && exit 1
[ ! -s "$output" ] && echo "$output file is empty" && exit 1

exit 0
49
src/kallisto/kallisto_index/config.vsh.yaml
Normal file
@@ -0,0 +1,49 @@
name: kallisto_index
namespace: kallisto
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/kallisto/index/main.nf, modules/nf-core/kallisto/index/meta.yml]
    last_sha: c0816976384d5e7ee6079c29c45958df1ffa0ee4
description: |
  Creates a Kallisto index from a transcriptome FASTA file.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--transcriptome_fasta"
        type: file
        description: Transcriptome FASTA file to index.
      - name: "--pseudo_aligner_kmer_size"
        type: integer
        description: k-mer length passed to the indexing step of pseudoaligners.

  - name: "Output"
    arguments:
      - name: "--kallisto_index"
        type: file
        direction: output
        default: Kallisto_index
        description: Kallisto index file.

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/transcriptome.fasta

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: docker
        run: |
          apt-get update && \
          apt-get install -y --no-install-recommends wget && \
          wget --no-check-certificate https://github.com/pachterlab/kallisto/releases/download/v0.50.1/kallisto_linux-v0.50.1.tar.gz && \
          tar -xzf kallisto_linux-v0.50.1.tar.gz && \
          mv kallisto/kallisto /usr/local/bin/
runners:
  - type: executable
  - type: nextflow
8
src/kallisto/kallisto_index/script.sh
Normal file
@@ -0,0 +1,8 @@
#!/bin/bash

set -eo pipefail

kallisto index \
  ${par_pseudo_aligner_kmer_size:+-k $par_pseudo_aligner_kmer_size} \
  -i "$par_kallisto_index" \
  "$par_transcriptome_fasta"
14
src/kallisto/kallisto_index/test.sh
Normal file
@@ -0,0 +1,14 @@
#!/bin/bash

echo ">>> Testing $meta_functionality_name"

"$meta_executable" \
  --transcriptome_fasta "$meta_resources_dir/transcriptome.fasta" \
  --kallisto_index Kallisto

echo ">>> Checking whether output exists"
[ ! -f "Kallisto" ] && echo "Kallisto index does not exist!" && exit 1
[ ! -s "Kallisto" ] && echo "Kallisto index is empty!" && exit 1

echo "All tests succeeded!"
exit 0
88
src/kallisto/kallisto_quant/config.vsh.yaml
Normal file
@@ -0,0 +1,88 @@
name: kallisto_quant
namespace: kallisto
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/kallisto/quant/main.nf, modules/nf-core/kallisto/quant/meta.yml]
    last_sha: aff1d2e02717247831644769fc3ba84868c3fdde
description: |
  Computes equivalence classes for reads and quantifies abundances.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--input"
        type: file
        multiple: true
        multiple_sep: ","
        description: List of input FASTQ files (one file for single-end data, two files for paired-end data).
      - name: "--paired"
        type: boolean
        description: Whether the input reads are paired-end.
      - name: "--strandedness"
        type: string
        description: Sample strand-specificity ('forward' or 'reverse'; any other value is treated as unstranded).
      - name: "--index"
        type: file
        description: Kallisto index.
      - name: "--gtf"
        type: file
        description: Optional GTF file for translation of transcripts into genomic coordinates.
      - name: "--chromosomes"
        type: file
        description: Optional tab-separated file with chromosome names and lengths.
      - name: "--fragment_length"
        type: integer
        description: For single-end mode only, the estimated average fragment length.
      - name: "--fragment_length_sd"
        type: integer
        description: For single-end mode only, the estimated standard deviation of the fragment length.

  - name: "Output"
    arguments:
      - name: "--output"
        type: file
        description: Kallisto quant results
        default: "$id.kallisto_quant_results"
        direction: output
      - name: "--log"
        type: file
        description: File containing log information from running kallisto quant
        default: "$id.kallisto_quant.log.txt"
        direction: output
      - name: "--run_info"
        type: file
        description: A JSON file containing information about the run
        default: "$id.run_info.json"
        direction: output
      - name: "--quant_results_file"
        type: file
        description: TSV file containing abundance estimates from Kallisto
        direction: output
        default: $id.abundance.tsv

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/transcriptome.fasta
  - path: /testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz
  - path: /testData/minimal_test/input_fastq/SRR6357070_2.fastq.gz

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: docker
        run: |
          apt-get update && \
          apt-get install -y --no-install-recommends wget && \
          wget --no-check-certificate https://github.com/pachterlab/kallisto/releases/download/v0.50.1/kallisto_linux-v0.50.1.tar.gz && \
          tar -xzf kallisto_linux-v0.50.1.tar.gz && \
          mv kallisto/kallisto /usr/local/bin/
runners:
  - type: executable
  - type: nextflow
49
src/kallisto/kallisto_quant/script.sh
Normal file
@@ -0,0 +1,49 @@
#!/bin/bash

set -eo pipefail

IFS="," read -ra input <<< "$par_input"

single_end_params=''
if [ "$par_paired" == "false" ]; then
  # Both values are required (and must not be negative) for single-end quantification
  if [[ -z "$par_fragment_length" || -z "$par_fragment_length_sd" ]] || \
     (( par_fragment_length < 0 || par_fragment_length_sd < 0 )); then
    echo "fragment_length and fragment_length_sd must be set for single-end data"
    exit 1
  fi
  single_end_params="--single --fragment-length $par_fragment_length --sd $par_fragment_length_sd"
fi

strandedness=''
if [[ "$par_extra_args" != *"--fr-stranded"* ]] && [[ "$par_extra_args" != *"--rf-stranded"* ]]; then
  if [ "$par_strandedness" == 'forward' ]; then
    strandedness='--fr-stranded'
  elif [ "$par_strandedness" == 'reverse' ]; then
    strandedness='--rf-stranded'
  fi
fi

mkdir -p "$par_output"
echo "kallisto quant \
  ${meta_cpus:+--threads $meta_cpus} \
  --index $par_index \
  ${par_gtf:+--gtf $par_gtf} \
  ${par_chromosomes:+--chromosomes $par_chromosomes} \
  $single_end_params \
  $strandedness \
  $par_extra_args \
  -o $par_output \
  ${input[*]} 2> >(tee -a ${par_output}/kallisto_quant.log >&2)"
kallisto quant \
  ${meta_cpus:+--threads $meta_cpus} \
  --index "$par_index" \
  ${par_gtf:+--gtf $par_gtf} \
  ${par_chromosomes:+--chromosomes $par_chromosomes} \
  $single_end_params \
  $strandedness \
  $par_extra_args \
  -o "$par_output" \
  "${input[@]}" 2> >(tee -a "${par_output}/kallisto_quant.log" >&2)

mv "${par_output}/kallisto_quant.log" "${par_log}"
mv "${par_output}/run_info.json" "${par_run_info}"
cp "${par_output}/abundance.tsv" "${par_quant_results_file}"
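The option assembly in `script.sh` can be written as a pure function, which makes the strandedness and single-end rules easy to test. Below is a minimal Python sketch that assumes the component's parameter names. `kallisto_quant_args` is hypothetical and only reproduces flags already used in the script; the `--gtf` and `--chromosomes` options are omitted for brevity. `extra_args.split()` is used because unquoted `$par_extra_args` in bash is split on whitespace without any quote processing.

```python
from typing import List, Optional


def kallisto_quant_args(
    index: str,
    inputs: List[str],
    output: str,
    paired: bool,
    strandedness: str = "",
    fragment_length: Optional[int] = None,
    fragment_length_sd: Optional[int] = None,
    extra_args: str = "",
    threads: Optional[int] = None,
) -> List[str]:
    """Return the kallisto quant command line in the same order as script.sh builds it."""
    args = ["kallisto", "quant"]
    if threads:
        args += ["--threads", str(threads)]
    args += ["--index", index]
    if not paired:
        # Single-end data needs an explicit fragment length distribution.
        if (fragment_length is None or fragment_length_sd is None
                or fragment_length < 0 or fragment_length_sd < 0):
            raise ValueError("fragment_length and fragment_length_sd must be set for single-end data")
        args += ["--single", "--fragment-length", str(fragment_length), "--sd", str(fragment_length_sd)]
    # Only derive a strandedness flag when the user did not pass one explicitly.
    if "--fr-stranded" not in extra_args and "--rf-stranded" not in extra_args:
        flag = {"forward": "--fr-stranded", "reverse": "--rf-stranded"}.get(strandedness)
        if flag:
            args.append(flag)
    args += extra_args.split()
    args += ["-o", output, *inputs]
    return args


print(" ".join(kallisto_quant_args(
    "index", ["SRR6357070_1.fastq.gz"], "single_end_test",
    paired=False, strandedness="reverse", fragment_length=101, fragment_length_sd=50,
)))
```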
55
src/kallisto/kallisto_quant/test.sh
Normal file
@@ -0,0 +1,55 @@
#!/bin/bash

echo ">>> Testing $meta_functionality_name"

echo ">>> Generating Kallisto index"
kallisto index \
  -i index \
  "$meta_resources_dir/transcriptome.fasta"

echo ">>> Testing for paired-end reads"
"$meta_executable" \
  --index index \
  --paired true \
  --strandedness reverse \
  --output paired_end_test \
  --input "$meta_resources_dir/SRR6357070_1.fastq.gz,$meta_resources_dir/SRR6357070_2.fastq.gz" \
  --log quant_pe.log \
  --run_info pe_run_info.json

echo ">>> Checking whether output exists"
[ ! -d "paired_end_test" ] && echo "Kallisto results do not exist!" && exit 1
[ ! -f "quant_pe.log" ] && echo "quant_pe.log does not exist!" && exit 1
[ ! -s "quant_pe.log" ] && echo "quant_pe.log is empty!" && exit 1
[ ! -f "pe_run_info.json" ] && echo "pe_run_info.json does not exist!" && exit 1
[ ! -s "pe_run_info.json" ] && echo "pe_run_info.json is empty!" && exit 1
[ ! -f "paired_end_test/abundance.tsv" ] && echo "abundance.tsv does not exist!" && exit 1
[ ! -s "paired_end_test/abundance.tsv" ] && echo "abundance.tsv is empty!" && exit 1
[ ! -f "paired_end_test/abundance.h5" ] && echo "abundance.h5 does not exist!" && exit 1
[ ! -s "paired_end_test/abundance.h5" ] && echo "abundance.h5 is empty!" && exit 1

echo ">>> Testing for single-end reads"
"$meta_executable" \
  --index index \
  --paired false \
  --strandedness "reverse" \
  --output single_end_test \
  --input "$meta_resources_dir/SRR6357070_1.fastq.gz" \
  --log quant_se.log \
  --run_info se_run_info.json \
  --fragment_length 101 \
  --fragment_length_sd 50

echo ">>> Checking whether output exists"
[ ! -d "single_end_test" ] && echo "Kallisto results do not exist!" && exit 1
[ ! -f "quant_se.log" ] && echo "quant_se.log does not exist!" && exit 1
[ ! -s "quant_se.log" ] && echo "quant_se.log is empty!" && exit 1
[ ! -f "se_run_info.json" ] && echo "se_run_info.json does not exist!" && exit 1
[ ! -s "se_run_info.json" ] && echo "se_run_info.json is empty!" && exit 1
[ ! -f "single_end_test/abundance.tsv" ] && echo "abundance.tsv does not exist!" && exit 1
[ ! -s "single_end_test/abundance.tsv" ] && echo "abundance.tsv is empty!" && exit 1
[ ! -f "single_end_test/abundance.h5" ] && echo "abundance.h5 does not exist!" && exit 1
[ ! -s "single_end_test/abundance.h5" ] && echo "abundance.h5 is empty!" && exit 1

echo "All tests succeeded!"
exit 0
46
src/multiqc_custom_biotype/config.vsh.yaml
Normal file
@@ -0,0 +1,46 @@
name: "multiqc_custom_biotype"
info:
  migration_info:
description: Calculates per-feature percentages from biotype counts

argument_groups:
  - name: "Input"
    arguments:
      - name: "--biocounts"
        type: file
        description: File with all biocounts
      - name: "--id"
        type: string
        description: Sample name
        default: $id
      - name: "--biotypes_header"
        type: file
        default: assets/multiqc/biotypes_header.txt

  - name: "Output"
    arguments:
      - name: '--featurecounts_multiqc'
        type: file
        direction: output
        default: $id.biotype_counts_mqc.tsv
      - name: '--featurecounts_rrna_multiqc'
        type: file
        direction: output
        default: $id.biotype_counts_rrna_mqc.tsv

resources:
  - type: bash_script
    path: script.sh
  # Copied from https://github.com/nf-core/rnaseq/blob/3.12.0/bin/mqc_features_stat.py
  - path: mqc_features_stat.py

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [pip]
      - type: python
runners:
  - type: executable
  - type: nextflow
89
src/multiqc_custom_biotype/mqc_features_stat.py
Executable file
@@ -0,0 +1,89 @@
#!/usr/bin/env python3

import argparse
import logging
import os

# Create a logger
logging.basicConfig(format="%(name)s - %(asctime)s %(levelname)s: %(message)s")
logger = logging.getLogger(__file__)
logger.setLevel(logging.INFO)

mqc_main = """#id: 'biotype-gs'
#plot_type: 'generalstats'
#pconfig:"""

mqc_pconf = """#    percent_{ft}:
#        title: '% {ft}'
#        namespace: 'Biotype Counts'
#        description: '% reads overlapping {ft} features'
#        max: 100
#        min: 0
#        scale: 'RdYlGn-rev'
#        format: '{{:.2f}}%'"""


def mqc_feature_stat(bfile, features, outfile, sname=None):
    # If sample name not given use file name
    if not sname:
        sname = os.path.splitext(os.path.basename(bfile))[0]

    # Try to parse and read biocount file
    fcounts = {}
    try:
        with open(bfile, "r") as bfl:
            for ln in bfl:
                if ln.startswith("#"):
                    continue
                ft, cn = ln.strip().split("\t")
                fcounts[ft] = float(cn)
    except Exception:
        logger.error("Trouble reading the biocount file {}".format(bfile))
        return

    total_count = sum(fcounts.values())
    if total_count == 0:
        logger.error("No biocounts found, exiting")
        return

    # Calculate percentage for each requested feature
    fpercent = {f: (fcounts[f] / total_count) * 100 if f in fcounts else 0 for f in features}
    if len(fpercent) == 0:
        logger.error("Any of given features '{}' not found in the biocount file {}".format(", ".join(features), bfile))
        return

    # Prepare the output strings
    out_head, out_value, out_mqc = ("Sample", "'{}'".format(sname), mqc_main)
    for ft, pt in fpercent.items():
        out_head = "{}\tpercent_{}".format(out_head, ft)
        out_value = "{}\t{}".format(out_value, pt)
        out_mqc = "{}\n{}".format(out_mqc, mqc_pconf.format(ft=ft))

    # Write the output to a file
    with open(outfile, "w") as ofl:
        out_final = "\n".join([out_mqc, out_head, out_value]).strip()
        ofl.write(out_final + "\n")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="""Calculate features percentage for biotype counts""")
    parser.add_argument("biocount", type=str, help="File with all biocounts")
    parser.add_argument(
        "-f",
        "--features",
        dest="features",
        required=True,
        nargs="+",
        help="Features to count",
    )
    parser.add_argument("-s", "--sample", dest="sample", type=str, help="Sample Name")
    parser.add_argument(
        "-o",
        "--output",
        dest="output",
        default="biocount_percent.tsv",
        type=str,
        help="Output file name",
    )
    args = parser.parse_args()
    mqc_feature_stat(args.biocount, args.features, args.output, args.sample)
11
src/multiqc_custom_biotype/script.sh
Normal file
@@ -0,0 +1,11 @@
#!/bin/bash

set -eo pipefail

cut -f 1,7 "$par_biocounts" | tail -n +3 | cat "$par_biotypes_header" - > "$par_featurecounts_multiqc"

python3 "$meta_resources_dir/mqc_features_stat.py" \
    "$par_featurecounts_multiqc" \
    -s "$par_id" \
    -f rRNA \
    -o "$par_featurecounts_rrna_multiqc"
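The two steps in script.sh above (reshape the featureCounts biotype table, then compute the rRNA percentage) can be sketched in Python to make the data flow explicit. This is a minimal sketch, assuming the biotype table has two header lines and the count in column 7, as `cut -f 1,7` and `tail -n +3` imply; `biotype_rrna_percent` is a hypothetical helper, not part of the component.

```python
def biotype_rrna_percent(featurecounts_lines, header_lines, feature="rRNA"):
    """Hypothetical helper: mimic `cut -f 1,7 | tail -n +3 | cat header -`,
    then compute the percentage of counts assigned to `feature` the way
    mqc_features_stat.py does."""
    rows = [line.rstrip("\n").split("\t") for line in featurecounts_lines]
    # tail -n +3 drops the two header lines; cut -f 1,7 keeps name and count
    body = [(cols[0], cols[6]) for cols in rows[2:]]
    # cat header -: the MultiQC header comes first, then the two-column table
    mqc_lines = list(header_lines) + ["{}\t{}".format(name, count) for name, count in body]

    # mqc_features_stat.py skips '#' lines and sums all remaining counts
    counts = {}
    for line in mqc_lines:
        if line.startswith("#"):
            continue
        name, count = line.split("\t")
        counts[name] = float(count)
    total = sum(counts.values())
    # The real script logs an error and writes nothing when total is zero;
    # this sketch simply returns 0.0 in that case.
    percent = counts.get(feature, 0.0) / total * 100 if total else 0.0
    return mqc_lines, percent
```

A missing feature yields 0 rather than an error, which matches the `if f in fcounts else 0` branch in the script.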
69
src/picard_markduplicates/config.vsh.yaml
Normal file
@@ -0,0 +1,69 @@
name: "picard_markduplicates"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/picard/markduplicates/main.nf, modules/nf-core/picard/markduplicates/meta.yml]
    last_sha: 55398de6ab7577acfe9b1180016a93d7af7eb859
description: |
  Locate and tag duplicate reads in a BAM file

argument_groups:
  - name: "Input"
    arguments:
      - name: "--bam"
        type: file
        description: Input BAM file
      - name: "--fasta"
        type: file
        description: Reference genome FASTA file
      - name: "--fai"
        type: file
        description: Reference genome FASTA index
      - name: "--extra_picard_args"
        type: string
        description: Additional arguments to be passed to Picard MarkDuplicates
        default: '--ASSUME_SORTED true --REMOVE_DUPLICATES false --VALIDATION_STRINGENCY LENIENT --TMP_DIR tmp'

  - name: "Output"
    arguments:
      - name: "--output_bam"
        type: file
        direction: output
        description: BAM file with duplicate reads marked/removed
        default: $id.MarkDuplicates.bam
      - name: "--bai"
        type: file
        direction: output
        description: An optional BAM index file. If desired, --CREATE_INDEX true must be passed via --extra_picard_args
        default: $id.MarkDuplicates.bam.bai
        must_exist: false
      - name: "--metrics"
        type: file
        direction: output
        description: Duplicate metrics file generated by Picard
        default: $id.MarkDuplicates.metrics.txt

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/sarscov2/test.paired_end.sorted.bam
  - path: /testData/unit_test_resources/sarscov2/genome.fasta

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: docker
        run: |
          apt-get update && \
          apt-get install -y build-essential openjdk-17-jdk wget && \
          wget --no-check-certificate https://github.com/broadinstitute/picard/releases/download/3.1.1/picard.jar && \
          mv picard.jar /usr/local/bin
        env: [ PICARD=/usr/local/bin/picard.jar ]
runners:
  - type: executable
  - type: nextflow
17
src/picard_markduplicates/script.sh
Executable file
@@ -0,0 +1,17 @@
#!/bin/bash

set -eo pipefail

avail_mem=3072
if [ -z "$meta_memory_mb" ]; then
    echo '[Picard MarkDuplicates] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this.'
else
    avail_mem=$(( meta_memory_mb * 8 / 10 ))
fi

java -Xmx${avail_mem}M -jar "$PICARD" MarkDuplicates \
    $par_extra_picard_args \
    --INPUT "$par_bam" \
    --OUTPUT "$par_output_bam" \
    --REFERENCE_SEQUENCE "$par_fasta" \
    --METRICS_FILE "$par_metrics"
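Bash arithmetic expansion `$(( ))` only handles integers, so "80% of the available memory" has to be computed as `* 8 / 10` rather than `* 0.8`. A minimal Python sketch of the same heap-size rule, assuming `meta_memory_mb` is either unset or a whole number of megabytes; `picard_heap_mb` is a hypothetical name used only for illustration.

```python
from typing import Optional

DEFAULT_HEAP_MB = 3072  # fallback when the process memory is unknown (3 GB)


def picard_heap_mb(meta_memory_mb: Optional[int]) -> int:
    """Heap size in MB for `java -Xmx...M`: 80% of the available memory,
    rounded down the same way bash integer division rounds."""
    if meta_memory_mb is None:
        # Corresponds to the `[ -z "$meta_memory_mb" ]` branch
        return DEFAULT_HEAP_MB
    # Integer-only equivalent of taking 80%
    return meta_memory_mb * 8 // 10


if __name__ == "__main__":
    for mem in (None, 4096, 16384):
        print(mem, "->", "-Xmx{}M".format(picard_heap_mb(mem)))
```

Leaving about 20% headroom keeps the JVM's non-heap memory inside the process limit.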
19
src/picard_markduplicates/test.sh
Normal file
@@ -0,0 +1,19 @@
#!/bin/bash

echo ">>> Testing $meta_functionality_name"

"$meta_executable" \
    --bam "$meta_resources_dir/test.paired_end.sorted.bam" \
    --fasta "$meta_resources_dir/genome.fasta" \
    --extra_picard_args "--REMOVE_DUPLICATES false" \
    --output_bam "test.MarkDuplicates.genome.bam" \
    --metrics "test.MarkDuplicates.metrics.txt"

echo ">>> Check whether output exists"
[ ! -f "test.MarkDuplicates.genome.bam" ] && echo "MarkDuplicates output BAM file does not exist!" && exit 1
[ ! -s "test.MarkDuplicates.genome.bam" ] && echo "MarkDuplicates output BAM file is empty!" && exit 1
[ ! -f "test.MarkDuplicates.metrics.txt" ] && echo "MarkDuplicates output metrics file does not exist!" && exit 1
[ ! -s "test.MarkDuplicates.metrics.txt" ] && echo "MarkDuplicates output metrics file is empty!" && exit 1

echo "All tests succeeded!"
exit 0
146
src/prepare_multiqc_input/config.vsh.yaml
Normal file
@@ -0,0 +1,146 @@
name: "prepare_multiqc_input"
description: |
  Prepare directory with all the input files for MultiQC.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--fail_trimming_multiqc"
        type: string
      - name: "--fail_mapping_multiqc"
        type: string
      - name: "--fail_strand_multiqc"
        type: string
      - name: "--fastqc_raw_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--fastqc_trim_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--trim_log_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--sortmerna_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--star_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      # - name: "--hisat2_multiqc"
      #   type: file
      # - name: "--rsem_multiqc"
      #   type: file
      - name: "--salmon_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--samtools_stats"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--samtools_flagstat"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--samtools_idxstats"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--markduplicates_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--pseudo_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--featurecounts_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--featurecounts_rrna_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--aligner_pca_multiqc"
        type: file
      - name: "--aligner_clustering_multiqc"
        type: file
      - name: "--pseudo_aligner_pca_multiqc"
        type: file
      - name: "--pseudo_aligner_clustering_multiqc"
        type: file
      - name: "--preseq_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--qualimap_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--dupradar_output_dup_intercept_mqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--dupradar_output_duprate_exp_denscurve_mqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--bamstat_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--inferexperiment_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--innerdistance_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--junctionannotation_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--junctionsaturation_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--readdistribution_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--readduplication_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--tin_multiqc"
        type: file
        multiple: true
        multiple_sep: ","
      - name: "--multiqc_config"
        type: file


  - name: "Output"
    arguments:
      - name: "--output"
        type: file
        direction: output
        default: multiqc_input

resources:
  - type: bash_script
    path: script.sh

engines:
  - type: docker
    image: ubuntu:22.04
runners:
  - type: executable
  - type: nextflow
74
src/prepare_multiqc_input/script.sh
Normal file
@@ -0,0 +1,74 @@
#!/bin/bash

set -eo pipefail

mkdir -p "$par_output"

echo "$par_fail_trimming_multiqc" > "$par_output/fail_trimming_mqc.tsv"
echo "$par_fail_mapping_multiqc" > "$par_output/fail_mapping_mqc.tsv"
echo "$par_fail_strand_multiqc" > "$par_output/fail_strand_mqc.tsv"

IFS="," read -ra fastqc_raw_multiqc <<< "$par_fastqc_raw_multiqc" && for file in "${fastqc_raw_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra fastqc_trim_multiqc <<< "$par_fastqc_trim_multiqc" && for file in "${fastqc_trim_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra trim_log_multiqc <<< "$par_trim_log_multiqc" && for file in "${trim_log_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra sortmerna_multiqc <<< "$par_sortmerna_multiqc" && for file in "${sortmerna_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra star_multiqc <<< "$par_star_multiqc" && for file in "${star_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

# IFS="," read -ra hisat2_multiqc <<< "$par_hisat2_multiqc" && for file in "${hisat2_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

# IFS="," read -ra rsem_multiqc <<< "$par_rsem_multiqc" && for file in "${rsem_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra salmon_multiqc <<< "$par_salmon_multiqc" && for file in "${salmon_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra samtools_stats <<< "$par_samtools_stats" && for file in "${samtools_stats[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra samtools_flagstat <<< "$par_samtools_flagstat" && for file in "${samtools_flagstat[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra samtools_idxstats <<< "$par_samtools_idxstats" && for file in "${samtools_idxstats[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra markduplicates_multiqc <<< "$par_markduplicates_multiqc" && for file in "${markduplicates_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra pseudo_multiqc <<< "$par_pseudo_multiqc" && for file in "${pseudo_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done


IFS="," read -ra featurecounts_multiqc <<< "$par_featurecounts_multiqc" && for file in "${featurecounts_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra featurecounts_rrna_multiqc <<< "$par_featurecounts_rrna_multiqc" && for file in "${featurecounts_rrna_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

[ -e "$par_aligner_pca_multiqc" ] && cp -r "$par_aligner_pca_multiqc" "$par_output/"

[ -e "$par_aligner_clustering_multiqc" ] && cp -r "$par_aligner_clustering_multiqc" "$par_output/"

[ -e "$par_pseudo_aligner_pca_multiqc" ] && cp -r "$par_pseudo_aligner_pca_multiqc" "$par_output/"

[ -e "$par_pseudo_aligner_clustering_multiqc" ] && cp -r "$par_pseudo_aligner_clustering_multiqc" "$par_output/"

IFS="," read -ra preseq_multiqc <<< "$par_preseq_multiqc" && for file in "${preseq_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra qualimap_multiqc <<< "$par_qualimap_multiqc" && for file in "${qualimap_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra dupradar_output_dup_intercept_mqc <<< "$par_dupradar_output_dup_intercept_mqc" && for file in "${dupradar_output_dup_intercept_mqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra dupradar_output_duprate_exp_denscurve_mqc <<< "$par_dupradar_output_duprate_exp_denscurve_mqc" && for file in "${dupradar_output_duprate_exp_denscurve_mqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra bamstat_multiqc <<< "$par_bamstat_multiqc" && for file in "${bamstat_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra inferexperiment_multiqc <<< "$par_inferexperiment_multiqc" && for file in "${inferexperiment_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra innerdistance_multiqc <<< "$par_innerdistance_multiqc" && for file in "${innerdistance_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra junctionannotation_multiqc <<< "$par_junctionannotation_multiqc" && for file in "${junctionannotation_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra junctionsaturation_multiqc <<< "$par_junctionsaturation_multiqc" && for file in "${junctionsaturation_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra readdistribution_multiqc <<< "$par_readdistribution_multiqc" && for file in "${readdistribution_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra readduplication_multiqc <<< "$par_readduplication_multiqc" && for file in "${readduplication_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

IFS="," read -ra tin_multiqc <<< "$par_tin_multiqc" && for file in "${tin_multiqc[@]}"; do [ -e "$file" ] && cp -r "$file" "$par_output/"; done

if [ -e "$par_multiqc_config" ]; then cp -r "$par_multiqc_config" "$par_output/"; fi
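Each `IFS="," read -ra ... && for ...` line in the script above splits one comma-separated argument and copies only the paths that exist into the output directory. A minimal Python sketch of that staging step, assuming the same `,` separator declared as `multiple_sep` in the config; `stage_inputs` is a hypothetical name and not part of the component.

```python
import os
import shutil


def stage_inputs(joined_paths, output_dir, sep=","):
    """Copy every existing file or directory listed in `joined_paths`
    (for example "a_fastqc.zip,b_fastqc.zip") into `output_dir`.
    Empty or missing entries are skipped silently. Returns copied names."""
    os.makedirs(output_dir, exist_ok=True)  # like `mkdir -p "$par_output"`
    copied = []
    for path in (joined_paths or "").split(sep):
        if not path or not os.path.exists(path):  # like `[ -e "$file" ] && ...`
            continue
        dest = os.path.join(output_dir, os.path.basename(path.rstrip(os.sep)))
        if os.path.isdir(path):
            # `cp -r dir out/` merges into out/dir if it already exists
            shutil.copytree(path, dest, dirs_exist_ok=True)
        else:
            shutil.copy2(path, dest)
        copied.append(os.path.basename(dest))
    return copied
```

Skipping missing paths instead of failing is what lets a single MultiQC input directory be assembled regardless of which optional QC steps actually ran.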
40
src/preprocess_transcripts_fasta/config.vsh.yaml
Normal file
@@ -0,0 +1,40 @@
name: "preprocess_transcripts_fasta"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/local/preprocess_transcripts_fasta_gencode.nf]
    last_sha: 0a1bdcfbb498987643b74e9fccab85ccd9f2a17d
description: |
  Process the transcripts FASTA file if the GTF file is in GENCODE format

argument_groups:
  - name: "Input"
    arguments:
      - name: "--transcript_fasta"
        type: file
        required: true
        description: Path of transcripts FASTA file

  - name: "Output"
    arguments:
      - name: "--output"
        type: file
        direction: output
        required: true
        description: Path of processed output FASTA file.

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/reference/transcriptome.fasta

engines:
  - type: docker
    image: ubuntu:22.04
runners:
  - type: executable
  - type: nextflow
11
src/preprocess_transcripts_fasta/script.sh
Normal file
@@ -0,0 +1,11 @@
#!/bin/bash

set -eo pipefail

filename="$(basename -- "$par_transcript_fasta")"

if [ "${filename##*.}" == "gz" ]; then
    zcat "$par_transcript_fasta" | cut -d "|" -f1 > "$par_output"
else
    cut -d "|" -f1 "$par_transcript_fasta" > "$par_output"
fi
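The script above keeps only the part of each line before the first `|`, which shortens GENCODE-style headers such as `>ENST...|ENSG...|...` to the transcript ID while leaving sequence lines (which contain no `|`) untouched. A minimal Python sketch of the same transformation, assuming a `.gz` suffix marks gzip input as in the bash check; the function names are hypothetical.

```python
import gzip


def trim_gencode_headers(lines):
    """Yield each line cut at the first '|', like `cut -d "|" -f1`.
    Lines without a '|' are passed through unchanged."""
    for line in lines:
        yield line.rstrip("\n").split("|", 1)[0] + "\n"


def preprocess_transcripts_fasta(path, output):
    # Mirror the `${filename##*.} == "gz"` check: decompress .gz input on the fly
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as src, open(output, "w") as dst:
        dst.writelines(trim_gencode_headers(src))
```

Streaming line by line keeps memory use constant, which matters for full transcriptome FASTA files.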
14
src/preprocess_transcripts_fasta/test.sh
Normal file
@@ -0,0 +1,14 @@
#!/bin/bash

echo ">>> Testing $meta_functionality_name"

"$meta_executable" \
    --transcript_fasta "$meta_resources_dir/transcriptome.fasta" \
    --output "processed_transcriptome.fasta"

echo ">>> Check whether output exists"
[ ! -f "processed_transcriptome.fasta" ] && echo "Processed FASTA file does not exist!" && exit 1
[ ! -s "processed_transcriptome.fasta" ] && echo "Processed FASTA file is empty!" && exit 1

echo "All tests succeeded!"
exit 0
68
src/preseq_lcextrap/config.vsh.yaml
Normal file
@@ -0,0 +1,68 @@
name: "preseq_lcextrap"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/preseq/lcextrap/main.nf, modules/nf-core/preseq/lcextrap/meta.yml]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
description: Compute the expected future yield of distinct reads and bounds on the total number of distinct reads in the library, with the associated confidence intervals.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--input"
        type: file
        description: Input genome BAM/BED file
      - name: "--extra_preseq_args"
        type: string
      - name: "--paired"
        type: boolean
        description: Paired-end reads?

  - name: "Output"
    arguments:
      - name: "--output"
        type: file
        direction: output
        default: $id.lc_extrap.txt

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/a.sorted.bed
  - path: /testData/unit_test_resources/SRR1106616_5M_subset.bam

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [ curl, bzip2, build-essential, wget, gcc, autoconf, automake, make, libz-dev, libbz2-dev, zlib1g-dev, libncurses5-dev, libncursesw5-dev, liblzma-dev, pip ]
      - type: docker
        run: |
          cd /usr/bin && \
          wget --no-check-certificate https://github.com/smithlabcode/preseq/releases/download/v3.2.0/preseq-3.2.0.tar.gz && \
          wget https://github.com/samtools/htslib/releases/download/1.9/htslib-1.9.tar.bz2 && \
          wget --no-check-certificate https://github.com/arq5x/bedtools2/releases/download/v2.31.0/bedtools.static && \
          curl -fsSL https://github.com/samtools/samtools/releases/download/1.18/samtools-1.18.tar.bz2 -o samtools-1.18.tar.bz2 && \
          tar -xjf samtools-1.18.tar.bz2 && rm samtools-1.18.tar.bz2 && \
          tar -xzf preseq-3.2.0.tar.gz && rm preseq-3.2.0.tar.gz && \
          tar -vxjf htslib-1.9.tar.bz2 && rm htslib-1.9.tar.bz2 && \
          mv bedtools.static /usr/local/bin/bedtools && \
          chmod a+x /usr/local/bin/bedtools && \
          cd samtools-1.18 && \
          ./configure && \
          make && \
          make install && \
          cd /usr/bin && cd htslib-1.9 && \
          make && \
          cd /usr/bin && cd preseq-3.2.0 && \
          mkdir build && cd build && \
          ../configure && \
          make && make install && make HAVE_HTSLIB=1 all
runners:
  - type: executable
  - type: nextflow
29
src/preseq_lcextrap/script.sh
Normal file
@@ -0,0 +1,29 @@
#!/bin/bash

set -eo pipefail

file=$(basename -- "$par_input")
filename="${file%.*}"

if [ "${file##*.}" == "bam" ]; then
    samtools sort -o "sorted_$filename.bam" -n "$par_input"
    bedtools bamtobed -i "sorted_$filename.bam" > "$filename.bed"
    bedtools sort -i "$filename.bed" > "sorted_$filename.bed"
elif [ "${file##*.}" == "bed" ]; then
    bedtools sort -i "$par_input" > "sorted_$filename.bed"
else
    echo "Invalid input file format!" >&2
    exit 1
fi

if [ "$par_paired" == "true" ]; then
    paired="-pe"
else
    paired=""
fi

preseq lc_extrap \
    "sorted_$filename.bed" \
    $paired \
    $par_extra_preseq_args \
    -o "$par_output"
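The extension dispatch in script.sh above can be sketched as a pure function that returns the commands the script would run, without executing them. This is a minimal sketch, assuming `paired` arrives as the string `true` or `false`; `preseq_commands` is a hypothetical name, and the shell redirections are noted only as comments.

```python
import os
from typing import List, Optional


def preseq_commands(input_path: str, output: str, paired: str = "false",
                    extra_args: Optional[List[str]] = None) -> List[List[str]]:
    """Return the argv lists script.sh would run for a BAM or BED input."""
    file = os.path.basename(input_path)
    stem, ext = os.path.splitext(file)  # ${file%.*} and ${file##*.}
    ext = ext.lstrip(".")
    sorted_bed = "sorted_{}.bed".format(stem)
    if ext == "bam":
        steps = [
            ["samtools", "sort", "-o", "sorted_{}.bam".format(stem), "-n", input_path],
            ["bedtools", "bamtobed", "-i", "sorted_{}.bam".format(stem)],  # > stem.bed
            ["bedtools", "sort", "-i", "{}.bed".format(stem)],  # > sorted_stem.bed
        ]
    elif ext == "bed":
        steps = [["bedtools", "sort", "-i", input_path]]  # > sorted_stem.bed
    else:
        raise ValueError("Invalid input file format!")
    preseq = ["preseq", "lc_extrap", sorted_bed]
    if paired == "true":
        preseq.append("-pe")
    preseq += list(extra_args or [])
    preseq += ["-o", output]
    return steps + [preseq]
```

Returning commands instead of running them makes the branching easy to check without samtools, bedtools or preseq installed.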
28
src/preseq_lcextrap/test.sh
Normal file
@@ -0,0 +1,28 @@
#!/bin/bash

echo ">>> Testing $meta_functionality_name"

echo ">>> Testing with BAM input"
"$meta_executable" \
    --paired false \
    --input "$meta_resources_dir/SRR1106616_5M_subset.bam" \
    --output lc_extrap.txt

echo ">>> Check whether output exists"
[ ! -f "lc_extrap.txt" ] && echo "Output file does not exist!" && exit 1
[ ! -s "lc_extrap.txt" ] && echo "Output file is empty!" && exit 1

rm lc_extrap.txt

echo ">>> Testing with BED input"
"$meta_executable" \
    --paired false \
    --input "$meta_resources_dir/a.sorted.bed" \
    --output lc_extrap.txt

echo ">>> Check whether output exists"
[ ! -f "lc_extrap.txt" ] && echo "Output file does not exist!" && exit 1
[ ! -s "lc_extrap.txt" ] && echo "Output file is empty!" && exit 1

echo "All tests succeeded!"
exit 0
118
src/qualimap/config.vsh.yaml
Normal file
@@ -0,0 +1,118 @@
name: "qualimap"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/qualimap/rnaseq/main.nf]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
description: |
  RNA-seq QC analysis using Qualimap

argument_groups:
  - name: "Input"
    arguments:
      - name: "--input"
        type: file
        required: true
        description: path to input mapping file in BAM format.

      - name: "--gtf"
        type: file
        required: true
        description: path to annotations file in Ensembl GTF format.

  - name: "Output"
    arguments:
      - name: "--output_dir"
        direction: output
        type: file
        required: false
        default: $id.qualimap_output
        description: path to output directory for raw data and report.

      - name: "--output_pdf"
        type: file
        direction: output
        required: false
        must_exist: false
        default: $id.report.pdf
        description: path to output file for PDF report.

      - name: "--output_format"
        type: string
        required: false
        default: html
        description: Format of the output report (PDF or HTML, default is HTML)

  - name: "Optional"
    arguments:
      - name: "--pr_bases"
        type: integer
        required: false
        default: 100
        min: 1
        description: Number of upstream/downstream nucleotide bases to compute 5'-3' bias (default = 100).

      - name: "--tr_bias"
        type: integer
        required: false
        default: 1000
        min: 1
        description: Number of top highly expressed transcripts to compute 5'-3' bias (default = 1000).

      - name: "--algorithm"
        type: string
        required: false
        default: uniquely-mapped-reads
        description: Counting algorithm (uniquely-mapped-reads (default) or proportional).

      - name: "--sequencing_protocol"
        type: string
        required: false
        choices: ["non-strand-specific", "strand-specific-reverse", "strand-specific-forward"]
        default: non-strand-specific
        description: Sequencing library protocol (strand-specific-forward, strand-specific-reverse or non-strand-specific (default)).

      - name: "--paired"
        type: boolean_true
        description: Setting this flag for paired-end experiments will result in counting fragments instead of reads.

      - name: "--sorted"
        type: boolean_true
        description: Setting this flag indicates that the input file is already sorted by name. If the flag is not set, additional sorting by name will be performed. Only required for paired-end analysis.

      - name: "--java_memory_size"
        type: string
        required: false
        default: 4G
        description: maximum Java heap memory size, default = 4G.

resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2.bam
  - path: /testData/unit_test_resources/wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2.bam.bai
  - path: /testData/unit_test_resources/genes.gtf

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [ r-base, unzip, wget, openjdk-8-jdk, libxml2-dev, libcurl4-openssl-dev ]
      - type: docker
        run: |
          wget https://bitbucket.org/kokonech/qualimap/downloads/qualimap_v2.3.zip && \
          unzip qualimap_v2.3.zip && \
          cp -a qualimap_v2.3/. usr/bin && \
          unset DISPLAY && \
          mkdir -p tmp && \
          export _JAVA_OPTIONS=-Djava.io.tmpdir=./tmp
      - type: r
        bioc: [ NOISeq ]
        cran: [ optparse ]
runners:
  - type: executable
  - type: nextflow
19
src/qualimap/script.sh
Normal file
@@ -0,0 +1,19 @@
#!/bin/bash

set -eo pipefail

mkdir -p "$par_output_dir"

qualimap rnaseq \
    --java-mem-size="$par_java_memory_size" \
    --algorithm "$par_algorithm" \
    --num-pr-bases "$par_pr_bases" \
    --num-tr-bias "$par_tr_bias" \
    --sequencing-protocol "$par_sequencing_protocol" \
    -bam "$par_input" \
    -gtf "$par_gtf" \
    $( [ "$par_paired" == "true" ] && echo "-pe" ) \
    $( [ "$par_sorted" == "true" ] && echo "-s" ) \
    -outdir "$par_output_dir" \
    -outformat "$par_output_format"

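The `qualimap rnaseq` call above can be sketched as an argument-list builder. It compares the boolean flags against the string `true` instead of only testing for a non-empty value, because a flag explicitly passed as `false` is also non-empty. This is a minimal sketch, assuming boolean flags arrive as the strings `true`/`false`; the defaults are copied from config.vsh.yaml, except `qualimap_output`, which is a hypothetical stand-in for `$id.qualimap_output`.

```python
from typing import List


def qualimap_args(par: dict) -> List[str]:
    """Build the `qualimap rnaseq` argv from component parameters.
    -pe and -s are only emitted when their value is exactly 'true'."""
    args = [
        "qualimap", "rnaseq",
        "--java-mem-size={}".format(par.get("java_memory_size", "4G")),
        "--algorithm", par.get("algorithm", "uniquely-mapped-reads"),
        "--num-pr-bases", str(par.get("pr_bases", 100)),
        "--num-tr-bias", str(par.get("tr_bias", 1000)),
        "--sequencing-protocol", par.get("sequencing_protocol", "non-strand-specific"),
        "-bam", par["input"],
        "-gtf", par["gtf"],
    ]
    if par.get("paired") == "true":
        args.append("-pe")
    if par.get("sorted") == "true":
        args.append("-s")
    # Hypothetical default directory name, for illustration only
    args += ["-outdir", par.get("output_dir", "qualimap_output"),
             "-outformat", par.get("output_format", "html")]
    return args
```

Building an argv list also sidesteps shell quoting problems for paths that contain spaces.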
24
src/qualimap/test.sh
Normal file
@@ -0,0 +1,24 @@
echo "> Running $meta_functionality_name."

# define input and output for script
input_bam="$meta_resources_dir/wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2.bam"
input_gtf="$meta_resources_dir/genes.gtf"
output_dir="qualimap_output"

"$meta_executable" \
    --input "$input_bam" \
    --gtf "$input_gtf" \
    --output_dir "$output_dir"

exit_code=$?
[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1

echo ">> Checking whether output dir and files exist"

[ ! -d "$output_dir" ] && echo "Output dir could not be found!" && exit 1
[ ! -d "$output_dir/raw_data_qualimapReport" ] && echo "Raw data folder could not be found!" && exit 1
[ -z "$(ls -A "$output_dir/raw_data_qualimapReport")" ] && echo "Raw data folder is missing output files" && exit 1
[ ! -f "$output_dir/qualimapReport.html" ] && echo "Qualimap report was not found" && exit 1
[ ! -s "$output_dir/qualimapReport.html" ] && echo "Qualimap report is empty" && exit 1

exit 0
138
src/rsem/rsem_calculate_expression/config.vsh.yaml
Normal file
@@ -0,0 +1,138 @@
name: "rsem_calculate_expression"
namespace: "rsem"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/rsem/calculateexpression/main.nf, modules/nf-core/rsem/calculateexpression/meta.yml]
    last_sha: 92b2a7857de1dda9d1c19a088941fc81e2976ff7

description: |
  Calculate expression with RSEM.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--id"
        type: string
        description: Sample ID.
      - name: "--strandedness"
        type: string
        description: Sample strand-specificity. Must be one of unstranded, forward, reverse.
        choices: [forward, reverse, unstranded]
      - name: "--paired"
        type: boolean
        description: Whether the input reads are paired-end.
      - name: "--input"
        type: file
        description: Input reads for quantification.
        multiple: true
        multiple_sep: ","
      - name: "--index"
        type: file
        description: Directory containing the RSEM index.
      - name: "--extra_args"
        type: string
        description: Extra rsem-calculate-expression arguments in addition to the defaults.
      - name: "--versions"
        type: file
        must_exist: false

  - name: "Output"
    arguments:
      - name: "--counts_gene"
        type: file
        description: Expression counts on gene level.
        example: sample.genes.results
        direction: output
      - name: "--counts_transcripts"
        type: file
        description: Expression counts on transcript level.
        example: sample.isoforms.results
        direction: output
      - name: "--stat"
        type: file
        description: RSEM statistics.
        example: sample.stat
        direction: output
      - name: "--logs"
        type: file
        description: RSEM logs.
        example: sample.log
        direction: output
      - name: "--bam_star"
        type: file
        description: BAM file generated by STAR (optional).
        example: sample.STAR.genome.bam
        direction: output
      - name: "--bam_genome"
        type: file
        description: Genome BAM file (optional).
        example: sample.genome.bam
        direction: output
      - name: "--bam_transcript"
        type: file
        description: Transcript BAM file (optional).
        example: sample.transcript.bam
        direction: output

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz
  - path: /testData/minimal_test/input_fastq/SRR6357070_2.fastq.gz
  - path: /testData/minimal_test/reference/rsem.tar.gz

# TODO: Install bowtie/bowtie2
engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages:
          - build-essential
          - gcc
          - g++
          - make
          - wget
          - zlib1g-dev
          - unzip
          - xxd
          - perl
          - r-base
          - bowtie2
          - python3-pip
          - git
      - type: docker
        env:
          - STAR_VERSION=2.7.11b
          - RSEM_VERSION=1.3.3
          - TZ=Europe/Brussels
        run: |
          ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone && \
          cd /tmp && \
          wget --no-check-certificate https://github.com/alexdobin/STAR/archive/refs/tags/${STAR_VERSION}.zip && \
          unzip ${STAR_VERSION}.zip && \
          cd STAR-${STAR_VERSION}/source && \
          make STARstatic CXXFLAGS_SIMD=-std=c++11 && \
          cp STAR /usr/local/bin && \
          cd /tmp && \
          wget --no-check-certificate https://github.com/deweylab/RSEM/archive/refs/tags/v${RSEM_VERSION}.zip && \
          unzip v${RSEM_VERSION}.zip && \
          cd RSEM-${RSEM_VERSION} && \
          make && \
          make install && \
          rm -rf /tmp/STAR-${STAR_VERSION} /tmp/${STAR_VERSION}.zip && \
          rm -rf /tmp/RSEM-${RSEM_VERSION} /tmp/v${RSEM_VERSION}.zip && \
          cd && \
          apt-get clean && \
          echo 'export PATH=$PATH:/usr/local/bin' >> /etc/profile && \
          echo 'export PATH=$PATH:/usr/local/bin' >> ~/.bashrc && \
          /bin/bash -c "source /etc/profile && source ~/.bashrc && echo $PATH && which STAR"

runners:
  - type: executable
  - type: nextflow

32
src/rsem/rsem_calculate_expression/script.sh
Executable file
@@ -0,0 +1,32 @@
#!/bin/bash

set -eo pipefail

function clean_up {
  rm -rf "$tmpdir"
}
trap clean_up EXIT

tmpdir=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXXXX")

if [ "$par_strandedness" == 'forward' ]; then
  strandedness='--strandedness forward'
elif [ "$par_strandedness" == 'reverse' ]; then
  strandedness='--strandedness reverse'
else
  strandedness=''
fi

# Boolean arguments arrive as the strings "true"/"false", so test the value explicitly
if [ "$par_paired" == "true" ]; then
  paired='--paired-end'
else
  paired=''
fi

IFS="," read -ra input <<< "$par_input"

# The index prefix is the path of the .grp file without its extension
INDEX=$(find -L "$par_index" -name "*.grp" | sed 's/\.grp$//')

rsem-calculate-expression \
  ${meta_cpus:+--num-threads $meta_cpus} \
  $strandedness \
  $paired \
  $par_extra_args \
  "${input[@]}" \
  "$INDEX" \
  "$par_id"

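The strandedness and paired-end handling in the script above amounts to a small mapping from the component's arguments to `rsem-calculate-expression` flags. A minimal sketch of that mapping, using a hypothetical helper `build_rsem_args` (illustration only, not part of the component) and assuming Viash passes boolean arguments to Bash as the strings `true` or `false`:

```shell
#!/bin/bash
# Hypothetical helper (illustration only): translate the component's
# --strandedness and --paired values into rsem-calculate-expression flags.
# Flags are collected in an array so an empty value never turns into a
# stray empty argument.
build_rsem_args() {
  local strandedness="$1" paired="$2"
  local -a args=()
  case "$strandedness" in
    forward | reverse) args+=(--strandedness "$strandedness") ;;
    unstranded | "") ;; # no flag: RSEM then treats the reads as unstranded
    *)
      echo "Unknown strandedness: $strandedness" >&2
      return 1
      ;;
  esac
  # Only the literal string "true" enables paired-end mode.
  if [[ "$paired" == "true" ]]; then
    args+=(--paired-end)
  fi
  # Print one flag per line; print nothing at all when there are no flags.
  if (( ${#args[@]} > 0 )); then
    printf '%s\n' "${args[@]}"
  fi
}

build_rsem_args reverse true
```

Printing one flag per line lets a caller collect the result with `mapfile -t flags < <(build_rsem_args reverse true)` and pass `"${flags[@]}"` on, instead of relying on unquoted word splitting of a string variable.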
26
src/rsem/rsem_calculate_expression/test.sh
Normal file
@@ -0,0 +1,26 @@
#!/bin/bash

echo ">>> Testing $meta_functionality_name"

tar -xavf "$meta_resources_dir/rsem.tar.gz"

echo ">>> Calculating expression"
"$meta_executable" \
  --id WT_REP1 \
  --strandedness reverse \
  --paired true \
  --input "$meta_resources_dir/SRR6357070_1.fastq.gz,$meta_resources_dir/SRR6357070_2.fastq.gz" \
  --index rsem \
  --extra_args "--star --star-output-genome-bam --star-gzipped-read-file --estimate-rspd --seed 1" \
  --counts_gene WT_REP1.genes.results \
  --counts_transcripts WT_REP1.isoforms.results \
  --logs WT_REP1.log

exit_code=$?
[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1

echo ">>> Checking whether output exists"
[ ! -f "WT_REP1.genes.results" ] && echo "Gene level expression counts file does not exist!" && exit 1
[ ! -s "WT_REP1.genes.results" ] && echo "Gene level expression counts file is empty!" && exit 1
[ ! -f "WT_REP1.log" ] && echo "Log file does not exist!" && exit 1
[ ! -s "WT_REP1.log" ] && echo "Log file is empty!" && exit 1

echo "All tests succeeded!"
exit 0

68
src/rsem/rsem_merge_counts/config.vsh.yaml
Normal file
@@ -0,0 +1,68 @@
name: "rsem_merge_counts"
namespace: "rsem"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/local/rsem_merge_counts/main.nf]
    last_sha: 311279532694ce7520164ce4d65a388c0cd11f60

description: |
  Merge the gene and transcript quantification results obtained from rsem-calculate-expression across all samples.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--counts_gene"
        type: file
        description: Expression counts on gene level (genes), one file per sample.
        multiple: true
        multiple_sep: ","
      - name: "--counts_transcripts"
        type: file
        description: Expression counts on transcript level (isoforms), one file per sample.
        multiple: true
        multiple_sep: ","
      - name: "--versions"
        type: file
        must_exist: false

  - name: "Output"
    arguments:
      - name: "--merged_gene_counts"
        type: file
        description: File containing gene counts across all samples.
        default: rsem.merged.gene_counts.tsv
        direction: output
      - name: "--merged_gene_tpm"
        type: file
        description: File containing gene TPM across all samples.
        default: rsem.merged.gene_tpm.tsv
        direction: output
      - name: "--merged_transcript_counts"
        type: file
        description: File containing transcript counts across all samples.
        default: rsem.merged.transcript_counts.tsv
        direction: output
      - name: "--merged_transcript_tpm"
        type: file
        description: File containing transcript TPM across all samples.
        default: rsem.merged.transcript_tpm.tsv
        direction: output
      - name: "--updated_versions"
        type: file
        default: versions.yml
        direction: output

resources:
  - type: bash_script
    path: script.sh

# test_resources:
#   - type: bash_script
#     path: test.sh
#   - path: /testData/minimal_test/input_fastq/SRR6357070_1.fastq.gz
#   - path: /testData/minimal_test/input_fastq/SRR6357070_2.fastq.gz

engines:
  - type: docker
    image: ubuntu:22.04

runners:
  - type: executable
  - type: nextflow

28
src/rsem/rsem_merge_counts/script.sh
Normal file
@@ -0,0 +1,28 @@
#!/bin/bash

set -eo pipefail

# Split the comma-separated lists of per-sample result files into arrays
IFS="," read -ra counts_gene <<< "$par_counts_gene"
IFS="," read -ra counts_transcripts <<< "$par_counts_transcripts"

mkdir -p tmp/genes
# Gene and transcript IDs (columns 1 and 2, including the header) are taken from the first sample
cut -f 1,2 "${counts_gene[0]}" > gene_ids.txt
for file_id in "${counts_gene[@]}"; do
  samplename=$(basename "$file_id" | sed 's/\.genes\.results$//')
  echo "$samplename" > "tmp/genes/${samplename}.counts.txt"
  cut -f 5 "$file_id" | tail -n+2 >> "tmp/genes/${samplename}.counts.txt"
  echo "$samplename" > "tmp/genes/${samplename}.tpm.txt"
  cut -f 6 "$file_id" | tail -n+2 >> "tmp/genes/${samplename}.tpm.txt"
done

mkdir -p tmp/isoforms
cut -f 1,2 "${counts_transcripts[0]}" > transcript_ids.txt
for file_id in "${counts_transcripts[@]}"; do
  samplename=$(basename "$file_id" | sed 's/\.isoforms\.results$//')
  echo "$samplename" > "tmp/isoforms/${samplename}.counts.txt"
  cut -f 5 "$file_id" | tail -n+2 >> "tmp/isoforms/${samplename}.counts.txt"
  echo "$samplename" > "tmp/isoforms/${samplename}.tpm.txt"
  cut -f 6 "$file_id" | tail -n+2 >> "tmp/isoforms/${samplename}.tpm.txt"
done

paste gene_ids.txt tmp/genes/*.counts.txt > "$par_merged_gene_counts"
paste gene_ids.txt tmp/genes/*.tpm.txt > "$par_merged_gene_tpm"
paste transcript_ids.txt tmp/isoforms/*.counts.txt > "$par_merged_transcript_counts"
paste transcript_ids.txt tmp/isoforms/*.tpm.txt > "$par_merged_transcript_tpm"

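The merge relies on `paste` lining up rows across samples: the ID columns come from one file, and every sample contributes a single column headed by its sample name. A minimal, self-contained sketch of that idea on two made-up `*.genes.results` files (toy values, not real RSEM output), assuming RSEM's column layout with `expected_count` in column 5 and `TPM` in column 6:

```shell
#!/bin/bash
set -eo pipefail

# Work in a scratch directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Toy input: two made-up RSEM-style gene results files (7 tab-separated columns).
header=$'gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM'
printf '%s\n' "$header" \
  $'g1\tt1\t100\t50\t10.00\t1.00\t2.00' \
  $'g2\tt2\t200\t150\t5.00\t3.00\t4.00' > "$workdir/A.genes.results"
printf '%s\n' "$header" \
  $'g1\tt1\t100\t50\t7.00\t1.50\t2.50' \
  $'g2\tt2\t200\t150\t0.00\t0.00\t0.00' > "$workdir/B.genes.results"

files=("$workdir"/*.genes.results)

# ID columns (with their header line) come from the first sample only.
cut -f 1,2 "${files[0]}" > "$workdir/gene_ids.txt"

# One column per sample: the sample name, then expected_count without the header.
for f in "${files[@]}"; do
  sample=$(basename "$f" .genes.results)
  { echo "$sample"; cut -f 5 "$f" | tail -n +2; } > "$workdir/$sample.counts.txt"
done

# Join everything side by side, purely by row position.
paste "$workdir/gene_ids.txt" "$workdir"/*.counts.txt > "$workdir/merged.gene_counts.tsv"
cat "$workdir/merged.gene_counts.tsv"
```

Because `paste` joins by row position only, this approach assumes every sample lists genes in the same order, which should hold when all samples were quantified against the same RSEM index.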
53
src/rseqc/rseqc_bamstat/config.vsh.yaml
Normal file
@@ -0,0 +1,53 @@
name: "rseqc_bamstat"
namespace: "rseqc"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/rseqc/bamstat/main.nf]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
description: |
  Generate statistics from a bam file.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--input"
        type: file
        required: true
        description: input alignment file in BAM or SAM format

      - name: "--map_qual"
        type: integer
        required: false
        default: 30
        description: Minimum mapping quality (phred scaled) to determine uniquely mapped reads, default=30.
        min: 0

  - name: "Output"
    arguments:
      - name: "--output"
        type: file
        direction: output
        required: false
        default: $id.mapping_quality.txt
        description: output file (txt) with mapping quality statistics

resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/sarscov2/test.paired_end.sorted.bam

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [ python3-pip ]
      - type: python
        packages: [ RSeQC ]
runners:
  - type: executable
  - type: nextflow

8
src/rseqc/rseqc_bamstat/script.sh
Normal file
@@ -0,0 +1,8 @@
#!/bin/bash

set -eo pipefail

bam_stat.py \
  -i "$par_input" \
  -q "$par_map_qual" \
  > "$par_output"

23
src/rseqc/rseqc_bamstat/test.sh
Normal file
@@ -0,0 +1,23 @@
#!/bin/bash

# define input and output for script

input_bam="test.paired_end.sorted.bam"
output_summary="mapping_quality.txt"

# run executable and tests
echo "> Running $meta_functionality_name."

"$meta_executable" \
  --input "$meta_resources_dir/$input_bam" \
  --output "$output_summary"

exit_code=$?
[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1

echo ">> Checking whether output can be found and has content"

[ ! -f "$output_summary" ] && echo "$output_summary file missing" && exit 1
[ ! -s "$output_summary" ] && echo "$output_summary file is empty" && exit 1

exit 0

67
src/rseqc/rseqc_inferexperiment/config.vsh.yaml
Normal file
@@ -0,0 +1,67 @@
name: "rseqc_inferexperiment"
namespace: "rseqc"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/rseqc/inferexperiment/main.nf]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
description: |
  Infer strandedness from sequencing reads.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--input"
        type: file
        required: true
        description: input alignment file in BAM or SAM format

      - name: "--refgene"
        type: file
        required: true
        description: Reference gene model in bed format

      - name: "--sample_size"
        type: integer
        required: false
        default: 200000
        min: 1
        description: Number of reads sampled from SAM/BAM file, default = 200000.

      - name: "--map_qual"
        type: integer
        required: false
        default: 30
        description: Minimum mapping quality (phred scaled) to determine uniquely mapped reads, default=30.
        min: 0

  - name: "Output"
    arguments:
      - name: "--output"
        type: file
        direction: output
        required: false
        default: $id.strandedness.txt
        description: output file (txt) of strandedness report

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/sarscov2/test.paired_end.sorted.bam
  - path: /testData/unit_test_resources/sarscov2/test.bed12

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [ python3-pip ]
      - type: python
        packages: [ RSeQC ]
runners:
  - type: executable
  - type: nextflow

10
src/rseqc/rseqc_inferexperiment/script.sh
Normal file
@@ -0,0 +1,10 @@
#!/bin/bash

set -eo pipefail

infer_experiment.py \
  -i "$par_input" \
  -r "$par_refgene" \
  -s "$par_sample_size" \
  -q "$par_map_qual" \
  > "$par_output"

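`infer_experiment.py` reports which fraction of the sampled reads is consistent with each library orientation. A minimal sketch of turning such a report into one of the values accepted by `rsem_calculate_expression --strandedness` (`forward`, `reverse`, `unstranded`). It assumes report lines of the form `Fraction of reads explained by "1++,1--,2+-,2-+": 0.0200`, with the sense fraction listed first and the antisense fraction second; the helper name `classify_strandedness` and the 0.8 cut-off are assumptions for illustration, not part of this component:

```shell
#!/bin/bash
# Hypothetical helper (illustration only): classify an infer_experiment.py
# report. Assumes the first "explained by" line is the sense fraction and the
# second one the antisense fraction; the 0.8 default cut-off is an assumption.
classify_strandedness() {
  local report="$1" threshold="${2:-0.8}"
  awk -F': ' -v t="$threshold" '
    /explained by/ { frac[++n] = $2 + 0 }
    END {
      if (n < 2) { print "undetermined"; exit 1 }
      if (frac[1] >= t + 0) print "forward"
      else if (frac[2] >= t + 0) print "reverse"
      else print "unstranded"
    }' "$report"
}

# Toy report with made-up fractions for a reverse-stranded library.
report=$(mktemp)
printf '%s\n' \
  'This is PairEnd Data' \
  'Fraction of reads failed to determine: 0.0100' \
  'Fraction of reads explained by "1++,1--,2+-,2-+": 0.0200' \
  'Fraction of reads explained by "1+-,1-+,2++,2--": 0.9700' > "$report"
classify_strandedness "$report"
```

This sketch falls back to `unstranded` whenever neither fraction reaches the cut-off; a stricter variant could report ambiguous mixtures (for example 0.6 versus 0.4) separately instead of treating them as unstranded.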
24
src/rseqc/rseqc_inferexperiment/test.sh
Normal file
@@ -0,0 +1,24 @@
#!/bin/bash

# define input and output for script
input_bam="$meta_resources_dir/test.paired_end.sorted.bam"
input_bed="$meta_resources_dir/test.bed12"
output="strandedness.txt"

# run executable and tests
echo "> Running $meta_functionality_name."

"$meta_executable" \
  --input "$input_bam" \
  --refgene "$input_bed" \
  --output "$output"

exit_code=$?
[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1

echo ">> Checking whether output can be found and has content"

[ ! -f "$output" ] && echo "$output is missing" && exit 1
[ ! -s "$output" ] && echo "$output is empty" && exit 1

exit 0

117
src/rseqc/rseqc_innerdistance/config.vsh.yaml
Normal file
@@ -0,0 +1,117 @@
name: "rseqc_innerdistance"
namespace: "rseqc"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/rseqc/innerdistance/main.nf]
    last_sha: 54721c6946daf6d602d7069dc127deef9cbe6b33
description: |
  Calculate inner distance between read pairs.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--input"
        type: file
        required: true
        description: input alignment file in BAM or SAM format

      - name: "--refgene"
        type: file
        required: true
        description: Reference gene model in bed format

      - name: "--sample_size"
        type: integer
        required: false
        default: 200000
        min: 1
        description: Number of reads sampled from SAM/BAM file, default = 200000.

      - name: "--map_qual"
        type: integer
        required: false
        default: 30
        description: Minimum mapping quality (phred scaled) to determine uniquely mapped reads, default=30.
        min: 0

      - name: "--lower_bound_size"
        type: integer
        required: false
        default: -250
        description: Lower bound of inner distance (bp). This option is used for plotting the histogram, default=-250.

      - name: "--upper_bound_size"
        type: integer
        required: false
        default: 250
        description: Upper bound of inner distance (bp). This option is used for plotting the histogram, default=250.

      - name: "--step_size"
        type: integer
        required: false
        default: 5
        description: Step size (bp) of the histogram. This option is used for plotting the histogram, default=5.

  - name: "Output"
    arguments:
      - name: "--output_stats"
        type: file
        direction: output
        required: false
        must_exist: false
        default: $id.inner_distance.stats
        description: output file (txt) with summary statistics of inner distances of paired reads

      - name: "--output_dist"
        type: file
        direction: output
        required: false
        must_exist: false
        default: $id.inner_distance.txt
        description: output file (txt) with inner distances of all paired reads

      - name: "--output_freq"
        type: file
        direction: output
        required: false
        must_exist: false
        default: $id.inner_distance_freq.txt
        description: output file (txt) with frequencies of inner distances of all paired reads

      - name: "--output_plot"
        type: file
        direction: output
        required: false
        must_exist: false
        default: $id.inner_distance_plot.pdf
        description: output file (pdf) with histogram plot of inner distances of all paired reads

      - name: "--output_plot_r"
        type: file
        direction: output
        required: false
        must_exist: false
        default: $id.inner_distance_plot.r
        description: output file (R) with script of histogram plot of inner distances of all paired reads

resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/sarscov2/test.paired_end.sorted.bam
  - path: /testData/unit_test_resources/sarscov2/test.bed12

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [ python3-pip, r-base ]
      - type: python
        packages: [ RSeQC ]
runners:
  - type: executable
  - type: nextflow

23
src/rseqc/rseqc_innerdistance/script.sh
Normal file
@@ -0,0 +1,23 @@
#!/bin/bash

set -exo pipefail

prefix=$(openssl rand -hex 8)

inner_distance.py \
  -i "$par_input" \
  -r "$par_refgene" \
  -o "$prefix" \
  -k "$par_sample_size" \
  -l "$par_lower_bound_size" \
  -u "$par_upper_bound_size" \
  -s "$par_step_size" \
  -q "$par_map_qual" \
  > stdout.txt

head -n 2 stdout.txt > "$par_output_stats"

# Move optional outputs into place; `if` keeps the exit status at 0 when a file is absent
if [[ -f "$prefix.inner_distance.txt" ]]; then
  mv "$prefix.inner_distance.txt" "$par_output_dist"
fi
if [[ -f "$prefix.inner_distance_plot.pdf" ]]; then
  mv "$prefix.inner_distance_plot.pdf" "$par_output_plot"
fi
if [[ -f "$prefix.inner_distance_plot.r" ]]; then
  mv "$prefix.inner_distance_plot.r" "$par_output_plot_r"
fi
if [[ -f "$prefix.inner_distance_freq.txt" ]]; then
  mv "$prefix.inner_distance_freq.txt" "$par_output_freq"
fi

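A line of the form `[[ -f "$f" ]] && mv "$f" "$dest"` has exit status 1 when the file is absent; if it is the last command of a script, the whole script reports failure even though nothing went wrong. A minimal sketch of an alternative, using a hypothetical helper `move_if_present` (illustration only, not part of the component), which relies on the fact that an `if` statement whose condition is false and that has no `else` branch exits with status 0:

```shell
#!/bin/bash
set -eo pipefail

# Hypothetical helper (illustration only): move an optional output into place
# when the tool produced it, and succeed quietly when it did not.
move_if_present() {
  local src="$1" dest="$2"
  if [[ -f "$src" ]]; then
    mv "$src" "$dest"
  fi
}

# Simulate a tool run that produced only one of two optional outputs.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
prefix="$workdir/run"
touch "$prefix.inner_distance.txt"

move_if_present "$prefix.inner_distance.txt" "$workdir/sample.inner_distance.txt"
# The plot was never created, so this call does nothing and still succeeds.
move_if_present "$prefix.inner_distance_plot.pdf" "$workdir/sample.inner_distance_plot.pdf"
```

Because the helper always succeeds for a missing file, it is safe as the final line of a script and under `set -e`; a missing mandatory output should instead be checked explicitly and reported as an error.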
43
src/rseqc/rseqc_innerdistance/test.sh
Normal file
@@ -0,0 +1,43 @@
#!/bin/bash

# define input and output for script
input_bam="$meta_resources_dir/test.paired_end.sorted.bam"
input_bed="$meta_resources_dir/test.bed12"

output_stats="inner_distance_stats.txt"
output_dist="inner_distance.txt"
output_plot="inner_distance_plot.pdf"
output_plot_r="inner_distance_plot.r"
output_freq="inner_distance_freq.txt"

# Run executable
echo "> Running $meta_functionality_name"

"$meta_executable" \
  --input "$input_bam" \
  --refgene "$input_bed" \
  --output_stats "$output_stats" \
  --output_dist "$output_dist" \
  --output_plot "$output_plot" \
  --output_plot_r "$output_plot_r" \
  --output_freq "$output_freq"

exit_code=$?
[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1

echo ">> asserting output has been created for paired read input"

[ ! -f "$output_stats" ] && echo "$output_stats was not created" && exit 1
[ ! -s "$output_stats" ] && echo "$output_stats is empty" && exit 1
[ ! -f "$output_dist" ] && echo "$output_dist was not created" && exit 1
[ ! -s "$output_dist" ] && echo "$output_dist is empty" && exit 1
[ ! -f "$output_plot" ] && echo "$output_plot was not created" && exit 1
[ ! -s "$output_plot" ] && echo "$output_plot is empty" && exit 1
[ ! -f "$output_plot_r" ] && echo "$output_plot_r was not created" && exit 1
[ ! -s "$output_plot_r" ] && echo "$output_plot_r is empty" && exit 1
[ ! -f "$output_freq" ] && echo "$output_freq was not created" && exit 1
[ ! -s "$output_freq" ] && echo "$output_freq is empty" && exit 1

exit 0

108
src/rseqc/rseqc_junctionannotation/config.vsh.yaml
Normal file
@@ -0,0 +1,108 @@
name: "rseqc_junctionannotation"
namespace: "rseqc"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/rseqc/junctionannotation/main.nf]
    last_sha:
description: |
  Compare detected splice junctions to reference gene model.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--input"
        type: file
        required: true
        description: input alignment file in BAM or SAM format

      - name: "--refgene"
        type: file
        required: true
        description: Reference gene model in bed format

      - name: "--map_qual"
        type: integer
        required: false
        default: 30
        description: Minimum mapping quality (phred scaled) to determine uniquely mapped reads, default=30.
        min: 0

      - name: "--min_intron"
        type: integer
        required: false
        default: 50
        min: 1
        description: Minimum intron length (bp), default = 50.

  - name: "Output"
    arguments:
      - name: "--output_log"
        type: file
        direction: output
        required: false
        default: $id.junction_annotation.log
        description: output log of junction annotation script

      - name: "--output_plot_r"
        type: file
        direction: output
        required: false
        default: $id.junction_annotation_plot.r
        description: R script to generate the splice_junction and splice_events plots

      - name: "--output_junction_bed"
        type: file
        direction: output
        required: false
        default: $id.junction_annotation.bed
        description: junction annotation file (bed format)

      - name: "--output_junction_interact"
        type: file
        direction: output
        required: false
        default: $id.junction_annotation.Interact.bed
        description: interact file (bed format) of junctions. Can be uploaded to UCSC genome browser or converted to bigInteract (using bedToBigBed program) for visualization.

      - name: "--output_junction_sheet"
        type: file
        direction: output
        required: false
        default: $id.junction_annotation.xls
        description: junction annotation file (xls format)

      - name: "--output_splice_events_plot"
        type: file
        direction: output
        required: false
        default: $id.splice_events.pdf
        description: plot of splice events (pdf)

      - name: "--output_splice_junctions_plot"
        type: file
        direction: output
        required: false
        default: $id.splice_junctions_plot.pdf
        description: plot of junctions (pdf)

resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/sarscov2/test.paired_end.sorted.bam
  - path: /testData/unit_test_resources/sarscov2/test.bed12

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [ python3-pip, r-base ]
      - type: python
        packages: [ RSeQC ]
runners:
  - type: executable
  - type: nextflow

20
src/rseqc/rseqc_junctionannotation/script.sh
Normal file
@@ -0,0 +1,20 @@
#!/bin/bash

set -eo pipefail

prefix=$(openssl rand -hex 8)

# junction_annotation.py reports its summary on stderr, so that stream is captured as the log
junction_annotation.py \
  -i "$par_input" \
  -r "$par_refgene" \
  -o "$prefix" \
  -m "$par_min_intron" \
  -q "$par_map_qual" \
  2> "$par_output_log"

# Move optional outputs into place; `if` keeps the exit status at 0 when a file is absent
if [[ -f "$prefix.junction.bed" ]]; then
  mv "$prefix.junction.bed" "$par_output_junction_bed"
fi
if [[ -f "$prefix.junction.Interact.bed" ]]; then
  mv "$prefix.junction.Interact.bed" "$par_output_junction_interact"
fi
if [[ -f "$prefix.junction.xls" ]]; then
  mv "$prefix.junction.xls" "$par_output_junction_sheet"
fi
if [[ -f "$prefix.junction_plot.r" ]]; then
  mv "$prefix.junction_plot.r" "$par_output_plot_r"
fi
if [[ -f "$prefix.splice_events.pdf" ]]; then
  mv "$prefix.splice_events.pdf" "$par_output_splice_events_plot"
fi
if [[ -f "$prefix.splice_junction.pdf" ]]; then
  mv "$prefix.splice_junction.pdf" "$par_output_splice_junctions_plot"
fi

48
src/rseqc/rseqc_junctionannotation/test.sh
Normal file
@@ -0,0 +1,48 @@
#!/bin/bash

# define input and output for script
input_bam="$meta_resources_dir/test.paired_end.sorted.bam"
input_bed="$meta_resources_dir/test.bed12"

output_junction_bed="junction_annotation.bed"
output_junction_interact="junction_annotation.Interact.bed"
output_junction_sheet="junction_annotation.xls"
output_plot_r="junction_annotation_plot.r"
output_splice_events_plot="splice_events.pdf"
output_splice_junctions_plot="splice_junctions_plot.pdf"
output_log="junction_annotation.log"

# run executable and test
echo "> Running $meta_functionality_name"

"$meta_executable" \
  --input "$input_bam" \
  --refgene "$input_bed" \
  --output_log "$output_log" \
  --output_plot_r "$output_plot_r" \
  --output_junction_bed "$output_junction_bed" \
  --output_junction_interact "$output_junction_interact" \
  --output_junction_sheet "$output_junction_sheet" \
  --output_splice_events_plot "$output_splice_events_plot" \
  --output_splice_junctions_plot "$output_splice_junctions_plot"

# exit_code=$?
# [[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1

echo ">> Check if all output files were created"

[ ! -f "$output_log" ] && echo "$output_log was not created" && exit 1
[ ! -f "$output_junction_sheet" ] && echo "$output_junction_sheet was not created" && exit 1
[ -s "$output_junction_sheet" ] && echo "$output_junction_sheet is not empty but should be" && exit 1
[ ! -f "$output_plot_r" ] && echo "$output_plot_r was not created" && exit 1
[ -s "$output_plot_r" ] && echo "$output_plot_r is not empty but should be" && exit 1
# [ ! -f "$output_junction_bed" ] && echo "$output_junction_bed was not created" && exit 1
# [ ! -s "$output_junction_bed" ] && echo "$output_junction_bed is empty" && exit 1
# [ ! -f "$output_junction_interact" ] && echo "$output_junction_interact was not created" && exit 1
# [ ! -s "$output_junction_interact" ] && echo "$output_junction_interact is empty" && exit 1
# [ ! -f "$output_splice_events_plot" ] && echo "$output_splice_events_plot was not created" && exit 1
# [ ! -s "$output_splice_events_plot" ] && echo "$output_splice_events_plot is empty" && exit 1
# [ ! -f "$output_splice_junctions_plot" ] && echo "$output_splice_junctions_plot was not created" && exit 1
# [ ! -s "$output_splice_junctions_plot" ] && echo "$output_splice_junctions_plot is empty" && exit 1

exit 0

105
src/rseqc/rseqc_junctionsaturation/config.vsh.yaml
Normal file
@@ -0,0 +1,105 @@
|
||||
name: "rseqc_junctionsaturation"
|
||||
namespace: "rseqc"
|
||||
info:
|
||||
migration_info:
|
||||
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/rseqc/junctionsaturation/main.nf]
    last_sha:
description: |
  Compare detected splice junctions to reference gene model.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--input"
        type: file
        required: true
        description: Input alignment file in BAM or SAM format.

      - name: "--refgene"
        type: file
        required: true
        description: Reference gene model in BED format.

      - name: "--sampling_percentile_lower_bound"
        type: integer
        required: false
        default: 5
        description: Sampling starts from this percentile. Must be an integer between 0 and 100, default = 5.
        min: 0
        max: 100

      - name: "--sampling_percentile_upper_bound"
        type: integer
        required: false
        default: 100
        description: Sampling ends at this percentile. Must be an integer between 0 and 100, default = 100.
        min: 0
        max: 100

      - name: "--sampling_percentile_step"
        type: integer
        required: false
        default: 5
        description: Sampling frequency in %. A smaller value means more sampling steps. Must be an integer between 0 and 100, default = 5.
        min: 0
        max: 100

      - name: "--min_intron"
        type: integer
        required: false
        default: 50
        min: 1
        description: Minimum intron length (bp), default = 50.

      - name: "--min_splice_read"
        type: integer
        required: false
        default: 1
        min: 1
        description: Minimum number of supporting reads to call a junction, default = 1.

      - name: "--map_qual"
        type: integer
        required: false
        default: 30
        description: Minimum mapping quality (phred scaled) to determine uniquely mapped reads, default = 30.
        min: 0

  - name: "Output"
    arguments:
      - name: "--output_plot_r"
        type: file
        direction: output
        required: false
        default: $id.junction_saturation_plot.r
        description: R script to generate the junction saturation plot

      - name: "--output_plot"
        type: file
        direction: output
        required: false
        default: $id.junction_saturation_plot.pdf
        description: Plot of junction saturation (pdf)

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/sarscov2/test.paired_end.sorted.bam
  - path: /testData/unit_test_resources/sarscov2/test.bed

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [ python3-pip, r-base ]
      - type: python
        packages: [ RSeQC ]
runners:
  - type: executable
  - type: nextflow
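The `min`/`max` limits above validate each sampling argument on its own, but not how the three relate: an inverted range (lower bound above upper bound) or a step of 0, which `min: 0` still permits, passes the config checks. A minimal sketch of a pre-flight check that could run at the top of `script.sh` before `junction_saturation.py` is called. The function name `check_sampling_args` is hypothetical and not part of this commit, and rejecting a step of 0 is an assumption about what the tool can sample, not something the config states.

```shell
# Hypothetical pre-flight check for the junction saturation sampling arguments.
# Usage: check_sampling_args LOWER UPPER STEP
# Returns 0 when the combination looks usable, 1 otherwise (message on stderr).
check_sampling_args() {
  local lower="$1" upper="$2" step="$3"
  local v

  # All three values must be integers before arithmetic comparisons are made.
  for v in "$lower" "$upper" "$step"; do
    if [[ ! "$v" =~ ^-?[0-9]+$ ]]; then
      echo "not an integer: '$v'" >&2
      return 1
    fi
  done

  # Each bound must lie in [0, 100], mirroring min/max in config.vsh.yaml.
  if (( lower < 0 || lower > 100 || upper < 0 || upper > 100 )); then
    echo "percentile bounds must be between 0 and 100" >&2
    return 1
  fi

  # The sampling range must not be inverted.
  if (( lower > upper )); then
    echo "lower bound ($lower) is above upper bound ($upper)" >&2
    return 1
  fi

  # Assumption: a step of 0 can never advance, so require at least 1.
  if (( step < 1 || step > 100 )); then
    echo "step must be between 1 and 100" >&2
    return 1
  fi

  return 0
}
```

In `script.sh` it could be called as `check_sampling_args "$par_sampling_percentile_lower_bound" "$par_sampling_percentile_upper_bound" "$par_sampling_percentile_step" || exit 1`. A lighter alternative is raising `min` on `--sampling_percentile_step` to 1 in the config, which lets Viash reject a zero step without extra shell code; the range check still needs a script-side test.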
19
src/rseqc/rseqc_junctionsaturation/script.sh
Normal file
@@ -0,0 +1,19 @@
#!/bin/bash

set -eo pipefail

prefix=$(openssl rand -hex 8)

junction_saturation.py \
  -i "$par_input" \
  -r "$par_refgene" \
  -o "$prefix" \
  -l "$par_sampling_percentile_lower_bound" \
  -u "$par_sampling_percentile_upper_bound" \
  -s "$par_sampling_percentile_step" \
  -m "$par_min_intron" \
  -v "$par_min_splice_read" \
  -q "$par_map_qual"

[[ -f "$prefix.junctionSaturation_plot.pdf" ]] && mv "$prefix.junctionSaturation_plot.pdf" "$par_output_plot"
[[ -f "$prefix.junctionSaturation_plot.r" ]] && mv "$prefix.junctionSaturation_plot.r" "$par_output_plot_r"
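A variant of the output-handling step, sketched under the assumption that `junction_saturation.py` writes `<prefix>.junctionSaturation_plot.pdf` and `<prefix>.junctionSaturation_plot.r` (the names the script above looks for). Writing under a `mktemp -d` directory instead of a random prefix in the working directory keeps intermediate files out of the output location and removes the dependency on the `openssl` binary. `move_js_outputs` is a hypothetical helper, not part of this commit.

```shell
# Hypothetical helper: move junction_saturation.py outputs for PREFIX to their
# final destinations. A missing output is reported and makes the call fail.
move_js_outputs() {
  local prefix="$1" plot_dst="$2" plot_r_dst="$3"
  local status=0

  if [[ -f "$prefix.junctionSaturation_plot.pdf" ]]; then
    mv "$prefix.junctionSaturation_plot.pdf" "$plot_dst"
  else
    echo "missing: $prefix.junctionSaturation_plot.pdf" >&2
    status=1
  fi

  if [[ -f "$prefix.junctionSaturation_plot.r" ]]; then
    mv "$prefix.junctionSaturation_plot.r" "$plot_r_dst"
  else
    echo "missing: $prefix.junctionSaturation_plot.r" >&2
    status=1
  fi

  return "$status"
}

# Sketch of how script.sh could use it (not executed here):
#   tmp_dir=$(mktemp -d)
#   trap 'rm -rf "$tmp_dir"' EXIT
#   junction_saturation.py -i "$par_input" -r "$par_refgene" -o "$tmp_dir/out" ...
#   move_js_outputs "$tmp_dir/out" "$par_output_plot" "$par_output_plot_r"
```

Treating a missing file as an error is stricter than the `[[ -f ... ]] && mv` lines above, which skip a missing output silently. Since both outputs are declared `required: false`, whether a missing plot should fail the run is a design choice for the component.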
30
src/rseqc/rseqc_junctionsaturation/test.sh
Normal file
@@ -0,0 +1,30 @@
#!/bin/bash

gunzip "$meta_resources_dir/hg19_RefSeq.bed.gz"

# define input and output for script
input_bam="$meta_resources_dir/test.paired_end.sorted.bam"
input_bed="$meta_resources_dir/test.bed"

output_plot="junction_saturation_plot.pdf"
output_plot_r="junction_saturation_plot.r"

# run executable and test
echo "> Running $meta_functionality_name"
"$meta_executable" \
  --input "$input_bam" \
  --refgene "$input_bed" \
  --output_plot_r "$output_plot_r" \
  --output_plot "$output_plot"

exit_code=$?
[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1

echo ">> asserting all output files were created"

[ ! -f "$output_plot_r" ] && echo "$output_plot_r was not created" && exit 1
[ ! -s "$output_plot_r" ] && echo "$output_plot_r is empty" && exit 1
[ ! -f "$output_plot" ] && echo "$output_plot was not created" && exit 1
[ ! -s "$output_plot" ] && echo "$output_plot is empty" && exit 1

exit 0
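The four checks above repeat the same exists/non-empty pattern for each output. A small helper could express it once. `assert_nonempty_file` is a hypothetical name, and the sketch assumes the test only needs to know that each file exists and has a size greater than zero (`-s`), which is what the existing checks verify.

```shell
# Hypothetical helper: fail with a message unless every argument is an
# existing, non-empty regular file.
assert_nonempty_file() {
  local f
  for f in "$@"; do
    if [[ ! -f "$f" ]]; then
      echo "$f was not created" >&2
      return 1
    fi
    if [[ ! -s "$f" ]]; then
      echo "$f is empty" >&2
      return 1
    fi
  done
  return 0
}

# In test.sh this would replace the four [ ! -f ] / [ ! -s ] lines:
#   assert_nonempty_file "$output_plot_r" "$output_plot" || exit 1
```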
52
src/rseqc/rseqc_readdistribution/config.vsh.yaml
Normal file
@@ -0,0 +1,52 @@
name: "rseqc_readdistribution"
namespace: "rseqc"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/rseqc/readdistribution/main.nf]
    last_sha:
description: |
  Calculate how mapped reads are distributed over genomic features.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--input"
        type: file
        required: true
        description: Input alignment file in BAM or SAM format.

      - name: "--refgene"
        type: file
        required: true
        description: Reference gene model in BED format.

  - name: "Output"
    arguments:
      - name: "--output"
        type: file
        direction: output
        required: false
        default: $id.read_distribution.txt
        description: Output file (txt) of the read distribution analysis.

resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/sarscov2/test.paired_end.sorted.bam
  - path: /testData/unit_test_resources/sarscov2/test.bed12

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: apt
        packages: [ python3-pip ]
      - type: python
        packages: [ RSeQC ]
runners:
  - type: executable
  - type: nextflow
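This component is tested with `test.bed12`: RSeQC reads exon structure from the block columns of a 12-column BED gene model, so a reference with fewer columns is worth catching before `read_distribution.py` runs. A minimal pre-flight check, assuming tab-separated, uncompressed input and that comment, `track` and `browser` lines should be skipped. `is_bed12` is a hypothetical helper, not part of RSeQC or Viash.

```shell
# Hypothetical check: succeed only if the file has at least one data line and
# every data line has at least 12 tab-separated columns.
is_bed12() {
  awk -F'\t' '
    /^(#|track|browser)/ || NF == 0 { next }   # skip comments, headers, blanks
    NF < 12 { bad = 1; exit }                   # first short line fails the check
    { seen = 1 }                                # at least one valid data line
    END { exit (bad || !seen) ? 1 : 0 }
  ' "$1"
}
```

In `script.sh` it could run before `read_distribution.py`, for example `is_bed12 "$par_refgene" || { echo "refgene is not BED12" >&2; exit 1; }`. A gzipped reference would fail this check, so it also flags inputs that need decompressing first.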
8
src/rseqc/rseqc_readdistribution/script.sh
Normal file
@@ -0,0 +1,8 @@
#!/bin/bash

set -eo pipefail

read_distribution.py \
  -i "$par_input" \
  -r "$par_refgene" \
  > "$par_output"
24
src/rseqc/rseqc_readdistribution/test.sh
Normal file
@@ -0,0 +1,24 @@
#!/bin/bash


# define input and output for script
input_bam="$meta_resources_dir/test.paired_end.sorted.bam"
input_bed="$meta_resources_dir/test.bed12"
output="read_distribution.txt"

# run executable and test
echo "> Running $meta_functionality_name"

"$meta_executable" \
  --input "$input_bam" \
  --refgene "$input_bed" \
  --output "$output"

exit_code=$?
[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1

echo ">> Asserting output file was created"
[ ! -f "$output" ] && echo "$output was not created" && exit 1
[ ! -s "$output" ] && echo "$output is empty" && exit 1

exit 0
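A non-empty file does not guarantee a usable report. A slightly stronger check, sketched under the assumption that the report starts with the summary lines shown in RSeQC's documentation for `read_distribution.py` (`Total Reads`, `Total Tags`, `Total Assigned Tags`), extracts the read count. `report_total_reads` is a hypothetical helper, not part of this commit.

```shell
# Hypothetical helper: print the "Total Reads" value from a read_distribution.py
# report, or fail if that line is absent (e.g. a truncated or unexpected file).
report_total_reads() {
  awk '
    /^Total Reads[[:space:]]/ { print $3; found = 1; exit }
    END { exit found ? 0 : 1 }
  ' "$1"
}

# In test.sh this could follow the existence checks:
#   total=$(report_total_reads "$output") || { echo "no Total Reads line" >&2; exit 1; }
#   [[ "$total" -gt 0 ]] || { echo "no reads counted" >&2; exit 1; }
```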
82
src/rseqc/rseqc_readduplication/config.vsh.yaml
Normal file
@@ -0,0 +1,82 @@
name: "rseqc_readduplication"
namespace: "rseqc"
info:
  migration_info:
    git_repo: https://github.com/nf-core/rnaseq.git
    paths: [modules/nf-core/rseqc/readduplication/main.nf]
    last_sha:
description: |
  Calculate read duplication rate.

argument_groups:
  - name: "Input"
    arguments:
      - name: "--input"
        type: file
        required: true
        description: Input alignment file in BAM or SAM format.

      - name: "--read_count_upper_limit"
        type: integer
        required: false
        default: 500
        description: Upper limit of reads' occurrence. Only used for plotting, default = 500 (times).
        min: 1

      - name: "--map_qual"
        type: integer
        required: false
        default: 30
        description: Minimum mapping quality (phred scaled) to determine uniquely mapped reads, default = 30.
        min: 0

  - name: "Output"
    arguments:
      - name: "--output_duplication_rate_plot_r"
        type: file
        direction: output
        required: false
        default: $id.duplication_rate_plot.r
        description: R script for generating the duplication rate plot

      - name: "--output_duplication_rate_plot"
        type: file
        direction: output
        required: false
        default: $id.duplication_rate_plot.pdf
        description: Duplication rate plot (pdf)

      - name: "--output_duplication_rate_mapping"
        type: file
        direction: output
        required: false
        default: $id.duplication_rate_mapping.xls
        description: Summary of mapping-based read duplication

      - name: "--output_duplication_rate_sequence"
        type: file
        direction: output
        required: false
        default: $id.duplication_rate_sequencing.xls
        description: Summary of sequence-based read duplication

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh
  - path: /testData/unit_test_resources/sarscov2/test.paired_end.sorted.bam

engines:
  - type: docker
    image: ubuntu:22.04
    setup:
      - type: "apt"
        packages: [python3-pip, r-base]
      - type: python
        packages: [RSeQC]
runners:
  - type: executable
  - type: nextflow
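The `script.sh` for this component is among the files not shown in this diff. For orientation, a sketch of how its four outputs could be mapped, assuming RSeQC's `read_duplication.py` writes `<prefix>.DupRate_plot.r`, `<prefix>.DupRate_plot.pdf`, `<prefix>.pos.DupRate.xls` (mapping-based) and `<prefix>.seq.DupRate.xls` (sequence-based). These file names are an assumption, not taken from this commit, and `move_dup_outputs` is a hypothetical helper that may differ from the actual script.

```shell
# Hypothetical helper: move read_duplication.py outputs for PREFIX to the paths
# given by the Viash output arguments. Requires bash 4+ (associative arrays).
move_dup_outputs() {
  local prefix="$1"
  local -A targets=(
    ["DupRate_plot.r"]="$2"     # --output_duplication_rate_plot_r
    ["DupRate_plot.pdf"]="$3"   # --output_duplication_rate_plot
    ["pos.DupRate.xls"]="$4"    # --output_duplication_rate_mapping
    ["seq.DupRate.xls"]="$5"    # --output_duplication_rate_sequence
  )
  local suffix status=0

  for suffix in "${!targets[@]}"; do
    if [[ -f "$prefix.$suffix" ]]; then
      mv "$prefix.$suffix" "${targets[$suffix]}"
    else
      echo "missing: $prefix.$suffix" >&2
      status=1
    fi
  done

  return "$status"
}
```

It would follow a call along the lines of `read_duplication.py -i "$par_input" -o "$prefix" -u "$par_read_count_upper_limit" -q "$par_map_qual"`, again assuming those RSeQC flags. Keeping the suffix-to-argument table in one place makes the mapping/sequence pairing explicit, which matters because the default for `--output_duplication_rate_sequence` is named `duplication_rate_sequencing.xls` rather than following the argument name.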
Some files were not shown because too many files have changed in this diff