Build branch qualimap with version qualimap (28cd122)
Build pipeline: viash-hub.biobox.qualimap-6tqq7
Source commit: 28cd122935
Source message: Merge branch 'main' into qualimap
62
CHANGELOG.md
@@ -1,18 +1,29 @@
|
||||
# biobox x.x.x
|
||||
|
||||
## BUG FIXES
|
||||
## BREAKING CHANGES
|
||||
|
||||
* `pear`: fix component not exiting with the correct exitcode when PEAR fails.
|
||||
* `star/star_align_reads`: Change all arguments from `--camelCase` to `--snake_case` (PR #62).
|
||||
|
||||
* `cutadapt`: fix `--par_quality_cutoff_r2` argument.
|
||||
* `star/star_genome_generate`: Change all arguments from `--camelCase` to `--snake_case` (PR #62).
|
||||
|
||||
* `cutadapt`: demultiplexing is now disabled by default. It can be re-enabled by using `demultiplex_mode`.
|
||||
## NEW FUNCTIONALITY
|
||||
|
||||
* `multiqc`: update multiple separator to `;` (PR #81).
|
||||
* `star/star_align_reads`: Add star solo related arguments (PR #62).
|
||||
|
||||
* `bd_rhapsody/bd_rhapsody_make_reference`: Create a reference for the BD Rhapsody pipeline (PR #75).
|
||||
|
||||
* `umitools/umitools_dedup`: Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read (PR #54).
|
||||
|
||||
* `seqtk`:
|
||||
- `seqtk/seqtk_sample`: Subsamples sequences from FASTA/Q files (PR #68).
|
||||
- `seqtk/seqtk_subseq`: Extract the sequences (complete or subsequence) from the FASTA/FASTQ files
|
||||
based on a provided sequence IDs or region coordinates file (PR #85).
|
||||
|
||||
* `agat/agat_convert_sp_gff2gtf`: convert any GTF/GFF file into a proper GTF file (PR #76).
|
||||
|
||||
## MINOR CHANGES
|
||||
|
||||
* `busco` components: update BUSCO to `5.7.1`.
|
||||
* `busco` components: update BUSCO to `5.7.1` (PR #72).
|
||||
|
||||
## NEW FEATURES
|
||||
|
||||
@@ -20,12 +31,36 @@
|
||||
|
||||
# biobox 0.1.0
|
||||
|
||||
## BREAKING CHANGES
|
||||
* Update CI to reusable workflow in `viash-io/viash-actions` (PR #86).
|
||||
|
||||
* Change default `multiple_sep` to `;` (PR #25). This aligns with an upcoming breaking change in
|
||||
Viash 0.9.0 in order to avoid issues with the current default separator `:` unintentionally
|
||||
splitting up certain file paths.
|
||||
## DOCUMENTATION
|
||||
|
||||
* Extend the contributing guidelines (PR #82):
|
||||
|
||||
- Update format to Viash 0.9.
|
||||
|
||||
- Descriptions should be formatted in markdown.
|
||||
|
||||
- Add defaults to descriptions, not as a default of the argument.
|
||||
|
||||
- Explain parameter expansion.
|
||||
|
||||
- Mention that the contents of the output of components in tests should be checked.
|
||||
|
||||
* Add authorship to existing components (PR #88).
|
||||
|
||||
## BUG FIXES
|
||||
|
||||
* `pear`: fix component not exiting with the correct exitcode when PEAR fails (PR #70).
|
||||
|
||||
* `cutadapt`: fix `--par_quality_cutoff_r2` argument (PR #69).
|
||||
|
||||
* `cutadapt`: demultiplexing is now disabled by default. It can be re-enabled by using `demultiplex_mode` (PR #69).
|
||||
|
||||
* `multiqc`: update multiple separator to `;` (PR #81).
|
||||
|
||||
|
||||
# biobox 0.1.0
|
||||
|
||||
## NEW FEATURES
|
||||
|
||||
@@ -74,12 +109,11 @@
|
||||
- `samtools/samtools_fastq`: Converts a SAM/BAM/CRAM file to FASTQ (PR #52).
|
||||
- `samtools/samtools_fastq`: Converts a SAM/BAM/CRAM file to FASTA (PR #53).
|
||||
|
||||
* `umi_tools`:
|
||||
-`umi_tools/umi_tools_extract`: Flexible removal of UMI sequences from fastq reads (PR #71).
|
||||
|
||||
* `falco`: A C++ drop-in replacement of FastQC to assess the quality of sequence read data (PR #43).
|
||||
|
||||
* `umitools`:
|
||||
- `umitools_dedup`: Deduplicate reads based on the mapping co-ordinate and the UMI attached to the read (PR #54).
|
||||
|
||||
* `bedtools`:
|
||||
- `bedtools_getfasta`: extract sequences from a FASTA file for each of the
|
||||
intervals defined in a BED/GFF/VCF file (PR #59).
|
||||
@@ -104,4 +138,4 @@
|
||||
|
||||
* Add escaping character before leading hashtag in the description field of the config file (PR #50).
|
||||
|
||||
* Format URL in biobase/bcl_convert description (PR #55).
|
||||
* Format URL in biobase/bcl_convert description (PR #55).
|
||||
|
||||
157
CONTRIBUTING.md
@@ -65,22 +65,21 @@ runners:
|
||||
Fill in the relevant metadata fields in the config. Here is an example of the metadata of an existing component.
|
||||
|
||||
```yaml
|
||||
functionality:
|
||||
name: arriba
|
||||
description: Detect gene fusions from RNA-Seq data
|
||||
keywords: [Gene fusion, RNA-Seq]
|
||||
links:
|
||||
homepage: https://arriba.readthedocs.io/en/latest/
|
||||
documentation: https://arriba.readthedocs.io/en/latest/
|
||||
repository: https://github.com/suhrig/arriba
|
||||
issue_tracker: https://github.com/suhrig/arriba/issues
|
||||
references:
|
||||
doi: 10.1101/gr.257246.119
|
||||
bibtex: |
|
||||
@article{
|
||||
... a bibtex entry in case the doi is not available ...
|
||||
}
|
||||
license: MIT
|
||||
name: arriba
|
||||
description: Detect gene fusions from RNA-Seq data
|
||||
keywords: [Gene fusion, RNA-Seq]
|
||||
links:
|
||||
homepage: https://arriba.readthedocs.io/en/latest/
|
||||
documentation: https://arriba.readthedocs.io/en/latest/
|
||||
repository: https://github.com/suhrig/arriba
|
||||
issue_tracker: https://github.com/suhrig/arriba/issues
|
||||
references:
|
||||
doi: 10.1101/gr.257246.119
|
||||
bibtex: |
|
||||
@article{
|
||||
... a bibtex entry in case the doi is not available ...
|
||||
}
|
||||
license: MIT
|
||||
```
|
||||
|
||||
### Step 4: Find a suitable container
|
||||
@@ -162,7 +161,7 @@ argument_groups:
|
||||
type: file
|
||||
description: |
|
||||
File in SAM/BAM/CRAM format with main alignments as generated by STAR
|
||||
(Aligned.out.sam). Arriba extracts candidate reads from this file.
|
||||
(`Aligned.out.sam`). Arriba extracts candidate reads from this file.
|
||||
required: true
|
||||
example: Aligned.out.bam
|
||||
```
|
||||
@@ -175,7 +174,7 @@ Several notes:
|
||||
|
||||
* Input arguments can have `multiple: true` to allow the user to specify multiple files.
|
||||
|
||||
|
||||
* The description should be formatted in markdown.
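For an argument with `multiple: true`, the script receives all values in a single variable. A minimal sketch of splitting such a value into a Bash array, assuming the values are joined with the `;` separator; the variable `par_input` and the file names are hypothetical:

```bash
#!/bin/bash

# Hypothetical value, standing in for what Viash would inject for --input.
par_input="a.bam;b.bam;c.bam"

# Split on ';' into an array; IFS is only changed for the read command.
IFS=';' read -r -a input_files <<< "$par_input"

for file in "${input_files[@]}"; do
  echo "input: $file"
done
```

Iterating over the array keeps file names with spaces intact, which plain word splitting would not.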
|
||||
|
||||
### Step 8: Add arguments for the output files
|
||||
|
||||
@@ -220,7 +219,7 @@ argument_groups:
|
||||
|
||||
Note:
|
||||
|
||||
* Preferably, these outputs should not be directores but files. For example, if a tool outputs a directory `foo/` containing files `foo/bar.txt` and `foo/baz.txt`, there should be two output arguments `--bar` and `--baz` (as opposed to one output argument which outputs the whole `foo/` directory).
|
||||
* Preferably, these outputs should not be directories but files. For example, if a tool outputs a directory `foo/` containing files `foo/bar.txt` and `foo/baz.txt`, there should be two output arguments `--bar` and `--baz` (as opposed to one output argument which outputs the whole `foo/` directory).
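When a tool can only write to a directory, one possible approach is to let it write to a temporary directory and then move each file to its own output argument. The sketch below only simulates the tool; `par_bar`, `par_baz` and the file contents are hypothetical:

```bash
#!/bin/bash

# Hypothetical output arguments that Viash would inject as par_bar and par_baz.
work_dir=$(mktemp -d)
par_bar="$work_dir/bar_out.txt"
par_baz="$work_dir/baz_out.txt"

# Stand-in for a tool that can only write into a directory (foo/).
tmp_out="$work_dir/foo"
mkdir -p "$tmp_out"
echo "bar" > "$tmp_out/bar.txt"
echo "baz" > "$tmp_out/baz.txt"

# Expose each file through its own output argument.
mv "$tmp_out/bar.txt" "$par_bar"
mv "$tmp_out/baz.txt" "$par_baz"

bar_content=$(cat "$par_bar")
baz_content=$(cat "$par_baz")
echo "$bar_content $baz_content"

# Clean up the temporary working directory.
rm -rf "$work_dir"
```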
|
||||
|
||||
### Step 9: Add arguments for the other arguments
|
||||
|
||||
@@ -230,6 +229,8 @@ Finally, add all other arguments to the config file. There are a few exceptions:
|
||||
|
||||
* Arguments related to printing the information such as printing the version (`-v`, `--version`) or printing the help (`-h`, `--help`) should not be added to the config file.
|
||||
|
||||
* If the help lists defaults, do not add them as defaults but to the description. Example: `description: <Explanation of parameter>. Default: 10.`
|
||||
|
||||
|
||||
### Step 10: Add a Docker engine
|
||||
|
||||
@@ -275,10 +276,13 @@ Next, we need to write a runner script that runs the tool with the input argumen
|
||||
## VIASH START
|
||||
## VIASH END
|
||||
|
||||
# unset flags
|
||||
[[ "$par_option" == "false" ]] && unset par_option
|
||||
|
||||
xxx \
|
||||
--input "$par_input" \
|
||||
--output "$par_output" \
|
||||
$([ "$par_option" = "true" ] && echo "--option")
|
||||
${par_option:+--option}
|
||||
```
|
||||
|
||||
When building a Viash component, Viash will automatically replace the `## VIASH START` and `## VIASH END` lines (and anything in between) with environment variables based on the arguments specified in the config.
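The unset-then-expand idiom used for flags in the script above can be tried outside Viash. This is a minimal sketch, assuming boolean flags arrive as the strings `true` or `false`; the helper `build_args` and its values are hypothetical and only illustrate the expansion:

```bash
#!/bin/bash

# Hypothetical helper: prints the arguments that would be passed to the tool.
build_args() {
  local par_input="$1"
  local par_option="$2"

  # A flag set to "false" is unset so that the :+ expansion below yields nothing.
  [[ "$par_option" == "false" ]] && unset par_option

  # ${var:+word} expands to word only when var is set and non-empty.
  local args=(--input "$par_input" ${par_option:+--option})
  echo "${args[*]}"
}

args_on=$(build_args "in.txt" "true")
args_off=$(build_args "in.txt" "false")
echo "$args_on"
echo "$args_off"
```

With the flag unset, the expansion produces no word at all, so the tool never sees an empty argument.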
|
||||
@@ -291,6 +295,11 @@ As an example, this is what the Bash script for the `arriba` component looks lik
|
||||
## VIASH START
|
||||
## VIASH END
|
||||
|
||||
# unset flags
|
||||
[[ "$par_skip_duplicate_marking" == "false" ]] && unset par_skip_duplicate_marking
|
||||
[[ "$par_extra_information" == "false" ]] && unset par_extra_information
|
||||
[[ "$par_fill_gaps" == "false" ]] && unset par_fill_gaps
|
||||
|
||||
arriba \
|
||||
-x "$par_bam" \
|
||||
-a "$par_genome" \
|
||||
@@ -298,26 +307,30 @@ arriba \
|
||||
-o "$par_fusions" \
|
||||
${par_known_fusions:+-k "${par_known_fusions}"} \
|
||||
${par_blacklist:+-b "${par_blacklist}"} \
|
||||
${par_structural_variants:+-d "${par_structural_variants}"} \
|
||||
$([ "$par_skip_duplicate_marking" = "true" ] && echo "-u") \
|
||||
$([ "$par_extra_information" = "true" ] && echo "-X") \
|
||||
$([ "$par_fill_gaps" = "true" ] && echo "-I")
|
||||
# ...
|
||||
${par_extra_information:+-X} \
|
||||
${par_fill_gaps:+-I}
|
||||
```
|
||||
|
||||
Notes:
|
||||
|
||||
* If your arguments can contain special variables (e.g. `$`), you can use quoting (need to find a documentation page for this) to make sure you can use the string as input. Example: `-x ${par_bam@Q}`.
|
||||
|
||||
* Optional arguments can be passed to the command conditionally using Bash [parameter expansion](https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html). For example: `${par_known_fusions:+-k ${par_known_fusions@Q}}`
|
||||
|
||||
* If your tool allows for multiple inputs using a separator other than `;` (which is the default Viash multiple separator), you can substitute these values with a command like: `par_disable_filters=$(echo $par_disable_filters | tr ';' ',')`.
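The three notes above can be combined in a short sketch. It assumes Bash 4.4 or later (needed for `@Q`); all values are hypothetical stand-ins for variables that Viash would inject:

```bash
#!/bin/bash

# Hypothetical values standing in for variables that Viash would inject.
par_known_fusions='known $fusions.tsv'
par_disable_filters='homologs;mismatches;duplicates'
unset par_blacklist

# ${var@Q} expands to the value quoted so that it can be reused as shell input.
quoted="${par_known_fusions@Q}"

# An unset optional argument contributes nothing to the command line.
optional="${par_blacklist:+-b ${par_blacklist@Q}}"

# Replace the Viash separator ';' with the ',' expected by the (assumed) tool.
par_disable_filters=$(echo "$par_disable_filters" | tr ';' ',')

echo "$quoted"
echo "[$optional]"
echo "$par_disable_filters"
```

The `@Q` form contains literal quote characters, so it fits strings that the shell parses again (for example with `eval`). When a value is passed directly to a command, double quotes as in `"$par_bam"` are sufficient.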
|
||||
|
||||
|
||||
### Step 12: Create test script
|
||||
|
||||
|
||||
If the unit test requires test resources, these should be provided in the `test_resources` section of the component.
|
||||
|
||||
```yaml
|
||||
functionality:
|
||||
# ...
|
||||
test_resources:
|
||||
- type: bash_script
|
||||
path: test.sh
|
||||
- type: file
|
||||
path: test_data
|
||||
test_resources:
|
||||
- type: bash_script
|
||||
path: test.sh
|
||||
- type: file
|
||||
path: test_data
|
||||
```
|
||||
|
||||
Create a test script at `src/xxx/test.sh` that runs the component with the test data. This script should run the component (available with `$meta_executable`) with the test data and check if the output is as expected. The script should exit with a non-zero exit code if the output is not as expected. For example:
|
||||
@@ -325,48 +338,64 @@ Create a test script at `src/xxx/test.sh` that runs the component with the test
|
||||
```bash
|
||||
#!/bin/bash
|
||||
|
||||
set -e
|
||||
|
||||
## VIASH START
|
||||
## VIASH END
|
||||
|
||||
echo "> Run xxx with test data"
|
||||
#############################################
|
||||
# helper functions
|
||||
assert_file_exists() {
|
||||
[ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
|
||||
}
|
||||
assert_file_doesnt_exist() {
|
||||
[ ! -f "$1" ] || { echo "File '$1' exists but shouldn't" && exit 1; }
|
||||
}
|
||||
assert_file_empty() {
|
||||
[ ! -s "$1" ] || { echo "File '$1' is not empty but should be" && exit 1; }
|
||||
}
|
||||
assert_file_not_empty() {
|
||||
[ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
|
||||
}
|
||||
assert_file_contains() {
|
||||
grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
|
||||
}
|
||||
assert_file_not_contains() {
|
||||
grep -q "$2" "$1" && { echo "File '$1' contains '$2' but shouldn't" && exit 1; }
|
||||
}
|
||||
assert_file_contains_regex() {
|
||||
grep -q -E "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
|
||||
}
|
||||
assert_file_not_contains_regex() {
|
||||
grep -q -E "$2" "$1" && { echo "File '$1' contains '$2' but shouldn't" && exit 1; }
|
||||
}
|
||||
#############################################
|
||||
|
||||
echo "> Run $meta_name with test data"
|
||||
"$meta_executable" \
|
||||
--input "$meta_resources_dir/test_data/input.txt" \
|
||||
--input "$meta_resources_dir/test_data/reads_R1.fastq" \
|
||||
--output "output.txt" \
|
||||
--option
|
||||
|
||||
echo ">> Checking output"
|
||||
[ ! -f "output.txt" ] && echo "Output file output.txt does not exist" && exit 1
|
||||
```
|
||||
|
||||
|
||||
For example, this is what the test script for the `arriba` component looks like:
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
|
||||
## VIASH START
|
||||
## VIASH END
|
||||
|
||||
echo "> Run arriba with blacklist"
|
||||
"$meta_executable" \
|
||||
--bam "$meta_resources_dir/test_data/A.bam" \
|
||||
--genome "$meta_resources_dir/test_data/genome.fasta" \
|
||||
--gene_annotation "$meta_resources_dir/test_data/annotation.gtf" \
|
||||
--blacklist "$meta_resources_dir/test_data/blacklist.tsv" \
|
||||
--fusions "fusions.tsv" \
|
||||
--fusions_discarded "fusions_discarded.tsv" \
|
||||
--interesting_contigs "1,2"
|
||||
|
||||
echo ">> Checking output"
|
||||
[ ! -f "fusions.tsv" ] && echo "Output file fusions.tsv does not exist" && exit 1
|
||||
[ ! -f "fusions_discarded.tsv" ] && echo "Output file fusions_discarded.tsv does not exist" && exit 1
|
||||
echo ">> Check if output exists"
|
||||
assert_file_exists "output.txt"
|
||||
|
||||
echo ">> Check if output is empty"
|
||||
[ ! -s "fusions.tsv" ] && echo "Output file fusions.tsv is empty" && exit 1
|
||||
[ ! -s "fusions_discarded.tsv" ] && echo "Output file fusions_discarded.tsv is empty" && exit 1
|
||||
assert_file_not_empty "output.txt"
|
||||
|
||||
echo ">> Check if output is correct"
|
||||
assert_file_contains "output.txt" "some expected output"
|
||||
|
||||
echo "> All tests succeeded!"
|
||||
```
|
||||
|
||||
### Step 12: Create a `/var/software_versions.txt` file
|
||||
Notes:
|
||||
|
||||
* Do always check the contents of the output file. If the output is not deterministic, you can use regular expressions to check the output.
|
||||
|
||||
* If possible, generate your own test data instead of copying it from an external resource.
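For non-deterministic output, a regular expression can pin down the stable part of a line. The sketch below uses a hypothetical log line with a timestamp. It also writes the "not contains" helper with `if`, as one possible variant: a helper of the form `grep ... && { ...; }` returns a non-zero status when the pattern is absent, which stops a script that runs under `set -e`.

```bash
#!/bin/bash

set -e

# Variant of the helper written with `if`, so that it returns 0
# (and does not trip `set -e`) when the pattern is absent.
assert_file_not_contains_regex() {
  if grep -q -E "$2" "$1"; then
    echo "File '$1' contains '$2' but shouldn't"
    exit 1
  fi
}

assert_file_contains_regex() {
  grep -q -E "$2" "$1" || { echo "File '$1' does not contain '$2'"; exit 1; }
}

# Hypothetical log file with a timestamp that differs between runs.
log_file=$(mktemp)
echo "2024-07-01 12:00:00 processed 36 records" > "$log_file"

# Match the stable structure of the line rather than the exact timestamp.
assert_file_contains_regex "$log_file" '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:]{8} processed 36 records$'
assert_file_not_contains_regex "$log_file" 'ERROR'
status=$?

rm -f "$log_file"
echo "status: $status"
```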
|
||||
|
||||
### Step 13: Create a `/var/software_versions.txt` file
|
||||
|
||||
For the sake of transparency and reproducibility, we require that the versions of the software used in the component are documented.
|
||||
|
||||
@@ -378,6 +407,8 @@ engines:
|
||||
image: quay.io/biocontainers/xxx:0.1.0--py_0
|
||||
setup:
|
||||
- type: docker
|
||||
# note: /var/software_versions.txt should contain:
|
||||
# arriba: "2.4.0"
|
||||
run: |
|
||||
echo "xxx: \"0.1.0\"" > /var/software_versions.txt
|
||||
```
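When the version is taken from the tool itself rather than hard-coded, the output usually needs to be reshaped into the `name: "version"` form. A minimal sketch, assuming a hypothetical tool `xxx` that prints `xxx 0.1.0`; the real output of a tool's version flag may differ and the pattern must be adapted accordingly:

```bash
#!/bin/bash

# Hypothetical version string; the real output of a tool's version flag may differ.
version_output="xxx 0.1.0"

# A temporary file is used here; in the Docker setup the target is /var/software_versions.txt.
versions_file=$(mktemp)
echo "$version_output" | sed 's/^xxx \(.*\)$/xxx: "\1"/' > "$versions_file"

versions_line=$(cat "$versions_file")
echo "$versions_line"
rm -f "$versions_file"
```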
|
||||
|
||||
14
src/_authors/angela_o_pisco.yaml
Normal file
@@ -0,0 +1,14 @@
|
||||
name: Angela Oliveira Pisco
|
||||
info:
|
||||
role: Contributor
|
||||
links:
|
||||
github: aopisco
|
||||
orcid: "0000-0003-0142-2355"
|
||||
linkedin: aopisco
|
||||
organizations:
|
||||
- name: Insitro
|
||||
href: https://insitro.com
|
||||
role: Director of Computational Biology
|
||||
- name: Open Problems
|
||||
href: https://openproblems.bio
|
||||
role: Core Member
|
||||
10
src/_authors/dorien_roosen.yaml
Normal file
@@ -0,0 +1,10 @@
|
||||
name: Dorien Roosen
|
||||
info:
|
||||
links:
|
||||
email: dorien@data-intuitive.com
|
||||
github: dorien-er
|
||||
linkedin: dorien-roosen
|
||||
organizations:
|
||||
- name: Data Intuitive
|
||||
href: https://www.data-intuitive.com
|
||||
role: Data Scientist
|
||||
11
src/_authors/dries_schaumont.yaml
Normal file
@@ -0,0 +1,11 @@
|
||||
name: Dries Schaumont
|
||||
info:
|
||||
links:
|
||||
email: dries@data-intuitive.com
|
||||
github: DriesSchaumont
|
||||
orcid: "0000-0002-4389-0440"
|
||||
linkedin: dries-schaumont
|
||||
organizations:
|
||||
- name: Data Intuitive
|
||||
href: https://www.data-intuitive.com
|
||||
role: Data Scientist
|
||||
10
src/_authors/emma_rousseau.yaml
Normal file
@@ -0,0 +1,10 @@
|
||||
name: Emma Rousseau
|
||||
info:
|
||||
links:
|
||||
email: emma@data-intuitive.com
|
||||
github: emmarousseau
|
||||
linkedin: emmarousseau1
|
||||
organizations:
|
||||
- name: Data Intuitive
|
||||
href: https://www.data-intuitive.com
|
||||
role: Bioinformatician
|
||||
10
src/_authors/jakub_majercik.yaml
Normal file
@@ -0,0 +1,10 @@
|
||||
name: Jakub Majercik
|
||||
info:
|
||||
links:
|
||||
email: jakub@data-intuitive.com
|
||||
github: jakubmajercik
|
||||
linkedin: jakubmajercik
|
||||
organizations:
|
||||
- name: Data Intuitive
|
||||
href: https://www.data-intuitive.com
|
||||
role: Bioinformatics Engineer
|
||||
14
src/_authors/kai_waldrant.yaml
Normal file
@@ -0,0 +1,14 @@
|
||||
name: Kai Waldrant
|
||||
info:
|
||||
links:
|
||||
email: kai@data-intuitive.com
|
||||
github: KaiWaldrant
|
||||
orcid: "0009-0003-8555-1361"
|
||||
linkedin: kaiwaldrant
|
||||
organizations:
|
||||
- name: Data Intuitive
|
||||
href: https://www.data-intuitive.com
|
||||
role: Bioinformatician
|
||||
- name: Open Problems
|
||||
href: https://openproblems.bio
|
||||
role: Contributor
|
||||
10
src/_authors/leila_paquay.yaml
Normal file
@@ -0,0 +1,10 @@
|
||||
name: Leïla Paquay
|
||||
info:
|
||||
links:
|
||||
email: leila@data-intuitive.com
|
||||
github: Leila011
|
||||
linkedin: leilapaquay
|
||||
organizations:
|
||||
- name: Data Intuitive
|
||||
href: https://www.data-intuitive.com
|
||||
role: Software Developer
|
||||
14
src/_authors/robrecht_cannoodt.yaml
Normal file
@@ -0,0 +1,14 @@
|
||||
name: Robrecht Cannoodt
|
||||
info:
|
||||
links:
|
||||
email: robrecht@data-intuitive.com
|
||||
github: rcannood
|
||||
orcid: "0000-0003-3641-729X"
|
||||
linkedin: robrechtcannoodt
|
||||
organizations:
|
||||
- name: Data Intuitive
|
||||
href: https://www.data-intuitive.com
|
||||
role: Data Science Engineer
|
||||
- name: Open Problems
|
||||
href: https://openproblems.bio
|
||||
role: Core Member
|
||||
10
src/_authors/sai_nirmayi_yasa.yaml
Normal file
@@ -0,0 +1,10 @@
|
||||
name: Sai Nirmayi Yasa
|
||||
info:
|
||||
links:
|
||||
email: nirmayi@data-intuitive.com
|
||||
github: sainirmayi
|
||||
linkedin: sai-nirmayi-yasa
|
||||
organizations:
|
||||
- name: Data Intuitive
|
||||
href: https://www.data-intuitive.com
|
||||
role: Junior Bioinformatics Researcher
|
||||
10
src/_authors/theodoro_gasperin.yaml
Normal file
@@ -0,0 +1,10 @@
|
||||
name: Theodoro Gasperin Terra Camargo
|
||||
info:
|
||||
links:
|
||||
email: theodorogtc@gmail.com
|
||||
github: tgaspe
|
||||
linkedin: theodoro-gasperin-terra-camargo
|
||||
organizations:
|
||||
- name: Data Intuitive
|
||||
href: https://www.data-intuitive.com
|
||||
role: Bioinformatician
|
||||
9
src/_authors/toni_verbeiren.yaml
Normal file
@@ -0,0 +1,9 @@
|
||||
name: Toni Verbeiren
|
||||
info:
|
||||
links:
|
||||
github: tverbeiren
|
||||
linkedin: verbeiren
|
||||
organizations:
|
||||
- name: Data Intuitive
|
||||
href: https://www.data-intuitive.com
|
||||
role: Data Scientist and CEO
|
||||
5
src/_authors/weiwei_schultz.yaml
Normal file
@@ -0,0 +1,5 @@
|
||||
name: Weiwei Schultz
|
||||
info:
|
||||
organizations:
|
||||
- name: Janssen R&D US
|
||||
role: Associate Director Data Sciences
|
||||
94
src/agat/agat_convert_sp_gff2gtf/config.vsh.yaml
Normal file
@@ -0,0 +1,94 @@
|
||||
name: agat_convert_sp_gff2gtf
|
||||
namespace: agat
|
||||
description: |
|
||||
The script aims to convert any GTF/GFF file into a proper GTF file. Full
|
||||
information about the format can be found here:
|
||||
https://agat.readthedocs.io/en/latest/gxf.html You can choose among 7
|
||||
different GTF types (1, 2, 2.1, 2.2, 2.5, 3 or relax). Depending the
|
||||
version selected the script will filter out the features that are not
|
||||
accepted. For GTF2.5 and 3, every level1 feature (e.g nc_gene
|
||||
pseudogene) will be converted into gene feature and every level2 feature
|
||||
(e.g mRNA ncRNA) will be converted into transcript feature. Using the
|
||||
"relax" option you will produce a GTF-like output keeping all original
|
||||
feature types (3rd column). No modification will occur e.g. mRNA to
|
||||
transcript.
|
||||
|
||||
To be fully GTF compliant all feature have a gene_id and a transcript_id
|
||||
attribute. The gene_id is unique identifier for the genomic source of
|
||||
the transcript, which is used to group transcripts into genes. The
|
||||
transcript_id is a unique identifier for the predicted transcript, which
|
||||
is used to group features into transcripts.
|
||||
keywords: [gene annotations, GTF conversion]
|
||||
links:
|
||||
homepage: https://github.com/NBISweden/AGAT
|
||||
documentation: https://agat.readthedocs.io/
|
||||
issue_tracker: https://github.com/NBISweden/AGAT/issues
|
||||
repository: https://github.com/NBISweden/AGAT
|
||||
references:
|
||||
doi: 10.5281/zenodo.3552717
|
||||
license: GPL-3.0
|
||||
authors:
|
||||
- __merge__: /src/_authors/leila_paquay.yaml
|
||||
roles: [ author, maintainer ]
|
||||
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
- name: --gff
|
||||
alternatives: [-i]
|
||||
description: Input GFF/GTF file that will be read
|
||||
type: file
|
||||
required: true
|
||||
direction: input
|
||||
example: input.gff
|
||||
- name: Outputs
|
||||
arguments:
|
||||
- name: --output
|
||||
alternatives: [-o, --out, --outfile, --gtf]
|
||||
description: Output GTF file. If no output file is specified, the output will be written to STDOUT.
|
||||
type: file
|
||||
direction: output
|
||||
required: true
|
||||
example: output.gtf
|
||||
- name: Arguments
|
||||
arguments:
|
||||
- name: --gtf_version
|
||||
description: |
|
||||
Version of the GTF output (1,2,2.1,2.2,2.5,3 or relax). Default value from AGAT config file (relax for the default config). The script option has the higher priority.
|
||||
|
||||
* relax: all feature types are accepted.
|
||||
* GTF3 (9 feature types accepted): gene, transcript, exon, CDS, Selenocysteine, start_codon, stop_codon, three_prime_utr and five_prime_utr.
|
||||
* GTF2.5 (8 feature types accepted): gene, transcript, exon, CDS, UTR, start_codon, stop_codon, Selenocysteine.
|
||||
* GTF2.2 (9 feature types accepted): CDS, start_codon, stop_codon, 5UTR, 3UTR, inter, inter_CNS, intron_CNS and exon.
|
||||
* GTF2.1 (6 feature types accepted): CDS, start_codon, stop_codon, exon, 5UTR, 3UTR.
|
||||
* GTF2 (4 feature types accepted): CDS, start_codon, stop_codon, exon.
|
||||
* GTF1 (5 feature types accepted): CDS, start_codon, stop_codon, exon, intron.
|
||||
type: string
|
||||
choices: [relax, "1", "2", "2.1", "2.2", "2.5", "3"]
|
||||
required: false
|
||||
example: "3"
|
||||
- name: --config
|
||||
alternatives: [-c]
|
||||
description: |
|
||||
Input agat config file. By default AGAT takes as input agat_config.yaml file from the working directory if any, otherwise it takes the orignal agat_config.yaml shipped with AGAT. To get the agat_config.yaml locally type: "agat config --expose". The --config option gives you the possibility to use your own AGAT config file (located elsewhere or named differently).
|
||||
type: file
|
||||
required: false
|
||||
example: custom_agat_config.yaml
|
||||
resources:
|
||||
- type: bash_script
|
||||
path: script.sh
|
||||
test_resources:
|
||||
- type: bash_script
|
||||
path: test.sh
|
||||
- type: file
|
||||
path: test_data
|
||||
engines:
|
||||
- type: docker
|
||||
image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
|
||||
setup:
|
||||
- type: docker
|
||||
run: |
|
||||
agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
102
src/agat/agat_convert_sp_gff2gtf/help.txt
Normal file
@@ -0,0 +1,102 @@
|
||||
```sh
|
||||
agat_convert_sp_gff2gtf.pl --help
|
||||
```
|
||||
------------------------------------------------------------------------------
|
||||
| Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0 |
|
||||
| https://github.com/NBISweden/AGAT |
|
||||
| National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se |
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
||||
Name:
|
||||
agat_convert_sp_gff2gtf.pl
|
||||
|
||||
Description:
|
||||
The script aims to convert any GTF/GFF file into a proper GTF file. Full
|
||||
information about the format can be found here:
|
||||
https://agat.readthedocs.io/en/latest/gxf.html You can choose among 7
|
||||
different GTF types (1, 2, 2.1, 2.2, 2.5, 3 or relax). Depending the
|
||||
version selected the script will filter out the features that are not
|
||||
accepted. For GTF2.5 and 3, every level1 feature (e.g nc_gene
|
||||
pseudogene) will be converted into gene feature and every level2 feature
|
||||
(e.g mRNA ncRNA) will be converted into transcript feature. Using the
|
||||
"relax" option you will produce a GTF-like output keeping all original
|
||||
feature types (3rd column). No modification will occur e.g. mRNA to
|
||||
transcript.
|
||||
|
||||
To be fully GTF compliant all feature have a gene_id and a transcript_id
|
||||
attribute. The gene_id is unique identifier for the genomic source of
|
||||
the transcript, which is used to group transcripts into genes. The
|
||||
transcript_id is a unique identifier for the predicted transcript, which
|
||||
is used to group features into transcripts.
|
||||
|
||||
Usage:
|
||||
agat_convert_sp_gff2gtf.pl --gff infile.gff [ -o outfile ]
|
||||
agat_convert_sp_gff2gtf -h
|
||||
|
||||
Options:
|
||||
--gff, --gtf or -i
|
||||
Input GFF/GTF file that will be read
|
||||
|
||||
--gtf_version version of the GTF output (1,2,2.1,2.2,2.5,3 or relax).
|
||||
Default value from AGAT config file (relax for the default config). The
|
||||
script option has the higher priority.
|
||||
relax: all feature types are accepted.
|
||||
|
||||
GTF3 (9 feature types accepted): gene, transcript, exon, CDS,
|
||||
Selenocysteine, start_codon, stop_codon, three_prime_utr and
|
||||
five_prime_utr
|
||||
|
||||
GTF2.5 (8 feature types accepted): gene, transcript, exon, CDS,
|
||||
UTR, start_codon, stop_codon, Selenocysteine
|
||||
|
||||
GTF2.2 (9 feature types accepted): CDS, start_codon, stop_codon,
|
||||
5UTR, 3UTR, inter, inter_CNS, intron_CNS and exon
|
||||
|
||||
GTF2.1 (6 feature types accepted): CDS, start_codon, stop_codon,
|
||||
exon, 5UTR, 3UTR
|
||||
|
||||
GTF2 (4 feature types accepted): CDS, start_codon, stop_codon,
|
||||
exon
|
||||
|
||||
GTF1 (5 feature types accepted): CDS, start_codon, stop_codon,
|
||||
exon, intron
|
||||
|
||||
-o , --output , --out , --outfile or --gtf
|
||||
Output GTF file. If no output file is specified, the output will
|
||||
be written to STDOUT.
|
||||
|
||||
-c or --config
|
||||
String - Input agat config file. By default AGAT takes as input
|
||||
agat_config.yaml file from the working directory if any,
|
||||
otherwise it takes the orignal agat_config.yaml shipped with
|
||||
AGAT. To get the agat_config.yaml locally type: "agat config
|
||||
--expose". The --config option gives you the possibility to use
|
||||
your own AGAT config file (located elsewhere or named
|
||||
differently).
|
||||
|
||||
-h or --help
|
||||
Display this helpful text.
|
||||
|
||||
Feedback:
|
||||
Did you find a bug?:
|
||||
Do not hesitate to report bugs to help us keep track of the bugs and
|
||||
their resolution. Please use the GitHub issue tracking system available
|
||||
at this address:
|
||||
|
||||
https://github.com/NBISweden/AGAT/issues
|
||||
|
||||
Ensure that the bug was not already reported by searching under Issues.
|
||||
If you're unable to find an (open) issue addressing the problem, open a new one.
|
||||
Try as much as possible to include in the issue when relevant:
|
||||
- a clear description,
|
||||
- as much relevant information as possible,
|
||||
- the command used,
|
||||
- a data sample,
|
||||
- an explanation of the expected behaviour that is not occurring.
|
||||
|
||||
Do you want to contribute?:
|
||||
You are very welcome, visit this address for the Contributing
|
||||
guidelines:
|
||||
https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
|
||||
|
||||
10
src/agat/agat_convert_sp_gff2gtf/script.sh
Normal file
@@ -0,0 +1,10 @@
#!/bin/bash

## VIASH START
## VIASH END

agat_convert_sp_gff2gtf.pl \
  -i "$par_gff" \
  -o "$par_output" \
  ${par_gtf_version:+--gtf_version "${par_gtf_version}"} \
  ${par_config:+--config "${par_config}"}
|
||||
37
src/agat/agat_convert_sp_gff2gtf/test.sh
Normal file
@@ -0,0 +1,37 @@
#!/bin/bash

## VIASH START
## VIASH END

test_dir="${meta_resources_dir}/test_data"

echo "> Run $meta_name with test data"
"$meta_executable" \
  --gff "$test_dir/0_test.gff" \
  --output "output.gtf"

echo ">> Checking output"
[ ! -f "output.gtf" ] && echo "Output file output.gtf does not exist" && exit 1

echo ">> Check if output is empty"
[ ! -s "output.gtf" ] && echo "Output file output.gtf is empty" && exit 1

echo ">> Check if the conversion resulted in the right GTF format"
idGFF=$(head -n 2 "$test_dir/0_test.gff" | grep -o 'ID=[^;]*' | cut -d '=' -f 2-)
expectedGTF="gene_id \"$idGFF\"; ID \"$idGFF\";"
extractedGTF=$(head -n 3 "output.gtf" | grep -o 'gene_id "[^"]*"; ID "[^"]*";')
[ "$extractedGTF" != "$expectedGTF" ] && echo "Output file output.gtf does not have the right format" && exit 1

rm output.gtf

echo "> Run $meta_name with test data and GTF version 2.5"
"$meta_executable" \
  --gff "$test_dir/0_test.gff" \
  --output "output.gtf" \
  --gtf_version "2.5"

echo ">> Check if the output file header display the right GTF version"
grep -q "##gtf-version 2.5" "output.gtf"
[ $? -ne 0 ] && echo "Output file output.gtf header does not display the right GTF version" && exit 1

echo "> Test successful"
|
||||
36
src/agat/agat_convert_sp_gff2gtf/test_data/0_test.gff
Normal file
@@ -0,0 +1,36 @@
|
||||
##gff-version 3
|
||||
scaffold625 maker gene 337818 343277 . + . ID=CLUHARG00000005458;Name=TUBB3_2
|
||||
scaffold625 maker mRNA 337818 343277 . + . ID=CLUHART00000008717;Parent=CLUHARG00000005458
|
||||
scaffold625 maker exon 337818 337971 . + . ID=CLUHART00000008717:exon:1404;Parent=CLUHART00000008717
|
||||
scaffold625 maker exon 340733 340841 . + . ID=CLUHART00000008717:exon:1405;Parent=CLUHART00000008717
|
||||
scaffold625 maker exon 341518 341628 . + . ID=CLUHART00000008717:exon:1406;Parent=CLUHART00000008717
|
||||
scaffold625 maker exon 341964 343277 . + . ID=CLUHART00000008717:exon:1407;Parent=CLUHART00000008717
|
||||
scaffold625 maker CDS 337915 337971 . + 0 ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
|
||||
scaffold625 maker CDS 340733 340841 . + 0 ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
|
||||
scaffold625 maker CDS 341518 341628 . + 2 ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
|
||||
scaffold625 maker CDS 341964 343033 . + 2 ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
|
||||
scaffold625 maker five_prime_UTR 337818 337914 . + . ID=CLUHART00000008717:five_prime_utr;Parent=CLUHART00000008717
|
||||
scaffold625 maker three_prime_UTR 343034 343277 . + . ID=CLUHART00000008717:three_prime_utr;Parent=CLUHART00000008717
|
||||
scaffold789 maker gene 558184 564780 . + . ID=CLUHARG00000003852;Name=PF11_0240
|
||||
scaffold789 maker mRNA 558184 564780 . + . ID=CLUHART00000006146;Parent=CLUHARG00000003852
|
||||
scaffold789 maker exon 558184 560123 . + . ID=CLUHART00000006146:exon:995;Parent=CLUHART00000006146
|
||||
scaffold789 maker exon 561401 561519 . + . ID=CLUHART00000006146:exon:996;Parent=CLUHART00000006146
|
||||
scaffold789 maker exon 564171 564235 . + . ID=CLUHART00000006146:exon:997;Parent=CLUHART00000006146
|
||||
scaffold789 maker exon 564372 564780 . + . ID=CLUHART00000006146:exon:998;Parent=CLUHART00000006146
|
||||
scaffold789 maker CDS 558191 560123 . + 0 ID=CLUHART00000006146:cds;Parent=CLUHART00000006146
|
||||
scaffold789 maker CDS 561401 561519 . + 2 ID=CLUHART00000006146:cds;Parent=CLUHART00000006146
|
||||
scaffold789 maker CDS 564171 564235 . + 0 ID=CLUHART00000006146:cds;Parent=CLUHART00000006146
|
||||
scaffold789 maker CDS 564372 564588 . + 1 ID=CLUHART00000006146:cds;Parent=CLUHART00000006146
|
||||
scaffold789 maker five_prime_UTR 558184 558190 . + . ID=CLUHART00000006146:five_prime_utr;Parent=CLUHART00000006146
|
||||
scaffold789 maker three_prime_UTR 564589 564780 . + . ID=CLUHART00000006146:three_prime_utr;Parent=CLUHART00000006146
|
||||
scaffold789 maker mRNA 558184 564780 . + . ID=CLUHART00000006147;Parent=CLUHARG00000003852
|
||||
scaffold789 maker exon 558184 560123 . + . ID=CLUHART00000006147:exon:997;Parent=CLUHART00000006147
|
||||
scaffold789 maker exon 561401 561519 . + . ID=CLUHART00000006147:exon:998;Parent=CLUHART00000006147
|
||||
scaffold789 maker exon 562057 562121 . + . ID=CLUHART00000006147:exon:999;Parent=CLUHART00000006147
|
||||
scaffold789 maker exon 564372 564780 . + . ID=CLUHART00000006147:exon:1000;Parent=CLUHART00000006147
|
||||
scaffold789 maker CDS 558191 560123 . + 0 ID=CLUHART00000006147:cds;Parent=CLUHART00000006147
|
||||
scaffold789 maker CDS 561401 561519 . + 2 ID=CLUHART00000006147:cds;Parent=CLUHART00000006147
|
||||
scaffold789 maker CDS 562057 562121 . + 0 ID=CLUHART00000006147:cds;Parent=CLUHART00000006147
|
||||
scaffold789 maker CDS 564372 564588 . + 1 ID=CLUHART00000006147:cds;Parent=CLUHART00000006147
|
||||
scaffold789 maker five_prime_UTR 558184 558190 . + . ID=CLUHART00000006147:five_prime_utr;Parent=CLUHART00000006147
|
||||
scaffold789 maker three_prime_UTR 564589 564780 . + . ID=CLUHART00000006147:three_prime_utr;Parent=CLUHART00000006147
|
||||
9
src/agat/agat_convert_sp_gff2gtf/test_data/script.sh
Executable file
@@ -0,0 +1,9 @@
|
||||
#!/bin/bash
|
||||
|
||||
# clone repo
|
||||
if [ ! -d /tmp/agat_source ]; then
|
||||
git clone --depth 1 --single-branch --branch master https://github.com/NBISweden/AGAT /tmp/agat_source
|
||||
fi
|
||||
|
||||
# copy test data
|
||||
cp -r /tmp/agat_source/t/gff_syntax/in/0_test.gff src/agat/agat_convert_sp_gff2gtf/test_data
|
||||
@@ -11,6 +11,9 @@ license: MIT
|
||||
requirements:
|
||||
cpus: 1
|
||||
commands: [ arriba ]
|
||||
authors:
|
||||
- __merge__: /src/_authors/robrecht_cannoodt.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
@@ -4,6 +4,17 @@ description: |
|
||||
Information about upgrading from bcl2fastq via
|
||||
[Upgrading from bcl2fastq to BCL Convert](https://emea.support.illumina.com/bulletins/2020/10/upgrading-from-bcl2fastq-to-bcl-convert.html)
|
||||
and [BCL Convert Compatible Products](https://support.illumina.com/sequencing/sequencing_software/bcl-convert/compatibility.html)
|
||||
keywords: [demultiplex, fastq, bcl, illumina]
|
||||
links:
|
||||
homepage: https://support.illumina.com/sequencing/sequencing_software/bcl-convert.html
|
||||
documentation: https://support.illumina.com/downloads/bcl-convert-user-guide.html
|
||||
license: Proprietary
|
||||
authors:
|
||||
- __merge__: /src/_authors/toni_verbeiren.yaml
|
||||
roles: [ author, maintainer ]
|
||||
- __merge__: /src/_authors/dorien_roosen.yaml
|
||||
roles: [ author ]
|
||||
|
||||
argument_groups:
|
||||
- name: Input arguments
|
||||
arguments:
|
||||
|
||||
143
src/bd_rhapsody/bd_rhapsody_make_reference/config.vsh.yaml
Normal file
@@ -0,0 +1,143 @@
|
||||
name: bd_rhapsody_make_reference
|
||||
namespace: bd_rhapsody
|
||||
description: |
|
||||
The Reference Files Generator creates an archive containing Genome Index
|
||||
and Transcriptome annotation files needed for the BD Rhapsody Sequencing
|
||||
Analysis Pipeline. The app takes as input one or more FASTA and GTF files
|
||||
and produces a compressed archive in the form of a tar.gz file. The
|
||||
archive contains:
|
||||
|
||||
- STAR index
|
||||
- Filtered GTF file
|
||||
keywords: [genome, reference, index, align]
|
||||
links:
|
||||
repository: https://bitbucket.org/CRSwDev/cwl/src/master/v2.2.1/Extra_Utilities/
|
||||
documentation: https://bd-rhapsody-bioinfo-docs.genomics.bd.com/resources/extra_utilities.html#make-rhapsody-reference
|
||||
license: Unknown
|
||||
authors:
|
||||
- __merge__: /src/_authors/robrecht_cannoodt.yaml
|
||||
roles: [ author, maintainer ]
|
||||
- __merge__: /src/_authors/weiwei_schultz.yaml
|
||||
roles: [ contributor ]
|
||||
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
- type: file
|
||||
name: --genome_fasta
|
||||
required: true
|
||||
description: Reference genome file in FASTA or FASTA.GZ format. The BD Rhapsody Sequencing Analysis Pipeline uses GRCh38 for Human and GRCm39 for Mouse.
|
||||
example: genome_sequence.fa.gz
|
||||
multiple: true
|
||||
info:
|
||||
config_key: Genome_fasta
|
||||
- type: file
|
||||
name: --gtf
|
||||
required: true
|
||||
description: |
|
||||
File path to the transcript annotation files in GTF or GTF.GZ format. The Sequence Analysis Pipeline requires the 'gene_name' or
|
||||
'gene_id' attribute to be set on each gene and exon feature. Gene and exon feature lines must have the same attribute, and exons
|
||||
must have a corresponding gene with the same value. For TCR/BCR assays, the TCR or BCR gene segments must have the 'gene_type' or
|
||||
'gene_biotype' attribute set, and the value should begin with 'TR' or 'IG', respectively.
|
||||
example: transcriptome_annotation.gtf.gz
|
||||
multiple: true
|
||||
info:
|
||||
config_key: Gtf
|
||||
- type: file
|
||||
name: --extra_sequences
|
||||
description: |
|
||||
File path to additional sequences in FASTA format to use when building the STAR index. (e.g. transgenes or CRISPR guide barcodes).
|
||||
GTF lines for these sequences will be automatically generated and combined with the main GTF.
|
||||
required: false
|
||||
multiple: true
|
||||
info:
|
||||
config_key: Extra_sequences
|
||||
- name: Outputs
|
||||
arguments:
|
||||
- type: file
|
||||
name: --reference_archive
|
||||
direction: output
|
||||
required: true
|
||||
description: |
|
||||
A compressed archive containing the Reference Genome Index and annotation GTF files. This archive is meant to be used as an
|
||||
input in the BD Rhapsody Sequencing Analysis Pipeline.
|
||||
example: star_index.tar.gz
|
||||
- name: Arguments
|
||||
arguments:
|
||||
- type: string
|
||||
name: --mitochondrial_contigs
|
||||
description: |
|
||||
Names of the Mitochondrial contigs in the provided Reference Genome. Fragments originating from contigs other than these are
|
||||
identified as 'nuclear fragments' in the ATACseq analysis pipeline.
|
||||
required: false
|
||||
multiple: true
|
||||
default: [chrM, chrMT, M, MT]
|
||||
info:
|
||||
config_key: Mitochondrial_Contigs
|
||||
- type: boolean_true
|
||||
name: --filtering_off
|
||||
description: |
|
||||
By default the input Transcript Annotation files are filtered based on the gene_type/gene_biotype attribute. Only features
|
||||
having the following attribute values are kept:
|
||||
|
||||
- protein_coding
|
||||
- lncRNA (lincRNA and antisense for Gencode < v31/M22/Ensembl97)
|
||||
- IG_LV_gene
|
||||
- IG_V_gene
|
||||
- IG_V_pseudogene
|
||||
- IG_D_gene
|
||||
- IG_J_gene
|
||||
- IG_J_pseudogene
|
||||
- IG_C_gene
|
||||
- IG_C_pseudogene
|
||||
- TR_V_gene
|
||||
- TR_V_pseudogene
|
||||
- TR_D_gene
|
||||
- TR_J_gene
|
||||
- TR_J_pseudogene
|
||||
- TR_C_gene
|
||||
|
||||
If you have already pre-filtered the input Annotation files and/or wish to turn-off the filtering, please set this option to True.
|
||||
info:
|
||||
config_key: Filtering_off
|
||||
- type: boolean_true
|
||||
name: --wta_only_index
|
||||
description: Build a WTA only index, otherwise builds a WTA + ATAC index.
|
||||
info:
|
||||
config_key: WTA_Only
|
||||
- type: string
|
||||
name: --extra_star_params
|
||||
description: Additional parameters to pass to STAR when building the genome index. Specify exactly like how you would on the command line.
|
||||
example: --limitGenomeGenerateRAM 48000 --genomeSAindexNbases 11
|
||||
required: false
|
||||
info:
|
||||
config_key: Extra_STAR_params
|
||||
|
||||
resources:
|
||||
- type: python_script
|
||||
path: script.py
|
||||
- path: make_rhap_reference_2.2.1_nodocker.cwl
|
||||
|
||||
test_resources:
|
||||
- type: bash_script
|
||||
path: test.sh
|
||||
- path: test_data
|
||||
|
||||
requirements:
|
||||
commands: [ "cwl-runner" ]
|
||||
|
||||
engines:
|
||||
- type: docker
|
||||
image: bdgenomics/rhapsody:2.2.1
|
||||
setup:
|
||||
- type: apt
|
||||
packages: [procps]
|
||||
- type: python
|
||||
packages: [cwlref-runner, cwl-runner]
|
||||
- type: docker
|
||||
run: |
|
||||
echo "bdgenomics/rhapsody: 2.2.1" > /var/software_versions.txt
|
||||
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
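Review note: each argument's `info.config_key` names the CWL input that the argument is forwarded to, so the key has to match the CWL input name exactly, including case. The sketch below shows how such a mapping could be collected from the parsed config. The `config` dictionary is a hand-written excerpt and `config_key_map` is a hypothetical helper, not code from this PR.

```python
import re

# Hand-written excerpt of the parsed config.vsh.yaml (an assumption for this sketch).
config = {
    "argument_groups": [
        {
            "name": "Inputs",
            "arguments": [
                {"type": "file", "name": "--genome_fasta",
                 "info": {"config_key": "Genome_fasta"}},
                {"type": "file", "name": "--gtf",
                 "info": {"config_key": "Gtf"}},
            ],
        },
        {
            "name": "Outputs",
            "arguments": [
                # No config_key: this argument is not forwarded to the CWL job file.
                {"type": "file", "name": "--reference_archive"},
            ],
        },
    ]
}


def config_key_map(config: dict) -> dict:
    """Map cleaned argument names to the CWL input names they feed."""
    mapping = {}
    for group in config["argument_groups"]:
        for arg in group["arguments"]:
            key = (arg.get("info") or {}).get("config_key")
            if key:
                # Strip the leading dashes, as clean_arg does in script.py.
                mapping[re.sub("^-*", "", arg["name"])] = key
    return mapping


mapping = config_key_map(config)
print(mapping)
```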
66
src/bd_rhapsody/bd_rhapsody_make_reference/help.txt
Normal file
@@ -0,0 +1,66 @@
|
||||
```bash
|
||||
cwl-runner src/bd_rhapsody/bd_rhapsody_make_reference/make_rhap_reference_2.2.1_nodocker.cwl --help
|
||||
```
|
||||
|
||||
usage: src/bd_rhapsody/bd_rhapsody_make_reference/make_rhap_reference_2.2.1_nodocker.cwl
|
||||
[-h] [--Archive_prefix ARCHIVE_PREFIX]
|
||||
[--Extra_STAR_params EXTRA_STAR_PARAMS]
|
||||
[--Extra_sequences EXTRA_SEQUENCES] [--Filtering_off] --Genome_fasta
|
||||
GENOME_FASTA --Gtf GTF [--Maximum_threads MAXIMUM_THREADS]
|
||||
[--Mitochondrial_Contigs MITOCHONDRIAL_CONTIGS] [--WTA_Only]
|
||||
[job_order]
|
||||
|
||||
The Reference Files Generator creates an archive containing Genome Index and
|
||||
Transcriptome annotation files needed for the BD Rhapsody™ Sequencing
|
||||
Analysis Pipeline. The app takes as input one or more FASTA and GTF files and
|
||||
produces a compressed archive in the form of a tar.gz file. The archive
|
||||
contains:\n - STAR index\n - Filtered GTF file
|
||||
|
||||
positional arguments:
|
||||
job_order Job input json file
|
||||
|
||||
options:
|
||||
-h, --help show this help message and exit
|
||||
--Archive_prefix ARCHIVE_PREFIX
|
||||
A prefix for naming the compressed archive file
|
||||
containing the Reference genome index and annotation
|
||||
files. The default value is constructed based on the
|
||||
input Reference files.
|
||||
--Extra_STAR_params EXTRA_STAR_PARAMS
|
||||
Additional parameters to pass to STAR when building
|
||||
the genome index. Specify exactly like how you would
|
||||
on the command line. Example: --limitGenomeGenerateRAM
|
||||
48000 --genomeSAindexNbases 11
|
||||
--Extra_sequences EXTRA_SEQUENCES
|
||||
Additional sequences in FASTA format to use when
|
||||
building the STAR index. (E.g. phiX genome)
|
||||
--Filtering_off By default the input Transcript Annotation files are
|
||||
filtered based on the gene_type/gene_biotype
|
||||
attribute. Only features having the following
|
||||
attribute values are kept: - protein_coding -
|
||||
lncRNA (lincRNA and antisense for Gencode <
|
||||
v31/M22/Ensembl97) - IG_LV_gene - IG_V_gene -
|
||||
IG_V_pseudogene - IG_D_gene - IG_J_gene -
|
||||
IG_J_pseudogene - IG_C_gene - IG_C_pseudogene -
|
||||
TR_V_gene - TR_V_pseudogene - TR_D_gene - TR_J_gene -
|
||||
TR_J_pseudogene - TR_C_gene If you have already pre-
|
||||
filtered the input Annotation files and/or wish to
|
||||
turn-off the filtering, please set this option to
|
||||
True.
|
||||
--Genome_fasta GENOME_FASTA
|
||||
Reference genome file in FASTA format. The BD
|
||||
Rhapsody™ Sequencing Analysis Pipeline uses GRCh38
|
||||
for Human and GRCm39 for Mouse.
|
||||
--Gtf GTF Transcript annotation files in GTF format. The BD
|
||||
Rhapsody™ Sequencing Analysis Pipeline uses Gencode
|
||||
v42 for Human and M31 for Mouse.
|
||||
--Maximum_threads MAXIMUM_THREADS
|
||||
The maximum number of threads to use in the pipeline.
|
||||
By default, all available cores are used.
|
||||
--Mitochondrial_Contigs MITOCHONDRIAL_CONTIGS
|
||||
Names of the Mitochondrial contigs in the provided
|
||||
Reference Genome. Fragments originating from contigs
|
||||
other than these are identified as 'nuclear fragments'
|
||||
in the ATACseq analysis pipeline.
|
||||
--WTA_Only Build a WTA only index, otherwise builds a WTA + ATAC
|
||||
index.
|
||||
@@ -0,0 +1,115 @@
|
||||
requirements:
|
||||
InlineJavascriptRequirement: {}
|
||||
class: CommandLineTool
|
||||
label: Reference Files Generator for BD Rhapsody™ Sequencing Analysis Pipeline
|
||||
cwlVersion: v1.2
|
||||
doc: >-
|
||||
The Reference Files Generator creates an archive containing Genome Index and Transcriptome annotation files needed for the BD Rhapsody™ Sequencing Analysis Pipeline. The app takes as input one or more FASTA and GTF files and produces a compressed archive in the form of a tar.gz file. The archive contains:\n - STAR index\n - Filtered GTF file
|
||||
|
||||
|
||||
baseCommand: run_reference_generator.sh
|
||||
inputs:
|
||||
Genome_fasta:
|
||||
type: File[]
|
||||
label: Reference Genome
|
||||
doc: |-
|
||||
Reference genome file in FASTA format. The BD Rhapsody™ Sequencing Analysis Pipeline uses GRCh38 for Human and GRCm39 for Mouse.
|
||||
inputBinding:
|
||||
prefix: --reference-genome
|
||||
shellQuote: false
|
||||
Gtf:
|
||||
type: File[]
|
||||
label: Transcript Annotations
|
||||
doc: |-
|
||||
Transcript annotation files in GTF format. The BD Rhapsody™ Sequencing Analysis Pipeline uses Gencode v42 for Human and M31 for Mouse.
|
||||
inputBinding:
|
||||
prefix: --gtf
|
||||
shellQuote: false
|
||||
Extra_sequences:
|
||||
type: File[]?
|
||||
label: Extra Sequences
|
||||
doc: |-
|
||||
Additional sequences in FASTA format to use when building the STAR index. (E.g. phiX genome)
|
||||
inputBinding:
|
||||
prefix: --extra-sequences
|
||||
shellQuote: false
|
||||
Mitochondrial_Contigs:
|
||||
type: string[]?
|
||||
default: ["chrM", "chrMT", "M", "MT"]
|
||||
label: Mitochondrial Contig Names
|
||||
doc: |-
|
||||
Names of the Mitochondrial contigs in the provided Reference Genome. Fragments originating from contigs other than these are identified as 'nuclear fragments' in the ATACseq analysis pipeline.
|
||||
inputBinding:
|
||||
prefix: --mitochondrial-contigs
|
||||
shellQuote: false
|
||||
Filtering_off:
|
||||
type: boolean?
|
||||
label: Turn off filtering
|
||||
doc: |-
|
||||
By default the input Transcript Annotation files are filtered based on the gene_type/gene_biotype attribute. Only features having the following attribute values are kept:
|
||||
- protein_coding
|
||||
- lncRNA (lincRNA and antisense for Gencode < v31/M22/Ensembl97)
|
||||
- IG_LV_gene
|
||||
- IG_V_gene
|
||||
- IG_V_pseudogene
|
||||
- IG_D_gene
|
||||
- IG_J_gene
|
||||
- IG_J_pseudogene
|
||||
- IG_C_gene
|
||||
- IG_C_pseudogene
|
||||
- TR_V_gene
|
||||
- TR_V_pseudogene
|
||||
- TR_D_gene
|
||||
- TR_J_gene
|
||||
- TR_J_pseudogene
|
||||
- TR_C_gene
|
||||
If you have already pre-filtered the input Annotation files and/or wish to turn-off the filtering, please set this option to True.
|
||||
inputBinding:
|
||||
prefix: --filtering-off
|
||||
shellQuote: false
|
||||
WTA_Only:
|
||||
type: boolean?
|
||||
label: WTA only index
|
||||
doc: Build a WTA only index, otherwise builds a WTA + ATAC index.
|
||||
inputBinding:
|
||||
prefix: --wta-only-index
|
||||
shellQuote: false
|
||||
Archive_prefix:
|
||||
type: string?
|
||||
label: Archive Prefix
|
||||
doc: |-
|
||||
A prefix for naming the compressed archive file containing the Reference genome index and annotation files. The default value is constructed based on the input Reference files.
|
||||
inputBinding:
|
||||
prefix: --archive-prefix
|
||||
shellQuote: false
|
||||
Extra_STAR_params:
|
||||
type: string?
|
||||
label: Extra STAR Params
|
||||
doc: |-
|
||||
Additional parameters to pass to STAR when building the genome index. Specify exactly like how you would on the command line.
|
||||
Example:
|
||||
--limitGenomeGenerateRAM 48000 --genomeSAindexNbases 11
|
||||
inputBinding:
|
||||
prefix: --extra-star-params
|
||||
shellQuote: true
|
||||
|
||||
Maximum_threads:
|
||||
type: int?
|
||||
label: Maximum Number of Threads
|
||||
doc: |-
|
||||
The maximum number of threads to use in the pipeline. By default, all available cores are used.
|
||||
inputBinding:
|
||||
prefix: --maximum-threads
|
||||
shellQuote: false
|
||||
|
||||
outputs:
|
||||
|
||||
Archive:
|
||||
type: File
|
||||
doc: |-
|
||||
A Compressed archive containing the Reference Genome Index and annotation GTF files. This archive is meant to be used as an input in the BD Rhapsody™ Sequencing Analysis Pipeline.
|
||||
id: Reference_Archive
|
||||
label: Reference Files Archive
|
||||
outputBinding:
|
||||
glob: '*.tar.gz'
|
||||
|
||||
161
src/bd_rhapsody/bd_rhapsody_make_reference/script.py
Normal file
@@ -0,0 +1,161 @@
|
||||
import os
|
||||
import re
|
||||
import subprocess
|
||||
import tempfile
|
||||
from typing import Any
|
||||
import yaml
|
||||
import shutil
|
||||
|
||||
## VIASH START
|
||||
par = {
|
||||
"genome_fasta": [],
|
||||
"gtf": [],
|
||||
"extra_sequences": [],
|
||||
"mitochondrial_contigs": ["chrM", "chrMT", "M", "MT"],
|
||||
"filtering_off": False,
|
||||
"wta_only_index": False,
|
||||
"extra_star_params": None,
|
||||
"reference_archive": "output.tar.gz",
|
||||
}
|
||||
meta = {
|
||||
"config": "target/nextflow/reference/build_bdrhap_2_reference/.config.vsh.yaml",
|
||||
"resources_dir": os.path.abspath("src/reference/build_bdrhap_2_reference"),
|
||||
"temp_dir": os.getenv("VIASH_TEMP"),
|
||||
"memory_mb": None,
|
||||
"cpus": None
|
||||
}
|
||||
## VIASH END
|
||||
|
||||
def clean_arg(argument):
|
||||
argument["clean_name"] = re.sub("^-*", "", argument["name"])
|
||||
return argument
|
||||
|
||||
def read_config(path: str) -> dict[str, Any]:
|
||||
with open(path, "r") as f:
|
||||
config = yaml.safe_load(f)
|
||||
|
||||
config["all_arguments"] = [
|
||||
clean_arg(arg)
|
||||
for grp in config["argument_groups"]
|
||||
for arg in grp["arguments"]
|
||||
]
|
||||
|
||||
return config
|
||||
|
||||
def strip_margin(text: str) -> str:
|
||||
return re.sub(r"(\n?)[ \t]*\|", r"\1", text)
|
||||
|
||||
def process_params(par: dict[str, Any], config) -> dict[str, Any]:
|
||||
# check input parameters
|
||||
assert par["genome_fasta"], "Pass at least one set of inputs to --genome_fasta."
|
||||
assert par["gtf"], "Pass at least one set of inputs to --gtf."
|
||||
assert par["reference_archive"].endswith(".tar.gz"), "Output reference_archive must end with .tar.gz."
|
||||
|
||||
# make paths absolute
|
||||
for argument in config["all_arguments"]:
|
||||
if par[argument["clean_name"]] and argument["type"] == "file":
|
||||
if isinstance(par[argument["clean_name"]], list):
|
||||
par[argument["clean_name"]] = [ os.path.abspath(f) for f in par[argument["clean_name"]] ]
|
||||
else:
|
||||
par[argument["clean_name"]] = os.path.abspath(par[argument["clean_name"]])
|
||||
|
||||
return par
|
||||
|
||||
def generate_config(par: dict[str, Any], meta, config) -> str:
|
||||
content_list = [strip_margin(f"""\
|
||||
|#!/usr/bin/env cwl-runner
|
||||
|
|
||||
|""")]
|
||||
|
||||
|
||||
config_key_value_pairs = []
|
||||
for argument in config["all_arguments"]:
|
||||
config_key = (argument.get("info") or {}).get("config_key")
|
||||
arg_type = argument["type"]
|
||||
par_value = par[argument["clean_name"]]
|
||||
if par_value and config_key:
|
||||
config_key_value_pairs.append((config_key, arg_type, par_value))
|
||||
|
||||
if meta["cpus"]:
|
||||
config_key_value_pairs.append(("Maximum_threads", "integer", meta["cpus"]))
|
||||
|
||||
# print(config_key_value_pairs)
|
||||
|
||||
for config_key, arg_type, par_value in config_key_value_pairs:
|
||||
if arg_type == "file":
|
||||
str = strip_margin(f"""\
|
||||
|{config_key}:
|
||||
|""")
|
||||
if isinstance(par_value, list):
|
||||
for file in par_value:
|
||||
str += strip_margin(f"""\
|
||||
| - class: File
|
||||
| location: "{file}"
|
||||
|""")
|
||||
else:
|
||||
str += strip_margin(f"""\
|
||||
| class: File
|
||||
| location: "{par_value}"
|
||||
|""")
|
||||
content_list.append(str)
|
||||
else:
|
||||
content_list.append(strip_margin(f"""\
|
||||
|{config_key}: {par_value}
|
||||
|"""))
|
||||
|
||||
## Return the assembled config content; the caller writes it to file
|
||||
return "".join(content_list)
|
||||
|
||||
def get_cwl_file(meta: dict[str, Any]) -> str:
|
||||
# create cwl file (if need be)
|
||||
cwl_file=os.path.join(meta["resources_dir"], "make_rhap_reference_2.2.1_nodocker.cwl")
|
||||
|
||||
return cwl_file
|
||||
|
||||
def main(par: dict[str, Any], meta: dict[str, Any]):
|
||||
config = read_config(meta["config"])
|
||||
|
||||
# Preprocess params
|
||||
par = process_params(par, config)
|
||||
|
||||
# fetch cwl file
|
||||
cwl_file = get_cwl_file(meta)
|
||||
|
||||
# Create output dir if not exists
|
||||
outdir = os.path.dirname(par["reference_archive"])
|
||||
if not os.path.exists(outdir):
|
||||
os.makedirs(outdir)
|
||||
|
||||
## Run pipeline
|
||||
with tempfile.TemporaryDirectory(prefix="cwl-bd_rhapsody_wta-", dir=meta["temp_dir"]) as temp_dir:
|
||||
# Create params file
|
||||
config_file = os.path.join(temp_dir, "config.yml")
|
||||
config_content = generate_config(par, meta, config)
|
||||
with open(config_file, "w") as f:
|
||||
f.write(config_content)
|
||||
|
||||
|
||||
cmd = [
|
||||
"cwl-runner",
|
||||
"--no-container",
|
||||
"--preserve-entire-environment",
|
||||
"--outdir",
|
||||
temp_dir,
|
||||
cwl_file,
|
||||
config_file
|
||||
]
|
||||
|
||||
env = dict(os.environ)
|
||||
env["TMPDIR"] = temp_dir
|
||||
|
||||
print("> " + " ".join(cmd), flush=True)
|
||||
_ = subprocess.check_call(
|
||||
cmd,
|
||||
cwd=os.path.dirname(config_file),
|
||||
env=env
|
||||
)
|
||||
|
||||
shutil.move(os.path.join(temp_dir, "Rhap_reference.tar.gz"), par["reference_archive"])
|
||||
|
||||
if __name__ == "__main__":
|
||||
main(par, meta)
|
||||
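Review note: `generate_config` relies on `strip_margin` to remove the `|` margin from indented f-strings before the pieces are concatenated into the CWL job file. The sketch below reproduces that step for a single file-list input. `file_list_entry` is a hypothetical helper written for illustration, and the indentation after the margin is an assumption chosen to give valid YAML.

```python
import re


def strip_margin(text: str) -> str:
    # Drop leading whitespace up to and including the "|" margin on every line.
    return re.sub(r"(\n?)[ \t]*\|", r"\1", text)


def file_list_entry(config_key: str, paths: list) -> str:
    """Render one CWL job-file entry for an input that takes a list of files."""
    entry = strip_margin(f"""\
        |{config_key}:
        |""")
    for path in paths:
        entry += strip_margin(f"""\
            | - class: File
            |   location: "{path}"
            |""")
    return entry


entry = file_list_entry("Genome_fasta", ["/data/genome.fa"])
print(entry)
```

Each key rendered this way must correspond to an input declared in the CWL file for the value to reach the underlying tool.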
65
src/bd_rhapsody/bd_rhapsody_make_reference/test.sh
Normal file
@@ -0,0 +1,65 @@
|
||||
#!/bin/bash
|
||||
|
||||
set -e
|
||||
|
||||
#############################################
|
||||
# helper functions
|
||||
assert_file_exists() {
|
||||
[ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
|
||||
}
|
||||
assert_file_doesnt_exist() {
|
||||
[ ! -f "$1" ] || { echo "File '$1' exists but shouldn't" && exit 1; }
|
||||
}
|
||||
assert_file_empty() {
|
||||
[ ! -s "$1" ] || { echo "File '$1' is not empty but should be" && exit 1; }
|
||||
}
|
||||
assert_file_not_empty() {
|
||||
[ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
|
||||
}
|
||||
assert_file_contains() {
|
||||
grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
|
||||
}
|
||||
assert_file_not_contains() {
|
||||
! grep -q "$2" "$1" || { echo "File '$1' contains '$2' but shouldn't" && exit 1; }
|
||||
}
|
||||
#############################################
|
||||
|
||||
in_fa="$meta_resources_dir/test_data/reference_small.fa"
|
||||
in_gtf="$meta_resources_dir/test_data/reference_small.gtf"
|
||||
|
||||
echo "#############################################"
|
||||
echo "> Simple run"
|
||||
|
||||
mkdir simple_run
|
||||
cd simple_run
|
||||
|
||||
out_tar="myreference.tar.gz"
|
||||
|
||||
echo "> Running $meta_name."
|
||||
$meta_executable \
|
||||
--genome_fasta "$in_fa" \
|
||||
--gtf "$in_gtf" \
|
||||
--reference_archive "$out_tar" \
|
||||
--extra_star_params "--genomeSAindexNbases 6" \
|
||||
---cpus 2
|
||||
|
||||
exit_code=$?
|
||||
[[ $exit_code != 0 ]] && echo "Non zero exit code: $exit_code" && exit 1
|
||||
|
||||
assert_file_exists "$out_tar"
|
||||
assert_file_not_empty "$out_tar"
|
||||
|
||||
echo ">> Checking whether output contains the expected files"
|
||||
tar -xvf "$out_tar" > /dev/null
|
||||
assert_file_exists "BD_Rhapsody_Reference_Files/star_index/genomeParameters.txt"
|
||||
assert_file_exists "BD_Rhapsody_Reference_Files/bwa-mem2_index/reference_small.ann"
|
||||
assert_file_exists "BD_Rhapsody_Reference_Files/reference_small-processed.gtf"
|
||||
assert_file_exists "BD_Rhapsody_Reference_Files/mitochondrial_contigs.txt"
|
||||
assert_file_contains "BD_Rhapsody_Reference_Files/reference_small-processed.gtf" "chr1.*HAVANA.*ENSG00000243485"
|
||||
assert_file_contains "BD_Rhapsody_Reference_Files/mitochondrial_contigs.txt" 'chrMT'
|
||||
|
||||
cd ..
|
||||
|
||||
echo "#############################################"
|
||||
|
||||
echo "> Tests succeeded!"
|
||||
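Review note: the test above unpacks the archive and asserts that specific members exist. The same check can be sketched in Python without unpacking to disk, assuming the archive is a gzip-compressed tar file. Both helper names are hypothetical, and the demo archive is built in memory only for illustration.

```python
import io
import tarfile


def build_demo_archive(names: list) -> bytes:
    """Create a small in-memory .tar.gz holding empty files with the given names."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name in names:
            info = tarfile.TarInfo(name=name)
            info.size = 0
            tar.addfile(info, io.BytesIO(b""))
    return buffer.getvalue()


def missing_members(archive_bytes: bytes, expected: list) -> list:
    """Return the expected member names that are absent from the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        names = set(tar.getnames())
    return [name for name in expected if name not in names]


demo = build_demo_archive(
    ["BD_Rhapsody_Reference_Files/star_index/genomeParameters.txt"]
)
missing = missing_members(
    demo,
    [
        "BD_Rhapsody_Reference_Files/star_index/genomeParameters.txt",
        "BD_Rhapsody_Reference_Files/mitochondrial_contigs.txt",
    ],
)
print(missing)
```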
@@ -0,0 +1,27 @@
|
||||
>chr1 1
|
||||
TGGGGAAGCAAGGCGGAGTTGGGCAGCTCGTGTTCAATGGGTAGAGTTTCAGGCTGGGGT
|
||||
GATGGAAGGGTGCTGGAAATGAGTGGTAGTGATGGCGGCACAACAGTGTGAATCTACTTA
|
||||
ATCCCACTGAACTGTATGCTGAAAAATGGTTTAGACGGTGAATTTTAGGTTATGTATGTT
|
||||
TTACCACAATTTTTAAAAAGCTAGTGAAAAGCTGGTAAAAAGAAAGAAAAGAGGCTTTTT
|
||||
TAAAAAGTTAAATATATAAAAAGAGCATCATCAGTCCAAAGTCCAGCAGTTGTCCCTCCT
|
||||
GGAATCCGTTGGCTTGCCTCCGGCATTTTTGGCCCTTGCCTTTTAGGGTTGCCAGATTAA
|
||||
AAGACAGGATGCCCAGCTAGTTTGAATTTTAGATAAACAACGAATAATTTCGTAGCATAA
|
||||
ATATGTCCCAAGCTTAGTTTGGGACATACTTATGCTAAAAAACATTATTGGTTGTTTATC
|
||||
TGAGATTCAGAATTAAGCATTTTATATTTTATTTGCTGCCTCTGGCCACCCTACTCTCTT
|
||||
CCTAACACTCTCTCCCTCTCCCAGTTTTGTCCGCCTTCCCTGCCTCCTCTTCTGGGGGAG
|
||||
TTAGATCGAGTTGTAACAAGAACATGCCACTGTCTCGCTGGCTGCAGCGTGTGGTCCCCT
|
||||
TACCAGAGGTAAAGAAGAGATGGATCTCCACTCATGTTGTAGACAGAATGTTTATGTCCT
|
||||
CTCCAAATGCTTATGTTGAAACCCTAACCCCTAATGTGATGGTATGTGGAGATGGGCCTT
|
||||
TGGTAGGTAATTACGGTTAGATGAGGTCATGGGGTGGGGCCCTCATTATAGATCTGGTAA
|
||||
GAAAAGAGAGCATTGTCTCTGTGTCTCCCTCTCTCTCTCTCTCTCTCTCTCTCATTTCTC
|
||||
TCTATCTCATTTCTCTCTCTCTCGCTATCTCATTTTTCTCTCTCTCTCTTTCTCTCCTCT
|
||||
GTCTTTTCCCACCAAGTGAGGATGCGAAGAGAAGGTGGCTGTCTGCAAACCAGGAAGAGA
|
||||
GCCCTCACCGGGAACCCGTCCAGCTGCCACCTTGAACTTGGACTTCCAAGCCTCCAGAAC
|
||||
TGTGAGGGATAAATGTATGATTTTAAAGTCGCCCAGTGTGTGGTATTTTGTTTTGACTAA
|
||||
TACAACCTGAAAACATTTTCCCCTCACTCCACCTGAGCAATATCTGAGTGGCTTAAGGTA
|
||||
CTCAGGACACAACAAAGGAGAAATGTCCCATGCACAAGGTGCACCCATGCCTGGGTAAAG
|
||||
CAGCCTGGCACAGAGGGAAGCACACAGGCTCAGGGATCTGCTATTCATTCTTTGTGTGAC
|
||||
CCTGGGCAAGCCATGAATGGAGCTTCAGTCACCCCATTTGTAATGGGATTTAATTGTGCT
|
||||
TGCCCTGCCTCCTTTTGAGGGCTGTAGAGAAAAGATGTCAAAGTATTTTGTAATCTGGCT
|
||||
GGGCGTGGTGGCTCATGCCTGTAATCCTAGCACTTTGGTAGGCTGACGCGAGAGGACTGC
|
||||
T
|
||||
@@ -0,0 +1,8 @@
|
||||
chr1 HAVANA exon 565 668 . + . gene_id "ENSG00000243485.5"; transcript_id "ENST00000473358.1"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; transcript_type "lncRNA"; transcript_name "MIR1302-2HG-202"; exon_number 2; exon_id "ENSE00001922571.1"; level 2; transcript_support_level "5"; hgnc_id "HGNC:52482"; tag "not_best_in_genome_evidence"; tag "dotter_confirmed"; tag "basic"; tag "Ensembl_canonical"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002840.1";
|
||||
chr1 HAVANA exon 977 1098 . + . gene_id "ENSG00000243485.5"; transcript_id "ENST00000473358.1"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; transcript_type "lncRNA"; transcript_name "MIR1302-2HG-202"; exon_number 3; exon_id "ENSE00001827679.1"; level 2; transcript_support_level "5"; hgnc_id "HGNC:52482"; tag "not_best_in_genome_evidence"; tag "dotter_confirmed"; tag "basic"; tag "Ensembl_canonical"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002840.1";
|
||||
chr1 HAVANA transcript 268 1110 . + . gene_id "ENSG00000243485.5"; transcript_id "ENST00000469289.1"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; transcript_type "lncRNA"; transcript_name "MIR1302-2HG-201"; level 2; transcript_support_level "5"; hgnc_id "HGNC:52482"; tag "not_best_in_genome_evidence"; tag "basic"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002841.2";
|
||||
chr1 HAVANA exon 268 668 . + . gene_id "ENSG00000243485.5"; transcript_id "ENST00000469289.1"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; transcript_type "lncRNA"; transcript_name "MIR1302-2HG-201"; exon_number 1; exon_id "ENSE00001841699.1"; level 2; transcript_support_level "5"; hgnc_id "HGNC:52482"; tag "not_best_in_genome_evidence"; tag "basic"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002841.2";
|
||||
chr1 HAVANA exon 977 1110 . + . gene_id "ENSG00000243485.5"; transcript_id "ENST00000469289.1"; gene_type "lncRNA"; gene_name "MIR1302-2HG"; transcript_type "lncRNA"; transcript_name "MIR1302-2HG-201"; exon_number 2; exon_id "ENSE00001890064.1"; level 2; transcript_support_level "5"; hgnc_id "HGNC:52482"; tag "not_best_in_genome_evidence"; tag "basic"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002841.2";
|
||||
chr1 ENSEMBL gene 367 504 . + . gene_id "ENSG00000284332.1"; gene_type "miRNA"; gene_name "MIR1302-2"; level 3; hgnc_id "HGNC:35294";
|
||||
chr1 ENSEMBL transcript 367 504 . + . gene_id "ENSG00000284332.1"; transcript_id "ENST00000607096.1"; gene_type "miRNA"; gene_name "MIR1302-2"; transcript_type "miRNA"; transcript_name "MIR1302-2-201"; level 3; transcript_support_level "NA"; hgnc_id "HGNC:35294"; tag "basic"; tag "Ensembl_canonical";
|
||||
chr1 ENSEMBL exon 367 504 . + . gene_id "ENSG00000284332.1"; transcript_id "ENST00000607096.1"; gene_type "miRNA"; gene_name "MIR1302-2"; transcript_type "miRNA"; transcript_name "MIR1302-2-201"; exon_number 1; exon_id "ENSE00003695741.1"; level 3; transcript_support_level "NA"; hgnc_id "HGNC:35294"; tag "basic"; tag "Ensembl_canonical";
|
||||
@@ -0,0 +1,47 @@
|
||||
#!/bin/bash
|
||||
|
||||
TMP_DIR=/tmp/bd_rhapsody_make_reference
|
||||
OUT_DIR=src/bd_rhapsody/bd_rhapsody_make_reference/test_data
|
||||
|
||||
# check if seqkit is installed
|
||||
if ! command -v seqkit &> /dev/null; then
|
||||
echo "seqkit could not be found"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# create temporary directory and clean up on exit
|
||||
mkdir -p $TMP_DIR
|
||||
function clean_up {
|
||||
rm -rf "$TMP_DIR"
|
||||
}
|
||||
trap clean_up EXIT
|
||||
|
||||
# fetch reference
|
||||
ORIG_FA=$TMP_DIR/reference.fa.gz
|
||||
if [ ! -f $ORIG_FA ]; then
|
||||
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/GRCh38.primary_assembly.genome.fa.gz \
|
||||
-O $ORIG_FA
|
||||
fi
|
||||
|
||||
ORIG_GTF=$TMP_DIR/reference.gtf.gz
|
||||
if [ ! -f $ORIG_GTF ]; then
|
||||
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/gencode.v41.annotation.gtf.gz \
|
||||
-O $ORIG_GTF
|
||||
fi
|
||||
|
||||
# create small reference
|
||||
START=30000
|
||||
END=31500
|
||||
CHR=chr1
|
||||
|
||||
# subset to small region
|
||||
seqkit grep -r -p "^$CHR\$" "$ORIG_FA" | \
|
||||
seqkit subseq -r "$START:$END" > $OUT_DIR/reference_small.fa
|
||||
|
||||
zcat "$ORIG_GTF" | \
|
||||
awk -v FS='\t' -v OFS='\t' "
|
||||
\$1 == \"$CHR\" && \$4 >= $START && \$5 <= $END {
|
||||
\$4 = \$4 - $START + 1;
|
||||
\$5 = \$5 - $START + 1;
|
||||
print;
|
||||
}" > $OUT_DIR/reference_small.gtf
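The coordinate shift performed by the `awk` filter above can be sketched in isolation. The helper name `rebase_gtf` and the sample record are illustrative assumptions and not part of the repository; passing the region through `-v` rather than interpolating shell variables into the program is only one possible way to avoid the `\$` escaping.

```shell
#!/bin/bash
# Hypothetical helper (not part of the repository): keep GTF records that lie
# fully inside CHR:START-END and shift them so that START becomes position 1.
rebase_gtf() {
  local chr="$1" start="$2" end="$3"
  awk -v FS='\t' -v OFS='\t' -v chr="$chr" -v start="$start" -v end="$end" '
    $1 == chr && $4 >= start && $5 <= end {
      $4 = $4 - start + 1   # new feature start, 1-based
      $5 = $5 - start + 1   # new feature end, 1-based
      print
    }'
}

# Made-up record for illustration: 30366-30503 on chr1
rebased=$(printf 'chr1\tENSEMBL\tgene\t30366\t30503\t.\t+\t.\tgene_id "example";\n' \
  | rebase_gtf chr1 30000 31500)
echo "$rebased"
```

With `START=30000`, a feature spanning 30366-30503 is rewritten to 367-504. This lines up with a FASTA subset whose first base is position 30000 of the original chromosome.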
|
||||
@@ -10,6 +10,9 @@ references:
|
||||
license: GPL-2.0
|
||||
requirements:
|
||||
commands: [bedtools]
|
||||
authors:
|
||||
- __merge__: /src/_authors/dries_schaumont.yaml
|
||||
roles: [ author, maintainer ]
|
||||
|
||||
argument_groups:
|
||||
- name: Input arguments
|
||||
|
||||
@@ -9,6 +9,9 @@ links:
|
||||
references:
|
||||
doi: 10.1007/978-1-4939-9173-0_14
|
||||
license: MIT
|
||||
authors:
|
||||
- __merge__: /src/_authors/dorien_roosen.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
@@ -9,6 +9,9 @@ links:
|
||||
references:
|
||||
doi: 10.1007/978-1-4939-9173-0_14
|
||||
license: MIT
|
||||
authors:
|
||||
- __merge__: /src/_authors/dorien_roosen.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Outputs
|
||||
arguments:
|
||||
|
||||
@@ -9,6 +9,9 @@ links:
|
||||
references:
|
||||
doi: 10.1007/978-1-4939-9173-0_14
|
||||
license: MIT
|
||||
authors:
|
||||
- __merge__: /src/_authors/dorien_roosen.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
@@ -9,6 +9,9 @@ links:
|
||||
references:
|
||||
doi: 10.14806/ej.17.1.200
|
||||
license: MIT
|
||||
authors:
|
||||
- __merge__: /src/_authors/toni_verbeiren.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
####################################################################
|
||||
- name: Specify Adapters for R1
|
||||
|
||||
@@ -6,25 +6,25 @@ set -eo pipefail
|
||||
#############################################
|
||||
# helper functions
|
||||
assert_file_exists() {
|
||||
[ -f "$1" ] || (echo "File '$1' does not exist" && exit 1)
|
||||
[ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
|
||||
}
|
||||
assert_file_doesnt_exist() {
|
||||
[ ! -f "$1" ] || (echo "File '$1' exists but shouldn't" && exit 1)
|
||||
[ ! -f "$1" ] || { echo "File '$1' exists but shouldn't" && exit 1; }
|
||||
}
|
||||
assert_file_empty() {
|
||||
[ ! -s "$1" ] || (echo "File '$1' is not empty but should be" && exit 1)
|
||||
[ ! -s "$1" ] || { echo "File '$1' is not empty but should be" && exit 1; }
|
||||
}
|
||||
assert_file_not_empty() {
|
||||
[ -s "$1" ] || (echo "File '$1' is empty but shouldn't be" && exit 1)
|
||||
[ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
|
||||
}
|
||||
assert_file_contains() {
|
||||
grep -q "$2" "$1" || (echo "File '$1' does not contain '$2'" && exit 1)
|
||||
grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
|
||||
}
|
||||
assert_file_not_contains() {
|
||||
grep -q "$2" "$1" && (echo "File '$1' contains '$2' but shouldn't" && exit 1)
|
||||
! grep -q "$2" "$1" || { echo "File '$1' contains '$2' but shouldn't" && exit 1; }
|
||||
}
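The hunk above replaces `( ... )` with `{ ...; }` in the assertion helpers. A minimal sketch of how the two forms differ, assuming `set -e` is not active; the function names `check_subshell` and `check_group` and the file path are made up for the example.

```shell
#!/bin/bash
# Illustrative only: check_subshell and check_group are hypothetical names.

# `( ... )` runs in a subshell, so `exit 1` ends only that subshell.
check_subshell() {
  [ -f "$1" ] || (echo "File '$1' does not exist" && exit 1)
  echo "continued after failed check"
}

# `{ ...; }` runs in the current shell, so `exit 1` ends the whole shell.
check_group() {
  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
  echo "continued after failed check"
}

# Command substitution gives each call its own shell process to exit from,
# so this demo script itself keeps running.
missing="/nonexistent/example.txt"
subshell_out=$(check_subshell "$missing")
group_out=$(check_group "$missing") || true

printf 'subshell variant:\n%s\n\n' "$subshell_out"
printf 'group variant:\n%s\n' "$group_out"
```

In the subshell variant the function keeps going after the failed check, so the second message is printed; in the group variant the shell running the function stops at `exit 1`.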
|
||||
|
||||
#############################################
|
||||
|
||||
mkdir test_multiple_output
|
||||
cd test_multiple_output
|
||||
|
||||
|
||||
@@ -9,6 +9,9 @@ references:
|
||||
license: GPL-3.0
|
||||
requirements:
|
||||
commands: [falco]
|
||||
authors:
|
||||
- __merge__: /src/_authors/toni_verbeiren.yaml
|
||||
roles: [ author, maintainer ]
|
||||
|
||||
# Notes:
|
||||
# - falco has arguments such as -subsample, which we update to --subsample
|
||||
|
||||
@@ -26,6 +26,9 @@ links:
|
||||
references:
|
||||
doi: "10.1093/bioinformatics/bty560"
|
||||
license: MIT
|
||||
authors:
|
||||
- __merge__: /src/_authors/robrecht_cannoodt.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
description: |
|
||||
|
||||
@@ -11,7 +11,9 @@ references:
|
||||
license: GPL-3.0
|
||||
requirements:
|
||||
commands: [ featureCounts ]
|
||||
|
||||
authors:
|
||||
- __merge__: /src/_authors/sai_nirmayi_yasa.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
@@ -8,8 +8,9 @@ links:
|
||||
references:
|
||||
doi: 10.12688/f1000research.23297.2
|
||||
license: MIT
|
||||
requirements:
|
||||
commands: [ gffread ]
|
||||
authors:
|
||||
- __merge__: /src/_authors/emma_rousseau.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
@@ -52,7 +53,7 @@ argument_groups:
|
||||
required: true
|
||||
description: |
|
||||
Write the output records into <outfile>.
|
||||
default: output.gff
|
||||
example: output.gff
|
||||
- name: --force_exons
|
||||
type: boolean_true
|
||||
description: |
|
||||
@@ -154,7 +155,6 @@ argument_groups:
|
||||
- name: --table
|
||||
type: string
|
||||
multiple: true
|
||||
multiple_sep: ","
|
||||
description: |
|
||||
Output a simple tab delimited format instead of GFF, with columns having the values
|
||||
of GFF attributes given in <attrlist>; special pseudo-attributes (prefixed by @) are
|
||||
|
||||
@@ -50,6 +50,8 @@
|
||||
[[ "$par_expose_dups" == "false" ]] && unset par_expose_dups
|
||||
[[ "$par_cluster_only" == "false" ]] && unset par_cluster_only
|
||||
|
||||
# if par_table is not empty, replace ";" with ","
|
||||
par_table=$(echo "$par_table" | tr ';' ',')
|
||||
|
||||
$(which gffread) \
|
||||
"$par_input" \
|
||||
|
||||
@@ -86,7 +86,7 @@ diff "$expected_output_dir/transcripts.fa" "$test_output_dir/transcripts.fa" ||
|
||||
echo "> Test 4 - Generate table from GFF annotation file"
|
||||
|
||||
"$meta_executable" \
|
||||
--table @id,@chr,@start,@end,@strand,@exons,Name,gene,product \
|
||||
--table "@id;@chr;@start;@end;@strand;@exons;Name;gene;product" \
|
||||
--outfile "$test_output_dir/annotation.tbl" \
|
||||
--input "$test_dir/sequence.gff3"
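The test now passes the `--table` values joined with `;`, and the wrapper script converts them back into a comma-separated list before calling `gffread`. A small sketch of that conversion; `to_comma_list` is a hypothetical helper name that does not exist in the component.

```shell
#!/bin/bash
# Hypothetical helper: turn a ';'-joined value list into a ','-joined one.
to_comma_list() {
  printf '%s' "$1" | tr ';' ','
}

par_table="@id;@chr;@start;@end;@strand;@exons;Name;gene;product"
par_table=$(to_comma_list "$par_table")
echo "$par_table"
```

An empty value stays empty, so the conversion is harmless when `--table` is not provided.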
|
||||
|
||||
|
||||
@@ -17,6 +17,9 @@ references:
|
||||
license: "MIT"
|
||||
requirements:
|
||||
commands: [ lofreq ]
|
||||
authors:
|
||||
- __merge__: /src/_authors/kai_waldrant.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
@@ -18,6 +18,9 @@ references:
|
||||
license: "MIT"
|
||||
requirements:
|
||||
commands: [ lofreq ]
|
||||
authors:
|
||||
- __merge__: /src/_authors/kai_waldrant.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
@@ -11,7 +11,9 @@ info:
|
||||
references:
|
||||
doi: 10.1093/bioinformatics/btw354
|
||||
licence: GPL v3 or later
|
||||
|
||||
authors:
|
||||
- __merge__: /src/_authors/dorien_roosen.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: "Input"
|
||||
arguments:
|
||||
|
||||
@@ -12,7 +12,10 @@ references:
|
||||
doi: 10.1093/bioinformatics/btt593
|
||||
license: "CC-BY-NC-SA-3.0"
|
||||
requirements:
|
||||
commands: [ pear , gzip ]
|
||||
commands: [ pear, gzip ]
|
||||
authors:
|
||||
- __merge__: /src/_authors/kai_waldrant.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
@@ -12,7 +12,9 @@ references:
|
||||
license: GPL-3.0
|
||||
requirements:
|
||||
commands: [ salmon ]
|
||||
|
||||
authors:
|
||||
- __merge__: /src/_authors/sai_nirmayi_yasa.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
@@ -12,7 +12,9 @@ references:
|
||||
license: GPL-3.0
|
||||
requirements:
|
||||
commands: [ salmon ]
|
||||
|
||||
authors:
|
||||
- __merge__: /src/_authors/sai_nirmayi_yasa.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Common input options
|
||||
arguments:
|
||||
|
||||
@@ -9,7 +9,9 @@ links:
|
||||
references:
|
||||
doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
|
||||
license: MIT/Expat
|
||||
|
||||
authors:
|
||||
- __merge__: /src/_authors/emma_rousseau.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
@@ -9,7 +9,9 @@ links:
|
||||
references:
|
||||
doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
|
||||
license: MIT/Expat
|
||||
|
||||
authors:
|
||||
- __merge__: /src/_authors/emma_rousseau.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
@@ -9,7 +9,9 @@ links:
|
||||
references:
|
||||
doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
|
||||
license: MIT/Expat
|
||||
|
||||
authors:
|
||||
- __merge__: /src/_authors/emma_rousseau.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
@@ -9,7 +9,9 @@ links:
|
||||
references:
|
||||
doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
|
||||
license: MIT/Expat
|
||||
|
||||
authors:
|
||||
- __merge__: /src/_authors/emma_rousseau.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
@@ -9,7 +9,9 @@ links:
|
||||
references:
|
||||
doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
|
||||
license: MIT/Expat
|
||||
|
||||
authors:
|
||||
- __merge__: /src/_authors/emma_rousseau.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
@@ -9,7 +9,9 @@ links:
|
||||
references:
|
||||
doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
|
||||
license: MIT/Expat
|
||||
|
||||
authors:
|
||||
- __merge__: /src/_authors/emma_rousseau.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
@@ -9,7 +9,9 @@ links:
|
||||
references:
|
||||
doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
|
||||
license: MIT/Expat
|
||||
|
||||
authors:
|
||||
- __merge__: /src/_authors/emma_rousseau.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
@@ -9,7 +9,9 @@ links:
|
||||
references:
|
||||
doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
|
||||
license: MIT/Expat
|
||||
|
||||
authors:
|
||||
- __merge__: /src/_authors/emma_rousseau.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
@@ -9,7 +9,9 @@ links:
|
||||
references:
|
||||
doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
|
||||
license: MIT/Expat
|
||||
|
||||
authors:
|
||||
- __merge__: /src/_authors/emma_rousseau.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
@@ -30,10 +32,10 @@ argument_groups:
|
||||
- name: --coverage
|
||||
alternatives: -c
|
||||
type: integer
|
||||
description: |
|
||||
Coverage distribution min,max,step [1,1000,1].
|
||||
multiple: true
|
||||
multiple_sep: ','
|
||||
description: |
|
||||
Coverage distribution min;max;step. Default: [1, 1000, 1].
|
||||
example: [1, 1000, 1]
|
||||
- name: --remove_dups
|
||||
alternatives: -d
|
||||
type: boolean_true
|
||||
@@ -48,25 +50,25 @@ argument_groups:
|
||||
alternatives: -f
|
||||
type: string
|
||||
description: |
|
||||
Required flag, 0 for unset. See also `samtools flags`.
|
||||
default: "0"
|
||||
Required flag, 0 for unset. See also `samtools flags`. Default: `"0"`.
|
||||
example: "0"
|
||||
- name: --filtering_flag
|
||||
alternatives: -F
|
||||
type: string
|
||||
description: |
|
||||
Filtering flag, 0 for unset. See also `samtools flags`.
|
||||
default: "0"
|
||||
Filtering flag, 0 for unset. See also `samtools flags`. Default: `0`.
|
||||
example: "0"
|
||||
- name: --GC_depth
|
||||
type: double
|
||||
description: |
|
||||
The size of GC-depth bins (decreasing bin size increases memory requirement).
|
||||
default: 20000.0
|
||||
The size of GC-depth bins (decreasing bin size increases memory requirement). Default: `20000`.
|
||||
example: 20000.0
|
||||
- name: --insert_size
|
||||
alternatives: -i
|
||||
type: integer
|
||||
description: |
|
||||
Maximum insert size.
|
||||
default: 8000
|
||||
Maximum insert size. Default: `8000`.
|
||||
example: 8000
|
||||
- name: --id
|
||||
alternatives: -I
|
||||
type: string
|
||||
@@ -76,14 +78,14 @@ argument_groups:
|
||||
alternatives: -l
|
||||
type: integer
|
||||
description: |
|
||||
Include in the statistics only reads with the given read length.
|
||||
default: -1
|
||||
Include in the statistics only reads with the given read length. Default: `-1`.
|
||||
example: -1
|
||||
- name: --most_inserts
|
||||
alternatives: -m
|
||||
type: double
|
||||
description: |
|
||||
Report only the main part of inserts.
|
||||
default: 0.99
|
||||
Report only the main part of inserts. Default: `0.99`.
|
||||
example: 0.99
|
||||
- name: --split_prefix
|
||||
alternatives: -P
|
||||
type: string
|
||||
@@ -93,8 +95,8 @@ argument_groups:
|
||||
alternatives: -q
|
||||
type: integer
|
||||
description: |
|
||||
The BWA trimming parameter.
|
||||
default: 0
|
||||
The BWA trimming parameter. Default: `0`.
|
||||
example: 0
|
||||
- name: --ref_seq
|
||||
alternatives: -r
|
||||
type: file
|
||||
@@ -124,8 +126,8 @@ argument_groups:
|
||||
alternatives: -g
|
||||
type: integer
|
||||
description: |
|
||||
Only bases with coverage above this value will be included in the target percentage computation.
|
||||
default: 0
|
||||
Only bases with coverage above this value will be included in the target percentage computation. Default: `0`.
|
||||
example: 0
|
||||
- name: --input_fmt_option
|
||||
type: string
|
||||
description: |
|
||||
@@ -141,7 +143,7 @@ argument_groups:
|
||||
type: file
|
||||
description: |
|
||||
Output file.
|
||||
default: "out.txt"
|
||||
example: "out.txt"
|
||||
required: true
|
||||
direction: output
|
||||
|
||||
|
||||
@@ -10,6 +10,9 @@ set -e
|
||||
[[ "$par_sparse" == "false" ]] && unset par_sparse
|
||||
[[ "$par_remove_overlaps" == "false" ]] && unset par_remove_overlaps
|
||||
|
||||
# change the coverage input from X;X;X to X,X,X
|
||||
par_coverage=$(echo "$par_coverage" | tr ';' ',')
|
||||
|
||||
samtools stats \
|
||||
${par_coverage:+-c "$par_coverage"} \
|
||||
${par_remove_dups:+-d} \
|
||||
|
||||
@@ -17,7 +17,7 @@ echo ">>> Checking whether output is non-empty"
|
||||
[ ! -s "$test_dir/test.paired_end.sorted.txt" ] && echo "File 'test.paired_end.sorted.txt' is empty!" && exit 1
|
||||
|
||||
echo ">>> Checking whether output is correct"
|
||||
# compare using diff, ignoring the line stating the command that was passed.
|
||||
# compare using diff, ignoring the line stating the command that was passed.
|
||||
diff <(grep -v "^# The command" "$test_dir/test.paired_end.sorted.txt") \
|
||||
<(grep -v "^# The command" "$test_dir/ref.paired_end.sorted.txt") || \
|
||||
(echo "Output file ref.paired_end.sorted.txt does not match expected output" && exit 1)
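The `samtools stats` wrapper builds its command line from `${par_x:+...}` expansions after unsetting boolean parameters that equal `false`. A sketch of that idiom under stated assumptions: `build_args` is a hypothetical helper, and its two positional parameters stand in for the `par_*` variables that the wrapper scripts in this diff expect.

```shell
#!/bin/bash
# Sketch of the flag-building idiom; build_args is a hypothetical helper.
build_args() {
  local par_remove_dups="$1" par_coverage="$2"

  # the wrapper scripts treat the string "false" as "flag not requested";
  # unsetting the variable makes the ${var:+...} expansion below empty
  [[ "$par_remove_dups" == "false" ]] && unset par_remove_dups

  # ${var:+word} expands to word only if var is set and non-empty
  args=(
    ${par_coverage:+-c "$par_coverage"}
    ${par_remove_dups:+-d}
  )
}

build_args true "1,1000,1"
echo "with values: ${args[*]}"

build_args false ""
echo "without values: ${#args[@]} argument(s)"
```

Because the value is quoted inside the expansion, a value containing spaces still ends up as a single argument.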
|
||||
|
||||
@@ -9,7 +9,9 @@ links:
|
||||
references:
|
||||
doi: [10.1093/bioinformatics/btp352, 10.1093/gigascience/giab008]
|
||||
license: MIT/Expat
|
||||
|
||||
authors:
|
||||
- __merge__: /src/_authors/emma_rousseau.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
57
src/seqtk/seqtk_sample/config.vsh.yaml
Normal file
@@ -0,0 +1,57 @@
|
||||
name: seqtk_sample
|
||||
namespace: seqtk
|
||||
description: Subsamples sequences from FASTA/Q files.
|
||||
keywords: [sample, FASTA, FASTQ]
|
||||
links:
|
||||
repository: https://github.com/lh3/seqtk/tree/v1.4
|
||||
license: MIT
|
||||
authors:
|
||||
- __merge__: /src/_authors/jakub_majercik.yaml
|
||||
roles: [ author, maintainer ]
|
||||
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
- name: --input
|
||||
type: file
|
||||
description: The input FASTA/Q file.
|
||||
required: true
|
||||
|
||||
- name: Outputs
|
||||
arguments:
|
||||
- name: --output
|
||||
type: file
|
||||
description: The output FASTA/Q file.
|
||||
required: true
|
||||
direction: output
|
||||
|
||||
- name: Options
|
||||
arguments:
|
||||
- name: --seed
|
||||
type: integer
|
||||
description: Seed for random generator.
|
||||
example: 42
|
||||
- name: --fraction_number
|
||||
type: double
|
||||
description: Fraction or number of sequences to sample.
|
||||
required: true
|
||||
example: 0.1
|
||||
- name: --two_pass_mode
|
||||
type: boolean_true
|
||||
description: Enable 2-pass mode, which is twice as slow but uses much less memory.
|
||||
|
||||
resources:
|
||||
- type: bash_script
|
||||
path: script.sh
|
||||
test_resources:
|
||||
- type: bash_script
|
||||
path: test.sh
|
||||
- type: file
|
||||
path: ../test_data
|
||||
|
||||
engines:
|
||||
- type: docker
|
||||
image: quay.io/biocontainers/seqtk:1.4--he4a0461_2
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
9
src/seqtk/seqtk_sample/help.txt
Normal file
@@ -0,0 +1,9 @@
|
||||
```
|
||||
seqtk_subseq
|
||||
```
|
||||
Usage: seqtk subseq [options] <in.fa> <in.bed>|<name.list>
|
||||
Options:
|
||||
-t TAB delimited output
|
||||
-s strand aware
|
||||
-l INT sequence line length [0]
|
||||
Note: Use 'samtools faidx' if only a few regions are intended.
|
||||
11
src/seqtk/seqtk_sample/script.sh
Normal file
@@ -0,0 +1,11 @@
|
||||
#!/bin/bash
|
||||
|
||||
## VIASH START
|
||||
## VIASH END
|
||||
|
||||
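# unset the flag when it is false, so that -2 is only passed on request
[[ "$par_two_pass_mode" == "false" ]] && unset par_two_pass_mode
seqtk sample \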
|
||||
${par_two_pass_mode:+-2} \
|
||||
${par_seed:+-s "$par_seed"} \
|
||||
"$par_input" \
|
||||
"$par_fraction_number" \
|
||||
> "$par_output"
|
||||
104
src/seqtk/seqtk_sample/test.sh
Normal file
@@ -0,0 +1,104 @@
|
||||
#!/bin/bash
|
||||
|
||||
set -e
|
||||
|
||||
## VIASH START
|
||||
meta_executable="target/executable/seqtk/seqtk_sample"
|
||||
meta_resources_dir="src/seqtk"
|
||||
## VIASH END
|
||||
|
||||
#########################################################################################
|
||||
mkdir seqtk_sample_se
|
||||
cd seqtk_sample_se
|
||||
|
||||
echo "> Run seqtk_sample on fastq SE"
|
||||
"$meta_executable" \
|
||||
--input "$meta_resources_dir/test_data/reads/a.1.fastq.gz" \
|
||||
--seed 42 \
|
||||
--fraction_number 3 \
|
||||
--output "sampled.fastq"
|
||||
|
||||
echo ">> Check if output exists"
|
||||
if [ ! -f "sampled.fastq" ]; then
|
||||
echo ">> sampled.fastq does not exist"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ">> Count number of samples"
|
||||
num_samples=$(grep -c '^@' sampled.fastq)
|
||||
if [ "$num_samples" -ne 3 ]; then
|
||||
echo ">> sampled.fastq does not contain 3 samples"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
#########################################################################################
|
||||
cd ..
|
||||
mkdir seqtk_sample_pe_number
|
||||
cd seqtk_sample_pe_number
|
||||
|
||||
echo ">> Run seqtk_sample on fastq.gz PE with number of reads"
|
||||
"$meta_executable" \
|
||||
--input "$meta_resources_dir/test_data/reads/a.1.fastq.gz" \
|
||||
--seed 42 \
|
||||
--fraction_number 3 \
|
||||
--output "sampled_1.fastq"
|
||||
|
||||
"$meta_executable" \
|
||||
--input "$meta_resources_dir/test_data/reads/a.2.fastq.gz" \
|
||||
--seed 42 \
|
||||
--fraction_number 3 \
|
||||
--output "sampled_2.fastq"
|
||||
|
||||
echo ">> Check if output exists"
|
||||
if [ ! -f "sampled_1.fastq" ] || [ ! -f "sampled_2.fastq" ]; then
|
||||
echo ">> One or both output files do not exist"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ">> Compare reads"
|
||||
# Extract headers
|
||||
headers1=$(grep '^@' sampled_1.fastq | sed -e 's/ 1$//' | sort)
|
||||
headers2=$(grep '^@' sampled_2.fastq | sed -e 's/ 2$//' | sort)
|
||||
|
||||
# Compare headers
|
||||
diff <(echo "$headers1") <(echo "$headers2") || { echo "Mismatch detected"; exit 1; }
|
||||
|
||||
echo ">> Count number of samples"
|
||||
num_headers=$(echo "$headers1" | wc -l)
|
||||
if [ "$num_headers" -ne 3 ]; then
|
||||
echo ">> sampled_1.fastq does not contain 3 headers"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
#########################################################################################
|
||||
cd ..
|
||||
mkdir seqtk_sample_pe_fraction
|
||||
cd seqtk_sample_pe_fraction
|
||||
|
||||
echo ">> Run seqtk_sample on fastq.gz PE with fraction of reads"
|
||||
"$meta_executable" \
|
||||
--input "$meta_resources_dir/test_data/reads/a.1.fastq.gz" \
|
||||
--seed 42 \
|
||||
--fraction_number 0.5 \
|
||||
--output "sampled_1.fastq"
|
||||
|
||||
"$meta_executable" \
|
||||
--input "$meta_resources_dir/test_data/reads/a.2.fastq.gz" \
|
||||
--seed 42 \
|
||||
--fraction_number 0.5 \
|
||||
--output "sampled_2.fastq"
|
||||
|
||||
echo ">> Check if output exists"
|
||||
if [ ! -f "sampled_1.fastq" ] || [ ! -f "sampled_2.fastq" ]; then
|
||||
echo ">> One or both output files do not exist"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ">> Compare reads"
|
||||
# Extract headers
|
||||
headers1=$(grep '^@' sampled_1.fastq | sed -e 's/ 1$//' | sort)
|
||||
headers2=$(grep '^@' sampled_2.fastq | sed -e 's/ 2$//' | sort)
|
||||
|
||||
# Compare headers
|
||||
diff <(echo "$headers1") <(echo "$headers2") || { echo "Mismatch detected"; exit 1; }
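The tests above count and compare reads by matching lines that begin with `@`. In FASTQ a quality line may also begin with `@`, so a stricter alternative is to rely on the four-line record layout. A sketch, assuming unwrapped four-line records; `fastq_headers` and `fastq_count` are hypothetical helpers and the two reads are made up.

```shell
#!/bin/bash
# Hypothetical helpers, assuming every FASTQ record spans exactly four lines.

# Print only the header line of each record (lines 1, 5, 9, ...).
fastq_headers() {
  awk 'NR % 4 == 1' "$1"
}

# Count records as the number of lines divided by four.
fastq_count() {
  awk 'END { print NR / 4 }' "$1"
}

# Made-up example: the second record has a quality string starting with '@'.
tmp_fq=$(mktemp)
cat > "$tmp_fq" <<'EOF'
@read1 1
ACGT
+
IIII
@read2 1
ACGT
+
@III
EOF

echo "records: $(fastq_count "$tmp_fq")"
echo "lines starting with @: $(grep -c '^@' "$tmp_fq")"
fastq_headers "$tmp_fq" | sed -e 's/ 1$//' | sort
```

Here the file holds two records, while three lines start with `@`, which is the kind of mismatch the line-position approach avoids.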
|
||||
|
||||
78
src/seqtk/seqtk_subseq/config.vsh.yaml
Normal file
@@ -0,0 +1,78 @@
|
||||
name: seqtk_subseq
|
||||
namespace: seqtk
|
||||
description: |
|
||||
Extract subsequences from FASTA/Q files. Takes as input a FASTA/Q file and a name.lst (sequence ids file) or a reg.bed (genomic regions file).
|
||||
keywords: [subseq, FASTA, FASTQ]
|
||||
links:
|
||||
repository: https://github.com/lh3/seqtk/tree/v1.4
|
||||
license: MIT
|
||||
authors:
|
||||
- __merge__: /src/_authors/theodoro_gasperin.yaml
|
||||
roles: [ author, maintainer ]
|
||||
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
- name: "--input"
|
||||
type: file
|
||||
direction: input
|
||||
description: The input FASTA/Q file.
|
||||
required: true
|
||||
example: input.fa
|
||||
|
||||
- name: "--name_list"
|
||||
type: file
|
||||
direction: input
|
||||
description: |
|
||||
List of sequence names (name.lst) or genomic regions (reg.bed) to extract.
|
||||
required: true
|
||||
example: list.lst
|
||||
|
||||
- name: Outputs
|
||||
arguments:
|
||||
- name: "--output"
|
||||
alternatives: -o
|
||||
type: file
|
||||
direction: output
|
||||
description: The output FASTA/Q file.
|
||||
required: true
|
||||
default: output.fa
|
||||
|
||||
- name: Options
|
||||
arguments:
|
||||
- name: "--tab"
|
||||
alternatives: -t
|
||||
type: boolean_true
|
||||
description: TAB delimited output.
|
||||
|
||||
- name: "--strand_aware"
|
||||
alternatives: -s
|
||||
type: boolean_true
|
||||
description: Strand aware.
|
||||
|
||||
- name: "--sequence_line_length"
|
||||
alternatives: -l
|
||||
type: integer
|
||||
description: |
|
||||
Sequence line length of the output. Default: 0 (no line wrapping).
|
||||
example: 0
|
||||
|
||||
|
||||
resources:
|
||||
- type: bash_script
|
||||
path: script.sh
|
||||
test_resources:
|
||||
- type: bash_script
|
||||
path: test.sh
|
||||
|
||||
engines:
|
||||
- type: docker
|
||||
image: quay.io/biocontainers/seqtk:1.4--he4a0461_2
|
||||
setup:
|
||||
- type: docker
|
||||
run: |
|
||||
echo $(echo $(seqtk 2>&1) | sed -n 's/.*\(Version: [^ ]*\).*/\1/p') > /var/software_versions.txt
|
||||
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
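The `setup` step in this config extracts the version string from the usage text that `seqtk` prints on stderr. The same `sed` expression can be tried on sample text; `extract_version` is a hypothetical helper, and the banner below is made-up stand-in text rather than real `seqtk` output.

```shell
#!/bin/bash
# Hypothetical helper: pull the "Version: ..." token out of a usage banner
# read from stdin. Newlines are flattened to spaces first, which is what the
# unquoted echo in the setup command effectively does.
extract_version() {
  tr '\n' ' ' | sed -n 's/.*\(Version: [^ ]*\).*/\1/p'
}

# Made-up banner text standing in for the output of `seqtk 2>&1`
banner='Usage:   seqtk <command> <arguments>
Version: 0.0-example

Command: seq       common transformation of FASTA/Q'

printf '%s\n' "$banner" | extract_version
```

The `[^ ]*` part stops at the first space after the version token, so anything following it in the flattened text is dropped.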
|
||||
9
src/seqtk/seqtk_subseq/help.txt
Normal file
@@ -0,0 +1,9 @@
|
||||
```bash
|
||||
seqtk subseq
|
||||
```
|
||||
Usage: seqtk subseq [options] <in.fa> <in.bed>|<name.list>
|
||||
Options:
|
||||
-t TAB delimited output
|
||||
-s strand aware
|
||||
-l INT sequence line length [0]
|
||||
Note: Use 'samtools faidx' if only a few regions are intended.
|
||||
15
src/seqtk/seqtk_subseq/script.sh
Normal file
@@ -0,0 +1,15 @@
|
||||
#!/bin/bash
|
||||
|
||||
## VIASH START
|
||||
## VIASH END
|
||||
|
||||
[[ "$par_tab" == "false" ]] && unset par_tab
|
||||
[[ "$par_strand_aware" == "false" ]] && unset par_strand_aware
|
||||
|
||||
seqtk subseq \
|
||||
${par_tab:+-t} \
|
||||
${par_strand_aware:+-s} \
|
||||
${par_sequence_line_length:+-l "$par_sequence_line_length"} \
|
||||
"$par_input" \
|
||||
"$par_name_list" \
|
||||
> "$par_output"
|
||||
182
src/seqtk/seqtk_subseq/test.sh
Normal file
@@ -0,0 +1,182 @@
|
||||
#!/bin/bash
|
||||
|
||||
# exit on error
|
||||
set -e
|
||||
|
||||
## VIASH START
|
||||
meta_executable="target/executable/seqtk/seqtk_subseq"
|
||||
meta_resources_dir="src/seqtk"
|
||||
## VIASH END
|
||||
|
||||
# Create directories for tests
|
||||
echo "Creating Test Data..."
|
||||
mkdir test_data
|
||||
|
||||
# Create and populate input.fasta
|
||||
cat > "test_data/input.fasta" <<EOL
|
||||
>KU562861.1
|
||||
GGAGCAGGAGAGTGTTCGAGTTCAGAGATGTCCATGGCGCCGTACGAGAAGGTGATGGATGACCTGGCCA
|
||||
AGGGGCAGCAGTTCGCGACGCAGCTGCAGGGCCTCCTCCGGGACTCCCCCAAGGCCGGCCACATCATGGA
|
||||
>GU056837.1
|
||||
CTAATTTTATTTTTTTATAATAATTATTGGAGGAACTAAAACATTAATGAAATAATAATTATCATAATTA
|
||||
TTAATTACATATTTATTAGGTATAATATTTAAGGAAAAATATATTTTATGTTAATTGTAATAATTAGAAC
|
||||
>CP097510.1
|
||||
CGATTTAGATCGGTGTAGTCAACACACATCCTCCACTTCCATTAGGCTTCTTGACGAGGACTACATTGAC
|
||||
AGCCACCGAGGGAACCGACCTCCTCAATGAAGTCAGACGCCAAGAGCCTATCAACTTCCTTCTGCACAGC
|
||||
>JAMFTS010000002.1
|
||||
CCTAAACCCTAAACCCTAAACCCCCTACAAACCTTACCCTAAACCCTAAACCCTAAACCCTAAACCCTAA
|
||||
ACCCGAAACCCTATACCCTAAACCCTAAACCCTAAACCCTAAACCCTAACCCAAACCTAATCCCTAAACC
|
||||
>MH150936.1
|
||||
TAGAAGCTAATGAAAACTTTTCCTTTACTAAAAACCGTCAAACACGGTAAGAAACGCTTTTAATCATTTC
|
||||
AAAAGCAATCCCAATAGTGGTTACATCCAAACAAAACCCATTTCTTATATTTTCTCAAAAACAGTGAGAG
|
||||
EOL
|
||||
|
||||
# Create and populate id.list
|
||||
cat > "test_data/id.list" <<EOL
|
||||
KU562861.1
|
||||
MH150936.1
|
||||
EOL
|
||||
|
||||
# Create and populate reg.bed
|
||||
cat > "test_data/reg.bed" <<EOL
|
||||
KU562861.1$(echo -e "\t")10$(echo -e "\t")20$(echo -e "\t")region$(echo -e "\t")0$(echo -e "\t")+$(echo -e "\n")
|
||||
MH150936.1$(echo -e "\t")10$(echo -e "\t")20$(echo -e "\t")region$(echo -e "\t")0$(echo -e "\t")-
|
||||
EOL
|
||||
|
||||
#########################################################################################
|
||||
# Run basic test
|
||||
mkdir test1
|
||||
cd test1
|
||||
|
||||
echo "> Run seqtk_subseq on FASTA/Q file"
|
||||
"$meta_executable" \
|
||||
--input "../test_data/input.fasta" \
|
||||
--name_list "../test_data/id.list" \
|
||||
--output "sub_sample.fq"
|
||||
|
||||
expected_output_basic=">KU562861.1
|
||||
GGAGCAGGAGAGTGTTCGAGTTCAGAGATGTCCATGGCGCCGTACGAGAAGGTGATGGATGACCTGGCCAAGGGGCAGCAGTTCGCGACGCAGCTGCAGGGCCTCCTCCGGGACTCCCCCAAGGCCGGCCACATCATGGA
|
||||
>MH150936.1
|
||||
TAGAAGCTAATGAAAACTTTTCCTTTACTAAAAACCGTCAAACACGGTAAGAAACGCTTTTAATCATTTCAAAAGCAATCCCAATAGTGGTTACATCCAAACAAAACCCATTTCTTATATTTTCTCAAAAACAGTGAGAG"
|
||||
output_basic=$(cat sub_sample.fq)
|
||||
|
||||
if [ "$output_basic" != "$expected_output_basic" ]; then
|
||||
echo "Test failed"
|
||||
echo "Expected:"
|
||||
echo "$expected_output_basic"
|
||||
echo "Got:"
|
||||
echo "$output_basic"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
#########################################################################################
|
||||
# Run reg.bed as name list input test
|
||||
cd ..
|
||||
mkdir test2
|
||||
cd test2
|
||||
|
||||
echo "> Run seqtk_subseq on FASTA/Q file with BED file as name list"
|
||||
"$meta_executable" \
|
||||
--input "../test_data/input.fasta" \
|
||||
--name_list "../test_data/reg.bed" \
|
||||
--output "sub_sample.fq"
|
||||
|
||||
expected_output_basic=">KU562861.1:11-20
|
||||
AGTGTTCGAG
|
||||
>MH150936.1:11-20
|
||||
TGAAAACTTT"
|
||||
output_basic=$(cat sub_sample.fq)
|
||||
|
||||
if [ "$output_basic" != "$expected_output_basic" ]; then
|
||||
echo "Test failed"
|
||||
echo "Expected:"
|
||||
echo "$expected_output_basic"
|
||||
echo "Got:"
|
||||
echo "$output_basic"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
#########################################################################################
|
||||
# Run tab option output test
|
||||
cd ..
|
||||
mkdir test3
|
||||
cd test3
|
||||
|
||||
echo "> Run seqtk_subseq with TAB option"
|
||||
"$meta_executable" \
|
||||
--tab \
|
||||
--input "../test_data/input.fasta" \
|
||||
--name_list "../test_data/reg.bed" \
|
||||
--output "sub_sample.fq"
|
||||
|
||||
expected_output_tabular=$'KU562861.1\t11\tAGTGTTCGAG\nMH150936.1\t11\tTGAAAACTTT'
|
||||
output_tabular=$(cat sub_sample.fq)
|
||||
|
||||
if [ "$output_tabular" != "$expected_output_tabular" ]; then
|
||||
echo "Test failed"
|
||||
echo "Expected:"
|
||||
echo "$expected_output_tabular"
|
||||
echo "Got:"
|
||||
echo "$output_tabular"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
#########################################################################################
|
||||
# Run line option output test
|
||||
cd ..
|
||||
mkdir test4
|
||||
cd test4
|
||||
|
||||
echo "> Run seqtk_subseq with line length option"
|
||||
"$meta_executable" \
|
||||
--sequence_line_length 5 \
|
||||
--input "../test_data/input.fasta" \
|
||||
--name_list "../test_data/reg.bed" \
|
||||
--output "sub_sample.fq"
|
||||
|
||||
expected_output_wrapped=">KU562861.1:11-20
|
||||
AGTGT
|
||||
TCGAG
|
||||
>MH150936.1:11-20
|
||||
TGAAA
|
||||
ACTTT"
|
||||
output_wrapped=$(cat sub_sample.fq)
|
||||
|
||||
if [ "$output_wrapped" != "$expected_output_wrapped" ]; then
|
||||
echo "Test failed"
|
||||
echo "Expected:"
|
||||
echo "$expected_output_wrapped"
|
||||
echo "Got:"
|
||||
echo "$output_wrapped"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
#########################################################################################
|
||||
# Run Strand Aware option output test
|
||||
cd ..
|
||||
mkdir test5
|
||||
cd test5
|
||||
|
||||
echo "> Run seqtk_subseq with strand aware option"
|
||||
"$meta_executable" \
|
||||
--strand_aware \
|
||||
--input "../test_data/input.fasta" \
|
||||
--name_list "../test_data/reg.bed" \
|
||||
--output "sub_sample.fq"
|
||||
|
||||
expected_output_wrapped=">KU562861.1:11-20
|
||||
AGTGTTCGAG
|
||||
>MH150936.1:11-20
|
||||
AAAGTTTTCA"
|
||||
output_wrapped=$(cat sub_sample.fq)
|
||||
|
||||
if [ "$output_wrapped" != "$expected_output_wrapped" ]; then
|
||||
echo "Test failed"
|
||||
echo "Expected:"
|
||||
echo "$expected_output_wrapped"
|
||||
echo "Got:"
|
||||
echo "$output_wrapped"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "All tests succeeded!"
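The expected outputs in the BED-based tests follow from BED's 0-based, half-open intervals: the region `10 20` covers bases 11 to 20, and with the strand-aware option a `-` record is reverse-complemented. A sketch that reproduces those expected strings in plain bash; `bed_slice` is a hypothetical helper and says nothing about how `seqtk` is implemented internally.

```shell
#!/bin/bash
# Hypothetical helper mimicking the expected outputs of the tests above.
# BED intervals are 0-based and half-open: [start, end).
bed_slice() {
  local seq="$1" start="$2" end="$3" strand="${4:-+}"
  local sub="${seq:start:end-start}"   # bash offsets are 0-based as well

  if [ "$strand" = "-" ]; then
    # reverse the string, then complement each base
    local rc="" i
    for (( i=${#sub}-1; i>=0; i-- )); do rc+="${sub:i:1}"; done
    sub=$(printf '%s' "$rc" | tr 'ACGTacgt' 'TGCAtgca')
  fi
  printf '%s\n' "$sub"
}

# First 20 bases of the two sequences named in reg.bed
bed_slice GGAGCAGGAGAGTGTTCGAG 10 20
bed_slice TAGAAGCTAATGAAAACTTT 10 20 -
```

The first call yields the forward-strand slice `AGTGTTCGAG`; the second yields `AAAGTTTTCA`, the reverse complement of `TGAAAACTTT`.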
|
||||
BIN
src/seqtk/test_data/reads/a.1.fastq.gz
Normal file
Binary file not shown.
BIN
src/seqtk/test_data/reads/a.2.fastq.gz
Normal file
Binary file not shown.
4
src/seqtk/test_data/reads/a.fastq
Normal file
@@ -0,0 +1,4 @@
|
||||
@1
|
||||
ACGGCAT
|
||||
+
|
||||
!!!!!!!
|
||||
BIN
src/seqtk/test_data/reads/a.fastq.gz
Normal file
Binary file not shown.
1
src/seqtk/test_data/reads/id.list
Normal file
@@ -0,0 +1 @@
|
||||
1
|
||||
9
src/seqtk/test_data/script.sh
Executable file
@@ -0,0 +1,9 @@
|
||||
# clone repo
|
||||
if [ ! -d /tmp/snakemake-wrappers ]; then
|
||||
git clone --depth 1 --single-branch --branch master https://github.com/snakemake/snakemake-wrappers /tmp/snakemake-wrappers
|
||||
fi
|
||||
|
||||
# copy test data
|
||||
cp -r /tmp/snakemake-wrappers/bio/seqtk/test/* src/seqtk/test_data
|
||||
|
||||
rm src/seqtk/test_data/Snakefile
|
||||
File diff suppressed because it is too large
@@ -11,6 +11,11 @@ references:
|
||||
license: MIT
|
||||
requirements:
|
||||
commands: [ STAR, python, ps, zcat, bzcat ]
|
||||
authors:
|
||||
- __merge__: /src/_authors/angela_o_pisco.yaml
|
||||
roles: [ author ]
|
||||
- __merge__: /src/_authors/robrecht_cannoodt.yaml
|
||||
roles: [ author, maintainer ]
|
||||
# manually taking care of the main input and output arguments
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
@@ -113,6 +118,8 @@ engines:
|
||||
rm -rf /tmp/STAR-${STAR_VERSION} /tmp/${STAR_VERSION}.zip && \
|
||||
apt-get --purge autoremove -y ${PACKAGES} && \
|
||||
apt-get clean
|
||||
- type: python
|
||||
packages: [ pyyaml ]
|
||||
- type: docker
|
||||
run: |
|
||||
STAR --version | sed 's#\(.*\)#star: "\1"#' > /var/software_versions.txt
|
||||
|
||||
@@ -2,6 +2,7 @@ import tempfile
|
||||
import subprocess
|
||||
import shutil
|
||||
from pathlib import Path
|
||||
import yaml
|
||||
|
||||
## VIASH START
|
||||
par = {
|
||||
@@ -18,10 +19,20 @@ par = {
|
||||
}
|
||||
meta = {
|
||||
"cpus": 8,
|
||||
"temp_dir": "/tmp"
|
||||
"temp_dir": "/tmp",
|
||||
"config": "target/executable/star/star_align_reads/.config.vsh.yaml",
|
||||
}
|
||||
## VIASH END
|
||||
|
||||
# read config
|
||||
with open(meta["config"], 'r') as stream:
|
||||
config = yaml.safe_load(stream)
|
||||
all_arguments = {
|
||||
arg["name"].lstrip('-'): arg
|
||||
for argument_group in config["argument_groups"]
|
||||
for arg in argument_group["arguments"]
|
||||
}
|
||||
|
||||
##################################################
|
||||
# check and process SE / PE R1 input files
|
||||
input_r1 = par["input"]
|
||||
@@ -87,8 +98,13 @@ with tempfile.TemporaryDirectory(prefix="star-", dir=meta["temp_dir"], ignore_cl
|
||||
cmd_args = [ "STAR" ]
|
||||
for name, value in par.items():
|
||||
if value is not None:
|
||||
if name in all_arguments:
|
||||
arg_info = all_arguments[name].get("info", {})
|
||||
cli_name = arg_info.get("orig_name", f"--{name}")
|
||||
else:
|
||||
cli_name = f"--{name}"
|
||||
val_to_add = value if isinstance(value, list) else [value]
|
||||
cmd_args.extend([f"--{name}"] + [str(x) for x in val_to_add])
|
||||
cmd_args.extend([cli_name] + [str(x) for x in val_to_add])
|
||||
print("", flush=True)
|
||||
|
||||
# run command
|
||||
|
||||
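The name lookup in the loop above can be tried in isolation. The following is a minimal sketch, not the component's actual script: `build_star_args`, `example_config` and `example_par` are hypothetical names, and the config is an inline dictionary standing in for the parsed `.config.vsh.yaml`.

```python
def build_star_args(par, config):
    """Translate snake_case parameters back to STAR's original flag names."""
    # Index every argument by its name without the leading dashes.
    all_arguments = {
        arg["name"].lstrip("-"): arg
        for group in config["argument_groups"]
        for arg in group["arguments"]
    }
    cmd_args = ["STAR"]
    for name, value in par.items():
        if value is None:
            continue  # unset parameters are not passed to STAR
        info = all_arguments.get(name, {}).get("info") or {}
        # Fall back to the parameter name when no original name is recorded.
        cli_name = info.get("orig_name", f"--{name}")
        values = value if isinstance(value, list) else [value]
        cmd_args.extend([cli_name] + [str(v) for v in values])
    return cmd_args


# Hypothetical config fragment, shaped like the generated argument groups.
example_config = {
    "argument_groups": [
        {
            "name": "Example group",
            "arguments": [
                {"name": "--genome_dir", "info": {"orig_name": "--genomeDir"}},
                {"name": "--quant_mode", "info": {"orig_name": "--quantMode"}},
                {"name": "--out_sj_type", "info": {"orig_name": "--outSJtype"}},
            ],
        }
    ]
}

# Hypothetical parameter values; None means "not provided".
example_par = {
    "genome_dir": "index/",
    "quant_mode": ["TranscriptomeSAM", "GeneCounts"],
    "out_sj_type": None,
    "runThreadN": 8,
}

print(build_star_args(example_par, example_config))
```

Parameters without a recorded `orig_name` (here `runThreadN`) are passed through unchanged, which matches the `else` branch in the hunk.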
@@ -7,35 +7,34 @@ meta_executable="target/docker/star/star_align_reads/star_align_reads"
|
||||
meta_resources_dir="src/star/star_align_reads"
|
||||
## VIASH END
|
||||
|
||||
#########################################################################################
|
||||
|
||||
#############################################
|
||||
# helper functions
|
||||
assert_file_exists() {
|
||||
[ -f "$1" ] || (echo "File '$1' does not exist" && exit 1)
|
||||
[ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
|
||||
}
|
||||
assert_file_doesnt_exist() {
|
||||
[ ! -f "$1" ] || (echo "File '$1' exists but shouldn't" && exit 1)
|
||||
[ ! -f "$1" ] || { echo "File '$1' exists but shouldn't" && exit 1; }
|
||||
}
|
||||
assert_file_empty() {
|
||||
[ ! -s "$1" ] || (echo "File '$1' is not empty but should be" && exit 1)
|
||||
[ ! -s "$1" ] || { echo "File '$1' is not empty but should be" && exit 1; }
|
||||
}
|
||||
assert_file_not_empty() {
|
||||
[ -s "$1" ] || (echo "File '$1' is empty but shouldn't be" && exit 1)
|
||||
[ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
|
||||
}
|
||||
assert_file_contains() {
|
||||
grep -q "$2" "$1" || (echo "File '$1' does not contain '$2'" && exit 1)
|
||||
grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
|
||||
}
|
||||
assert_file_not_contains() {
|
||||
grep -q "$2" "$1" && (echo "File '$1' contains '$2' but shouldn't" && exit 1)
|
||||
grep -q "$2" "$1" && { echo "File '$1' contains '$2' but shouldn't" && exit 1; }
|
||||
}
|
||||
assert_file_contains_regex() {
|
||||
grep -q -E "$2" "$1" || (echo "File '$1' does not contain '$2'" && exit 1)
|
||||
grep -q -E "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
|
||||
}
|
||||
assert_file_not_contains_regex() {
|
||||
grep -q -E "$2" "$1" && (echo "File '$1' contains '$2' but shouldn't" && exit 1)
|
||||
grep -q -E "$2" "$1" && { echo "File '$1' contains '$2' but shouldn't" && exit 1; }
|
||||
}
|
||||
#############################################
|
||||
|
||||
#########################################################################################
|
||||
echo "> Prepare test data"
|
||||
|
||||
cat > reads_R1.fastq <<'EOF'
|
||||
@@ -89,14 +88,14 @@ cd star_align_reads_se
|
||||
echo "> Run star_align_reads on SE"
|
||||
"$meta_executable" \
|
||||
--input "../reads_R1.fastq" \
|
||||
--genomeDir "../index/" \
|
||||
--genome_dir "../index/" \
|
||||
--aligned_reads "output.sam" \
|
||||
--log "log.txt" \
|
||||
--outReadsUnmapped "Fastx" \
|
||||
--out_reads_unmapped "Fastx" \
|
||||
--unmapped "unmapped.sam" \
|
||||
--quantMode "TranscriptomeSAM;GeneCounts" \
|
||||
--quant_mode "TranscriptomeSAM;GeneCounts" \
|
||||
--reads_per_gene "reads_per_gene.tsv" \
|
||||
--outSJtype Standard \
|
||||
--out_sj_type Standard \
|
||||
--splice_junctions "splice_junctions.tsv" \
|
||||
--reads_aligned_to_transcriptome "transcriptome_aligned.bam" \
|
||||
${meta_cpus:+---cpus $meta_cpus}
|
||||
@@ -144,10 +143,10 @@ echo ">> Run star_align_reads on PE"
|
||||
"$meta_executable" \
|
||||
--input ../reads_R1.fastq \
|
||||
--input_r2 ../reads_R2.fastq \
|
||||
--genomeDir ../index/ \
|
||||
--genome_dir ../index/ \
|
||||
--aligned_reads output.bam \
|
||||
--log log.txt \
|
||||
--outReadsUnmapped Fastx \
|
||||
--out_reads_unmapped Fastx \
|
||||
--unmapped unmapped_r1.bam \
|
||||
--unmapped_r2 unmapped_r2.bam \
|
||||
${meta_cpus:+---cpus $meta_cpus}
|
||||
|
||||
@@ -14,6 +14,14 @@ param_txt <- iconv(param_txt, "UTF-8", "ASCII//TRANSLIT")
|
||||
dev_begin <- grep("#####UnderDevelopment_begin", param_txt)
|
||||
dev_end <- grep("#####UnderDevelopment_end", param_txt)
|
||||
|
||||
camel_case_to_snake_case <- function(x) {
|
||||
x %>%
|
||||
str_replace_all("([A-Z][A-Z][A-Z]*)", "_\\1_") %>%
|
||||
str_replace_all("([a-z])([A-Z])", "\\1_\\2") %>%
|
||||
str_to_lower() %>%
|
||||
str_replace_all("_$", "")
|
||||
}
|
||||
|
||||
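For readers who do not use R, the renaming rule in `camel_case_to_snake_case` can be sketched in Python. This is a rough port under the assumption that the two regular expressions behave the same in both engines for plain ASCII argument names; it is not part of the repository.

```python
import re


def camel_case_to_snake_case(name: str) -> str:
    """Convert a STAR argument name such as 'sjdbGTFfile' to snake_case."""
    # Surround runs of two or more capitals (GTF, SA, RAM) with underscores.
    name = re.sub(r"([A-Z][A-Z][A-Z]*)", r"_\1_", name)
    # Insert an underscore between a lowercase letter and a following capital.
    name = re.sub(r"([a-z])([A-Z])", r"\1_\2", name)
    name = name.lower()
    # Drop the trailing underscore left behind by a final run of capitals.
    return re.sub(r"_$", "", name)


for original in ["genomeDir", "outSJtype", "sjdbGTFfile", "limitGenomeGenerateRAM"]:
    print(original, "->", camel_case_to_snake_case(original))
```

Like the R version, this does not strip a leading underscore, so a name that starts with a run of capitals would come out with a leading `_`.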
# strip development sections
|
||||
nondev_ix <- unlist(map2(c(1, dev_end + 1), c(dev_begin - 1, length(param_txt)), function(i, j) {
|
||||
if (i >= 1 && i < j) {
|
||||
@@ -128,9 +136,8 @@ out2 <- out %>%
|
||||
# remove arguments that are related to a different runmode
|
||||
filter(!grepl("--runMode", description) | grepl("--runMode alignReads", description)) %>%
|
||||
filter(!grepl("--runMode", group_name) | grepl("--runMode alignReads", group_name)) %>%
|
||||
filter(!grepl("STARsolo", group_name)) %>%
|
||||
mutate(
|
||||
viash_arg = paste0("--", name),
|
||||
viash_arg = paste0("--", camel_case_to_snake_case(name)),
|
||||
type_step1 = type %>%
|
||||
str_replace_all(".*(int, string|string|int|real|double)\\(?(s?).*", "\\1\\2"),
|
||||
viash_type = type_map[gsub("(int, string|string|int|real|double).*", "\\1", type_step1)],
|
||||
@@ -155,28 +162,41 @@ out2 <- out %>%
|
||||
group_name = gsub(" - .*", "", group_name),
|
||||
required = ifelse(name %in% required_args, TRUE, NA)
|
||||
)
|
||||
print(out2, n = 200)
|
||||
out2 %>% mutate(i = row_number()) %>%
|
||||
# filter(is.na(default_step1) != is.na(viash_default)) %>%
|
||||
|
||||
# change references to argument names
|
||||
out3 <- out2
|
||||
for (i in seq_len(nrow(out2))) {
|
||||
orig_name <- paste0("--", out2$name[[i]])
|
||||
new_name <- out2$viash_arg[[i]]
|
||||
out3$description <- str_replace_all(out3$description, orig_name, new_name)
|
||||
}
|
||||
|
||||
# sanity checks
|
||||
out3 %>% select(name, viash_arg) %>% as.data.frame()
|
||||
print(out3, n = 200)
|
||||
out3 %>%
|
||||
mutate(i = row_number()) %>%
|
||||
select(-group_name, -description)
|
||||
out3 %>% filter(!grepl("--runMode", description) | grepl("--runMode alignReads", description))
|
||||
|
||||
out2 %>% filter(!grepl("--runMode", description) | grepl("--runMode alignReads", description))
|
||||
|
||||
argument_groups <- map(unique(out2$group_name), function(group_name) {
|
||||
args <- out2 %>%
|
||||
# create argument groups
|
||||
argument_groups <- map(unique(out3$group_name), function(group_name) {
|
||||
args <- out3 %>%
|
||||
filter(group_name == !!group_name) %>%
|
||||
pmap(function(viash_arg, viash_type, multiple, viash_default, description, required, ...) {
|
||||
li <- lst(
|
||||
pmap(function(viash_arg, viash_type, multiple, viash_default, description, required, name, ...) {
|
||||
li <- list(
|
||||
name = viash_arg,
|
||||
type = viash_type,
|
||||
description = description
|
||||
description = description,
|
||||
info = list(
|
||||
orig_name = paste0("--", name)
|
||||
)
|
||||
)
|
||||
if (all(!is.na(viash_default))) {
|
||||
li$example <- viash_default
|
||||
}
|
||||
if (!is.na(multiple) && multiple) {
|
||||
li$multiple <- multiple
|
||||
li$multiple_sep <- ";"
|
||||
}
|
||||
if (!is.na(required) && required) {
|
||||
li$required <- required
|
||||
@@ -186,4 +206,10 @@ argument_groups <- map(unique(out2$group_name), function(group_name) {
|
||||
list(name = group_name, arguments = args)
|
||||
})
|
||||
|
||||
yaml::write_yaml(list(argument_groups = argument_groups), yaml_file)
|
||||
yaml::write_yaml(
|
||||
list(argument_groups = argument_groups),
|
||||
yaml_file,
|
||||
handlers = list(
|
||||
logical = yaml::verbatim_logical
|
||||
)
|
||||
)
|
||||
|
||||
@@ -11,75 +11,74 @@ references:
|
||||
license: MIT
|
||||
requirements:
|
||||
commands: [ STAR ]
|
||||
|
||||
authors:
|
||||
- __merge__: /src/_authors/sai_nirmayi_yasa.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: "Input"
|
||||
arguments:
|
||||
- name: "--genomeFastaFiles"
|
||||
- name: "--genome_fasta_files"
|
||||
type: file
|
||||
description: |
|
||||
Path(s) to the fasta files with the genome sequences, separated by spaces. These files should be plain text FASTA files, they *cannot* be zipped.
|
||||
required: true
|
||||
multiple: yes
|
||||
multiple_sep: ;
|
||||
- name: "--sjdbGTFfile"
|
||||
multiple: true
|
||||
- name: "--sjdb_gtf_file"
|
||||
type: file
|
||||
description: Path to the GTF file with annotations
|
||||
- name: --sjdbOverhang
|
||||
- name: --sjdb_overhang
|
||||
type: integer
|
||||
description: Length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1)
|
||||
example: 100
|
||||
- name: --sjdbGTFchrPrefix
|
||||
- name: --sjdb_gtf_chr_prefix
|
||||
type: string
|
||||
description: Prefix for chromosome names in a GTF file (e.g. 'chr' for using Ensembl annotations with UCSC genomes)
|
||||
- name: --sjdbGTFfeatureExon
|
||||
- name: --sjdb_gtf_feature_exon
|
||||
type: string
|
||||
description: Feature type in GTF file to be used as exons for building transcripts
|
||||
example: exon
|
||||
- name: --sjdbGTFtagExonParentTranscript
|
||||
- name: --sjdb_gtf_tag_exon_parent_transcript
|
||||
type: string
|
||||
description: GTF attribute name for parent transcript ID (default "transcript_id" works for GTF files)
|
||||
example: transcript_id
|
||||
- name: --sjdbGTFtagExonParentGene
|
||||
- name: --sjdb_gtf_tag_exon_parent_gene
|
||||
type: string
|
||||
description: GTF attribute name for parent gene ID (default "gene_id" works for GTF files)
|
||||
example: gene_id
|
||||
- name: --sjdbGTFtagExonParentGeneName
|
||||
- name: --sjdb_gtf_tag_exon_parent_gene_name
|
||||
type: string
|
||||
description: GTF attribute name for parent gene name
|
||||
example: gene_name
|
||||
multiple: yes
|
||||
multiple_sep: ;
|
||||
- name: --sjdbGTFtagExonParentGeneType
|
||||
multiple: true
|
||||
- name: --sjdb_gtf_tag_exon_parent_gene_type
|
||||
type: string
|
||||
description: GTF attribute name for parent gene type
|
||||
example:
|
||||
- gene_type
|
||||
- gene_biotype
|
||||
multiple: yes
|
||||
multiple_sep: ;
|
||||
- name: --limitGenomeGenerateRAM
|
||||
multiple: true
|
||||
- name: --limit_genome_generate_ram
|
||||
type: long
|
||||
description: Maximum available RAM (bytes) for genome generation
|
||||
example: '31000000000'
|
||||
- name: --genomeSAindexNbases
|
||||
example: 31000000000
|
||||
- name: --genome_sa_index_nbases
|
||||
type: integer
|
||||
description: Length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. For small genomes, this parameter must be scaled down to min(14, log2(GenomeLength)/2 - 1).
|
||||
example: 14
|
||||
- name: --genomeChrBinNbits
|
||||
- name: --genome_chr_bin_nbits
|
||||
type: integer
|
||||
description: Defined as log2(chrBin), where chrBin is the size of the bins for genome storage. Each chromosome will occupy an integer number of bins. For a genome with large number of contigs, it is recommended to scale this parameter as min(18, log2[max(GenomeLength/NumberOfReferences,ReadLength)]).
|
||||
example: 18
|
||||
- name: --genomeSAsparseD
|
||||
- name: --genome_sa_sparse_d
|
||||
type: integer
|
||||
min: 0
|
||||
example: 1
|
||||
description: Suffix array sparsity, i.e. distance between indices. Use bigger numbers to decrease needed RAM at the cost of mapping speed reduction.
|
||||
- name: --genomeSuffixLengthMax
|
||||
- name: --genome_suffix_length_max
|
||||
type: integer
|
||||
description: Maximum length of the suffixes, has to be longer than read length. Use -1 for infinite length.
|
||||
example: -1
|
||||
- name: --genomeTransformType
|
||||
- name: --genome_transform_type
|
||||
type: string
|
||||
description: |
|
||||
Type of genome transformation
|
||||
@@ -87,7 +86,7 @@ argument_groups:
|
||||
Haploid ... replace reference alleles with alternative alleles from VCF file (e.g. consensus allele)
|
||||
Diploid ... create two haplotypes for each chromosome listed in VCF file, for genotypes 1|2, assumes perfect phasing (e.g. personal genome)
|
||||
example: None
|
||||
- name: --genomeTransformVCF
|
||||
- name: --genome_transform_vcf
|
||||
type: file
|
||||
description: path to VCF file for genome transformation
|
||||
|
||||
|
||||
@@ -10,20 +10,20 @@ mkdir -p $par_index
|
||||
STAR \
|
||||
--runMode genomeGenerate \
|
||||
--genomeDir $par_index \
|
||||
--genomeFastaFiles $par_genomeFastaFiles \
|
||||
--genomeFastaFiles $par_genome_fasta_files \
|
||||
${meta_cpus:+--runThreadN "${meta_cpus}"} \
|
||||
${par_sjdbGTFfile:+--sjdbGTFfile "${par_sjdbGTFfile}"} \
|
||||
${par_sjdb_gtf_file:+--sjdbGTFfile "${par_sjdb_gtf_file}"} \
|
||||
${par_sjdb_overhang:+--sjdbOverhang "${par_sjdb_overhang}"} \
|
||||
${par_genomeSAindexNbases:+--genomeSAindexNbases "${par_genomeSAindexNbases}"} \
|
||||
${par_sjdbGTFchrPrefix:+--sjdbGTFchrPrefix "${par_sjdbGTFchrPrefix}"} \
|
||||
${par_sjdbGTFfeatureExon:+--sjdbGTFfeatureExon "${par_sjdbGTFfeatureExon}"} \
|
||||
${par_sjdbGTFtagExonParentTranscript:+--sjdbGTFtagExonParentTranscript "${par_sjdbGTFtagExonParentTranscript}"} \
|
||||
${par_sjdbGTFtagExonParentGene:+--sjdbGTFtagExonParentGene "${par_sjdbGTFtagExonParentGene}"} \
|
||||
${par_sjdbGTFtagExonParentGeneName:+--sjdbGTFtagExonParentGeneName "${par_sjdbGTFtagExonParentGeneName}"} \
|
||||
${par_sjdbGTFtagExonParentGeneType:+--sjdbGTFtagExonParentGeneType "${sjdbGTFtagExonParentGeneType}"} \
|
||||
${par_limitGenomeGenerateRAM:+--limitGenomeGenerateRAM "${par_limitGenomeGenerateRAM}"} \
|
||||
${par_genomeChrBinNbits:+--genomeChrBinNbits "${par_genomeChrBinNbits}"} \
|
||||
${par_genomeSAsparseD:+--genomeSAsparseD "${par_genomeSAsparseD}"} \
|
||||
${par_genomeSuffixLengthMax:+--genomeSuffixLengthMax "${par_genomeSuffixLengthMax}"} \
|
||||
${par_genomeTransformType:+--genomeTransformType "${par_genomeTransformType}"} \
|
||||
${par_genomeTransformVCF:+--genomeTransformVCF "${par_genomeTransformVCF}"} \
|
||||
${par_genome_sa_index_nbases:+--genomeSAindexNbases "${par_genome_sa_index_nbases}"} \
|
||||
${par_sjdb_gtf_chr_prefix:+--sjdbGTFchrPrefix "${par_sjdb_gtf_chr_prefix}"} \
|
||||
${par_sjdb_gtf_feature_exon:+--sjdbGTFfeatureExon "${par_sjdb_gtf_feature_exon}"} \
|
||||
${par_sjdb_gtf_tag_exon_parent_transcript:+--sjdbGTFtagExonParentTranscript "${par_sjdb_gtf_tag_exon_parent_transcript}"} \
|
||||
${par_sjdb_gtf_tag_exon_parent_gene:+--sjdbGTFtagExonParentGene "${par_sjdb_gtf_tag_exon_parent_gene}"} \
|
||||
${par_sjdb_gtf_tag_exon_parent_gene_name:+--sjdbGTFtagExonParentGeneName "${par_sjdb_gtf_tag_exon_parent_gene_name}"} \
|
||||
${par_sjdb_gtf_tag_exon_parent_gene_type:+--sjdbGTFtagExonParentGeneType "${par_sjdb_gtf_tag_exon_parent_gene_type}"} \
|
||||
${par_limit_genome_generate_ram:+--limitGenomeGenerateRAM "${par_limit_genome_generate_ram}"} \
|
||||
${par_genome_chr_bin_nbits:+--genomeChrBinNbits "${par_genome_chr_bin_nbits}"} \
|
||||
${par_genome_sa_sparse_d:+--genomeSAsparseD "${par_genome_sa_sparse_d}"} \
|
||||
${par_genome_suffix_length_max:+--genomeSuffixLengthMax "${par_genome_suffix_length_max}"} \
|
||||
${par_genome_transform_type:+--genomeTransformType "${par_genome_transform_type}"} \
|
||||
${par_genome_transform_vcf:+--genomeTransformVCF "${par_genome_transform_vcf}"} \
|
||||
|
||||
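The `${par_x:+--flag "${par_x}"}` pattern used above passes a flag only when the corresponding parameter is set. As an illustration of the same idea with an explicit lookup table, here is a minimal Python sketch. `STAR_FLAGS` and `genome_generate_cmd` are hypothetical names, and the table lists only a few of the arguments from the config.

```python
# Hypothetical table: snake_case component parameter -> original STAR flag.
STAR_FLAGS = {
    "genome_fasta_files": "--genomeFastaFiles",
    "sjdb_gtf_file": "--sjdbGTFfile",
    "sjdb_overhang": "--sjdbOverhang",
    "genome_sa_index_nbases": "--genomeSAindexNbases",
    "limit_genome_generate_ram": "--limitGenomeGenerateRAM",
}


def genome_generate_cmd(index, par, cpus=None):
    """Build a STAR genomeGenerate command, skipping unset parameters."""
    cmd = ["STAR", "--runMode", "genomeGenerate", "--genomeDir", str(index)]
    if cpus:
        cmd += ["--runThreadN", str(cpus)]
    for key, flag in STAR_FLAGS.items():
        value = par.get(key)
        # Mirror the shell ':+' expansion: unset or empty means "omit the flag".
        if value is None or value == "" or value == []:
            continue
        values = value if isinstance(value, list) else [value]
        cmd += [flag] + [str(v) for v in values]
    return cmd


print(genome_generate_cmd(
    "star_index/",
    {
        "genome_fasta_files": ["genome.fasta"],
        "sjdb_gtf_file": "genes.gtf",
        "genome_sa_index_nbases": 4,
        "sjdb_overhang": None,
    },
))
```

Keeping the parameter-to-flag mapping in one table means each name is written once, which makes a mismatch between the variable tested and the variable expanded less likely.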
@@ -27,9 +27,9 @@ echo "> Generate index"
|
||||
"$meta_executable" \
|
||||
${meta_cpus:+---cpus $meta_cpus} \
|
||||
--index "star_index/" \
|
||||
--genomeFastaFiles "genome.fasta" \
|
||||
--sjdbGTFfile "genes.gtf" \
|
||||
--genomeSAindexNbases 2
|
||||
--genome_fasta_files "genome.fasta" \
|
||||
--sjdb_gtf_file "genes.gtf" \
|
||||
--genome_sa_index_nbases 4
|
||||
|
||||
files=("Genome" "Log.out" "SA" "SAindex" "chrLength.txt" "chrName.txt" "chrNameLength.txt" "chrStart.txt" "exonGeTrInfo.tab" "exonInfo.tab" "geneInfo.tab" "genomeParameters.txt" "sjdbInfo.txt" "sjdbList.fromGTF.out.tab" "sjdbList.out.tab" "transcriptInfo.tab")
|
||||
|
||||
|
||||
@@ -10,7 +10,9 @@ links:
|
||||
references:
|
||||
doi: 10.1101/gr.209601.116
|
||||
license: MIT
|
||||
|
||||
authors:
|
||||
- __merge__: /src/_authors/emma_rousseau.yaml
|
||||
roles: [ author, maintainer ]
|
||||
argument_groups:
|
||||
- name: Inputs
|
||||
arguments:
|
||||
|
||||
197
src/umi_tools/umi_tools_extract/config.vsh.yaml
Normal file
@@ -0,0 +1,197 @@
|
||||
name: umi_tools_extract
|
||||
namespace: umi_tools
|
||||
description: |
|
||||
Flexible removal of UMI sequences from fastq reads.
|
||||
UMIs are removed and appended to the read name. Any other barcode, for example a library barcode,
|
||||
is left on the read. Can also filter reads by quality or against a whitelist.
|
||||
keywords: [ extract, umi-tools, umi, fastq ]
|
||||
links:
|
||||
homepage: https://umi-tools.readthedocs.io/en/latest/
|
||||
documentation: https://umi-tools.readthedocs.io/en/latest/reference/extract.html
|
||||
repository: https://github.com/CGATOxford/UMI-tools
|
||||
references:
|
||||
doi: 10.1101/gr.209601.116
|
||||
license: MIT
|
||||
|
||||
argument_groups:
|
||||
|
||||
- name: Input
|
||||
arguments:
|
||||
- name: --input
|
||||
type: file
|
||||
required: true
|
||||
description: File containing the input data.
|
||||
example: sample.fastq
|
||||
- name: --read2_in
|
||||
type: file
|
||||
required: false
|
||||
description: File containing the input data for the R2 reads (if paired). If provided, both --bc_pattern and --bc_pattern2 must also be provided.
|
||||
example: sample_R2.fastq
|
||||
- name: --bc_pattern
|
||||
alternatives: -p
|
||||
type: string
|
||||
description: |
|
||||
The UMI barcode pattern to use e.g. 'NNNNNN' indicates that the first 6 nucleotides
|
||||
of the read are from the UMI.
|
||||
- name: --bc_pattern2
|
||||
type: string
|
||||
description: The UMI barcode pattern to use for read 2.
|
||||
|
||||
- name: "Output"
|
||||
arguments:
|
||||
- name: --output
|
||||
type: file
|
||||
required: true
|
||||
description: Output file for read 1.
|
||||
direction: output
|
||||
- name: --read2_out
|
||||
type: file
|
||||
description: Output file for read 2.
|
||||
direction: output
|
||||
- name: --filtered_out
|
||||
type: file
|
||||
description: |
|
||||
Write out reads not matching regex pattern or cell barcode whitelist to this file.
|
||||
- name: --filtered_out2
|
||||
type: file
|
||||
description: |
|
||||
Write out read pairs not matching regex pattern or cell barcode whitelist to this file.
|
||||
|
||||
- name: Extract Options
|
||||
arguments:
|
||||
- name: --extract_method
|
||||
type: string
|
||||
choices: [string, regex]
|
||||
description: |
|
||||
Method used to extract the UMI (and optional cell) barcode. Default: `string`.
|
||||
example: "string"
|
||||
- name: --error_correct_cell
|
||||
type: boolean_true
|
||||
description: Error correct cell barcodes to the whitelist.
|
||||
- name: --whitelist
|
||||
type: file
|
||||
description: |
|
||||
Whitelist of accepted cell barcodes tab-separated format, where column 1 is the whitelisted
|
||||
cell barcodes and column 2 is the list (comma-separated) of other cell barcodes which should
|
||||
be corrected to the barcode in column 1. If the --error_correct_cell option is not used, this
|
||||
column will be ignored.
|
||||
- name: --blacklist
|
||||
type: file
|
||||
description: Blacklist of cell barcodes to discard.
|
||||
- name: --subset_reads
|
||||
type: integer
|
||||
description: Only parse the first N reads.
|
||||
- name: --quality_filter_threshold
|
||||
type: integer
|
||||
description: Remove reads where any UMI base quality score falls below this threshold.
|
||||
- name: --quality_filter_mask
|
||||
type: string
|
||||
description: |
|
||||
If a UMI base has a quality below this threshold, replace the base with 'N'.
|
||||
- name: --quality_encoding
|
||||
type: string
|
||||
choices: [phred33, phred64, solexa]
|
||||
description: |
|
||||
Quality score encoding. Choose from:
|
||||
* phred33 [33-77]
|
||||
* phred64 [64-106]
|
||||
* solexa [59-106]
|
||||
- name: --reconcile_pairs
|
||||
type: boolean_true
|
||||
description: |
|
||||
Allow read 2 infile to contain reads not in read 1 infile. This enables support for upstream protocols
|
||||
where read one contains cell barcodes, and the read pairs have been filtered and corrected without regard
|
||||
to the read2.
|
||||
- name: --three_prime
|
||||
alternatives: --3prime
|
||||
type: boolean_true
|
||||
description: |
|
||||
By default the barcode is assumed to be on the 5' end of the read, but use this option to specify that it is
|
||||
on the 3' end instead. This option only works with --extract_method=string since 3' encoding can be specified
|
||||
explicitly with a regex, e.g `.*(?P<umi_1>.{5})$`.
|
||||
- name: --ignore_read_pair_suffixes
|
||||
type: boolean_true
|
||||
description: |
|
||||
Ignore "/1" and "/2" read name suffixes. Note that this option is required if the suffixes are not whitespace
|
||||
separated from the rest of the read name.
|
||||
arguments:
|
||||
- name: --umi_separator
|
||||
type: string
|
||||
description: |
|
||||
The character that separates the UMI in the read name. Most likely a colon if you skipped the extraction with
|
||||
UMI-tools and used other software. Default: `_`
|
||||
example: "_"
|
||||
- name: --grouping_method
|
||||
type: string
|
||||
choices: [unique, percentile, cluster, adjacency, directional]
|
||||
description: |
|
||||
Method to use to determine read groups by subsuming those with similar UMIs. All methods start by identifying
|
||||
the reads with the same mapping position, but treat similar yet nonidentical UMIs differently. Default: `directional`
|
||||
example: "directional"
|
||||
- name: --umi_discard_read
|
||||
type: integer
|
||||
choices: [0, 1, 2]
|
||||
description: |
|
||||
After UMI barcode extraction discard either R1 or R2 by setting this parameter to 1 or 2, respectively. Default: `0`
|
||||
example: 0
|
||||
|
||||
- name: Common Options
|
||||
arguments:
|
||||
- name: --log
|
||||
type: file
|
||||
description: File with logging information.
|
||||
direction: output
|
||||
- name: --log2stderr
|
||||
type: boolean_true
|
||||
description: Send logging information to stderr.
|
||||
direction: output
|
||||
- name: --verbose
|
||||
type: integer
|
||||
description: Log level. The higher, the more output.
|
||||
- name: --error
|
||||
type: file
|
||||
description: File with error information.
|
||||
direction: output
|
||||
- name: --temp_dir
|
||||
type: string
|
||||
description: |
|
||||
Directory for temporary files. If not set, the bash environmental variable TMPDIR is used.
|
||||
- name: --compresslevel
|
||||
type: integer
|
||||
description: |
|
||||
Level of Gzip compression to use. Default=6 matches GNU gzip rather than python gzip default (which is 9).
|
||||
Default `6`.
|
||||
example: 6
|
||||
- name: --timeit
|
||||
type: file
|
||||
description: Store timing information in file.
|
||||
direction: output
|
||||
- name: --timeit_name
|
||||
type: string
|
||||
description: Name in timing file for this class of jobs.
|
||||
default: all
|
||||
- name: --timeit_header
|
||||
type: boolean_true
|
||||
description: Add header for timing information.
|
||||
- name: --random_seed
|
||||
type: integer
|
||||
description: Random seed to initialize number generator with.
|
||||
|
||||
resources:
|
||||
- type: bash_script
|
||||
path: script.sh
|
||||
test_resources:
|
||||
- type: bash_script
|
||||
path: test.sh
|
||||
- type: file
|
||||
path: test_data
|
||||
engines:
|
||||
- type: docker
|
||||
image: quay.io/biocontainers/umi_tools:1.1.4--py310h4b81fae_2
|
||||
setup:
|
||||
- type: docker
|
||||
run: |
|
||||
umi_tools -v | sed 's/ version//g' > /var/software_versions.txt
|
||||
runners:
|
||||
- type: executable
|
||||
- type: nextflow
|
||||
106
src/umi_tools/umi_tools_extract/help.txt
Normal file
@@ -0,0 +1,106 @@
|
||||
'''
|
||||
Generated from the following UMI-tools documentation:
|
||||
https://umi-tools.readthedocs.io/en/latest/common_options.html#common-options
|
||||
https://umi-tools.readthedocs.io/en/latest/reference/extract.html
|
||||
'''
|
||||
|
||||
extract - Extract UMI from fastq
|
||||
|
||||
Usage:
|
||||
|
||||
Single-end:
|
||||
umi_tools extract [OPTIONS] -p PATTERN [-I IN_FASTQ[.gz]] [-S OUT_FASTQ[.gz]]
|
||||
|
||||
Paired end:
|
||||
umi_tools extract [OPTIONS] -p PATTERN [-I IN_FASTQ[.gz]] [-S OUT_FASTQ[.gz]] --read2-in=IN2_FASTQ[.gz] --read2-out=OUT2_FASTQ[.gz]
|
||||
|
||||
note: If -I/-S are omitted standard in and standard out are used
|
||||
for input and output. To generate a valid BAM file on
|
||||
standard out, please redirect log with --log=LOGFILE or
|
||||
--log2stderr. Input/Output will be (de)compressed if a
|
||||
filename provided to -S/-I/--read2-in/--read2-out ends in .gz
|
||||
|
||||
Common UMI-tools Options:
|
||||
|
||||
-S, --stdout File where output is to go [default = stdout].
|
||||
-L, --log File with logging information [default = stdout].
|
||||
--log2stderr Send logging information to stderr [default = False].
|
||||
-v, --verbose Log level. The higher, the more output [default = 1].
|
||||
-E, --error File with error information [default = stderr].
|
||||
--temp-dir Directory for temporary files. If not set, the bash environmental variable TMPDIR is used [default = None].
|
||||
--compresslevel Level of Gzip compression to use. Default=6 matches GNU gzip rather than python gzip default (which is 9)
|
||||
|
||||
profiling and debugging options:
|
||||
--timeit Store timing information in file [default=none].
|
||||
--timeit-name Name in timing file for this class of jobs [default=all].
|
||||
--timeit-header Add header for timing information [default=none].
|
||||
--random-seed Random seed to initialize number generator with [default=none].
|
||||
|
||||
Extract Options:
|
||||
-I, --stdin File containing the input data [default = stdin].
|
||||
--error-correct-cell Error correct cell barcodes to the whitelist (see --whitelist)
|
||||
--whitelist Whitelist of accepted cell barcodes. The whitelist should be in the following format (tab-separated):
|
||||
AAAAAA AGAAAA
|
||||
AAAATC
|
||||
AAACAT
|
||||
AAACTA AAACTN,GAACTA
|
||||
AAATAC
|
||||
AAATCA GAATCA
|
||||
AAATGT AAAGGT,CAATGT
|
||||
Where column 1 is the whitelisted cell barcodes and column 2 is the list (comma-separated) of other cell
|
||||
barcodes which should be corrected to the barcode in column 1. If the --error-correct-cell option is not
|
||||
used, this column will be ignored. Any additional columns in the whitelist input, such as the counts columns
|
||||
from the output of umi_tools whitelist, will be ignored.
|
||||
--blacklist Blacklist of cell barcodes to discard
|
||||
--subset-reads=[N] Only parse the first N reads
|
||||
--quality-filter-threshold Remove reads where any UMI base quality score falls below this threshold
|
||||
--quality-filter-mask If a UMI base has a quality below this threshold, replace the base with 'N'
|
||||
--quality-encoding Quality score encoding. Choose from:
|
||||
'phred33' [33-77]
|
||||
'phred64' [64-106]
|
||||
'solexa' [59-106]
|
||||
--reconcile-pairs Allow read 2 infile to contain reads not in read 1 infile. This enables support for upstream protocols
|
||||
where read one contains cell barcodes, and the read pairs have been filtered and corrected without regard
|
||||
to the read2s.
|
||||
|
||||
Experimental options:
|
||||
Note: These options have not been extensively tested to ensure behaviour is as expected. If you have some suitable input files which
|
||||
we can use for testing, please contact us.
|
||||
If you have a library preparation method where the UMI may be in either read, you can use the following options to search for the
|
||||
UMI in either read:
|
||||
|
||||
--either-read --extract-method --bc-pattern=[PATTERN1] --bc-pattern2=[PATTERN2]
|
||||
|
||||
Where both patterns match, the default behaviour is to discard both reads. If you want to select the read with the UMI with highest
|
||||
sequence quality, provide --either-read-resolve=quality.
|
||||
|
||||
|
||||
--bc-pattern Pattern for barcode(s) on read 1. See --extract-method
|
||||
--bc-pattern2 Pattern for barcode(s) on read 2. See --extract-method
|
||||
--extract-method There are two methods enabled to extract the umi barcode (+/- cell barcode). For both methods, the patterns
|
||||
should be provided using the --bc-pattern and --bc-pattern2 options.
|
||||
string:
|
||||
This should be used where the barcodes are always in the same place in the read.
|
||||
N = UMI position (required)
|
||||
C = cell barcode position (optional)
|
||||
X = sample position (optional)
|
||||
Bases with Ns and Cs will be extracted and added to the read name. The corresponding sequence qualities will
|
||||
be removed from the read. Bases with an X will be reattached to the read.
|
||||
regex:
|
||||
This method allows for more flexible barcode extraction and should be used where the cell barcodes are variable
|
||||
in length. Alternatively, the regex option can also be used to filter out reads which do not contain an expected
|
||||
adapter sequence. The regex must contain groups to define how the barcodes are encoded in the read.
|
||||
The expected groups in the regex are:
|
||||
umi_n = UMI positions, where n can be any value (required)
|
||||
cell_n = cell barcode positions, where n can be any value (optional)
|
||||
discard_n = positions to discard, where n can be any value (optional)
|
||||
--3prime By default the barcode is assumed to be on the 5' end of the read, but use this option to specify that it is
|
||||
on the 3' end instead. This option only works with --extract-method=string since 3' encoding can be specified
|
||||
explicitly with a regex, e.g .*(?P<umi_1>.{5})$
|
||||
--read2-in Filename for read pairs
|
||||
--filtered-out Write out reads not matching regex pattern or cell barcode whitelist to this file
|
||||
--filtered-out2 Write out read pairs not matching regex pattern or cell barcode whitelist to this file
|
||||
--ignore-read-pair-suffixes Ignore SOH and STX read name suffixes. Note that this options is required if the suffixes are not whitespace
|
||||
separated from the rest of the read name
|
||||
|
||||
For full UMI-tools documentation, see https://umi-tools.readthedocs.io/en/latest/
|
||||
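As a rough illustration of the 3' regex mentioned for `--3prime`, the sketch below pulls a 5-base UMI off the end of a read in plain bash. It is an approximation under stated assumptions: bash's `=~` operator uses POSIX extended regular expressions, which have no named groups, so numbered groups stand in for `umi_1`, and `extract_3prime_umi` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: take the last 5 bases of a read as the UMI.
# Mirrors the regex .*(?P<umi_1>.{5})$ using numbered groups.
extract_3prime_umi() {
  local seq="$1"
  local re='^(.*)(.{5})$'
  if [[ "$seq" =~ $re ]]; then
    # BASH_REMATCH[1] = bases kept in the read, BASH_REMATCH[2] = UMI
    printf '%s %s\n' "${BASH_REMATCH[2]}" "${BASH_REMATCH[1]}"
  else
    echo "read is shorter than the UMI pattern" >&2
    return 1
  fi
}

extract_3prime_umi "ACGTACGTTTGCA"
```

The function prints the UMI followed by the trimmed read, here `TTGCA ACGTACGT`. Reads shorter than five bases do not match and are reported on stderr, loosely analogous to reads that fail a regex pattern being filtered out.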
88
src/umi_tools/umi_tools_extract/script.sh
Normal file
@@ -0,0 +1,88 @@
#!/bin/bash

## VIASH START
## VIASH END

set -exo pipefail

test_dir="${meta_resources_dir}/test_data"

# Boolean flags arrive as "true"/"false"; unset the false ones so that
# the ${var:+...} expansions below omit the corresponding option.
[[ "$par_error_correct_cell" == "false" ]] && unset par_error_correct_cell
[[ "$par_reconcile_pairs" == "false" ]] && unset par_reconcile_pairs
[[ "$par_three_prime" == "false" ]] && unset par_three_prime
[[ "$par_ignore_read_pair_suffixes" == "false" ]] && unset par_ignore_read_pair_suffixes
[[ "$par_timeit_header" == "false" ]] && unset par_timeit_header
[[ "$par_log2stderr" == "false" ]] && unset par_log2stderr


# Check if we have the correct number of input files and patterns for paired-end or single-end reads

# For paired-end reads, check that we have two read files and two patterns
# Check for paired-end inputs
if [ -n "$par_input" ] && [ -n "$par_read2_in" ]; then
    # Paired-end checks: Ensure both UMI patterns are provided
    if [ -z "$par_bc_pattern" ] || [ -z "$par_bc_pattern2" ]; then
        echo "Paired end input requires two UMI patterns."
        exit 1
    fi
elif [ -n "$par_input" ]; then
    # Single-end checks: Ensure no second read or UMI pattern for the second read is provided
    if [ -n "$par_bc_pattern2" ]; then
        echo "Single end input requires only one read file and one UMI pattern."
        exit 1
    fi
    # Check that discard_read is not set or set to 0 for single-end reads
    if [ -n "$par_umi_discard_read" ] && [ "$par_umi_discard_read" != 0 ]; then
        echo "umi_discard_read is only valid when processing paired end reads."
        exit 1
    fi
else
    # No inputs provided
    echo "No input files provided."
    exit 1
fi


umi_tools extract \
    -I "$par_input" \
    ${par_read2_in:+--read2-in "$par_read2_in"} \
    -S "$par_output" \
    ${par_read2_out:+--read2-out "$par_read2_out"} \
    ${par_extract_method:+--extract-method "$par_extract_method"} \
    --bc-pattern "$par_bc_pattern" \
    ${par_bc_pattern2:+--bc-pattern2 "$par_bc_pattern2"} \
    ${par_umi_separator:+--umi-separator "$par_umi_separator"} \
    ${par_output_stats:+--output-stats "$par_output_stats"} \
    ${par_error_correct_cell:+--error-correct-cell} \
    ${par_whitelist:+--whitelist "$par_whitelist"} \
    ${par_blacklist:+--blacklist "$par_blacklist"} \
    ${par_subset_reads:+--subset-reads "$par_subset_reads"} \
    ${par_quality_filter_threshold:+--quality-filter-threshold "$par_quality_filter_threshold"} \
    ${par_quality_filter_mask:+--quality-filter-mask "$par_quality_filter_mask"} \
    ${par_quality_encoding:+--quality-encoding "$par_quality_encoding"} \
    ${par_reconcile_pairs:+--reconcile-pairs} \
    ${par_three_prime:+--3prime} \
    ${par_filtered_out:+--filtered-out "$par_filtered_out"} \
    ${par_filtered_out2:+--filtered-out2 "$par_filtered_out2"} \
    ${par_ignore_read_pair_suffixes:+--ignore-read-pair-suffixes} \
    ${par_random_seed:+--random-seed "$par_random_seed"} \
    ${par_temp_dir:+--temp-dir "$par_temp_dir"} \
    ${par_compresslevel:+--compresslevel "$par_compresslevel"} \
    ${par_timeit:+--timeit "$par_timeit"} \
    ${par_timeit_name:+--timeit-name "$par_timeit_name"} \
    ${par_timeit_header:+--timeit-header} \
    ${par_log:+--log "$par_log"} \
    ${par_log2stderr:+--log2stderr} \
    ${par_verbose:+--verbose "$par_verbose"} \
    ${par_error:+--error "$par_error"}


if [ "$par_umi_discard_read" == 1 ]; then
    # discard read 1 (written to the path given by --output)
    rm "$par_output"
elif [ "$par_umi_discard_read" == 2 ]; then
    # discard read 2 (-f to bypass file existence check)
    rm -f "$par_read2_out"
fi
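The script relies on two bash idioms: unsetting boolean parameters that are `"false"`, and the `${var:+...}` expansion, which emits an option only when the variable is set and non-empty. A minimal sketch of that combination is shown below; `build_args` is a hypothetical function used purely for illustration, and only two of the script's parameters are included.

```shell
#!/bin/bash
# Hypothetical demonstration of the ${var:+--flag "$var"} idiom.
build_args() {
  local par_three_prime="$1" par_umi_separator="$2"

  # A boolean passed as "false" must be unset (or emptied), otherwise
  # ${par_three_prime:+--3prime} would still expand to the flag.
  [[ "$par_three_prime" == "false" ]] && unset par_three_prime

  # Unquoted on purpose: an unset or empty variable contributes no words.
  local args=(
    ${par_three_prime:+--3prime}
    ${par_umi_separator:+--umi-separator "$par_umi_separator"}
  )
  printf '%s\n' "${args[*]}"
}

build_args true "_"
build_args false ""
```

The first call prints `--3prime --umi-separator _`; the second prints an empty line because neither option is emitted. Collecting the options in an array first, as done here, is one way to inspect the final command line before running it.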
86
src/umi_tools/umi_tools_extract/test.sh
Normal file
@@ -0,0 +1,86 @@
#!/bin/bash

test_dir="${meta_resources_dir}/test_data"

echo ">>> Testing $meta_functionality_name"

############################################################################################################

echo ">>> Test 1: Testing for paired-end reads"
"$meta_executable" \
  --input "$test_dir/scrb_seq_fastq.1_30" \
  --read2_in "$test_dir/scrb_seq_fastq.2_30" \
  --bc_pattern "CCCCCCNNNNNNNNNN" \
  --bc_pattern2 "CCCCCCNNNNNNNNNN" \
  --extract_method string \
  --umi_separator '_' \
  --grouping_method directional \
  --umi_discard_read 0 \
  --output scrb_seq_fastq.1_30.extract \
  --read2_out scrb_seq_fastq.2_30.extract \
  --random_seed 1

echo ">> Checking if the correct files are present"
[[ ! -f "scrb_seq_fastq.1_30.extract" ]] || [[ ! -f "scrb_seq_fastq.2_30.extract" ]] && echo "Reads file missing" && exit 1
[ ! -s "scrb_seq_fastq.1_30.extract" ] && echo "Read 1 file is empty" && exit 1
[ ! -s "scrb_seq_fastq.2_30.extract" ] && echo "Read 2 file is empty" && exit 1


# Use { ...; } rather than ( ... ): an exit inside a subshell would not stop the test script.
echo ">> Checking if the files are correct"
diff -q "${meta_resources_dir}/scrb_seq_fastq.1_30.extract" "$test_dir/scrb_seq_fastq.1_30.extract" || \
  { echo "Read 1 file is not correct"; exit 1; }
diff -q "${meta_resources_dir}/scrb_seq_fastq.2_30.extract" "$test_dir/scrb_seq_fastq.2_30.extract" || \
  { echo "Read 2 file is not correct"; exit 1; }

rm scrb_seq_fastq.1_30.extract scrb_seq_fastq.2_30.extract

############################################################################################################

echo ">>> Test 2: Testing for paired-end reads with umi_discard_read option"
"$meta_executable" \
  --input "$test_dir/scrb_seq_fastq.1_30" \
  --read2_in "$test_dir/scrb_seq_fastq.2_30" \
  --bc_pattern CCCCCCNNNNNNNNNN \
  --bc_pattern2 CCCCCCNNNNNNNNNN \
  --extract_method string \
  --umi_separator '_' \
  --grouping_method directional \
  --umi_discard_read 2 \
  --output scrb_seq_fastq.1_30.extract \
  --random_seed 1

echo ">> Checking if the correct files are present"
[ ! -f "scrb_seq_fastq.1_30.extract" ] && echo "Read 1 file is missing" && exit 1
[ ! -s "scrb_seq_fastq.1_30.extract" ] && echo "Read 1 file is empty" && exit 1
[ -f "scrb_seq_fastq.2_30.extract" ] && echo "Read 2 is not discarded" && exit 1

echo ">> Checking if the files are correct"
diff -q "${meta_resources_dir}/scrb_seq_fastq.1_30.extract" "$test_dir/scrb_seq_fastq.1_30.extract" || \
  { echo "Read 1 file is not correct"; exit 1; }

rm scrb_seq_fastq.1_30.extract

############################################################################################################

echo ">>> Test 3: Testing for single-end reads"
"$meta_executable" \
  --input "$test_dir/slim_30.fastq" \
  --bc_pattern "^(?P<umi_1>.{3}).{4}(?P<umi_2>.{2})" \
  --extract_method regex \
  --umi_separator '_' \
  --grouping_method directional \
  --output slim_30.extract \
  --random_seed 1

echo ">> Checking if the correct files are present"
[ ! -f "slim_30.extract" ] && echo "Trimmed reads file missing" && exit 1
[ ! -s "slim_30.extract" ] && echo "Trimmed reads file is empty" && exit 1

echo ">> Checking if the files are correct"
diff -q "${meta_resources_dir}/slim_30.extract" "$test_dir/slim_30.extract" || \
  { echo "Trimmed reads file is not correct"; exit 1; }

rm slim_30.extract

echo ">>> Test finished successfully"
exit 0
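The three tests repeat the same sequence of checks: the output exists, it is non-empty, and it matches the expected file. One possible way to factor that out is sketched below. `check_file` is a hypothetical helper, not something the component provides, and it returns a status instead of exiting so the caller decides how to react.

```shell
#!/bin/bash
# Hypothetical helper: verify that a produced file exists, is non-empty
# and is identical to the expected file.
check_file() {
  local produced="$1" expected="$2"
  [ -f "$produced" ] || { echo "$produced is missing"; return 1; }
  [ -s "$produced" ] || { echo "$produced is empty"; return 1; }
  diff -q "$produced" "$expected" > /dev/null \
    || { echo "$produced differs from $expected"; return 1; }
}

# Possible usage inside a test script:
#   check_file slim_30.extract "$test_dir/slim_30.extract" || exit 1
```

The failure branches use `{ ...; }` grouping. With `( ... )` the `return` or `exit` would run in a subshell and the surrounding script would carry on after a failed check.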
120
src/umi_tools/umi_tools_extract/test_data/scrb_seq_fastq.1_30
Normal file
@@ -0,0 +1,120 @@
|
||||
@SRR1058032.1 HISEQ:653:H12WDADXX:1:1101:1210:2217 length=17
|
||||
AATAACTTCCCGCGTCG
|
||||
+SRR1058032.1 HISEQ:653:H12WDADXX:1:1101:1210:2217 length=17
|
||||
@@@DDDBDDF>FFHGIB
|
||||
@SRR1058032.2 HISEQ:653:H12WDADXX:1:1101:1191:2236 length=17
|
||||
AGCGGGGTGCTCGTCGT
|
||||
+SRR1058032.2 HISEQ:653:H12WDADXX:1:1101:1191:2236 length=17
|
||||
CCCFFFFFHHHHHJJJJ
|
||||
@SRR1058032.3 HISEQ:653:H12WDADXX:1:1101:1715:2245 length=17
|
||||
CTTTAGTACCAGTCCTT
|
||||
+SRR1058032.3 HISEQ:653:H12WDADXX:1:1101:1715:2245 length=17
|
||||
BBCFFDADHHHHHHIJJ
|
||||
@SRR1058032.4 HISEQ:653:H12WDADXX:1:1101:1905:2212 length=17
|
||||
AGGCGTTGTTTTTTTTT
|
||||
+SRR1058032.4 HISEQ:653:H12WDADXX:1:1101:1905:2212 length=17
|
||||
CCCFFFFFHHHHHJJJJ
|
||||
@SRR1058032.5 HISEQ:653:H12WDADXX:1:1101:1927:2237 length=17
|
||||
ATCGAGACATAATTGAT
|
||||
+SRR1058032.5 HISEQ:653:H12WDADXX:1:1101:1927:2237 length=17
|
||||
@B@FFFFFHHHHHJJJJ
|
||||
@SRR1058032.6 HISEQ:653:H12WDADXX:1:1101:1876:2243 length=17
|
||||
TGGGGGCGGTACATGAT
|
||||
+SRR1058032.6 HISEQ:653:H12WDADXX:1:1101:1876:2243 length=17
|
||||
BBBFFFFFHHHHHJJJJ
|
||||
@SRR1058032.7 HISEQ:653:H12WDADXX:1:1101:2491:2207 length=17
|
||||
CTATATGTTTGCGCTGT
|
||||
+SRR1058032.7 HISEQ:653:H12WDADXX:1:1101:2491:2207 length=17
|
||||
1=BDFFFFHHHHHJJJJ
|
||||
@SRR1058032.8 HISEQ:653:H12WDADXX:1:1101:2513:2219 length=17
|
||||
CTCCCGCATGCTGCTGT
|
||||
+SRR1058032.8 HISEQ:653:H12WDADXX:1:1101:2513:2219 length=17
|
||||
?BBFFFFFHHHHHJJJJ
|
||||
@SRR1058032.9 HISEQ:653:H12WDADXX:1:1101:2604:2231 length=17
|
||||
GAGCCCTGAGGGGATCT
|
||||
+SRR1058032.9 HISEQ:653:H12WDADXX:1:1101:2604:2231 length=17
|
||||
1??DDDFD>DFDGFGHG
|
||||
@SRR1058032.10 HISEQ:653:H12WDADXX:1:1101:2936:2218 length=17
|
||||
AGCGGGGTTCGCGGTTT
|
||||
+SRR1058032.10 HISEQ:653:H12WDADXX:1:1101:2936:2218 length=17
|
||||
CCCFFFFFHHHHHJIJI
|
||||
@SRR1058032.11 HISEQ:653:H12WDADXX:1:1101:3447:2241 length=17
|
||||
AGAATTGCCTGGATTTT
|
||||
+SRR1058032.11 HISEQ:653:H12WDADXX:1:1101:3447:2241 length=17
|
||||
@CCFFFFAFHHHGJJJJ
|
||||
@SRR1058032.12 HISEQ:653:H12WDADXX:1:1101:3620:2196 length=17
|
||||
AGGCGGGGCAACGGGTT
|
||||
+SRR1058032.12 HISEQ:653:H12WDADXX:1:1101:3620:2196 length=17
|
||||
CCCFFFFFHHGHHJJHH
|
||||
@SRR1058032.13 HISEQ:653:H12WDADXX:1:1101:3875:2206 length=17
|
||||
GTCCCCGCGTCGTGTAG
|
||||
+SRR1058032.13 HISEQ:653:H12WDADXX:1:1101:3875:2206 length=17
|
||||
@C@FFFFFHFFGHJJJJ
|
||||
@SRR1058032.14 HISEQ:653:H12WDADXX:1:1101:4131:2215 length=17
|
||||
CCACGCATTCACTCGGT
|
||||
+SRR1058032.14 HISEQ:653:H12WDADXX:1:1101:4131:2215 length=17
|
||||
BBBDFFFFHHHHHJJJJ
|
||||
@SRR1058032.15 HISEQ:653:H12WDADXX:1:1101:4284:2241 length=17
|
||||
TGCGCAATAAGCGCTAT
|
||||
+SRR1058032.15 HISEQ:653:H12WDADXX:1:1101:4284:2241 length=17
|
||||
+:=DDDDDBHHGDIBEH
|
||||
@SRR1058032.16 HISEQ:653:H12WDADXX:1:1101:4599:2232 length=17
|
||||
CGCTGGCAGAGCCCGGT
|
||||
+SRR1058032.16 HISEQ:653:H12WDADXX:1:1101:4599:2232 length=17
|
||||
@BCFFFFFHHHHHJJJJ
|
||||
@SRR1058032.17 HISEQ:653:H12WDADXX:1:1101:5428:2200 length=17
|
||||
AGGCGGTGCATAGTCTT
|
||||
+SRR1058032.17 HISEQ:653:H12WDADXX:1:1101:5428:2200 length=17
|
||||
CCCFFFFFHHHHHIJIH
|
||||
@SRR1058032.18 HISEQ:653:H12WDADXX:1:1101:5336:2218 length=17
|
||||
GTCCCCCGCGTGTGACT
|
||||
+SRR1058032.18 HISEQ:653:H12WDADXX:1:1101:5336:2218 length=17
|
||||
<BBFFFFFHHHHHIIJJ
|
||||
@SRR1058032.19 HISEQ:653:H12WDADXX:1:1101:5397:2220 length=17
|
||||
TATAGAAAAAACTTTTT
|
||||
+SRR1058032.19 HISEQ:653:H12WDADXX:1:1101:5397:2220 length=17
|
||||
B@BFDDFFGHHFHIJIJ
|
||||
@SRR1058032.20 HISEQ:653:H12WDADXX:1:1101:5605:2194 length=17
|
||||
CATTATGGGCTTATTTT
|
||||
+SRR1058032.20 HISEQ:653:H12WDADXX:1:1101:5605:2194 length=17
|
||||
BBBFFFFFHHHHHJJJJ
|
||||
@SRR1058032.21 HISEQ:653:H12WDADXX:1:1101:5519:2196 length=17
|
||||
AAATGTGCAGTTCAGAT
|
||||
+SRR1058032.21 HISEQ:653:H12WDADXX:1:1101:5519:2196 length=17
|
||||
BCCFFFFFHHHHHJJJJ
|
||||
@SRR1058032.22 HISEQ:653:H12WDADXX:1:1101:5705:2220 length=17
|
||||
TGGGGGCTAAAGGGACT
|
||||
+SRR1058032.22 HISEQ:653:H12WDADXX:1:1101:5705:2220 length=17
|
||||
BBBDFFFFHHHHHJIJI
|
||||
@SRR1058032.23 HISEQ:653:H12WDADXX:1:1101:5558:2236 length=17
|
||||
GATAATACTTACGGTGT
|
||||
+SRR1058032.23 HISEQ:653:H12WDADXX:1:1101:5558:2236 length=17
|
||||
CCCFFFFFHHHHHJFHI
|
||||
@SRR1058032.24 HISEQ:653:H12WDADXX:1:1101:5649:2244 length=17
|
||||
CGTTAATAATTGTGGTT
|
||||
+SRR1058032.24 HISEQ:653:H12WDADXX:1:1101:5649:2244 length=17
|
||||
BBBFFFFFHHHHHIIHG
|
||||
@SRR1058032.25 HISEQ:653:H12WDADXX:1:1101:5910:2207 length=17
|
||||
AAAAAAAAAAAAAAAAA
|
||||
+SRR1058032.25 HISEQ:653:H12WDADXX:1:1101:5910:2207 length=17
|
||||
@CCFFFFFGHAA<:46'
|
||||
@SRR1058032.26 HISEQ:653:H12WDADXX:1:1101:5757:2217 length=17
|
||||
GCCGACCAACGATTTTT
|
||||
+SRR1058032.26 HISEQ:653:H12WDADXX:1:1101:5757:2217 length=17
|
||||
:=?DD@?DH;AFBFDFF
|
||||
@SRR1058032.27 HISEQ:653:H12WDADXX:1:1101:5790:2248 length=17
|
||||
AATCAAGACCACTGAAT
|
||||
+SRR1058032.27 HISEQ:653:H12WDADXX:1:1101:5790:2248 length=17
|
||||
@CCFFFFFHHHHHJJJI
|
||||
@SRR1058032.28 HISEQ:653:H12WDADXX:1:1101:6079:2195 length=17
|
||||
CGCGCTTTTGTTTTTTT
|
||||
+SRR1058032.28 HISEQ:653:H12WDADXX:1:1101:6079:2195 length=17
|
||||
BB@FFFFFHHHHHJJJJ
|
||||
@SRR1058032.29 HISEQ:653:H12WDADXX:1:1101:6133:2213 length=17
|
||||
AAATACTTTGAGGGAAT
|
||||
+SRR1058032.29 HISEQ:653:H12WDADXX:1:1101:6133:2213 length=17
|
||||
@CCFFEFFHHFHGJJII
|
||||
@SRR1058032.30 HISEQ:653:H12WDADXX:1:1101:6651:2198 length=17
|
||||
AGCGGGGTTTTATCGGT
|
||||
+SRR1058032.30 HISEQ:653:H12WDADXX:1:1101:6651:2198 length=17
|
||||
CCCFFFFDHHHHHHJJJ
|
||||
@@ -0,0 +1,120 @@
|
||||
@SRR1058032.1_AATAACCCTACA_TTCCCGCGTCCTCTTTCCCT HISEQ:653:H12WDADXX:1:1101:1210:2217 length=17
|
||||
G
|
||||
+
|
||||
B
|
||||
@SRR1058032.2_AGCGGGACGCTA_GTGCTCGTCGTACTCTTTCC HISEQ:653:H12WDADXX:1:1101:1191:2236 length=17
|
||||
T
|
||||
+
|
||||
J
|
||||
@SRR1058032.3_CTTTAGACGCTA_TACCAGTCCTCACTCTTTCC HISEQ:653:H12WDADXX:1:1101:1715:2245 length=17
|
||||
T
|
||||
+
|
||||
J
|
||||
@SRR1058032.4_AGGCGTACTTTA_TGTTTTTTTTCACTCTCTCC HISEQ:653:H12WDADXX:1:1101:1905:2212 length=17
|
||||
T
|
||||
+
|
||||
J
|
||||
@SRR1058032.5_ATCGAGGTGTAG_ACATAATTGAGGAAAGAGTG HISEQ:653:H12WDADXX:1:1101:1927:2237 length=17
|
||||
T
|
||||
+
|
||||
J
|
||||
@SRR1058032.6_TGGGGGCCTATA_CGGTACATGATAGTATAGCT HISEQ:653:H12WDADXX:1:1101:1876:2243 length=17
|
||||
T
|
||||
+
|
||||
J
|
||||
@SRR1058032.7_CTATATATTAAA_GTTTGCGCTGGACAAACTAC HISEQ:653:H12WDADXX:1:1101:2491:2207 length=17
|
||||
T
|
||||
+
|
||||
J
|
||||
@SRR1058032.8_CTCCCGGCCTAG_CATGCTGCTGTTGTGAACCA HISEQ:653:H12WDADXX:1:1101:2513:2219 length=17
|
||||
T
|
||||
+
|
||||
J
|
||||
@SRR1058032.9_GAGCCCCCCTTC_TGAGGGGATCACGACGCTAC HISEQ:653:H12WDADXX:1:1101:2604:2231 length=17
|
||||
T
|
||||
+
|
||||
G
|
||||
@SRR1058032.10_AGCGGGGGGAAA_GTTCGCGGTTGAGTGTGTCG HISEQ:653:H12WDADXX:1:1101:2936:2218 length=17
|
||||
T
|
||||
+
|
||||
I
|
||||
@SRR1058032.11_AGAATTCCCACA_GCCTGGATTTCTCTTTCCCT HISEQ:653:H12WDADXX:1:1101:3447:2241 length=17
|
||||
T
|
||||
+
|
||||
J
|
||||
@SRR1058032.12_AGGCGGGTGTAT_GGCAACGGGTGGAAAGAGTG HISEQ:653:H12WDADXX:1:1101:3620:2196 length=17
|
||||
T
|
||||
+
|
||||
H
|
||||
@SRR1058032.13_GTCCCCCTCTTT_GCGTCGTGTACCCTACACTC HISEQ:653:H12WDADXX:1:1101:3875:2206 length=17
|
||||
G
|
||||
+
|
||||
J
|
||||
@SRR1058032.14_CCACGCGTGTAG_ATTCACTCGGCGTCGTGTAG HISEQ:653:H12WDADXX:1:1101:4131:2215 length=17
|
||||
T
|
||||
+
|
||||
J
|
||||
@SRR1058032.15_TGCGCAGTGTAT_ATAAGCGCTAGGAAAGAGTG HISEQ:653:H12WDADXX:1:1101:4284:2241 length=17
|
||||
T
|
||||
+
|
||||
H
|
||||
@SRR1058032.16_CGCTGGACTCTT_CAGAGCCCGGTCCCTACACT HISEQ:653:H12WDADXX:1:1101:4599:2232 length=17
|
||||
T
|
||||
+
|
||||
J
|
||||
@SRR1058032.17_AGGCGGGATTCT_TGCATAGTCTTCAAATGAGG HISEQ:653:H12WDADXX:1:1101:5428:2200 length=17
|
||||
T
|
||||
+
|
||||
H
|
||||
@SRR1058032.18_GTCCCCGCGTCG_CGCGTGTGACTGTAGGGAAA HISEQ:653:H12WDADXX:1:1101:5336:2218 length=17
|
||||
T
|
||||
+
|
||||
J
|
||||
@SRR1058032.19_TATAGACCATCA_AAAAACTTTTCGCCTGCCCT HISEQ:653:H12WDADXX:1:1101:5397:2220 length=17
|
||||
T
|
||||
+
|
||||
J
|
||||
@SRR1058032.20_CATTATTTAATG_GGGCTTATTTGACTGTTTCA HISEQ:653:H12WDADXX:1:1101:5605:2194 length=17
|
||||
T
|
||||
+
|
||||
J
|
||||
@SRR1058032.21_AAATGTTATCTA_GCAGTTCAGAGACTGCTCGT HISEQ:653:H12WDADXX:1:1101:5519:2196 length=17
|
||||
T
|
||||
+
|
||||
J
|
||||
@SRR1058032.22_TGGGGGACTGTT_CTAAAGGGACCTTTAACCAA HISEQ:653:H12WDADXX:1:1101:5705:2220 length=17
|
||||
T
|
||||
+
|
||||
I
|
||||
@SRR1058032.23_GATAATTTCCAT_ACTTACGGTGACACTCTTTC HISEQ:653:H12WDADXX:1:1101:5558:2236 length=17
|
||||
T
|
||||
+
|
||||
I
|
||||
@SRR1058032.24_CGTTAAAGACGG_TAATTGTGGTACCAGAGCGA HISEQ:653:H12WDADXX:1:1101:5649:2244 length=17
|
||||
T
|
||||
+
|
||||
G
|
||||
@SRR1058032.25_AAAAAAGAGTAT_AAAAAAAAAAAGGGAAAGAG HISEQ:653:H12WDADXX:1:1101:5910:2207 length=17
|
||||
A
|
||||
+
|
||||
'
|
||||
@SRR1058032.26_GCCGACCCTTTT_CAACGATTTTATACAATACA HISEQ:653:H12WDADXX:1:1101:5757:2217 length=17
|
||||
T
|
||||
+
|
||||
F
|
||||
@SRR1058032.27_AATCAAATCACA_GACCACTGAAGCTGGAGAGA HISEQ:653:H12WDADXX:1:1101:5790:2248 length=17
|
||||
T
|
||||
+
|
||||
I
|
||||
@SRR1058032.28_CGCGCTGTACTA_TTTGTTTTTTGGCATCGTCA HISEQ:653:H12WDADXX:1:1101:6079:2195 length=17
|
||||
T
|
||||
+
|
||||
J
|
||||
@SRR1058032.29_AAATACCCAATA_TTTGAGGGAAACTTGACCAA HISEQ:653:H12WDADXX:1:1101:6133:2213 length=17
|
||||
T
|
||||
+
|
||||
I
|
||||
@SRR1058032.30_AGCGGGGAGTGT_GTTTTATCGGACACTCTTTC HISEQ:653:H12WDADXX:1:1101:6651:2198 length=17
|
||||
T
|
||||
+
|
||||
J
|
||||
120
src/umi_tools/umi_tools_extract/test_data/scrb_seq_fastq.2_30
Normal file
@@ -0,0 +1,120 @@
|
||||
@SRR1058032.1 HISEQ:653:H12WDADXX:1:1101:1210:2217 length=34
|
||||
CCTACACTCTTTCCCTACACGACGCTACACTCTN
|
||||
+SRR1058032.1 HISEQ:653:H12WDADXX:1:1101:1210:2217 length=34
|
||||
@@@DFEDABD?A?ABGHGGGIGGEGIIIJJJFI#
|
||||
@SRR1058032.2 HISEQ:653:H12WDADXX:1:1101:1191:2236 length=34
|
||||
ACGCTATACTCTTTCCCTACACGACGCTACACTN
|
||||
+SRR1058032.2 HISEQ:653:H12WDADXX:1:1101:1191:2236 length=34
|
||||
CCCDFFFFHHHHHJJJJGGICGE6FDH<?F<F<#
|
||||
@SRR1058032.3 HISEQ:653:H12WDADXX:1:1101:1715:2245 length=34
|
||||
ACGCTACACTCTTTCCCTACACGACGCTACACTN
|
||||
+SRR1058032.3 HISEQ:653:H12WDADXX:1:1101:1715:2245 length=34
|
||||
C@CFFFFFGHHGAEHIIEIGIIAGFHIFG@FBE#
|
||||
@SRR1058032.4 HISEQ:653:H12WDADXX:1:1101:1905:2212 length=34
|
||||
ACTTTACACTCTCTCCCTACACGACGCTACACTN
|
||||
+SRR1058032.4 HISEQ:653:H12WDADXX:1:1101:1905:2212 length=34
|
||||
??;==DBDD?F:D<EGH<HGHIF>GEGCDG9FD#
|
||||
@SRR1058032.5 HISEQ:653:H12WDADXX:1:1101:1927:2237 length=34
|
||||
GTGTAGGGAAAGAGTGTAAGGAAAGAGTGTAGCN
|
||||
+SRR1058032.5 HISEQ:653:H12WDADXX:1:1101:1927:2237 length=34
|
||||
?=??B?DB2ACCAEAEFHHIHHHIHFHCEHHIG#
|
||||
@SRR1058032.6 HISEQ:653:H12WDADXX:1:1101:1876:2243 length=34
|
||||
CCTATATAGTATAGCTTCCCATCTTCTTTGAGAN
|
||||
+SRR1058032.6 HISEQ:653:H12WDADXX:1:1101:1876:2243 length=34
|
||||
CCCFFFFFHDHBHEIIJJJJIIIJJJGGGIGIE#
|
||||
@SRR1058032.7 HISEQ:653:H12WDADXX:1:1101:2491:2207 length=34
|
||||
ATTAAAGACAAACTACAACTCATATGAGGCATTN
|
||||
+SRR1058032.7 HISEQ:653:H12WDADXX:1:1101:2491:2207 length=34
|
||||
@@@DDDADDHHHFBFAHIGBHH<H<BHDFGIIG#
|
||||
@SRR1058032.8 HISEQ:653:H12WDADXX:1:1101:2513:2219 length=34
|
||||
GCCTAGTTGTGAACCAAATGTGAAAAAACCTCCN
|
||||
+SRR1058032.8 HISEQ:653:H12WDADXX:1:1101:2513:2219 length=34
|
||||
@@@FFDDDFFFFFIIGHIFI<HHEHCEBEFEED#
|
||||
@SRR1058032.9 HISEQ:653:H12WDADXX:1:1101:2604:2231 length=34
|
||||
CCCTTCACGACGCTACACTCTTTCCCTACACGAN
|
||||
+SRR1058032.9 HISEQ:653:H12WDADXX:1:1101:2604:2231 length=34
|
||||
C@CFFFFFHHHHHJJJIJJJJIJJJIGBHBFG:#
|
||||
@SRR1058032.10 HISEQ:653:H12WDADXX:1:1101:2936:2218 length=34
|
||||
GGGAAAGAGTGTGTCGTGTATGGAAAGAGTGTAN
|
||||
+SRR1058032.10 HISEQ:653:H12WDADXX:1:1101:2936:2218 length=34
|
||||
CCCFFFFDD>FAH;E@@?AB>F@BF3;3?1C?<#
|
||||
@SRR1058032.11 HISEQ:653:H12WDADXX:1:1101:3447:2241 length=34
|
||||
CCCACACTCTTTCCCTACACGACGCTACACTCTN
|
||||
+SRR1058032.11 HISEQ:653:H12WDADXX:1:1101:3447:2241 length=34
|
||||
@@@DDFDDBHBFHGI<F@GFBFEE>)C:D@@@B#
|
||||
@SRR1058032.12 HISEQ:653:H12WDADXX:1:1101:3620:2196 length=34
|
||||
GTGTATGGAAAGAGTGTAGGGAAAGAGTGTAGGN
|
||||
+SRR1058032.12 HISEQ:653:H12WDADXX:1:1101:3620:2196 length=34
|
||||
@@@DDDDAHHHFHIABEEEAB??CFBF?C@BFF#
|
||||
@SRR1058032.13 HISEQ:653:H12WDADXX:1:1101:3875:2206 length=34
|
||||
CTCTTTCCCTACACTCTTTCCCTACACGACGCTN
|
||||
+SRR1058032.13 HISEQ:653:H12WDADXX:1:1101:3875:2206 length=34
|
||||
@@@DDDAAADHDHDGDGIIIIIJJJJJJIJIIJ#
|
||||
@SRR1058032.14 HISEQ:653:H12WDADXX:1:1101:4131:2215 length=34
|
||||
GTGTAGCGTCGTGTAGGGAAAGAGTGTGTGGAAN
|
||||
+SRR1058032.14 HISEQ:653:H12WDADXX:1:1101:4131:2215 length=34
|
||||
@@@DDDDD?DFDCAEFHIGGFHEH:D1C:CG@F#
|
||||
@SRR1058032.15 HISEQ:653:H12WDADXX:1:1101:4284:2241 length=34
|
||||
GTGTATGGAAAGAGTGTGCGTCGTACGTGTAGAN
|
||||
+SRR1058032.15 HISEQ:653:H12WDADXX:1:1101:4284:2241 length=34
|
||||
@?@DDFFFHHHHGDAC:CHGGIIGIIIFHFGHB#
|
||||
@SRR1058032.16 HISEQ:653:H12WDADXX:1:1101:4599:2232 length=34
|
||||
ACTCTTTCCCTACACTCTTTCCCTACACGACGCN
|
||||
+SRR1058032.16 HISEQ:653:H12WDADXX:1:1101:4599:2232 length=34
|
||||
@@<DAAAA?>BCBE@9;EGGGGGIHJJIJHIGG#
|
||||
@SRR1058032.17 HISEQ:653:H12WDADXX:1:1101:5428:2200 length=34
|
||||
GATTCTTCAAATGAGGACTATGCGGGACATGAAN
|
||||
+SRR1058032.17 HISEQ:653:H12WDADXX:1:1101:5428:2200 length=34
|
||||
@@@DDDDDFHHFAHB;FHIIIIIIIIFHEHIHI#
|
||||
@SRR1058032.18 HISEQ:653:H12WDADXX:1:1101:5336:2218 length=34
|
||||
GCGTCGTGTAGGGAAAGAGTGTAGCGTCGTGTAN
|
||||
+SRR1058032.18 HISEQ:653:H12WDADXX:1:1101:5336:2218 length=34
|
||||
@@@DDDDD<FFD?GIIDGF+<<CBAFCGE@FB@#
|
||||
@SRR1058032.19 HISEQ:653:H12WDADXX:1:1101:5397:2220 length=34
|
||||
CCATCACGCCTGCCCTTCCTTGAAATTACACCTN
|
||||
+SRR1058032.19 HISEQ:653:H12WDADXX:1:1101:5397:2220 length=34
|
||||
;===AAA<@A72??A+22<+,+<+@+++*:***#
|
||||
@SRR1058032.20 HISEQ:653:H12WDADXX:1:1101:5605:2194 length=34
|
||||
TTAATGGACTGTTTCAGGTAAAAGAGAATGAATN
|
||||
+SRR1058032.20 HISEQ:653:H12WDADXX:1:1101:5605:2194 length=34
|
||||
CCCFFDAEHHHHDEHIGCEIIJJIGIJGIGGHE#
|
||||
@SRR1058032.21 HISEQ:653:H12WDADXX:1:1101:5519:2196 length=34
|
||||
TATCTAGACTGCTCGTCATTTAGAAGACACGTCN
|
||||
+SRR1058032.21 HISEQ:653:H12WDADXX:1:1101:5519:2196 length=34
|
||||
@B@FDDFFHFHBHEIIGIIJJGHGHIIIGIGII#
|
||||
@SRR1058032.22 HISEQ:653:H12WDADXX:1:1101:5705:2220 length=34
|
||||
ACTGTTCTTTAACCAAACATCCGTGCGATTCGTN
|
||||
+SRR1058032.22 HISEQ:653:H12WDADXX:1:1101:5705:2220 length=34
|
||||
CCCFFFFFHHHHHJJJJGHIJJIGIIIBEFG?G#
|
||||
@SRR1058032.23 HISEQ:653:H12WDADXX:1:1101:5558:2236 length=34
|
||||
TTCCATACACTCTTTCCCTACACGACGCACACTN
|
||||
+SRR1058032.23 HISEQ:653:H12WDADXX:1:1101:5558:2236 length=34
|
||||
@@@DFBEFHFFD<A<CD>BHEGGFGHGGIEGII#
|
||||
@SRR1058032.24 HISEQ:653:H12WDADXX:1:1101:5649:2244 length=34
|
||||
AGACGGACCAGAGCGAAAGCATTTGCCAAGAATN
|
||||
+SRR1058032.24 HISEQ:653:H12WDADXX:1:1101:5649:2244 length=34
|
||||
CCCFFFDFGHHHGJIIJJIJHEDD919CGGHJ@#
|
||||
@SRR1058032.25 HISEQ:653:H12WDADXX:1:1101:5910:2207 length=34
|
||||
GAGTATAGGGAAAGAGTTTTTTTTTTTTTTTTTN
|
||||
+SRR1058032.25 HISEQ:653:H12WDADXX:1:1101:5910:2207 length=34
|
||||
?=?DDDD>AB:ACEEGHIJJIJJJJIIJJHFDD#
|
||||
@SRR1058032.26 HISEQ:653:H12WDADXX:1:1101:5757:2217 length=34
|
||||
CCTTTTATACAATACAAAGCTTTGCTTTTTTTTN
|
||||
+SRR1058032.26 HISEQ:653:H12WDADXX:1:1101:5757:2217 length=34
|
||||
???DDDDDDDDD4EEEII@A<:33<33,22110#
|
||||
@SRR1058032.27 HISEQ:653:H12WDADXX:1:1101:5790:2248 length=34
|
||||
ATCACAGCTGGAGAGATCTTGATCTTCATGGTGN
|
||||
+SRR1058032.27 HISEQ:653:H12WDADXX:1:1101:5790:2248 length=34
|
||||
CCCFFFFFHHFHGGIIIIJIEAHCEHHEFECGD#
|
||||
@SRR1058032.28 HISEQ:653:H12WDADXX:1:1101:6079:2195 length=34
|
||||
GTACTAGGCATCGTCATCCAATGCGACGAGTCCN
|
||||
+SRR1058032.28 HISEQ:653:H12WDADXX:1:1101:6079:2195 length=34
|
||||
@@CFFDDFHHGHHIJJJIJJJIGGHIDG<GFHG#
|
||||
@SRR1058032.29 HISEQ:653:H12WDADXX:1:1101:6133:2213 length=34
|
||||
CCAATAACTTGACCAACGGAACAAGTTACCCTAN
|
||||
+SRR1058032.29 HISEQ:653:H12WDADXX:1:1101:6133:2213 length=34
|
||||
@CCFFFFFHHGHHIJJJJIJIIIIIIIIIJIJI#
|
||||
@SRR1058032.30 HISEQ:653:H12WDADXX:1:1101:6651:2198 length=34
|
||||
GAGTGTACACTCTTTCCCTACACGACGTTACACN
|
||||
+SRR1058032.30 HISEQ:653:H12WDADXX:1:1101:6651:2198 length=34
|
||||
???A:2ABDBDDDBEEIIA:F:CC8F<))1:??#
|
||||
@@ -0,0 +1,120 @@
|
||||
@SRR1058032.1_AATAACCCTACA_TTCCCGCGTCCTCTTTCCCT HISEQ:653:H12WDADXX:1:1101:1210:2217 length=34
|
||||
ACACGACGCTACACTCTN
|
||||
+
|
||||
HGGGIGGEGIIIJJJFI#
|
||||
@SRR1058032.2_AGCGGGACGCTA_GTGCTCGTCGTACTCTTTCC HISEQ:653:H12WDADXX:1:1101:1191:2236 length=34
|
||||
CTACACGACGCTACACTN
|
||||
+
|
||||
JGGICGE6FDH<?F<F<#
|
||||
@SRR1058032.3_CTTTAGACGCTA_TACCAGTCCTCACTCTTTCC HISEQ:653:H12WDADXX:1:1101:1715:2245 length=34
|
||||
CTACACGACGCTACACTN
|
||||
+
|
||||
IEIGIIAGFHIFG@FBE#
|
||||
@SRR1058032.4_AGGCGTACTTTA_TGTTTTTTTTCACTCTCTCC HISEQ:653:H12WDADXX:1:1101:1905:2212 length=34
|
||||
CTACACGACGCTACACTN
|
||||
+
|
||||
H<HGHIF>GEGCDG9FD#
|
||||
@SRR1058032.5_ATCGAGGTGTAG_ACATAATTGAGGAAAGAGTG HISEQ:653:H12WDADXX:1:1101:1927:2237 length=34
|
||||
TAAGGAAAGAGTGTAGCN
|
||||
+
|
||||
FHHIHHHIHFHCEHHIG#
|
||||
@SRR1058032.6_TGGGGGCCTATA_CGGTACATGATAGTATAGCT HISEQ:653:H12WDADXX:1:1101:1876:2243 length=34
|
||||
TCCCATCTTCTTTGAGAN
|
||||
+
|
||||
JJJJIIIJJJGGGIGIE#
|
||||
@SRR1058032.7_CTATATATTAAA_GTTTGCGCTGGACAAACTAC HISEQ:653:H12WDADXX:1:1101:2491:2207 length=34
|
||||
AACTCATATGAGGCATTN
|
||||
+
|
||||
HIGBHH<H<BHDFGIIG#
|
||||
@SRR1058032.8_CTCCCGGCCTAG_CATGCTGCTGTTGTGAACCA HISEQ:653:H12WDADXX:1:1101:2513:2219 length=34
|
||||
AATGTGAAAAAACCTCCN
|
||||
+
|
||||
HIFI<HHEHCEBEFEED#
|
||||
@SRR1058032.9_GAGCCCCCCTTC_TGAGGGGATCACGACGCTAC HISEQ:653:H12WDADXX:1:1101:2604:2231 length=34
|
||||
ACTCTTTCCCTACACGAN
|
||||
+
|
||||
IJJJJIJJJIGBHBFG:#
|
||||
@SRR1058032.10_AGCGGGGGGAAA_GTTCGCGGTTGAGTGTGTCG HISEQ:653:H12WDADXX:1:1101:2936:2218 length=34
|
||||
TGTATGGAAAGAGTGTAN
|
||||
+
|
||||
@?AB>F@BF3;3?1C?<#
|
||||
@SRR1058032.11_AGAATTCCCACA_GCCTGGATTTCTCTTTCCCT HISEQ:653:H12WDADXX:1:1101:3447:2241 length=34
|
||||
ACACGACGCTACACTCTN
|
||||
+
|
||||
F@GFBFEE>)C:D@@@B#
|
||||
@SRR1058032.12_AGGCGGGTGTAT_GGCAACGGGTGGAAAGAGTG HISEQ:653:H12WDADXX:1:1101:3620:2196 length=34
|
||||
TAGGGAAAGAGTGTAGGN
|
||||
+
|
||||
EEEAB??CFBF?C@BFF#
|
||||
@SRR1058032.13_GTCCCCCTCTTT_GCGTCGTGTACCCTACACTC HISEQ:653:H12WDADXX:1:1101:3875:2206 length=34
|
||||
TTTCCCTACACGACGCTN
|
||||
+
|
||||
GIIIIIJJJJJJIJIIJ#
|
||||
@SRR1058032.14_CCACGCGTGTAG_ATTCACTCGGCGTCGTGTAG HISEQ:653:H12WDADXX:1:1101:4131:2215 length=34
|
||||
GGAAAGAGTGTGTGGAAN
|
||||
+
|
||||
HIGGFHEH:D1C:CG@F#
|
||||
@SRR1058032.15_TGCGCAGTGTAT_ATAAGCGCTAGGAAAGAGTG HISEQ:653:H12WDADXX:1:1101:4284:2241 length=34
|
||||
TGCGTCGTACGTGTAGAN
|
||||
+
|
||||
:CHGGIIGIIIFHFGHB#
|
||||
@SRR1058032.16_CGCTGGACTCTT_CAGAGCCCGGTCCCTACACT HISEQ:653:H12WDADXX:1:1101:4599:2232 length=34
|
||||
CTTTCCCTACACGACGCN
|
||||
+
|
||||
;EGGGGGIHJJIJHIGG#
|
||||
@SRR1058032.17_AGGCGGGATTCT_TGCATAGTCTTCAAATGAGG HISEQ:653:H12WDADXX:1:1101:5428:2200 length=34
|
||||
ACTATGCGGGACATGAAN
|
||||
+
|
||||
FHIIIIIIIIFHEHIHI#
|
||||
@SRR1058032.18_GTCCCCGCGTCG_CGCGTGTGACTGTAGGGAAA HISEQ:653:H12WDADXX:1:1101:5336:2218 length=34
|
||||
GAGTGTAGCGTCGTGTAN
|
||||
+
|
||||
DGF+<<CBAFCGE@FB@#
|
||||
@SRR1058032.19_TATAGACCATCA_AAAAACTTTTCGCCTGCCCT HISEQ:653:H12WDADXX:1:1101:5397:2220 length=34
|
||||
TCCTTGAAATTACACCTN
|
||||
+
|
||||
22<+,+<+@+++*:***#
|
||||
@SRR1058032.20_CATTATTTAATG_GGGCTTATTTGACTGTTTCA HISEQ:653:H12WDADXX:1:1101:5605:2194 length=34
|
||||
GGTAAAAGAGAATGAATN
|
||||
+
|
||||
GCEIIJJIGIJGIGGHE#
|
||||
@SRR1058032.21_AAATGTTATCTA_GCAGTTCAGAGACTGCTCGT HISEQ:653:H12WDADXX:1:1101:5519:2196 length=34
|
||||
CATTTAGAAGACACGTCN
|
||||
+
|
||||
GIIJJGHGHIIIGIGII#
|
||||
@SRR1058032.22_TGGGGGACTGTT_CTAAAGGGACCTTTAACCAA HISEQ:653:H12WDADXX:1:1101:5705:2220 length=34
|
||||
ACATCCGTGCGATTCGTN
|
||||
+
|
||||
JGHIJJIGIIIBEFG?G#
|
||||
@SRR1058032.23_GATAATTTCCAT_ACTTACGGTGACACTCTTTC HISEQ:653:H12WDADXX:1:1101:5558:2236 length=34
|
||||
CCTACACGACGCACACTN
|
||||
+
|
||||
D>BHEGGFGHGGIEGII#
|
||||
@SRR1058032.24_CGTTAAAGACGG_TAATTGTGGTACCAGAGCGA HISEQ:653:H12WDADXX:1:1101:5649:2244 length=34
|
||||
AAGCATTTGCCAAGAATN
|
||||
+
|
||||
JJIJHEDD919CGGHJ@#
|
||||
@SRR1058032.25_AAAAAAGAGTAT_AAAAAAAAAAAGGGAAAGAG HISEQ:653:H12WDADXX:1:1101:5910:2207 length=34
|
||||
TTTTTTTTTTTTTTTTTN
|
||||
+
|
||||
HIJJIJJJJIIJJHFDD#
|
||||
@SRR1058032.26_GCCGACCCTTTT_CAACGATTTTATACAATACA HISEQ:653:H12WDADXX:1:1101:5757:2217 length=34
|
||||
AAGCTTTGCTTTTTTTTN
|
||||
+
|
||||
II@A<:33<33,22110#
|
||||
@SRR1058032.27_AATCAAATCACA_GACCACTGAAGCTGGAGAGA HISEQ:653:H12WDADXX:1:1101:5790:2248 length=34
|
||||
TCTTGATCTTCATGGTGN
|
||||
+
|
||||
IIJIEAHCEHHEFECGD#
|
||||
@SRR1058032.28_CGCGCTGTACTA_TTTGTTTTTTGGCATCGTCA HISEQ:653:H12WDADXX:1:1101:6079:2195 length=34
|
||||
TCCAATGCGACGAGTCCN
|
||||
+
|
||||
JIJJJIGGHIDG<GFHG#
|
||||
@SRR1058032.29_AAATACCCAATA_TTTGAGGGAAACTTGACCAA HISEQ:653:H12WDADXX:1:1101:6133:2213 length=34
|
||||
CGGAACAAGTTACCCTAN
|
||||
+
|
||||
JJIJIIIIIIIIIJIJI#
|
||||
@SRR1058032.30_AGCGGGGAGTGT_GTTTTATCGGACACTCTTTC HISEQ:653:H12WDADXX:1:1101:6651:2198 length=34
|
||||
CCTACACGACGTTACACN
|
||||
+
|
||||
IIA:F:CC8F<))1:??#
|
||||
34
src/umi_tools/umi_tools_extract/test_data/script.sh
Executable file
@@ -0,0 +1,34 @@
#!/bin/bash

# Download test data
wget https://github.com/CGATOxford/UMI-tools/raw/master/tests/slim.fastq.gz
wget https://github.com/CGATOxford/UMI-tools/raw/master/tests/scrb_seq_fastq.1.gz
wget https://github.com/CGATOxford/UMI-tools/raw/master/tests/scrb_seq_fastq.2.gz

gunzip -f slim.fastq.gz scrb_seq_fastq.1.gz scrb_seq_fastq.2.gz

# smaller datasets
head -n 120 slim.fastq > slim_30.fastq
head -n 120 scrb_seq_fastq.1 > scrb_seq_fastq.1_30
head -n 120 scrb_seq_fastq.2 > scrb_seq_fastq.2_30
rm slim.fastq scrb_seq_fastq.1 scrb_seq_fastq.2

# Generate expected output
# Test 1 and 2
umi_tools extract \
    --stdin "scrb_seq_fastq.1_30" \
    --read2-in "scrb_seq_fastq.2_30" \
    --bc-pattern "CCCCCCNNNNNNNNNN" \
    --bc-pattern2 "CCCCCCNNNNNNNNNN" \
    --extract-method string \
    --stdout scrb_seq_fastq.1_30.extract \
    --read2-out scrb_seq_fastq.2_30.extract \
    --random-seed 1

# Test 3
umi_tools extract \
    --stdin "slim_30.fastq" \
    --bc-pattern "^(?P<umi_1>.{3}).{4}(?P<umi_2>.{2})" \
    --extract-method regex \
    --stdout slim_30.extract \
    --random-seed 1
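The `head -n 120` calls keep 30 reads per file, because an unwrapped FASTQ record spans four lines (header, sequence, `+` line, qualities). A small sanity check along those lines could look like the sketch below; `count_fastq_records` is a hypothetical helper and it assumes uncompressed files with no line-wrapped sequences.

```shell
#!/bin/bash
# Hypothetical helper: count records in an uncompressed FASTQ file,
# assuming exactly four lines per record.
count_fastq_records() {
  local lines
  lines=$(wc -l < "$1")
  echo $(( lines / 4 ))
}

# Possible usage after subsetting:
#   count_fastq_records slim_30.fastq
```

Run on any of the `*_30` inputs created above, the helper should report 30 if the subsetting worked as intended.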
120
src/umi_tools/umi_tools_extract/test_data/slim_30.extract
Normal file
@@ -0,0 +1,120 @@
@SRR2057595.7_CAGAA
GTTCTCTCGGTGGGACCTC
+
FFFFHHHJJJFGIJIJJIJ
@SRR2057595.9_TTGAA
GTTCTCTGATGCCCTCTTCTGGTGCATCTGAAGACAGCTACAGTGTACTTAGATATAATAAATAAATCTT
+
FDBDFHHIGGEHJGGIHGHGGCAFCHGIGEHIJJJJIJJJIHIIIIIIJIIIIIGHIIGGIJGIIJIIJ@
@SRR2057595.14_TGGAT
GTTAGCGGCCCCGGGTTCCTCCCGGGGCTACGCCTGTCTGAGCGTCGCT
+
FFFFHHHJJIJJJJIGHJJIIJJJJJIJHFHHFFEDEEEEDDDDBDDDD
@SRR2057595.22_ACGAT
GTTAGCGGCCCCGGGTTCCTCCCGGGGCTACGCCTGTCTGAGCGTCGC
+
FFFFHHHJJJJJJJJIJJJJJJJJJJJJHHHFFFEDEEEEDDDDBDDD
@SRR2057595.23_GCGTT
GTTACCTAAGGCGAGCTCAGGGAGGACAGAAACCTCCCGTGGAGCAGAAGGGCAAAAGCTCGCTTGATCT
+
FFFFHHHJJJJJJJJJJJJJJJIJJIIJJJJJJJJJJJJIJJHHHHHFFFFDDDDDDDDDDDDDDDDDDA
@SRR2057595.29_ACGTT
GTTCGCGGCCCCGGGTTCCTCCCGGGGCTACGCCTGTCTGAGCGTCGCTT
+
FFFFHHHJJJJJJJJHIJJJJJJIJJJJHHHFFDEDEEDDCDDDBDDDDD
@SRR2057595.30_GAGAA
GTTGAATCCGTGCTAAGAAGAA
+
DFFFHHHJJJJIJJJJJJJJJJ
@SRR2057595.33_TCGAT
GTTTCTCGTCTGATCTCGGAAGCTAAGCAGGGTCGGGCCTGGTTAGTACTTGGATGGGAGACCGCCTGGG
+
FFFFHHHJJJJJJJJJJJJJJJJJJJJJJJJJDHIJJJJIJJJHGGEEHFFFFFFEDDEDDDDDDDDDDB
@SRR2057595.35_ACGCT
GTTACCCGGGGCTACGCCTGTCTGAGCGTCGCT
+
DFFFHHHJJJJJJIJJJJJJIJJJJJJJHIIJJ
@SRR2057595.38_GGGCC
GTTATGCATGTTTATAGTTTCTAGTTTTGGCATTTTGTGTGGTCTCTTTTTTGTT
+
DFFFHHHJJJJJJJJJJHJJIJJJIJJJJJJJJJJJJGIGHJHIJJIJJJJJJJJ
@SRR2057595.42_TAGGA
GTTGTAAGTTATACACTGACTAAGTCATCTGTTACTGCCTTCACTGAGTTTTTATTTCCTTT
+
DFFFHHHJJJJJJJJJJJJJJJJJIIJJJJGJJJJJJJJJJJJJJJIIHIJJJJJJJIJJJI
@SRR2057595.45_CTGGC
GTTTTGCGGAAGGATCATTA
+
DDDDFFDFFAGFE<EB8?BF
@SRR2057595.46_CAGTT
GTTTTGGCTTTTTTTTAAAACCATTTTGTGAAAGGTTTCTGAAACTTGATAATAAAAAGCAGTTGGTGTA
+
DDDDHHFIGIJJJJJJJIIIJIIIJJJICHHIGIJFHHGHIEIHGHFHEDFFEFEFEEDEDD@CDD<@B:
@SRR2057595.56_GGGCG
GTTTATGAAGAACGCAGCTAGCTGCGAGAATTAATGTGAATTGCAGGACACATTGATCATCGACACTTCG
+
FFFFHHHJJJJJJJJJJJJJJJJJJJIJJJJJIJJJJJJJJJJJJJJJJJJJHHHHHHFFFFFDDDDDDD
@SRR2057595.59_GCGCC
GTTATCCTGTCTTATCATTGTCTTTTGAGCCTGGGCCTTGCCAGGTAGCTCTAGACTGGCCTAGAACTCA
+
FFFFHHHJJJJJJJJJJJJ4CHHJJJJJJJJJJJJJJJJJJJJIJDHIJJJIIJJJJJIJJJJHHHHHHB
@SRR2057595.60_ATGCA
GTTTTCTCGTCTGATCTCGGAAGCTAAGCAGGGCCGGGCCTGGTTAGTACTTGGATGGGAGACCGCC
+
DDFDBBBFECFE@HHIBCBG<2CGEC49?1CBD)86:;AB=7C.=;=)77;A3;?C@;96=?@B8;?
@SRR2057595.61_GAGAG
GTTTCAGGACACATTGATCATCGACACTTCGAACGCACTTGCGGCCCCGGGTTCCTCCCGGGGCTACGCC
+
DFFFHHHGGHGHIIJGEFEGGFH9GGIIFGGGGIFGDHBG@FGGGHEFCCB?@@CDCCD?B7>B@ACB9<
@SRR2057595.65_GCGCG
GTTTGAGCTTGCTCCGTCCACTCAACGCATCGACCTGGTATTGCAGTACCTCCAGGAACGGTGCACCAAG
+
FFFFHHHJJJJJJHJIHHIIIIIIIJHJBHIHBFHHJI@EHJJHHHHHHHFFFBDE?AEBD=AB@CDBD?
@SRR2057595.67_AAGGT
GTTGTTTTGAGGTCCTGCTCGTGCAGGGT
+
DDDFHHHHGFHGFGGIIDGHHIGIJJJJ9
@SRR2057595.69_ATTAT
GGTTTTTGTTTTTCCTCCTTCTCTTTCTAAA
+
FFFFHHHHJJJJJJJJJJJJJJJJJJIJIJJ
@SRR2057595.70_TTAAA
GGTTTTGTAATTTTATGAGGTCCCATTTGTCAATTCTT
+
DDDD2CDFA@FBGHCCHFHGBFHGHIGGDHGHIIFCFF
@SRR2057595.71_TGCCA
GGTTTATTAGCATGGCCCCTGCGCAAGGATGACACGCAAATTCGTGAAGCGTTCCATATTT
+
FFFFHGHHJJJJJJJJJJIIJJIJIJJIFHJIIIJJJIJJJJJJHIIHHHHFFFDEECEEE
@SRR2057595.73_TGACA
GGTTGCGAGTGCCTAGTGGGCCACTTTTGGTAAGCAGAACTGGCGCTGCGGGA
+
FFFFGFFHC@EBHGHGAEGIIHIIIIJJJJGHIIIJIJIIGHIJIJJIGGEFD
@SRR2057595.74_AATTC
GGTTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
FFFFDFFHFIJJJGGGGJJGDDDDDDDDDDDDDBDDDDDDBBDDDDDDDDDDDDDDDDDDDBBBDDDDBD>
@SRR2057595.77_GCGGA
GTTCTCCCACTTCTGAC
+
FFFFDHHHIJJJIJJJJ
@SRR2057595.82_GAGAC
GGTTTTCCTCCCGGGGCTACGCCTGTCTGAGCGTCGCT
+
FFFFHHHHJJJJJJJJJJJJJJJJIJJJJJIJIIJJJH
@SRR2057595.83_TGGAT
GTTGCCCGGGGCTACGCCTGTCTGAGCGTCGCT
+
DFFFHHHJJIJJJJJJIJJJIJJIGGHIFHGEH
@SRR2057595.86_ACCAC
GGTTTTTTTTTAAATGTAAAGCATAAATAAAAAGCCTTTGTGGACTGTGAAAAAAAAAAAAAAAAAAAAAA
+
FFFFHHHHJJJJJJJJIIJJJJJJJJJIJJJJJJJJJJJJGIJIIJJIJJJJJJHFDDDDDDDDDDDDDB>
@SRR2057595.88_TCAGC
GGTTCTAAGCATAGATAACCATATATCAGGGGGAGCTCCATGTTCTAGTCCTGCAAGCGCCTGGGCAATAA
+
FFFFHHHHJJJJJJIJJJJJIJJJJJJIJJIJJIJJJJJJJJJHIJJJJJJIIIHJIHHHFFDDDDEDDD@
@SRR2057595.99_TGACA
GGTTTCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT
+
FFFFDHHHIHIIIJJIJJJJJIGEHGFHIJJGHIHADHIIJIJJJIJG
120
src/umi_tools/umi_tools_extract/test_data/slim_30.fastq
Normal file
@@ -0,0 +1,120 @@
@SRR2057595.7
CAGGTTCAATCTCGGTGGGACCTC
+SRR2057595.7
1=DFFFFHHHHHJJJFGIJIJJIJ
@SRR2057595.9
TTGGTTCAATCTGATGCCCTCTTCTGGTGCATCTGAAGACAGCTACAGTGTACTTAGATATAATAAATAAATCTT
+SRR2057595.9
4=DFDBDHHFHHIGGEHJGGIHGHGGCAFCHGIGEHIJJJJIJJJIHIIIIIIJIIIIIGHIIGGIJGIIJIIJ@
@SRR2057595.14
TGGGTTAATGCGGCCCCGGGTTCCTCCCGGGGCTACGCCTGTCTGAGCGTCGCT
+SRR2057595.14
1=DFFFFHHHHHJJIJJJJIGHJJIIJJJJJIJHFHHFFEDEEEEDDDDBDDDD
@SRR2057595.22
ACGGTTAATGCGGCCCCGGGTTCCTCCCGGGGCTACGCCTGTCTGAGCGTCGC
+SRR2057595.22
1=DFFFFHHHHHJJJJJJJJIJJJJJJJJJJJJHHHFFFEDEEEEDDDDBDDD
@SRR2057595.23
GCGGTTATTCCTAAGGCGAGCTCAGGGAGGACAGAAACCTCCCGTGGAGCAGAAGGGCAAAAGCTCGCTTGATCT
+SRR2057595.23
1=DFFFFHHHHHJJJJJJJJJJJJJJJIJJIIJJJJJJJJJJJJIJJHHHHHFFFFDDDDDDDDDDDDDDDDDDA
@SRR2057595.29
ACGGTTCTTGCGGCCCCGGGTTCCTCCCGGGGCTACGCCTGTCTGAGCGTCGCTT
+SRR2057595.29
1=DFFFFHHHHHJJJJJJJJHIJJJJJJIJJJJHHHFFDEDEEDDCDDDBDDDDD
@SRR2057595.30
GAGGTTGAAAATCCGTGCTAAGAAGAA
+SRR2057595.30
4=DDFFFHHHHHJJJJIJJJJJJJJJJ
@SRR2057595.33
TCGGTTTATCTCGTCTGATCTCGGAAGCTAAGCAGGGTCGGGCCTGGTTAGTACTTGGATGGGAGACCGCCTGGG
+SRR2057595.33
1=DFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJDHIJJJJIJJJHGGEEHFFFFFFEDDEDDDDDDDDDDB
@SRR2057595.35
ACGGTTACTCCCGGGGCTACGCCTGTCTGAGCGTCGCT
+SRR2057595.35
1=DDFFFHHHHHJJJJJJIJJJJJJIJJJJJJJHIIJJ
@SRR2057595.38
GGGGTTACCTGCATGTTTATAGTTTCTAGTTTTGGCATTTTGTGTGGTCTCTTTTTTGTT
+SRR2057595.38
1=DDFFFHHHHHJJJJJJJJJJHJJIJJJIJJJJJJJJJJJJGIGHJHIJJIJJJJJJJJ
@SRR2057595.42
TAGGTTGGATAAGTTATACACTGACTAAGTCATCTGTTACTGCCTTCACTGAGTTTTTATTTCCTTT
+SRR2057595.42
1=DDFFFHHHHHJJJJJJJJJJJJJJJJJIIJJJJGJJJJJJJJJJJJJJJIIHIJJJJJJJIJJJI
@SRR2057595.45
CTGGTTTGCTGCGGAAGGATCATTA
+SRR2057595.45
1:DDDDDDDFFDFFAGFE<EB8?BF
@SRR2057595.46
CAGGTTTTTTGGCTTTTTTTTAAAACCATTTTGTGAAAGGTTTCTGAAACTTGATAATAAAAAGCAGTTGGTGTA
+SRR2057595.46
4=DDDDDHHHHFIGIJJJJJJJIIIJIIIJJJICHHIGIJFHHGHIEIHGHFHEDFFEFEFEEDEDD@CDD<@B:
@SRR2057595.56
GGGGTTTCGATGAAGAACGCAGCTAGCTGCGAGAATTAATGTGAATTGCAGGACACATTGATCATCGACACTTCG
+SRR2057595.56
4=DFFFFHHHHHJJJJJJJJJJJJJJJJJJJIJJJJJIJJJJJJJJJJJJJJJJJJJHHHHHHFFFFFDDDDDDD
@SRR2057595.59
GCGGTTACCTCCTGTCTTATCATTGTCTTTTGAGCCTGGGCCTTGCCAGGTAGCTCTAGACTGGCCTAGAACTCA
+SRR2057595.59
1=DFFFFHHHHHJJJJJJJJJJJJ4CHHJJJJJJJJJJJJJJJJJJJJIJDHIJJJIIJJJJJIJJJJHHHHHHB
@SRR2057595.60
ATGGTTTCATCTCGTCTGATCTCGGAAGCTAAGCAGGGCCGGGCCTGGTTAGTACTTGGATGGGAGACCGCC
+SRR2057595.60
11BDDFDFFBBBFECFE@HHIBCBG<2CGEC49?1CBD)86:;AB=7C.=;=)77;A3;?C@;96=?@B8;?
@SRR2057595.61
GAGGTTTAGCAGGACACATTGATCATCGACACTTCGAACGCACTTGCGGCCCCGGGTTCCTCCCGGGGCTACGCC
+SRR2057595.61
1=DDFFFGHHHHGGHGHIIJGEFEGGFH9GGIIFGGGGIFGDHBG@FGGGHEFCCB?@@CDCCD?B7>B@ACB9<
@SRR2057595.65
GCGGTTTCGGAGCTTGCTCCGTCCACTCAACGCATCGACCTGGTATTGCAGTACCTCCAGGAACGGTGCACCAAG
+SRR2057595.65
1=DFFFFHHHHHJJJJJJHJIHHIIIIIIIJHJBHIHBFHHJI@EHJJHHHHHHHFFFBDE?AEBD=AB@CDBD?
@SRR2057595.67
AAGGTTGGTTTTTGAGGTCCTGCTCGTGCAGGGT
+SRR2057595.67
1:BDDDFHFHHHHGFHGFGGIIDGHHIGIJJJJ9
@SRR2057595.69
ATTGGTTATTTTGTTTTTCCTCCTTCTCTTTCTAAA
+SRR2057595.69
CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJIJIJJ
@SRR2057595.70
TTAGGTTAATTGTAATTTTATGAGGTCCCATTTGTCAATTCTT
+SRR2057595.70
@@@DDDDD+2CDFA@FBGHCCHFHGBFHGHIGGDHGHIIFCFF
@SRR2057595.71
TGCGGTTCATATTAGCATGGCCCCTGCGCAAGGATGACACGCAAATTCGTGAAGCGTTCCATATTT
+SRR2057595.71
CCCFFFFFHHGHHJJJJJJJJJJIIJJIJIJJIFHJIIIJJJIJJJJJJHIIHHHHFFFDEECEEE
@SRR2057595.73
TGAGGTTCAGCGAGTGCCTAGTGGGCCACTTTTGGTAAGCAGAACTGGCGCTGCGGGA
+SRR2057595.73
@@@FFFFFHGFFHC@EBHGHGAEGIIHIIIIJJJJGHIIIJIJIIGHIJIJJIGGEFD
@SRR2057595.74
AATGGTTTCTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+SRR2057595.74
@CCFFFFFGDFFHFIJJJGGGGJJGDDDDDDDDDDDDDBDDDDDDBBDDDDDDDDDDDDDDDDDDDBBBDDDDBD>
@SRR2057595.77
GCGGTTCGATCCCACTTCTGAC
+SRR2057595.77
1=DFFFFHGDHHHIJJJIJJJJ
@SRR2057595.82
GAGGGTTACTTCCTCCCGGGGCTACGCCTGTCTGAGCGTCGCT
+SRR2057595.82
CBCFFFFFHHHHHJJJJJJJJJJJJJJJJIJJJJJIJIIJJJH
@SRR2057595.83
TGGGTTGATCCCGGGGCTACGCCTGTCTGAGCGTCGCT
+SRR2057595.83
1=DDFFFHHHHHJJIJJJJJJIJJJIJJIGGHIFHGEH
@SRR2057595.86
ACCGGTTACTTTTTTTAAATGTAAAGCATAAATAAAAAGCCTTTGTGGACTGTGAAAAAAAAAAAAAAAAAAAAAA
+SRR2057595.86
BCCFFFFFHHHHHJJJJJJJJIIJJJJJJJJJIJJJJJJJJJJJJGIJIIJJIJJJJJJHFDDDDDDDDDDDDDB>
@SRR2057595.88
TCAGGTTGCCTAAGCATAGATAACCATATATCAGGGGGAGCTCCATGTTCTAGTCCTGCAAGCGCCTGGGCAATAA
+SRR2057595.88
CCCFFFFFHHHHHJJJJJJIJJJJJIJJJJJJIJJIJJIJJJJJJJJJHIJJJJJJIIIHJIHHHFFDDDDEDDD@
@SRR2057595.99
TGAGGTTCATCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT
+SRR2057595.99
B@CFFFFFFDHHHIHIIIJJIJJJJJIGEHGFHIJJGHIHADHIIJIJJJIJG
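The fixture `slim_30.fastq` holds 30 reads because each FASTQ record here spans exactly four lines, which is why the generating script uses `head -n 120`. The sketch below shows the same subsetting at record level, assuming unwrapped four-line records; `read_fastq` and `first_records` are hypothetical helper names, not part of any tool in this repository.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, separator, quality) tuples from an iterable of lines.

    Assumes unwrapped FASTQ, i.e. exactly four lines per record.
    """
    stripped = (line.rstrip("\n") for line in lines)
    while True:
        record = tuple(islice(stripped, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        header, seq, sep, qual = record
        if not header.startswith("@") or not sep.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield record


def first_records(lines, n):
    """Return the first n records, the record-level analogue of `head -n 4*n`."""
    return list(islice(read_fastq(lines), n))
```

Unlike a plain `head`, this variant fails loudly when the input is truncated mid-record or when a sequence and its quality string differ in length.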
254
target/executable/agat/agat_convert_sp_gff2gtf/.config.vsh.yaml
Normal file
@@ -0,0 +1,254 @@
name: "agat_convert_sp_gff2gtf"
namespace: "agat"
version: "qualimap"
authors:
- name: "Leïla Paquay"
  roles:
  - "author"
  - "maintainer"
  info:
    links:
      email: "leila@data-intuitive.com"
      github: "Leila011"
      linkedin: "leilapaquay"
    organizations:
    - name: "Data Intuitive"
      href: "https://www.data-intuitive.com"
      role: "Software Developer"
argument_groups:
- name: "Inputs"
  arguments:
  - type: "file"
    name: "--gff"
    alternatives:
    - "-i"
    description: "Input GFF/GTF file that will be read"
    info: null
    example:
    - "input.gff"
    must_exist: true
    create_parent: true
    required: true
    direction: "input"
    multiple: false
    multiple_sep: ";"
- name: "Outputs"
  arguments:
  - type: "file"
    name: "--output"
    alternatives:
    - "-o"
    - "--out"
    - "--outfile"
    - "--gtf"
    description: "Output GTF file. If no output file is specified, the output will\
      \ be written to STDOUT."
    info: null
    example:
    - "output.gtf"
    must_exist: true
    create_parent: true
    required: true
    direction: "output"
    multiple: false
    multiple_sep: ";"
- name: "Arguments"
  arguments:
  - type: "string"
    name: "--gtf_version"
    description: "Version of the GTF output (1,2,2.1,2.2,2.5,3 or relax). Default\
      \ value from AGAT config file (relax for the default config). The script option\
      \ has the higher priority. \n\n * relax: all feature types are accepted. \
      \ \n * GTF3 (9 feature types accepted): gene, transcript, exon, CDS, Selenocysteine,\
      \ start_codon, stop_codon, three_prime_utr and five_prime_utr. \n * GTF2.5\
      \ (8 feature types accepted): gene, transcript, exon, CDS, UTR, start_codon,\
      \ stop_codon, Selenocysteine. \n * GTF2.2 (9 feature types accepted): CDS,\
      \ start_codon, stop_codon, 5UTR, 3UTR, inter, inter_CNS, intron_CNS and exon.\
      \ \n * GTF2.1 (6 feature types accepted): CDS, start_codon, stop_codon, exon,\
      \ 5UTR, 3UTR. \n * GTF2 (4 feature types accepted): CDS, start_codon, stop_codon,\
      \ exon. \n * GTF1 (5 feature types accepted): CDS, start_codon, stop_codon,\
      \ exon, intron. \n"
    info: null
    example:
    - "3"
    required: false
    choices:
    - "relax"
    - "1"
    - "2"
    - "2.1"
    - "2.2"
    - "2.5"
    - "3"
    direction: "input"
    multiple: false
    multiple_sep: ";"
  - type: "file"
    name: "--config"
    alternatives:
    - "-c"
    description: "Input agat config file. By default AGAT takes as input agat_config.yaml\
      \ file from the working directory if any, otherwise it takes the orignal agat_config.yaml\
      \ shipped with AGAT. To get the agat_config.yaml locally type: \"agat config\
      \ --expose\". The --config option gives you the possibility to use your own\
      \ AGAT config file (located elsewhere or named differently).\n"
    info: null
    example:
    - "custom_agat_config.yaml"
    must_exist: true
    create_parent: true
    required: false
    direction: "input"
    multiple: false
    multiple_sep: ";"
resources:
- type: "bash_script"
  path: "script.sh"
  is_executable: true
description: "The script aims to convert any GTF/GFF file into a proper GTF file.\
  \ Full\ninformation about the format can be found here:\nhttps://agat.readthedocs.io/en/latest/gxf.html\
  \ You can choose among 7\ndifferent GTF types (1, 2, 2.1, 2.2, 2.5, 3 or relax).\
  \ Depending the\nversion selected the script will filter out the features that are\
  \ not\naccepted. For GTF2.5 and 3, every level1 feature (e.g nc_gene\npseudogene)\
  \ will be converted into gene feature and every level2 feature\n(e.g mRNA ncRNA)\
  \ will be converted into transcript feature. Using the\n\"relax\" option you will\
  \ produce a GTF-like output keeping all original\nfeature types (3rd column). No\
  \ modification will occur e.g. mRNA to\ntranscript.\n\nTo be fully GTF compliant\
  \ all feature have a gene_id and a transcript_id\nattribute. The gene_id is unique\
  \ identifier for the genomic source of\nthe transcript, which is used to group transcripts\
  \ into genes. The\ntranscript_id is a unique identifier for the predicted transcript,\
  \ which\nis used to group features into transcripts.\n"
test_resources:
- type: "bash_script"
  path: "test.sh"
  is_executable: true
- type: "file"
  path: "test_data"
info: null
status: "enabled"
requirements:
  commands:
  - "ps"
keywords:
- "gene annotations"
- "GTF conversion"
license: "GPL-3.0"
references:
  doi:
  - "10.5281/zenodo.3552717"
links:
  repository: "https://github.com/NBISweden/AGAT"
  homepage: "https://github.com/NBISweden/AGAT"
  documentation: "https://agat.readthedocs.io/"
  issue_tracker: "https://github.com/NBISweden/AGAT/issues"
runners:
- type: "executable"
  id: "executable"
  docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
  id: "nextflow"
  directives:
    tag: "$id"
  auto:
    simplifyInput: true
    simplifyOutput: false
    transcript: false
    publish: false
  config:
    labels:
      mem1gb: "memory = 1000000000.B"
      mem2gb: "memory = 2000000000.B"
      mem5gb: "memory = 5000000000.B"
      mem10gb: "memory = 10000000000.B"
      mem20gb: "memory = 20000000000.B"
      mem50gb: "memory = 50000000000.B"
      mem100gb: "memory = 100000000000.B"
      mem200gb: "memory = 200000000000.B"
      mem500gb: "memory = 500000000000.B"
      mem1tb: "memory = 1000000000000.B"
      mem2tb: "memory = 2000000000000.B"
      mem5tb: "memory = 5000000000000.B"
      mem10tb: "memory = 10000000000000.B"
      mem20tb: "memory = 20000000000000.B"
      mem50tb: "memory = 50000000000000.B"
      mem100tb: "memory = 100000000000000.B"
      mem200tb: "memory = 200000000000000.B"
      mem500tb: "memory = 500000000000000.B"
      mem1gib: "memory = 1073741824.B"
      mem2gib: "memory = 2147483648.B"
      mem4gib: "memory = 4294967296.B"
      mem8gib: "memory = 8589934592.B"
      mem16gib: "memory = 17179869184.B"
      mem32gib: "memory = 34359738368.B"
      mem64gib: "memory = 68719476736.B"
      mem128gib: "memory = 137438953472.B"
      mem256gib: "memory = 274877906944.B"
      mem512gib: "memory = 549755813888.B"
      mem1tib: "memory = 1099511627776.B"
      mem2tib: "memory = 2199023255552.B"
      mem4tib: "memory = 4398046511104.B"
      mem8tib: "memory = 8796093022208.B"
      mem16tib: "memory = 17592186044416.B"
      mem32tib: "memory = 35184372088832.B"
      mem64tib: "memory = 70368744177664.B"
      mem128tib: "memory = 140737488355328.B"
      mem256tib: "memory = 281474976710656.B"
      mem512tib: "memory = 562949953421312.B"
      cpu1: "cpus = 1"
      cpu2: "cpus = 2"
      cpu5: "cpus = 5"
      cpu10: "cpus = 10"
      cpu20: "cpus = 20"
      cpu50: "cpus = 50"
      cpu100: "cpus = 100"
      cpu200: "cpus = 200"
      cpu500: "cpus = 500"
      cpu1000: "cpus = 1000"
  debug: false
  container: "docker"
engines:
- type: "docker"
  id: "docker"
  image: "quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0"
  target_registry: "images.viash-hub.com"
  target_tag: "qualimap"
  namespace_separator: "/"
  setup:
  - type: "docker"
    run:
    - "agat --version | sed 's/AGAT\\s\\(.*\\)/agat: \"\\1\"/' > /var/software_versions.txt\n"
  entrypoint: []
  cmd: null
- type: "native"
  id: "native"
build_info:
  config: "src/agat/agat_convert_sp_gff2gtf/config.vsh.yaml"
  runner: "executable"
  engine: "docker|native"
  output: "target/executable/agat/agat_convert_sp_gff2gtf"
  executable: "target/executable/agat/agat_convert_sp_gff2gtf/agat_convert_sp_gff2gtf"
  viash_version: "0.9.0-RC6"
  git_commit: "28cd12293505544b3e09ff6343e4724dedb772d3"
  git_remote: "https://github.com/viash-hub/biobox"
package_config:
  name: "biobox"
  version: "qualimap"
  description: "A collection of bioinformatics tools for working with sequence data.\n"
  info: null
  viash_version: "0.9.0-RC6"
  source: "src"
  target: "target"
  config_mods:
  - ".requirements.commands := ['ps']\n"
  - ".engines += { type: \"native\" }"
  - ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
  - ".engines[.type == 'docker'].target_tag := 'qualimap'"
  keywords:
  - "bioinformatics"
  - "modules"
  - "sequencing"
  license: "MIT"
  organization: "vsh"
  links:
    repository: "https://github.com/viash-hub/biobox"
    issue_tracker: "https://github.com/viash-hub/biobox/issues"
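The `--gtf_version` description in this config lists which feature types each GTF version accepts. The sketch below transcribes that list into a lookup table and a filter predicate. It is an illustration of the documented behaviour only, not AGAT's implementation: the names `ACCEPTED_FEATURES` and `is_accepted` are hypothetical, matching is assumed to be exact and case-sensitive, and the conversion of level1 and level2 features to `gene` and `transcript` mentioned in the component description is not modelled.

```python
# Accepted feature types per GTF version, transcribed from the
# --gtf_version description. "relax" accepts everything and has no entry.
ACCEPTED_FEATURES = {
    "3": {"gene", "transcript", "exon", "CDS", "Selenocysteine",
          "start_codon", "stop_codon", "three_prime_utr", "five_prime_utr"},
    "2.5": {"gene", "transcript", "exon", "CDS", "UTR",
            "start_codon", "stop_codon", "Selenocysteine"},
    "2.2": {"CDS", "start_codon", "stop_codon", "5UTR", "3UTR",
            "inter", "inter_CNS", "intron_CNS", "exon"},
    "2.1": {"CDS", "start_codon", "stop_codon", "exon", "5UTR", "3UTR"},
    "2": {"CDS", "start_codon", "stop_codon", "exon"},
    "1": {"CDS", "start_codon", "stop_codon", "exon", "intron"},
}


def is_accepted(feature_type, gtf_version="relax"):
    """Return True if feature_type would be kept for the given GTF version.

    Assumption: exact, case-sensitive comparison against the documented list.
    """
    if gtf_version == "relax":
        return True
    if gtf_version not in ACCEPTED_FEATURES:
        raise ValueError(f"unknown GTF version: {gtf_version!r}")
    return feature_type in ACCEPTED_FEATURES[gtf_version]
```

A table keyed by the same strings as the `choices` list keeps the sketch aligned with the config: every choice other than `relax` has exactly one entry.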
1188
target/executable/agat/agat_convert_sp_gff2gtf/agat_convert_sp_gff2gtf
Executable file
File diff suppressed because it is too large
@@ -1,5 +1,23 @@
name: "arriba"
version: "qualimap"
authors:
- name: "Robrecht Cannoodt"
  roles:
  - "author"
  - "maintainer"
  info:
    links:
      email: "robrecht@data-intuitive.com"
      github: "rcannood"
      orcid: "0000-0003-3641-729X"
      linkedin: "robrechtcannoodt"
    organizations:
    - name: "Data Intuitive"
      href: "https://www.data-intuitive.com"
      role: "Data Science Engineer"
    - name: "Open Problems"
      href: "https://openproblems.bio"
      role: "Core Member"
argument_groups:
- name: "Inputs"
  arguments:
@@ -688,7 +706,7 @@ build_info:
  output: "target/executable/arriba"
  executable: "target/executable/arriba/arriba"
  viash_version: "0.9.0-RC6"
  git_commit: "e6420cd80f226128b7223ff79ce1297f99993657"
  git_commit: "28cd12293505544b3e09ff6343e4724dedb772d3"
  git_remote: "https://github.com/viash-hub/biobox"
package_config:
  name: "biobox"
@@ -10,6 +10,9 @@
# authors of this component should specify the license in the header of such
# files, or include a separate license file detailing the licenses of all included
# files.
#
# Component authors:
# * Robrecht Cannoodt (author, maintainer)

set -e

@@ -748,10 +751,11 @@ FROM quay.io/biocontainers/arriba:2.4.0--h0033a41_2
ENTRYPOINT []
RUN arriba -h | grep 'Version:' 2>&1 | sed 's/Version:\s\(.*\)/arriba: "\1"/' > /var/software_versions.txt

LABEL org.opencontainers.image.authors="Robrecht Cannoodt"
LABEL org.opencontainers.image.description="Companion container for running component arriba"
LABEL org.opencontainers.image.created="2024-07-29T14:42:19Z"
LABEL org.opencontainers.image.created="2024-07-29T14:45:24Z"
LABEL org.opencontainers.image.source="https://github.com/suhrig/arriba"
LABEL org.opencontainers.image.revision="e6420cd80f226128b7223ff79ce1297f99993657"
LABEL org.opencontainers.image.revision="28cd12293505544b3e09ff6343e4724dedb772d3"
LABEL org.opencontainers.image.version="qualimap"

VIASHDOCKER
@@ -1,5 +1,30 @@
name: "bcl_convert"
version: "qualimap"
authors:
- name: "Toni Verbeiren"
  roles:
  - "author"
  - "maintainer"
  info:
    links:
      github: "tverbeiren"
      linkedin: "verbeiren"
    organizations:
    - name: "Data Intuitive"
      href: "https://www.data-intuitive.com"
      role: "Data Scientist and CEO"
- name: "Dorien Roosen"
  roles:
  - "author"
  info:
    links:
      email: "dorien@data-intuitive.com"
      github: "dorien-er"
      linkedin: "dorien-roosen"
    organizations:
    - name: "Data Intuitive"
      href: "https://www.data-intuitive.com"
      role: "Data Scientist"
argument_groups:
- name: "Input arguments"
  arguments:
@@ -281,9 +306,16 @@ status: "enabled"
requirements:
  commands:
  - "ps"
license: "MIT"
keywords:
- "demultiplex"
- "fastq"
- "bcl"
- "illumina"
license: "Proprietary"
links:
  repository: "https://github.com/viash-hub/biobox"
  homepage: "https://support.illumina.com/sequencing/sequencing_software/bcl-convert.html"
  documentation: "https://support.illumina.com/downloads/bcl-convert-user-guide.html"
runners:
- type: "executable"
  id: "executable"
@@ -386,7 +418,7 @@ build_info:
  output: "target/executable/bcl_convert"
  executable: "target/executable/bcl_convert/bcl_convert"
  viash_version: "0.9.0-RC6"
  git_commit: "e6420cd80f226128b7223ff79ce1297f99993657"
  git_commit: "28cd12293505544b3e09ff6343e4724dedb772d3"
  git_remote: "https://github.com/viash-hub/biobox"
package_config:
  name: "biobox"
@@ -10,6 +10,10 @@
# authors of this component should specify the license in the header of such
# files, or include a separate license file detailing the licenses of all included
# files.
#
# Component authors:
# * Toni Verbeiren (author, maintainer)
# * Dorien Roosen (author)

set -e

@@ -592,10 +596,11 @@ rm /tmp/bcl-convert.rpm

RUN echo "bcl-convert: \"$(bcl-convert -V 2>&1 >/dev/null | sed -n '/Version/ s/^bcl-convert\ Version //p')\"" > /var/software_versions.txt

LABEL org.opencontainers.image.authors="Toni Verbeiren, Dorien Roosen"
LABEL org.opencontainers.image.description="Companion container for running component bcl_convert"
LABEL org.opencontainers.image.created="2024-07-29T14:42:19Z"
LABEL org.opencontainers.image.created="2024-07-29T14:45:25Z"
LABEL org.opencontainers.image.source="https://github.com/viash-hub/biobox"
LABEL org.opencontainers.image.revision="e6420cd80f226128b7223ff79ce1297f99993657"
LABEL org.opencontainers.image.revision="28cd12293505544b3e09ff6343e4724dedb772d3"
LABEL org.opencontainers.image.version="qualimap"

VIASHDOCKER
Some files were not shown because too many files have changed in this diff