Build branch minor_improvements with version minor_improvements (ea649a6)

Build pipeline: viash-hub.craftbox.minor-improvements-2sqzv

Source commit: ea649a62ec

Source message: add documentation and bump viash
CI
2025-04-08 07:40:46 +00:00
commit a826ba63bf
49 changed files with 25227 additions and 0 deletions

CHANGELOG.md Normal file

@@ -0,0 +1,30 @@
# craftbox 0.2.0
## NEW FEATURES
* `sync_resources`: Sync a Viash package's test resources to the local filesystem (PR #7).
## MINOR CHANGES
* Add documentation to multiple components (PR #9).
* Bump Viash to 0.9.3 (PR #9).
## BUG FIXES
* `untar`: Fix usage of a deprecated environment variable (PR #8).
# craftbox 0.1.0
## NEW FEATURES
* `concat_text`: Concatenate a number of text files.
* `csv2fasta`: Convert two columns from a CSV file to FASTA entries (PR #1).
* `untar`: Unpack a .tar file. When the .tar file contains just a single directory,
  put the contents of that directory into the output folder instead of the directory itself (PR #3).
## MINOR CHANGES
* Bump Viash to 0.9.0.

CONTRIBUTING.md Normal file

@@ -0,0 +1,383 @@
# Contributing guidelines
We encourage contributions from the community. To contribute:
1. **Fork the Repository**: Start by forking this repository to your account.
2. **Develop Your Component**: Create your Viash component, ensuring it aligns with our best practices (detailed below).
3. **Submit a Pull Request**: After testing your component, submit a pull request for review.
## Procedure of adding a component
### Step 1: Find a component to contribute
* Find a tool to contribute to this repo.
* Check whether it is already in the [Project board](https://github.com/orgs/viash-hub/projects/1).
* Check whether there is a corresponding [Snakemake wrapper](https://github.com/snakemake/snakemake-wrappers/blob/master/bio) or [nf-core module](https://github.com/nf-core/modules/tree/master/modules/nf-core) which we can use as inspiration.
* Create an issue to show that you are working on this component.
### Step 2: Add config template
Create a file at `src/xxx/config.vsh.yaml` with the following contents, replacing every occurrence of `xxx` with the name of the component and filling in the `yyy` placeholders:
```yaml
name: xxx
description: xxx
keywords: [tag1, tag2]
links:
  homepage: yyy
  documentation: yyy
  issue_tracker: yyy
  repository: yyy
references:
  doi: 12345/12345678.yz
license: MIT/Apache-2.0/GPL-3.0/...
argument_groups:
  - name: Inputs
    arguments: <...>
  - name: Outputs
    arguments: <...>
  - name: Arguments
    arguments: <...>
resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: test_data
engines:
  - <...>
runners:
  - type: executable
  - type: nextflow
```
### Step 3: Fill in the metadata
Fill in the relevant metadata fields in the config. Here is an example of the metadata of an existing component.
```yaml
name: arriba
description: Detect gene fusions from RNA-Seq data
keywords: [Gene fusion, RNA-Seq]
links:
  homepage: https://arriba.readthedocs.io/en/latest/
  documentation: https://arriba.readthedocs.io/en/latest/
  repository: https://github.com/suhrig/arriba
  issue_tracker: https://github.com/suhrig/arriba/issues
references:
  doi: 10.1101/gr.257246.119
  bibtex: |
    @article{
      ... a bibtex entry in case the doi is not available ...
    }
license: MIT
```
### Step 4: Find a suitable container
Google `biocontainer <name of component>` and find the container that is most suitable. Typically the link will be `https://quay.io/repository/biocontainers/xxx?tab=tags`.
If no such container is found, you can create a custom container in the next step.
### Step 5: Create help file
To help develop the component, we store the `--help` output of the tool in a file at `src/xxx/help.txt`.
````bash
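# Quote 'EOF' so the backticks are written literally instead of being run as a command substitution
cat <<'EOF' > src/xxx/help.txt
```sh
xxx --help
```
EOF
docker run --rm quay.io/biocontainers/xxx:tag xxx --help >> src/xxx/help.txt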
````
Notes:
* This help file has no functional purpose, but it is useful for the developer to see the help output of the tool.
* Some tools might not have a `--help` argument but instead have a `-h` argument. For example, for `arriba`, the help message is obtained by running `arriba -h`:
```bash
docker run quay.io/biocontainers/arriba:2.4.0--h0033a41_2 arriba -h
```
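If you prefer to script this step, the same `help.txt` layout can be produced from Python, including the fallback from `--help` to `-h` mentioned above. This is a minimal sketch, assuming Docker is installed and that a tool signals an unknown flag with a non-zero exit code; `capture_help` is a hypothetical helper, and its `run` parameter only exists so the logic can be tried without Docker:

```python
import subprocess


def capture_help(image, tool, run=subprocess.run):
    """Return help.txt content: a fenced header with the command, then its output.

    Tries `--help` first and falls back to `-h` (as needed for e.g. arriba).
    If both fail, the output of the last attempt is kept so it can be inspected.
    """
    for flag in ("--help", "-h"):
        result = run(
            ["docker", "run", "--rm", image, tool, flag],
            capture_output=True,
            text=True,
        )
        # Many tools print their help to stderr, so keep both streams.
        output = (result.stdout or "") + (result.stderr or "")
        if result.returncode == 0 and output.strip():
            break
    return f"```sh\n{tool} {flag}\n```\n{output}"
```

For example, `Path("src/xxx/help.txt").write_text(capture_help("quay.io/biocontainers/xxx:tag", "xxx"))` would write the whole file in one go.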
### Step 6: Create or fetch test data
To help develop the component, it is useful to have some test data available. In most cases, we can use the test data from the Snakemake wrappers.
To make sure we can reproduce the test data in the future, we store the command to fetch the test data in a file at `src/xxx/test_data/script.sh`.
```bash
mkdir -p src/xxx/test_data
cat <<'EOF' > src/xxx/test_data/script.sh
#!/bin/bash
# clone repo
if [ ! -d /tmp/snakemake-wrappers ]; then
  git clone --depth 1 --single-branch --branch master https://github.com/snakemake/snakemake-wrappers /tmp/snakemake-wrappers
fi
# copy test data
cp -r /tmp/snakemake-wrappers/bio/xxx/test/* src/xxx/test_data
EOF
```
The test data should be suitable for testing this component. Ensure that the test data is small enough: ideally <1KB, preferably <10KB, if need be <100KB.
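To check the size guideline before committing, a short script can sort every file in the test data directory into the tiers above. A minimal sketch; the tier labels and `classify_test_data` are illustrative, and treating 1 KB as 1024 bytes is an assumption:

```python
from __future__ import annotations

from pathlib import Path

# Tiers from the guideline, treating 1 KB as 1024 bytes (an assumption).
TIERS = [(1 * 1024, "ideal"), (10 * 1024, "preferred"), (100 * 1024, "acceptable")]


def classify_test_data(directory: str | Path) -> dict[str, str]:
    """Map every file below `directory` (recursively) to a size tier."""
    root = Path(directory)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            # First tier whose limit the file stays under; otherwise it is too large.
            result[path.relative_to(root).as_posix()] = next(
                (label for limit, label in TIERS if size < limit), "too large"
            )
    return result
```

Running `classify_test_data("src/xxx/test_data")` and looking for `"too large"` entries is enough to spot files that should be trimmed.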
### Step 7: Add arguments for the input files
Based on the help file, add the input arguments to the config file. For instance, the [arriba help file](src/arriba/help.txt) contains the following:

    Usage: arriba [-c Chimeric.out.sam] -x Aligned.out.bam \
           -g annotation.gtf -a assembly.fa [-b blacklists.tsv] [-k known_fusions.tsv] \
           [-t tags.tsv] [-p protein_domains.gff3] [-d structural_variants_from_WGS.tsv] \
           -o fusions.tsv [-O fusions.discarded.tsv] \
           [OPTIONS]

    -x FILE  File in SAM/BAM/CRAM format with main alignments as generated by STAR
             (Aligned.out.sam). Arriba extracts candidate reads from this file.

Based on this information, we can add the following input arguments to the config file.
```yaml
argument_groups:
  - name: Inputs
    arguments:
      - name: --bam
        alternatives: -x
        type: file
        description: |
          File in SAM/BAM/CRAM format with main alignments as generated by STAR
          (Aligned.out.sam). Arriba extracts candidate reads from this file.
        required: true
        example: Aligned.out.bam
```
Check the [documentation](https://viash.io/reference/config/functionality/arguments) for more information on the format of input arguments.
Several notes:
* Argument names should be formatted in `--snake_case`. This means arguments like `--foo-bar` should be formatted as `--foo_bar`, and short arguments like `-f` should receive a longer name like `--foo`.
* Input arguments can have `multiple: true` to allow the user to specify multiple files.
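The naming rule can be applied mechanically. Below is a minimal sketch of a hypothetical helper that turns a tool's flag into a `--snake_case` argument name. Short flags need a descriptive long name chosen by you (the original short flag can stay in `alternatives`, as in the `--bam` / `-x` example), and splitting camelCase flags is an extra assumption about what some tools use:

```python
from __future__ import annotations

import re

_SNAKE = re.compile(r"--[a-z0-9]+(?:_[a-z0-9]+)*")


def to_snake_case_argument(flag: str, long_name: str | None = None) -> str:
    """Convert a tool flag into a `--snake_case` argument name."""
    if re.fullmatch(r"-[A-Za-z]", flag):
        # A short flag carries no descriptive name, so one must be supplied.
        if long_name is None:
            raise ValueError(f"short flag {flag!r} needs a descriptive long name")
        flag = "--" + long_name
    # Split camelCase boundaries, then turn hyphens into underscores.
    body = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", flag.lstrip("-"))
    name = "--" + body.replace("-", "_").lower()
    if not _SNAKE.fullmatch(name):
        raise ValueError(f"cannot turn {flag!r} into a snake_case argument name")
    return name
```

For example, `to_snake_case_argument("--foo-bar")` gives `--foo_bar`, and `to_snake_case_argument("-x", long_name="bam")` gives `--bam`.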
### Step 8: Add arguments for the output files
Next, add the output arguments to the config file, again based on the help file. For example, the [arriba help file](src/arriba/help.txt) contains the following:

    Usage: arriba [-c Chimeric.out.sam] -x Aligned.out.bam \
           -g annotation.gtf -a assembly.fa [-b blacklists.tsv] [-k known_fusions.tsv] \
           [-t tags.tsv] [-p protein_domains.gff3] [-d structural_variants_from_WGS.tsv] \
           -o fusions.tsv [-O fusions.discarded.tsv] \
           [OPTIONS]

    -o FILE  Output file with fusions that have passed all filters.
    -O FILE  Output file with fusions that were discarded due to filtering.

Based on this information, we can add the following output arguments to the config file.
```yaml
argument_groups:
  - name: Outputs
    arguments:
      - name: --fusions
        alternatives: -o
        type: file
        direction: output
        description: |
          Output file with fusions that have passed all filters.
        required: true
        example: fusions.tsv
      - name: --fusions_discarded
        alternatives: -O
        type: file
        direction: output
        description: |
          Output file with fusions that were discarded due to filtering.
        required: false
        example: fusions.discarded.tsv
```
Note:
* Preferably, outputs should be files rather than directories. For example, if a tool outputs a directory `foo/` containing files `foo/bar.txt` and `foo/baz.txt`, there should be two output arguments `--bar` and `--baz` (as opposed to one output argument which outputs the whole `foo/` directory).
### Step 9: Add arguments for the other arguments
Finally, add all other arguments to the config file. There are a few exceptions:
* Arguments related to specifying CPU and memory requirements are handled separately and should not be added to the config file.
* Arguments that only print information, such as the version (`-v`, `--version`) or the help (`-h`, `--help`), should not be added to the config file.
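For tools with long help files, it can help to pre-sort the flags before writing the config. The sketch below is illustrative only: the exclusion patterns are assumptions and must be adapted per tool, since, for example, the flag for the number of threads differs between tools and `-v` sometimes means "verbose" rather than "version":

```python
import re

# Hypothetical patterns; adapt them to the flags of the tool at hand.
EXCLUDED = {
    "help/version": re.compile(r"^(-h|--help|-v|--version)$"),
    "resources": re.compile(r"^--?(threads|cores|cpus|memory|mem)$", re.IGNORECASE),
}


def triage_flags(flags):
    """Split flags into (keep, excluded), where excluded maps flag -> reason."""
    keep, excluded = [], {}
    for flag in flags:
        reason = next((why for why, pattern in EXCLUDED.items() if pattern.match(flag)), None)
        if reason is None:
            keep.append(flag)
        else:
            excluded[flag] = reason
    return keep, excluded
```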
### Step 10: Add a Docker engine
To ensure reproducibility of components, we require that all components are run in a Docker container.
```yaml
engines:
  - type: docker
    image: quay.io/biocontainers/xxx:0.1.0--py_0
```
The container should have your tool installed, as well as `ps` (the `config_mods` in `_viash.yaml` adds `ps` to the required commands of every component).
If you didn't find a suitable container in the previous step, you can create a custom container. For example:
```yaml
engines:
  - type: docker
    image: python:3.10
    setup:
      - type: python
        packages: numpy
```
For more information on how to do this, see the [documentation](https://viash.io/guide/component/add-dependencies.html#steps-for-creating-a-custom-docker-platform).
Here is a list of base containers we can recommend:
* Bash: [`bash`](https://hub.docker.com/_/bash), [`ubuntu`](https://hub.docker.com/_/ubuntu)
* C#: [`ghcr.io/data-intuitive/dotnet-script`](https://github.com/data-intuitive/ghcr-dotnet-script/pkgs/container/dotnet-script)
* JavaScript: [`node`](https://hub.docker.com/_/node)
* Python: [`python`](https://hub.docker.com/_/python), [`nvcr.io/nvidia/pytorch`](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)
* R: [`eddelbuettel/r2u`](https://hub.docker.com/r/eddelbuettel/r2u), [`rocker/tidyverse`](https://hub.docker.com/r/rocker/tidyverse)
* Scala: [`sbtscala/scala-sbt`](https://hub.docker.com/r/sbtscala/scala-sbt)
### Step 11: Write a runner script
Next, we need to write a runner script that runs the tool with the input arguments. Create a Bash script named `src/xxx/script.sh` which runs the tool with the input arguments.
```bash
#!/bin/bash
## VIASH START
## VIASH END
xxx \
--input "$par_input" \
--output "$par_output" \
$([ "$par_option" = "true" ] && echo "--option")
```
When building a Viash component, Viash automatically replaces the `## VIASH START` and `## VIASH END` lines (and anything in between) with code that defines variables based on the arguments specified in the config, such as `par_input` for an argument named `--input`.
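To make the substitution concrete, here is a simplified, hypothetical model of it in Python: the block between the markers is swapped for `par_*` definitions, booleans become `true`/`false`, and unset arguments are left undefined. The code Viash actually generates is more involved, so read this as an illustration of the idea rather than of Viash's implementation:

```python
import re
import shlex

_BLOCK = re.compile(r"^## VIASH START$.*?^## VIASH END$", re.MULTILINE | re.DOTALL)


def _to_bash(value):
    # Booleans become the strings "true"/"false", which is why runner
    # scripts test `[ "$par_option" = "true" ]`.
    if isinstance(value, bool):
        return "true" if value else "false"
    return shlex.quote(str(value))


def fill_viash_block(script, par):
    """Swap the VIASH START/END block for `par_*` definitions (illustration only)."""
    definitions = "\n".join(
        f"par_{name}={_to_bash(value)}" for name, value in par.items() if value is not None
    )
    # A function as replacement avoids backslash escapes being interpreted.
    new_script, count = _BLOCK.subn(lambda _match: definitions, script, count=1)
    if count == 0:
        raise ValueError("no '## VIASH START' / '## VIASH END' block found")
    return new_script
```

Leaving unset arguments undefined is what makes `${par_x:+...}` expand to nothing for optional arguments that were not given.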
As an example, this is what the Bash script for the `arriba` component looks like:
```bash
#!/bin/bash
## VIASH START
## VIASH END
arriba \
-x "$par_bam" \
-a "$par_genome" \
-g "$par_gene_annotation" \
-o "$par_fusions" \
${par_known_fusions:+-k "${par_known_fusions}"} \
${par_blacklist:+-b "${par_blacklist}"} \
${par_structural_variants:+-d "${par_structural_variants}"} \
$([ "$par_skip_duplicate_marking" = "true" ] && echo "-u") \
$([ "$par_extra_information" = "true" ] && echo "-X") \
$([ "$par_fill_gaps" = "true" ] && echo "-I")
```
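The two Bash idioms in this script are `${par_x:+-k "$par_x"}`, which expands to the option and its value only when `par_x` is set and non-empty, and `$([ "$par_flag" = "true" ] && echo "-X")`, which emits a switch only for a true boolean. A hypothetical Python equivalent that builds the same argument list may make the logic easier to follow; the function and its mapping parameters are illustrative and not part of any component:

```python
from __future__ import annotations


def build_command(executable: str, par: dict, required: dict, optional: dict,
                  switches: dict) -> list[str]:
    """Build an argv list the way the Bash runner script does.

    required: option -> parameter name, always passed.
    optional: option -> parameter name, passed only if set and non-empty.
    switches: option -> boolean parameter name, passed only if true.
    """
    cmd = [executable]
    for option, name in required.items():
        cmd += [option, str(par[name])]
    for option, name in optional.items():
        if par.get(name) not in (None, ""):  # like ${par_x:+-k "$par_x"}
            cmd += [option, str(par[name])]
    for option, name in switches.items():
        if par.get(name) is True:  # like $([ "$par_x" = "true" ] && echo "-X")
            cmd.append(option)
    return cmd
```

Passing the resulting list to `subprocess.run(cmd, check=True)` avoids shell word splitting altogether, which is the main advantage of building an argument list instead of a command string.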
### Step 12: Create test script
If the unit test requires test resources, these should be provided in the `test_resources` section of the component.
```yaml
# ...
test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: test_data
```
Create a test script at `src/xxx/test.sh` that runs the component with the test data. This script should run the component (available with `$meta_executable`) with the test data and check if the output is as expected. The script should exit with a non-zero exit code if the output is not as expected. For example:
```bash
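#!/bin/bash
## VIASH START
## VIASH END
set -e
echo "> Run xxx with test data"
"$meta_executable" \
  --input "$meta_resources_dir/test_data/input.txt" \
  --output "output.txt" \
  --option
echo ">> Checking output"
[ ! -f "output.txt" ] && echo "Output file output.txt does not exist" && exit 1
# End with a command that succeeds: otherwise the failed `[` test above
# becomes the exit code of the script, even when the output exists.
echo "> Test succeeded"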
```
For example, this is what the test script for the `arriba` component looks like:
```bash
#!/bin/bash
## VIASH START
## VIASH END
echo "> Run arriba with blacklist"
"$meta_executable" \
--bam "$meta_resources_dir/test_data/A.bam" \
--genome "$meta_resources_dir/test_data/genome.fasta" \
--gene_annotation "$meta_resources_dir/test_data/annotation.gtf" \
--blacklist "$meta_resources_dir/test_data/blacklist.tsv" \
--fusions "fusions.tsv" \
--fusions_discarded "fusions_discarded.tsv" \
--interesting_contigs "1,2"
echo ">> Checking output"
[ ! -f "fusions.tsv" ] && echo "Output file fusions.tsv does not exist" && exit 1
[ ! -f "fusions_discarded.tsv" ] && echo "Output file fusions_discarded.tsv does not exist" && exit 1
echo ">> Check if output is empty"
[ ! -s "fusions.tsv" ] && echo "Output file fusions.tsv is empty" && exit 1
[ ! -s "fusions_discarded.tsv" ] && echo "Output file fusions_discarded.tsv is empty" && exit 1
# Exit with status 0 when all checks pass
echo "> Test succeeded"
```
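Components whose tests are written in Python (like `csv2fasta`, which uses pytest) can express the same two checks, existence and non-emptiness, in a small helper. A minimal sketch; `check_outputs` is a hypothetical name:

```python
from __future__ import annotations

from pathlib import Path


def check_outputs(*paths: str | Path) -> list[str]:
    """Return a description of every output file that is missing or empty."""
    problems = []
    for path in map(Path, paths):
        if not path.is_file():
            problems.append(f"Output file {path} does not exist")
        elif path.stat().st_size == 0:
            problems.append(f"Output file {path} is empty")
    return problems
```

In a pytest test, `assert not check_outputs("fusions.tsv", "fusions_discarded.tsv")` fails and reports the list of problems.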
### Step 13: Create a `/var/software_versions.txt` file
For the sake of transparency and reproducibility, we require that the versions of the software used in the component are documented.
For now, this is managed by creating a file `/var/software_versions.txt` in the `setup` section of the Docker engine.
```yaml
engines:
  - type: docker
    image: quay.io/biocontainers/xxx:0.1.0--py_0
    setup:
      - type: docker
        run: |
          echo "xxx: \"0.1.0\"" > /var/software_versions.txt
```
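If a test or downstream step needs to read these versions back, the file can be parsed line by line. A minimal sketch, assuming one `name: "version"` entry per line, which is what the `echo` command above writes; `parse_software_versions` is hypothetical, and a YAML parser would be the more robust choice if the file grows more complex:

```python
from __future__ import annotations

import re

# "name: version", with optional double quotes around the version
_ENTRY = re.compile(r'^\s*([^:#]+?)\s*:\s*"?([^"]*)"?\s*$')


def parse_software_versions(text: str) -> dict[str, str]:
    """Parse lines such as `xxx: "0.1.0"` into {"xxx": "0.1.0"}."""
    versions = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = _ENTRY.match(line)
        if match is None:
            raise ValueError(f"unrecognised line: {line!r}")
        versions[match.group(1)] = match.group(2).strip()
    return versions
```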

LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2024 Data Intuitive
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md Normal file

@@ -0,0 +1,72 @@
# 🪡📦 craftbox
[![ViashHub](https://img.shields.io/badge/ViashHub-craftbox-7a4baa.png)](https://web.viash-hub.com/packages/craftbox)
[![GitHub](https://img.shields.io/badge/GitHub-viash--hub%2Fcraftbox-blue.png)](https://github.com/viash-hub/craftbox)
[![GitHub
License](https://img.shields.io/github/license/viash-hub/craftbox.png)](https://github.com/viash-hub/craftbox/blob/main/LICENSE)
[![GitHub
Issues](https://img.shields.io/github/issues/viash-hub/craftbox.png)](https://github.com/viash-hub/craftbox/issues)
[![Viash
version](https://img.shields.io/badge/Viash-v0.9.3-blue)](https://viash.io)
A collection of custom-tailored scripts and applied tools.
## Objectives
- **Reusability**: Facilitating the use of components across various
projects and contexts.
- **Reproducibility**: Ensuring that components are reproducible and can
be easily shared.
- **Best Practices**: Adhering to established standards in software
development and bioinformatics.
## Contributing
We encourage contributions from the community. To contribute:
1. **Fork the Repository**: Start by forking this repository to your
account.
2. **Develop Your Component**: Create your Viash component, ensuring it
aligns with our best practices (detailed below).
3. **Submit a Pull Request**: After testing your component, submit a
pull request for review.
## Contribution Guidelines
The contribution guidelines describe which steps you should follow to
contribute a component to this repository.
1. Find a component to contribute
2. Add config template
3. Fill in the metadata
4. Find a suitable container
5. Create help file
6. Create or fetch test data
7. Add arguments for the input files
8. Add arguments for the output files
9. Add arguments for the other arguments
10. Add a Docker engine
11. Write a runner script
12. Create test script
13. Create a `/var/software_versions.txt` file
See the
[CONTRIBUTING](https://github.com/viash-hub/craftbox/blob/main/CONTRIBUTING.md)
file for more details.
## Support and Community
For support, questions, or to join our community:
- **Issues**: Submit questions or issues via the [GitHub issue
tracker](https://github.com/viash-hub/craftbox/issues).
- **Discussions**: Join our discussions via [GitHub
Discussions](https://github.com/viash-hub/craftbox/discussions).
## License
This repository is licensed under an MIT license. See the
[LICENSE](https://github.com/viash-hub/craftbox/blob/main/LICENSE) file
for details.

README.qmd Normal file

@@ -0,0 +1,62 @@
---
format: gfm
---
```{r setup, include=FALSE}
project <- yaml::read_yaml("_viash.yaml")
license <- paste0(project$links$repository, "/blob/main/LICENSE")
contributing <- paste0(project$links$repository, "/blob/main/CONTRIBUTING.md")
```
# 🪡📦 `r project$name`
[![ViashHub](https://img.shields.io/badge/ViashHub-`r project$name`-7a4baa)](https://web.viash-hub.com/packages/`r project$name`)
[![GitHub](https://img.shields.io/badge/GitHub-viash--hub%2F`r project$name`-blue)](`r project$links$repository`)
[![GitHub License](https://img.shields.io/github/license/viash-hub/`r project$name`)](`r license`)
[![GitHub Issues](https://img.shields.io/github/issues/viash-hub/`r project$name`)](`r project$links$issue_tracker`)
[![Viash version](https://img.shields.io/badge/Viash-v`r gsub("-", "--", project$viash_version)`-blue)](https://viash.io)
`r project$description`
## Objectives
- **Reusability**: Facilitating the use of components across various projects and contexts.
- **Reproducibility**: Ensuring that components are reproducible and can be easily shared.
- **Best Practices**: Adhering to established standards in software development and bioinformatics.
## Contributing
We encourage contributions from the community. To contribute:
1. **Fork the Repository**: Start by forking this repository to your account.
2. **Develop Your Component**: Create your Viash component, ensuring it aligns with our best practices (detailed below).
3. **Submit a Pull Request**: After testing your component, submit a pull request for review.
## Contribution Guidelines
The contribution guidelines describe which steps you should follow to contribute a component to this repository.
```{r echo=FALSE}
lines <- readr::read_lines("CONTRIBUTING.md")
index_start <- grep("^### Step [0-9]*:", lines)
index_end <- c(index_start[-1] - 1, length(lines))
name <- gsub("^### Step [0-9]*: *", "", lines[index_start])
knitr::asis_output(
paste(paste0(" 1. ", name, "\n"), collapse = "")
)
```
See the [CONTRIBUTING](`r contributing`) file for more details.
## Support and Community
For support, questions, or to join our community:
- **Issues**: Submit questions or issues via the [GitHub issue tracker](`r project$links$issue_tracker`).
- **Discussions**: Join our discussions via [GitHub Discussions](`r project$links$repository`/discussions).
## License
This repository is licensed under an MIT license. See the [LICENSE](`r license`) file for details.

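The R chunk in `README.qmd` builds the README's list of steps by matching `### Step N:` headings in `CONTRIBUTING.md`. Below is a Python rendition of the same extraction, with one addition the R chunk does not make: a check that the steps are numbered consecutively. The rendered Markdown list is renumbered automatically, so a duplicated step number would otherwise go unnoticed. `contribution_steps` is a hypothetical helper, not part of the repository:

```python
from __future__ import annotations

import re

# Same heading pattern as the R chunk: "### Step <number>: <title>"
STEP_HEADING = re.compile(r"^### Step ([0-9]*): *(.*)$")


def contribution_steps(markdown: str) -> list[str]:
    """Return the step titles in order, checking they are numbered 1, 2, 3, ..."""
    titles = []
    for line in markdown.splitlines():
        match = STEP_HEADING.match(line)
        if match is None:
            continue
        number, title = match.groups()
        expected = len(titles) + 1
        if number != str(expected):
            raise ValueError(f"step {title!r} is numbered {number!r}, expected {expected}")
        titles.append(title)
    return titles
```

Calling `contribution_steps(Path("CONTRIBUTING.md").read_text())` returns the titles that the README lists.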
_viash.yaml Normal file

@@ -0,0 +1,13 @@
name: craftbox
description: |
  A collection of custom-tailored scripts and applied tools.
license: MIT
keywords: [scripts, custom, implementations]
links:
  issue_tracker: https://github.com/viash-hub/craftbox/issues
  repository: https://github.com/viash-hub/craftbox
viash_version: 0.9.3
config_mods: |
  .requirements.commands := ['ps']

main.nf Normal file

@@ -0,0 +1,3 @@
workflow {
  print("This is a dummy placeholder for pipeline execution. Please use the corresponding nf files for running pipelines.")
}

nextflow.config Normal file

@@ -0,0 +1,6 @@
manifest {
  name = "craftbox"
  version = "minor_improvements"
  defaultBranch = "main"
  nextflowVersion = "!>=20.12.1-edge"
}


@@ -0,0 +1,10 @@
name: Dorien Roosen
info:
  links:
    email: dorien@data-intuitive.com
    github: dorien-er
    linkedin: dorien-roosen
  organizations:
    - name: Data Intuitive
      href: https://www.data-intuitive.com
      role: Data Scientist


@@ -0,0 +1,11 @@
name: Dries Schaumont
info:
  links:
    email: dries@data-intuitive.com
    github: DriesSchaumont
    orcid: "0000-0002-4389-0440"
    linkedin: dries-schaumont
  organizations:
    - name: Data Intuitive
      href: https://www.data-intuitive.com
      role: Data Scientist


@@ -0,0 +1,14 @@
name: Robrecht Cannoodt
info:
  links:
    email: robrecht@data-intuitive.com
    github: rcannood
    orcid: "0000-0003-3641-729X"
    linkedin: robrechtcannoodt
  organizations:
    - name: Data Intuitive
      href: https://www.data-intuitive.com
      role: Data Science Engineer
    - name: Open Problems
      href: https://openproblems.bio
      role: Core Member


@@ -0,0 +1,9 @@
name: Toni Verbeiren
info:
  links:
    github: tverbeiren
    linkedin: verbeiren
  organizations:
    - name: Data Intuitive
      href: https://www.data-intuitive.com
      role: Data Scientist and CEO


@@ -0,0 +1,60 @@
name: concat_text
summary: Concatenate a number of text files
description: |
  Concatenate a number of text files, handle gzipped text files gracefully and
  optionally gzip the output text file.
  This component is useful for concatenating fastq files from different lanes, for instance.
authors:
  - __merge__: /src/_authors/toni_verbeiren.yaml
    roles: [ author, maintainer ]
  - __merge__: /src/_authors/dries_schaumont.yaml
    roles: [ reviewer ]
info:
  improvements: |
    This component could be improved in 2 ways:
      1. Allow for a mix of zipped and plain input files
      2. Allow specifying a compression algorithm for the output
argument_groups:
  - name: Input arguments
    arguments:
      - name: --input
        description: A list of (gzipped) text files.
        type: file
        multiple: true
        required: true
        example: input?.txt.gz
  - name: Output arguments
    arguments:
      - name: "--gzip_output"
        type: boolean_true
        description: Should the output be zipped?
      - name: --output
        description: File to write the output to, optionally gzipped.
        type: file
        direction: output
        example: output.txt
resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
engines:
  - type: docker
    image: alpine:latest
    setup:
      - type: apk
        packages:
          - bash
          - procps
          - file
runners:
  - type: executable
  - type: nextflow

src/concat_text/script.sh Normal file

@@ -0,0 +1,34 @@
#!/usr/bin/env bash
set -euo pipefail

TMPDIR=$(mktemp -d "$meta_temp_dir/concat_text-XXXXXX")
function clean_up {
  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
}
trap clean_up EXIT

# Multiple inputs arrive as a ';'-separated list; turn it into a space-separated
# list that is word-split below (so input file names must not contain spaces).
par_input="$(echo "$par_input" | tr ';' ' ')"

echo -n ">> Check if input is gzipped... "
set +eo pipefail
file $par_input | grep -q 'gzip'
is_zipped="$?"
set -euo pipefail
[[ "$is_zipped" == "0" ]] && echo "yes" || echo "no"

if [[ "$is_zipped" == "0" ]]; then
  echo ">> zcat gzipped files"
  zcat $par_input > "$TMPDIR/contents"
else
  echo ">> cat plain files"
  cat $par_input > "$TMPDIR/contents"
fi

if [ "$par_gzip_output" == true ]; then
  echo ">> Zip output file"
  gzip "$TMPDIR/contents"
  mv "$TMPDIR/contents.gz" "$par_output"
else
  mv "$TMPDIR/contents" "$par_output"
fi

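`script.sh` decides once, for all inputs together, whether to use `zcat` or `cat`, which is why the config lists mixing zipped and plain files as a possible improvement. Checking each file for the two gzip magic bytes (`0x1f 0x8b`) lifts that restriction. A hedged Python sketch of that idea, not the component's implementation; the function names are hypothetical:

```python
from __future__ import annotations

import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path: str | Path) -> bool:
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def concat_text(inputs: list[str], output: str, gzip_output: bool = False) -> None:
    """Concatenate plain and/or gzipped text files into `output`."""
    open_output = gzip.open if gzip_output else open
    with open_output(output, "wb") as out:
        for path in inputs:
            # Decide per file, so zipped and plain inputs can be mixed.
            open_input = gzip.open if is_gzipped(path) else open
            with open_input(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

Because the compression check is per file, `concat_text(["lane1.fastq.gz", "lane2.fastq"], "all.fastq")` works, whereas the Bash version would hand the plain file to `zcat`.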
src/concat_text/test.sh Normal file

@@ -0,0 +1,70 @@
#!/usr/bin/env bash
set -euo pipefail
echo ">> Creating test input files file[1-3].txt"
INPUT_FILE_1="file1.txt"
INPUT_FILE_2="file2.txt"
INPUT_FILE_3="file3.txt"
echo "one" > "$INPUT_FILE_1"
echo "two" > "$INPUT_FILE_2"
echo "three" > "$INPUT_FILE_3"
echo ">> Created input files"
echo ">> Creating zipped versions at file[1-3].txt.gz"
gzip -k $INPUT_FILE_1
gzip -k $INPUT_FILE_2
gzip -k $INPUT_FILE_3
echo ">> Creating expected output file expected_output.txt and zipped version"
cat > "expected_output.txt" <<EOF
one
two
three
EOF
gzip -k "expected_output.txt"
echo ">> Run component on 3 plain input files, plain output"
$meta_executable \
--input "$INPUT_FILE_1;$INPUT_FILE_2;$INPUT_FILE_3" \
--output "output1.txt"
[[ ! -f "output1.txt" ]] \
&& echo "Output file output1.txt not found!" && exit 1
[[ $(cmp "output1.txt" "expected_output.txt") ]] \
&& echo "Output file output1.txt is not as expected!" && exit 1
echo ">> Run component on 3 zipped input files, plain output"
$meta_executable \
--input "$INPUT_FILE_1.gz;$INPUT_FILE_2.gz;$INPUT_FILE_3.gz" \
--output "output2.txt"
[[ ! -f "output2.txt" ]] \
&& echo "Output file output2.txt not found!" && exit 1
[[ $(cmp "output2.txt" "expected_output.txt") ]] \
&& echo "Output file output2.txt is not as expected!" && exit 1
echo ">> Run component on 3 plain input files, zipped output"
$meta_executable \
--input "$INPUT_FILE_1;$INPUT_FILE_2;$INPUT_FILE_3" \
--output "output3.txt.gz" \
--gzip_output
[[ ! -f "output3.txt.gz" ]] \
&& echo "Output file output3.txt.gz not found!" && exit 1
# gzip headers can embed a file name and timestamp, so compare the decompressed content
if ! zcat "output3.txt.gz" | cmp -s - "expected_output.txt"; then
  echo "Output file output3.txt.gz is not as expected!" && exit 1
fi
echo ">> Run component on 3 zipped input files, zipped output"
$meta_executable \
  --input "$INPUT_FILE_1.gz;$INPUT_FILE_2.gz;$INPUT_FILE_3.gz" \
  --output "output4.txt.gz" \
  --gzip_output
[[ ! -f "output4.txt.gz" ]] \
  && echo "Output file output4.txt.gz not found!" && exit 1
if ! zcat "output4.txt.gz" | cmp -s - "expected_output.txt"; then
  echo "Output file output4.txt.gz is not as expected!" && exit 1
fi
echo ">> Tests done"


@@ -0,0 +1,110 @@
name: csv2fasta
summary: Convert a CSV file to FASTA entries
description: |
  Convert two columns from a CSV file to FASTA entries. The CSV file can
  contain an optional header and each row (other than the header) becomes
  a single FASTA record. One of the two columns will be used as the names
  for the FASTA entries, while the other becomes the sequences. The sequences
  column must only contain characters that are valid IUPAC notation for
  nucleotides or a group thereof (wildcard characters).
authors:
  - __merge__: /src/_authors/dries_schaumont.yaml
    roles: [ author, maintainer ]
  - __merge__: /src/_authors/robrecht_cannoodt.yaml
    roles: [ reviewer ]
argument_groups:
  - name: Inputs
    arguments:
      - name: --input
        type: file
        direction: input
        example: barcodes.csv
        description: CSV file to be processed.
        required: true
      - name: --header
        type: boolean_true
        description: |
          Parse the first line of the CSV file as a header.
  - name: "CSV dialect options"
    description: |
      Options that can be used to override the automatically detected
      dialect of the CSV file.
    arguments:
      - name: --delimiter
        type: string
        description: |
          Override the column delimiter character.
      - name: --quote_character
        type: string
        description: |
          Override the character used to denote the start and end of a quoted item.
  - name: "CSV column arguments"
    description: |
      Parameters for the selection of columns from the CSV file.
      Only required when your CSV file contains more than 2 columns,
      otherwise the first column will be used for the FASTA header
      and the second for the FASTA nucleotide sequences. This default
      can still be overridden by using the options below.
    arguments:
      - name: --sequence_column
        type: string
        description: |
          Name of the column containing the sequences. Implies 'header'.
          Cannot be used together with 'sequence_column_index'.
        required: false
      - name: "--name_column"
        type: string
        description: |
          Name of the column describing the FASTA headers. Implies 'header'.
          Cannot be used together with 'name_column_index'.
        required: false
      - name: "--sequence_column_index"
        type: integer
        min: 0
        description: |
          Index of the column to use as the FASTA sequences, counted from the left and
          starting from 0. Cannot be used in combination with the 'sequence_column' argument.
        required: false
      - name: "--name_column_index"
        type: integer
        min: 0
        description: |
          Index of the column to use as the FASTA headers, counted from the left and
          starting from 0. Cannot be used in combination with 'name_column'.
        required: false
  - name: Outputs
    arguments:
      - name: "--output"
        type: file
        example: barcodes.fasta
        direction: output
        description: Output fasta file.
resources:
  - type: python_script
    path: script.py
test_resources:
  - type: python_script
    path: test_csv2fasta.py
engines:
  - type: docker
    image: python:slim
    setup:
      - type: apt
        packages:
          - procps
      - type: python
        packages:
          - dnaio
    test_setup:
      - type: python
        packages:
          - pytest
          - viashpy
runners:
  - type: executable
  - type: nextflow

src/csv2fasta/script.py Normal file

@@ -0,0 +1,102 @@
from pathlib import Path
import dnaio
import csv
## VIASH START
par = {
}
## VIASH END
iupac = frozenset("ABCDGHKMNRSTUVWXY")
def resolve_header_name_to_index(header_entries, column_name):
try:
return header_entries.index(column_name)
except ValueError as e:
raise ValueError(f"Column name '{column_name}' could not "
"be found in the header of the CSV file.") from e
def csv_records(csv_file, delimiter, quote_character,
header, sequence_column, name_column,
sequence_column_index, name_column_index):
with open(csv_file, newline='') as csvfile:
# Deduce CSV dialect based on first 5 lines.
hint = "\n".join([csvfile.readline() for _ in range(5)])
csvfile.seek(0)
dialect = csv.Sniffer().sniff(hint)
reader_args = {"dialect": dialect}
delimiter_arg = {"delimiter": delimiter} if delimiter else {}
quotechar_arg = {"quotechar": quote_character} if delimiter else {}
all_args = reader_args | delimiter_arg | quotechar_arg
csv_reader = csv.reader(csvfile, **all_args)
for linenum, line in enumerate(csv_reader):
if not linenum: # First row
num_columns = len(line)
if header:
if sequence_column:
sequence_column_index = resolve_header_name_to_index(line, sequence_column)
if name_column:
name_column_index = resolve_header_name_to_index(line, name_column)
continue
if not (linenum - header): # First 'data' line
if (not sequence_column_index and not name_column_index and len(line) == 2):
name_column_index, sequence_column_index = 0, 1
if sequence_column_index == name_column_index:
raise ValueError("The same columns were selected for both the FASTQ sequences and "
"headers.")
if sequence_column_index is None:
raise ValueError("Either 'sequence_column_index' or 'sequence_column' needs "
"to be specified.")
if name_column_index is None:
raise ValueError("Either 'name_column' or 'name_column_index' needs to "
"be specified.")
if name_column_index >= num_columns:
raise ValueError(f"Requested to use column number {name_column_index} "
f"(0 based) for the FASTA headers, but only {num_columns} "
"were found on the first line.")
if sequence_column_index >= num_columns:
raise ValueError(f"Requested to use column number {sequence_column_index} "
f"(0 based) for the FASTA sequences, but only {num_columns} "
"were found on the first line.")
if len(line) != num_columns:
raise ValueError(f"Number of columns ({len(line)}) found on line {linenum+1} "
"is different compared to number of columns found "
f"previously ({num_columns}).")
sequence_name, sequence = line[name_column_index], line[sequence_column_index]
invalid_characters = set(sequence.upper()) - iupac
if set(sequence.upper()) - iupac:
raise ValueError(f"The sequence ('{sequence}') found on line {linenum+1} "
f"contains characters ({','.join(invalid_characters)}) "
"which are not valid IUPAC identifiers for nucleotides.")
yield sequence_name, sequence
def main(par):
    par['input'], par['output'] = Path(par['input']), Path(par['output'])
    sequence_column, name_column = par['sequence_column'], par['name_column']
    sequence_column_index, name_column_index = par['sequence_column_index'], par['name_column_index']
    # Selecting a column by its name requires the first line to be parsed as a header.
    if (sequence_column or name_column) and not par['header']:
        par["header"] = True
    # Compare indices against None explicitly: 0 is a valid (but falsy) column index.
    if sequence_column_index is not None and sequence_column:
        raise ValueError("Cannot specify both 'sequence_column_index' and 'sequence_column'")
    if name_column and name_column_index is not None:
        raise ValueError("Cannot specify both 'name_column_index' and 'name_column'")
    if sequence_column_index is not None and sequence_column_index == name_column_index:
        raise ValueError("The value specified for 'sequence_column_index' cannot be the same as "
                         "the value for 'name_column_index'.")
    with dnaio.open(par['output'], mode='w', fileformat="fasta") as writer:
        for header, sequence in csv_records(par['input'],
                                            par['delimiter'],
                                            par['quote_character'],
                                            par['header'],
                                            sequence_column,
                                            name_column,
                                            sequence_column_index,
                                            name_column_index):
            writer.write(dnaio.SequenceRecord(header, sequence))


if __name__ == "__main__":
    main(par)
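The per-row checks performed by `csv_records` and `main` (distinct column indices, a constant column count, IUPAC validation of the sequence column) can be summarised in a small standalone sketch. It is an illustration under assumptions, not the component's code: it writes FASTA text directly with the standard `csv` module instead of `dnaio`, and the helper `rows_to_fasta` and the `IUPAC_NUCLEOTIDES` alphabet are hypothetical (the component's own `iupac` set is defined outside this excerpt).

```python
import csv
import io

# Assumed IUPAC nucleotide alphabet, including ambiguity codes; the set used by
# csv2fasta itself is defined outside this excerpt and may differ.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def rows_to_fasta(text, name_index=0, sequence_index=1, header=False):
    """Hypothetical helper: convert CSV text to FASTA text, validating each row."""
    if name_index == sequence_index:
        raise ValueError("The name and sequence columns must be different.")
    records = []
    num_columns = None
    for linenum, row in enumerate(csv.reader(io.StringIO(text))):
        if num_columns is None:
            # The first line fixes the expected number of columns.
            num_columns = len(row)
            if max(name_index, sequence_index) >= num_columns:
                raise ValueError(f"Only {num_columns} columns found on the first line.")
            if header:
                continue  # a header row names the columns; it is not a record
        elif len(row) != num_columns:
            raise ValueError(f"Line {linenum + 1} has {len(row)} columns, expected {num_columns}.")
        name, sequence = row[name_index], row[sequence_index]
        invalid = set(sequence.upper()) - IUPAC_NUCLEOTIDES
        if invalid:
            # Sort the offending characters so the message is deterministic.
            raise ValueError(f"Line {linenum + 1}: invalid characters "
                             f"{','.join(sorted(invalid))} in {sequence!r}.")
        records.append(f">{name}\n{sequence}\n")
    return "".join(records)
```

Without `header=True`, a header line such as `barcodes,sequences` is treated as a record and rejected because `sequences` contains non-IUPAC letters, which mirrors the behaviour exercised by `test_csvtofasta_2_columns_but_not_valid_sequence`.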


@@ -0,0 +1,366 @@
import pytest
import re
import sys
from uuid import uuid4
from textwrap import dedent
from subprocess import CalledProcessError
## VIASH START
meta = {
    'config': 'src/sequenceformats/csv2fasta/config.vsh.yaml',
    'executable': 'target/executable/sequenceformats/csv2fasta'
}
## VIASH END

@pytest.fixture
def random_path(tmp_path):
    def wrapper(extension=None):
        extension = "" if not extension else f".{extension}"
        return tmp_path / f"{uuid4()}{extension}"
    return wrapper

@pytest.mark.parametrize("arg,val,expected_err", [
    ("name_column", "barcode_name", ("sequence_column_index", "sequence_column")),
    ("sequence_column", "sequence", ("name_column", "name_column_index")),
])
def test_csvtofasta_no_columns_selected_raises(run_component, random_path, arg, val, expected_err):
    csv_contents = dedent("""\
        barcode_name,some_other_column,sequence
        barcode1,foo,ACGT
        barcode2,bar,TTTA
        """)
    input_path = random_path("csv")
    with input_path.open('w') as open_input:
        open_input.write(csv_contents)
    output_path = random_path("csv")
    args = [
        "--input", input_path,
        "--output", output_path,
        "--header"
    ]
    args.extend([f"--{arg}", val])
    with pytest.raises(CalledProcessError) as err:
        run_component(args)
    assert f"ValueError: Either '{expected_err[0]}' or '{expected_err[1]}' needs to be specified." in \
        err.value.stdout.decode('utf-8')

def test_csvtofasta_column_does_not_exist_raises(run_component, random_path):
    csv_contents = dedent("""\
        barcode_name,some_other_column,sequence
        barcode1,foo,ACGT
        barcode2,bar,TTTA
        """)
    input_path = random_path("csv")
    with input_path.open('w') as open_input:
        open_input.write(csv_contents)
    output_path = random_path("csv")
    args = [
        "--input", input_path,
        "--output", output_path,
        "--sequence_column", "foo",
    ]
    with pytest.raises(CalledProcessError) as err:
        run_component(args)
    assert "ValueError: Column name 'foo' could not be found in the " + \
        "header of the CSV file." in err.value.stdout.decode('utf-8')

def test_csvtofasta_same_column_selected_raises(run_component, random_path):
    csv_contents = dedent("""\
        barcode_name,some_other_column,sequence
        barcode1,foo,ACGT
        barcode2,bar,TTTA
        """)
    input_path = random_path("csv")
    with input_path.open('w') as open_input:
        open_input.write(csv_contents)
    output_path = random_path("csv")
    args = [
        "--input", input_path,
        "--output", output_path,
        "--sequence_column_index", "1",
        "--name_column_index", "1",
    ]
    with pytest.raises(CalledProcessError) as err:
        run_component(args)
    assert "ValueError: The value specified for 'sequence_column_index' cannot " + \
        "be the same as the value for 'name_column_index'" in \
        err.value.stdout.decode('utf-8')

@pytest.mark.parametrize("arg,val,expected_err", [("sequence_column_index", "3", "sequences"),
                                                  ("name_column_index", "4", "headers")])
def test_csvtofasta_header_select_index_out_of_bounds_raises(run_component, random_path, arg, val, expected_err):
    csv_contents = dedent("""\
        barcode_name,some_other_column,sequence
        barcode1,foo,ACGT
        barcode2,bar,TTTA
        """)
    other_column_map = {
        "sequence_column_index": ["--name_column_index", "1"],
        "name_column_index": ["--sequence_column_index", "2"],
    }
    input_path = random_path("csv")
    with input_path.open('w') as open_input:
        open_input.write(csv_contents)
    output_path = random_path("csv")
    args = [
        "--input", input_path,
        "--output", output_path,
        "--header",
    ]
    args += [f"--{arg}", val]
    args += other_column_map[arg]
    with pytest.raises(CalledProcessError) as err:
        run_component(args)
    assert f"ValueError: Requested to use column number {val} (0 based) for the FASTA " + \
        f"{expected_err}, but only 3 were found on the first line." in \
        err.value.stdout.decode('utf-8')

def test_csvtofasta_header_select_column_by_both_name_and_index(run_component, random_path):
    csv_contents = dedent("""\
        barcode_name,some_other_column,sequence
        barcode1,foo,ACGT
        barcode2,bar,TTTA
        """)
    expected = dedent("""\
        >barcode1
        ACGT
        >barcode2
        TTTA
        """)
    input_path = random_path("csv")
    with input_path.open('w') as open_input:
        open_input.write(csv_contents)
    output_path = random_path("csv")
    run_component([
        "--input", input_path,
        "--output", output_path,
        "--header",
        "--name_column", "barcode_name",
        "--sequence_column_index", "2",
    ])
    assert output_path.is_file()
    with output_path.open('r') as open_output:
        output_contents = open_output.read()
    assert output_contents == expected

def test_csvtofasta_autodetect_dialect(run_component, random_path):
    csv_contents = dedent("""\
        barcode_name\tsome_other_column\tsequence
        barcode1\tfoo\tACGT
        barcode2\tbar\tTTTA
        """)
    expected = dedent("""\
        >barcode1
        ACGT
        >barcode2
        TTTA
        """)
    input_path = random_path("csv")
    with input_path.open('w') as open_input:
        open_input.write(csv_contents)
    output_path = random_path("csv")
    run_component([
        "--input", input_path,
        "--output", output_path,
        "--header",
        "--name_column", "barcode_name",
        "--sequence_column_index", "2",
    ])
    assert output_path.is_file()
    with output_path.open('r') as open_output:
        output_contents = open_output.read()
    assert output_contents == expected

    csv_contents = dedent("""\
        "barcode_name"\t"some_other_column"\t"sequence"
        "barcode1"\t"foo"\t"ACGT"
        "barcode2"\t"bar"\t"TTTA"
        """)
    expected = dedent("""\
        >barcode1
        ACGT
        >barcode2
        TTTA
        """)
    input_path = random_path("csv")
    with input_path.open('w') as open_input:
        open_input.write(csv_contents)
    output_path = random_path("csv")
    run_component([
        "--input", input_path,
        "--output", output_path,
        "--header",
        "--name_column", "barcode_name",
        "--sequence_column_index", "2",
    ])
    assert output_path.is_file()
    with output_path.open('r') as open_output:
        output_contents = open_output.read()
    assert output_contents == expected

def test_csvtofasta_header_select_column_by_name(run_component, random_path):
    csv_contents = dedent("""\
        barcode_name,some_other_column,sequence
        barcode1,foo,ACGT
        barcode2,bar,TTTA
        """)
    expected = dedent("""\
        >barcode1
        ACGT
        >barcode2
        TTTA
        """)
    input_path = random_path("csv")
    with input_path.open('w') as open_input:
        open_input.write(csv_contents)
    output_path = random_path("csv")
    run_component([
        "--input", input_path,
        "--output", output_path,
        "--header",
        "--name_column", "barcode_name",
        "--sequence_column", "sequence"
    ])
    assert output_path.is_file()
    with output_path.open('r') as open_output:
        output_contents = open_output.read()
    assert output_contents == expected

def test_csvtofasta_header_2_columns(run_component, random_path):
    csv_contents = dedent("""\
        barcode_name,sequence
        barcode1,ACGT
        barcode2,TTTA
        """)
    expected = dedent("""\
        >barcode1
        ACGT
        >barcode2
        TTTA
        """)
    input_path = random_path("csv")
    with input_path.open('w') as open_input:
        open_input.write(csv_contents)
    output_path = random_path("csv")
    run_component([
        "--input", input_path,
        "--output", output_path,
        "--header"
    ])
    assert output_path.is_file()
    with output_path.open('r') as open_output:
        output_contents = open_output.read()
    assert output_contents == expected

def test_csvtofasta_2_columns(run_component, random_path):
    csv_contents = dedent("""\
        barcode1,ACGT
        barcode2,TTTA
        """)
    expected = dedent("""\
        >barcode1
        ACGT
        >barcode2
        TTTA
        """)
    input_path = random_path("csv")
    with input_path.open('w') as open_input:
        open_input.write(csv_contents)
    output_path = random_path("csv")
    run_component([
        "--input", input_path,
        "--output", output_path,
    ])
    assert output_path.is_file()
    with output_path.open('r') as open_output:
        output_contents = open_output.read()
    assert output_contents == expected

def test_csvtofasta_2_columns_but_still_swap(run_component, random_path):
    csv_contents = dedent("""\
        ACGT,barcode1
        TTTA,barcode2
        """)
    expected = dedent("""\
        >barcode1
        ACGT
        >barcode2
        TTTA
        """)
    input_path = random_path("csv")
    with input_path.open('w') as open_input:
        open_input.write(csv_contents)
    output_path = random_path("csv")
    run_component([
        "--input", input_path,
        "--output", output_path,
        "--sequence_column_index", "0",
        "--name_column_index", "1",
    ])
    assert output_path.is_file()
    with output_path.open('r') as open_output:
        output_contents = open_output.read()
    assert output_contents == expected

def test_csvtofasta_2_columns_but_not_valid_sequence(run_component, random_path):
    csv_contents = dedent("""\
        barcodes,sequences
        barcode1,ACGT
        barcode2,TTTA
        """)
    input_path = random_path("csv")
    with input_path.open('w') as open_input:
        open_input.write(csv_contents)
    output_path = random_path("csv")
    with pytest.raises(CalledProcessError) as err:
        run_component([
            "--input", input_path,
            "--output", output_path,
        ])
    assert re.search(r"ValueError: The sequence \('sequences'\) found on line "
                     r"1 contains characters \(.+\) which are not valid "
                     r"IUPAC identifiers for nucleotides\.",
                     err.value.stdout.decode('utf-8'))

    csv_contents = dedent("""\
        barcodes,sequences
        barcode1,ACGT
        barcode2,TTEA
        """)
    input_path = random_path("csv")
    with input_path.open('w') as open_input:
        open_input.write(csv_contents)
    output_path = random_path("csv")
    with pytest.raises(CalledProcessError) as err:
        run_component([
            "--input", input_path,
            "--output", output_path,
            "--header",
        ])
    assert re.search(r"ValueError: The sequence \('TTEA'\) found on line "
                     r"3 contains characters \(E\) which are not valid "
                     r"IUPAC identifiers for nucleotides\.",
                     err.value.stdout.decode('utf-8'))

if __name__ == "__main__":
    sys.exit(pytest.main([__file__]))


@@ -0,0 +1,63 @@
name: sync_resources
summary: Sync a Viash package's test resources to the local filesystem
description: |
  Sync a Viash package's test resources to the local filesystem based on
  the `.info.test_resources` field in the `_viash.yaml` file. This is useful for
  testing and debugging purposes.
usage: |
  sync_resources
  sync_resources --input _viash.yaml --output .
authors:
  - __merge__: /src/_authors/robrecht_cannoodt.yaml
    roles: [ author, maintainer ]
  - __merge__: /src/_authors/dries_schaumont.yaml
    roles: [ reviewer ]
argument_groups:
  - name: Inputs
    arguments:
      - name: "--input"
        alternatives: ["-i"]
        type: file
        description: "Path to the _viash.yaml project configuration file."
        default: _viash.yaml
  - name: Outputs
    arguments:
      - name: "--output"
        alternatives: ["-o"]
        type: file
        default: .
        direction: output
        description: "Path to the directory where the resources will be synced to."
  - name: Arguments
    arguments:
      - name: "--dryrun"
        type: boolean_true
        description: "Display the operations that would be performed without actually running them."
      - name: "--exclude"
        type: "string"
        multiple: true
        description: Exclude all files or objects that match the specified pattern.
resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
engines:
  - type: docker
    image: alpine:3
    setup:
      - type: apk
        packages:
          - bash
          - rclone
          - yq
      - type: docker
        run:
          - rclone config create s3 s3 anonymous=true
          - rclone config create gs gcs anonymous=true
runners:
  - type: executable
  - type: nextflow


@@ -0,0 +1,32 @@
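#!/bin/bash
## VIASH START
par_input='_viash.yaml'
par_output='.'
## VIASH END
extra_params=( )
if [ "$par_dryrun" == "true" ]; then
  extra_params+=( "--dry-run" )
fi
if [ -n "${par_exclude+x}" ]; then
  # Viash joins multiple values with ';'. Split with `read` instead of an unquoted
  # expansion so that patterns such as '*.h5ad' are not glob-expanded by the shell.
  IFS=";" read -r -a exclude_patterns <<< "$par_exclude"
  for pattern in "${exclude_patterns[@]}"; do
    extra_params+=( "--exclude" "$pattern" )
  done
fi
yq e \
  '.info.test_resources[] | "{type: " + (.type // "s3") + ", path: " + .path + ", dest: " + .dest + "}"' \
  "${par_input}" | \
  while read -r line; do
    path=$(echo "$line" | yq e '.path')
    dest=$(echo "$line" | yq e '.dest')
    echo "Syncing '$path' to '$dest'..."
    # Forward the optional --dry-run and --exclude flags to rclone.
    rclone sync "$path" "$par_output/$dest" "${extra_params[@]}"
  done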
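For reference, the `rclone sync` invocations that this script assembles (one per entry in `.info.test_resources`, with `--dry-run` and `--exclude` forwarded as extra flags) can be modelled in Python. This is a hypothetical sketch: `build_sync_commands` is an illustrative name, not part of the component, and it takes an already-parsed `test_resources` list rather than reading `_viash.yaml`, which would require a YAML parser.

```python
import shlex


def build_sync_commands(test_resources, output_dir=".", dryrun=False, exclude=None):
    """Hypothetical helper: one `rclone sync` argv list per test resource.

    `exclude` is a single ';'-separated string, matching how Viash passes
    arguments declared with `multiple: true` (multiple_sep ';').
    """
    extra = ["--dry-run"] if dryrun else []
    for pattern in (exclude or "").split(";"):
        if pattern:  # skip empty fields, e.g. from a trailing ';'
            extra += ["--exclude", pattern]
    commands = []
    for resource in test_resources:
        source = resource["path"]
        destination = f"{output_dir}/{resource['dest']}"
        commands.append(["rclone", "sync", source, destination, *extra])
    return commands


if __name__ == "__main__":
    resources = [{"type": "s3",
                  "path": "s3://openproblems-data/resources_test/common/pancreas",
                  "dest": "foo"}]
    for argv in build_sync_commands(resources, dryrun=True):
        print(shlex.join(argv))  # shell-quoted command line
```

Keeping the command as an argument list (rather than a single string) avoids the quoting and globbing pitfalls that the Bash version has to work around when splitting `par_exclude`.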


@@ -0,0 +1,23 @@
#!/bin/bash
## VIASH START
## VIASH END
cat > _viash.yaml << EOM
info:
  test_resources:
    - type: s3
      path: s3://openproblems-data/resources_test/common/pancreas
      dest: foo
EOM
echo ">> Run sync_resources"
"$meta_executable" \
--input _viash.yaml \
--output . \
--quiet
echo ">> Check whether the right files were copied"
[ ! -f foo/dataset.h5ad ] && echo test file should have been copied && exit 1
echo ">> Test succeeded!"

51
src/untar/config.vsh.yaml Normal file

@@ -0,0 +1,51 @@
name: untar
summary: Unpack a .tar file
description: |
  Unpack a .tar file. When the .tar file contains just a single top-level directory,
  put the contents of that directory into the output folder instead of the directory itself.
authors:
  - __merge__: /src/_authors/dries_schaumont.yaml
    roles: [ author, maintainer ]
  - __merge__: /src/_authors/robrecht_cannoodt.yaml
    roles: [ reviewer ]
argument_groups:
  - name: Input arguments
    arguments:
      - name: --input
        description: Tarball file to be unpacked.
        type: file
        required: true
  - name: Output arguments
    arguments:
      - name: --output
        description: Directory to write the contents of the .tar file to.
        type: file
        direction: output
        required: true
  - name: "Other arguments"
    arguments:
      - name: "--exclude"
        alternatives: ["-e"]
        type: string
        description: Prevents any file or member whose name matches the shell wildcard (pattern) from being extracted.
        example: "docs/figures"
        required: false
resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
engines:
  - type: docker
    image: debian:stable-slim
    setup:
      - type: apt
        packages:
          - procps
runners:
  - type: executable
  - type: nextflow

41
src/untar/script.sh Normal file

@@ -0,0 +1,41 @@
#!/usr/bin/env bash
set -eo pipefail
extra_args=()
TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_name-XXXXXX")
function clean_up {
  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
}
trap clean_up EXIT
# Check if tarball contains 1 top-level directory. If so, extract the contents of the
# directory to the output directory instead of the directory itself.
echo "Directory contents:"
tar -taf "${par_input}" > "$TMPDIR/tar_contents.txt"
cat "$TMPDIR/tar_contents.txt"
printf "Checking if tarball contains only a single top-level directory: "
if [[ $(grep -o -E '^[./]*[^/]+/$' "$TMPDIR/tar_contents.txt" | uniq | wc -l) -eq 1 ]]; then
  echo "It does."
  echo "Extracting the contents of the top-level directory to the output directory instead of the directory itself."
  # The directory can be of the format './<directory>' (or '././<directory>') or just '<directory>'.
  # Adjust the number of stripped components by counting the leading './' prefixes. The dot is
  # escaped so that a one-character directory name such as 'a/' is not counted as a './' prefix.
  starting_relative=$(grep -oP -m 1 '^(\./)*' "$TMPDIR/tar_contents.txt" | tr -d '\n' | wc -c)
  n_strips=$(( ($starting_relative / 2)+1 ))
  extra_args+=("--strip-components=$n_strips")
else
  echo "It does not."
fi
if [ "$par_exclude" != "" ]; then
  echo "Exclusion of files with wildcard '$par_exclude' requested."
  extra_args+=("--exclude=$par_exclude")
fi
echo "Starting extraction of tarball '$par_input' to output directory '$par_output'."
mkdir -p "$par_output"
echo "executing 'tar --no-same-owner --no-same-permissions --directory=$par_output ${extra_args[*]} -xavf $par_input'"
tar --no-same-owner --no-same-permissions --directory="$par_output" "${extra_args[@]}" -xavf "$par_input"

126
src/untar/test.sh Normal file

@@ -0,0 +1,126 @@
#!/usr/bin/env bash
set -eo pipefail
# create tempdir
echo ">>> Creating temporary test directory."
TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_name-XXXXXX")
function clean_up {
[[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
}
trap clean_up EXIT
echo ">>> Created temporary directory '$TMPDIR'."
INPUT_FILE="$TMPDIR/test_file.txt"
echo ">>> Creating test input file at '$TMPDIR/test_file.txt'."
echo "foo" > "$INPUT_FILE"
echo ">>> Created '$INPUT_FILE'."
echo ">>> Creating tar.gz from '$INPUT_FILE'."
TARFILE="${INPUT_FILE}.tar.gz"
tar -C "$TMPDIR" -czvf "$TARFILE" "$(basename "$INPUT_FILE")"
[[ ! -f "$TARFILE" ]] && echo ">>> Test setup failed: could not create tarfile." && exit 1
echo ">>> '$TARFILE' created."
echo ">>> Check whether tar.gz can be extracted"
echo ">>> Creating temporary output directory for test 1."
OUTPUT_DIR_1="$TMPDIR/output_test_1/"
mkdir "$OUTPUT_DIR_1"
echo ">>> Extracting '$TARFILE' to '$OUTPUT_DIR_1'."
$meta_executable \
--input "$TARFILE" \
--output "$OUTPUT_DIR_1"
echo ">>> Check whether extracted file exists"
[[ ! -f "$OUTPUT_DIR_1/test_file.txt" ]] && echo "Output file could not be found. Output directory contents: " && ls "$OUTPUT_DIR_1" && exit 1
echo ">>> Creating temporary output directory for test 2."
OUTPUT_DIR_2="$TMPDIR/output_test_2/"
mkdir "$OUTPUT_DIR_2"
echo ">>> Extracting '$TARFILE' to '$OUTPUT_DIR_2', excluding 'test_file.txt'."
$meta_executable \
--input "$TARFILE" \
--output "$OUTPUT_DIR_2" \
--exclude 'test_file.txt'
echo ">>> Check whether excluded file was not extracted"
[[ -f "$OUTPUT_DIR_2/test_file.txt" ]] && echo "File should have been excluded! Output directory contents:" && ls "$OUTPUT_DIR_2" && exit 1
echo ">>> Creating test tarball containing only 1 top-level directory."
mkdir "$TMPDIR/input_test_3/"
cp "$INPUT_FILE" "$TMPDIR/input_test_3/"
tar -C "$TMPDIR" -czvf "$TMPDIR/input_test_3.tar.gz" $(basename "$TMPDIR/input_test_3")
TARFILE_3="$TMPDIR/input_test_3.tar.gz"
echo ">>> Creating temporary output directory for test 3."
OUTPUT_DIR_3="$TMPDIR/output_test_3/"
mkdir "$OUTPUT_DIR_3"
echo ">>> Extracting '$TARFILE_3' to '$OUTPUT_DIR_3'."
$meta_executable \
--input "$TARFILE_3" \
--output "$OUTPUT_DIR_3"
echo ">>> Check whether extracted file exists"
[[ ! -f "$OUTPUT_DIR_3/test_file.txt" ]] && echo "Output file could not be found!" && exit 1
echo ">>> Check for tar archive that contains a single directory starting with './'."
mkdir "$TMPDIR/input_test_4/"
cp "$INPUT_FILE" "$TMPDIR/input_test_4/"
pushd "$TMPDIR/"
trap popd ERR
tar -czvf "$TMPDIR/input_test_4.tar.gz" ./input_test_4
popd
trap - ERR
OUTPUT_DIR_4="$TMPDIR/output_test_4/"
echo ">>> Extracting '$TMPDIR/input_test_4.tar.gz' to '$OUTPUT_DIR_4'."
$meta_executable \
--input "$TMPDIR/input_test_4.tar.gz" \
--output "$OUTPUT_DIR_4"
echo ">>> Check whether extracted file exists"
[[ ! -f "$OUTPUT_DIR_4/test_file.txt" ]] && echo "Output file could not be found!" && exit 1
echo ">>> Creating test tarball containing only 1 top-level directory, but it is nested."
mkdir -p "$TMPDIR/input_test_5/nested/"
cp "$INPUT_FILE" "$TMPDIR/input_test_5/nested/"
tar -C "$TMPDIR" -czvf "$TMPDIR/input_test_5.tar.gz" $(basename "$TMPDIR/input_test_5")
TARFILE_5="$TMPDIR/input_test_5.tar.gz"
echo ">>> Creating temporary output directory for test 5."
OUTPUT_DIR_5="$TMPDIR/output_test_5/"
mkdir "$OUTPUT_DIR_5"
echo ">>> Extracting '$TARFILE_5' to '$OUTPUT_DIR_5'."
$meta_executable \
--input "$TARFILE_5" \
--output "$OUTPUT_DIR_5"
echo ">>> Check whether extracted file exists"
[[ ! -f "$OUTPUT_DIR_5/nested/test_file.txt" ]] && echo "Output file could not be found!" && exit 1
echo ">>> Creating test tarball containing two top-level directories."
mkdir -p "$TMPDIR/input_test_6/number_1/"
mkdir "$TMPDIR/input_test_6/number_2/"
cp "$INPUT_FILE" "$TMPDIR/input_test_6/number_1/"
tar -C "$TMPDIR" -czvf "$TMPDIR/input_test_6.tar.gz" $(basename "$TMPDIR/input_test_6")
TARFILE_6="$TMPDIR/input_test_6.tar.gz"
echo ">>> Creating temporary output directory for test 6."
OUTPUT_DIR_6="$TMPDIR/output_test_6/"
mkdir "$OUTPUT_DIR_6"
echo ">>> Extracting '$TARFILE_6' to '$OUTPUT_DIR_6'."
$meta_executable \
--input "$TARFILE_6" \
--output "$OUTPUT_DIR_6"
echo ">>> Check whether extracted file exists"
[[ ! -f "$OUTPUT_DIR_6/number_1/test_file.txt" ]] && echo "Output file could not be found!" && exit 1
[[ ! -d "$OUTPUT_DIR_6/number_2" ]] && echo "Output directory could not be found!" && exit 1
echo ">>> Test finished successfully"

0
target/.build.yaml Normal file


@@ -0,0 +1,202 @@
name: "concat_text"
version: "minor_improvements"
authors:
- name: "Toni Verbeiren"
roles:
- "author"
- "maintainer"
info:
links:
github: "tverbeiren"
linkedin: "verbeiren"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist and CEO"
- name: "Dries Schaumont"
roles:
- "reviewer"
info:
links:
email: "dries@data-intuitive.com"
github: "DriesSchaumont"
orcid: "0000-0002-4389-0440"
linkedin: "dries-schaumont"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--input"
description: "A list of (gzipped) text files."
info: null
example:
- "input?.txt.gz"
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: true
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "boolean_true"
name: "--gzip_output"
description: "Should the output be gzipped?"
info: null
direction: "input"
- type: "file"
name: "--output"
description: "File to write the output to, optionally gzipped."
info: null
example:
- "output.txt"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
summary: "Concatenate a number of text files"
description: "Concatenate a number of text files, handle gzipped text files gracefully\
\ and\noptionally gzip the output text file.\n\nThis component is useful for concatenating\
\ FASTQ files from different lanes, for instance.\n"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
info:
improvements: "This component could be improved in 2 ways:\n 1. Allow for a mix\
\ of zipped and plain input files\n 2. Allow specifying a compression algorithm\
\ for the output\n"
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/craftbox"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "alpine:latest"
target_registry: "images.viash-hub.com"
target_tag: "minor_improvements"
namespace_separator: "/"
setup:
- type: "apk"
packages:
- "bash"
- "procps"
- "file"
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/concat_text/config.vsh.yaml"
runner: "executable"
engine: "docker|native"
output: "target/executable/concat_text"
executable: "target/executable/concat_text/concat_text"
viash_version: "0.9.3"
git_commit: "ea649a62ec4e05d72aff86ec804287629756416a"
git_remote: "https://github.com/viash-hub/craftbox"
git_tag: "v0.1.0-4-gea649a6"
package_config:
name: "craftbox"
version: "minor_improvements"
description: "A collection of custom-tailored scripts and applied tools.\n"
info: null
viash_version: "0.9.3"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'minor_improvements'"
keywords:
- "scripts"
- "custom"
- "implementations"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/craftbox"
issue_tracker: "https://github.com/viash-hub/craftbox/issues"
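The `info.improvements` note above mentions allowing a mix of gzipped and plain input files. One possible approach, sketched here in Python as an assumption rather than a description of the component's `script.sh` (whose source is not part of this excerpt), is to detect gzip per input from its two-byte magic number `1f 8b`. `concat_payloads` and `concat_text_files` are hypothetical names.

```python
import gzip
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def concat_payloads(payloads, gzip_output=False):
    """Concatenate byte payloads, transparently decompressing gzipped ones."""
    parts = []
    for data in payloads:
        # Decide per input, so plain and gzipped files can be mixed freely.
        parts.append(gzip.decompress(data) if data[:2] == GZIP_MAGIC else data)
    joined = b"".join(parts)
    return gzip.compress(joined) if gzip_output else joined


def concat_text_files(inputs, output, gzip_output=False):
    """Hypothetical file-level wrapper around concat_payloads."""
    payloads = [Path(path).read_bytes() for path in inputs]
    Path(output).write_bytes(concat_payloads(payloads, gzip_output))
```

Reading whole files into memory keeps the sketch short; for large FASTQ files a streaming copy (for example `shutil.copyfileobj` between `gzip.open` handles) would be the more practical choice.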

File diff suppressed because it is too large


@@ -0,0 +1,284 @@
name: "csv2fasta"
version: "minor_improvements"
authors:
- name: "Dries Schaumont"
roles:
- "author"
- "maintainer"
info:
links:
email: "dries@data-intuitive.com"
github: "DriesSchaumont"
orcid: "0000-0002-4389-0440"
linkedin: "dries-schaumont"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist"
- name: "Robrecht Cannoodt"
roles:
- "reviewer"
info:
links:
email: "robrecht@data-intuitive.com"
github: "rcannood"
orcid: "0000-0003-3641-729X"
linkedin: "robrechtcannoodt"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Science Engineer"
- name: "Open Problems"
href: "https://openproblems.bio"
role: "Core Member"
argument_groups:
- name: "Inputs"
arguments:
- type: "file"
name: "--input"
description: "CSV file to be processed."
info: null
example:
- "barcodes.csv"
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--header"
description: "Parse the first line of the CSV file as a header.\n"
info: null
direction: "input"
- name: "CSV dialect options"
description: "Options that can be used to override the automatically detected\n\
dialect of the CSV file.\n"
arguments:
- type: "string"
name: "--delimiter"
description: "Override the column delimiter character.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--quote_character"
description: "Override the character used to denote the start and end of a quoted\
\ item.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "CSV column arguments"
description: "Parameters for the selection of columns from the CSV file.\nOnly required\
\ when your CSV file contains more than 2 columns,\notherwise the first column\
\ will be used for the FASTA header\nand the second for the FASTA nucleotide sequences.\
\ This default\ncan still be overwritten by using the options below.\n"
arguments:
- type: "string"
name: "--sequence_column"
description: "Name of the column containing the sequences. Implies 'header'.\n\
Cannot be used together with 'sequence_column_index'.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--name_column"
description: "Name of the column describing the FASTA headers. Implies 'header'.\n\
Cannot be used together with 'name_column_index'.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--sequence_column_index"
description: "Index of the column to use as the FASTA sequences, counted from\
\ the left and\nstarting from 0. Cannot be used in combination with the 'sequence_column'\
\ argument.\n"
info: null
required: false
min: 0
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--name_column_index"
description: "Index of the column to use as the FASTA headers, counted from the\
\ left and\nstarting from 0. Cannot be used in combination with 'name_column'.\n"
info: null
required: false
min: 0
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Outputs"
arguments:
- type: "file"
name: "--output"
description: "Output fasta file."
info: null
example:
- "barcodes.fasta"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
resources:
- type: "python_script"
path: "script.py"
is_executable: true
summary: "Convert a CSV file to FASTA entries"
description: "Convert two columns from a CSV file to FASTA entries. The CSV file can\n\
contain an optional header and each row (other than the header) becomes\na single\
\ FASTA record. One of the two columns will be used as the names\nfor the FASTA\
\ entries, while the other becomes the sequences. The sequences\ncolumn must only\
\ contain characters that are valid IUPAC notation for\nnucleotides or a group\
\ thereof (wildcard characters).\n"
test_resources:
- type: "python_script"
path: "test_csv2fasta.py"
is_executable: true
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/craftbox"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "python:slim"
target_registry: "images.viash-hub.com"
target_tag: "minor_improvements"
namespace_separator: "/"
setup:
- type: "apt"
packages:
- "procps"
interactive: false
- type: "python"
user: false
packages:
- "dnaio"
upgrade: true
test_setup:
- type: "python"
user: false
packages:
- "pytest"
- "viashpy"
upgrade: true
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/csv2fasta/config.vsh.yaml"
runner: "executable"
engine: "docker|native"
output: "target/executable/csv2fasta"
executable: "target/executable/csv2fasta/csv2fasta"
viash_version: "0.9.3"
git_commit: "ea649a62ec4e05d72aff86ec804287629756416a"
git_remote: "https://github.com/viash-hub/craftbox"
git_tag: "v0.1.0-4-gea649a6"
package_config:
name: "craftbox"
version: "minor_improvements"
description: "A collection of custom-tailored scripts and applied tools.\n"
info: null
viash_version: "0.9.3"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'minor_improvements'"
keywords:
- "scripts"
- "custom"
- "implementations"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/craftbox"
issue_tracker: "https://github.com/viash-hub/craftbox/issues"
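
The `labels` map above follows a regular pattern. The `mem*gb`/`mem*tb` labels use decimal units (10^9 and 10^12 bytes). The `mem*gib`/`mem*tib` labels use binary units (2^30 and 2^40 bytes). The `cpu*` labels set a plain CPU count. The Python sketch below rebuilds the same map and renders it as the `process { withLabel: ... }` block found in the generated `nextflow.config`. It is only an illustration of the arithmetic, not how Viash itself produces these values, and `memory_labels` and `render_process_block` are hypothetical helper names.

```python
def memory_labels() -> dict:
    """Rebuild the resource labels listed in the Nextflow runner config."""
    labels = {}
    # Decimal units: 1 GB = 10**9 bytes, 1 TB = 10**12 bytes.
    for n in (1, 2, 5, 10, 20, 50, 100, 200, 500):
        labels[f"mem{n}gb"] = f"memory = {n * 10**9}.B"
        labels[f"mem{n}tb"] = f"memory = {n * 10**12}.B"
    # Binary units: 1 GiB = 2**30 bytes, 1 TiB = 2**40 bytes.
    for n in (1, 2, 4, 8, 16, 32, 64, 128, 256, 512):
        labels[f"mem{n}gib"] = f"memory = {n * 2**30}.B"
        labels[f"mem{n}tib"] = f"memory = {n * 2**40}.B"
    # CPU labels map directly to a `cpus` directive.
    for n in (1, 2, 5, 10, 20, 50, 100, 200, 500, 1000):
        labels[f"cpu{n}"] = f"cpus = {n}"
    return labels


def render_process_block(labels: dict) -> str:
    """Render the labels as a nextflow.config `process` block."""
    lines = [f"  withLabel: {name} {{ {directive} }}" for name, directive in labels.items()]
    return "process{\n" + "\n".join(lines) + "\n}"


if __name__ == "__main__":
    print(render_process_block(memory_labels()))
```

Running the script prints one `withLabel:` line per label. The two-space indentation is an arbitrary, cosmetic choice.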

File diff suppressed because it is too large

@@ -0,0 +1,224 @@
name: "sync_resources"
version: "minor_improvements"
authors:
- name: "Robrecht Cannoodt"
roles:
- "author"
- "maintainer"
info:
links:
email: "robrecht@data-intuitive.com"
github: "rcannood"
orcid: "0000-0003-3641-729X"
linkedin: "robrechtcannoodt"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Science Engineer"
- name: "Open Problems"
href: "https://openproblems.bio"
role: "Core Member"
- name: "Dries Schaumont"
roles:
- "reviewer"
info:
links:
email: "dries@data-intuitive.com"
github: "DriesSchaumont"
orcid: "0000-0002-4389-0440"
linkedin: "dries-schaumont"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist"
argument_groups:
- name: "Inputs"
arguments:
- type: "file"
name: "--input"
alternatives:
- "-i"
description: "Path to the _viash.yaml project configuration file."
info: null
default:
- "_viash.yaml"
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Outputs"
arguments:
- type: "file"
name: "--output"
alternatives:
- "-o"
description: "Path to the directory to which the resources will be synced."
info: null
default:
- "."
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- name: "Arguments"
arguments:
- type: "boolean_true"
name: "--dryrun"
description: "Display the operations that would be performed using the specified command without actually running them."
info: null
direction: "input"
- type: "string"
name: "--exclude"
description: "Exclude all files or objects from the command that match the specified\
\ pattern."
info: null
required: false
direction: "input"
multiple: true
multiple_sep: ";"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
summary: "Sync a Viash package's test resources to the local filesystem"
description: "Sync a Viash package's test resources to the local filesystem based\
\ on the\n`.info.test_resources` field in the `_viash.yaml` file. This is useful\
\ for\ntesting and debugging purposes.\n"
usage: "sync_resources\nsync_resources --input _viash.yaml --output .\n"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/craftbox"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "alpine:3"
target_registry: "images.viash-hub.com"
target_tag: "minor_improvements"
namespace_separator: "/"
setup:
- type: "apk"
packages:
- "bash"
- "rclone"
- "yq"
- type: "docker"
run:
- "rclone config create s3 s3 anonymous=true"
- "rclone config create gs gcs anonymous=true"
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/sync_resources/config.vsh.yaml"
runner: "executable"
engine: "docker|native"
output: "target/executable/sync_resources"
executable: "target/executable/sync_resources/sync_resources"
viash_version: "0.9.3"
git_commit: "ea649a62ec4e05d72aff86ec804287629756416a"
git_remote: "https://github.com/viash-hub/craftbox"
git_tag: "v0.1.0-4-gea649a6"
package_config:
name: "craftbox"
version: "minor_improvements"
description: "A collection of custom-tailored scripts and applied tools.\n"
info: null
viash_version: "0.9.3"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'minor_improvements'"
keywords:
- "scripts"
- "custom"
- "implementations"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/craftbox"
issue_tracker: "https://github.com/viash-hub/craftbox/issues"
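
The `sync_resources` description says that resources are synced based on `.info.test_resources` in `_viash.yaml`. The Docker setup also creates rclone remotes named `s3` and `gs`. The sketch below shows how a single entry might be turned into an rclone invocation. It rests on assumptions that this config does not confirm: each entry is assumed to carry a source URI (`path`) and a destination (`dest`), and the script is assumed to call `rclone sync`. `build_sync_command` and `to_rclone_path` are hypothetical helpers, not the component's `script.sh`.

```python
import posixpath
from typing import List, Sequence

# Assumed mapping from URI schemes to the rclone remotes created in the image
# setup ("rclone config create s3 ..." and "rclone config create gs ...").
REMOTES = {"s3://": "s3:", "gs://": "gs:"}


def to_rclone_path(uri: str) -> str:
    """Translate e.g. s3://bucket/dir into rclone's remote:path syntax."""
    for scheme, remote in REMOTES.items():
        if uri.startswith(scheme):
            return remote + uri[len(scheme):]
    raise ValueError(f"unsupported resource URI: {uri!r}")


def build_sync_command(
    path: str,
    dest: str,
    output: str = ".",
    dryrun: bool = False,
    exclude: Sequence[str] = (),
) -> List[str]:
    """Build one rclone call for a single (hypothetical) test_resources entry."""
    cmd = ["rclone", "sync", to_rclone_path(path), posixpath.join(output, dest)]
    if dryrun:
        cmd.append("--dry-run")  # report what would change without copying
    for pattern in exclude:
        cmd.extend(["--exclude", pattern])
    return cmd
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute it. Note that `rclone sync` deletes destination files that are absent from the source. `rclone copy` leaves them in place, which may be the safer choice for a local working copy.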

File diff suppressed because it is too large

@@ -0,0 +1,209 @@
name: "untar"
version: "minor_improvements"
authors:
- name: "Dries Schaumont"
roles:
- "author"
- "maintainer"
info:
links:
email: "dries@data-intuitive.com"
github: "DriesSchaumont"
orcid: "0000-0002-4389-0440"
linkedin: "dries-schaumont"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist"
- name: "Robrecht Cannoodt"
roles:
- "reviewer"
info:
links:
email: "robrecht@data-intuitive.com"
github: "rcannood"
orcid: "0000-0003-3641-729X"
linkedin: "robrechtcannoodt"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Science Engineer"
- name: "Open Problems"
href: "https://openproblems.bio"
role: "Core Member"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--input"
description: "Tarball file to be unpacked."
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--output"
description: "Directory to write the contents of the .tar file to."
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
- name: "Other arguments"
arguments:
- type: "string"
name: "--exclude"
alternatives:
- "-e"
description: "Prevents any file or member whose name matches the shell wildcard\
\ (pattern) from being extracted."
info: null
example:
- "docs/figures"
required: false
direction: "input"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
summary: "Unpack a .tar file"
description: "Unpack a .tar file. When the contents of the .tar file are just a single\
\ directory,\nput the contents of the directory into the output folder instead of\
\ that directory.\n"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/craftbox"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "debian:stable-slim"
target_registry: "images.viash-hub.com"
target_tag: "minor_improvements"
namespace_separator: "/"
setup:
- type: "apt"
packages:
- "procps"
interactive: false
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/untar/config.vsh.yaml"
runner: "executable"
engine: "docker|native"
output: "target/executable/untar"
executable: "target/executable/untar/untar"
viash_version: "0.9.3"
git_commit: "ea649a62ec4e05d72aff86ec804287629756416a"
git_remote: "https://github.com/viash-hub/craftbox"
git_tag: "v0.1.0-4-gea649a6"
package_config:
name: "craftbox"
version: "minor_improvements"
description: "A collection of custom-tailored scripts and applied tools.\n"
info: null
viash_version: "0.9.3"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'minor_improvements'"
keywords:
- "scripts"
- "custom"
- "implementations"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/craftbox"
issue_tracker: "https://github.com/viash-hub/craftbox/issues"
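
According to the `untar` description, an archive whose only top-level entry is a directory is unpacked without that wrapping directory. The sketch below shows that behaviour with Python's standard `tarfile` module. It is not the component's `script.sh`, and `_is_excluded` only approximates GNU tar's unanchored `--exclude` wildcard matching. `untar_flatten` is a hypothetical name.

```python
import fnmatch
import shutil
import tarfile
import tempfile
from pathlib import Path
from typing import Optional


def _is_excluded(name: str, pattern: Optional[str]) -> bool:
    """Rough stand-in for GNU tar's unanchored --exclude matching."""
    if pattern is None:
        return False
    parts = name.strip("/").split("/")
    # Try the pattern against every trailing run of path components; a match
    # on a directory also excludes everything below it.
    for i in range(len(parts)):
        suffix = "/".join(parts[i:])
        if fnmatch.fnmatchcase(suffix, pattern) or fnmatch.fnmatchcase(suffix, pattern + "/*"):
            return True
    return False


def untar_flatten(archive: Path, output: Path, exclude: Optional[str] = None) -> None:
    """Extract `archive` into `output`, unwrapping a single top-level directory."""
    output.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp)
        with tarfile.open(archive) as tar:
            members = [m for m in tar.getmembers() if not _is_excluded(m.name, exclude)]
            # Use the safer "data" extraction filter where this Python has it.
            extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tar.extractall(staging, members=members, **extra)
        entries = list(staging.iterdir())
        # A lone top-level directory is unwrapped: move its children instead.
        if len(entries) == 1 and entries[0].is_dir():
            entries = list(entries[0].iterdir())
        for entry in entries:
            shutil.move(str(entry), str(output / entry.name))
```

The `data_filter` check enables the extraction filter offered by recent Python releases. Without it, `extractall` on an untrusted archive could write files outside the target directory.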

1172
target/executable/untar/untar Executable file

File diff suppressed because it is too large

@@ -0,0 +1,202 @@
name: "concat_text"
version: "minor_improvements"
authors:
- name: "Toni Verbeiren"
roles:
- "author"
- "maintainer"
info:
links:
github: "tverbeiren"
linkedin: "verbeiren"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist and CEO"
- name: "Dries Schaumont"
roles:
- "reviewer"
info:
links:
email: "dries@data-intuitive.com"
github: "DriesSchaumont"
orcid: "0000-0002-4389-0440"
linkedin: "dries-schaumont"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--input"
description: "A list of (gzipped) text files."
info: null
example:
- "input?.txt.gz"
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: true
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "boolean_true"
name: "--gzip_output"
description: "Should the output be zipped?"
info: null
direction: "input"
- type: "file"
name: "--output"
description: "File to write the output to, optionally gzipped."
info: null
example:
- "output.txt"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
summary: "Concatenate a number of text files"
description: "Concatenate a number of text files, handle gzipped text files gracefully\
\ and\noptionally gzip the output text file.\n\nThis component is useful for concatenating\
\ fastq files from different lanes, for instance.\n"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
info:
improvements: "This component could be improved in 2 ways:\n 1. Allow for a mix\
\ of zipped and plain input files\n 2. Allow to specify a compression algorithm\
\ for the output\n"
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/craftbox"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "alpine:latest"
target_registry: "images.viash-hub.com"
target_tag: "minor_improvements"
namespace_separator: "/"
setup:
- type: "apk"
packages:
- "bash"
- "procps"
- "file"
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/concat_text/config.vsh.yaml"
runner: "nextflow"
engine: "docker|native"
output: "target/nextflow/concat_text"
executable: "target/nextflow/concat_text/main.nf"
viash_version: "0.9.3"
git_commit: "ea649a62ec4e05d72aff86ec804287629756416a"
git_remote: "https://github.com/viash-hub/craftbox"
git_tag: "v0.1.0-4-gea649a6"
package_config:
name: "craftbox"
version: "minor_improvements"
description: "A collection of custom-tailored scripts and applied tools.\n"
info: null
viash_version: "0.9.3"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'minor_improvements'"
keywords:
- "scripts"
- "custom"
- "implementations"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/craftbox"
issue_tracker: "https://github.com/viash-hub/craftbox/issues"
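
The `concat_text` description says the component handles gzipped inputs transparently and can gzip its output. The Python sketch below illustrates that idea. It assumes that gzip input is recognised by the two magic bytes `1f 8b`. The real script installs `file` and may detect compression differently. `concat_text` here is a hypothetical function, not the component's `script.sh`.

```python
import gzip
import shutil
from pathlib import Path
from typing import BinaryIO, Iterable

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def _open_input(path: Path) -> BinaryIO:
    """Open a file for reading, decompressing it on the fly if it is gzipped."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == GZIP_MAGIC
    return gzip.open(path, "rb") if is_gzip else open(path, "rb")


def concat_text(inputs: Iterable[Path], output: Path, gzip_output: bool = False) -> None:
    """Concatenate (possibly gzipped) text files into one, optionally gzipped, file."""
    opener = gzip.open if gzip_output else open
    with opener(output, "wb") as out:
        for path in inputs:
            with _open_input(path) as src:
                shutil.copyfileobj(src, out)
```

This sketch inspects each input separately, so it would accept a mix of plain and gzipped files. That is the first item in the `improvements` note above. On a related point, concatenating gzip files byte by byte already produces a valid multi-member gzip stream. When every input is gzipped and gzipped output is wanted, decompressing first is therefore not strictly necessary.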

File diff suppressed because it is too large

@@ -0,0 +1,126 @@
manifest {
name = 'concat_text'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'minor_improvements'
description = 'Concatenate a number of text files, handle gzipped text files gracefully and\noptionally gzip the output text file.\n\nThis component is useful for concatenating fastq files from different lanes, for instance.\n'
author = 'Toni Verbeiren, Dries Schaumont'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
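
The `tempDir` block in this config uses Groovy's Elvis operator (`?:`). It picks the first non-empty value among `NXF_TEMP`, `VIASH_TEMP`, `TEMPDIR` and `TMPDIR`, and falls back to `/tmp`. The Python sketch below shows the same fallback chain for illustration only. `detect_temp_dir` is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Checked in the same order as the Elvis chain in nextflow.config.
TEMP_VARS = ("NXF_TEMP", "VIASH_TEMP", "TEMPDIR", "TMPDIR")


def detect_temp_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the first non-empty temp-dir variable as an absolute path, else /tmp."""
    env = os.environ if env is None else env
    for var in TEMP_VARS:
        value = env.get(var)
        if value:  # unset and empty values are skipped, like Groovy falsiness
            return Path(value).absolute()
    return Path("/tmp").absolute()
```

Like `toAbsolutePath()`, `Path.absolute()` makes the path absolute without resolving symlinks.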

@@ -0,0 +1,106 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "concat_text",
"description": "Concatenate a number of text files, handle gzipped text files gracefully and\noptionally gzip the output text file.\n\nThis component is useful for concatenating fastq files from different lanes, for instance.\n",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type":
"string",
"description": "Type: List of `file`, required, example: `input?.txt.gz`, multiple_sep: `\";\"`. A list of (gzipped) text files",
"help_text": "Type: List of `file`, required, example: `input?.txt.gz`, multiple_sep: `\";\"`. A list of (gzipped) text files."
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"gzip_output": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Should the output be zipped?",
"help_text": "Type: `boolean_true`, default: `false`. Should the output be zipped?"
,
"default":false
}
,
"output": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.output.txt`, example: `output.txt`. File to write the output to, optionally gzipped",
"help_text": "Type: `file`, default: `$id.$key.output.txt`, example: `output.txt`. File to write the output to, optionally gzipped."
,
"default":"$id.$key.output.txt"
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map correspond to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativization is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
},
{
"$ref": "#/definitions/output arguments"
},
{
"$ref": "#/definitions/nextflow input-output arguments"
}
]
}
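
The `param_list` help text says that, for CSV, JSON and YAML files, relative paths are resolved against the location of the parameter file. The Python sketch below applies that rule to the CSV case for single-valued arguments. It assumes the caller already knows which columns hold file paths. `read_param_list_csv` and its `file_args` parameter are hypothetical and do not reflect Viash's own implementation.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List


def read_param_list_csv(param_file: Path, file_args: Iterable[str]) -> List[Dict[str, str]]:
    """Read a CSV param_list, resolving relative file paths against its directory."""
    base = param_file.resolve().parent
    path_columns = set(file_args)
    rows = []
    with open(param_file, newline="") as fh:
        for row in csv.DictReader(fh):
            for key in path_columns:
                value = row.get(key)
                # Absolute paths and empty cells are left untouched.
                if value and not Path(value).is_absolute():
                    row[key] = str(base / value)
            rows.append(row)
    return rows
```

With `--param_list data.csv` and columns `id,input`, calling `read_param_list_csv(Path("data.csv"), ["input"])` would yield one map per row, with `input` resolved relative to the directory containing `data.csv`.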

@@ -0,0 +1,284 @@
name: "csv2fasta"
version: "minor_improvements"
authors:
- name: "Dries Schaumont"
roles:
- "author"
- "maintainer"
info:
links:
email: "dries@data-intuitive.com"
github: "DriesSchaumont"
orcid: "0000-0002-4389-0440"
linkedin: "dries-schaumont"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist"
- name: "Robrecht Cannoodt"
roles:
- "reviewer"
info:
links:
email: "robrecht@data-intuitive.com"
github: "rcannood"
orcid: "0000-0003-3641-729X"
linkedin: "robrechtcannoodt"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Science Engineer"
- name: "Open Problems"
href: "https://openproblems.bio"
role: "Core Member"
argument_groups:
- name: "Inputs"
arguments:
- type: "file"
name: "--input"
description: "CSV file to be processed."
info: null
example:
- "barcodes.csv"
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- type: "boolean_true"
name: "--header"
description: "Parse the first line of the CSV file as a header.\n"
info: null
direction: "input"
- name: "CSV dialect options"
description: "Options that can be used to override the automatically detected\n\
dialect of the CSV file.\n"
arguments:
- type: "string"
name: "--delimiter"
description: "Overwrite the column delimiter character.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--quote_character"
description: "Overwrite the character used to denote the start and end of a quoted\
\ item.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "CSV column arguments"
description: "Parameters for the selection of columns from the CSV file.\nOnly required\
\ when your CSV file contains more than 2 columns,\notherwise the first column\
\ will be used for the FASTA header\nand the second for the FASTA nucleotide sequences.\
\ This default\ncan still be overwritten by using the options below.\n"
arguments:
- type: "string"
name: "--sequence_column"
description: "Name of the column containing the sequences. Implies 'header'.\n\
Cannot be used together with 'sequence_column_index'.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "string"
name: "--name_column"
description: "Name of the column describing the FASTA headers. Implies 'header'.\n\
Cannot be used together with 'name_column_index'.\n"
info: null
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--sequence_column_index"
description: "Index of the column to use as the FASTA sequences, counted from\
\ the left and\nstarting from 0. Cannot be used in combination with the 'sequence_column'\
\ argument.\n"
info: null
required: false
min: 0
direction: "input"
multiple: false
multiple_sep: ";"
- type: "integer"
name: "--name_column_index"
description: "Index of the column to use as the FASTA headers, counted from the\
\ left and\nstarting from 0. Cannot be used in combination with 'name_column'.\n"
info: null
required: false
min: 0
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Outputs"
arguments:
- type: "file"
name: "--output"
description: "Output fasta file."
info: null
example:
- "barcodes.fasta"
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
resources:
- type: "python_script"
path: "script.py"
is_executable: true
summary: "Convert a CSV file to FASTA entries"
description: "Convert two columns from a CSV file to FASTA entries. The CSV file can\n\
contain an optional header and each row (other than the header) becomes\na single\
\ FASTA record. One of the two columns will be used as the names\nfor the FASTA\
\ entries, while the other becomes the sequences. The sequences\ncolumn must only\
\ contain characters that are valid IUPAC notation for\nnucleotides or a group\
\ thereof (wildcard characters).\n"
test_resources:
- type: "python_script"
path: "test_csv2fasta.py"
is_executable: true
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/craftbox"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "python:slim"
target_registry: "images.viash-hub.com"
target_tag: "minor_improvements"
namespace_separator: "/"
setup:
- type: "apt"
packages:
- "procps"
interactive: false
- type: "python"
user: false
packages:
- "dnaio"
upgrade: true
test_setup:
- type: "python"
user: false
packages:
- "pytest"
- "viashpy"
upgrade: true
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/csv2fasta/config.vsh.yaml"
runner: "nextflow"
engine: "docker|native"
output: "target/nextflow/csv2fasta"
executable: "target/nextflow/csv2fasta/main.nf"
viash_version: "0.9.3"
git_commit: "ea649a62ec4e05d72aff86ec804287629756416a"
git_remote: "https://github.com/viash-hub/craftbox"
git_tag: "v0.1.0-4-gea649a6"
package_config:
name: "craftbox"
version: "minor_improvements"
description: "A collection of custom-tailored scripts and applied tools.\n"
info: null
viash_version: "0.9.3"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'minor_improvements'"
keywords:
- "scripts"
- "custom"
- "implementations"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/craftbox"
issue_tracker: "https://github.com/viash-hub/craftbox/issues"
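
The csv2fasta arguments describe the core conversion. The CSV dialect is sniffed or overridden, an optional header is skipped, and by default the name comes from column 0 and the sequence from column 1. Sequences that are not IUPAC nucleotide notation are rejected. The component's image installs `dnaio`, while the sketch below uses only the standard library. It is an illustration, not the component's `script.py`. The accepted alphabet (`ACGTURYSWKMBDHVN`, case-insensitive) is an assumption about what "valid IUPAC notation" covers, and `csv_to_fasta` is a hypothetical name.

```python
import csv
import io
import re
from typing import Optional, TextIO

# Assumed alphabet: IUPAC nucleotide codes including ambiguity ("wildcard") letters.
IUPAC_NUCLEOTIDES = re.compile(r"[ACGTURYSWKMBDHVN]+", re.IGNORECASE)


def csv_to_fasta(
    src: TextIO,
    dst: TextIO,
    header: bool = False,
    name_column_index: int = 0,
    sequence_column_index: int = 1,
    delimiter: Optional[str] = None,
) -> int:
    """Write one FASTA record per CSV row and return the number of records."""
    text = src.read()
    if delimiter is None:
        # Let the standard library guess the dialect (delimiter, quoting).
        reader = csv.reader(io.StringIO(text), csv.Sniffer().sniff(text))
    else:
        reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    if header:
        next(reader, None)  # skip the header row
    count = 0
    for row in reader:
        name = row[name_column_index]
        sequence = row[sequence_column_index]
        if not IUPAC_NUCLEOTIDES.fullmatch(sequence):
            raise ValueError(f"invalid nucleotide sequence for {name!r}: {sequence!r}")
        dst.write(f">{name}\n{sequence}\n")
        count += 1
    return count
```

`csv.Sniffer` can guess wrong on very short or unusual files. In those cases the `--delimiter` and `--quote_character` overrides become useful.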

File diff suppressed because it is too large

@@ -0,0 +1,126 @@
manifest {
name = 'csv2fasta'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'minor_improvements'
description = 'Convert two columns from a CSV file to FASTA entries. The CSV file can\ncontain an optional header and each row (other than the header) becomes\na single FASTA record. One of the two columns will be used as the names\nfor the FASTA entries, while the other becomes the sequences. The sequences\ncolumn must only contain characters that are valid IUPAC notation for\nnucleotides or a group thereof (wildcard characters).\n'
author = 'Dries Schaumont, Robrecht Cannoodt'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
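The `mem*` labels above mix two unit families: the `gb`/`tb` labels are decimal (powers of 1000), while the `gib`/`tib` labels are binary (powers of 1024). The following minimal Python sketch reproduces the byte values from a label name. It assumes only the naming pattern visible in this file; `label_to_bytes` is a hypothetical helper, not part of Viash or Nextflow.

```python
import re

# Multipliers for the unit suffixes used by the mem* labels in this config.
UNITS = {
    "gb": 1000**3,
    "tb": 1000**4,
    "gib": 1024**3,
    "tib": 1024**4,
}


def label_to_bytes(label: str) -> int:
    """Convert a label such as 'mem16gib' to the number of bytes it requests."""
    match = re.fullmatch(r"mem(\d+)(gb|tb|gib|tib)", label)
    if match is None:
        raise ValueError(f"not a memory label: {label!r}")
    amount, unit = match.groups()
    return int(amount) * UNITS[unit]


if __name__ == "__main__":
    for name in ["mem1gb", "mem1gib", "mem16gib", "mem500tb"]:
        print(name, label_to_bytes(name))
```

Because of the unit difference, `mem1gib` requests 73,741,824 bytes more than `mem1gb`, and the gap widens at the `tb`/`tib` end of the range.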

@@ -0,0 +1,194 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "csv2fasta",
"description": "Convert two columns from a CSV file to FASTA entries. The CSV file can\ncontain an optional header and each row (other than the header) becomes\na single FASTA record. One of the two columns will be used as the names\nfor the FASTA entries, while the other become the sequences. The sequences\ncolumn must only contain characters that are valid IUPAC notation for \nnucleotides or a group thereof (wildcard characters).\n",
"type": "object",
"definitions": {
"inputs" : {
"title": "Inputs",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type":
"string",
"description": "Type: `file`, required, example: `barcodes.csv`. CSV file to be processed",
"help_text": "Type: `file`, required, example: `barcodes.csv`. CSV file to be processed."
}
,
"header": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Parse the first line of the CSV file as a header",
"help_text": "Type: `boolean_true`, default: `false`. Parse the first line of the CSV file as a header.\n"
,
"default":false
}
}
},
"outputs" : {
"title": "Outputs",
"type": "object",
"description": "No description",
"properties": {
"output": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.output.fasta`, example: `barcodes.fasta`. Output fasta file",
"help_text": "Type: `file`, default: `$id.$key.output.fasta`, example: `barcodes.fasta`. Output fasta file."
,
"default":"$id.$key.output.fasta"
}
}
},
"csv dialect options" : {
"title": "CSV dialect options",
"type": "object",
"description": "Options that can be used to override the automatically detected\ndialect of the CSV file.\n",
"properties": {
"delimiter": {
"type":
"string",
"description": "Type: `string`. Overwrite the column delimiter character",
"help_text": "Type: `string`. Overwrite the column delimiter character.\n"
}
,
"quote_character": {
"type":
"string",
"description": "Type: `string`. Overwrite the character used to denote the start and end of a quoted item",
"help_text": "Type: `string`. Overwrite the character used to denote the start and end of a quoted item.\n"
}
}
},
"csv column arguments" : {
"title": "CSV column arguments",
"type": "object",
"description": "Parameters for the selection of columns from the CSV file.\nOnly required when your CSV file contains more than 2 columns,\notherwise the first column will be used for the FASTA header\nand the second for the FASTA nucleotide sequences. This default\ncan still be overwritten by using the options below.\n",
"properties": {
"sequence_column": {
"type":
"string",
"description": "Type: `string`. Name of the column containing the sequences",
"help_text": "Type: `string`. Name of the column containing the sequences. Implies \u0027header\u0027.\nCannot be used together with \u0027sequence_column_index\u0027.\n"
}
,
"name_column": {
"type":
"string",
"description": "Type: `string`. Name of the column describing the FASTA headers",
"help_text": "Type: `string`. Name of the column describing the FASTA headers. Implies \u0027header\u0027.\nCannot be used together with \u0027name_column_index\u0027.\n"
}
,
"sequence_column_index": {
"type":
"integer",
"description": "Type: `integer`. Index of the column to use as the FASTA sequences, counter from the left and\nstarting from 0",
"help_text": "Type: `integer`. Index of the column to use as the FASTA sequences, counter from the left and\nstarting from 0. Cannot be used in combination with the \u0027sequence_column\u0027 argument.\n"
}
,
"name_column_index": {
"type":
"integer",
"description": "Type: `integer`. Index of the column to use as the FASTA headers, counter from the left and\nstarting from 0",
"help_text": "Type: `integer`. Index of the column to use as the FASTA headers, counter from the left and\nstarting from 0. Cannot be used in combination with \u0027name_column\u0027.\n"
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map corresponds to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativation is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/inputs"
},
{
"$ref": "#/definitions/outputs"
},
{
"$ref": "#/definitions/csv dialect options"
},
{
"$ref": "#/definitions/csv column arguments"
},
{
"$ref": "#/definitions/nextflow input-output arguments"
}
]
}
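The csv2fasta schema above describes the conversion: an optional header row, a name column and a sequence column selected by name or by 0-based index (defaulting to the first and second columns for two-column files), and sequences restricted to IUPAC nucleotide codes. The following standard-library Python sketch illustrates that behaviour under stated assumptions. It is not the component's implementation: the component's Docker image installs `dnaio` and auto-detects the CSV dialect, whereas this sketch simply defaults to a comma delimiter. `csv_to_fasta` is a hypothetical name, and the accepted alphabet (`ACGTURYSWKMBDHVN`, case-insensitive) is an assumption.

```python
import csv
import io
import re
from typing import Optional, TextIO

# Assumed alphabet: IUPAC nucleotide codes, including wildcards such as N, R and Y.
IUPAC_NUCLEOTIDES = re.compile(r"[ACGTURYSWKMBDHVN]+", re.IGNORECASE)


def csv_to_fasta(
    handle: TextIO,
    header: bool = False,
    name_column: Optional[str] = None,
    sequence_column: Optional[str] = None,
    name_column_index: Optional[int] = None,
    sequence_column_index: Optional[int] = None,
    delimiter: str = ",",
    quote_character: str = '"',
) -> str:
    """Return FASTA text built from a name column and a sequence column of a CSV file."""
    if name_column is not None and name_column_index is not None:
        raise ValueError("name_column and name_column_index are mutually exclusive")
    if sequence_column is not None and sequence_column_index is not None:
        raise ValueError("sequence_column and sequence_column_index are mutually exclusive")
    # Selecting a column by name implies that the first row is a header.
    header = header or name_column is not None or sequence_column is not None

    rows = list(csv.reader(handle, delimiter=delimiter, quotechar=quote_character))
    column_names = rows.pop(0) if header and rows else []
    if name_column is not None:
        name_column_index = column_names.index(name_column)
    if sequence_column is not None:
        sequence_column_index = column_names.index(sequence_column)

    # Defaults: first column for names, second for sequences, which is only
    # unambiguous when the file has exactly two columns.
    if name_column_index is None or sequence_column_index is None:
        n_columns = len(rows[0]) if rows else len(column_names)
        if n_columns != 2:
            raise ValueError("select the columns explicitly for files without exactly 2 columns")
        if name_column_index is None:
            name_column_index = 0
        if sequence_column_index is None:
            sequence_column_index = 1

    records = []
    first_data_line = 2 if header else 1
    for line_number, row in enumerate(rows, start=first_data_line):
        name, sequence = row[name_column_index], row[sequence_column_index]
        if not IUPAC_NUCLEOTIDES.fullmatch(sequence):
            raise ValueError(f"line {line_number}: invalid nucleotide sequence {sequence!r}")
        records.append(f">{name}\n{sequence}\n")
    return "".join(records)


if __name__ == "__main__":
    example = io.StringIO("barcode_id,sequence\nbc1,ACGTN\nbc2,TTGCA\n")
    print(csv_to_fasta(example, header=True), end="")
```

Validating each sequence as it is read means a bad row fails with its line number instead of producing a partially valid FASTA file.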

@@ -0,0 +1,224 @@
name: "sync_resources"
version: "minor_improvements"
authors:
- name: "Robrecht Cannoodt"
roles:
- "author"
- "maintainer"
info:
links:
email: "robrecht@data-intuitive.com"
github: "rcannood"
orcid: "0000-0003-3641-729X"
linkedin: "robrechtcannoodt"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Science Engineer"
- name: "Open Problems"
href: "https://openproblems.bio"
role: "Core Member"
- name: "Dries Schaumont"
roles:
- "reviewer"
info:
links:
email: "dries@data-intuitive.com"
github: "DriesSchaumont"
orcid: "0000-0002-4389-0440"
linkedin: "dries-schaumont"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist"
argument_groups:
- name: "Inputs"
arguments:
- type: "file"
name: "--input"
alternatives:
- "-i"
description: "Path to the _viash.yaml project configuration file."
info: null
default:
- "_viash.yaml"
must_exist: true
create_parent: true
required: false
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Outputs"
arguments:
- type: "file"
name: "--output"
alternatives:
- "-o"
description: "Path to the directory where the resources will be synced to."
info: null
default:
- "."
must_exist: true
create_parent: true
required: false
direction: "output"
multiple: false
multiple_sep: ";"
- name: "Arguments"
arguments:
- type: "boolean_true"
name: "--dryrun"
description: "Display the operations that would be performed using the specified command without actually running them."
info: null
direction: "input"
- type: "string"
name: "--exclude"
description: "Exclude all files or objects from the command that match the specified\
\ pattern."
info: null
required: false
direction: "input"
multiple: true
multiple_sep: ";"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
summary: "Sync a Viash package's test resources to the local filesystem"
description: "Sync a Viash package's test resources to the local filesystem based\
\ on the\n`.info.test_resources` field in the `_viash.yaml` file. This is useful\
\ for\ntesting and debugging purposes.\n"
usage: "sync_resources\nsync_resources --input _viash.yaml --output .\n"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/craftbox"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "alpine:3"
target_registry: "images.viash-hub.com"
target_tag: "minor_improvements"
namespace_separator: "/"
setup:
- type: "apk"
packages:
- "bash"
- "rclone"
- "yq"
- type: "docker"
run:
- "rclone config create s3 s3 anonymous=true"
- "rclone config create gs gcs anonymous=true"
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/sync_resources/config.vsh.yaml"
runner: "nextflow"
engine: "docker|native"
output: "target/nextflow/sync_resources"
executable: "target/nextflow/sync_resources/main.nf"
viash_version: "0.9.3"
git_commit: "ea649a62ec4e05d72aff86ec804287629756416a"
git_remote: "https://github.com/viash-hub/craftbox"
git_tag: "v0.1.0-4-gea649a6"
package_config:
name: "craftbox"
version: "minor_improvements"
description: "A collection of custom-tailored scripts and applied tools.\n"
info: null
viash_version: "0.9.3"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'minor_improvements'"
keywords:
- "scripts"
- "custom"
- "implementations"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/craftbox"
issue_tracker: "https://github.com/viash-hub/craftbox/issues"
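The sync_resources image installs `yq` and `rclone` and creates anonymous `s3` and `gs` rclone remotes. This suggests that the component reads `.info.test_resources` from `_viash.yaml` and syncs each entry with rclone. The Python sketch below builds the corresponding command lines under several assumptions that this config does not confirm:

* Each entry has already been parsed into a dictionary with a `path` key (for example, `s3://bucket/prefix`) and a `dest` key.
* `--dryrun` maps to rclone's `--dry-run` flag.
* Each `--exclude` value maps to one rclone `--exclude` pattern.

`to_rclone_remote` and `build_sync_commands` are hypothetical names.

```python
import shlex
from typing import Dict, List, Optional


def to_rclone_remote(url: str) -> str:
    """Rewrite 's3://bucket/key' or 'gs://bucket/key' into rclone's 'remote:bucket/key' form."""
    scheme, separator, rest = url.partition("://")
    if not separator or scheme not in ("s3", "gs"):
        raise ValueError(f"unsupported resource URL: {url!r}")
    # Assumes remotes named 's3' and 'gs', as created by `rclone config create` in the image.
    return f"{scheme}:{rest}"


def build_sync_commands(
    test_resources: List[Dict[str, str]],
    output: str = ".",
    dryrun: bool = False,
    exclude: Optional[str] = None,
) -> List[List[str]]:
    """Build one `rclone sync` argument list per entry of `.info.test_resources`."""
    # --exclude accepts multiple values joined with ';' (its multiple_sep).
    patterns = [pattern for pattern in (exclude or "").split(";") if pattern]
    commands = []
    for resource in test_resources:
        # Assumed entry fields: 'path' (remote URL) and 'dest' (relative local path).
        destination = f"{output.rstrip('/')}/{resource['dest']}"
        command = ["rclone", "sync", to_rclone_remote(resource["path"]), destination]
        if dryrun:
            command.append("--dry-run")
        for pattern in patterns:
            command += ["--exclude", pattern]
        commands.append(command)
    return commands


if __name__ == "__main__":
    # Hypothetical bucket and paths, for illustration only.
    resources = [{"type": "s3", "path": "s3://example-bucket/resources_test/demo", "dest": "resources_test/demo"}]
    for command in build_sync_commands(resources, dryrun=True, exclude="*.h5ad;tmp/*"):
        print(shlex.join(command))
```

A dry run is a sensible first step because `rclone sync` makes the destination identical to the source, which deletes local files that are absent remotely. Whether the real script uses `sync` or `copy` is not visible in this config.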

File diff suppressed because it is too large

@@ -0,0 +1,126 @@
manifest {
name = 'sync_resources'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'minor_improvements'
description = 'Sync a Viash package\'s test resources to the local filesystem based on the\n`.info.test_resources` field in the `_viash.yaml` file. This is useful for\ntesting and debugging purposes.\n'
author = 'Robrecht Cannoodt, Dries Schaumont'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}
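The `tempDir` block at the top of this config uses Groovy's Elvis operator (`?:`) to pick the first non-empty value among `NXF_TEMP`, `VIASH_TEMP`, `TEMPDIR` and `TMPDIR`, falling back to `/tmp`, and then makes the path absolute. The following Python sketch mirrors that precedence. `resolve_temp_dir` is a hypothetical helper that takes an optional mapping so the lookup can be tested without touching the real environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Same precedence as the Elvis (?:) chain in the Nextflow config.
TEMP_VARIABLES = ("NXF_TEMP", "VIASH_TEMP", "TEMPDIR", "TMPDIR")


def resolve_temp_dir(environ: Optional[Mapping[str, str]] = None) -> Path:
    """Return the first non-empty temp directory variable as an absolute path, else /tmp."""
    env = os.environ if environ is None else environ
    for name in TEMP_VARIABLES:
        value = env.get(name)
        # Groovy's ?: also skips empty strings, so "" is treated like an unset variable.
        if value:
            return Path(value).absolute()
    return Path("/tmp").absolute()


if __name__ == "__main__":
    print(resolve_temp_dir())
```

The `mount_temp` profile passes this directory as `docker.temp`, `podman.temp` and `charliecloud.temp`, so it only takes effect when that profile is combined with a container profile (for example `-profile docker,mount_temp`).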

@@ -0,0 +1,131 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "sync_resources",
"description": "Sync a Viash package\u0027s test resources to the local filesystem based on the\nthe `.info.test_resources` field in the `_viash.yaml` file. This is useful for\ntesting and debugging purposes.\n",
"type": "object",
"definitions": {
"inputs" : {
"title": "Inputs",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type":
"string",
"description": "Type: `file`, default: `_viash.yaml`. Path to the _viash",
"help_text": "Type: `file`, default: `_viash.yaml`. Path to the _viash.yaml project configuration file."
,
"default":"_viash.yaml"
}
}
},
"outputs" : {
"title": "Outputs",
"type": "object",
"description": "No description",
"properties": {
"output": {
"type":
"string",
"description": "Type: `file`, default: `$id.$key.output.output`. Path to the directory where the resources will be synced to",
"help_text": "Type: `file`, default: `$id.$key.output.output`. Path to the directory where the resources will be synced to."
,
"default":"$id.$key.output.output"
}
}
},
"arguments" : {
"title": "Arguments",
"type": "object",
"description": "No description",
"properties": {
"dryrun": {
"type":
"boolean",
"description": "Type: `boolean_true`, default: `false`. Does not display the operations performed from the specified command",
"help_text": "Type: `boolean_true`, default: `false`. Does not display the operations performed from the specified command."
,
"default":false
}
,
"exclude": {
"type":
"string",
"description": "Type: List of `string`, multiple_sep: `\";\"`. Exclude all files or objects from the command that matches the specified pattern",
"help_text": "Type: List of `string`, multiple_sep: `\";\"`. Exclude all files or objects from the command that matches the specified pattern."
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map corresponds to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativation is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/inputs"
},
{
"$ref": "#/definitions/outputs"
},
{
"$ref": "#/definitions/arguments"
},
{
"$ref": "#/definitions/nextflow input-output arguments"
}
]
}
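The `param_list` help text says that, for csv, json or yaml parameter files, relative paths are interpreted relative to the parameter file's own location. The following Python sketch shows that rule for the CSV case. It assumes the caller knows which columns hold file arguments; it does not reproduce Viash's actual Groovy code. `relativize_paths` and `read_param_list_csv` are hypothetical names.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List


def relativize_paths(
    param_sets: List[Dict[str, str]], param_file: str, file_arguments: Iterable[str]
) -> List[Dict[str, str]]:
    """Resolve relative paths in file arguments against the parameter file's directory."""
    base_dir = Path(param_file).parent
    file_arguments = list(file_arguments)
    resolved = []
    for param_set in param_sets:
        updated = dict(param_set)
        for key in file_arguments:
            value = updated.get(key)
            # Absolute paths and empty cells are left unchanged.
            if value and not Path(value).is_absolute():
                updated[key] = str(base_dir / value)
        resolved.append(updated)
    return resolved


def read_param_list_csv(param_file: str, file_arguments: Iterable[str]) -> List[Dict[str, str]]:
    """Read a param_list CSV whose column names are argument names (e.g. `id,input`)."""
    with open(param_file, newline="") as handle:
        param_sets = list(csv.DictReader(handle))
    return relativize_paths(param_sets, param_file, file_arguments)
```

Resolving paths against the parameter file rather than the working directory means a folder containing `params.csv` and its inputs can be moved and still be run with, for example, `nextflow run target/nextflow/sync_resources/main.nf --param_list path/to/params.csv --publish_dir output/`.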

@@ -0,0 +1,209 @@
name: "untar"
version: "minor_improvements"
authors:
- name: "Dries Schaumont"
roles:
- "author"
- "maintainer"
info:
links:
email: "dries@data-intuitive.com"
github: "DriesSchaumont"
orcid: "0000-0002-4389-0440"
linkedin: "dries-schaumont"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Scientist"
- name: "Robrecht Cannoodt"
roles:
- "reviewer"
info:
links:
email: "robrecht@data-intuitive.com"
github: "rcannood"
orcid: "0000-0003-3641-729X"
linkedin: "robrechtcannoodt"
organizations:
- name: "Data Intuitive"
href: "https://www.data-intuitive.com"
role: "Data Science Engineer"
- name: "Open Problems"
href: "https://openproblems.bio"
role: "Core Member"
argument_groups:
- name: "Input arguments"
arguments:
- type: "file"
name: "--input"
description: "Tarball file to be unpacked."
info: null
must_exist: true
create_parent: true
required: true
direction: "input"
multiple: false
multiple_sep: ";"
- name: "Output arguments"
arguments:
- type: "file"
name: "--output"
description: "Directory to write the contents of the .tar file to."
info: null
must_exist: true
create_parent: true
required: true
direction: "output"
multiple: false
multiple_sep: ";"
- name: "Other arguments"
arguments:
- type: "string"
name: "--exclude"
alternatives:
- "-e"
description: "Prevents any file or member whose name matches the shell wildcard\
\ (pattern) from being extracted."
info: null
example:
- "docs/figures"
required: false
direction: "input"
multiple: false
multiple_sep: ";"
resources:
- type: "bash_script"
path: "script.sh"
is_executable: true
summary: "Unpack a .tar file"
description: "Unpack a .tar file. When the .tar file contains just a single\
\ directory,\nput the contents of the directory into the output folder instead of\
\ that directory.\n"
test_resources:
- type: "bash_script"
path: "test.sh"
is_executable: true
info: null
status: "enabled"
scope:
image: "public"
target: "public"
requirements:
commands:
- "ps"
license: "MIT"
links:
repository: "https://github.com/viash-hub/craftbox"
runners:
- type: "executable"
id: "executable"
docker_setup_strategy: "ifneedbepullelsecachedbuild"
- type: "nextflow"
id: "nextflow"
directives:
tag: "$id"
auto:
simplifyInput: true
simplifyOutput: false
transcript: false
publish: false
config:
labels:
mem1gb: "memory = 1000000000.B"
mem2gb: "memory = 2000000000.B"
mem5gb: "memory = 5000000000.B"
mem10gb: "memory = 10000000000.B"
mem20gb: "memory = 20000000000.B"
mem50gb: "memory = 50000000000.B"
mem100gb: "memory = 100000000000.B"
mem200gb: "memory = 200000000000.B"
mem500gb: "memory = 500000000000.B"
mem1tb: "memory = 1000000000000.B"
mem2tb: "memory = 2000000000000.B"
mem5tb: "memory = 5000000000000.B"
mem10tb: "memory = 10000000000000.B"
mem20tb: "memory = 20000000000000.B"
mem50tb: "memory = 50000000000000.B"
mem100tb: "memory = 100000000000000.B"
mem200tb: "memory = 200000000000000.B"
mem500tb: "memory = 500000000000000.B"
mem1gib: "memory = 1073741824.B"
mem2gib: "memory = 2147483648.B"
mem4gib: "memory = 4294967296.B"
mem8gib: "memory = 8589934592.B"
mem16gib: "memory = 17179869184.B"
mem32gib: "memory = 34359738368.B"
mem64gib: "memory = 68719476736.B"
mem128gib: "memory = 137438953472.B"
mem256gib: "memory = 274877906944.B"
mem512gib: "memory = 549755813888.B"
mem1tib: "memory = 1099511627776.B"
mem2tib: "memory = 2199023255552.B"
mem4tib: "memory = 4398046511104.B"
mem8tib: "memory = 8796093022208.B"
mem16tib: "memory = 17592186044416.B"
mem32tib: "memory = 35184372088832.B"
mem64tib: "memory = 70368744177664.B"
mem128tib: "memory = 140737488355328.B"
mem256tib: "memory = 281474976710656.B"
mem512tib: "memory = 562949953421312.B"
cpu1: "cpus = 1"
cpu2: "cpus = 2"
cpu5: "cpus = 5"
cpu10: "cpus = 10"
cpu20: "cpus = 20"
cpu50: "cpus = 50"
cpu100: "cpus = 100"
cpu200: "cpus = 200"
cpu500: "cpus = 500"
cpu1000: "cpus = 1000"
debug: false
container: "docker"
engines:
- type: "docker"
id: "docker"
image: "debian:stable-slim"
target_registry: "images.viash-hub.com"
target_tag: "minor_improvements"
namespace_separator: "/"
setup:
- type: "apt"
packages:
- "procps"
interactive: false
entrypoint: []
cmd: null
- type: "native"
id: "native"
build_info:
config: "src/untar/config.vsh.yaml"
runner: "nextflow"
engine: "docker|native"
output: "target/nextflow/untar"
executable: "target/nextflow/untar/main.nf"
viash_version: "0.9.3"
git_commit: "ea649a62ec4e05d72aff86ec804287629756416a"
git_remote: "https://github.com/viash-hub/craftbox"
git_tag: "v0.1.0-4-gea649a6"
package_config:
name: "craftbox"
version: "minor_improvements"
description: "A collection of custom-tailored scripts and applied tools.\n"
info: null
viash_version: "0.9.3"
source: "src"
target: "target"
config_mods:
- ".requirements.commands := ['ps']\n"
- ".engines += { type: \"native\" }"
- ".engines[.type == 'docker'].target_registry := 'images.viash-hub.com'"
- ".engines[.type == 'docker'].target_tag := 'minor_improvements'"
keywords:
- "scripts"
- "custom"
- "implementations"
license: "MIT"
organization: "vsh"
links:
repository: "https://github.com/viash-hub/craftbox"
issue_tracker: "https://github.com/viash-hub/craftbox/issues"
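The untar description says that when a tarball holds a single top-level directory, the contents of that directory are placed directly in the output folder. The Python sketch below shows that flattening rule with the standard `tarfile` module. The real component is a Bash script (`script.sh`) running on `debian:stable-slim`, and this sketch does not reproduce it. It omits `--exclude`, it is not a hardened extractor (it does not rewrite link targets), and `untar_flatten` is a hypothetical name.

```python
import tarfile
from pathlib import Path, PurePosixPath


def untar_flatten(tarball: str, output: str) -> None:
    """Extract `tarball` into `output`, dropping a single shared top-level directory."""
    out_dir = Path(output)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Mode 'r' transparently handles gzip, bzip2 and xz compressed archives.
    with tarfile.open(tarball) as archive:
        members = []
        for member in archive.getmembers():
            path = PurePosixPath(member.name)
            # Refuse absolute paths and '..' components that could escape `output`.
            if path.is_absolute() or ".." in path.parts:
                raise ValueError(f"unsafe member path: {member.name!r}")
            if path.parts:  # skip entries such as './' that name the archive root
                members.append(member)

        top_level = {PurePosixPath(m.name).parts[0] for m in members}
        # Flatten only when everything sits below one directory (not a single lone file).
        flatten = len(top_level) == 1 and any(
            len(PurePosixPath(m.name).parts) > 1 for m in members
        )

        selected = []
        for member in members:
            parts = PurePosixPath(member.name).parts
            if flatten:
                if len(parts) == 1:
                    continue  # the top-level directory entry itself
                # Note: hard-link targets (member.linkname) are not rewritten here.
                member.name = str(PurePosixPath(*parts[1:]))
            selected.append(member)

        # Use the 'data' extraction filter where this Python version provides it.
        if hasattr(tarfile, "data_filter"):
            archive.extractall(out_dir, members=selected, filter="data")
        else:
            archive.extractall(out_dir, members=selected)
```

An archive containing `proj/docs/a.txt` is therefore extracted as `docs/a.txt` in the output folder. An archive containing several top-level entries is extracted unchanged.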

File diff suppressed because it is too large

@@ -0,0 +1,126 @@
manifest {
name = 'untar'
mainScript = 'main.nf'
nextflowVersion = '!>=20.12.1-edge'
version = 'minor_improvements'
description = 'Unpack a .tar file. When the .tar file contains just a single directory,\nput the contents of the directory into the output folder instead of that directory.\n'
author = 'Dries Schaumont, Robrecht Cannoodt'
}
process.container = 'nextflow/bash:latest'
// detect tempdir
tempDir = java.nio.file.Paths.get(
System.getenv('NXF_TEMP') ?:
System.getenv('VIASH_TEMP') ?:
System.getenv('TEMPDIR') ?:
System.getenv('TMPDIR') ?:
'/tmp'
).toAbsolutePath()
profiles {
no_publish {
process {
withName: '.*' {
publishDir = [
enabled: false
]
}
}
}
mount_temp {
docker.temp = tempDir
podman.temp = tempDir
charliecloud.temp = tempDir
}
docker {
docker.enabled = true
// docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
}
process{
withLabel: mem1gb { memory = 1000000000.B }
withLabel: mem2gb { memory = 2000000000.B }
withLabel: mem5gb { memory = 5000000000.B }
withLabel: mem10gb { memory = 10000000000.B }
withLabel: mem20gb { memory = 20000000000.B }
withLabel: mem50gb { memory = 50000000000.B }
withLabel: mem100gb { memory = 100000000000.B }
withLabel: mem200gb { memory = 200000000000.B }
withLabel: mem500gb { memory = 500000000000.B }
withLabel: mem1tb { memory = 1000000000000.B }
withLabel: mem2tb { memory = 2000000000000.B }
withLabel: mem5tb { memory = 5000000000000.B }
withLabel: mem10tb { memory = 10000000000000.B }
withLabel: mem20tb { memory = 20000000000000.B }
withLabel: mem50tb { memory = 50000000000000.B }
withLabel: mem100tb { memory = 100000000000000.B }
withLabel: mem200tb { memory = 200000000000000.B }
withLabel: mem500tb { memory = 500000000000000.B }
withLabel: mem1gib { memory = 1073741824.B }
withLabel: mem2gib { memory = 2147483648.B }
withLabel: mem4gib { memory = 4294967296.B }
withLabel: mem8gib { memory = 8589934592.B }
withLabel: mem16gib { memory = 17179869184.B }
withLabel: mem32gib { memory = 34359738368.B }
withLabel: mem64gib { memory = 68719476736.B }
withLabel: mem128gib { memory = 137438953472.B }
withLabel: mem256gib { memory = 274877906944.B }
withLabel: mem512gib { memory = 549755813888.B }
withLabel: mem1tib { memory = 1099511627776.B }
withLabel: mem2tib { memory = 2199023255552.B }
withLabel: mem4tib { memory = 4398046511104.B }
withLabel: mem8tib { memory = 8796093022208.B }
withLabel: mem16tib { memory = 17592186044416.B }
withLabel: mem32tib { memory = 35184372088832.B }
withLabel: mem64tib { memory = 70368744177664.B }
withLabel: mem128tib { memory = 140737488355328.B }
withLabel: mem256tib { memory = 281474976710656.B }
withLabel: mem512tib { memory = 562949953421312.B }
withLabel: cpu1 { cpus = 1 }
withLabel: cpu2 { cpus = 2 }
withLabel: cpu5 { cpus = 5 }
withLabel: cpu10 { cpus = 10 }
withLabel: cpu20 { cpus = 20 }
withLabel: cpu50 { cpus = 50 }
withLabel: cpu100 { cpus = 100 }
withLabel: cpu200 { cpus = 200 }
withLabel: cpu500 { cpus = 500 }
withLabel: cpu1000 { cpus = 1000 }
}

@@ -0,0 +1,119 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "untar",
"description": "Unpack a .tar file. When the .tar file contains just a single directory,\nput the contents of that directory into the output folder instead of the directory itself.\n",
"type": "object",
"definitions": {
"input arguments" : {
"title": "Input arguments",
"type": "object",
"description": "No description",
"properties": {
"input": {
"type":
"string",
"description": "Type: `file`, required. Tarball file to be unpacked",
"help_text": "Type: `file`, required. Tarball file to be unpacked."
}
}
},
"output arguments" : {
"title": "Output arguments",
"type": "object",
"description": "No description",
"properties": {
"output": {
"type":
"string",
"description": "Type: `file`, required, default: `$id.$key.output.output`. Directory to write the contents of the .tar file to",
"help_text": "Type: `file`, required, default: `$id.$key.output.output`. Directory to write the contents of the .tar file to."
,
"default":"$id.$key.output.output"
}
}
},
"other arguments" : {
"title": "Other arguments",
"type": "object",
"description": "No description",
"properties": {
"exclude": {
"type":
"string",
"description": "Type: `string`, example: `docs/figures`. Prevents any file or member whose name matches the shell wildcard (pattern) from being extracted",
"help_text": "Type: `string`, example: `docs/figures`. Prevents any file or member whose name matches the shell wildcard (pattern) from being extracted."
}
}
},
"nextflow input-output arguments" : {
"title": "Nextflow input-output arguments",
"type": "object",
"description": "Input/output parameters for Nextflow itself. Please note that both publishDir and publish_dir are supported, but at least one has to be configured.",
"properties": {
"publish_dir": {
"type":
"string",
"description": "Type: `string`, required, example: `output/`. Path to an output directory",
"help_text": "Type: `string`, required, example: `output/`. Path to an output directory."
}
,
"param_list": {
"type":
"string",
"description": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel",
"help_text": "Type: `string`, example: `my_params.yaml`. Allows inputting multiple parameter sets to initialise a Nextflow channel. A `param_list` can either be a list of maps, a csv file, a json file, a yaml file, or simply a yaml blob.\n\n* A list of maps (as-is) where the keys of each map correspond to the arguments of the pipeline. Example: in a `nextflow.config` file: `param_list: [ [\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027], [\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027] ]`.\n* A csv file should have column names which correspond to the different arguments of this pipeline. Example: `--param_list data.csv` with columns `id,input`.\n* A json or a yaml file should be a list of maps, each of which has keys corresponding to the arguments of the pipeline. Example: `--param_list data.json` with contents `[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]`.\n* A yaml blob can also be passed directly as a string. Example: `--param_list \"[ {\u0027id\u0027: \u0027foo\u0027, \u0027input\u0027: \u0027foo.txt\u0027}, {\u0027id\u0027: \u0027bar\u0027, \u0027input\u0027: \u0027bar.txt\u0027} ]\"`.\n\nWhen passing a csv, json or yaml file, relative path names are relativized to the location of the parameter file. No relativization is performed when `param_list` is a list of maps (as-is) or a yaml blob.",
"hidden": true
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input arguments"
},
{
"$ref": "#/definitions/output arguments"
},
{
"$ref": "#/definitions/other arguments"
},
{
"$ref": "#/definitions/nextflow input-output arguments"
}
]
}