Running MPRAsnakeflow on an HPC Cluster
Snakemake makes it possible to run MPRAsnakeflow in a cluster environment. Please check the Snakemake documentation for more information on how to set up a cluster environment. We use Snakemake resources to set the main resources per rule. Most resources are generic and can be used on multiple clusters, in other environments, or even locally. We provide a predefined workflow profile with resources, config.yaml:
---
software-deployment-method: conda
default-resources:
  slurm_partition: debug
  mem: 2G
  runtime: 60
  # error: "logs/%x_%j_%N.err"
  # output: "logs/%x_%j_%N.log"
##################
### ASSIGNMENT ###
##################
set-threads:
  assignment_mapping_bwa: 30
  assignment_mapping_bbmap: 30
  assignment_collect: 30
  assignment_collectBCs: 20
  assignment_merge: 10
set-resources:
  assigned_counts_combined_replicates_barcode_output:
    runtime: 60
  assignment_check_design:
    runtime: 240
    slurm_partition: medium
  assignment_hybridFWRead_get_reads_by_length:
    runtime: 1140
    mem: 2G
    slurm_partition: medium
  assignment_hybridFWRead_get_reads_by_cutadapt:
    runtime: 1200
    mem: 4G
We have used this workflow successfully in a SLURM environment with the SLURM executor plugin for Snakemake. The partition is therefore set with slurm_partition, and you may have to rename it to match the partitions available in your environment.
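Renaming the partitions can be done by hand or scripted. The following is a minimal sketch, assuming the profile is stored at profiles/default/config.yaml (inferred from the --workflow-profile profiles/default option; adjust the path if yours differs). The function name rename_partition and the target partition names short and long are hypothetical placeholders, not part of MPRAsnakeflow.

```shell
# Hypothetical helper: replace one partition name with another in a profile.
# Usage: rename_partition OLD NEW FILE
rename_partition() {
    old="$1"; new="$2"; file="$3"
    # -i.bak edits in place and keeps a backup copy (FILE.bak);
    # this form is accepted by both GNU and BSD sed.
    sed -i.bak "s/\(slurm_partition:[[:space:]]*\)${old}[[:space:]]*\$/\1${new}/" "$file"
}

# Assumed profile location; 'short' and 'long' are placeholder partition names.
profile="profiles/default/config.yaml"
if [ -f "$profile" ]; then
    rename_partition debug short "$profile"
    rename_partition medium long "$profile"
fi
```

Running sinfo on the cluster lists the partitions that actually exist, which is a reasonable way to pick the replacement names.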
Running with resources
For example, with 30 cores and 10 GB of memory:
snakemake --sdm conda --configfile config/config.yaml -c 30 --resources mem_mb=10000 --workflow-profile profiles/default
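Note that mem_mb is given in megabytes, which is why 10 GB appears as 10000. As a small sketch of how the command could be adapted to another machine, the hypothetical helper local_run_cmd below (not part of MPRAsnakeflow) prints the command for a given core count and memory budget in GB, using the same factor of 1000 as the command above.

```shell
# Hypothetical helper: print the local-run command for a given core count
# and memory budget in GB.
local_run_cmd() {
    cores="$1"
    mem_gb="$2"
    # mem_mb is given in megabytes; 10 GB -> 10000, matching the command above.
    mem_mb=$((mem_gb * 1000))
    echo "snakemake --sdm conda --configfile config/config.yaml -c ${cores} --resources mem_mb=${mem_mb} --workflow-profile profiles/default"
}

# Reproduces the command shown above.
local_run_cmd 30 10
```

Appending -n (dry run) to the printed command previews the planned jobs without executing them.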
Running on an HPC using SLURM
Using the SLURM executor plugin and running up to 300 jobs in parallel:
snakemake --sdm conda --configfile config/config.yaml -j 300 --workflow-profile profiles/default --executor slurm
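Before submitting several hundred jobs, it can help to confirm that the required tools are reachable from the submission node. The sketch below is a generic preflight check; the function name require_cmds is hypothetical, and the list of commands is an assumption about a typical setup (snakemake itself plus sbatch, the standard SLURM submission command). In Snakemake 8 the SLURM executor is a separate plugin package (snakemake-executor-plugin-slurm), which this check does not cover.

```shell
# Hypothetical preflight check: report any required command missing from PATH.
# Returns 0 if all commands are found, 1 otherwise.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Assumed requirements for a SLURM submission.
if require_cmds snakemake sbatch; then
    echo "ready to submit"
else
    echo "fix the missing commands before submitting" >&2
fi
```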
Snakemake 7
Here we used the --cluster option, which is no longer available in Snakemake 8. You can also use the predefined config/sbatch.yaml, but it might be outdated, and we highly recommend using resources with the workflow profile instead.
snakemake --use-conda --configfile config/config.yaml --cluster "sbatch --nodes=1 --ntasks={cluster.threads} --mem={cluster.mem} -t {cluster.time} -p {cluster.queue} -o {cluster.output}" --jobs 100 --cluster-config config/sbatch.yaml
Please note that the log folder for the cluster jobs has to be created first, e.g.:
mkdir -p logs
Note
Please consult your cluster’s wiki page for cluster-specific commands and change the cluster options to reflect these specifications. Additionally, for large libraries, more memory can be specified there.
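Because the two submission styles depend on the installed Snakemake version, a wrapper script may want to choose between them automatically. The following is a minimal sketch; the function name submission_style is hypothetical, and the version strings are only examples.

```shell
# Hypothetical helper: map a Snakemake version string to the submission style.
submission_style() {
    major="${1%%.*}"   # text before the first dot, e.g. "8" from "8.16.0"
    if [ "$major" -ge 8 ]; then
        echo "executor"   # use --executor slurm with the workflow profile
    else
        echo "cluster"    # use --cluster with config/sbatch.yaml
    fi
}

# Example version strings.
submission_style 7.32.4
submission_style 8.16.0
```

On a real system the installed version can be passed in with submission_style "$(snakemake --version)".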