Running MPRAsnakeflow on an HPC cluster

Snakemake lets us run MPRAsnakeflow in a cluster environment. Please check the Snakemake documentation for details on how to set up a cluster environment. We use Snakemake resources to set the main resources per rule. Most resources are generic and can be used on multiple clusters, in other environments, or even locally. We provide a predefined workflow profile with resources in config.yaml:

---
software-deployment-method: conda
default-resources:
  slurm_partition: debug
  mem: 2G
  runtime: 60
# error: "logs/%x_%j_%N.err"
# output: "logs/%x_%j_%N.log"
##################
### ASSIGNMENT ###
##################
set-threads:
  assignment_mapping_bwa: 30
  assignment_mapping_bbmap: 30
  assignment_collect: 30
  assignment_collectBCs: 20
  assignment_merge: 10
set-resources:
  assigned_counts_combined_replicates_barcode_output:
    runtime: 60
  assignment_check_design:
    runtime: 240
    slurm_partition: medium
  assignment_hybridFWRead_get_reads_by_length:
    runtime: 1140
    mem: 2G
    slurm_partition: medium
  assignment_hybridFWRead_get_reads_by_cutadapt:
    runtime: 1200
    mem: 4G

We used this workflow successfully in a SLURM environment with the SLURM executor plugin for Snakemake. The partition is therefore set with slurm_partition, and its value may have to be renamed to match the partitions available in your environment.
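Renaming the partitions can be done by hand in the profile. As a minimal sketch, the helper below replaces one partition name with another wherever slurm_partition is set. The function name set_partition, the profile path profiles/default/config.yaml, and the target partition names short and long are assumptions for illustration and are not part of MPRAsnakeflow.

```shell
# Hypothetical helper (not part of MPRAsnakeflow):
# replace one slurm_partition value with another in a profile file.
# Usage: set_partition <profile.yaml> <old_partition> <new_partition>
set_partition() {
    profile="$1"; old="$2"; new="$3"
    # Only lines of the form "slurm_partition: <old>" are changed.
    # A backup of the original file is written next to it with a .bak suffix.
    sed -i.bak -E \
        "s/^([[:space:]]*slurm_partition:[[:space:]]*)${old}[[:space:]]*\$/\1${new}/" \
        "$profile"
}

# Example calls; "short" and "long" are placeholders for your cluster's partitions:
# set_partition profiles/default/config.yaml debug short
# set_partition profiles/default/config.yaml medium long
```

The partitions that actually exist on a SLURM cluster can usually be listed with sinfo before choosing the replacement names.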

Running with resources

The following command runs the workflow with 30 cores and 10 GB of memory:

snakemake --sdm conda --configfile config/config.yaml -c 30 --resources mem_mb=10000 --workflow-profile profiles/default
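Before starting a long run, it can help to preview which jobs Snakemake would schedule. The sketch below wraps the command above and adds Snakemake's --dry-run flag. The function name preview_run and the SNAKEMAKE override variable are assumptions introduced here for illustration.

```shell
# Hypothetical wrapper (not part of MPRAsnakeflow):
# preview the run with the same resources, without executing any job.
# Setting SNAKEMAKE=echo prints the assembled command instead of running it.
preview_run() {
    "${SNAKEMAKE:-snakemake}" --sdm conda --configfile config/config.yaml \
        -c 30 --resources mem_mb=10000 \
        --workflow-profile profiles/default \
        --dry-run "$@"
}

# Example: list the planned jobs
# preview_run
```

Additional Snakemake arguments passed to preview_run are appended to the command unchanged.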

Running on an HPC using SLURM

The following command uses the SLURM executor plugin and runs up to 300 jobs in parallel:

snakemake --sdm conda --configfile config/config.yaml -j 300 --workflow-profile profiles/default --executor slurm

Snakemake 7

Here we use the --cluster option, which is no longer available in Snakemake 8. You can also use the predefined config/sbatch.yaml, but it might be outdated, and we highly recommend using resources with the workflow profile instead.

snakemake --use-conda --configfile config/config.yaml --cluster "sbatch --nodes=1 --ntasks={cluster.threads} --mem={cluster.mem} -t {cluster.time} -p {cluster.queue} -o {cluster.output}" --jobs 100 --cluster-config config/sbatch.yaml

Please note that the log folder for the cluster jobs has to be created first, e.g.:

mkdir -p logs

Note

Please consult your cluster's wiki page for cluster-specific commands and change the cluster options to reflect these specifications. For large libraries, more memory can also be specified there.