.. _Cluster: ======================================= Running MPRAsnakeflow on HPC Cluster ======================================= Snakemake gives us the opportunity to run MPRAsnakeflow in a cluster environment. Please check the Snakemake documentation for more information on how to set up a cluster environment. We use `snakemake resources `_ to set the main resources per rule. Most resources are generic and can be used on multiple clusters, environments, or even locally. We have a predefined workflow profile with resources: :download:`config.yaml <../../profiles/default/config.yaml>`: .. literalinclude:: ../../profiles/default/config.yaml :language: yaml :lines: 1-30 We used this workflow successfully in a SLURM environment using the `slurm executor plugin `_ from Snakemake. Therefore, the partition is set with :code:`slurm_partition` and has to be renamed or removed to fit with your own SLURM configuration. Running with resources ---------------------- Example: Using 30 cores and 10GB of memory. .. code-block:: bash snakemake --sdm conda --configfile config/config.yaml -c 30 --resources mem_mb=10000 --workflow-profile profiles/default Performance tweaks: Running specific rules with different resources ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Some rules will benefit from multithreading or more memory. This can be specified within your profile, workflow profile, or in the command line interface using :code:`--set-resources RULE_NAME:RESOURCE_NAME=VALUE` or :code:`--set-threads RULE_NAME=VALUE`. Before changing resources, make sure that you really need the rule by running a dry run to get the list of executed rules only: .. code-block:: bash snakemake -n --quiet rules Possible rules to tweak: :Assignment: :assignment_hybridFWRead_get_reads_by_cutadapt: Only needed when using the linker option in the config. You can add more threads using :code:`--set-threads assignment_hybridFWRead_get_reads_by_cutadapt=4`. Default is always 1 thread. :assignment_mapping_bbmap: Only needed when using bbmap for mapping. Memory and threads can be optimized, e.g., via :code:`--set-threads assignment_mapping_bbmap=30 --set-resources assignment_mapping_bbmap:mem_mb=10000`. Default is 1 thread and 4GB memory, but we recommend using 30 threads and 10GB if available. :assignment_mapping_bwa: Only needed when using bwa for mapping. Memory and threads can be optimized, e.g., via :code:`--set-threads assignment_mapping_bwa=30 --set-resources assignment_mapping_bwa:mem_mb=10000`. Default is 1 thread, but we recommend using 30 threads and 10GB if available. :assignment_collectBCs: Threads can be optimized, e.g., via :code:`--set-threads assignment_collectBCs=30`. Default is 1 thread, but we recommend using 30 threads if available. :Experiment: :counts_onlyFW_raw_counts_by_cutadapt: Only needed when you have only FW reads and use the adapter option. Threads can be optimized, e.g., via :code:`--set-threads experiment_counts_onlyFW_raw_counts_by_cutadapt=30`. Default is 1 thread. Running on an HPC using SLURM ----------------------------- Using the SLURM executor plugin to run 300 jobs in parallel: .. code-block:: bash snakemake --sdm conda --configfile config/config.yaml -j 300 --workflow-profile profiles/default --executor slurm Snakemake 7 (not supported anymore) ----------------------------------- In Snakemake 7, we used the :code:`--cluster` option, which is not available in Snakemake 8. You can also use the predefined `config/sbatch.yaml`, but this might be outdated. We highly recommend using resources with the workflow profile. .. code-block:: bash snakemake --use-conda --configfile config/config.yaml --cluster "sbatch --nodes=1 --ntasks={cluster.threads} --mem={cluster.mem} -t {cluster.time} -p {cluster.queue} -o {cluster.output}" --jobs 100 --cluster-config config/sbatch.yaml Please note that with this :code:`--cluster` option, the log folder of the cluster environment (see :code:`-o {cluster.output}`) has to be generated first, e.g.: .. code-block:: bash mkdir -p logs .. note:: Please consult your cluster's wiki page for cluster-specific commands and change cluster options to reflect these specifications. Additionally, for large libraries, more memory can be specified in this location.