.. _Cluster:

=======================================
Running MPRAsnakeflow on HPC Cluster
=======================================

Snakemake gives us the opportunity to run MPRAsnakeflow in a cluster environment. Please check the Snakemake documentation for more information on how to set up a cluster environment. We use `snakemake resources <https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources>`_ to set the main resources per rule. Most resources are generic and can be used on multipe clusters, environments or even local. We have a preddefined workflow profile with resources: :download:`config.yaml <../profiles/default/config.yaml>`:

.. literalinclude:: ../profiles/default/config.yaml
    :language: yaml
    :lines: 1-30


We used this workflow successfully in a SLURM environment using the `slurm excecutor plugin <https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html>`_ from snakemake. Therfore the partition is set with :code:`slurm_partition` and has to be renamed or removed to fith with your own SLURM configuration.

Running with resources
----------------------

Having 30 cores and 10GB of memory.

.. code-block:: bash

    snakemake --sdm conda --configfile config/config.yaml -c 30 --resources mem_mb=10000  --workflow-profile profiles/default

Performance tweaks: Running specific rules with different resources
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some of the rule swill benefit from multithreading or more memory. This can be specified within your profile, worflow profile or in the command line interface using :code:`--set-resources RULE_NAME:RESOURCE_NAME=VALUE` or :code:`---set-threads RULE_NAME=VALUE`. Before changing resources make sure that you really need the rule by running a dry run getting the list of executed rules only::code:`snakemamake -n --quiet rules`.

Possible rules to tweaks:

:Assignment:

    :assignment_hybridFWRead_get_reads_by_cutadapt:
        Only needed when using linker option in config. You can add more threads using :code:`--set-threads assignment_hybridFWRead_get_reads_by_cutadapt=4`. Default is always 1 thread.

    :assignment_mapping_bbmap:
        Only needed when using bbmap for mapping. Memory and threads can be optimized e.g. via :code:`--set-threads assignment_mapping_bbmap=30 --set-resources assignment_mapping_bbmap:mem_mb=10000`. Default is 1 thread and 4GB memory but we recommend to use 30 threads and 10GB if available.

    :assignment_mapping_bwa:
        Only needed when using bwa for mapping. Memory and threads can be optimized e.g. via :code:`--set-threads assignment_mapping_bwa=30 --set-resources assignment_mapping_bwa:mem_mb:10000`. Default is 1 thread but we recommend to use 30 threads and 10GB if available.

    :assignment_collectBCs:
        Threads can be optimized e.g. via :code:`--set-threads assignment_collectBCs=30`. Default is 1 thread but we recommend to use 30 threads if available.

:Experiment:

    :counts_onlyFW_raw_counts_by_cutadapt:
        Only needed when you have only FW reads and use the adapter option. Threads can be optimized e.g. via :code:`--set-threads counts_onlyFW_raw_counts_by_cutadapt=30`. Default is 1 thread.


Running on an HPC using SLURM
-----------------------------

Using the slurm excecutor plugin running 300 jobs in parallel.

.. code-block:: bash

    snakemake --sdm conda --configfile config/config.yaml -j 300  --workflow-profile profiles/default --executor slurm


Snakemake 7 (not supported anymore)
-------------------------------------

Here we used the :code:`--cluster` option which is not available in snakemake 8. You can also use the predefined `config/sbatch.yaml` but this might be outdated and we highly recommend to use resources with the workfloe profile. 

.. code-block:: bash

    snakemake --use-conda --configfile config/config.yaml --cluster "sbatch --nodes=1 --ntasks={cluster.threads} --mem={cluster.mem} -t {cluster.time} -p {cluster.queue} -o {cluster.output}" --jobs 100 --cluster-config config/sbatch.yaml

Please note that with this :code:`--cluster` option the log folder of the cluster environment (see :code:` -o {cluster.output}`) has to be generated first, e.g:

.. code-block:: bash

    mkdir -p logs

.. note:: Please consult your cluster's wiki page for cluster specific commands and change cluster Options to reflect these specifications. Additionally, for large libraries, more memory can be specified in this location.