experiment.csv in the format below, including the header. DNA_F or RNA_F is the name of the gzipped fastq file containing the forward read of the DNA or RNA from the defined condition and replicate. DNA_UMI or RNA_UMI is the corresponding index read with UMIs (excluding sample barcodes), and DNA_R or RNA_R is the corresponding reverse read.
Multiple fastq files can be used for each column by separating them with
Currently, a UMI must be used. If you want to use MPRAsnakeflow without a UMI, please switch to MPRAflow or contact us.
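You can catch header or UMI mistakes before starting a run by parsing experiment.csv yourself. The following is a minimal Python sketch, not part of MPRAsnakeflow. It assumes the header consists of Condition, Replicate and the six DNA_*/RNA_* columns named above, so check the exact capitalization against the example file. It also takes the multi-file separator as the parameter multi_sep because that character is not assumed here. read_experiment is a hypothetical helper name.

```python
import csv

# Column names assumed from the description above (condition, replicate, and
# forward/UMI/reverse reads for DNA and RNA); compare with your own header.
ID_COLUMNS = ["Condition", "Replicate"]
FASTQ_COLUMNS = ["DNA_F", "DNA_UMI", "DNA_R", "RNA_F", "RNA_UMI", "RNA_R"]


def read_experiment(path, multi_sep):
    """Parse experiment.csv into one dict per condition/replicate row.

    Every fastq column is returned as a list of file names. `multi_sep` is the
    character separating several fastq files within one cell (hypothetical
    parameter: supply the separator your MPRAsnakeflow version expects).
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [c for c in ID_COLUMNS + FASTQ_COLUMNS if c not in header]
        if missing:
            raise ValueError(f"header lacks columns: {missing}")
        rows = []
        # Data records start at 2 because record 1 is the header.
        for line_no, row in enumerate(reader, start=2):
            parsed = {c: row[c] for c in ID_COLUMNS}
            for column in FASTQ_COLUMNS:
                files = [f.strip() for f in (row[column] or "").split(multi_sep) if f.strip()]
                # UMIs are currently required, so no fastq column may be empty.
                if not files:
                    raise ValueError(f"line {line_no}: column {column} is empty")
                if not all(f.endswith(".gz") for f in files):
                    raise ValueError(f"line {line_no}: {column} must list gzipped fastq files")
                parsed[column] = files
            rows.append(parsed)
    return rows
```

Calling read_experiment("experiment.csv", multi_sep) returns one dict per row, with each fastq column as a list. It raises ValueError if a column such as DNA_UMI is empty.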
Here is an example of an experiment.csv file, and it can be downloaded
To assess the overall quality, you can color each insert by user-specified categories, such as positive control, negative control, shuffled control, and putative enhancer. To do so, create a label.tsv file in the format below that maps each insert name to its category:
insert1_name	label1
insert2_name	label1
insert3_name	label2
The insert names must exactly match the names in the design FASTA file.
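A label name that does not match the design FASTA is easy to miss, so a check like the one below may help. It is a hypothetical Python sketch. It assumes label.tsv has exactly two tab-separated columns and no header, and it treats a FASTA name as the full header line after ">". Adjust fasta_names if your names are matched on a shorter identifier.

```python
import csv


def read_labels(path):
    """Read a two-column, tab-separated label.tsv into {insert_name: label}."""
    labels = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue  # skip blank lines
            if len(row) < 2:
                raise ValueError(f"expected name<TAB>label, got {row!r}")
            labels[row[0]] = row[1]
    return labels


def fasta_names(path):
    """Return the sequence names of a FASTA file (whole header after '>')."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                names.add(line[1:].strip())
    return names


def unmatched_labels(label_path, fasta_path):
    """List label.tsv names that do not occur exactly in the design FASTA."""
    known = fasta_names(fasta_path)
    return sorted(name for name in read_labels(label_path) if name not in known)
```

An empty list from unmatched_labels means every labeled insert was found in the design file.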
Set up the config file
The config file is the heart of MPRAsnakeflow. It is where the different runs are configured. We recommend using one config file per MPRA experiment or MPRA project, although in principle many different experiments can be configured in a single file. It is divided into
global (general settings),
assignments (assignment workflow), and
experiments (count workflow including variants).
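If you load config.yaml into Python (for example with PyYAML's yaml.safe_load), the three sections become top-level keys of a dict. The sketch below uses the hypothetical helper configured_runs to list which assignment and experiment runs a config defines. It assumes each section maps run names to their settings. Any other top-level keys are reported rather than rejected, because a real config may contain more than these three sections.

```python
SECTIONS = ("global", "assignments", "experiments")


def configured_runs(config):
    """Summarize a parsed config.yaml (a plain dict).

    Returns the run names found under `assignments` and `experiments`, plus
    any top-level keys outside the three sections described above.
    """
    runs = {
        # A missing or empty section yields an empty list of runs.
        "assignments": sorted(config.get("assignments") or {}),
        "experiments": sorted(config.get("experiments") or {}),
    }
    extra = sorted(key for key in config if key not in SECTIONS)
    return runs, extra
```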
See Config File for more details about the config file. Here is an example running only the count experiments and using a provided assignment file.
conda activate snakemake
snakemake --configfile config/config.yaml --use-conda -p --cores 4
This will run in local mode using 4 cores. Please submit this command to your cluster’s queue if you would like to run a highly parallelized version.
Be sure that the files (including config.yaml) are correct. All fastq files for the count/experiment part must be in the same folder, given by the data_folder option. Please specify your barcode length and UMI length with
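Two quick checks can help before a run. missing_fastqs lists fastq names from experiment.csv that are not present in the data_folder directory. read_length_counts tallies read lengths in a gzipped fastq, which can help when choosing the barcode and UMI lengths. For the UMI read, which excludes sample barcodes, the read length is often the UMI length. Both are hypothetical Python helpers, not MPRAsnakeflow functions. Two further assumptions apply: every column ending in _F, _UMI or _R is treated as a fastq column, and the multi-file separator is passed in as multi_sep.

```python
import csv
import gzip
from collections import Counter
from itertools import islice
from pathlib import Path


def missing_fastqs(experiment_csv, data_folder, multi_sep):
    """Return fastq names from experiment.csv that are not files in data_folder.

    Columns ending in _F, _UMI or _R are treated as fastq columns (assumption
    based on the column names described above).
    """
    folder = Path(data_folder)
    missing = set()
    with open(experiment_csv, newline="") as handle:
        reader = csv.DictReader(handle)
        fastq_columns = [c for c in reader.fieldnames or [] if c.endswith(("_F", "_UMI", "_R"))]
        for row in reader:
            for column in fastq_columns:
                for name in (row[column] or "").split(multi_sep):
                    name = name.strip()
                    # All fastq files must sit directly in data_folder.
                    if name and not (folder / name).is_file():
                        missing.add(name)
    return sorted(missing)


def read_length_counts(fastq_gz, max_reads=10_000):
    """Count sequence lengths among the first `max_reads` reads of a gzipped fastq."""
    counts = Counter()
    with gzip.open(fastq_gz, "rt") as handle:
        # Each fastq record spans four lines; the sequence is the second one.
        for i, line in enumerate(islice(handle, 4 * max_reads)):
            if i % 4 == 1:
                counts[len(line.rstrip("\n"))] += 1
    return counts
```

A single dominant length in read_length_counts for a UMI file is a reasonable hint for the UMI length setting, but confirm it against your library design.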
The count files generated by the count workflow are named <condition>_<replicate>.merged.config.<config>.tsv.gz and can be found in the results/experiments/<project>/assigned_counts/ folder inside the project folder.
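To load the results for downstream analysis, you can build the path from the naming scheme above. This minimal Python sketch uses the hypothetical helpers count_file and read_counts. project_dir stands for the project folder, and config for the <config> part of the file name. The column layout of the count files is not described here, so read_counts only yields each tab-separated row as a list of strings.

```python
import csv
import gzip
from pathlib import Path


def count_file(project_dir, project, condition, replicate, config):
    """Path of one count file: <condition>_<replicate>.merged.config.<config>.tsv.gz."""
    name = f"{condition}_{replicate}.merged.config.{config}.tsv.gz"
    return Path(project_dir) / "results" / "experiments" / project / "assigned_counts" / name


def read_counts(path):
    """Yield the rows of a gzipped, tab-separated count file as lists of strings."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.reader(handle, delimiter="\t")
```

For example, iterating over read_counts(count_file(".", project, condition, replicate, config)) streams one count file without decompressing it to disk.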