Examples Overview
This page summarizes all example datasets in MPRAsnakeflow and links to the full step-by-step instructions for each workflow.
Core Workflow Examples
Basic assignment-only workflow
Basic assignment-only workflow on 5’/5’ WT MPRA data in HepG2 from Klein et al. (2019). Use this example to learn barcode-to-oligo assignment from raw reads.
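Barcode-to-oligo assignment can be pictured as a majority vote over aligned reads. The sketch below is a simplified illustration, not MPRAsnakeflow's implementation. It assumes each read has already been reduced to a (barcode, oligo) pair. The function name assign_barcodes and its thresholds min_support and min_fraction are hypothetical. A barcode is kept only when it is seen often enough and points clearly to one oligo.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Iterable


def assign_barcodes(
    pairs: Iterable[tuple[str, str]],
    min_support: int = 3,        # hypothetical: minimum reads per barcode
    min_fraction: float = 0.75,  # hypothetical: minimum share of the top oligo
) -> dict[str, str]:
    """Map each barcode to one oligo by majority vote (illustrative sketch)."""
    votes: dict[str, Counter] = defaultdict(Counter)
    for barcode, oligo in pairs:
        votes[barcode][oligo] += 1

    assignment = {}
    for barcode, counts in votes.items():
        total = sum(counts.values())
        oligo, top = counts.most_common(1)[0]
        # Drop barcodes with too few reads or ambiguous oligo support.
        if total >= min_support and top / total >= min_fraction:
            assignment[barcode] = oligo
    return assignment


reads = [
    ("ACGTACGT", "oligo_1"), ("ACGTACGT", "oligo_1"), ("ACGTACGT", "oligo_1"),
    ("TTGGCCAA", "oligo_2"), ("TTGGCCAA", "oligo_3"), ("TTGGCCAA", "oligo_2"),
    ("GGGGCCCC", "oligo_4"),
]
print(assign_barcodes(reads))  # {'ACGTACGT': 'oligo_1'}
```

In this toy input, TTGGCCAA is discarded as ambiguous because its top oligo reaches only 2/3 < 0.75. GGGGCCCC is discarded because it has too few reads.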
Basic experiment/count workflow
Basic experiment/count workflow on 5’/5’ WT MPRA data in HepG2 from Klein et al. (2019), using a precomputed assignment file.
Combined assignment + experiment workflow
Combined assignment + experiment workflow on the same HepG2 WT MPRA dataset from Klein et al. (2019), useful as an end-to-end minimal example.
Published Dataset Examples
ENCODE data (Gosai et al. Plasmid-based)
ENCODE plasmid-based MPRA in A549 from the Tewhey lab, published in Gosai et al. (Nature 2024). Demonstrates assignment preprocessing when barcodes are attached to the forward reads, and an experiment setup in which a single DNA input is shared across replicates.
GSE174534 (Abell et al. Complex Read Structure)
Complex-read-structure MPRA from Abell et al. (Science 2022), GEO GSE174534. Focuses on handling non-trivial read layouts, adapter trimming, strand-sensitive assignment, and designs that contain reverse-complemented oligos.
GSE306816 (Zhang et al. dpSTR MPRA)
Deep perturbation STR MPRA from Zhang et al. (bioRxiv 2025), GEO GSE306816. Shows assignment and counting for repeat-rich constructs and comparison against published assignment files.
GSE293036 (Granitto et al. GWAS variant MPRA)
Multiple sclerosis variant MPRA from Granitto et al. (G3 2025), GEO GSE293036. Demonstrates conversion from supplementary design tables, assignment generation, and condition-specific experiment counting.
GSE316891 (Yan et al. L1a1 MPRA)
L1a1 MPRA dataset from Yan et al., GEO GSE316891. A peer-reviewed paper or preprint link is not currently available in the example source data. Includes assignment generation, experiment counting, and direct comparison between workflow-generated and GEO-provided assignment files.
GSE284330 (Zaratiana et al. STARR-seq-like assay)
Processed HepG2 sub1 MPRA data from Zaratiana et al., GEO GSE284330. Because this is a STARR-seq-like assay, it is not an ideal fit for the workflow. However, it uses designed oligonucleotides, so we demonstrate experiment-only processing that treats the oligonucleotides themselves as barcodes and shares one DNA input across the RNA replicates.
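When one DNA library serves as input for several RNA replicates, each replicate's RNA counts are compared against the same DNA counts. The sketch below shows a generic log2 RNA/DNA activity ratio with library-size normalization and a pseudocount. It is a common MPRA summary, not necessarily the exact normalization MPRAsnakeflow applies. The function name log2_activity and the pseudocount default are hypothetical. Here the keys are oligo names, because in this example the oligonucleotides act as barcodes.

```python
import math


def log2_activity(rna: dict[str, int], dna: dict[str, int],
                  pseudocount: float = 1.0) -> dict[str, float]:
    """log2 of library-size-normalized RNA over DNA counts (illustrative sketch)."""
    rna_total = sum(rna.values())
    dna_total = sum(dna.values())
    out = {}
    # Only oligos observed in both libraries can be compared.
    for oligo in sorted(rna.keys() & dna.keys()):
        rna_frac = (rna[oligo] + pseudocount) / rna_total
        dna_frac = (dna[oligo] + pseudocount) / dna_total
        out[oligo] = math.log2(rna_frac / dna_frac)
    return out


# Hypothetical counts: one shared DNA input, two RNA replicates.
shared_dna = {"oligo_1": 100, "oligo_2": 100}
replicates = {
    "rep1": {"oligo_1": 300, "oligo_2": 100},
    "rep2": {"oligo_1": 150, "oligo_2": 50},
}
for name, rna in replicates.items():
    print(name, log2_activity(rna, shared_dna, pseudocount=0))
```

Reusing one DNA dictionary for every replicate is the key point. Only the RNA side changes between replicates.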
GSE271608 (Zahm et al. Synthetic promoter MPRA)
Synthetic promoter MPRA from Zahm et al. (Nat Commun. 2024), GEO GSE271608. Demonstrates experiment-only processing using an externally supplied barcode dictionary when the raw assignment-building reads are not available.
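With an externally supplied barcode dictionary, counting reduces to looking up each observed barcode and summing counts per oligo. The sketch below illustrates that idea under simple assumptions. Barcode counts arrive as a plain dict, the dictionary maps barcode to oligo, and barcodes missing from the dictionary are tallied as unassigned. The function name aggregate_by_oligo is hypothetical and is not part of MPRAsnakeflow.

```python
from collections import Counter


def aggregate_by_oligo(barcode_counts: dict[str, int],
                       barcode_to_oligo: dict[str, str]) -> tuple[dict[str, int], int]:
    """Sum barcode counts per oligo using a supplied dictionary (illustrative sketch)."""
    per_oligo: Counter = Counter()
    unassigned = 0
    for barcode, count in barcode_counts.items():
        oligo = barcode_to_oligo.get(barcode)
        if oligo is None:
            # Barcode not present in the supplied dictionary.
            unassigned += count
        else:
            per_oligo[oligo] += count
    return dict(per_oligo), unassigned


counts = {"AAAA": 5, "CCCC": 2, "GGGG": 4, "TTTT": 1}
dictionary = {"AAAA": "promoter_1", "CCCC": "promoter_1", "GGGG": "promoter_2"}
print(aggregate_by_oligo(counts, dictionary))
# ({'promoter_1': 7, 'promoter_2': 4}, 1)
```

The unassigned total is worth tracking. A large fraction of unmatched counts can indicate that the dictionary and the sequencing reads disagree, for example in barcode length or orientation.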
GSE307247 (Koplik et al. Splicing reporter assay)
Splicing-focused reporter assay from Koplik et al. (bioRxiv 2025), GEO GSE307247. Shows how to use MPRAsnakeflow for count quantification in a non-standard MPRA setting with supplied barcode references and UMI-aware counting.
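UMI-aware counting counts distinct unique molecular identifiers (UMIs) per barcode instead of raw reads, so that PCR duplicates of one molecule are counted once. The sketch below contrasts the two approaches. It assumes reads are already parsed into (barcode, UMI) pairs and does not attempt UMI error correction. The function name count_barcodes is hypothetical.

```python
from collections import Counter, defaultdict


def count_barcodes(reads: list[tuple[str, str]]) -> tuple[dict[str, int], dict[str, int]]:
    """Return (raw read counts, distinct-UMI counts) per barcode (illustrative sketch)."""
    raw = Counter(barcode for barcode, _ in reads)
    umis: dict[str, set[str]] = defaultdict(set)
    for barcode, umi in reads:
        umis[barcode].add(umi)  # duplicate UMIs collapse in the set
    return dict(raw), {barcode: len(s) for barcode, s in umis.items()}


reads = [("AAAA", "u1"), ("AAAA", "u1"), ("AAAA", "u2"), ("CCCC", "u3")]
raw, umi = count_barcodes(reads)
print(raw)  # {'AAAA': 3, 'CCCC': 1}
print(umi)  # {'AAAA': 2, 'CCCC': 1}
```

The two reads of AAAA that share UMI u1 are treated as one molecule. This is why the UMI count is lower than the raw read count.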
GSE325670 (Hauser et al. Variant MPRA)
Variation/saturation mutagenesis MPRA from Hauser et al. (2026), GEO umbrella GSE325670 (example uses GSE325256). Highlights challenging library assignment settings, including bbmap/bwa-additional-filtering, strand-sensitive assignment, and downstream comparison in experiment counting.