Frequently Asked Questions

If you have further questions, please open an issue on GitHub.

Is it possible to differentiate between sense and antisense?

Usually not, because reads map equally well to both strands. The barcode assignment then becomes ambiguous, and the barcode is discarded. As a workaround, MPRAsnakeflow can add unique sequence adapters to both ends of the sequences in the reference FASTA and in the FASTQ files. With these adapters in place, all mapping strategies should be able to differentiate between sense and antisense. To enable this, set the config option strand_sensitive: {enable: true}.

The design/reference file check failed. Why?

The design file must meet the following requirements:

Unique headers: Each sequence must have a unique sequence ID starting from > to the first whitespace or newline.
No special characters in headers: Mapping tools create a reference dictionary and cannot handle all characters. Additionally, most databases (like SRA) restrict the character set allowed in headers. Headers must follow these naming rules:
- The first character must be one of: 0-9, A-Z, a-z, or ! # $ % & + . / : ; ? @ ^ _ | ~ - (notably, * and = are NOT allowed as the first character)
- Subsequent characters may include all of the above, plus * and =
- This prevents headers from starting with * or =, which may be reserved or problematic in downstream tools.
Unique sequences: Sequences must be different in both sense and antisense directions. Otherwise, the mapper assigns the read to both IDs, and the barcode becomes ambiguous and is discarded.
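
The header rules above can be checked before running the pipeline. The following is a minimal sketch, not the check MPRAsnakeflow itself performs: the function name check_headers and the regular expression are illustrative assumptions built only from the character rules listed above.

```python
import re

# First character: any allowed character except '*' and '='.
# Subsequent characters: the same set plus '*' and '='.
HEADER_RE = re.compile(
    r"[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*"
)


def check_headers(fasta_lines):
    """Return (invalid_ids, duplicate_ids) for an iterable of FASTA lines.

    Hypothetical helper for illustration only.
    """
    seen = set()
    invalid, duplicates = [], []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        # The sequence ID runs from '>' to the first whitespace or newline.
        parts = line[1:].split()
        seq_id = parts[0] if parts else ""
        if not HEADER_RE.fullmatch(seq_id):
            invalid.append(seq_id)
        if seq_id in seen:
            duplicates.append(seq_id)
        seen.add(seq_id)
    return invalid, duplicates
```

To check a design file, pass an open file handle, for example check_headers(open("design.fa")), where design.fa is a placeholder file name. Both returned lists should be empty.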

If you allow a range of start positions or lengths for sequences (min/max settings, e.g., in BWA mapping), ensure that the shortest allowed substring is still unique among all other (sub)sequences. If you have antisense collisions and want to keep strand sensitivity, enable it using the option strand_sensitive: {enable: true} in the config file (see the previous question).
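
Sense and antisense collisions between full-length sequences can be found by comparing each sequence with the reverse complements of all others. The sketch below is only an illustration under stated assumptions: the names reverse_complement and find_collisions are hypothetical, sequences are assumed to contain only A, C, G, T, and N, and substrings arising from min/max start or length settings are not checked.

```python
from collections import defaultdict

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_collisions(records):
    """Group IDs whose sequences are identical on either strand.

    records: iterable of (seq_id, sequence) tuples.
    Returns a list of ID lists, one per colliding group.
    """
    groups = defaultdict(list)
    for seq_id, seq in records:
        seq = seq.upper()
        # Strand-independent key: the lexicographically smaller of the
        # sequence and its reverse complement.
        key = min(seq, reverse_complement(seq))
        groups[key].append(seq_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```

Every group returned contains IDs that a mapper could not tell apart without strand-specific adapters. An empty list means no full-length collisions were found.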

My paired reads of the designed oligos do not overlap. Can I use this pipeline?

No, MPRAsnakeflow currently requires overlapping paired-end reads to reconstruct the full oligo sequence. If your reads do not overlap, consider using the single-end (FWD only) option of the assignment workflow, or resequence with longer reads to ensure overlap.
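
Whether two mates can overlap follows from simple arithmetic on the read lengths and the length of the sequenced fragment. The sketch below assumes that both reads start at opposite ends of the fragment and are not trimmed. The function name expected_overlap is hypothetical, and the minimum overlap the pipeline actually needs for merging is not stated here, so treat a small positive value with caution.

```python
def expected_overlap(read1_len, read2_len, fragment_len):
    """Return the number of fragment bases covered by both mates.

    A result of 0 or less means the reads do not overlap.
    fragment_len is the full length of the sequenced fragment
    (oligo plus any flanking sequence that is read through).
    """
    # A read cannot cover more bases than the fragment contains.
    covered1 = min(read1_len, fragment_len)
    covered2 = min(read2_len, fragment_len)
    return covered1 + covered2 - fragment_len
```

For example, two 150 bp reads on a 230 bp fragment share 70 bases, while two 100 bp reads on the same fragment leave a 30 bp gap (result -30).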

MPRAsnakeflow is not able to create a Conda environment

If you encounter an error like:

Caused by: json.decoder.JSONDecodeError: Extra data: line 1 column 2785 (char 2784)#

Try the following steps:

Remove the incomplete metadata:

rm -r .snakemake/metadata .snakemake/incomplete

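Rerun MPRAsnakeflow. If the error persists, delete the entire .snakemake folder and rerun. Note that, by default, this folder also holds the Conda environments Snakemake has already created, so they will be rebuilt:
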
rm -r .snakemake

Can I use STARR-seq with MPRAsnakeflow?

No, STARR-seq is not supported yet.

The pipeline is giving an error “BUG: Out of jobs ready to be started, but not all files built yet.” How can I fix this?

This error is likely caused by an internal problem in Snakemake. Please update Snakemake to the latest version to resolve it.