Tool Overview

Optimal Prime Primer Selection Webtool

This webtool for primer selection is implemented as discussed in the paper about the Local Splicing Variation (LSV-Seq) Method:

Machine learning-optimized targeted detection of alternative splicing by Kevin Yang, Nathaniel Islas, San Jewell, Anupama Jha, Caleb M. Radens, Jeffrey A. Pleiss, Kristen W. Lynch, Yoseph Barash, Peter S. Choi (https://www.biorxiv.org/content/10.1101/2024.09.20.614162v1)

Given a list of ~50-100 base-pair RNA sequences to target for reverse transcription, the webtool scores all possible primers and reports the top DNA primers, ranked by predicted performance, for each input RNA sequence. “Full” mode is more accurate, but it is limited to the more than 190,000 pre-processed splicing events from the human GTEx dataset across 52 tissues. Splicing events are defined in terms of local splicing variations (LSVs), as introduced in the original MAJIQ paper (https://elifesciences.org/articles/11752). “Lite” mode is more flexible: it accepts a variety of inputs, which makes it possible to target any transcriptome or any specified sequence from any species.

There are two primary use modes:

  • Lite mode. More flexible and requires less tissue-specific information, though less accurate. Works for ANY transcriptome via FASTA file input. Also includes built-in ways to specify chromosomal coordinates for human and mouse transcriptome targeting. Inputs allowed:
    • a BED6 file specifying mouse or human chromosome coordinates, with columns 1 to 6 as described here (https://genome.ucsc.edu/FAQ/FAQformat.html#format1). Note that the score in column 5 is not used and can be set to any value (for instance, 1).
    • a FASTA file
    • a selection from a dropdown list of all target LSVs identified in the human GTEx dataset
  • Full mode. More accurate but requires more tissue- and experiment-specific information. Currently works for pre-processed human local splicing variations (LSVs) from the GTEx tissue dataset. LSVs are a way of conceptualizing splicing events. Inputs allowed:
    • A selection from a dropdown list of all target LSVs identified in the human GTEx dataset
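
For the BED6 input accepted in Lite mode, the following is a minimal sketch of how such a file could be assembled in Python. It follows the standard UCSC BED column order (chrom, chromStart, chromEnd, name, score, strand) and fills the unused score column with 1. The Bed6Record type, the to_bed6_line helper, and the example regions are hypothetical and for illustration only; the coordinates are placeholders, not real splicing events. Requiring an explicit "+" or "-" strand is an assumption of this sketch, not a documented requirement of the webtool.

```python
from typing import NamedTuple


class Bed6Record(NamedTuple):
    """One target region (hypothetical helper type, not part of the webtool)."""
    chrom: str
    start: int   # BED chromStart: 0-based, inclusive
    end: int     # BED chromEnd: 0-based, exclusive
    name: str
    strand: str  # "+" or "-" (assumed to be required here)


def to_bed6_line(rec: Bed6Record, score: int = 1) -> str:
    """Format one record as a tab-separated BED6 line.

    The score column is not used by the webtool, so it defaults to 1.
    """
    if rec.strand not in {"+", "-"}:
        raise ValueError(f"strand must be '+' or '-', got {rec.strand!r}")
    if not 0 <= rec.start < rec.end:
        raise ValueError(f"invalid interval: {rec.start}-{rec.end}")
    fields = [rec.chrom, str(rec.start), str(rec.end), rec.name, str(score), rec.strand]
    return "\t".join(fields)


# Placeholder regions for illustration only.
regions = [
    Bed6Record("chr1", 1000, 1080, "example_region_1", "+"),
    Bed6Record("chr2", 5000, 5075, "example_region_2", "-"),
]

bed_text = "\n".join(to_bed6_line(r) for r in regions) + "\n"
print(bed_text, end="")
```

The resulting text can be saved to a file (for example with open("targets.bed", "w")) and uploaded as the Lite mode BED6 input.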

Both modes output a BED file in the same format:

  • A list of the top 10 primer sequences for each region and their corresponding prediction scores, with information columns as follows:
    • “input_id” provides the name of the input region
    • “Final_lsvseq_primer” provides the full LSV-Seq primer to order, for convenience. The required additional primer sequence, containing the second sequencing adapter and the in vitro transcription promoter, is automatically added to rt_priming_region to produce the full LSV-Seq primer
    • “rt_priming_region” directly provides the reverse transcription primer sequence assessed by the model
    • “Length”, “Expression”, and “Tm” are primer length, target expression, and primer melting temperature, respectively. The “Expression” column is left empty for the Lite model, since expression is not used in its predictions.
    • “Predicted_specificity_value”, s, is the specificity model prediction output. It should be interpreted as a proportion bounded between 0 and 1. Higher is better.
    • “Predicted_amplification_value”, a, is the amplification model prediction output. It should be interpreted as a log value. Higher is better.
    • “Transformed_predicted_values” is the final value that combines the two model predictions and is used to rank the primers, specifically using the formula s*(λa). λ is set to 1.2 by default.
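
As an illustration of working with the output columns described above, here is a minimal Python sketch that picks the highest-ranked primer for each input region by its “Transformed_predicted_values” score. It assumes the output is tab-separated with a header row containing the column names listed above; that layout is an assumption and should be checked against an actual output file. The top_primer_per_input function is a hypothetical helper, and the sequences and scores in the example table are made-up placeholders, not real predictions.

```python
import csv
import io


def top_primer_per_input(table_text: str) -> dict[str, dict[str, str]]:
    """Return the best-scoring row for each input_id.

    Assumes a tab-separated table with a header row (an assumption about
    the output layout, not confirmed by the tool's documentation).
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    best: dict[str, dict[str, str]] = {}
    for row in reader:
        key = row["input_id"]
        score = float(row["Transformed_predicted_values"])
        # Keep the row with the highest combined score seen so far.
        if key not in best or score > float(best[key]["Transformed_predicted_values"]):
            best[key] = row
    return best


# Made-up example rows; sequences and scores are placeholders.
example = (
    "input_id\tFinal_lsvseq_primer\trt_priming_region\tTransformed_predicted_values\n"
    "region_A\tPLACEHOLDER_FULL_PRIMER_1\tACGTACGTACGTACGTAC\t0.42\n"
    "region_A\tPLACEHOLDER_FULL_PRIMER_2\tTTGCATGCATGCATGCAA\t0.57\n"
    "region_B\tPLACEHOLDER_FULL_PRIMER_3\tGGCATCGATCGATCGATC\t0.31\n"
)

best = top_primer_per_input(example)
for input_id, row in best.items():
    print(input_id, row["rt_priming_region"], row["Transformed_predicted_values"])
```

Reading the precomputed score column, rather than recomputing it from s and a, keeps the sketch independent of how the combined value is calculated.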

Evaluation of the performance of each mode is described in the paper.