The miRspring document is an HTML document that reproduces a complete HTS data set and is compatible with all major browsers. It is significantly smaller (typically < 3MB) than a BAM file and does not need internet connectivity to function. The miRspring document can display sequence data, analyse the data for a number of processing events, and provide visualisation tools to help interpret the data.

The miRspring document starts by loading the "Global" view page, which comprises an x-y scatter plot and a tabulated list of RNAs. The x-y scatter plot can be customised to show a processing feature of your choice, while the table has inbuilt sorting functions. A full description of these functions is available under the "Global" tab above. The "Focused" view displays the sequencing data of a particular miRNA selected from the "Global" view page. A full description of its features is available under the "Focused" tab above.

Some features of the miRspring document can be adjusted for a viewing session (refer to the "Options" tab above). Tabulated stats of all miRNAs can be generated at the press of a button, allowing you to import them into your favourite statistical package for further comparative analysis. I hope you enjoy using this tool. When it comes to publishing your results, please cite the following reference and include the miRspring document in your supplementary information.

Humphreys et al., Nucleic Acids Research 2013

Global view

This is the first page loaded when opening the document and comprises an X-Y scatter plot and a tabulated list of small RNAs. The X-Y scatter plot can be set to display a number of processing features and quality control parameters of the data set. Hovering the mouse over a data point on the X-Y scatter plot displays the gene ID, and selecting the data point navigates to the Focused view of that particular miRNA.

Global visualisation
       

5' and 3' IsomiRs

Imprecise cutting by Drosha and Dicer can result in sequence variations at both the 5' and 3' ends. The miRspring document provides an X-Y scatter plot showing either the 5' or 3' isomiRs that exist within the data set. The X-axis represents the abundance of a miRNA while the Y-axis represents the % of tags derived from the hairpin that are an isomiR.

miRNA 5' isomiR plot miRNA 3' isomiR plot

Note that the miRspring default setting accepts miRNAs ± 3nt from the entry defined in miRBase. This can be adjusted if needed (see the "Options" tab above). Anything that falls outside this boundary is classed as non-canonically processed.
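The Y-axis value of the 5' isomiR plot can be sketched in a few lines of Python. This is only an illustration and not the code used inside the miRspring document: the function name, the input layout (a mapping of 5' start position to tag count) and the choice to use only tags inside the ± 3nt window as the denominator are all assumptions.

```python
def five_prime_isomir_percent(tag_starts, annotated_start, window=3):
    """Percent of in-window tags whose 5' start differs from the annotation.

    tag_starts: dict mapping 5' start position on the hairpin -> tag count
                (hypothetical input layout).
    annotated_start: 5' start position of the miRBase entry on the hairpin.
    window: number of nucleotides either side of the annotation to accept.
    """
    # Keep only tags that start within the accepted window.
    in_window = {
        start: count
        for start, count in tag_starts.items()
        if abs(start - annotated_start) <= window
    }
    total = sum(in_window.values())
    if total == 0:
        return 0.0
    # Every in-window tag that does not start exactly at the annotation
    # is treated as a 5' isomiR in this sketch.
    isomirs = total - in_window.get(annotated_start, 0)
    return 100.0 * isomirs / total
```

For example, with 80 tags starting at the annotated position and 20 tags starting one nucleotide either side, the function reports that 20% of the accepted tags are 5' isomiRs. A tag starting 10nt away is ignored because it falls outside the window.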

       

Arm processing

Drosha and Dicer cutting of the miRNA precursor leaves a dsRNA from which one strand is preferentially loaded into RISC. Typically there is a strong bias for strand selection, and historically the preferentially loaded strand was termed the mature miRNA, while the other strand was called the star ("*"). There are a small number of precursors where there is no strong preference in strand selection, and these processed miRNAs were labelled with a 5p or 3p extension to identify the "arm" they were derived from. HTS has the potential to identify processed miRs from both arms, and now all miRNAs are labelled with the 5p or 3p terminology.

The miRspring document graphs the strand selection on an x-y scatter plot. The x-axis represents the abundance of a miRNA in the data set (log scale) while the y-axis represents 5p/(5p+3p). Points that have a low y-axis value therefore represent hairpins that process mature miRNAs mainly from the 3p arm, and points with a high value represent hairpins that favour the 5p arm. Data points that sit in the middle of the y-axis are hairpins that process mature miRNAs from both arms.
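The coordinates of a single point on this plot can be written out as a short Python sketch. The function names are hypothetical, and the use of log10 of the combined 5p and 3p count as the abundance value is an assumption made for illustration.

```python
import math


def arm_ratio(count_5p, count_3p):
    """Return 5p/(5p+3p), or None when the hairpin has no tags at all."""
    total = count_5p + count_3p
    if total == 0:
        return None
    return count_5p / total


def arm_plot_point(count_5p, count_3p):
    """Return (x, y) for the scatter plot: x is log10 abundance, y is the ratio."""
    ratio = arm_ratio(count_5p, count_3p)
    if ratio is None:
        return None
    return (math.log10(count_5p + count_3p), ratio)
```

A hairpin with 25 tags on the 5p arm and 75 on the 3p arm gives a ratio of 0.25 and would sit in the lower part of the plot, while equal counts on both arms give 0.5.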

5p 3p processing graph

 

Non-canonical processing

The miRspring default setting considers a sequence tag a miRNA if it is within 3nt of the sequence defined in miRBase. This can be adjusted if needed (see the "Options" tab above). Anything that falls outside this boundary is classed as non-canonically processed. Upon selecting non-canonical processing, the X-Y scatter plot graphs abundance on the X-axis (log scale) and the percent of sequence tags from a hairpin that are non-canonically processed on the Y-axis.
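The classification described above can be sketched in Python as follows. The names and the input layout are hypothetical, and the sketch assumes that a tag is judged by its 5' start position relative to the annotated start of each mature miRNA on the hairpin.

```python
def is_canonical(tag_start, annotated_starts, window=3):
    """True if the tag starts within `window` nt of any annotated mature start."""
    return any(abs(tag_start - start) <= window for start in annotated_starts)


def percent_non_canonical(tags, annotated_starts, window=3):
    """Percent of tags from one hairpin that fall outside the accepted window.

    tags: list of (start_position, count) pairs (hypothetical layout).
    annotated_starts: start positions of the annotated 5p and/or 3p miRNAs.
    """
    total = sum(count for _, count in tags)
    if total == 0:
        return 0.0
    outside = sum(
        count
        for start, count in tags
        if not is_canonical(start, annotated_starts, window)
    )
    return 100.0 * outside / total
```

Widening the window in this sketch moves tags from the non-canonical group into the canonical group, so the Y-axis value for a hairpin can only stay the same or fall as the window grows.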

Non Canonical

We typically observe that hairpins producing a small number of miRNA tags are more likely to have a higher proportion of non-canonically processed tags.

 

miRNA length

Most miRNAs have a length of 22nt. Sequences derived from a hairpin can vary in length as a result of a number of factors, ranging from hairpin secondary structure to RNA degradation. The miRspring document can be set to graph the abundance (X-axis, log scale) versus the average length of all sequences derived from a specific hairpin.

miRNA size XY scatter plot

We typically observe that hairpins with a low abundance of counts are more likely to have a short average sequence length. Note that short sequences are much harder to map accurately. The default setting when creating the miRspring document is to accept sequences >=18nt. You may want to consider adjusting this if there are a large number of short sequences in your data set.
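One way to compute the average length plotted on the Y-axis is a count-weighted mean, sketched below in Python. The function name and input layout are hypothetical, and weighting by tag count is an assumption. The 18nt cut-off mirrors the default described above.

```python
def mean_tag_length(tags, min_length=18):
    """Count-weighted mean length of the tags derived from one hairpin.

    tags: list of (sequence, count) pairs (hypothetical layout).
    min_length: shortest sequence accepted; shorter tags are ignored.
    Returns None when no tag passes the length filter.
    """
    kept = [(seq, count) for seq, count in tags if len(seq) >= min_length]
    total = sum(count for _, count in kept)
    if total == 0:
        return None
    return sum(len(seq) * count for seq, count in kept) / total
```

With three 22nt tags and one 20nt tag the mean is 21.5nt. A 15nt tag would be discarded by the default filter regardless of how abundant it is.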

 

Mismatches in data set (RNA Editing?)

The miRspring document records up to one mismatch per sequence tag. A mismatch may be the result of an error in the sequencing run, or it may represent a real RNA editing event. This can be a great tool to see the mapping efficiency of your data set, as well as to identify potential editing candidates. The X-Y scatter plot graphs abundance on the X-axis and the percent of mapped tags that have a mismatch on the Y-axis.
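The Y-axis value of the mismatch plot can be sketched in Python as below. This is an illustration under stated assumptions and not the mapping procedure used to build a miRspring document: tags are assumed to be already placed at a known offset on the hairpin, and all names are hypothetical.

```python
def mismatch_positions(tag, reference, offset):
    """Positions within the tag that differ from the reference at `offset`.

    Returns None if the tag would run past the end of the reference.
    """
    target = reference[offset:offset + len(tag)]
    if len(target) < len(tag):
        return None
    return [i for i, (t, r) in enumerate(zip(tag, target)) if t != r]


def percent_with_mismatch(tags, reference, max_mismatches=1):
    """Percent of accepted tags that carry a mismatch.

    tags: list of (sequence, offset, count) triples (hypothetical layout).
    Tags with more than `max_mismatches` differences are not counted as mapped.
    """
    mapped = 0
    mismatched = 0
    for sequence, offset, count in tags:
        positions = mismatch_positions(sequence, reference, offset)
        if positions is None or len(positions) > max_mismatches:
            continue
        mapped += count
        if positions:
            mismatched += count
    return 100.0 * mismatched / mapped if mapped else 0.0
```

A tag with three differences to the reference is dropped entirely in this sketch, so it contributes to neither the numerator nor the denominator.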

miRNA editing graph

Displaying mismatches can noticeably slow the display of abundant miRNAs in larger data sets. You can choose not to display any sequence tags with mismatches, or to display only sequence tags that represent potential editing events.

 

miRNA clusters

A miRNA cluster is a group of miRNAs that lie in close genomic proximity to one another. The genomic window used to define a cluster can be set either through the options menu or in the "Focused" view. Upon selecting miRNA clusters, all potential clusters are listed on the screen. A cluster is represented as a bar graph where the height of each bar reflects the relative abundance of processed miRs from that hairpin. Clicking on a bar navigates to the "Focused" view of that miRNA. The transcript direction is indicated by a text arrow. Clusters that have miRNAs transcribed in the same direction could potentially be derived from a polycistronic transcript.

Polycistron

Instances where miRNA are derived from the same genomic location but from opposite strands are highlighted by a text bracket.

Bidirectional
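Grouping miRNAs into clusters by genomic window can be sketched in Python as follows. This is one plausible approach and is not taken from the miRspring source: the function name and input layout are hypothetical, the window size is left for the caller to supply, and neighbours are chained together whenever consecutive start positions are no further apart than the window.

```python
def find_clusters(mirnas, window):
    """Group miRNAs whose neighbouring start positions lie within `window` nt.

    mirnas: list of (name, chromosome, start) triples (hypothetical layout).
    Strand is deliberately ignored, so a cluster may hold miRNAs transcribed
    in either direction.
    Returns a list of clusters, each a list of names ordered by position.
    """
    by_chromosome = {}
    for name, chromosome, start in mirnas:
        by_chromosome.setdefault(chromosome, []).append((start, name))

    clusters = []
    for chromosome in sorted(by_chromosome):
        current = []
        last_start = None
        for start, name in sorted(by_chromosome[chromosome]):
            # A gap larger than the window closes the current group.
            if current and start - last_start > window:
                if len(current) > 1:
                    clusters.append(current)
                current = []
            current.append(name)
            last_start = start
        # Only groups with at least two members count as a cluster.
        if len(current) > 1:
            clusters.append(current)
    return clusters
```

Because neighbours are chained, a cluster can span more than one window length in total, as long as each consecutive pair of miRNAs is close enough.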
 

The "focused" visualisation component of the miRspring document contains the sequencing data for an individual miRNA.

Focused view

The miRspring document has a novel format for displaying sequences where sequences having a common 5' end are displayed on one line. The 3' variations are assigned a colour and their relative abundances are represented in a bar graph using the same colour scheme. The length of the most abundant miRNA on each line is listed in the last column using the corresponding colour.

The two screenshots below are of the same miRNA, where sequence tags are listed in either 5' condensed format (top image) or 5' verbose format (bottom image).

5' condensed of miR-449a of Cloonan SREK placenta 5' verbose of miR-449a of Cloonan SREK placenta
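The idea behind the 5' condensed format can be sketched in Python: tags sharing a 5' start are merged onto one line, and the 3' variants on that line are kept as a length-to-count table. The function name and input layout are hypothetical, and the sketch does not reproduce the colour coding or bar graphs.

```python
from collections import defaultdict


def condense_by_five_prime(tags):
    """Merge tags that share a 5' start onto a single display line.

    tags: iterable of (start, length, count) triples (hypothetical layout).
    Returns a list of (start, {length: count}, most_abundant_length) tuples,
    ordered by start position.
    """
    lines = defaultdict(lambda: defaultdict(int))
    for start, length, count in tags:
        lines[start][length] += count

    result = []
    for start in sorted(lines):
        variants = dict(sorted(lines[start].items()))
        # The length listed in the last column is that of the most abundant tag.
        top_length = max(variants, key=variants.get)
        result.append((start, variants, top_length))
    return result
```

In verbose format each (start, length) pair would occupy its own line, whereas here all 3' variants of a given 5' start collapse into one entry.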

The nucleotide frequency graph is aligned above the precursor sequence towards the top of the "focused" visualisation page. This information alone describes which nucleotides of the hairpin are most frequently transcribed. Nucleotides of mature sequences annotated in miRBase are coloured green and the remainder of the hairpin is coloured red. The graph aligns above the precursor sequence, whose nucleotides use the same colour scheme. The window used to accept sequence tags as a miRNA is highlighted by the black bar (the default is ± 3nt).

Nucleotide frequency of miR-374a of ago-1 IP libraries
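A per-nucleotide frequency profile of this kind can be computed by adding each tag's count to every hairpin position it covers. The Python sketch below illustrates this under the assumption that tags are given as a 0-based start, a length and a count. The names are hypothetical.

```python
def nucleotide_frequency(hairpin_length, tags):
    """Number of tags covering each position of the hairpin.

    tags: iterable of (start, length, count) triples with 0-based starts
          (hypothetical layout).
    Returns a list with one depth value per hairpin nucleotide.
    """
    depth = [0] * hairpin_length
    for start, length, count in tags:
        # Clip the tag to the hairpin so overhanging tags do not raise errors.
        first = max(start, 0)
        last = min(start + length, hairpin_length)
        for position in range(first, last):
            depth[position] += count
    return depth
```

Positions covered by the mature miRNA will typically show the highest values, with the profile dropping towards the loop and the ends of the hairpin.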

The default miRspring document accepts all sequence tags that are ± 3nt from the sequence defined by miRBase. These sequences are padded with dots (i.e. '.') which align them to the corresponding region on the hairpin. All sequences that do not fall within this boundary are padded with dashes (i.e. '-'). Note that the window used to separate canonical from non-canonical can be changed in the options menu of the miRspring document. The default can also be set when making a miRspring document.

Note that any sequence that is not padded on the 3' side has a mismatch to the reference. See the mismatch side tab for more information.
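The padding rules can be summarised in a short Python sketch. The function name and arguments are hypothetical, and the sketch only reproduces the text layout described above: dots for tags inside the window, dashes for tags outside it, and no 3' padding for tags that carry a mismatch.

```python
def pad_tag(tag, start, hairpin_length, canonical, has_mismatch=False):
    """Align a tag under the hairpin sequence using padding characters.

    start: 0-based position of the tag's first nucleotide on the hairpin.
    canonical: True if the tag falls within the accepted window.
    has_mismatch: True if the tag differs from the reference; such tags
                  receive no padding on the 3' side.
    """
    pad = "." if canonical else "-"
    left = pad * start
    if has_mismatch:
        return left + tag
    right = pad * (hairpin_length - start - len(tag))
    return left + tag + right
```

For a 10nt reference, a 4nt canonical tag starting at position 2 is rendered as `..ACGU....`, the same tag classed as non-canonical as `--ACGU----`, and a canonical tag with a mismatch as `..ACGU`.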

Non-canonical processing from miR-452a locus of Cloonan Heart WTAK NGS libraries

The miRspring document can be configured to align a user-defined length of flanking sequence, which may help users identify mor miRNAs or other non-canonical processing events. The default miRspring document option is to display only the miRBase-defined hairpin sequence (i.e. no flanking sequence), as this helps minimise the need for scrolling on the Y-axis. The number of sequence tags that align to the flanking sequence is shown in the stats panel, and when this number is greater than 0 it is displayed within a selectable button as shown below.

Flanking stats of miR-516a-2 from Cloonan placenta SREK

Selecting the flanking number button will instruct the miRspring document to display the precursor plus flanking sequence. If the miRspring document has been set up as instructed, the flanking sequence will be in lower case, while the precursor sequence will be displayed in upper case. Once the miRspring document has displayed the flanking sequence, the flanking tag count will be set to a value of zero, which when selected will instruct miRspring to revert to displaying only the precursor sequence.

Nucleotide mismatches in the data set may be due to sequencing errors or RNA editing events. The miRspring document can display sequence tags containing up to 1 mismatch to the reference. Mismatched nucleotides are highlighted as a lower case letter in the hairpin sequence, and as an upper case letter in the flanking sequence. Additionally, the 3' padding characters (i.e. '.' or '-') are not displayed after sequences having mismatches.

miR-1248 from Cloonan placenta SREK

Nearby known miRNAs that are located within the defined window are displayed as a bar graph (the same format as the cluster miRNA global tool). The heights of the bars are relative to each other. The miRNA in focus is coloured green, while the others are red. Clicking on a red bar will display the focused view of that particular miRNA.

miR-20a from Cloonan placenta SREK

The text arrow underneath each bar represents the transcription direction (right = +ve strand and vice versa). A text bracket indicates that each miRNA is derived from the same genomic location but transcribed from different directions.

In the upper right corner are stats for the displayed miRNA. These include the 5' and 3' isomiR counts and percentages for miRBase processed miRNAs derived from each strand. Also shown here are counts for sequences that align to flanking regions. If the number is displayed as a button and selected, the length of the reference sequence will be modified.

A number of configurable options are available for the miRspring document, including the following:

  • miRBase window (refer to isomiR tab)
  • Collapsed 5' sequence format
  • Number and type of mismatches
  • Normalisation
  • Cluster window
  • Seed analysis options

Note that if any of these options are reset they cannot be saved and will revert to the default the next time the document is loaded.

The default window to accept a sequence tag as a miRBase processed miRNA is ± 3nt. Any sequence tag that starts within this window will contribute to the total counts of that processed miRNA (i.e. numbers in the 5p or 3p column for the "global view" table). This window is also used to calculate the proportion of isomiRs.

Users have the option to increase or decrease this window. Any changes made in a viewing session cannot be saved and the default setting will load upon re-opening the document. Note that the default window can be set when first creating the miRspring document.

The miRspring document is designed to show one data set. When comparing one data set to another the counts must be normalised, and the miRspring document can transform the counts using the most basic form of normalisation: reads per million (RPM). Note that this transformation is only applied to the "Global view" table and not to the counts in the "focused view". Note that the default setting is to NOT normalise the data.
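RPM scaling divides each count by the library total and multiplies by one million. The Python sketch below shows the arithmetic. The function name is hypothetical, and which reads make up the total (all mapped tags, or only miRNA tags) is an assumption left to the caller.

```python
def to_rpm(counts):
    """Convert raw tag counts to reads per million (RPM).

    counts: dict mapping miRNA name -> raw count. The sum of these values
            is used as the library size in this sketch.
    """
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: count * 1_000_000 / total for name, count in counts.items()}
```

For a toy library of 2,000 tags, a miRNA with 500 tags has an RPM of 250,000. The same miRNA with 5,000 tags in a library of 20,000 gives the same RPM, which is what makes two data sets of different depth comparable.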

The miRspring document can display up to one mismatch per sequence tag. Through the options tab the user can choose to display all mismatch types, or to display only mismatches that resemble RNA editing events (A to G or C to T). Alternatively, the user can opt to display only perfectly matching sequences. Note that the speed of displaying data will be slower when allowing mismatches.
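The three display choices can be expressed as a small filter, sketched here in Python. The mode names `"all"`, `"editing"` and `"perfect"` are hypothetical labels chosen for this illustration and are not the option names used in the miRspring document.

```python
# Substitutions treated as resembling RNA editing: (reference base, read base).
EDITING_LIKE = {("A", "G"), ("C", "T")}


def is_editing_like(reference_base, read_base):
    """True for an A to G or C to T substitution."""
    return (reference_base.upper(), read_base.upper()) in EDITING_LIKE


def show_tag(mismatch, mode):
    """Decide whether a tag is displayed under the chosen mismatch mode.

    mismatch: None for a perfect match, otherwise a
              (reference_base, read_base) pair for the single mismatch.
    mode: "all", "editing" or "perfect" (hypothetical labels).
    """
    if mode not in ("all", "editing", "perfect"):
        raise ValueError(f"unknown mode: {mode}")
    if mismatch is None:
        return True  # perfect matches are shown in every mode
    if mode == "all":
        return True
    if mode == "editing":
        return is_editing_like(*mismatch)
    return False  # "perfect": hide every tag that carries a mismatch
```

Under the `"editing"` mode a tag with a G to A difference is hidden, since only A to G and C to T substitutions are retained.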

The miRspring document can display the relative expression level of all miRNAs processed from hairpins that are in close proximity to one another (i.e. miRNA clusters). Through the options menu (and also in the "focused view" page) the user can adjust the distance required to consider miRNAs as being part of a cluster.

The seed region of a miRNA comprises nucleotides 2 to 8, which anneal with the complementary regions on target RNA. The miRspring document contains a research tool that tabulates the counts of all sequences having common seeds. As the document can be configured to contain other small non-miRNA sequences, this function can be set to calculate the seed distribution of all sequences or only miRNA sequences.
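Tabulating counts by seed can be sketched in Python as below. The function names and the input layout are hypothetical. The only rule taken from the text above is that the seed spans nucleotides 2 to 8, which corresponds to the slice `[1:8]` in 0-based indexing.

```python
from collections import Counter


def seed_of(sequence):
    """Return nucleotides 2 to 8 of a sequence (a 7-mer), or None if too short."""
    if len(sequence) < 8:
        return None
    return sequence[1:8]


def seed_counts(tags):
    """Sum tag counts over all sequences that share a seed.

    tags: iterable of (sequence, count) pairs (hypothetical layout).
    """
    totals = Counter()
    for sequence, count in tags:
        seed = seed_of(sequence)
        if seed is not None:
            totals[seed] += count
    return totals
```

Two sequences that differ only towards their 3' end, such as `UGAGGUAGUAGGUUGUAUAGUU` and `UGAGGUAGUAGGUUGUGUGGUU`, share the seed `GAGGUAG` and their counts are pooled. Restricting the input list to miRNA sequences would give the miRNA-only seed distribution.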

 

© Victor Chang Cardiac Research Institute 2012