Load information from NGS data.
Next-generation sequencing (NGS) comprises a set of strategies for sequencing DNA and related molecules. Because the cost of sequencing per base pair has dropped sharply and the range of applications is broad, NGS is becoming the method of choice for many studies. Expression analysis in particular is one of the most common tasks. In this respect NGS data is complementary to mass spectrometry-based proteomics data: together they cover all steps of expression regulation, from epigenetics to the protein level. That is why it is important to include an NGS upload activity in the Perseus workflow.
It specifies the type of NGS experiment the data is derived from (default: RNA sequencing and ribosome profiling).
With the two buttons “Add file” and “Remove file” it is possible to select BAM (Binary Alignment/Map) files. This format is generally used as a compact way to store an alignment while still allowing efficient random access. Perseus doesn't require an index file.
It describes which RNA library preparation method was used. Depending on this, reads will mostly be aligned to the same (sense) or opposite (anti-sense) strand as the feature. Alternatively reads can be aligned to both strands (not stranded).
Let's demonstrate the expected distribution of reads over an imaginary gene consisting of one transcript, for each type of strand specificity. All reads marked red are taken into account. Reads marked grey are excluded from the coverage calculation. We rule out reads marked a because they do not intersect the gene. We also skip reads marked b because their direction is opposite to the one we expect. Lastly, we eliminate reads that don't fit the annotation (c), although such reads may be evidence for another isoform of the gene.
Paired-end sequencing produces two reads from one fragment, and they should come from different strands (if both reads are aligned to the same strand, we exclude the pair). For simplicity of visualization, let's represent paired-end reads like this:
The definition of reads marked a, b and c is the same as in the single-end case.
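The overlap and strand rules for single-end reads can be sketched in a few lines of Python. This is only an illustration of the rules described above, not the plugin's actual implementation: the `Read` and `Feature` classes, the function names and the mode labels `"sense"`, `"antisense"` and `"unstranded"` are all hypothetical, and the annotation check for reads of type c is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Read:
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


@dataclass
class Feature:
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def overlaps(read: Read, feature: Feature) -> bool:
    """True if the two half-open intervals share at least one base."""
    return read.start < feature.end and feature.start < read.end


def counts_toward_coverage(read: Read, feature: Feature, strand_mode: str) -> bool:
    """Decide whether a single-end read is kept (hypothetical helper).

    strand_mode is an assumed label: "sense", "antisense" or "unstranded".
    """
    if strand_mode not in ("sense", "antisense", "unstranded"):
        raise ValueError(f"unknown strand mode: {strand_mode}")
    if not overlaps(read, feature):
        return False  # type a: no intersection with the gene
    if strand_mode == "unstranded":
        return True   # strand is ignored
    same_strand = read.strand == feature.strand
    # type b reads are the ones on the unexpected strand
    return same_strand if strand_mode == "sense" else not same_strand
```

For a gene on the plus strand, a minus-strand read inside the gene would be dropped in the sense mode but kept in the anti-sense and not-stranded modes.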
Paired End Reads | |
---|---|
Stranded (sense) or first read from the pair is on sense strand | ![]() |
Stranded (anti-sense) or first read is on anti-sense strand | ![]() |
Not-Stranded | ![]() |
Note that for paired-end reads Perseus by default calculates the number of fragments for each gene; in other words, the two reads of one pair are not counted twice.
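One simple way to picture fragment counting is to count distinct read names, since both mates of a pair carry the same query name in a SAM/BAM file. The sketch below assumes the reads have already been filtered and assigned to a single gene; the function name `count_fragments` and the example read names are hypothetical, and this is not necessarily how Perseus does it internally.

```python
from typing import Iterable


def count_fragments(read_names: Iterable[str]) -> int:
    """Count fragments as the number of distinct read names.

    Both mates of a pair share one name, so a pair contributes 1, not 2.
    """
    return len(set(read_names))


# Two complete pairs plus one read whose mate was filtered out earlier.
names_for_gene = ["frag1", "frag1", "frag2", "frag2", "frag3"]
n_fragments = count_fragments(names_for_gene)
```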
Hint: If the experimental design isn’t known, we recommend using “Not stranded” as “Strand specificity”.
Currently the plugin supports the GTF file format, which contains the coordinates of the genome regions for which the coverage will be calculated. We strongly recommend downloading an annotation from Ensembl's FTP server.
“Empty” columns are denoted with a “.”. Each line with a “CDS” or “exon” feature should contain “gene_id” or “transcript_id” tags.
Specifies the file path to the genome annotation.
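To make the layout concrete, here is a minimal sketch of reading one GTF line: nine tab-separated columns, with “.” for empty values and the tags in the last column. The function name `parse_gtf_line` and the example identifiers `gene_A` and `tx_A1` are made up for illustration, and the attribute parsing is simplified (it assumes tag values contain no semicolons).

```python
def parse_gtf_line(line: str) -> dict:
    """Parse one GTF record into a dictionary (illustrative helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attributes = fields

    # Tags look like: gene_id "gene_A"; transcript_id "tx_A1";
    tags = {}
    for item in attributes.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        tags[key] = value.strip().strip('"')

    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,  # third column, e.g. "exon" or "CDS"
        "start": int(start),  # GTF coordinates are 1-based, inclusive
        "end": int(end),
        "score": None if score == "." else float(score),
        "strand": strand,
        "frame": None if frame == "." else int(frame),
        "attributes": tags,
    }


# Made-up example record; "." marks the empty score and frame columns.
example = 'chr1\texample\texon\t1000\t1200\t.\t+\t.\tgene_id "gene_A"; transcript_id "tx_A1";'
record = parse_gtf_line(example)
```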
It is possible to specify a feature name that will be used (third column of GTF).
Hint: It makes sense to set the “Feature type name” parameter to “Exons” for RNA-seq analysis and to choose “CDS” for ribosome profiling.
Specifies the number of threads used for uploading NGS data.