Supplemental PDF

Additional figures and tables are available as a separate PDF.

LOSS: tree-based (L)og-(O)dds (S)ubstitution (S)cores

Proof-of-concept Perl implementation: LOSS.tar.gz

The script evaluates the substitution patterns in multiple FASTA files along a phylogenetic tree and assigns a log-odds score to each alignment. Scores above 0 indicate that the alignment contains substitutions typical of real splice sites; negative scores indicate the opposite. Boxplots of the score distributions in the supplemental PDF help to interpret the scores. Keep in mind that the model was trained on the UCSC 44-way multiz alignments with hg18 as the reference.
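To make the sign convention concrete, here is a minimal sketch of a generic log-odds score in Python. It is not the LOSS implementation: the probability tables `P_SPLICE` and `P_BACKGROUND` are invented toy values, the function name `log_odds_score` is hypothetical, and the substitutions are assumed to have already been inferred along the branches of the tree.

```python
import math

# Toy substitution probabilities (ancestral base, derived base).
# These are illustrative assumptions, NOT the trained LOSS parameters.
P_SPLICE = {("G", "T"): 0.5, ("A", "G"): 0.125, ("C", "T"): 0.25}
P_BACKGROUND = {("G", "T"): 0.125, ("A", "G"): 0.5, ("C", "T"): 0.25}


def log_odds_score(substitutions, p_fg=P_SPLICE, p_bg=P_BACKGROUND):
    """Sum log2(P_fg / P_bg) over the observed substitutions.

    A positive total means the substitutions are more likely under the
    foreground (splice site) model than under the background model.
    """
    return sum(math.log2(p_fg[s] / p_bg[s]) for s in substitutions)


if __name__ == "__main__":
    # One substitution per branch on which a change was inferred (assumed input).
    print(log_odds_score([("G", "T"), ("C", "T")]))  # favours the foreground model
    print(log_odds_score([("A", "G")]))              # favours the background model
```

In this sketch a substitution that is equally likely under both models contributes 0, so only substitutions that discriminate between the two models move the score away from 0.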

Predicted splice sites

The project was carried out on hg18. We also provide genomic coordinates for hg19, obtained with UCSC's liftOver tool.
Novel donor (5') splice sites:
    hg18: BED (927,693 entries, p>0.5)
    hg19: BED.mapped (927,660 entries, p>0.5)
Novel acceptor (3') splice sites:
    hg18: BED (2,497,067 entries, p>0.5)
    hg19: BED.mapped (2,496,984 entries, p>0.5)
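The BED files can be read with a few lines of Python. The sketch below relies only on the three mandatory BED columns (chromosome, 0-based start, exclusive end); the layout of any further columns in these particular files is not documented here, so they are kept as raw strings. The names `BedRecord`, `read_bed` and `count_records` are hypothetical helpers, not part of the LOSS package.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class BedRecord(NamedTuple):
    chrom: str
    start: int              # 0-based, inclusive
    end: int                # 0-based, exclusive
    extra: Tuple[str, ...]  # optional columns, left uninterpreted


def read_bed(lines: Iterable[str]) -> Iterator[BedRecord]:
    """Yield one record per data line, skipping blank, comment and header lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield BedRecord(fields[0], int(fields[1]), int(fields[2]), tuple(fields[3:]))


def count_records(lines: Iterable[str]) -> int:
    """Count the data lines of a BED file."""
    return sum(1 for _ in read_bed(lines))


if __name__ == "__main__":
    sample = [
        "track name=example\n",
        "chr1\t100\t102\tdonor_1\n",
        "chr2\t500\t502\tdonor_2\n",
    ]
    for record in read_bed(sample):
        print(record.chrom, record.start, record.end)
```

Applying `count_records` to an hg18 file and its hg19 counterpart should give the entry counts listed above; the small differences between the two columns presumably correspond to entries that liftOver could not map.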

Splice site derived exons

Novel exons (which passed the EST-SVM):
    hg18: BED, FASTA (8,832 entries, p>0.5)
    hg19: BED.mapped (8,829 entries, p>0.5)
Putative coding exons:
    hg18: BED, FASTA (938 entries, p>0.5)
    hg19: BED.mapped (937 entries, p>0.5)
Putative non-coding exons:
    hg18: BED, FASTA (7,894 entries, p>0.5)
    hg19: BED.mapped (7,892 entries, p>0.5)

Inferred gene structures

Predicted genes (exon-cluster):
    hg18: GFF (336 genes, 734 exons, p>0.5)
    hg19: GFF.mapped (336 genes, 734 exons, p>0.5)
Putative coding genes:
    hg18: GFF (48 genes, 114 exons, p>0.5)
    hg19: GFF.mapped (48 genes, 114 exons, p>0.5)
Putative non-coding genes:
    hg18: GFF (241 genes, 503 exons, p>0.5)
    hg19: GFF.mapped (241 genes, 503 exons, p>0.5)

Bulk download

Download all custom tracks:
    hg18: hg18.tar.gz (55 Mb)
    hg19: hg19.tar.gz (57 Mb)

UCSC Genome Browser links

Selected custom tracks at the UCSC Genome Browser (EST-SVM exons, exon-cluster):
    hg18: hg18.customTracks
    hg19: hg19.customTracks