Command lines:
Run A: meme -dna -mod zoops -nmotifs 20 -evt 5 -maxw 20
Run B: meme -dna -mod zoops -nmotifs 20 -evt 5 -minw 12 -maxw 25
Run C: meme -dna -mod zoops -nmotifs 20 -evt 5 -minw 12 -maxw 28
For further analysis, the motif set of Run A was modified and extended
with motifs from Runs B and C as follows:
Motif_2 (Box A): replaced by Motif_2 of Run B
Motif_11 (DSE): replaced by Motif_10 of Run C
Motif_12: added, taken from Motif_11 of Run C
The corresponding combined meme file was then
used to annotate the training set (Figure 2 of the main text) and to
classify candidate vRNAs obtained from low-coverage genomes and the NCBI
trace archive.
Command line: mast selbstbau.txt -d Mamm-fin.mfa
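The replacement scheme above can be sketched in a few lines of Python. This is only an illustration, assuming each run's motifs are held in a dictionary keyed by motif name; the function name `combine_motifs` and the placeholder values are hypothetical and do not parse or write real MEME files.

```python
def combine_motifs(run_a, run_b, run_c):
    """Build the combined motif set from three runs (hypothetical helper).

    Each argument maps a motif name such as "Motif_2" to whatever
    represents that motif (here: arbitrary placeholder objects).
    """
    combined = dict(run_a)                    # start from a copy of Run A
    combined["Motif_2"] = run_b["Motif_2"]    # Box A taken from Run B
    combined["Motif_11"] = run_c["Motif_10"]  # DSE taken from Run C
    combined["Motif_12"] = run_c["Motif_11"]  # additional motif from Run C
    return combined


# Placeholder strings stand in for the actual motif records.
run_a = {"Motif_1": "A1", "Motif_2": "A2", "Motif_11": "A11"}
run_b = {"Motif_2": "B2"}
run_c = {"Motif_10": "C10", "Motif_11": "C11"}
combined = combine_motifs(run_a, run_b, run_c)
```

Copying Run A first leaves the original run untouched, so the three source sets remain available for comparison.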
The distribution of the elements across the 40 training sequences is tabulated below:
motif | known | pcdh | SMAD4 |
---|---|---|---|
TOTAL | | 28 | 12 |
distal elements | | | |
motif 6 | ? | 25 | -- |
motif 8 | ? | 20 | -- |
motif 11 | DSE2 | 19 | -- |
motif 5 | ? | -- | 11 |
motif 9 | ? | -- | 11 |
motif 12 | ? | 3 | 10 |
proximal elements | | | |
motif 4 | CRE (PSE) | 25 | 9 |
motif 7 | TATA(-like) | 23 | -- |
vault RNA | | | |
motif 2 | BoxA | 28 | 12 |
motif 1 | BoxB | 28 | 11 |
motif 3 | termination | 26 | 11 |
downstream elements | | | |
motif 10 | ? | 6 | -- |
motif 13 | ? | -- | 4 |
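A table of this kind can be tallied from a list of motif hits. The sketch below assumes each hit is available as a `(sequence_id, locus, motif)` tuple and counts the number of distinct sequences per motif and locus. The function name `tally_hits` and the toy hits are hypothetical and are not taken from the actual MAST output.

```python
from collections import defaultdict


def tally_hits(hits):
    """Count distinct sequences carrying each motif, split by locus.

    hits: iterable of (sequence_id, locus, motif) tuples.
    Returns a dict mapping (motif, locus) to the number of sequences.
    """
    seen = defaultdict(set)
    for seq_id, locus, motif in hits:
        # A set ensures repeated hits in one sequence are counted once.
        seen[(motif, locus)].add(seq_id)
    return {key: len(ids) for key, ids in seen.items()}


# Toy input for illustration only.
toy_hits = [
    ("seq1", "pcdh", "motif 2"),
    ("seq1", "pcdh", "motif 2"),   # duplicate hit in the same sequence
    ("seq2", "pcdh", "motif 2"),
    ("seq3", "SMAD4", "motif 2"),
    ("seq3", "SMAD4", "motif 5"),
]
counts = tally_hits(toy_hits)
```

Counting sequences rather than raw hits keeps each cell bounded by the number of sequences at that locus, which is how the TOTAL row is read here.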
Notes
(1) There is no DSE1 element (ca. -440nt of TSS) within 500nt upstream of
any human vault RNA, as was described in "Multiple Human Vault RNAs",
by van Zon et al. 2001.
(2) All rodent vault RNAs examined (pcdh-locus) have a derived TATA box
(as described in "Identification of conserved vault RNA expression
elements and a non-expressed mouse vault RNA gene", by Kickhoefer et
al. 2003, and "The Rat Vault RNA Gene Contains a Unique RNA Polymerase III
Promoter Composed of Both External and Internal Elements that Function
Synergistically", by Vilalta et al. 1994), that deviates from a
TATAAT-consensus sequence in more than one nucleotide.
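The criterion in note (2) amounts to a mismatch count against the consensus hexamer. A minimal sketch, assuming the candidate box has already been extracted as a six-letter string aligned to TATAAT; the function names and the example hexamer are hypothetical and do not represent an actual rodent sequence.

```python
CONSENSUS = "TATAAT"


def mismatches(candidate, consensus=CONSENSUS):
    """Number of positions at which candidate differs from the consensus."""
    if len(candidate) != len(consensus):
        raise ValueError("candidate and consensus must have equal length")
    return sum(a != b for a, b in zip(candidate.upper(), consensus))


def is_derived(candidate):
    """True if the candidate deviates in more than one nucleotide."""
    return mismatches(candidate) > 1


# Made-up hexamer differing from TATAAT at positions 2 and 6.
example = "TTTAAA"
n_diff = mismatches(example)
derived = is_derived(example)
```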