MEME analysis of Eutheria

Training data set: Mamm-fin.mfa

Command lines:
Run A:   meme -dna -mod zoops -nmotifs 20 -evt 5 -maxw 20
Run B:   meme -dna -mod zoops -nmotifs 20 -evt 5 -minw 12 -maxw 25
Run C:   meme -dna -mod zoops -nmotifs 20 -evt 5 -minw 12 -maxw 28
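The three runs share every option except the motif-width limits. The sketch below rebuilds the command lines as argument lists so that this difference is explicit. It assumes that Mamm-fin.mfa is the positional sequence input of each run, because the command lines above do not name it. It only builds and prints the commands and does not execute MEME.

```python
import shlex

# Assumption: the training set Mamm-fin.mfa is the positional input of each run;
# the original command lines do not name the input file.
SEQUENCE_FILE = "Mamm-fin.mfa"

# Options shared by all three runs, copied from the command lines.
COMMON = ["-dna", "-mod", "zoops", "-nmotifs", "20", "-evt", "5"]

# Per-run width limits; Run A sets only -maxw and leaves -minw at MEME's default.
WIDTHS = {
    "A": ["-maxw", "20"],
    "B": ["-minw", "12", "-maxw", "25"],
    "C": ["-minw", "12", "-maxw", "28"],
}


def meme_command(run: str, sequence_file: str = SEQUENCE_FILE) -> list[str]:
    """Return the argument list for one MEME run (built here, not executed)."""
    return ["meme", sequence_file, *COMMON, *WIDTHS[run]]


if __name__ == "__main__":
    for run in sorted(WIDTHS):
        print(f"Run {run}:", shlex.join(meme_command(run)))
```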

For further analysis, the motif set of Run A was modified and supplemented with motifs from Runs B and C as follows:
Motif_2 (Box A) replaced by Motif_2 of Run B
Motif_11 (DSE) replaced by Motif_10 of Run C
Motif_12 added (taken from Motif_11 of Run C)
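The replacement steps amount to a small lookup table that maps each final motif slot to its source run and motif number. The sketch below assumes that each run's motifs have already been extracted into a dict keyed by motif number. That dict layout is hypothetical, and extraction from the MEME output is not shown. The sketch only encodes which motif ends up in which slot of the combined set.

```python
# Hypothetical data layout: each run maps MEME motif number -> that motif's text block.
Run = dict[int, str]

# Recipe from the list above: final slot -> (source run, motif number in that run).
RECIPE = {
    2: ("B", 2),    # Box A: replaced by Motif_2 of Run B
    11: ("C", 10),  # DSE: replaced by Motif_10 of Run C
    12: ("C", 11),  # added: Motif_11 of Run C
}


def combine(runs: dict[str, Run], base: str = "A", recipe: dict = RECIPE) -> Run:
    """Start from the base run, then overwrite or add slots according to the recipe."""
    combined = dict(runs[base])  # copy, so the base run itself is left unchanged
    for slot, (run, number) in recipe.items():
        combined[slot] = runs[run][number]
    return combined
```

All other motifs of Run A pass through unchanged. Slot 12 is simply a new key when Run A has no motif 12.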

The resulting combined MEME motif file was then used to annotate the training set (Figure 2 of the main text) and to classify candidate vRNAs obtained from low-coverage genomes and the NCBI Trace Archive.
Command line: mast selbstbau.txt -d Mamm-fin.mfa

The distribution of the elements across the 40 training sequences is tabulated below:

motif       known element   pcdh locus   SMAD4 locus
TOTAL                       28           12

distal elements
motif 6     ?               25           --
motif 8     ?               20           --
motif 11    DSE2            19           --
motif 5     ?               --           11
motif 9     ?               --           11
motif 12    ?               3            10

proximal elements
motif 4     CRE (PSE)       25           9
motif 7     TATA(-like)     23           --

vault RNA
motif 2     Box A           28           12
motif 1     Box B           28           11
motif 3     termination     26           11

downstream elements
motif 10    ?               6            --
motif 13    ?               --           4
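The table appears to count sequences rather than motif occurrences. On that reading, a sequence contributes once to a motif's count if the motif occurs in it at all, and the TOTAL row is the number of sequences per locus (28 + 12 = 40). The minimal sketch below tallies counts that way. It assumes that the MAST hits have already been reduced to one set of motif numbers per sequence, tagged with the sequence's locus. That input layout is hypothetical, and parsing of the MAST output is not shown.

```python
from collections import Counter

# Hypothetical input: one (locus, motif numbers found by MAST) pair per training sequence.
Hits = list[tuple[str, set[int]]]


def tally(hits: Hits) -> dict[int, Counter]:
    """Count, per motif, how many sequences of each locus contain it at least once."""
    table: dict[int, Counter] = {}
    for locus, motifs in hits:
        for motif in motifs:  # a set, so repeated hits in one sequence count once
            table.setdefault(motif, Counter())[locus] += 1
    return table


def totals(hits: Hits) -> Counter:
    """Number of sequences per locus (the TOTAL row)."""
    return Counter(locus for locus, _ in hits)
```

A locus in which a motif never occurs reads as 0 from the `Counter`, which corresponds to a `--` cell.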

Notes:
(1) There is no DSE1 element (at ca. -440 nt relative to the TSS) within 500 nt upstream of any human vault RNA, as was described in "Multiple Human Vault RNAs" by van Zon et al. 2001.
(2) All rodent vault RNAs examined (pcdh locus) have a derived TATA box that deviates from the TATAAT consensus sequence in more than one nucleotide, as described in "Identification of conserved vault RNA expression elements and a non-expressed mouse vault RNA gene" by Kickhoefer et al. 2003 and in "The Rat Vault RNA Gene Contains a Unique RNA Polymerase III Promoter Composed of Both External and Internal Elements that Function Synergistically" by Vilalta et al. 1994.
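The criterion in note (2), a deviation from TATAAT in more than one nucleotide, is a Hamming-distance threshold. The minimal sketch below assumes that the candidate TATA site has already been located and aligned ungapped to the 6-nt consensus. The function names are hypothetical.

```python
CONSENSUS = "TATAAT"  # consensus as given in note (2)


def mismatches(site: str, consensus: str = CONSENSUS) -> int:
    """Number of positions at which an aligned, equal-length site differs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a != b for a, b in zip(site.upper(), consensus))


def is_derived(site: str, consensus: str = CONSENSUS) -> bool:
    """True if the site deviates from the consensus in more than one nucleotide."""
    return mismatches(site, consensus) > 1
```

Consider a made-up site `TTTAAA`, which is not taken from the training set. It differs from TATAAT at two positions and would count as derived. By contrast, `TATAAA` has one mismatch and would not.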