Identification of Bichir PCR Sequences

For the sequences from paralog groups 1, 2, 3, 8, 9, 10, 11, and 13 it is straightforward to determine membership in the paralog group from the inferred protein sequence. For the middle-group genes of PGs 4-7 we used quartet mapping (QM) as well as neighbor-joining trees (not shown). In the quartet mapping analysis we first used PG4, PG5, and the combined sequences from PG6/7. In a second step we recomputed the QM analysis for the PG5-7 sequences.

Table 1: Quartet Mapping for Middle Group Genes

                First step                    Second step
Sequence  Note  4       5       6/7     Best  5       6       7       PG
Ps_X1-M         0.3009  0.3189  0.3801  6/7   0.2922  0.4679  0.2400  6
Ps_X2-M         0.3356  0.3103  0.3541  6/7   0.3004  0.4155  0.2841  6
Ps_X4-M   A4    0.4364  0.2656  0.2980  4                             4
Ps_X5-M         0.3834  0.3476  0.2689  4                             4
Ps_X6-M         0.3916  0.2899  0.3185  4                             4
PsA5_c    A5    0.2987  0.4040  0.2972  5     0.3962  0.3328  0.2710  5
PsA6_c    A6    0.3514  0.2758  0.3728  6/7   0.2707  0.4749  0.2544  6
Pb5-7-1   B5    0.3217  0.3888  0.2896  5     0.3921  0.3463  0.2617  5
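
The two-step reading of Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the rule "take the group with the highest support, then refine with the second step when the first step does not point to PG4" is inferred from the layout of the table rather than stated explicitly in the text. The support values are copied from Table 1.

```python
def best_group(support):
    """Return the label with the highest quartet-mapping support."""
    return max(support, key=support.get)


def assign_middle_group(step1, step2=None):
    """Two-step paralog-group assignment for a middle-group sequence.

    step1 maps "4", "5", "6/7" to support values (first QM analysis).
    step2 maps "5", "6", "7" to support values (second QM analysis);
    it is only consulted when the first step does not select PG4.
    """
    first = best_group(step1)
    if first == "4" or step2 is None:
        return first
    return best_group(step2)


# Support values copied from Table 1.
ps_x1_m = assign_middle_group(
    {"4": 0.3009, "5": 0.3189, "6/7": 0.3801},
    {"5": 0.2922, "6": 0.4679, "7": 0.2400},
)
ps_x4_m = assign_middle_group({"4": 0.4364, "5": 0.2656, "6/7": 0.2980})

print(ps_x1_m, ps_x4_m)  # 6 4
```

For Ps_X1-M the first step favours the combined PG6/7 set and the second step resolves it to PG6; for Ps_X4-M the first step already selects PG4, so no second step is needed.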

Table 2: Neighbor-joining and parsimony trees. Trees were computed using PHYLIP with 1000 bootstrap replicates.

PG alignment neighbor-joining parsimony
1 phylip tree tree
2 phylip tree tree
3 phylip tree tree
4 phylip tree tree
5 phylip tree tree
6 phylip tree tree
8 phylip tree tree
9 phylip tree tree
10 phylip tree tree
11 phylip tree tree
13 phylip tree tree

Table 3: Identification of PCR fragments. For neighbor-joining and parsimony trees we list bootstrap values >20% for the most significant association of the query sequence with a group of sequences from the listed paralog group. For quartet mapping we list the support (>0.40) for the preferred tree. [Note that 0.333 corresponds to the absence of phylogenetic information.] For paralog groups 2, 6, 8, and 11, sequences from only three clusters were available. For paralog groups 1, 3, 4, 5, 9, 10, and 13, sequences are known from all four clusters. In this case we consider all trees of the form (Q,C1)(C2,(C3,C4)) and (Q,(C3,C4))(C1,C2), where Q is the query sequence, C1 through C4 are the collections of known sequences from the four clusters, and (C3,C4) are combined into a single set for the purpose of the quartet mapping analysis.

Sequence  Lab.No.    Genomic seq.    NJ          Parsimony   Quartet(3)  Quartet(4)  Assignment  Confidence
Ps_X1-1   88-1-2     A1 [Chiu:04]    A1 (0.62)   A1 (0.42)               (A1)        A1          *****
Ps_X2-1   33A-7      -               ?           ?                       B1 [0.41]   B1          *
Ps_X1-2   92-1-18    A2 [Chiu:04]    A2 (0.31)   ?           A2 [0.49]               A2          *****
Ps_X2-2   104-1a-27  -               (B2)        ?           B2 [0.43]               B2          ***
Ps_X3-2   133-2-23   D2 [AC135508]   ?           ?           ?                       D2          *****
Ps_X1-3   104-1a-2   A3 [Chiu:04]    A3 (0.25)   A3 (0.20)               A3 [0.41]   A3          *****
Ps_X2-3   71-2-7     -               C3 (0.21)   C3 (0.18)               C3 [0.43]   C3          **
Pb8_9-3              D3 [AC135508]   D3 (0.45)   D3 (0.36)               D3 [0.52]   D3          *****
Ps_X4-M   71-3-3     A4 [Chiu:04]    A4 (0.55)   A4 (0.41)               A4 [0.45]   A4          *****
Ps_X5-M   8-5        -               C4 (0.88)   C4 (0.79)               C4 [0.63]   C4          ****
Ps_X6-M   71-3-10    -               B4 (0.55)   B4 (0.36)               B4 [0.58]   B4          ***
PsA5_c               A5 [Chiu:04]    A5 (0.57)   A5 (0.37)               A5 [0.46]   A5          *****
PsB5-7-1             B5 [AC138147]   B5 (0.33)   D5 (0.24)               D5 [0.49]   B5          *****
Ps_X1-M   104-1a-6   -               C6 (0.27)   C6 (0.21)   C6 [0.44]               C6          **
Ps_X2-M   104-1a-12  -               A6 (0.22)   A6 (0.25)   (A6)                    B6          *
PsA6_c               A6 [Chiu:04]    A6 (0.54)   A6 (0.33)   (A6)                    A6          *****
PsB7-x1              B7 [AC138147]   --          --          --          --          B7          *****
Ps_X1-8   104-1a-63  -               C8 (0.28)   ?           C8 [0.41]               C8          **
Pb7_8                B8 [AC138147]   B8 (0.68)   B8 (0.50)   B8 [0.44]               B8          *****
Ps_X1-9   18-4       -               C9 (0.79)   C9 (0.45)               C9 [0.46]   C9          ***
Ps_X2-9   76-9       -               D9 (0.62)   D9 (0.64)               (C9/D9)     D9          ***
Ps_X3-9   31A-1      -               B9 (0.72)   B9 (0.54)               ???         B9          **
Ps_X4-9   76-7       A9 [Chiu:04]    ?           A9 (0.29)               (A9)        A9          *****
Ps_X1-10  133-1-11   A10 [Chiu:04]   A10 (0.49)  A10 (0.38)              A10 [0.52]  A10         *****
Ps_X3-10  128-2-15   -               D10 (0.37)  D10 (0.19)              D10 [0.44]  D10         ***
Ps_X4-10  61l-1-4    B10 [AC138147]  ?           ?                       (A10/B10)   B10         *****
Ps_X5-10  75-15      -               C10 (0.28)  C10 (0.24)              C10 [0.42]  C10         ***
Ps_X1-11  18-2       A11 [Chiu:04]   A11 (0.54)  A11 (0.44)  A11 [0.48]              A11         *****
Ps_X2-11  36-15      -               D11 (0.21)  ?           (D11)                   D11         **
Ps_2-12              D12 [AC138742]  --          --          --          --          D12         *****
Ps_X2-13  128-1-22   -               C13 (0.34)  C13 (0.27)              C13 [0.55]  C13         ***
Ps_X4-13  36-3       -               D13 (0.33)  D13 (0.16)              (A13/D13)   D13         **
PsA13_c              A13 [Chiu:04]   ?           A13 (0.12)              A13 [0.50]  A13         *****
Pb7_13               B13 [AC138147]  B13 (0.68)  B13 (0.46)              B13 [0.50]  B13         *****

Note that the phylogenetic trees and/or quartet mapping correctly identify the paralog group of 13 (or 14 if one counts HoxA9) of the 16 homeobox fragments for which extended genomic DNA is known. In the remaining cases (HoxD2 and HoxB10) there is no strong alternative signal. For HoxD2 there is no sequence for comparison; for HoxB10 only a single sequence, from the zebrafish, can be used for comparison. We therefore cannot expect a correct assignment from either tree reconstruction or quartet mapping in these two cases.
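
The remark in the Table 3 caption that 0.333 corresponds to the absence of phylogenetic information, and the 0.40 reporting threshold, can be illustrated with a small Python sketch. It rests on an assumption not spelled out in the text: that the support for each of the three alternative topologies is the fraction of evaluated quartets preferring that topology, so the three values sum to one. The function names, topology labels, and counts are hypothetical and serve only as an example.

```python
def support_fractions(counts):
    """Convert per-topology quartet counts into support fractions.

    ASSUMPTION: support is the share of evaluated quartets that prefer
    each of the three topologies, so the fractions sum to 1.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no quartets were evaluated")
    return {topology: n / total for topology, n in counts.items()}


def preferred_tree(counts, threshold=0.40):
    """Return (topology, support) if the best support exceeds the threshold.

    With three alternatives, equal support of 1/3 each means the data
    carry no phylogenetic information, and None is returned.
    """
    fractions = support_fractions(counts)
    best = max(fractions, key=fractions.get)
    if fractions[best] > threshold:
        return best, fractions[best]
    return None


# Hypothetical counts, for illustration only.
informative = preferred_tree({"(Q,C1)": 52, "(Q,C2)": 25, "(Q,C34)": 23})
uninformative = preferred_tree({"(Q,C1)": 10, "(Q,C2)": 10, "(Q,C34)": 10})
print(informative, uninformative)
```

In this sketch a query with clearly uneven counts is reported with its preferred topology, while a query whose quartets are spread evenly over the three topologies is left unassigned.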