Genome-Enabled Insights into Legume Biology - University of ...

Annu. Rev. Plant Biol. 2012.63:283-305. Downloaded from www.annualreviews.org by University of Minnesota - Twin Cities - Wilson Library on 05/07/12. For personal use only.
accessions. This had been suggested in previous genetic experiments that found biased segregation ratios in crosses involving A17 (43), but the sequencing project was able to pinpoint two breakpoints on chromosomes 4 and 8 to regions roughly the size of BAC clones.
The Lj genome was published in 2008 (79) and was actually the first legume genome to appear, though it remains the least complete. As in Mt, the strategy was to focus on gene-rich portions of the genome by sequencing large-insert clones (in this case, so-called transformation-competent artificial chromosomes). The published Lj genome sequence is 315 Mb in length, corresponding to 67% of the Lj genome (472 Mb), but only 130 Mb is high quality and anchored to chromosomes. A more recent version of the Lj genome sequence is now available through the Web site of the lead sequencing group in Kazusa, Japan (ftp://ftp.kazusa.or.jp/pub/lotus/lotus_r2.5/pseudomolecule), and it provides a much more robust platform for Lj genomics. This updated version (Lj 2.5) contains anchored pseudomolecules totaling 268 Mb across the euchromatic portion of Lj, plus 33 Mb of sequence not yet anchored.
What Can We Learn from Sequenced Legume Genomes?
What have we learned about legume genomes from this first generation of sequencing projects? In the broadest sense, sequenced legume genomes look very much like those of other dicots, though comparisons with Arabidopsis can be complicated by its unusually small genome size and complex duplication history (3). A closer look at the Gm genome shows that ∼57% of the overall sequence lies in repeat-rich, low-recombination heterochromatin, while most genes (78%) are found in euchromatic chromosome arms (81).
Of course, this also implies that a substantial share of Gm genes (22%) lies within the pericentromeric heterochromatin, a somewhat surprising and potentially important result. As expected, crossovers are profoundly reduced near centromeres: the ratio of genetic to physical distance drops 27-fold between the euchromatic and pericentromeric portions of the genome. Genome organization in Mt seems largely comparable, though the evidence for this comes from a combination of the BAC-based euchromatin sequence, FISH microscopy, and optical mapping (100). Notably, the estimated proportion of the genome located in pericentromeres is much lower in Mt than in Gm (∼22% versus ∼57%), which presumably contributes to the difference in genome size. In both Gm and Mt, gene density is generally high throughout the euchromatic arms, with only limited indications of a gene density gradient rising from centromere to telomere. In Mt, for example, gene density throughout the euchromatin is estimated at 16.9 genes per 100 kb (1 gene every 5.9 kb), with the average gene being 2,211 bp long and containing four introns. By way of comparison, this average gene length is close to that of Arabidopsis (2,174 bp) and somewhat shorter than that of Oryza (3,403 bp). Altogether, the Gm genome is reported to have 46,430 “high-confidence” protein-coding loci, a culled set drawn from an original set of gene models that also included ∼20,000 predicted with lower confidence (81). In Mt, a total of 62,152 genes were annotated, a value that drops to 47,845 when only genes with experimental or database support are retained. The similarity in gene counts between the two systems is surprising and significant, because the lineage leading to present-day soybean is known to have undergone a whole-genome duplication (WGD) at 13 Mya or later, a duplication that is absent from the Mt lineage (this important evolutionary event is discussed in more detail below).
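The density figures quoted above are simple ratios, and converting between them is a quick sanity check. The minimal Python sketch below is an illustration, not the authors' method: it converts a density of genes per 100 kb into an average spacing, and estimates how much denser Gm genes are in euchromatic arms than in pericentromeric heterochromatin. It assumes (a simplification not stated in the source) that the ∼57% heterochromatic sequence and the remaining ∼43% euchromatic sequence together make up the whole assembly. The function names are hypothetical.

```python
"""Back-of-the-envelope gene-density arithmetic using figures quoted in the text."""


def mean_gene_spacing_kb(genes_per_100kb: float) -> float:
    """Average distance (kb) between genes implied by a density per 100 kb."""
    return 100.0 / genes_per_100kb


def relative_gene_density(gene_frac_a: float, seq_frac_a: float,
                          gene_frac_b: float, seq_frac_b: float) -> float:
    """How many times denser genes are in compartment A than in compartment B.

    Each compartment's density is its share of genes divided by its share of
    sequence; the ratio of the two densities is returned.
    """
    return (gene_frac_a / seq_frac_a) / (gene_frac_b / seq_frac_b)


if __name__ == "__main__":
    # Mt euchromatin: 16.9 genes per 100 kb (figure from the text).
    print(f"Mt spacing: 1 gene every {mean_gene_spacing_kb(16.9):.1f} kb")

    # Gm: 78% of genes in euchromatin, 22% in pericentromeric heterochromatin.
    # Assumption: heterochromatin (57%) and euchromatin (43%) sum to 100%.
    ratio = relative_gene_density(0.78, 0.43, 0.22, 0.57)
    print(f"Gm euchromatic/pericentromeric gene density: {ratio:.1f}x")
```

Run as a script, this prints a spacing of about 5.9 kb, matching the Mt figure, and a roughly 4.7-fold difference in Gm gene density between the two compartments. The second number is only as reliable as the assumption that the two compartments partition the assembly.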
Thus, one might have expected higher gene numbers in Gm than in Mt. The Gm genome is also reported to have 313,125 retrotransposons and 294,937 DNA transposons (spanning 403 Mb and 157 Mb, respectively), whereas the Mt genome has 253,048 retrotransposons and 34,529 DNA transposons (spanning 88 Mb and 9.4 Mb, respectively). The lower numbers in Mt presumably reflect the smaller amount of pericentromeric sequence sequenced in Mt (an explanation also supported by the twofold difference in genome size), but they may also indicate real genomic differences between the two species.
Young · Bharti
Detailed examination of the genome sequences also provides insights into interesting or unusual gene families. The Gm genome is reported to have 283 legume-specific gene families (81), an estimate that increases to 670 with analysis of the more recent Mt genome sequence (100). Both Gm and Mt contain more disease-resistance genes of the nucleotide-binding-site leucine-rich repeat class (NBS-LRRs, also called NB-ARCs, i.e., nucleotide-binding adaptors shared by APAF-1, R proteins, and CED-4) than other plant genomes sequenced to date. In Mt, for example, there are 764 NBS-LRR-related genes, at least 550 of which are expressed according to RNA-Seq data (100). Outside of legumes, O. sativa is reported to have the largest number so far (519) (98). More than 90% of Mt NBS-LRRs reside in clusters that contain on average 7.4 members, including two megaclusters: one on Mt06 with 30 NBS-LRRs and another on Mt03 with 21. However, the conclusion that NBS-LRRs are overrepresented in legumes (or indeed in any plant family) needs to be tempered by the recent observation that NBS-LRR number varies considerably between accessions within a single species, including Gm (102). Legumes also have higher numbers and increased complexity in other gene families: lipoxygenases (83), LysM receptor kinases (103), and flavonoid biosynthetic enzymes such as chalcone synthase (100). This may be significant, because LysM receptors and flavonoids are both known to play important roles in nodulation.
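Figures such as "more than 90% of Mt NBS-LRRs reside in clusters that contain on average 7.4 members" depend on how a cluster is defined, and the text does not give the criterion. One common approach is gap-based single-linkage grouping: sort genes by position on each chromosome and start a new group whenever the distance to the previous gene exceeds a threshold. The Python sketch below illustrates that idea under stated assumptions. The 200-kb threshold, the minimum cluster size of two, the function names, and the toy coordinates are all hypothetical, not the criteria used for the Mt annotation.

```python
"""Gap-based clustering of gene positions (illustrative sketch)."""
from collections import defaultdict
from statistics import mean


def cluster_genes(genes, max_gap_bp=200_000):
    """Group (chromosome, start_bp) records into runs of nearby genes.

    Genes on the same chromosome whose consecutive starts are <= max_gap_bp
    apart fall into the same group (single-linkage along the chromosome).
    max_gap_bp is a hypothetical threshold chosen for illustration.
    """
    by_chrom = defaultdict(list)
    for chrom, start in genes:
        by_chrom[chrom].append(start)

    groups = []
    for chrom in sorted(by_chrom):
        starts = sorted(by_chrom[chrom])
        current = [starts[0]]
        for pos in starts[1:]:
            if pos - current[-1] <= max_gap_bp:
                current.append(pos)  # close enough: extend the current run
            else:
                groups.append((chrom, current))
                current = [pos]  # gap too large: start a new run
        groups.append((chrom, current))
    return groups


def cluster_stats(groups, min_size=2):
    """Return (fraction of genes in clusters, mean cluster size).

    Only groups with at least min_size members count as clusters.
    """
    total = sum(len(members) for _, members in groups)
    if total == 0:
        return 0.0, 0.0
    clusters = [members for _, members in groups if len(members) >= min_size]
    clustered = sum(len(members) for members in clusters)
    mean_size = mean(len(m) for m in clusters) if clusters else 0.0
    return clustered / total, mean_size


if __name__ == "__main__":
    # Toy coordinates (hypothetical), not real Mt NBS-LRR positions.
    toy = [("Mt03", 1_000_000), ("Mt03", 1_050_000), ("Mt03", 1_120_000),
           ("Mt03", 9_000_000), ("Mt06", 500_000), ("Mt06", 560_000)]
    frac, size = cluster_stats(cluster_genes(toy))
    print(f"{frac:.0%} of genes clustered; mean cluster size {size:.1f}")
```

On the toy input this reports 83% of genes clustered with a mean cluster size of 2.5. Because both numbers shift with `max_gap_bp` and `min_size`, cluster statistics from different studies are directly comparable only when their definitions match.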
Finally, all three sequenced legumes contain unusually high numbers of F-box domain genes compared with other plant species, with Mt possessing three times as many F-box domain genes as either Gm or Lj (100). The Mt genome is also notable for a large and novel gene family, the nodule-related cysteine-rich peptides (NCRs), which are members of the larger group of defensin-like sequences (DEFLs) (31). Notably, this group of genes has been observed only in members of the so-called inverted repeat-lacking clade (IRLC) [97a; http://tolweb.org/IRLC_(Inverted_Repeatlacking_clade)] of legumes, a subgroup of cool-season legumes that includes genera such as Pisum, Vicia, and Trifolium. The IRLC is a clade of legumes known to have lost one copy of the 25-kb inverted repeat in its plastid genome, hence its name. Genome analysis shows that the gene family is entirely missing from the Gm and Lj sequences. DEFLs are known to act as antimicrobials in plants (27), although Mt NCRs were recently also found to play a role in signaling terminal differentiation in rhizobial bacteria during nodulation (92). Notably, Mt and related genera develop an indeterminate nodule quite different from the one observed in Gm, Lj, or other papilionoids (89). Altogether, there are 593 NCRs in Mt, along with 778 genes in the larger DEFL gene family. Like NBS-LRRs, NCRs are tightly clustered within the Mt genome, with 74% found in tandem clusters. Given their absence from the Gm and Lj genome sequences, NCRs appear to have expanded relatively rapidly within the IRLC. If so, some mechanism of propagation, such as ectopic movement followed by tandem duplication, may have driven their expansion.
Sequencing in Nonreference Legumes
Beyond the sequencing of reference species, genome-scale analysis is rapidly moving into less characterized legume species.
Indeed, a draft genome sequence of pigeon pea (Cajanus cajan) has recently been published, with scaffolds representing 73% of the pigeon pea genome (94). All this has been made possible by the recent development of next-generation sequencing technologies, with which billions of base pairs (Gb) can be sequenced at very high efficiency (57). In chickpea (C. arietinum), both