Genome-Enabled Insights into Legume Biology - University of ...
Annu. Rev. Plant Biol. 2012.63:283-305. Downloaded from www.annualreviews.org by University of Minnesota - Twin Cities - Wilson Library on 05/07/12. For personal use only.
accessions. This had been suggested in previous genetic experiments that found biased segregation ratios involving crosses with A17 (43), but the sequencing project was able to pinpoint two breakpoints on chromosomes 4 and 8 to regions roughly the size of BAC clones.
The Lj genome was published in 2008 (79) and was actually the first legume genome to appear, though it remains the least complete. As in Mt, the strategy was to focus on gene-rich portions of the genome by sequencing large-insert clones (in this case, so-called transformation-competent artificial chromosomes). The published Lj genome sequence is 315 Mb in length, corresponding to 67% of the Lj genome (472 Mb), but only 130 Mb of it is high quality and anchored to chromosomes. A more recent version of the Lj genome sequence is now available through the Web site of the lead sequencing group in Kazusa, Japan (ftp://ftp.kazusa.or.jp/pub/lotus/lotus_r2.5/pseudomolecule), and it provides a much more robust platform for Lj genomics. This updated version (Lj 2.5) contains 268 Mb of anchored pseudomolecules spanning the euchromatic portion of Lj, plus 33 Mb of sequence not yet anchored.
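The coverage figures in this paragraph are simple ratios against the 472-Mb genome size estimate. As a minimal Python sketch using only the sizes quoted above (the helper `percent_of_genome` is a hypothetical name, and the Lj 2.5 total assumes that the anchored and unanchored sequence do not overlap), the percentages can be recomputed as follows:

```python
# Sizes (Mb) quoted in the text for Lotus japonicus (Lj).
GENOME_SIZE_MB = 472.0  # estimated Lj genome size


def percent_of_genome(assembled_mb: float, genome_mb: float = GENOME_SIZE_MB) -> float:
    """Return assembled sequence as a percentage of the estimated genome size."""
    return 100.0 * assembled_mb / genome_mb


# 2008 release: 315 Mb in total, of which 130 Mb is high quality and anchored.
v2008_total = percent_of_genome(315)
v2008_anchored = percent_of_genome(130)

# Lj 2.5: 268 Mb of anchored pseudomolecules plus 33 Mb unanchored
# (assumes the two sets of sequence do not overlap).
v25_anchored = percent_of_genome(268)
v25_total = percent_of_genome(268 + 33)

for label, pct in [
    ("2008 total", v2008_total),
    ("2008 anchored", v2008_anchored),
    ("Lj 2.5 anchored", v25_anchored),
    ("Lj 2.5 total", v25_total),
]:
    print(f"{label}: {pct:.0f}% of 472 Mb")
```

On these figures, Lj 2.5 roughly doubles the anchored sequence (268 Mb versus 130 Mb, or about 57% versus 28% of the genome), even though its total (about 301 Mb) is slightly smaller than the 315 Mb of the 2008 release.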
What Can We Learn from Sequenced Legume Genomes?
What have we learned about legume genomes from this first generation of sequencing projects? In the broadest sense, sequenced legume genomes look very much like those of other dicots, though comparisons with Arabidopsis can be complicated by its unusually small genome size and complex duplication history (3). A closer look at the Gm genome finds that ∼57% of the overall sequence lies in repeat-rich, low-recombination heterochromatin, while most genes (78%) are found in euchromatic chromosome arms (81). Of course, this also implies that a substantial share of Gm genes (22%) lies within the pericentromeric heterochromatin, a somewhat surprising and potentially important result. As expected, crossovers are profoundly reduced near centromeres, with the ratio of genetic to physical distance dropping 27-fold between the euchromatic and pericentromeric portions of the genome. Genome organization in Mt seems largely comparable, though the evidence for this comes from a combination of the BAC-based euchromatin sequence, FISH microscopy, and optical mapping (100). Notably, the estimated proportion of the genome located in pericentromeres is much lower in Mt than in Gm (∼22% versus ∼57%), which presumably contributes to the difference in genome size between the two species. In both Gm and Mt, gene density is generally high throughout euchromatic arms, with only limited indications of a gene density gradient rising from centromere to telomere. In Mt, for example, gene density is estimated at 16.9 genes per 100 kb (1 gene every 5.9 kb) throughout the euchromatin, with the average gene being 2,211 bp long and containing four introns. By way of comparison, the Mt average gene length is close to that of Arabidopsis (2,174 bp) and somewhat shorter than that of Oryza (3,403 bp).
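Two of the figures in this paragraph can be checked with a few lines of arithmetic: the soybean percentages imply how much denser genes are in euchromatic arms than in pericentromeric heterochromatin, and the Mt density of 16.9 genes per 100 kb converts directly into the quoted spacing. This is a minimal Python sketch; the function names are hypothetical, and the euchromatic share of Gm sequence (∼43%) is assumed to be simply the complement of the ∼57% heterochromatic share.

```python
def relative_gene_density(genes_a: float, seq_a: float,
                          genes_b: float, seq_b: float) -> float:
    """Fold difference in gene density (genes per unit sequence) of region A over region B.

    All arguments are percentages (or fractions) of the genome-wide gene
    count and of total sequence, so the units cancel in the ratio.
    """
    return (genes_a / seq_a) / (genes_b / seq_b)


def kb_per_gene(genes_per_100kb: float) -> float:
    """Convert a density in genes per 100 kb into average spacing in kb per gene."""
    return 100.0 / genes_per_100kb


# Gm: ~57% of sequence is pericentromeric and holds 22% of genes;
# the remaining ~43% (assumed complement) holds 78% of genes.
gm_fold = relative_gene_density(genes_a=78, seq_a=43, genes_b=22, seq_b=57)

# Mt euchromatin: 16.9 genes per 100 kb.
mt_spacing = kb_per_gene(16.9)

print(f"Gm euchromatin vs pericentromere gene density: {gm_fold:.1f}-fold")
print(f"Mt euchromatin: one gene every {mt_spacing:.1f} kb")
```

Under these rounded inputs, genes in soybean pericentromeric regions occur at roughly a fifth of the density found in the euchromatic arms, even though those regions still hold nearly a quarter of all genes.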
Altogether, the Gm genome is reported to have 46,430 “high-confidence” protein-coding loci, a culled set of gene models drawn from an original set that also included ∼20,000 loci predicted with lower confidence (81). In Mt, a total of 62,152 genes were annotated, a value that drops to 47,845 when only genes with experimental or database support are retained.
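The filtering described here can be expressed as simple retention fractions, and the two filtered totals can then be compared directly. A minimal Python sketch using the counts quoted in this section; the helper name is hypothetical, and the Gm original set is approximated as 46,430 + 20,000 loci, assuming the ∼20,000 lower-confidence models were in addition to the high-confidence set rather than a reported total.

```python
def retained_fraction(kept: int, total: int) -> float:
    """Fraction of annotated gene models kept after confidence filtering."""
    return kept / total


# Mt: 62,152 annotated genes, 47,845 with experimental or database support.
mt_total, mt_supported = 62_152, 47_845

# Gm: 46,430 high-confidence loci; the original set is approximated here as
# those plus ~20,000 lower-confidence models (an assumption, not a reported total).
gm_high = 46_430
gm_total_approx = gm_high + 20_000

mt_kept = retained_fraction(mt_supported, mt_total)
gm_kept = retained_fraction(gm_high, gm_total_approx)
gm_vs_mt = gm_high / mt_supported  # ratio of the two filtered gene sets

print(f"Mt retained: {mt_kept:.0%}")
print(f"Gm retained (approx.): {gm_kept:.0%}")
print(f"Gm/Mt filtered gene counts: {gm_vs_mt:.2f}")
```

The two filtered totals differ by only about 3%, despite the whole-genome duplication in the soybean lineage.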
The similarity in gene counts between the two systems is surprising and significant, because the lineage leading to present-day soybean is known to have undergone a whole-genome duplication (WGD) at 13 Mya or later, a duplication that is absent from the Mt lineage (this important evolutionary event is discussed in more detail below). Thus, one might have expected higher gene numbers in Gm than in Mt. The Gm genome is also reported to have 313,125 retrotransposons and 294,937 DNA transposons (spanning 403 Mb and 157 Mb, respectively), whereas the Mt genome has 253,048 retrotransposons and 34,529 DNA transposons (spanning 88 Mb
286 Young·Bharti