Genome-Enabled Insights into Legume Biology - University of ...
Annu. Rev. Plant Biol. 2012.63:283-305. Downloaded from www.annualreviews.org
by University of Minnesota - Twin Cities - Wilson Library on 05/07/12. For personal use only.
Annu. Rev. Plant Biol. 2012. 63:283–305
First published online as a Review in Advance on
January 30, 2012
The Annual Review of Plant Biology is online at
plant.annualreviews.org
This article’s doi:
10.1146/annurev-arplant-042110-103754
Copyright © 2012 by Annual Reviews.
All rights reserved
1543-5008/12/0602-0283$20.00
∗ Corresponding author.
Genome-Enabled Insights into Legume Biology
Nevin D. Young 1,∗ and Arvind K. Bharti 2
1 Department of Plant Pathology and Department of Plant Biology, University of
Minnesota, St. Paul, Minnesota 55108; email: neviny@umn.edu
2 National Center for Genome Resources, Santa Fe, New Mexico 87505;
email: akb@ncgr.org
Keywords
comparative genomics, genome duplication, microsynteny,
nodulation, symbiosis
Abstract
Legumes are the third-largest family of angiosperms, the second-most-important
crop family, and a key source of biological nitrogen in
agriculture. Recently, the genome sequences of Glycine max (soybean),
Medicago truncatula, and Lotus japonicus were substantially completed.
Comparisons among legume genomes reveal a key role for duplication,
especially a whole-genome duplication event approximately 58 Mya
that is shared by most agriculturally important legumes. A second
and more recent genome duplication occurred only in the lineage
leading to soybean. Outcomes of genome duplication, including gene
fractionation and sub- and neofunctionalization, have played key roles
in shaping legume genomes and in the evolution of legume-specific
traits. Analysis of legume genome sequences also enables the discovery
of legume-specific gene families and provides a framework
for genome-wide association mapping that will target phenotypes of
special importance in legumes. Translating genomic resources from
sequenced species to less studied but still important “orphan” legumes
will enhance prospects for world food production.
Contents
INTRODUCTION
SEQUENCING LEGUME GENOMES
  Reference Legume Genomes
  What Can We Learn from Sequenced Legume Genomes?
  Sequencing in Nonreference Legumes
  From Genome Sequencing to Resequencing
COMPARATIVE GENOMICS AND THE SEARCH FOR THE PRIMORDIAL LEGUME GENOME
  Strategies for Comparative Genomic Analysis
  Comparing Legume Genomes
  Envisioning the Ancestral Legume Genome
GENOME DUPLICATIONS IN LEGUME BIOLOGY
  Whole-Genome Duplication Events in the History of Legumes
THE AFTERMATH OF GENOME DUPLICATION AND ITS IMPACT ON LEGUME BIOLOGY
  The Fates of Duplicated Genes
  Impacts of Genome Duplication on Legume Biology
  Genome Duplication and the Evolution of Nodulation
PERSPECTIVES ON LEGUME GENOMICS
INTRODUCTION
Legumes (Fabaceae or Leguminosae) are the
third-largest family of flowering plants and
the second-most-important plant family in
agriculture. They are especially interesting because
most have the capacity to fix atmospheric
nitrogen through mutualistic interactions with
rhizobial soil bacteria, a trait that is both
ecologically and agriculturally important (32).
Indeed, without the nitrogen fixed each year
by legumes, humans would need to consume
288 billion kg of additional fuel in the Haber-Bosch
process to generate anhydrous ammonia
for agriculture (47). Given their importance to
people, legumes are now the target of extensive
sequence-based genomics research, which is
revolutionizing our understanding of legume
evolution and its connection to biologically
important traits. Of particular significance are
the recently completed and annotated genomes
of three legume species: Glycine max (soybean)
(Gm) (81), Medicago truncatula (Mt) (100), and
Lotus japonicus (Lj) (79). This review focuses on
genomics research carried out in legume biology,
emphasizing comparisons among legume
genomes and the critical role of genome duplication
and its aftermath in shaping present-day
legume genomes and traits.
With the recent publication of three legume
genome sequences, and very recently a
fourth (76), and the rapid development of genomics
tools for multiple legume species, there
are already several excellent scientific reviews
available to researchers. These reviews have
emphasized the structural analyses of legume
genomes (13, 78), translational opportunities
provided by reference genome sequences (101),
and the prospects for extending genome sequence
data to less studied “orphan” legume
species (13, 95). Therefore, we endeavor here
to complement and expand the scope of these
existing reviews with our focus on genome evolution
and genome duplication, and on their
impact on legume biology.
SEQUENCING LEGUME GENOMES
Reference Legume Genomes
The genome sequences of Gm, Mt, and Lj
form the foundation for much of our current
understanding of legume genomics. All
three species are members of Papilionoideae,
a subfamily that diverged from the two
other legume subfamilies (Mimosoideae and
Caesalpinioideae) approximately 60 Mya (52).
Most cultivated legumes are found within
two sister clades of the papilionoids: the
millettoid/phaseoloid clade [warm-season
legumes, including Gm, pigeon pea (Cajanus
cajan), common bean (Phaseolus vulgaris),
mung bean (Vigna radiata), and cowpea (Vigna
unguiculata)] and the temperate galegoid
clade [cool-season legumes, including Mt, Lj,
and species such as alfalfa (Medicago sativa),
chickpea (Cicer arietinum), clover (Trifolium
sp.), lentil (Lens sp.), and garden pea (Pisum
sativum)]. Papilionoideae also includes two
minor clades: the genistoid (lupin, Lupinus sp.)
and the dalbergioid (peanut, Arachis hypogaea).
Because all these species are reasonably close
phylogenetically, insights from the Gm, Mt,
and Lj genomes should be highly relevant
when transferred among cultivated legume
crops. However, the current emphasis on
papilionoids also means that many interesting
legume species, especially mimosoids (Mimosa,
Acacia, Prosopis, and Chamaecrista, for
example) and caesalpinioids (Caesalpinia, Senna,
and tamarind, for example), are quite distant
evolutionarily from the nexus of genomics research.
Researchers have noted this previously
and highlighted the importance of developing
genomics resources in additional nodes
throughout the legume evolutionary tree (87).
The Gm genome sequence, published
in 2010, is currently the most thoroughly
characterized legume genome (81). More than
950 million base pairs (950 Mb) of the overall
1,115-Mb genome were completed through
the use of 8x Sanger whole-genome shotgun
(WGS) sequencing. Many of the resulting
pseudomolecules extend all the way from centromeres
(as indicated by scaffolds extending
into centromeric repeats) out to telomeres
(with scaffolds extending into telomeric repeats).
The Gm sequence is also impressive for
the very large size of the resulting sequence
scaffolds. These are the physically defined
assemblies of sequence contigs that are built
into Gm’s 20 chromosome pseudomolecules.
In Gm assembly Glyma 1.0, the so-called L50
(a common metric to describe scaffold size
that is calculated by summing the lengths of
all scaffolds from longest to shortest and then
finding the scaffold size at which the running
total reaches 50% of the overall length) is 47.8 Mb.
By comparison, nearly all other published WGS plant
genome sequences have notably shorter L50s
[with the notable exception of Sorghum bicolor
(70), another very high-quality assembly].
It is especially noteworthy that nearly all of
the published Gm sequence (98%) could be
anchored to specific chromosomal positions.
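The L50 calculation described above (a statistic that many assembly reports call N50) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by the soybean sequencing project; the function name `scaffold_l50` and the example scaffold lengths are hypothetical.

```python
def scaffold_l50(lengths):
    """Return the scaffold length at which the running total, taken from
    longest to shortest, first reaches 50% of the total assembly length."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical scaffold lengths in Mb (illustrative only, not Glyma 1.0 data).
example_lengths = [50, 40, 30, 20, 10]
print(scaffold_l50(example_lengths))
```

Because the statistic depends only on the sorted lengths, a few very large scaffolds can dominate it, which is why a chromosome-scale assembly such as Glyma 1.0 reports a value in the tens of megabases.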
The Mt genome was sequenced by a combination
of Sanger-based bacterial artificial chromosome
(BAC) clones (with genomic inserts
approximately 80–120 kb in length) and ∼40x
Illumina WGS (100). In this case, the sequencing
effort was focused on euchromatic arms
outside centromeric regions through the use
of fluorescence in situ hybridization (FISH)
(49) and optical mapping (104) to define physical
location. Altogether, 367 Mb of the approximately
470-Mb Mt genome (http://data.kew.org/cvalues)
is included in the published
assembly. Because of the emphasis on
BAC-based sequencing, the quality in BAC-sequenced
regions is quite high, although scaffolds
tend to be relatively short (overall L50
of 1.27 Mb) and only the BAC-based portion
of the sequence (245 Mb, or 67%) could be
anchored to specific chromosomal locations.
Another 17 Mb of BAC-based sequence could
not be anchored. The remaining portion of
the Mt sequence consists of Illumina WGS
(104 Mb), with the Illumina contigs being quite
short (L50 of 2.4 kb, largest 31 kb) and primarily
useful as a way to recover missing portions of
the genome for gene discovery. Still, Mt chromosome
5 is noteworthy in being a nearly intact
BAC-based pseudomolecule that is complete on
either side of the centromere. Throughout the
entire pseudomolecule of Mt chromosome 5,
there are just four sequence gaps, which is comparable
in quality to the Arabidopsis thaliana (3)
or Oryza sativa (40) genomes. One surprising
result of the Mt sequencing project was the discovery
of a large chromosomal translocation in
the accession used as a template for sequencing
(Jemalong-A17) compared with other Mt
accessions. This had been suggested in previous
genetic experiments that found biased segregation
ratios involving crosses with A17 (43), but
the sequencing project was able to pinpoint two
breakpoints on chromosomes 4 and 8 to regions
roughly the size of BAC clones.
The Lj genome was published in 2008 (79)
and was actually the first legume genome to
appear, though it is still the most incomplete.
As in Mt, the strategy was to focus on gene-rich
portions of the genome through the sequencing
of large insert clones (in this case, so-called
transformation-competent artificial chromosomes).
The published Lj genome sequence is
315 Mb in length, corresponding to 67% of
the Lj genome (472 Mb), but only 130 Mb is
high quality and anchored to chromosomes. A
more recent version of the Lj genome sequence
is now available through the Web site of
the lead sequencing group in Kazusa, Japan
(ftp://ftp.kazusa.or.jp/pub/lotus/lotus_r2.5/pseudomolecule),
and it provides a much
more robust platform for Lj genomics. This
updated version (Lj 2.5) contains anchored
pseudomolecules 268 Mb in length throughout
the euchromatic portion of Lj plus 33 Mb of
sequence as yet unanchored.
What Can We Learn from Sequenced Legume Genomes?
What have we learned about legume genomes
from this first generation of sequencing
projects? In the broadest sense, sequenced
legume genomes look very much like those
of other dicots, though comparisons with
Arabidopsis can be complicated by its unusually
small genome size and complex duplication
history (3). A closer look at the Gm genome
finds that ∼57% of the overall sequence
is found in repeat-rich, low-recombination
heterochromatin, while most genes (78%) are
found in euchromatic chromosome arms (81).
Of course, this also implies that substantial
numbers of Gm genes (22%) lie within the
pericentromeric heterochromatin, a somewhat
surprising and potentially important result. As
expected, crossovers are profoundly reduced
near centromeres, with the ratio of genetic
to physical distance dropping by 27-fold
between the euchromatic and pericentromeric
portions of the genome. Genome organization
in Mt seems largely comparable, though the
evidence for this is based on a combination of
the BAC-based euchromatin sequence, FISH
microscopy, and optical mapping (100). Notably,
the estimated proportion of the genome
located in pericentromeres is much lower in
Mt compared with Gm (∼22% versus ∼57%),
something that presumably plays a role in the
difference in genome size. In both Gm and
Mt, gene density is generally high throughout
euchromatic arms, with only limited indications
of a gene density gradient rising from
centromere to telomere. In Mt, for example,
the gene density is estimated at 16.9 per 100 kb
(1 gene every 5.9 kb) throughout the euchromatin,
with the average gene being 2,211 bp in
length and containing four introns. By way of
comparison, Mt values are similar to those in
Arabidopsis (2,174 bp) and Oryza (3,403 bp).
Altogether, the Gm genome is reported to
have 46,430 “high-confidence” protein-coding
loci, which represents a culled set of gene models
from an original set that included ∼20,000
predicted with lower confidence (81). In Mt,
a total of 62,152 genes were annotated, a value
that drops to 47,845 when retaining only those
genes with experimental or database support.
The similarity in gene counts between the two
systems is surprising and significant, because
the lineage leading to present-day soybean is
known to have undergone a whole-genome
duplication (WGD) at 13 Mya or later, a
duplication that is absent in the Mt lineage
(there is much more about this important
evolutionary event below). Thus, one might
have expected higher gene numbers in Gm
compared with Mt. The Gm genome is also
reported to have 313,125 retrotransposons and
294,937 DNA transposons (spanning 403 Mb
and 157 Mb, respectively), whereas the Mt
genome has 253,048 retrotransposons and
34,529 DNA transposons (spanning 88 Mb
and 9.4 Mb, respectively). The lower numbers
in Mt presumably reflect the lower amount of
pericentromeric sequencing (also supported
by the twofold difference in genome size), but
may also indicate real genomic differences
between the two species.
Detailed examination of the genome sequences
also provides insights into interesting
or unusual gene families. The Gm genome
is reported to have 283 legume-specific gene
families (81), an estimate that increases to
670 with the analysis of the more recent Mt
genome sequence (100). Both Gm and Mt contain
higher numbers of disease-resistance genes
containing nucleotide-binding-site
leucine-rich repeats (NBS-LRRs, also called
NB-ARCs, i.e., nucleotide-binding adaptors
shared by APAF-1, R proteins, and CED-4)
than other
plant genomes sequenced to date. In Mt, for example,
there are 764 NBS-LRR-related genes,
with at least 550 expressed based on RNA-Seq
(100). Outside of legumes, O. sativa is reported
to have the largest number so far (519) (98).
More than 90% of Mt NBS-LRRs reside in
clusters that contain on average 7.4 members,
including two megaclusters: one on Mt06 with
30 NBS-LRRs and another on Mt03 with 21.
However, the conclusion that NBS-LRRs are
overrepresented in legumes (or indeed in any
plant family) needs to be tempered by the
recent observation that there is considerable
variation in NBS-LRR number between different
accessions within a single species, including
Gm (102). Legumes have higher numbers
and increased complexity in other gene families:
lipoxygenases (83), LysM receptor kinases
(103), and flavonoid biosynthetic enzymes, such
as chalcone synthase (100). It may be important
that LysM receptors and flavonoids are both
known to play important roles in nodulation.
Finally, all three sequenced legumes contain
unusually high numbers of F-box domain genes
compared with other plant species, with Mt possessing
three times the number of F-box domain
genes compared with either Gm or Lj (100).
The Mt genome is also notable for the
presence of a large and novel gene family,
the nodule-related cysteine-rich peptides
(NCRs), which are members of the larger
group of defensin-like sequences (DEFLs)
(31). Notably, this group of genes has been
observed only in members of the so-called
inverted repeat-lacking clade (IRLC) [97a;
http://tolweb.org/IRLC_(Inverted_Repeatlacking_clade)]
of legumes, a subgroup of
cool-season legumes that includes genera such
as Pisum, Vicia, and Trifolium. The IRLC
represents a clade of legumes known to have
lost one copy of the 25-kb inverted repeat in
its plastid genome, hence its name. Genome
analysis demonstrates that the gene family is
entirely missing from the sequences of Gm and
Lj. DEFLs are known to act as antimicrobials in
plants (27), although recently, Mt NCRs were
also found to play a role in signaling terminal
differentiation in rhizobial bacteria during
nodulation (92). Notably, Mt and related
genera develop an indeterminate nodule quite
different from the one observed in Gm, Lj,
or other papilionoids (89). Altogether, there
are 593 NCRs in Mt along with 778 genes
within the larger DEFL gene family. Like
NBS-LRRs, NCRs are tightly clustered within
the Mt genome, with 74% found in tandem
clusters. Given their absence from the Gm
and Lj genome sequences, NCRs must have
expanded relatively rapidly within the IRLC
clade. If so, some mechanism of propagation,
such as ectopic movement followed by tandem
duplication, may have led to their expansion.
Sequencing in Nonreference Legumes
Beyond the sequencing of reference species,
genome-scale analysis is rapidly moving into
less characterized legume species. Indeed,
a draft genome sequence of pigeon pea
(Cajanus cajan) has recently been published, including
scaffolds representing 73% of the pigeon
pea genome (94). All this is possible owing
to the recent development of next-generation
sequencing technologies, where billions of base
pairs (Gb) can be sequenced at very high efficiency
(57). In chickpea (C. arietinum), both
Hiremath et al. (34) and Garg et al. (28) have
used next-generation sequence technology to
rapidly sequence the chickpea transcriptome.
In the process, they developed an inventory for
most chickpea expressed sequences, assigned
predicted functions based on homology and
gene ontology analysis, and aligned the assembled
sequences to the Mt genome sequence.
Next-generation sequencing in chickpea also
led to the development of hundreds of different
single-nucleotide polymorphism (SNP)
and conserved genetic marker sequences useful
in mapping. Córdoba et al. (18) have taken
a very different approach to expanding the set
of genomic tools in common bean (P. vulgaris).
Analysis of nearly 90,000 BAC clones enabled
the discovery of more than 600 simple sequence repeat
markers. Mapping these repeats provided
a basis for integrating the physical and genetic
maps of Phaseolus. Many of the next-generation
transcriptome assemblies and related data for
orphan legume species are being collected
and made available through the U.S. Department
of Agriculture–supported Legume Information
System (http://www.comparativelegumes.org)
on its “Species” page.
Inevitably, extending the power of whole-genome
sequencing to nonreference legumes
will require the creation of true whole-genome
sequences for those species. This may soon be
realistic given the ongoing increase in short
read throughput coupled with decline in costs.
However, de novo assembly of next-generation
sequence data at the whole-genome scale remains
challenging (2, 38). Nevertheless, there
is intense work in this area aimed toward optimum
contig assembly and improved scaffolding
options (8, 29, 59, 86). Moreover,
high-throughput physical mapping by whole-genome
profiling (93) together with the launch
of third-generation sequencing technologies
such as those of Pacific Biosciences (PacBio)
(63) will further enhance superscaffolding of
genome assemblies into large pseudomolecules.
Despite relatively high error rates, PacBio
“strobed” multiple reads extending over long
physical distances have great potential to contribute
toward this goal.
From Genome Sequencing to Resequencing
Sequencing in legumes has not been limited to
the development of reference genomes: Next-generation
sequencing technologies also enable
the resequencing of plant genomes (50, 77, 82),
and resequencing opens the door to genome-wide
association studies. Here, statistical associations
between sequence variation and naturally
occurring phenotypic variation (detected
at very high density through the process of
resequencing) enable the discovery and localization
of potential causative loci (4). But to
make genome-wide association studies practical,
insights into the architecture of sequence
variation, haplotype size, population structure,
and linkage disequilibrium (LD) are critically
important (64). These subject areas, therefore,
have been explored extensively in sequenced
legume genomes (11, 50), just as in other well-characterized
plant genomes (4, 36, 90).
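The core idea of an association scan, testing each variant for a statistical relationship with a trait, can be sketched as follows. This is a deliberately simplified single-marker test (Welch's t statistic on inbred accessions coded 0/1), offered only as an illustration: practical genome-wide association studies also model population structure and LD, as noted above. The function names, SNP names, and trait values are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two samples; returns None if both samples
    have zero variance, because the statistic is then undefined."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0:
        return None
    return (mean(group_a) - mean(group_b)) / se


def single_marker_scan(genotypes, phenotypes):
    """genotypes: dict mapping SNP name -> list of 0/1 alleles per accession.
    phenotypes: trait values in the same accession order.
    Returns SNP names ranked by |t|, strongest association first."""
    scores = {}
    for snp, alleles in genotypes.items():
        ref = [p for a, p in zip(alleles, phenotypes) if a == 0]
        alt = [p for a, p in zip(alleles, phenotypes) if a == 1]
        if len(ref) < 2 or len(alt) < 2:
            continue  # too few carriers of one allele to estimate a variance
        t = welch_t(ref, alt)
        if t is not None:
            scores[snp] = t
    return sorted(scores, key=lambda s: abs(scores[s]), reverse=True)


# Hypothetical data: six inbred accessions, one trait, two SNPs.
trait = [10.1, 9.8, 10.3, 14.9, 15.2, 15.1]
snps = {"snp_A": [0, 0, 0, 1, 1, 1], "snp_B": [0, 1, 0, 1, 0, 1]}
print(single_marker_scan(snps, trait))
```

In this toy data set the alleles of `snp_A` split the accessions into low and high trait groups, so it ranks ahead of `snp_B`, whose alleles are spread across both groups.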
One example has been deep next-generation
sequencing of the wild ancestor of cultivated
soybean, Glycine soja, followed by comparison
with the published Gm reference (45). Here,
researchers generated more than 48 Gb of
G. soja sequence, aligned it to the published Gm
reference, and obtained more than 97% coverage.
In the process, they discovered 2.5 Mb in
total SNP variation between the genomes and
found that 35.6% of all high-confidence genes
contained at least one SNP. Additionally, they
observed 406 kb of small insertions or deletions,
32.4 Mb of unaligned and presumably
deleted sequence from G. soja, and 8.3 Mb of
novel, G. soja–specific sequence compared with
Gm. Altogether, then, Gm and G. soja differ by
0.31%, a value less than among Arabidopsis accessions
(69) or between O. sativa ssp. indica
and O. sativa ssp. japonica (40). Analysis of synonymous
(Ks) values involving 6,780 genes also
indicates that Gm and G. soja diverged approximately
267,000 years ago, long before the
domestication of soybean by humans.
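A divergence date of this kind is typically derived from Ks under a molecular clock, using T = Ks / (2r), where r is the synonymous substitution rate per site per year and the factor of 2 accounts for changes accumulating along both lineages. The sketch below shows only the arithmetic. The rate and the Ks value are assumptions chosen for illustration; neither is taken from this review or from the study it cites.

```python
def divergence_time_years(ks, subs_per_site_per_year):
    """Time since two lineages split under a constant molecular clock:
    T = Ks / (2 * r), because substitutions accumulate on both lineages."""
    if subs_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return ks / (2 * subs_per_site_per_year)


# Both values below are hypothetical placeholders, not figures from the study.
ASSUMED_RATE = 6.1e-9   # synonymous substitutions per site per year (assumed)
EXAMPLE_KS = 0.0033     # median Ks between orthologous gene pairs (assumed)

print(round(divergence_time_years(EXAMPLE_KS, ASSUMED_RATE)))
```

Because the estimate scales inversely with the assumed rate, halving r doubles the inferred divergence time, so dates of this kind are only as reliable as the clock calibration behind them.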
Focusing on genome variation within cultivated
soybean itself, Lam et al. (50) utilized
Illumina sequencing technology to survey 14
cultivated and 17 wild Glycine accessions. Here,
researchers obtained ∼5x coverage of the Gm
genome for each of the 31 accessions. Not surprisingly,
the wild accessions had much higher
levels of genetic diversity (approximately 56%
higher) and smaller LD blocks (approximately
twice the frequency of LD blocks less than
20 kb) compared with cultivated accessions.
Indeed, they found that LD decays quite slowly
in cultivated soybeans, with some LD blocks
extending more than 1 Mb. Such results are expected
during the domestication process, which
presumably resulted in one or more genetic bottlenecks,
lower diversity among cultivars, and
large LD blocks. Separately, a scan for genome
regions with high levels of differentiation between
wild and cultivated soybeans and/or very
low sequence diversity uncovered candidate regions
associated with domestication. Two such
regions of special interest were discovered, including
a 200-kb region on Gm chromosome
10 that overlaps known quantitative trait loci
(QTLs) for harvest index, yield, and vitamin E
content (37, 53). Analytical strategies like this
involving a search for potential sites of selection
are some of the most promising outcomes
of genome resequencing research.
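One simple form of such a scan compares nucleotide diversity in fixed windows between cultivated and wild samples and flags windows where the cultivated value collapses. The sketch below is an illustration under stated assumptions, not the method of Lam et al.: it treats each inbred accession as one sampled chromosome, uses biallelic SNPs only, and relies on hypothetical function names, window sizes, thresholds, and allele counts.

```python
def site_pi(alt_count, n):
    """Expected pairwise difference at one biallelic site among n sampled
    chromosomes, alt_count of which carry the alternate allele."""
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    p = alt_count / n
    return 2 * p * (1 - p) * n / (n - 1)


def windowed_pi(snps, n, window, chrom_length):
    """snps: list of (position, alt_count), 0-based positions < chrom_length.
    Returns diversity per bp for each consecutive window."""
    n_windows = -(-chrom_length // window)  # ceiling division
    totals = [0.0] * n_windows
    for pos, alt_count in snps:
        totals[pos // window] += site_pi(alt_count, n)
    result = []
    for i, total in enumerate(totals):
        span = min(window, chrom_length - i * window)  # last window may be short
        result.append(total / span)
    return result


def low_diversity_windows(cultivated, wild, max_ratio):
    """Indices of windows where cultivated diversity is at most max_ratio
    times wild diversity; windows with zero wild diversity are skipped."""
    return [i for i, (c, w) in enumerate(zip(cultivated, wild))
            if w > 0 and c / w <= max_ratio]


# Hypothetical allele counts for 17 wild and 14 cultivated accessions.
wild = windowed_pi([(10, 8), (150, 9), (250, 7)], 17, 100, 300)
cult = windowed_pi([(10, 7), (150, 1), (250, 6)], 14, 100, 300)
print(low_diversity_windows(cult, wild, 0.5))
```

Using a ratio against the wild sample, rather than an absolute diversity threshold, is one design choice among several; it helps separate regions that lost variation during domestication from regions that are simply conserved in both groups.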
The Mt genome has also been the target<br />
<strong>of</strong> genome resequencing (11). Twenty-six<br />
Mt accessions were sequenced to nearly 30x<br />
coverage, discovering more than 3 million<br />
total SNPs at a genome-wide density <strong>of</strong> 0.004–<br />
0.006 (i.e., 4–6 sequence variants every 1 kb),<br />
significantly higher than in wild and cultivated<br />
soybeans (50) or in Arabidopsis (17). LD decays<br />
quickly in Mt, reaching half its initial value<br />
within 3–4 kb, quite similar to that <strong>of</strong> Arabidopsis<br />
(46). Two gene families, the NBS-LRRs<br />
and NCRs, were found to harbor significantly<br />
higher levels <strong>of</strong> sequence diversity, especially in<br />
nonsynonymous sites. NBS-LRRs are known<br />
from other studies to be highly diverse (17), but<br />
it is intriguing to find that NCRs are also highly<br />
diverse given their recently discovered role in<br />
rhizobium signaling (92). Finally, resequence<br />
data in Mt revealed four genome regions<br />
as potential sites for selection, this time by<br />
searching for contiguous windows <strong>of</strong> very low<br />
sequence diversity. Three <strong>of</strong> these regions were<br />
located at telomeric ends <strong>of</strong> chromosomes,<br />
though the significance <strong>of</strong> this is unknown, and<br />
only a few examples <strong>of</strong> genes with suggestive<br />
functions (an isolated NBS-LRR, ENOD92)<br />
were found within candidate regions.<br />
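The search for contiguous windows of very low sequence diversity can be sketched roughly as follows. This is a minimal illustration, not the procedure used in the resequencing study: SNP density stands in for sequence diversity, and the function name, window size, density cutoff, and minimum run length are all hypothetical choices made for the example.

```python
from typing import List, Tuple


def low_diversity_spans(
    snp_positions: List[int],
    chrom_length: int,
    window: int = 10_000,          # hypothetical window size (bp)
    max_snps_per_kb: float = 1.0,  # hypothetical "very low diversity" cutoff
    min_run: int = 3,              # hypothetical minimum number of adjacent windows
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans made of at least `min_run`
    consecutive windows whose SNP density is at or below the cutoff."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in snp_positions:  # 0-based positions, each < chrom_length
        counts[pos // window] += 1

    spans = []
    run_start = None
    for i in range(n_windows + 1):  # one extra step closes a trailing run
        if i < n_windows:
            win_len = min(window, chrom_length - i * window)
            low = counts[i] / (win_len / 1000) <= max_snps_per_kb
        else:
            low = False
        if low and run_start is None:
            run_start = i
        elif not low and run_start is not None:
            if i - run_start >= min_run:
                spans.append((run_start * window, min(i * window, chrom_length)))
            run_start = None
    return spans


# Toy chromosome of 60 kb: 5 SNPs/kb everywhere except 20-50 kb (0.2 SNPs/kb).
snps = []
for w in (0, 1, 5):
    snps.extend(range(w * 10_000, (w + 1) * 10_000, 200))
for w in (2, 3, 4):
    snps.extend([w * 10_000 + 100, w * 10_000 + 5_000])

print(low_diversity_spans(snps, chrom_length=60_000))  # [(20000, 50000)]
```

Requiring several adjacent windows, rather than flagging single windows, is one simple way to keep isolated low-density windows from being reported as candidate regions.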
Population genomic analysis can also reveal<br />
candidate regions associated with local<br />
adaptation, as demonstrated by Friesen et al.<br />
(25). In this case, 12 inbreds derived from four<br />
wild Tunisian populations <strong>of</strong> Mt were analyzed<br />
using Affymetrix GeneChip technology.<br />
Here, sequence variation is revealed by analysis<br />
<strong>of</strong> single-feature polymorphisms (SFPs), which<br />
are detected with hundreds of thousands of probes located<br />
throughout the genome and all interrogated simultaneously<br />
by hybridization. The underlying<br />
logic <strong>of</strong> the study was to search for SFPs among<br />
inbreds and then to target loci that assorted by<br />
population. Altogether, 7% <strong>of</strong> all Affymetrix<br />
features segregated among inbreds, but only<br />
3% differentiated populations. By design, these<br />
Mt populations could be split <strong>into</strong> two groups<br />
according to their original habitats: two populations<br />
from saline environments versus two from<br />
nonsaline environments. A total <strong>of</strong> 18 genome<br />
regions defined by 52 probes showed consistent<br />
differences between the two habitats, results<br />
that could be validated by assaying a subset<br />
<strong>of</strong> the SFPs on a larger set <strong>of</strong> individuals in<br />
contrasting populations.<br />
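The two-step logic described above (find features that segregate among inbreds, then keep those that sort by habitat) might look like the sketch below. It is a deliberately simplified illustration: the probe names, the 0/1 allele calls, and the strict "fixed difference" rule are assumptions made for the example, whereas a real analysis would rely on statistical tests of differentiation.

```python
from typing import Dict, List, Tuple


def classify_probes(
    calls: Dict[str, List[int]], habitats: List[str]
) -> Tuple[List[str], List[str]]:
    """Split probes into (segregating, habitat_specific).

    calls    -- probe name -> one 0/1 SFP call per inbred line
    habitats -- habitat label ("saline" or "nonsaline") per inbred line
    """
    segregating, habitat_specific = [], []
    for probe, alleles in calls.items():
        if len(set(alleles)) < 2:
            continue  # monomorphic: no SFP among these inbreds
        segregating.append(probe)
        saline = {a for a, h in zip(alleles, habitats) if h == "saline"}
        nonsaline = {a for a, h in zip(alleles, habitats) if h == "nonsaline"}
        # Keep probes where each habitat is fixed for a different allele.
        if len(saline) == 1 and len(nonsaline) == 1 and saline != nonsaline:
            habitat_specific.append(probe)
    return segregating, habitat_specific


# Hypothetical toy data: four inbreds, two per habitat.
habitats = ["saline", "saline", "nonsaline", "nonsaline"]
calls = {
    "probe_A": [0, 0, 0, 0],  # monomorphic
    "probe_B": [0, 1, 0, 1],  # segregates, but not by habitat
    "probe_C": [1, 1, 0, 0],  # segregates and sorts by habitat
}
print(classify_probes(calls, habitats))
# (['probe_B', 'probe_C'], ['probe_C'])
```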
COMPARATIVE GENOMICS AND THE SEARCH FOR THE PRIMORDIAL LEGUME GENOME<br />
Strategies for Comparative Genomic Analysis<br />
It has long been known that species in the<br />
same taxonomic family share extensive tracts of<br />
homologous genes, often in the same or similar<br />
gene order (1, 20, 26). This is commonly called<br />
synteny, though colinearity is probably a better<br />
term whenever gene order is maintained.<br />
<strong>Legume</strong>s are no exception, with a growing<br />
number <strong>of</strong> studies demonstrating genome-scale<br />
synteny, especially among papilionoids (5, 10,<br />
16). Synteny is discovered either through the<br />
genetic mapping <strong>of</strong> sequence-based markers<br />
segregating in multiple related species or<br />
by large-scale similarity searches between<br />
sequenced genomes. A hybrid approach, where<br />
sequenced genetic markers in one species are<br />
compared with a sequenced genome, is useful<br />
in translating insights from a reference to a less<br />
well-characterized species (5, 61, 65). Comparative<br />
genomics makes it possible to infer the<br />
structural changes that have led to present-day<br />
species while also enabling the reconstruction<br />
<strong>of</strong> primordial genome structure—the architecture<br />
<strong>of</strong> ancestral chromosomes and the<br />
underlying repertoire <strong>of</strong> genes (72). From a<br />
practical point <strong>of</strong> view, comparative genomics<br />
expands the range <strong>of</strong> genomics tools available<br />
for positional gene cloning (99) and discovery<br />
<strong>of</strong> new genetic markers (19, 33, 35), especially<br />
in species with few genomic tools. <strong>Legume</strong>s<br />
fit this description nicely, with dozens <strong>of</strong><br />
agriculturally important but less well-studied<br />
crop species. This list includes valuable food<br />
crops like garden pea (P. sativum), chickpea<br />
(C. arietinum), alfalfa (M. sativa), common bean<br />
(P. vulgaris), and cowpea (V. unguiculata), all<br />
well-positioned phylogenetically with respect<br />
to the sequenced genomes of Gm, Mt, and Lj<br />
(95).<br />
Visualization is key to successful comparative<br />
genomics, and there are various methods<br />
to visualize genome comparisons. One popular<br />
technique involves Circos diagrams (48),<br />
where chromosomes are placed end to end<br />
along the outside <strong>of</strong> a circle, and then colored<br />
arcs connecting homologous segments are<br />
joined within the circle (for notable legume<br />
examples, see References 50 and 100). An<br />
especially attractive feature <strong>of</strong> Circos diagrams<br />
is their ability to visualize multiple genomes<br />
while also illustrating synteny at reasonably<br />
high resolution. Alternatively, synteny can be<br />
visualized through the use <strong>of</strong> dot-plot diagrams<br />
(Figures 1 and 2). Here, one genome (or<br />
genome segment) is laid along the horizontal<br />
axis and a second genome (or segment) is laid<br />
along the vertical axis. A mark is then made at<br />
intersections where the two genomes display<br />
sequence similarity above some cutoff value.<br />
This results in significant stretches <strong>of</strong> synteny<br />
appearing as diagonal lines, with cases <strong>of</strong> parallel<br />
diagonal lines spanning the same portion<br />
<strong>of</strong> a genome indicating a potential duplication<br />
event. Of course, the dot-plot method can<br />
easily be applied to genetic marker comparisons<br />
and does not require sequenced genomes<br />
(42). Notably, both visualization methods can<br />
be used to compare a genome with itself in<br />
a search for within-species synteny, thereby<br />
investigating duplication events and helping to<br />
reveal the genomic history <strong>of</strong> a given species.<br />
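The dot-plot idea described above can be illustrated with a small sketch. Here exact k-mer identity stands in for "sequence similarity above some cutoff value", and only forward diagonals are detected (reverse-complement matches, which would appear as anti-diagonals, are ignored). The function names and the toy sequence are hypothetical; genome-scale comparisons would use dedicated alignment tools rather than this approach.

```python
from collections import defaultdict
from typing import List, Set, Tuple


def dot_plot(seq_x: str, seq_y: str, k: int = 4) -> Set[Tuple[int, int]]:
    """Return the set of (i, j) where the k-mer starting at seq_x[i]
    (horizontal axis) equals the k-mer starting at seq_y[j] (vertical axis)."""
    index = defaultdict(list)
    for i in range(len(seq_x) - k + 1):
        index[seq_x[i:i + k]].append(i)
    dots = set()
    for j in range(len(seq_y) - k + 1):
        for i in index.get(seq_y[j:j + k], ()):
            dots.add((i, j))
    return dots


def diagonals(dots: Set[Tuple[int, int]], min_len: int = 3) -> List[Tuple[int, int, int]]:
    """Collapse dots into forward diagonal runs: (x_start, y_start, length)."""
    runs = []
    for i, j in sorted(dots):
        if (i - 1, j - 1) in dots:
            continue  # interior of a run that was already reported
        length = 1
        while (i + length, j + length) in dots:
            length += 1
        if length >= min_len:
            runs.append((i, j, length))
    return runs


s = "GATTACAGGCTTAACC"  # toy sequence with no repeated 4-mers

# Self-comparison: a single main diagonal.
print(diagonals(dot_plot(s, s)))      # [(0, 0, 13)]

# Comparison against a duplicated copy: two parallel diagonals spanning
# the same stretch of the horizontal axis, the signature of a duplication.
print(diagonals(dot_plot(s, s + s)))  # [(0, 0, 13), (0, 16, 13)]
```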
Comparing <strong>Legume</strong> <strong>Genome</strong>s<br />
Although comparative genomics is most powerful<br />
when comparing sequenced genomes,<br />
there are only a few such legume genome<br />
sequences available today. Consequently, most<br />
legume comparative genomics studies to date<br />
have involved comparisons based on genetic<br />
markers. This raises the question, how are<br />
large numbers <strong>of</strong> shared and polymorphic<br />
markers discovered for multiple species? One<br />
successful strategy has been to design exonic<br />
polymerase chain reaction (PCR) primers<br />
that amplify across (shared) introns using<br />
available genomic sequence data as the basis<br />
for primer design. The idea here is that exonic<br />
sequences tend to be highly conserved, whereas<br />
intronic sequences tend to be variable, thereby<br />
providing both the conservation needed across<br />
species for successful PCR as well as the polymorphism<br />
needed for segregation mapping.<br />
As an example, Choi et al. (16) developed<br />
hundreds <strong>of</strong> potential cross-species legume<br />
markers based on Mt and Arabidopsis sequence<br />
data and demonstrated extensive synteny across<br />
papilionoids through detailed analysis <strong>of</strong> ∼50<br />
such markers. These markers demonstrated<br />
conservation that stretched from millettoids<br />
[Gm, mungbean (V. radiata)] all the way to<br />
galegoids [Mt, garden pea (P. sativum), alfalfa<br />
(M. sativa)]. In the process, they established<br />
the first integrated view <strong>of</strong> legume synteny in<br />
the form <strong>of</strong> a concentric graphic view (60) and<br />
illustrated the overall topology <strong>of</strong> pan-legume<br />
synteny. More recently, Hougaard et al. (35)<br />
took the intron-spanning approach a step further<br />
by showing that 50% of intron-spanning<br />
markers designed from Arabidopsis work successfully<br />
in both common bean (P. vulgaris)<br />
and peanut (A. hypogaea). This is significant<br />
because peanut, although still a papilionoid,<br />
is in the dalbergioid clade, which is phylogenetically<br />
separate from the more frequently<br />
characterized millettoid and galegoid clades.<br />
Figure 1<br />
Whole-genome dot-plot of two cool-season legume species, Medicago truncatula (Mt) ssp. truncatula<br />
Jemalong-A17 (horizontal axis) and Lotus japonicus (Lj) cultivar Miyakojima MG-20 (vertical axis). An asterisk<br />
next to a chromosome number indicates reverse complement. The numbers/letters on the axes represent the<br />
chromosome number and north/south arms, respectively; these have been rearranged so that synteny blocks<br />
line up along the center diagonal, which makes the comparison easier to visualize. Many synteny blocks are<br />
nearly the lengths of whole chromosome arms (red circle), whereas others are disrupted by rearrangements<br />
(green circle). Secondary synteny blocks outside the main diagonal (orange circles) represent the whole-genome<br />
duplication in Papilionoideae ∼58 Mya. Two notable genome regions where synteny is totally<br />
lacking between the two species (Mt06N with Lj06S and Mt03N/Mt04N with Lj03N) are circled in purple.<br />
Another strategy for comparative genomics<br />
begins with the mining <strong>of</strong> existing<br />
expressed sequence tag (EST) databases to<br />
search for SNPs or other types <strong>of</strong> mappable<br />
polymorphisms. Once positioned genetically,<br />
the underlying ESTs can be compared with<br />
one <strong>of</strong> the sequenced legume genomes as a<br />
basis for discovering shared synteny. Bertioli<br />
et al. (5) adopted this approach and extended<br />
the earlier work <strong>of</strong> Hougaard et al. (35) by<br />
focusing on ∼126 cross-species ESTs mapped<br />
in Arachis and compared with available Mt and<br />
Lj sequences. They found that most synteny<br />
blocks align to a single region in either genome,<br />
an important observation because it implies<br />
that a previously predicted papilionoid WGD<br />
event (see below) predated the divergence <strong>of</strong><br />
Arachis from galegoids and phaseoloids, and<br />
so occurred very early in the evolution <strong>of</strong> the<br />
subfamily. In Muchero et al. (61), more than<br />
10,000 SNPs were discovered within available<br />
EST databases <strong>of</strong> cowpea (V. unguiculata) and<br />
then used in map construction leading to 928<br />
positioned cowpea loci through the use <strong>of</strong> a<br />
medium-throughput Illumina GoldenGate<br />
assay system. Comparison with Gm revealed<br />
85% macrosynteny, while macrosynteny with<br />
the more distantly related Mt was still high<br />
at 82%. In a similar study, McClean et al.<br />
(56) examined >300 gene-based Phaseolus loci<br />
coming from EST and BAC-end sequence data<br />
and compared them with the Gm reference<br />
genome sequence, discovering 55 synteny<br />
blocks on 35 <strong>of</strong> Gm’s 40 chromosome arms.<br />
Syntenic blocks averaged 32 centimorgans<br />
in length in Phaseolus, a genetic distance that<br />
corresponded to an average physical distance <strong>of</strong><br />
4.9 Mb in Gm. Using this set <strong>of</strong> synteny blocks<br />
as reference points, they could tentatively position<br />
another 15,000 Phaseolus gene sequences<br />
solely based on the Gm genome sequence.<br />
Side-by-side comparison <strong>of</strong> sequenced<br />
genomes is the most powerful way to learn<br />
about genome histories. In such comparisons<br />
it becomes possible to estimate the fraction <strong>of</strong><br />
shared genes, the size distribution <strong>of</strong> synteny<br />
blocks, or differences between genomes in gene<br />
density or organization. Going a step further,<br />
one can examine genome rearrangements at the<br />
macroscale, whether they are shared or lineage<br />
specific, or drill down to the base-pair level to<br />
dissect the fine structure <strong>of</strong> conserved colinear<br />
genes. Ultimately, as more sequenced species<br />
are added to the analysis, we begin to see the actual<br />
step-by-step changes that distinguish one<br />
genome from another.<br />
One <strong>of</strong> the first sequence-based comparisons<br />
in legumes was between Mt and Gm. Focusing<br />
on a genome region surrounding a nematode-resistance<br />
gene in Gm (rhg1) on chromosome<br />
Gm18, Mudge et al. (62) found that 75% <strong>of</strong><br />
genes were colinear between Mt and Gm in a<br />
region spanning ∼150 genes, including a remarkable<br />
stretch where 33 <strong>of</strong> 35 genes (94%)<br />
were conserved and colinear, a phenomenon<br />
they termed hypersynteny. Cannon et al. (15)<br />
later carried out a genome-scale sequence comparison<br />
based on the partially completed Mt and<br />
Lj genomes available at the time. In the case <strong>of</strong><br />
one large synteny block between Mt05N and<br />
Lj02S, they found that 58 <strong>of</strong> 94 genes (62%) existed<br />
as colinear orthologous pairs between the<br />
syntenic segments. Indeed, synteny between Mt<br />
and Lj was found to extend nearly genome-wide,<br />
despite a time span of 40–50 million years since<br />
speciation.<br />
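Figures such as "33 of 35 genes (94%)" or "58 of 94 genes (62%)" are fractions of genes in a region that have an ortholog in the same relative order in the other genome. One simple way to compute such a fraction, shown here only as a sketch, is to take the longest strictly increasing subsequence of ortholog positions. This is an assumed simplification and not necessarily the procedure used in the cited studies; it ignores inverted segments and tandem duplicates, and the function name and toy data are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional


def colinear_fraction(ortholog_positions: List[Optional[int]]) -> float:
    """Fraction of genes in a region of genome A that are colinear in genome B.

    ortholog_positions lists, in genome-A gene order, the position of each
    gene's ortholog in genome B, or None when no ortholog was found there.
    """
    if not ortholog_positions:
        return 0.0
    tails: List[int] = []  # tails[n] = smallest end of an increasing run of length n+1
    for pos in ortholog_positions:
        if pos is None:
            continue
        k = bisect_left(tails, pos)
        if k == len(tails):
            tails.append(pos)
        else:
            tails[k] = pos
    return len(tails) / len(ortholog_positions)


# Hypothetical region of 10 genes: two lack an ortholog and one (position 7)
# is out of order, leaving 7 colinear genes.
region = [1, 2, None, 3, 7, 4, 5, None, 6, 8]
print(colinear_fraction(region))  # 0.7
```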
Figure 1 shows an updated dot-plot<br />
comparison <strong>of</strong> Mt and Lj based on versions<br />
<strong>of</strong> the genomes available in mid-2011<br />
(Reference 100 and ftp://ftp.kazusa.or.jp/pub/lotus/lotus_r2.5/pseudomolecule).<br />
Figure 2<br />
Whole-genome dot-plot <strong>of</strong> the cool-season legume Medicago truncatula (Mt) ssp. truncatula Jemalong-A17<br />
(horizontal axis) and the warm-season legume Glycine max (Gm) var. Williams 82 (vertical axis). The<br />
pericentromeric regions <strong>of</strong> Gm chromosomes have been removed for this analysis. An asterisk next to a<br />
chromosome number indicates reverse complement. Chromosome arms have been rearranged so that the<br />
synteny blocks line up along the center diagonal, which makes the comparison easier to visualize. The<br />
presence <strong>of</strong> two synteny diagonals for almost every Mt region indicates an additional recent whole-genome<br />
duplication (
Here, chromosome arms for both Mt and<br />
Lj (based on the estimated positions <strong>of</strong> centromeric<br />
regions) have been reordered and<br />
in some cases flipped (noted by an asterisk)<br />
to align synteny blocks <strong>into</strong> a single coherent<br />
line. The result highlights the genome-scale<br />
synteny observed between the two species.<br />
If perfect synteny existed between Mt and<br />
Lj, the roughly 45° dot-plot line would be<br />
straight and continuous, and would reach<br />
all the way from one end to the other. The<br />
fact that the actual result produces a line that<br />
approaches this ideal is overwhelming evidence<br />
for genome-scale synteny between the two<br />
species. Synteny blocks are nearly the lengths<br />
<strong>of</strong> whole chromosome arms, and overall they<br />
span more than 75% <strong>of</strong> both species. One<br />
striking example between Mt05N and Lj02S<br />
is circled in red. Still, there are also breaks in<br />
synteny—for example, Mt07S and its synteny<br />
with Lj01S (circled in green). Here, rather<br />
than a contiguous diagonal line, one sees a<br />
cloud <strong>of</strong> shorter synteny blocks, broken <strong>into</strong><br />
six pieces with two <strong>of</strong> them flipped around.<br />
Apparently, one or both syntenic chromosomes<br />
experienced major reorganization events since<br />
the separation of Mt and Lj. There are also notable<br />
genome regions where synteny is totally<br />
lacking between the two species. Mt06N with<br />
Lj06S and Mt03N/Mt04N with Lj03N (circled<br />
in purple) are striking examples. Significantly,<br />
these genome regions coincide with higher<br />
densities <strong>of</strong> NBS-LRRs and retrotransposons<br />
compared with the remainder <strong>of</strong> the genome, a<br />
relationship that may be biologically significant<br />
(5) and similar in terms <strong>of</strong> degraded synteny to<br />
observations made in A. hypogaea (76).<br />
Envisioning the Ancestral Legume Genome<br />
Inevitably, as more legumes are sequenced it<br />
will become possible to reconstruct the ancestral<br />
legume genome, or at least the ancestral<br />
papilionoid genome. Such an effort is underway<br />
by integrating the sequenced legume genomes<br />
with comparably high-density marker/map data<br />
from species such as chickpea (C. arietinum)<br />
and pigeon pea (C. cajan) (D. Cook, personal<br />
communication). Comparisons <strong>of</strong> the Gm, Mt,<br />
and Lj genomes already provide a glimpse <strong>into</strong><br />
the large-scale architecture <strong>of</strong> the ancestral<br />
legume genome. Despite the complexities resulting<br />
from the 13-Mya Glycine WGD event<br />
(discussed in further detail below), comparisons<br />
among Gm, Mt, and Lj (Figures 1 and 2)<br />
suggest a limited number <strong>of</strong> ancestral synteny<br />
blocks that have been rearranged to generate<br />
present-day papilionoid genomes. In both comparisons,<br />
a conservative examination reveals just<br />
14 largely coherent blocks that span the majority<br />
<strong>of</strong> all three genomes. Notably, this estimate<br />
agrees nicely with the apparent basal chromosome<br />
number <strong>of</strong> seven for papilionoids (74).<br />
GENOME DUPLICATIONS IN LEGUME BIOLOGY<br />
Whole-Genome Duplication Events in the History of Legumes<br />
One <strong>of</strong> the most striking lessons coming out<br />
<strong>of</strong> plant comparative genomics has been the<br />
critical role <strong>of</strong> genome duplication in the evolutionary<br />
history <strong>of</strong> many, if not most, plant<br />
species (21). This is especially true in the case<br />
<strong>of</strong> legumes. Gm provided an early hint <strong>into</strong> the<br />
importance <strong>of</strong> WGD in genome restructuring<br />
in a study showing that restriction fragment<br />
length polymorphisms were duplicated on average<br />
2.55 times and localized to a homoeologous<br />
segment (paralogous sequences resulting<br />
from WGD) nearly as long as whole chromosomes<br />
(84). Later, as large amounts <strong>of</strong> genome<br />
sequence data became available, it became clear<br />
that most present-day plant genomes are the<br />
products <strong>of</strong> ancient genome-scale duplication<br />
events (examples include 3, 40, 41, 91). Subsequent<br />
studies have gone on to reveal the wide<br />
range <strong>of</strong> plant families that have experienced<br />
genome duplications and the architecture <strong>of</strong> retained<br />
duplication blocks, and have established<br />
reasonably precise estimates for the timing <strong>of</strong><br />
key duplication events (7, 73, 85). We know,<br />
for example, that many dicots share an ancient<br />
(130–140 Mya) triploidization event based on<br />
synteny analysis of Vitis vinifera and the fact<br />
that each Vitis region typically shows synteny to<br />
three corresponding regions in other sequenced<br />
dicots (41). We also know that a surprisingly<br />
large number <strong>of</strong> plant WGD events followed<br />
closely after the Cretaceous-Tertiary boundary<br />
event ∼65 Mya. This led Fawcett et al. (22)<br />
to suggest that polyploids might have higher<br />
adaptability and greater tolerance to extreme<br />
conditions, something that would have come in<br />
quite handy during a time <strong>of</strong> widespread species<br />
extinction. Finally, we are beginning to discover<br />
the details about the aftermath <strong>of</strong> WGD events<br />
(summarized in 24)—and it is this final point,<br />
the consequence <strong>of</strong> genome duplication, that<br />
is especially relevant to our consideration <strong>of</strong><br />
legume genome biology.<br />
<strong>Genome</strong> duplication is easy to see when<br />
looking at a dot-plot comparison. A closer look<br />
at Figure 1 reveals numerous secondary synteny<br />
blocks lying to one side or the other <strong>of</strong> the<br />
main diagonal. One notable example is where<br />
the primary synteny block involving Mt01N<br />
and Lj05N is paralleled by another synteny<br />
block lower down, between Mt01N and Lj01N<br />
(orange circles connected by an orange line).<br />
There are dozens <strong>of</strong> such duplicated synteny<br />
blocks in the comparison between these two<br />
species, and the simplest interpretation is an ancient<br />
WGD preceding the speciation between<br />
Mt and Lj. In a comparison like this, synteny<br />
blocks lying along the main diagonal represent<br />
the speciation event, whereas the off-center<br />
diagonals show regions <strong>of</strong> synteny resulting<br />
from one or more shared WGD events. Apparently,<br />
a WGD event that took place in the<br />
ancestor <strong>of</strong> Mt and Lj was followed quickly by<br />
a period <strong>of</strong> significant genome rearrangement<br />
and gene loss before speciation, rapidly degrading<br />
the quality <strong>of</strong> duplicate synteny blocks<br />
observed. (Loss <strong>of</strong> synteny in duplicate blocks<br />
is important in understanding the impact <strong>of</strong><br />
duplication on legume biology and is discussed<br />
in more detail below.) The existence <strong>of</strong> such a<br />
WGD in the legume family has been indicated<br />
through multiple sources <strong>of</strong> evidence, especially<br />
Ks (synonymous substitution) estimates<br />
between paralogs (6, 73, 80) and topology <strong>of</strong><br />
phylogenetic tree analysis (12, 15). Integrating<br />
all these different sources <strong>of</strong> data leads to a<br />
best estimate for the timing <strong>of</strong> this WGD <strong>of</strong><br />
58 Mya. This date would have preceded the<br />
Mt/Lj split (approximately 50 Mya) as well as<br />
the split with Gm (54 Mya) (52). Indeed, peanut<br />
(A. hypogaea), an earlier diverging papilionoid,<br />
also shares this WGD event (5). By contrast,<br />
a recent study in Chamaecrista indicates that<br />
this species (and presumably the Mimosoideae<br />
and Caesalpinioideae subfamilies) does not share<br />
the 58-Mya WGD event (12). In other words,<br />
we know with remarkable precision both the<br />
timing and evolutionary window for this pivotal<br />
WGD event in the history <strong>of</strong> legumes. Given<br />
the range <strong>of</strong> species that share this duplication,<br />
we will refer to it as the papilionoid WGD.<br />
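As a rough illustration of how Ks estimates between paralogs translate into a duplication date, the sketch below applies the standard molecular-clock relation T = Ks / (2r), where r is the synonymous substitution rate per site per year in a single lineage and the factor of 2 reflects both paralogs accumulating changes independently. The Ks values and the rate are hypothetical placeholders chosen for the example, not figures from the studies cited above, and the median is used here as a simple stand-in for locating the Ks peak.

```python
from statistics import median
from typing import Iterable


def duplication_age_mya(ks_values: Iterable[float], rate: float) -> float:
    """Estimate the age (in millions of years) of a duplication.

    ks_values -- Ks estimates for paralog pairs attributed to one event
    rate      -- synonymous substitutions per site per year, ONE lineage
    """
    ks_peak = median(ks_values)  # simple stand-in for the Ks peak
    return ks_peak / (2.0 * rate) / 1e6


# Hypothetical Ks values and a hypothetical rate of 5e-9 per site per year.
toy_ks = [0.55, 0.58, 0.60, 0.62, 0.66]
print(round(duplication_age_mya(toy_ks, rate=5e-9), 1))  # 60.0
```

Because the result scales inversely with the assumed rate, any date obtained this way is only as reliable as the rate calibration, which is one reason the text combines Ks evidence with phylogenetic tree topology.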
But the papilionoid WGD is not the only<br />
one to play an important role in legume evolution.<br />
Figure 2 displays a comparison <strong>of</strong> the<br />
Mt and Gm genome sequences (based on their<br />
recently published sequences). This comparison<br />
illustrates important similarities but also<br />
striking differences with the Mt/Lj dot-plot in<br />
Figure 1. Gm and Mt clearly display extensive<br />
synteny, with many long, coherent synteny<br />
blocks. A quick count reveals as many as 30<br />
large-scale synteny blocks running the length<br />
<strong>of</strong> chromosome arms or nearly so. However,<br />
there is not a single 45° diagonal stretching<br />
across the genomes; instead, there are pairs <strong>of</strong><br />
diagonals in Gm corresponding to individual<br />
chromosome arms <strong>of</strong> Mt. One example (circled<br />
in red) highlights synteny between Mt05S and<br />
two different Gm chromosomes/arms, Gm02S<br />
and Gm14N/Gm14S. A WGD is again the<br />
explanation, but this time, one that occurred<br />
more recently (estimated at 13 Mya) and only<br />
in the lineage leading to Gm (84). This duplication<br />
event explains the observation that there<br />
are two Gm blocks for each Mt genome region.<br />
Comparable levels <strong>of</strong> contiguity observed in<br />
each pair <strong>of</strong> synteny blocks are explained by the<br />
fact that both trace back to a single WGD event,<br />
and so the evolutionary distance between Mt<br />
and both <strong>of</strong> the Gm syntenic segments must be<br />
identical. This Glycine-specific WGD had been<br />
predicted previously (84), but the publication<br />
<strong>of</strong> the Gm genome revealed just how pervasive<br />
and fundamental it is in understanding the<br />
architecture <strong>of</strong> the present-day Gm genome.<br />
Figure 2 also illustrates exceptions to this<br />
pattern, demonstrating two important points.<br />
First, synteny blocks like the ones circled in<br />
orange show more ancient synteny blocks that<br />
trace back to the papilionoid WGD discussed<br />
above. Clearly, Mt and Gm (as well as Lj—and,<br />
indeed, all papilionoids) are expected to share<br />
the 58-Mya WGD event. Second, there are<br />
frequent cases <strong>of</strong> rearrangements—some that<br />
are simple, like the one involving Mt01S and<br />
Gm10N/Gm10S and Gm20S (circled in green),<br />
but others that are quite complex (one example<br />
circled in purple). These rearrangements are<br />
best explained by significant levels <strong>of</strong> reshuffling<br />
among the duplicated Glycine genome<br />
segments after the 13-Mya WGD event.<br />
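The observation that there are two Gm blocks for each Mt genome region can be expressed as a synteny "depth": the number of syntenic blocks from the other genome that cover each stretch of a reference region. The sketch below is a minimal illustration under assumed inputs (blocks given as half-open intervals in reference coordinates, a hypothetical function name and bin size); a bin counts as covered if a block overlaps any part of it.

```python
from collections import Counter
from typing import List, Tuple


def synteny_depth(
    blocks: List[Tuple[int, int]], region_length: int, bin_size: int
) -> List[int]:
    """Count, per bin of the reference region, how many synteny blocks
    (half-open [start, end) intervals in reference coordinates) overlap it."""
    n_bins = (region_length + bin_size - 1) // bin_size
    depth = [0] * n_bins
    for start, end in blocks:
        first = start // bin_size
        last = min((end - 1) // bin_size, n_bins - 1)
        for b in range(first, last + 1):
            depth[b] += 1
    return depth


# Hypothetical reference arm of length 100 matched by three blocks from the
# other genome: one spans the whole arm, its duplicate is split in two by a
# rearrangement, leaving a gap between 60 and 70.
blocks = [(0, 100), (0, 60), (70, 100)]
depth = synteny_depth(blocks, region_length=100, bin_size=10)
print(depth)                             # [2, 2, 2, 2, 2, 2, 1, 2, 2, 2]
print(Counter(depth).most_common(1)[0])  # (2, 9)
```

A predominant depth of 2 is what a single lineage-specific WGD would produce; bins with higher depth would point to retained blocks from an older duplication, and bins with lower depth to loss or rearrangement.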
THE AFTERMATH OF GENOME DUPLICATION AND ITS IMPACT ON LEGUME BIOLOGY<br />
The Fates of Duplicated Genes<br />
WGDs obviously have a profound impact on<br />
genome architecture. However, genome duplications<br />
play an equally important role in the<br />
evolution <strong>of</strong> individual genes and gene families.<br />
Other types <strong>of</strong> gene duplication exist—<br />
tandem gene duplication, segmental duplication,<br />
transposition—and they are certainly important<br />
in genomic and biological evolution<br />
(24). However, WGD events are worthy <strong>of</strong> special<br />
consideration because when they occur, every<br />
gene in the genome is suddenly present in<br />
two copies. In effect, the entire evolutionary trajectory<br />
<strong>of</strong> a lineage becomes primed to move in<br />
a novel direction. In the case <strong>of</strong> legumes, there<br />
is growing evidence that WGD events had an<br />
especially significant impact on nodulation and<br />
symbiosis with rhizobial bacteria (100). After<br />
duplications, there are only a small number <strong>of</strong><br />
potential fates for duplicated gene pairs (24):<br />
Both paralogs are maintained and they share<br />
the function <strong>of</strong> their progenitor; both paralogs<br />
are maintained and one takes on an entirely new<br />
function; or one <strong>of</strong> the two progeny genes is lost<br />
and only a single copy is maintained. The first<br />
outcome (both genes maintained with shared<br />
function) is often called subfunctionalization,<br />
as the two paralogs have split up the function<br />
of their ancestor (23). The second (both<br />
maintained, one taking on a new function) is<br />
called neofunctionalization, for obvious reasons<br />
(55). The other possibility (only one gene retained,<br />
the other deleted) is fractionation (51)<br />
or, equivalently, diploidization. Still other outcomes<br />
are possible, such as pseudogenization<br />
without loss <strong>of</strong> one <strong>of</strong> the duplicates, but are<br />
not considered in detail here. Ultimately, biological<br />
function is expected to play a critical<br />
role in the fate <strong>of</strong> duplicated genes, with some<br />
functional classes (those most interconnected)<br />
retained more frequently than others (proteins<br />
that generally act solo) (24). Understanding<br />
gene fate following WGDs sheds light on important<br />
biological phenomena in legumes, including<br />
properties such as the generation <strong>of</strong><br />
novel disease-resistance specificities and the appearance<br />
<strong>of</strong> novel developmental functions.<br />
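The three fates described above can be made concrete with a small classification sketch. This is only an illustration, not a method from the review: it assumes that a gene's function can be summarized as a set of labels, and the names Fate and classify_fate are hypothetical. The example data follow the interpretation, discussed later in this review, that an ancestral gene carrying both nodulation and mycorrhizal recognition functions gave rise to NFP and LYR1.<br />

```python
from enum import Enum
from typing import FrozenSet, Optional


class Fate(Enum):
    SUBFUNCTIONALIZATION = "subfunctionalization"
    NEOFUNCTIONALIZATION = "neofunctionalization"
    FRACTIONATION = "fractionation"
    OTHER = "other"


def classify_fate(
    ancestral: FrozenSet[str],
    copy_a: Optional[FrozenSet[str]],
    copy_b: Optional[FrozenSet[str]],
) -> Fate:
    """Assign a duplicated gene pair to one of the fates named in the text.

    Each argument is a set of function labels (an assumption of this sketch).
    A copy given as None has been lost from the genome.
    """
    # At least one copy deleted: fractionation (diploidization).
    if copy_a is None or copy_b is None:
        return Fate.FRACTIONATION
    # Some function is present that the ancestor lacked.
    if (copy_a | copy_b) - ancestral:
        return Fate.NEOFUNCTIONALIZATION
    # Each copy keeps only part of the ancestral role; together they cover it.
    if (copy_a | copy_b) == ancestral and copy_a < ancestral and copy_b < ancestral:
        return Fate.SUBFUNCTIONALIZATION
    # Anything else, e.g. both copies still carrying the full ancestral role.
    return Fate.OTHER


# Illustrative labels only, based on the NFP/LYR1 interpretation in the text.
ancestral = frozenset({"nodulation", "mycorrhizal"})
nfp = frozenset({"nodulation"})
lyr1 = frozenset({"mycorrhizal"})

print(classify_fate(ancestral, nfp, lyr1).value)
print(classify_fate(ancestral, nfp, None).value)
```

Real cases are rarely this clean, since expression data alone often cannot separate sub- from neofunctionalization; the sketch only fixes the vocabulary.<br />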
To illustrate the fates of duplicated genes in<br />
legumes, Figure 3 displays a pair of duplicated<br />
segments in Mt, each roughly 150 kb in size (located<br />
on Mt01 and Mt07), alongside<br />
the four corresponding syntenic regions of<br />
Gm. This figure was created using the PLAZA<br />
genome analysis suite (75) and is based on the<br />
published sequences of Mt and Gm. The results<br />
are striking. Each Mt segment exhibits remarkable<br />
conservation with the pair of most closely<br />
related Gm segments, but far less conservation<br />
with its duplicate Mt segment. In this example, just<br />
7 of 19 genes (37%) in the duplicated blocks<br />
of Mt are maintained in both copies. These are homoeologs<br />
(WGD-derived paralogs) that trace back to the<br />
papilionoid WGD at 58 Mya. By contrast, the<br />
Mt07 segment shares 13 of 16 genes (81%) with<br />
either Gm03 or Gm19, whereas the Mt01 segment<br />
shares 11 of 13 (85%) with either Gm02<br />
or Gm10. These are orthologous relationships<br />
that derive from the millettoid/galegoid speciation<br />
event separating Mt and Gm at ∼54 Mya<br />
(52). It is noteworthy that the time span between<br />
the papilionoid WGD and the Mt/Gm<br />
[Figure 3 diagram: an ancestral legume genome underwent a WGD ∼58 Mya, followed by the Mt/Gm split ∼54 Mya and a Gm-specific WGD ∼13 Mya. Segments shown: Gm03, Gm19, Mt07, Mt01, Gm02, Gm10. Gene retention: 81% (Mt07 with Gm03/Gm19), 37% (Mt07 with Mt01), 85% (Mt01 with Gm02/Gm10).]<br />
Figure 3<br />
A 150-kb region on the Glycine max (Gm) and Medicago truncatula (Mt) genomes illustrating the differential gene loss between the<br />
duplicated regions, which took place after the split between warm- and cool-season legumes ∼54 Mya. In this example, only 37% of the<br />
genes are retained in both duplicated blocks of Mt, while the Mt duplicates share 81%–85% of their genes with their Gm counterparts. By contrast,<br />
the number of retained gene pairs among Gm03/Gm19 (69%) and Gm02/Gm10 (100%) duplicates is much higher, at least in part due<br />
to the fact that the whole-genome duplication (WGD) in Gm is fairly recent (
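The percentages quoted for Figure 3 follow directly from the gene counts given in the text (7 of 19, 13 of 16, and 11 of 13). A minimal sketch of that arithmetic, assuming rounding to the nearest whole percent; the function name retention_percent is hypothetical:<br />

```python
def retention_percent(shared: int, total: int) -> int:
    """Percentage of genes in a block that are shared, to the nearest integer."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * shared / total)


# Gene counts as stated in the text for the Figure 3 example.
counts = {
    "Mt01 vs Mt07 (homoeologs)": (7, 19),
    "Mt07 vs Gm03/Gm19 (orthologs)": (13, 16),
    "Mt01 vs Gm02/Gm10 (orthologs)": (11, 13),
}

for label, (shared, total) in counts.items():
    print(f"{label}: {retention_percent(shared, total)}%")
```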
translocated into the pericentromeric region of<br />
the chromosome. Between the two Gm genome<br />
regions, 77% of gene duplicates were retained.<br />
However, this high level of retention did not<br />
extend to NBS-LRRs, which existed as clusters<br />
in both genome regions, but with significant<br />
homoeolog-specific duplications and losses.<br />
The pericentromeric region was especially<br />
depleted of surviving NBS-LRRs. Clearly,<br />
NBS-LRR genes are subject to much higher<br />
levels of fractionation than other gene classes.<br />
Local duplications, deletions, and recombination<br />
are apparently acting preferentially on<br />
WGD-derived NBS-LRR clusters, with the<br />
pericentromeric NBS-LRR cluster experiencing<br />
much higher levels of fractionation. This<br />
pattern has been noted in other plant species,<br />
with NBS-LRRs frequently underrepresented<br />
in duplicated genome regions (14, 64), potentially<br />
reflecting a fitness cost associated with<br />
excess NBS-LRRs (58).<br />
In a similar study by Kim et al. (44), a different<br />
pair of homoeologous genome regions<br />
(1.96–4.60 Mb) on Gm05 and Gm17, centered around the Rxp<br />
bacterial leaf pustule–resistance gene, was examined and compared<br />
with the homologous Mt genome regions. In<br />
this case, fractionation in Mt was observed to<br />
extend to the level of gene blocks, in which<br />
multiple linked genes were retained in one duplicate<br />
but lost from the other (contrasting<br />
with the apparent gene-by-gene fractionation<br />
illustrated in Figure 3). In the case of Gm<br />
and the more recent 13-Mya WGD, duplicates<br />
were also retained as blocks rather than individual<br />
genes, though some of the gene blocks<br />
were not lost, but were instead translocated to<br />
a different location in the Gm genome. Notably,<br />
the locations of homoeologs coincided<br />
with known QTLs for leaf pustule resistance,<br />
leading the authors to suggest that duplicated<br />
resistance genes may have retained their ancestral<br />
function and then diverged in a pathogen<br />
strain–specific manner.<br />
Finally, Lin et al. (54) examined two<br />
∼1-Mb homoeologous regions containing<br />
NBS-LRR clusters in Gm (on Gm08 and Gm15)<br />
as well as the orthologous region of common<br />
bean (P. vulgaris). The level of gene retention<br />
varied from 81% to 91% among the Gm segments,<br />
values somewhat higher than observed<br />
by others (39, 44; Figure 3). As in Innes et al.<br />
(39), this analysis uncovered significant differences<br />
in retrotransposon density between the<br />
two regions, differences that were correlated<br />
with differing levels <strong>of</strong> structural variation. Going<br />
beyond structural analysis, the study examined<br />
gene expression levels along the two Gm<br />
segments and found 38% higher transcriptional<br />
activity on Gm08 compared with Gm15 based<br />
on a metric that integrated expression among<br />
seven different tissues. This difference in expression<br />
activity is significant because expression<br />
variation between retained gene pairs is an<br />
expectation of sub- and neofunctionalization.<br />
Genome Duplication and the<br />
Evolution of Nodulation<br />
The most striking property of legumes is<br />
their capacity to form symbiotic nitrogen-fixing<br />
nodules in association with rhizobial bacteria.<br />
Not surprisingly, detailed analysis of legume<br />
genomes can provide valuable insights into<br />
symbiosis, nodulation, and nitrogen fixation.<br />
At the simplest level, genome sequence data<br />
make it possible to generate a global inventory<br />
of nodulation-related genes. This was an<br />
important contribution of the recent Gm sequence<br />
(81). Here, genes of interest were identified<br />
by searching for Gm genes orthologous to<br />
known nodulation-related genes in any legume<br />
species. As a result, 34 Gm nodulins (nodule-upregulated<br />
proteins) were discovered along<br />
with 23 nodulation-related regulatory genes<br />
within the Gm genome. This kind of gene<br />
inventory makes it possible to explore local<br />
nodulation-related gene clusters, putative homoeologs,<br />
and membership in related gene<br />
families. This inventory should be especially<br />
valuable in dissecting the global regulatory machinery<br />
controlling plant-rhizobium communication<br />
and nodule development.<br />
Analysis of the Mt genome sequence<br />
focused on the relationship between genome<br />
duplication and the evolution of nodulation.<br />
Previous studies had established that legumes<br />
belong to a clade of rosids, Fabidae, that all<br />
share a predisposition to nodulate, presumably<br />
derived from their common ancestor (88). In<br />
analyzing the Mt genome, the question was<br />
whether the 58-Mya WGD contributed in any<br />
way to the elaboration of rhizobial nodulation.<br />
The answer appears to be a qualified yes.<br />
Multiple lines of evidence indicate that nodulation<br />
machinery predates the 58-Mya WGD.<br />
Moreover, many of the known regulatory<br />
steps in rhizobial nodulation are shared with<br />
mycorrhizal signaling (66), a symbiosis broadly<br />
shared among angiosperms (9). Just a few of<br />
the known recognition steps are exclusively<br />
associated with rhizobial nodulation, including<br />
the key receptor-like kinase, NFP (66). In<br />
analyzing the Mt genome, NFP was found to<br />
have a homoeolog, LYR1, and genome position<br />
and K<sub>s</sub> data indicate that these duplicated<br />
genes derive from the 58-Mya WGD. NFP is<br />
nodulation specific in expression and function,<br />
whereas LYR1 is upregulated in mycorrhizae<br />
(30). In separate work, a nodulating nonlegume,<br />
Parasponia andersonii, is known to contain a single<br />
gene coding for a protein with the functions<br />
of both NFP and LYR1 (68). Therefore, one<br />
likely interpretation would be that the 58-Mya<br />
papilionoid WGD led to subfunctionalization<br />
of a more ancient gene that previously carried<br />
out both functions, resulting in two descendent<br />
genes that split the nodulation and mycorrhizal<br />
recognition functions between them. A separate<br />
nodulation-related transcription factor,<br />
ERN1 (96), also possesses a homoeolog (ERN2)<br />
in Mt. Like NFP/LYR1, ERN1 and ERN2 have<br />
contrasting nodulation-versus-mycorrhizal<br />
expression patterns and also derive from the<br />
58-Mya WGD. Potentially, they are a second<br />
example of sub- or neofunctionalization<br />
resulting from the papilionoid WGD event.<br />
These observations even suggest a potential<br />
phylogenetic strategy for discovering genes<br />
that play a role in nodulation. It should be<br />
possible to mine the products of the 58-Mya<br />
WGD and search for homoeolog pairs in which<br />
one or both members show nodule-related<br />
expression. One could then<br />
examine the potentially novel (or at least<br />
interesting) roles that these genes might<br />
play in nodulation. Indeed, this strategy has<br />
already been put into practice with the identification<br />
of a cytokinin response regulator promoting<br />
the expression of ERN1 (67). Analysis of<br />
the Mt genome uncovers 51 additional WGD-derived<br />
homoeolog pairs with one or both duplicates<br />
upregulated in nodules, including 10<br />
additional transcription factor genes.<br />
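The candidate-gene strategy described above amounts to a filter over WGD-derived homoeolog pairs. The sketch below is one hypothetical way to express it, assuming that nodule expression is summarized as a log2 fold change per gene and that a threshold of 1.0 (twofold) marks upregulation. The gene identifiers, values, threshold, and function names are invented for illustration and are not data or methods from the Mt genome analysis.<br />

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class HomoeologPair:
    """Two genes derived from the same whole-genome duplication (hypothetical type)."""
    gene_a: str
    gene_b: str


def nodule_candidates(
    pairs: List[HomoeologPair],
    log2_fc: Dict[str, float],
    threshold: float = 1.0,
) -> List[Tuple[HomoeologPair, str]]:
    """Keep pairs with one or both members upregulated in nodules.

    log2_fc maps a gene identifier to its log2 fold change (nodule versus a
    reference tissue). Genes absent from the map count as not upregulated.
    """
    missing = float("-inf")
    hits = []
    for pair in pairs:
        up_a = log2_fc.get(pair.gene_a, missing) >= threshold
        up_b = log2_fc.get(pair.gene_b, missing) >= threshold
        if up_a and up_b:
            hits.append((pair, "both"))
        elif up_a or up_b:
            hits.append((pair, "one"))
    return hits


# Invented identifiers and values, for illustration only.
pairs = [
    HomoeologPair("geneA1", "geneA2"),
    HomoeologPair("geneB1", "geneB2"),
    HomoeologPair("geneC1", "geneC2"),
]
log2_fc = {"geneA1": 3.2, "geneA2": 0.1, "geneB1": 2.0, "geneB2": 1.5, "geneC1": -0.4}

for pair, status in nodule_candidates(pairs, log2_fc):
    print(pair.gene_a, pair.gene_b, status)
```

Pairs flagged "one" are the more natural candidates for sub- or neofunctionalization, whereas pairs flagged "both" may simply have kept an ancestral nodule-related role.<br />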
PERSPECTIVES ON<br />
LEGUME GENOMICS<br />
It is difficult to believe that massive amounts of<br />
sequence data have been available in plants for<br />
such a short time. The pace of change has been<br />
so rapid that in less than a decade we have gone<br />
from having only thousands of ESTs in a few<br />
legume species to having three robust legume<br />
reference genomes. This review has examined<br />
ways in which the rapidly growing body of<br />
genome sequence data sheds light on legume<br />
biology. At the simplest level, translation of<br />
genome data between legume species enables<br />
important practical applications: the discovery<br />
of genetic markers, the development of linkage<br />
maps, and the saturation of genome regions<br />
for positional cloning. This is especially true<br />
for minor legumes, where many species are<br />
important to agriculture but supported by<br />
small research communities. At a more basic<br />
level, dissection of genome sequence data reveals<br />
the structure, architecture, and evolution<br />
of important gene families and enables the<br />
identification of orthologous versus paralogous<br />
relationships. Complete genome sequences<br />
also reveal legume- and species-specific genes<br />
whose functions, although unquestionably important,<br />
remain largely unknown. Gene and<br />
genome duplications, so critical in shaping<br />
plant genomes, contain intrinsic information<br />
that can be exploited to predict function and<br />
the structure of genetic networks. Candidate<br />
gene discovery based on the papilionoid WGD<br />
is a promising example. In legumes, applying<br />
these strategies to nodulation and seed development<br />
will be especially critical. Additional<br />
sequencing and resequencing of legume species<br />
will make this possible, but inevitably, it is<br />
the research community’s capacity to develop<br />
imaginative strategies for exploiting massive<br />
sequence data that will move legume genomics<br />
from the computer to biology.<br />
SUMMARY POINTS<br />
1. The genome sequences of three legumes (Glycine max, Medicago truncatula, and Lotus<br />
japonicus) have recently been completed, and they illustrate a history of whole-genome<br />
duplication with important implications for legume biology. Glycine, in particular, underwent<br />
a genome duplication event within the past 13 million years that is strikingly<br />
evident in its genome architecture.<br />
2. Most agriculturally important legume crops, including so-called orphan species, are phylogenetically<br />
close to Glycine, Medicago, and Lotus. Consequently, translational genomics<br />
to orphan legumes should be straightforward and practically useful. It also means<br />
that major clades of more distant legumes remain largely unexplored from a genomic<br />
perspective.<br />
3. Analysis of legume genome sequences reveals hundreds of family-specific genes not observed<br />
in other angiosperms. They include a large group of defensin-like peptide genes<br />
seen only in Medicago and its close relatives that are exclusively expressed in nodules and<br />
in some cases play important roles in rhizobial differentiation.<br />
4. The aftermath of genome duplication in legumes involves extensive gene fractionation,<br />
especially in the lineage leading to Medicago and Lotus, as well as apparent examples of<br />
sub- and neofunctionalization. In some cases, products of whole-genome duplication<br />
have contributed to the elaboration of a preexisting capacity for rhizobial nodulation.<br />
DISCLOSURE STATEMENT<br />
N.D.Y. is principal investigator of a National Science Foundation Plant Genome Research Program<br />
grant that supported the sequencing of M. truncatula and later the development of an<br />
M. truncatula HapMap platform.<br />
ACKNOWLEDGMENTS<br />
We thank Doug Cook, Rene Geurts, and R. Op den Camp for helpful discussions relating to<br />
unpublished work; Robert Stupar for his review of the manuscript; and Sebastian Proost and Yves<br />
Van de Peer for preliminary analyses involving the PLAZA platform.<br />
LITERATURE CITED<br />
1. Ahn S, Tanksley SD. 1993. Comparative linkage maps of the rice and maize genomes. Proc. Natl. Acad.<br />
Sci. USA 90:7980–84<br />
2. Alkan C, Sajjadian S, Eichler EE. 2010. Limitations of next-generation genome sequence assembly. Nat.<br />
Methods 8:61–65<br />
3. Arabidopsis Genome Init. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis<br />
thaliana. Nature 408:796–815<br />
4. Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, et al. 2010. Genome-wide association<br />
study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465:627–31<br />
5. Bertioli DJ, Moretzsohn MC, Madsen LH, Sandal N, Leal-Bertioli SC, et al. 2009. An analysis<br />
of synteny of Arachis with Lotus and Medicago sheds new light on the structure, stability and<br />
evolution of legume genomes. BMC Genomics 10:45<br />
6. Blanc G, Wolfe KH. 2004. Functional divergence of duplicated genes formed by polyploidy during<br />
Arabidopsis evolution. Plant Cell 16:1679–91<br />
7. Blanc G, Wolfe KH. 2004. Widespread paleopolyploidy in model plant species inferred from age distributions<br />
of duplicate genes. Plant Cell 16:1667–78<br />
8. Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. 2011. Scaffolding pre-assembled contigs<br />
using SSPACE. Bioinformatics 27:578–79<br />
9. Bonfante P, Genre A. 2008. Plants and arbuscular mycorrhizal fungi: an evolutionary-developmental<br />
perspective. Trends Plant Sci. 13:492–98<br />
10. Boutin SR, Young ND, Olson TC, Yu ZH, Vallejos CE, Shoemaker RC. 1995. Genome conservation<br />
among three legume genera detected with DNA markers. Genome 38:928–37<br />
11. Branca A, Paape T, Zhou P, Briskine R, Farmer AD, et al. 2011. Whole-genome nucleotide diversity,<br />
recombination, and linkage-disequilibrium in the model legume Medicago truncatula. Proc. Natl. Acad.<br />
Sci. USA 108:E864–70<br />
12. Cannon SB, Ilut D, Farmer AD, Maki SL, May GD, et al. 2010. Polyploidy did not predate the<br />
evolution of nodulation in all legumes. PLoS ONE 5:e11630<br />
13. Cannon SB, May GD, Jackson SA. 2009. Three sequenced legume genomes and many crop species: rich<br />
opportunities for translational genomics. Plant Physiol. 151:970–77<br />
14. Cannon SB, Mitra A, Baumgarten A, Young ND, May G. 2004. The roles of segmental and tandem<br />
gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol. 4:10<br />
15. Cannon SB, Sterck L, Rombauts S, Sato S, Cheung F, et al. 2006. Legume evolution viewed through<br />
the Medicago truncatula and Lotus japonicus genomes. Proc. Natl. Acad. Sci. USA 103:14959–64<br />
16. Choi HK, Mun JH, Kim DJ, Zhu H, Baek JM, et al. 2004. Estimating genome conservation between<br />
crop and model legume species. Proc. Natl. Acad. Sci. USA 101:15289–94<br />
17. Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, et al. 2007. Common sequence polymorphisms<br />
shaping genetic diversity in Arabidopsis thaliana. Science 317:338–42<br />
18. Córdoba JM, Chavarro C, Schlueter JA, Jackson SA, Blair MW. 2010. Integration of physical and genetic<br />
maps of common bean through BAC-derived microsatellite markers. BMC Genomics 11:436<br />
19. Das S, Bhat PR, Sudhakar C, Ehlers JD, Wanamaker S, et al. 2008. Detection and validation of single<br />
feature polymorphisms in cowpea (Vigna unguiculata L. Walp) using a soybean genome array. BMC<br />
Genomics 9:107<br />
20. Devos KM, Gale MD. 2000. Genome relationships: the grass model in current research. Plant Cell<br />
12:637–46<br />
21. Doyle JJ, Flagel LE, Paterson AH, Rapp RA, Soltis DE, et al. 2008. Evolutionary genetics of genome<br />
merger and doubling in plants. Annu. Rev. Genet. 42:443–61<br />
22. Fawcett JA, Maere S, Van de Peer Y. 2009. Plants with double genomes might have had a better chance<br />
to survive the Cretaceous-Tertiary extinction event. Proc. Natl. Acad. Sci. USA 106:5737–42<br />
23. Force A, Lynch M, Pickett FB, Amores A, Yan YL, et al. 1999. Preservation of duplicate genes by<br />
complementary, degenerative mutations. Genetics 151:1531–45<br />
24. Freeling M. 2009. Bias in plant gene content following different sorts of duplication: tandem, whole-genome,<br />
segmental, or by transposition. Annu. Rev. Plant Biol. 60:433–53<br />
25. Friesen ML, Cordeiro MA, Penmetsa RV, Badri M, Huguet T, et al. 2010. Population genomic<br />
analysis of Tunisian Medicago truncatula reveals candidates for local adaptation. Plant J. 63:623–35<br />
26. Gale MD, Devos KM. 1998. Comparative genetics in the grasses. Proc. Natl. Acad. Sci. USA 95:1971–74<br />
27. Gao AG, Hakimi SM, Mittanck CA, Wu Y, Woerner BM, et al. 2000. Fungal pathogen protection in<br />
potato by expression of a plant defensin peptide. Nat. Biotechnol. 18:1307–131<br />
Annotation to reference 5: Demonstrates that papilionoid genome duplication is shared with distant Arachis, which shows extensive synteny with sequenced legumes.<br />
Annotation to reference 12: Shows that legume genome duplication apparently occurred only within the papilionoid lineage, and not within the Mimosoideae or Caesalpinioideae subfamilies.<br />
Annotation to reference 25: Utilizes a genome association mapping approach to characterize salt tolerance in a natural population.<br />
Annotation to reference 45: Describes next-generation sequencing of a wild soybean relative and extensive characterization of genome differences between species.<br />
28. Garg R, Patel RK, Jhanwar S, Priya P, Bhattacharjee A, et al. 2011. Gene discovery and tissue-specific<br />
transcriptome analysis in chickpea with massively parallel pyrosequencing and Web resource development.<br />
Plant Physiol. 156:1661–78<br />
29. Gnerre S, Maccallum I, Przybylski D, Ribeiro FJ, Burton JN, et al. 2011. High-quality draft assemblies<br />
of mammalian genomes from massively parallel sequence data. Proc. Natl. Acad. Sci. USA 108:1513–18<br />
30. Gomez SK, Javot H, Deewatthanawong PD, Torres-Jerez I, Tang Y, et al. 2009. Medicago truncatula and<br />
Glomus intraradices gene expression in cortical cells harboring arbuscules in the arbuscular mycorrhizal<br />
symbiosis. BMC Plant Biol. 9:10<br />
31. Graham MA, Silverstein KA, Cannon SB, VandenBosch KA. 2004. Computational identification and<br />
characterization of novel genes from legumes. Plant Physiol. 135:1179–97<br />
32. Graham PH, Vance CP. 2003. Legumes: importance and constraints to greater use. Plant Physiol.<br />
131:872–77<br />
33. Han Y, Kang Y, Torres-Jerez I, Cheung F, Town CD, et al. 2011. Genome-wide SNP discovery in<br />
tetraploid alfalfa using 454 sequencing and high resolution melting analysis. BMC Genomics 12:350<br />
34. Hiremath PJ, Farmer A, Cannon SB, Woodward J, Kudapa H, et al. 2011. Large-scale transcriptome<br />
analysis in chickpea (Cicer arietinum L.), an orphan legume crop of the semi-arid tropics of Asia and<br />
Africa. Plant Biotechnol. J. 9:922–31<br />
35. Hougaard BK, Madsen LH, Sandal N, de Carvalho Moretzsohn M, Fredslund J, et al. 2008. Legume<br />
anchor markers link syntenic regions between Phaseolus vulgaris, Lotus japonicus, Medicago truncatula and<br />
Arachis. Genetics 179:2299–312<br />
36. Huang X, Wei X, Sang T, Zhao Q, Feng Q, et al. 2010. Genome-wide association studies of 14 agronomic<br />
traits in rice landraces. Nat. Genet. 42:961–67<br />
37. Huang Z-W, Zhao T-J, Yu D-Y, Chen S-Y, Gai J-Y. 2008. Correlation and QTL mapping of biomass<br />
accumulation, apparent harvest index, and yield in soybean. Acta Agron. Sin. 34:944–51<br />
38. Imelfort M, Edwards D. 2009. De novo sequencing of plant genomes using second-generation technologies.<br />
Brief. Bioinforma. 10:609–18<br />
39. Innes RW, Ameline-Torregrosa C, Ashfield T, Cannon E, Cannon SB, et al. 2008. Differential accumulation<br />
of retroelements and diversification of NB-LRR disease resistance genes in duplicated regions<br />
following polyploidy in the ancestor of soybean. Plant Physiol. 148:1740–59<br />
40. Int. Rice Genome Seq. Proj. 2005. The map-based sequence of the rice genome. Nature 436:793–800<br />
41. Jaillon O, Aury JM, Nöel B, Policriti A, Clepet C, et al. 2007. The grapevine genome sequence suggests<br />
ancestral hexaploidization in major angiosperm phyla. Nature 449:463–67<br />
42. Kaló P, Seres A, Taylor SA, Jakab J, Kevei Z, et al. 2004. Comparative mapping between Medicago sativa<br />
and Pisum sativum. Mol. Genet. Genomics 272:235–46<br />
43. Kamphuis LG, Williams AH, D’Souza NK, Pfaff T, Ellwood SR, et al. 2007. The Medicago truncatula<br />
reference accession A17 has an aberrant chromosomal configuration. New Phytol. 174:299–303<br />
44. Kim KD, Shin JH, Van K, Kim DH, Lee SH. 2009. Dynamic rearrangements determine genome<br />
organization and useful traits in soybean. Plant Physiol. 151:1066–76<br />
45. Kim MY, Lee S, Van K, Kim T-H, Jeong S-C, et al. 2010. Whole-genome sequencing and<br />
intensive analysis of the undomesticated soybean (Glycine soja Sieb. and Zucc.) genome. Proc.<br />
Natl. Acad. Sci. USA 107:22032–37<br />
46. Kim S, Plagnol V, Hu TT, Toomajian C, Clark RM, et al. 2007. Recombination and linkage disequilibrium<br />
in Arabidopsis thaliana. Nat. Genet. 39:1151–55<br />
47. Kinzig AP, Socolow RH. 1994. Human impacts on the nitrogen cycle. Phys. Today 47:24–35<br />
48. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, et al. 2009. Circos: an information aesthetic<br />
for comparative genomics. Genome Res. 19:1639–45<br />
49. Kulikova O, Gualtieri G, Geurts R, Kim DJ, Cook D, et al. 2001. Integration of the FISH pachytene<br />
and genetic maps of Medicago truncatula. Plant J. 27:49–58<br />
50. Lam H-M, Xu X, Liu X, Chen W, Yang G, et al. 2010. Resequencing of 31 wild and cultivated soybean<br />
genomes identifies patterns of genetic diversity and selection. Nat. Genet. 42:1053–59<br />
51. Langham RJ, Walsh J, Dunn M, Ko C, Goff SA, et al. 2004. Genomic duplication, fractionation and the<br />
origin of regulatory novelty. Genetics 166:935–45<br />
52. Lavin M, Herendeen PS, Wojciechowski MF. 2005. Evolutionary rates analysis of Leguminosae implicates<br />
a rapid diversification of lineages during the Tertiary. Syst. Biol. 54:575–94<br />
53. Li H, Liu H, Han Y, Wu X, Teng W, et al. 2010. Identification of QTL underlying vitamin E contents<br />
in soybean seed among multiple environments. Theor. Appl. Genet. 120:1405–13<br />
54. Lin JY, Stupar RM, Hans C, Hyten DL, Jackson SA. 2010. Structural and functional divergence of a<br />
1-Mb duplicated region in the soybean (Glycine max) genome and comparison to an orthologous region<br />
from Phaseolus vulgaris. Plant Cell 22:2545–61<br />
55. Lynch M, O’Hely M, Walsh B, Force A. 2001. The probability of preservation of a newly arisen gene<br />
duplicate. Genetics 159:1789–804<br />
56. McClean PE, Mamidi S, McConnell M, Chikara S, Lee R. 2010. Synteny mapping between common<br />
bean and soybean reveals extensive blocks of shared loci. BMC Genomics 11:184<br />
57. Metzker ML. 2009. Sequencing technologies—the next generation. Nat. Rev. Genet. 11:31–46<br />
58. Meyers BC, Kaushik S, Nandety RS. 2005. Evolving disease resistance genes. Curr. Opin. Plant Biol.<br />
8:129–34<br />
59. Miller JR, Koren S, Sutton G. 2010. Assembly algorithms for next-generation sequencing data. Genomics<br />
95:315–27<br />
60. Moore G, Devos KM, Wang Z, Gale MD. 1995. Grasses, line up and form a circle. Curr. Biol. 5:737–39<br />
61. Muchero W, Diop NN, Bhat PR, Fenton RD, Wanamaker S, et al. 2009. A consensus genetic map of<br />
cowpea [Vigna unguiculata (L) Walp.] and synteny based on EST-derived SNPs. Proc. Natl. Acad. Sci.<br />
USA 106:18159–64<br />
62. Mudge J, Cannon SB, Kalo P, Oldroyd GE, Roe BA, et al. 2005. Highly syntenic regions in the genomes<br />
of soybean, Medicago truncatula, and Arabidopsis thaliana. BMC Plant Biol. 5:15<br />
63. Munroe DJ, Harris TJ. 2010. Third-generation sequencing fireworks at Marco Island. Nat. Biotechnol.<br />
28:426–28<br />
64. Nordborg M, Weigel D. 2008. Next-generation genetics in plants. Nature 456:720–23<br />
65. Nayak SN, Zhu H, Varghese N, Datta S, Choi HK, et al. 2010. Integration of novel SSR and gene-based<br />
SNP marker loci in the chickpea genetic map and establishment of new anchor points with Medicago<br />
truncatula genome. Theor. Appl. Genet. 120:1415–41<br />
66. Oldroyd GE, Downie JA. 2008. Coordinating nodule morphogenesis with rhizobial infection in legumes.<br />
Annu. Rev. Plant Biol. 59:519–46<br />
67. Op den Camp RHM, De Mita S, Lillo A, Cao Q, Limpens E, et al. 2011. A phylogenetic strategy based<br />
on a legume-specific whole genome duplication yields symbiotic cytokinin type-A response regulators.<br />
Plant Physiol. 157:2013–22<br />
68. Op den Camp RHM, Streng A, De Mita S, Cao Q, Polone E, et al. 2011. LysM-type mycorrhizal<br />
receptor recruited for rhizobium symbiosis in nonlegume Parasponia. Science 331:909–12<br />
69. Ossowski S, Schneeberger K, Clark RM, Lanz C, Warthmann N, et al. 2008. Sequencing of natural<br />
strains of Arabidopsis thaliana with short reads. Genome Res. 18:2024–33<br />
70. Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, et al. 2009. The Sorghum bicolor<br />
genome and the diversification of grasses. Nature 457:551–56<br />
71. Paterson AH, Chapman BA, Kissinger JC, Bowers JE, et al. 2006. Many gene and domain families<br />
have convergent fates following independent whole-genome duplication events in Arabidopsis, Oryza,<br />
Saccharomyces and Tetraodon. Trends Genet. 22:597–602<br />
72. Paterson AH, Freeling M, Tang H, Wang X. 2010. Insights from the comparison of plant genome<br />
sequences. Annu. Rev. Plant Biol. 61:349–72<br />
73. Pfeil BE, Schlueter JA, Shoemaker RC, Doyle JJ. 2005. Placing paleopolyploidy in relation to taxon<br />
divergence: a phylogenetic analysis in legumes using 39 gene families. Syst. Biol. 54:441–54<br />
74. Polhill RM. 1981. Papilionoideae. In Advances in Legume Systematics, Part 1, ed. RM Polhill, PH Raven,<br />
pp. 191–208. Kew, UK: R. Bot. Gard.<br />
75. Proost S, Van Bel M, Sterck L, Billiau K, Van Parys T, et al. 2009. PLAZA: a comparative genomics<br />
resource to study gene and genome evolution in plants. Plant Cell 21:3718–31<br />
76. Ratnaparkhe MB, Wang X, Li J, Compton RO, Rainville LK, et al. 2011. Comparative analysis of peanut<br />
NBS-LRR gene clusters suggests evolutionary innovation among duplicated domains and erosion of gene<br />
microsynteny. New Phytol. 192:164–78<br />
79. Provides the initial report of the Lotus japonicus genome sequence.<br />
81. Provides the initial report of the Glycine max genome sequence.<br />
87. Gives an overview of an alternative legume, Chamaecrista, found within one of the clades not generally targeted for genomic analysis.<br />
100. Provides the initial report of the Medicago truncatula genome sequence.<br />
77. Rausch T, Koren S, Denisov G, Weese D, Emde AK, et al. 2009. A consistency-based consensus algorithm<br />
for de novo and reference-guided sequence assembly of short reads. Bioinformatics 25:1118–24<br />
78. Sato S, Isobe S, Tabata S. 2010. Structural analyses of the genomes in legumes. Curr. Opin. Plant Biol.<br />
13:1–7<br />
79. Sato S, Nakamura Y, Kaneko T, Asamizu E, Kato T, et al. 2008. Genome structure of the legume,<br />
Lotus japonicus. DNA Res. 15:1–8<br />
80. Schlueter JA, Dixon P, Granger C, Grant D, Clark L, et al. 2004. Mining EST databases to resolve<br />
evolutionary events in major crop species. Genome 47:868–76<br />
81. Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, et al. 2010. Genome sequence of the<br />
palaeopolyploid soybean. Nature 463:178–83<br />
82. Schneeberger K, Ossowski S, Ott F, Klein JD, Wang X, et al. 2011. Reference-guided assembly of four<br />
diverse Arabidopsis thaliana genomes. Proc. Natl. Acad. Sci. USA 108:10249–54<br />
83. Shin JH, Van K, Kim DH, Kim KD, Jang YE, et al. 2008. The lipoxygenase gene family: a genomic<br />
fossil of shared polyploidy between Glycine max and Medicago truncatula. BMC Plant Biol. 8:133<br />
84. Shoemaker RC, Polzin K, Labate J, Specht J, Brummer EC, et al. 1996. Genome duplication in soybean<br />
(Glycine subgenus soja). Genetics 144:329–38<br />
85. Shoemaker RC, Schlueter J, Doyle JJ. 2006. Paleopolyploidy and gene duplication in soybean and other<br />
legumes. Curr. Opin. Plant Biol. 9:104–9<br />
86. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, et al. 2009. ABySS: a parallel assembler for<br />
short read sequence data. Genome Res. 19:1117–23<br />
87. Singer SR, Maki SL, Farmer AD, Ilut D, May GD, et al. 2009. Venturing beyond beans and peas:<br />
What can we learn from Chamaecrista? Plant Physiol. 151:1041–47<br />
88. Soltis DE, Soltis PS, Morgan DR, Swensen SM, Mullin BC, et al. 1995. Chloroplast gene sequence data<br />
suggest a single origin of the predisposition for symbiotic nitrogen fixation in angiosperms. Proc. Natl.<br />
Acad. Sci. USA 92:2647–51<br />
89. Sprent JI. 2008. 60 Ma of legume nodulation: What’s new? What’s changing? J. Exp. Bot. 59:1081–84<br />
90. Tian F, Bradbury PJ, Brown PJ, Hung H, Sun Q, et al. 2011. Genome-wide association study of leaf<br />
architecture in the maize nested association mapping population. Nat. Genet. 43:159–62<br />
91. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, et al. 2006. The genome of black cottonwood,<br />
Populus trichocarpa (Torr. & Gray). Science 313:1596–604<br />
92. Van de Velde W, Zehirov G, Szatmari A, Debreczeny M, Ishihara H, et al. 2010. Plant peptides govern<br />
terminal differentiation of bacteria in symbiosis. Science 327:1122–26<br />
93. van Oeveren J, de Ruiter M, Jesse T, van der Poel H, Tang J, et al. 2011. Sequence-based physical<br />
mapping of complex genomes by whole genome profiling. Genome Res. 21:618–25<br />
94. Varshney RK, Chen W, Li Y, Bharti AK, Saxena RK, et al. 2012. Draft genome sequence of pigeonpea<br />
(Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat. Biotechnol. 30:83–89<br />
95. Varshney RK, Close TJ, Singh NK, Hoisington DA, Cook DR. 2009. Orphan legume crops enter the<br />
genomics era! Curr. Opin. Plant Biol. 12:202–10<br />
96. Vernié T, Moreau S, de Billy F, Plet J, Combier JP, et al. 2008. EFD is an ERF transcription factor<br />
involved in the control of nodule number and differentiation in Medicago truncatula. Plant Cell 20:2696–713<br />
97. Wojciechowski MF, Sanderson MJ, Steele KP, Liston A. 2000. Molecular phylogeny of the “temperate<br />
herbaceous tribes” of papilionoid legumes: a supertree approach. In Advances in Legume Systematics, Part<br />
9, ed. PS Herendeen, A Bruneau, pp. 277–98. Kew, UK: R. Bot. Gard.<br />
98. Yang S, Feng Z, Zhang X, Jiang K, Jin X, et al. 2006. Genome-wide investigation on the genetic variations<br />
of rice disease resistance genes. Plant Mol. Biol. 62:181–83<br />
99. Yang S, Gao M, Xu C, Gao J, Deshpande S, et al. 2008. Alfalfa benefits from Medicago truncatula: the<br />
RCT1 gene from M. truncatula confers broad-spectrum resistance to anthracnose in alfalfa. Proc. Natl.<br />
Acad. Sci. USA 105:12164–69<br />
100. Young N, Debellé F, Oldroyd G, Geurts R, Cannon SB, et al. 2011. The Medicago genome<br />
provides insight into the evolution of rhizobial symbioses. Nature 480:520–24<br />
101. Young ND, Udvardi M. 2009. Translating Medicago truncatula genomics to crop legumes. Curr. Opin.<br />
Plant Biol. 12:193–201<br />
102. Zhang M, Wu YH, Lee MK, Liu YH, Rong Y, et al. 2010. Numbers of genes in the NBS and RLK<br />
families vary by more than four-fold within a plant species and are regulated by multiple factors. Nucleic<br />
Acids Res. 38:6513–25<br />
103. Zhang XC, Wu X, Findley S, Wan J, Libault M, et al. 2007. Molecular evolution of lysin motif-type<br />
receptor-like kinases in plants. Plant Physiol. 144:623–36<br />
104. Zhou S, Bechner MC, Place M, Churas CP, Pape L, et al. 2007. Validation of rice genome sequences<br />
by optical mapping. BMC Genomics 15:278<br />