

Annu. Rev. Plant Biol. 2012. 63:283–305 

First published online as a Review in Advance on January 30, 2012

The Annual Review of Plant Biology is online at plant.annualreviews.org

This article’s doi: 10.1146/annurev-arplant-042110-103754

Copyright © 2012 by Annual Reviews. All rights reserved

1543-5008/12/0602-0283$20.00 

∗ Corresponding author. 

Genome-Enabled Insights into Legume Biology

Nevin D. Young 1,∗ and Arvind K. Bharti 2 

1 Department of Plant Pathology and Department of Plant Biology, University of Minnesota, St. Paul, Minnesota 55108; email: neviny@umn.edu

2 National Center for Genome Resources, Santa Fe, New Mexico 87505; email: akb@ncgr.org

Keywords 

comparative genomics, genome duplication, microsynteny, nodulation, symbiosis

Abstract 

Legumes are the third-largest family of angiosperms, the second-most-important crop family, and a key source of biological nitrogen in agriculture. Recently, the genome sequences of Glycine max (soybean), Medicago truncatula, and Lotus japonicus were substantially completed. Comparisons among legume genomes reveal a key role for duplication, especially a whole-genome duplication event approximately 58 Mya that is shared by most agriculturally important legumes. A second, more recent genome duplication occurred only in the lineage leading to soybean. Outcomes of genome duplication, including gene fractionation and sub- and neofunctionalization, have played key roles in shaping legume genomes and in the evolution of legume-specific traits. Analysis of legume genome sequences also enables the discovery of legume-specific gene families and provides a framework for genome-wide association mapping that will target phenotypes of special importance in legumes. Translating genomic resources from sequenced species to less studied but still important “orphan” legumes will enhance prospects for world food production.




Contents

INTRODUCTION
SEQUENCING LEGUME GENOMES
  Reference Legume Genomes
  What Can We Learn from Sequenced Legume Genomes?
  Sequencing in Nonreference Legumes
  From Genome Sequencing to Resequencing
COMPARATIVE GENOMICS AND THE SEARCH FOR THE PRIMORDIAL LEGUME GENOME
  Strategies for Comparative Genomic Analysis
  Comparing Legume Genomes
  Envisioning the Ancestral Legume Genome
GENOME DUPLICATIONS IN LEGUME BIOLOGY
  Whole-Genome Duplication Events in the History of Legumes
THE AFTERMATH OF GENOME DUPLICATION AND ITS IMPACT ON LEGUME BIOLOGY
  The Fates of Duplicated Genes
  Impacts of Genome Duplication on Legume Biology
  Genome Duplication and the Evolution of Nodulation
PERSPECTIVES ON LEGUME GENOMICS

INTRODUCTION 

Legumes (Fabaceae or Leguminosae) are the third-largest family of flowering plants and the second-most-important plant family in agriculture. They are especially interesting because most have the capacity to fix atmospheric nitrogen through mutualistic interactions with rhizobial soil bacteria, a trait that is both ecologically and agriculturally important (32). Indeed, without the nitrogen fixed each year by legumes, humans would need to consume 288 billion kg of additional fuel in the Haber-Bosch process to generate anhydrous ammonia for agriculture (47). Given their importance to people, legumes are now the target of extensive sequence-based genomics research, which is revolutionizing our understanding of legume evolution and its connection to biologically important traits. Of particular significance are the recently completed and annotated genomes of three legume species: Glycine max (soybean) (Gm) (81), Medicago truncatula (Mt) (100), and Lotus japonicus (Lj) (79). This review focuses on genomics research carried out in legume biology, emphasizing comparisons among legume genomes and the critical role of genome duplication and its aftermath in shaping present-day legume genomes and traits.

With three legume genome sequences recently published, a fourth appearing very recently (76), and genomics tools developing rapidly for multiple legume species, several excellent reviews are already available to researchers. These reviews have emphasized structural analyses of legume genomes (13, 78), translational opportunities provided by reference genome sequences (101), and the prospects for extending genome sequence data to less studied “orphan” legume species (13, 95). We therefore aim to complement and expand the scope of these existing reviews by focusing on genome evolution and genome duplication and on their impact on legume biology.

SEQUENCING LEGUME GENOMES

Reference Legume Genomes 

The genome sequences of Gm, Mt, and Lj form the foundation for much of our current understanding of legume genomics. All three species are members of Papilionoideae, a subfamily that diverged from the two other legume subfamilies (Mimosoideae and Caesalpinioideae) approximately 60 Mya (52). Most cultivated legumes are found within two sister clades of the papilionoids: the millettoid/phaseoloid clade [warm-season legumes, including Gm, pigeon pea (Cajanus cajan), common bean (Phaseolus vulgaris), mung bean (Vigna radiata), and cowpea (Vigna unguiculata)] and the temperate galegoid clade [cool-season legumes, including Mt, Lj, and species such as alfalfa (Medicago sativa), chickpea (Cicer arietinum), clover (Trifolium sp.), lentil (Lens sp.), and garden pea (Pisum sativum)]. Papilionoideae also includes two minor clades: the genistoid (lupin, Lupinus sp.) and the dalbergioid (peanut, Arachis hypogaea). Because all these species are reasonably close phylogenetically, insights from the Gm, Mt, and Lj genomes should transfer well among cultivated legume crops. However, the current emphasis on papilionoids also means that many interesting legume species, especially mimosoids (Mimosa, Acacia, and Prosopis, for example) and caesalpinioids (Caesalpinia, Chamaecrista, Senna, and tamarind, for example), are evolutionarily quite distant from the nexus of genomics research. Researchers have noted this previously and highlighted the importance of developing genomics resources at additional nodes throughout the legume evolutionary tree (87).

The Gm genome sequence, published in 2010, is currently the most thoroughly characterized legume genome (81). More than 950 million base pairs (Mb) of the 1,115-Mb genome were completed by 8× Sanger whole-genome shotgun (WGS) sequencing. Many of the resulting pseudomolecules extend all the way from centromeres (as indicated by scaffolds extending into centromeric repeats) out to telomeres (with scaffolds extending into telomeric repeats). The Gm sequence is also impressive for the very large size of its sequence scaffolds, the physically defined assemblies of sequence contigs that are built into Gm’s 20 chromosome pseudomolecules. In Gm assembly Glyma 1.0, the so-called L50 is 47.8 Mb. L50 is a common metric of scaffold size: scaffolds are sorted from longest to shortest, their lengths are summed in that order, and the L50 is the length of the scaffold at which the running total first reaches 50% of the overall assembly length. By comparison, nearly all other published WGS plant genome sequences have considerably shorter L50s [with the notable exception of Sorghum bicolor (70), another very high-quality assembly]. It is especially noteworthy that nearly all of the published Gm sequence (98%) could be anchored to specific chromosomal positions.
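The L50 calculation described above can be sketched in a few lines of Python. This is a minimal illustration assuming scaffold lengths are already available as a list of integers; the function name `l50_length` and the toy lengths are hypothetical. As in the text, L50 here denotes a length (a statistic that many assembly tools now report under the name N50).

```python
def l50_length(lengths: list[int]) -> int:
    """Return the scaffold length at which the running sum, taken from the
    longest to the shortest scaffold, first reaches 50% of the total length."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy assembly with hypothetical scaffold lengths (in Mb).
print(l50_length([50, 40, 30, 20, 10]))  # 40
```

Here the total is 150 Mb; the running sum is 50 after the first scaffold and 90 after the second, which passes the 75-Mb halfway point, so the L50 is 40 Mb.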

The Mt genome was sequenced by a combination of Sanger-based bacterial artificial chromosome (BAC) clones (with genomic inserts approximately 80–120 kb in length) and ∼40× Illumina WGS (100). In this case, the sequencing effort was focused on euchromatic arms outside centromeric regions, with fluorescence in situ hybridization (FISH) (49) and optical mapping (104) used to define physical location. Altogether, 367 Mb of the approximately 470-Mb Mt genome (http://data.kew.org/cvalues) is included in the published assembly. Because of the emphasis on BAC-based sequencing, the quality in BAC-sequenced regions is quite high, although scaffolds tend to be relatively short (overall L50 of 1.27 Mb), and only the BAC-based portion of the sequence (245 Mb, or 67%) could be anchored to specific chromosomal locations. Another 17 Mb of BAC-based sequence could not be anchored. The remaining portion of the Mt sequence consists of Illumina WGS contigs (104 Mb), which are quite short (L50 of 2.4 kb, largest 31 kb) and primarily useful for recovering missing portions of the genome for gene discovery. Still, Mt chromosome 5 is noteworthy as a nearly intact BAC-based pseudomolecule that is complete on either side of the centromere. Throughout the entire pseudomolecule of Mt chromosome 5 there are just four sequence gaps, a quality comparable to that of the Arabidopsis thaliana (3) or Oryza sativa (40) genomes. One surprising result of the Mt sequencing project was the discovery of a large chromosomal translocation in the accession used as the sequencing template (Jemalong-A17) compared with other Mt accessions. This had been suggested by previous genetic experiments that found biased segregation ratios in crosses involving A17 (43), but the sequencing project was able to pinpoint the two breakpoints, on chromosomes 4 and 8, to regions roughly the size of BAC clones.
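As a quick arithmetic check on the Mt assembly figures quoted above, the three components (anchored BAC sequence, unanchored BAC sequence, and Illumina WGS contigs) can be summed and the anchored fraction recomputed. The sketch reuses only the numbers given in the text; the 1-Mb shortfall relative to 367 Mb presumably reflects rounding of the reported values.

```python
# Mt assembly components as reported in the text, in Mb.
components_mb = {
    "anchored BAC": 245,
    "unanchored BAC": 17,
    "Illumina WGS contigs": 104,
}
published_total_mb = 367

summed = sum(components_mb.values())
anchored_fraction = components_mb["anchored BAC"] / published_total_mb

print(f"sum of components: {summed} Mb (published total {published_total_mb} Mb)")
print(f"anchored fraction: {anchored_fraction:.0%}")
# sum of components: 366 Mb (published total 367 Mb)
# anchored fraction: 67%
```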

The Lj genome was published in 2008 (79) and was actually the first legume genome to appear, though it remains the least complete. As in Mt, the strategy was to focus on gene-rich portions of the genome by sequencing large-insert clones (in this case, so-called transformation-competent artificial chromosomes). The published Lj genome sequence is 315 Mb in length, corresponding to 67% of the Lj genome (472 Mb), but only 130 Mb is high quality and anchored to chromosomes. A more recent version of the Lj genome sequence, available through the Web site of the lead sequencing group in Kazusa, Japan (ftp://ftp.kazusa.or.jp/pub/lotus/lotus_r2.5/pseudomolecule), provides a much more robust platform for Lj genomics. This updated version (Lj 2.5) contains anchored pseudomolecules totaling 268 Mb throughout the euchromatic portion of Lj, plus 33 Mb of sequence not yet anchored.

What Can We Learn from Sequenced Legume Genomes?

What have we learned about legume genomes from this first generation of sequencing projects? In the broadest sense, sequenced legume genomes look very much like those of other dicots, though comparisons with Arabidopsis can be complicated by its unusually small genome size and complex duplication history (3). A closer look at the Gm genome finds that ∼57% of the overall sequence lies in repeat-rich, low-recombination heterochromatin, while most genes (78%) are found in euchromatic chromosome arms (81). This also implies that a substantial fraction of Gm genes (22%) lie within the pericentromeric heterochromatin, a somewhat surprising and potentially important result. As expected, crossovers are profoundly reduced near centromeres, with the ratio of genetic to physical distance dropping 27-fold between the euchromatic and pericentromeric portions of the genome. Genome organization in Mt seems largely comparable, though the evidence for this is based on a combination of the BAC-based euchromatin sequence, FISH microscopy, and optical mapping (100). Notably, the estimated proportion of the genome located in pericentromeres is much lower in Mt than in Gm (∼22% versus ∼57%), which presumably contributes to the difference in genome size. In both Gm and Mt, gene density is generally high throughout euchromatic arms, with only limited indications of a gene density gradient rising from centromere to telomere. In Mt, for example, gene density is estimated at 16.9 genes per 100 kb (1 gene every 5.9 kb) throughout the euchromatin, with the average gene being 2,211 bp in length and containing four introns. By way of comparison, average Mt gene length is similar to that in Arabidopsis (2,174 bp) and shorter than that in Oryza (3,403 bp).
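The two ways of expressing Mt gene density above (genes per 100 kb and kilobases per gene) are reciprocals of each other. A minimal conversion, using only the value quoted in the text; the helper name `mean_spacing_kb` is hypothetical.

```python
GENES_PER_100KB = 16.9  # Mt euchromatic gene density reported in the text


def mean_spacing_kb(density_per_100kb: float) -> float:
    """Average genomic interval per gene (kb) for a density given per 100 kb."""
    if density_per_100kb <= 0:
        raise ValueError("density must be positive")
    return 100.0 / density_per_100kb


print(f"1 gene every {mean_spacing_kb(GENES_PER_100KB):.1f} kb")  # 1 gene every 5.9 kb
```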

Altogether, the Gm genome is reported to have 46,430 “high-confidence” protein-coding loci, a set culled from an original set of gene models that also included ∼20,000 loci predicted with lower confidence (81). In Mt, a total of 62,152 genes were annotated, a value that drops to 47,845 when only genes with experimental or database support are retained. The similarity in gene counts between the two systems is surprising and significant, because the lineage leading to present-day soybean is known to have undergone a whole-genome duplication (WGD) at 13 Mya or later, a duplication that is absent in the Mt lineage (this important evolutionary event is discussed in detail below). One might therefore have expected higher gene numbers in Gm than in Mt. The Gm genome is also reported to have 313,125 retrotransposons and 294,937 DNA transposons (spanning 403 Mb and 157 Mb, respectively), whereas the Mt genome has 253,048 retrotransposons and 34,529 DNA transposons (spanning 88 Mb and 9.4 Mb, respectively). The lower numbers in Mt presumably reflect the more limited sequencing of pericentromeric regions in Mt (a factor also consistent with the twofold difference in genome size), but they may also indicate real genomic differences between the two species.

Detailed examination of the genome sequences also provides insights into interesting or unusual gene families. The Gm genome is reported to have 283 legume-specific gene families (81), an estimate that increases to 670 with the analysis of the more recent Mt genome sequence (100). Both Gm and Mt contain more disease-resistance genes of the nucleotide-binding-site leucine-rich repeat class (NBS-LRRs, also called NB-ARCs, after the nucleotide-binding adaptor shared by APAF-1, R proteins, and CED-4) than other plant genomes sequenced to date. In Mt, for example, there are 764 NBS-LRR-related genes, at least 550 of which are expressed based on RNA-Seq (100). Outside of legumes, O. sativa is reported to have the largest number so far (519) (98). More than 90% of Mt NBS-LRRs reside in clusters that contain on average 7.4 members, including two megaclusters, one on Mt06 with 30 NBS-LRRs and another on Mt03 with 21. However, the conclusion that NBS-LRRs are overrepresented in legumes (or indeed in any plant family) needs to be tempered by the recent observation that NBS-LRR number varies considerably between accessions within a single species, including Gm (102). Legumes also have higher numbers and increased complexity in other gene families: lipoxygenases (83), LysM receptor kinases (103), and flavonoid biosynthetic enzymes such as chalcone synthase (100). This may be important because LysM receptors and flavonoids are both known to play key roles in nodulation. Finally, all three sequenced legumes contain unusually high numbers of F-box domain genes compared with other plant species, with Mt possessing three times as many F-box domain genes as either Gm or Lj (100).

The Mt genome is also notable for the presence of a large and novel gene family, the nodule-related cysteine-rich peptides (NCRs), which are members of the larger group of defensin-like sequences (DEFLs) (31). This group of genes has been observed only in members of the so-called inverted repeat-lacking clade (IRLC) [97a; http://tolweb.org/IRLC_(Inverted_Repeat-lacking_clade)] of legumes, a subgroup of cool-season legumes that includes genera such as Pisum, Vicia, and Trifolium. The IRLC is a clade of legumes known to have lost one copy of the 25-kb inverted repeat in their plastid genome, hence its name. Genome analysis demonstrates that the gene family is entirely missing from the Gm and Lj sequences. DEFLs are known to act as antimicrobials in plants (27), but Mt NCRs were recently also found to play a role in signaling terminal differentiation of rhizobial bacteria during nodulation (92). Notably, Mt and related genera develop an indeterminate nodule that is quite different from those observed in Gm, Lj, or other papilionoids (89). Altogether, there are 593 NCRs in Mt, along with 778 genes within the larger DEFL gene family. Like NBS-LRRs, NCRs are tightly clustered within the Mt genome, with 74% found in tandem clusters. Given their absence from the Gm and Lj genome sequences, NCRs must have expanded relatively rapidly within the IRLC. If so, some mechanism of propagation, such as ectopic movement followed by tandem duplication, may have driven their expansion.
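Figures such as “74% found in tandem clusters” depend on how a cluster is defined. One simple definition, assumed here rather than taken from the cited analyses, sorts family members by position on each chromosome and starts a new cluster whenever the gap to the previous member exceeds a fixed distance; it ignores any non-family genes lying between members. The gene records, the 50-kb cutoff, and the function names below are all hypothetical.

```python
from itertools import groupby


def tandem_clusters(genes: list[tuple[str, str, int]], max_gap: int) -> list[list[str]]:
    """Group (gene_id, chromosome, start) records into clusters whose members lie
    within `max_gap` bp of the previous member on the same chromosome."""
    clusters = []
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    for _, chrom_genes in groupby(ordered, key=lambda g: g[1]):
        current, last_start = [], None
        for gene_id, _, start in chrom_genes:
            if last_start is not None and start - last_start > max_gap:
                clusters.append(current)      # gap too large: close the cluster
                current = []
            current.append(gene_id)
            last_start = start
        clusters.append(current)              # close the last cluster on this chromosome
    return clusters


def fraction_in_clusters(clusters: list[list[str]], min_size: int = 2) -> float:
    """Fraction of family members found in clusters of at least `min_size` genes."""
    total = sum(len(c) for c in clusters)
    clustered = sum(len(c) for c in clusters if len(c) >= min_size)
    return clustered / total if total else 0.0


family = [("g1", "chr1", 10_000), ("g2", "chr1", 25_000), ("g3", "chr1", 40_000),
          ("g4", "chr1", 900_000), ("g5", "chr2", 5_000), ("g6", "chr2", 30_000)]
clusters = tandem_clusters(family, max_gap=50_000)
print(clusters)                                          # [['g1', 'g2', 'g3'], ['g4'], ['g5', 'g6']]
print(f"{fraction_in_clusters(clusters):.0%} of members in clusters")  # 83% of members in clusters
```

Both the gap cutoff and the minimum cluster size change the resulting percentage, so such figures are comparable only when the same definition is used.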

Sequencing in Nonreference Legumes 

Beyond the sequencing of reference species, genome-scale analysis is rapidly moving into less characterized legume species. Indeed, a draft genome sequence of pigeon pea (Cajanus cajan) has recently been published, with scaffolds representing 73% of the pigeon pea genome (94). All this is possible owing to the recent development of next-generation sequencing technologies, which can sequence billions of base pairs (Gb) at very high efficiency (57). In chickpea (C. arietinum), both Hiremath et al. (34) and Garg et al. (28) have used next-generation sequencing technology to rapidly sequence the chickpea transcriptome. In the process, they developed an inventory of most chickpea expressed sequences, assigned predicted functions based on homology and gene ontology analysis, and aligned the assembled sequences to the Mt genome sequence. Next-generation sequencing in chickpea also led to the development of hundreds of single-nucleotide polymorphism (SNP) and conserved genetic marker sequences useful in mapping. Córdoba et al. (18) have taken a very different approach to expanding the set of genomic tools in common bean (P. vulgaris): analysis of nearly 90,000 BAC clones enabled the discovery of >600 simple sequence repeat markers, and mapping these repeats provided a basis for integrating the physical and genetic maps of Phaseolus. Many of the next-generation transcriptome assemblies and related data for orphan legume species are being collected and made available on the “Species” page of the U.S. Department of Agriculture–supported Legume Information System (http://www.comparativelegumes.org).

Inevitably, extending the power of whole-genome sequencing to nonreference legumes will require the creation of true whole-genome sequences for those species. This may soon be realistic given the ongoing increase in short-read throughput coupled with declining costs. However, de novo assembly of next-generation sequence data at the whole-genome scale remains challenging (2, 38). Nevertheless, there is intense work in this area aimed at optimal contig assembly and improved scaffolding options (8, 29, 59, 86). Moreover, high-throughput physical mapping by whole-genome profiling (93), together with the launch of third-generation sequencing technologies such as those of Pacific Biosciences (PacBio) (63), will further enhance superscaffolding of genome assemblies into large pseudomolecules. Despite relatively high error rates, PacBio “strobed” reads, which sample multiple segments extending over long physical distances, have great potential to contribute toward this goal.

From Genome Sequencing to Resequencing

Sequencing in legumes has not been limited to the development of reference genomes: next-generation sequencing technologies also enable the resequencing of plant genomes (50, 77, 82), and resequencing opens the door to genome-wide association studies. In these studies, sequence variation detected at very high density through resequencing is tested for statistical association with naturally occurring phenotypic variation, enabling the discovery and localization of potential causative loci (4). But to make genome-wide association studies practical, insights into the architecture of sequence variation, haplotype size, population structure, and linkage disequilibrium (LD) are critically important (64). These subject areas have therefore been explored extensively in sequenced legume genomes (11, 50), just as in other well-characterized plant genomes (4, 36, 90).
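The association logic described above can be illustrated with a naive single-marker scan: for each SNP, phenotype is regressed on allele dosage (0, 1, or 2 copies of one allele) and the strength of association is summarized by r². This is a minimal sketch with hypothetical data and names; practical genome-wide association analyses also model population structure and relatedness and correct for multiple testing, none of which is done here.

```python
def single_marker_scan(genotypes: dict[str, list[int]],
                       phenotype: list[float]) -> dict[str, tuple[float, float]]:
    """Ordinary least-squares regression of phenotype on allele dosage for each SNP.

    Returns SNP ID -> (slope, r^2). Monomorphic SNPs are skipped.
    """
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    results = {}
    for snp, dosage in genotypes.items():
        if len(dosage) != n:
            raise ValueError(f"{snp}: genotype and phenotype lengths differ")
        mean_g = sum(dosage) / n
        sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosage, phenotype))
        sxx = sum((g - mean_g) ** 2 for g in dosage)
        if sxx == 0 or syy == 0:
            continue  # no variation to test
        results[snp] = (sxy / sxx, sxy * sxy / (sxx * syy))
    return results


phenotype = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0]
genotypes = {
    "snp1": [0, 1, 2, 1, 0, 2],  # dosage tracks the phenotype exactly
    "snp2": [0, 0, 0, 2, 2, 2],  # unrelated to the phenotype
    "snp3": [1, 1, 1, 1, 1, 1],  # monomorphic, skipped
}
print(single_marker_scan(genotypes, phenotype))
# {'snp1': (1.0, 1.0), 'snp2': (0.0, 0.0)}
```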

One example is deep next-generation sequencing of the wild ancestor of cultivated soybean, Glycine soja, followed by comparison with the published Gm reference (45). Researchers generated more than 48 Gb of G. soja sequence, aligned it to the published Gm reference, and obtained more than 97% coverage. In the process, they discovered 2.5 Mb of total SNP variation between the genomes and found that 35.6% of all high-confidence genes contained at least one SNP. Additionally, they observed 406 kb of small insertions or deletions, 32.4 Mb of unaligned and presumably deleted sequence in G. soja, and 8.3 Mb of novel, G. soja–specific sequence compared with Gm. Altogether, then, Gm and G. soja differ by 0.31%, a value lower than that among Arabidopsis accessions (69) or between O. sativa ssp. indica and O. sativa ssp. japonica (40). Analysis of synonymous substitution (Ks) values involving 6,780 genes also indicates that Gm and G. soja diverged approximately 267,000 years ago, long before the domestication of soybean by humans.
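Divergence times of this kind are typically derived from synonymous distances under a molecular clock, T = Ks / (2r), where r is the synonymous substitution rate per site per year and the factor of 2 reflects substitutions accumulating independently along both lineages since the split. The sketch below takes the median Ks across a gene set; the Ks values, the rate, and the function name are placeholders for illustration, not the values used in the study cited above.

```python
from statistics import median


def divergence_time_years(ks_values: list[float], rate_per_site_per_year: float) -> float:
    """Estimate divergence time T = Ks / (2 * r) from the median Ks of a gene set.

    Using the median (an assumption of this sketch) damps the influence of
    outlier genes with unusually high or low Ks.
    """
    if not ks_values:
        raise ValueError("no Ks values given")
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return median(ks_values) / (2.0 * rate_per_site_per_year)


# Hypothetical inputs for illustration only.
ks = [0.002, 0.003, 0.0035, 0.004, 0.010]
rate = 6.0e-9  # placeholder synonymous rate (substitutions/site/year)
print(f"{divergence_time_years(ks, rate):,.0f} years")
```

Because T scales inversely with r, the estimate is only as good as the assumed substitution rate.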

Focusing on genome variation within cultivated soybean itself, Lam et al. (50) used Illumina sequencing technology to survey 14 cultivated and 17 wild Glycine accessions, obtaining ∼5× coverage of the Gm genome for each of the 31 accessions. Not surprisingly, the wild accessions had much higher levels of genetic diversity (approximately 56% higher) and smaller LD blocks (LD blocks shorter than 20 kb were approximately twice as frequent) than the cultivated accessions. Indeed, LD decays quite slowly in cultivated soybeans, with some LD blocks extending more than 1 Mb. Such results are expected from the domestication process, which presumably involved one or more genetic bottlenecks, leading to lower diversity among cultivars and large LD blocks. Separately, a scan for genome regions with high levels of differentiation between wild and cultivated soybeans and/or very low sequence diversity uncovered candidate regions associated with domestication. Two regions of special interest were discovered, one of them a 200-kb region on Gm chromosome 10 that overlaps known quantitative trait loci (QTLs) for harvest index, yield, and vitamin E content (37, 53). Analytical strategies like this, involving a search for potential sites of selection, are among the most promising outcomes of genome resequencing research.
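LD between two biallelic sites is often summarized as r² = D² / [pA(1 − pA) pB(1 − pB)], where D = pAB − pA·pB is the deviation of the two-site haplotype frequency from independence. A minimal sketch, assuming phased haplotypes with alleles coded 0/1 at each site (the function name and toy data are hypothetical):

```python
def ld_r_squared(site_a: list[int], site_b: list[int]) -> float:
    """r^2 between two biallelic sites from phased haplotypes coded 0/1."""
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("sites must be non-empty and the same length")
    n = len(site_a)
    p_a = sum(site_a) / n                                   # frequency of allele 1 at site A
    p_b = sum(site_b) / n                                   # frequency of allele 1 at site B
    p_ab = sum(a & b for a, b in zip(site_a, site_b)) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                    # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d * d / denom


print(ld_r_squared([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0 (complete association)
print(ld_r_squared([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0 (independent sites)
```

Plotting r² for many SNP pairs against the physical distance between them gives the LD decay curves discussed in this section.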

The Mt genome has also been the target of genome resequencing (11). Twenty-six Mt accessions were sequenced to nearly 30× coverage, revealing more than 3 million SNPs at a genome-wide density of 0.004–0.006 (i.e., 4–6 sequence variants per kb), significantly higher than in wild and cultivated soybeans (50) or in Arabidopsis (17). LD decays quickly in Mt, falling to half its initial value within 3–4 kb, quite similar to the rate in Arabidopsis (46). Two gene families, the NBS-LRRs and NCRs, were found to harbor significantly higher levels of sequence diversity, especially at nonsynonymous sites. NBS-LRRs are known from other studies to be highly diverse (17), but it is intriguing that NCRs are also highly diverse given their recently discovered role in rhizobium signaling (92). Finally, resequencing data in Mt revealed four genome regions as potential sites of selection, this time identified by searching for contiguous windows of very low sequence diversity. Three of these regions were located at the telomeric ends of chromosomes, though the significance of this is unknown, and only a few genes with suggestive functions (an isolated NBS-LRR, ENOD92) were found within the candidate regions.
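A scan for contiguous windows of very low sequence diversity can be sketched as follows. This illustration rests on simplifying assumptions of its own (SNP positions already called, fixed non-overlapping windows, SNP count as the diversity measure, and a hypothetical threshold expressed as a fraction of the mean count per window); it is not the statistic used in the study cited above.

```python
def low_diversity_runs(snp_positions: list[int], chrom_length: int,
                       window: int, fraction_of_mean: float) -> list[tuple[int, int]]:
    """Return (start, end) coordinates of runs of consecutive windows whose SNP
    count falls below `fraction_of_mean` times the mean count per window."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in snp_positions:                   # 0-based positions on one chromosome
        if not 0 <= pos < chrom_length:
            raise ValueError(f"position {pos} outside chromosome")
        counts[pos // window] += 1
    threshold = fraction_of_mean * sum(counts) / n_windows

    runs, run_start = [], None
    for i, count in enumerate(counts):
        if count < threshold and run_start is None:
            run_start = i                       # open a new low-diversity run
        elif count >= threshold and run_start is not None:
            runs.append((run_start * window, i * window))
            run_start = None                    # close the current run
    if run_start is not None:                   # run reaches the chromosome end
        runs.append((run_start * window, chrom_length))
    return runs


# Hypothetical data: 5 SNPs in every 10-bp window except windows 3 and 4.
snps = [w * 10 + k for w in (0, 1, 2, 5, 6, 7, 8, 9) for k in range(5)]
print(low_diversity_runs(snps, chrom_length=100, window=10, fraction_of_mean=0.25))
# [(30, 50)]
```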

Population genomic analysis can also reveal candidate regions associated with local adaptation, as demonstrated by Friesen et al. (25). In this case, 12 inbreds derived from four wild Tunisian populations of Mt were analyzed using Affymetrix GeneChip technology. Here, sequence variation is revealed as single-feature polymorphisms (SFPs), which are differences in hybridization at individual probes among the hundreds of thousands of probes located throughout the genome, all interrogated simultaneously. The underlying logic of the study was to search for SFPs among inbreds and then to target loci that assorted by population. Altogether, 7% of all Affymetrix features segregated among inbreds, but only 3% differentiated populations. By design, these Mt populations could be split into two groups according to their original habitats: two populations from saline environments versus two from nonsaline environments. A total of 18 genome regions defined by 52 probes showed consistent differences between the two habitats, results that could be validated by assaying a subset of the SFPs on a larger set of individuals from contrasting populations.
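The idea of keeping only features that consistently separate the two habitats can be illustrated with a deliberately strict filter: retain a probe only if its polymorphism calls are uniform within each habitat group and differ between the groups. This is a hypothetical simplification (binary calls per inbred, perfect separation required), not a reconstruction of the analysis cited above; all names and data are invented.

```python
def habitat_consistent_probes(calls: dict[str, dict[str, int]],
                              habitat: dict[str, str]) -> list[str]:
    """Return probe IDs whose 0/1 calls are constant within each of two habitat
    groups and differ between them.

    calls:   probe ID -> {inbred ID -> 0/1 hybridization call}
    habitat: inbred ID -> habitat label (exactly two labels expected)
    """
    groups = sorted(set(habitat.values()))
    if len(groups) != 2:
        raise ValueError("expected exactly two habitat groups")
    selected = []
    for probe, by_inbred in calls.items():
        # Set of distinct calls observed in each habitat group.
        group_values = [{by_inbred[i] for i, h in habitat.items() if h == g}
                        for g in groups]
        uniform = all(len(v) == 1 for v in group_values)
        if uniform and group_values[0] != group_values[1]:
            selected.append(probe)
    return selected


habitat = {"i1": "saline", "i2": "saline", "i3": "nonsaline", "i4": "nonsaline"}
calls = {
    "probe_A": {"i1": 1, "i2": 1, "i3": 0, "i4": 0},  # separates habitats
    "probe_B": {"i1": 1, "i2": 0, "i3": 1, "i4": 0},  # varies within habitats
    "probe_C": {"i1": 0, "i2": 0, "i3": 0, "i4": 0},  # monomorphic
}
print(habitat_consistent_probes(calls, habitat))  # ['probe_A']
```

With real data, a perfect-separation rule is fragile to genotyping errors, which is one reason validation on a larger set of individuals matters.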

COMPARATIVE GENOMICS AND THE SEARCH FOR THE PRIMORDIAL LEGUME GENOME

Strategies for Comparative Genomic Analysis

It has long been known that species in the same taxonomic family share extensive tracts of homologous genes, often in the same or similar gene order (1, 20, 26). This is commonly called synteny, though colinearity is probably a better term whenever gene order is maintained. Legumes are no exception, with a growing number of studies demonstrating genome-scale synteny, especially among papilionoids (5, 10, 16). Synteny is discovered either through the genetic mapping of sequence-based markers segregating in multiple related species or by large-scale similarity searches between sequenced genomes. A hybrid approach, in which sequenced genetic markers from one species are compared with a sequenced genome, is useful in translating insights from a reference to a less well-characterized species (5, 61, 65). Comparative genomics makes it possible to infer the structural changes that have led to present-day species while also enabling the reconstruction of primordial genome structure: the architecture of ancestral chromosomes and the underlying repertoire of genes (72). From a practical point of view, comparative genomics expands the range of genomics tools available for positional gene cloning (99) and the discovery of new genetic markers (19, 33, 35), especially in species with few genomic tools. Legumes fit this description nicely, with dozens of agriculturally important but less well-studied crop species. This list includes valuable food crops like garden pea (P. sativum), chickpea (C. arietinum), alfalfa (M. sativa), common bean (P. vulgaris), and cowpea (V. unguiculata), all well positioned phylogenetically with respect to the sequenced genomes of Gm, Mt, and Lj (95).

Visualization is key to successful comparative genomics, and there are various methods for visualizing genome comparisons. One popular technique uses Circos diagrams (48), in which chromosomes are placed end to end around the outside of a circle and colored arcs connecting homologous segments are drawn within the circle (for notable legume examples, see References 50 and 100). An especially attractive feature of Circos diagrams is their ability to display multiple genomes while also illustrating synteny at reasonably high resolution. Alternatively, synteny can be visualized with dot-plot diagrams (Figures 1 and 2). Here, one genome (or genome segment) is laid along the horizontal axis and a second genome (or segment) along the vertical axis. A mark is then made at each intersection where the two genomes display sequence similarity above some cutoff value. As a result, significant stretches of synteny appear as diagonal lines, and parallel diagonal lines spanning the same portion of a genome indicate a potential duplication event. The dot-plot method can also easily be applied to genetic marker comparisons and does not require sequenced genomes (42). Notably, both visualization methods can be used to compare a genome with itself in a search for within-species synteny, thereby investigating duplication events and helping to reveal the genomic history of a given species.
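The dot-plot logic can be sketched at the level of gene order rather than raw sequence. Assuming each genome is represented as an ordered list of gene identifiers and homology is supplied as a set of gene pairs (both hypothetical inputs, e.g. from a prior all-versus-all similarity search), each homologous pair becomes a point (x, y), and unbroken runs of points along a slope +1 or slope −1 diagonal correspond to collinear blocks in the same or inverted orientation. Practical synteny tools tolerate small gaps and score blocks statistically; this strict version does not.

```python
def dot_plot_points(genome_x: list[str], genome_y: list[str],
                    homologs: set[tuple[str, str]]) -> list[tuple[int, int]]:
    """Convert homologous gene pairs into (x, y) points, where x and y are the
    genes' ranks along genome X and genome Y."""
    pos_x = {gene: i for i, gene in enumerate(genome_x)}
    pos_y = {gene: j for j, gene in enumerate(genome_y)}
    return sorted((pos_x[a], pos_y[b]) for a, b in homologs
                  if a in pos_x and b in pos_y)


def diagonal_runs(points: list[tuple[int, int]],
                  min_len: int = 3) -> list[list[tuple[int, int]]]:
    """Find unbroken runs of points on forward (slope +1) or reverse (slope -1)
    diagonals; each run is a candidate collinear (syntenic) block."""
    point_set = set(points)
    runs = []
    for step in (1, -1):                 # +1: same orientation, -1: inverted
        for x, y in points:
            if (x - 1, y - step) in point_set:
                continue                 # not the first point of a run
            run = [(x, y)]
            while (run[-1][0] + 1, run[-1][1] + step) in point_set:
                run.append((run[-1][0] + 1, run[-1][1] + step))
            if len(run) >= min_len:
                runs.append(run)
    return runs


genome_x = ["a1", "a2", "a3", "a4", "a5"]
genome_y = ["b1", "b2", "b3", "b4", "b5"]
homologs = {("a1", "b3"), ("a2", "b4"), ("a3", "b5"), ("a4", "b2"), ("a5", "b1")}
points = dot_plot_points(genome_x, genome_y, homologs)
print(diagonal_runs(points, min_len=2))
# [[(0, 2), (1, 3), (2, 4)], [(3, 1), (4, 0)]]
```

Passing the same gene list as both `genome_x` and `genome_y` (and excluding self-pairs from `homologs`) turns this into a within-genome comparison, where parallel off-diagonal runs hint at duplicated segments.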

Comparing Legume Genomes 

Although comparative genomics is most powerful when comparing sequenced genomes, only a few legume genome sequences are available today. Consequently, most legume comparative genomics studies to date have involved comparisons based on genetic markers. This raises the question of how large numbers of shared and polymorphic markers can be discovered for multiple species. One successful strategy has been to design exonic polymerase chain reaction (PCR) primers that amplify across (shared) introns, using available genomic sequence data as the basis for primer design. The idea is that exonic sequences tend to be highly conserved, whereas intronic sequences tend to be variable, thereby providing both the conservation across species needed for successful PCR and the polymorphism needed for segregation mapping.
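The rationale for intron-spanning markers (conserved exons for primer sites, variable introns for polymorphism) can be made concrete by computing percent identity separately over the exonic and intronic columns of a pairwise alignment. The aligned sequences, region coordinates, and function name below are invented for illustration; real marker design would start from aligned genomic and transcript sequences of the species involved.

```python
def percent_identity(seq1: str, seq2: str, start: int, end: int) -> float:
    """Percent identity over aligned columns [start, end), skipping columns
    where either sequence has a gap ('-')."""
    matches = compared = 0
    for a, b in zip(seq1[start:end], seq2[start:end]):
        if a == "-" or b == "-":
            continue
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else float("nan")


# Hypothetical alignment: exon 1 | intron | exon 2 (columns 0-9, 10-19, 20-29).
sp1 = "ATGGCTAAGC" + "GTAAGTTTAG" + "CAGGTTCTGA"
sp2 = "ATGGCTAAGC" + "GTACG-CTAG" + "CAGGTTCTGA"
regions = {"exon 1": (0, 10), "intron": (10, 20), "exon 2": (20, 30)}
for name, (s, e) in regions.items():
    print(f"{name}: {percent_identity(sp1, sp2, s, e):.0f}% identity")
```

In this toy case both exons are identical while the intron carries substitutions and an indel, which is the pattern that lets one primer pair amplify in both species yet still reveal a polymorphism.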

As an example, Choi et al. (16) developed hundreds of potential cross-species legume markers based on Mt and Arabidopsis sequence data and demonstrated extensive synteny across papilionoids through detailed analysis of ∼50 such markers. These markers demonstrated conservation that stretched from millettoids [Gm, mung bean (V. radiata)] all the way to galegoids [Mt, garden pea (P. sativum), alfalfa (M. sativa)]. In the process, they established the first integrated view of legume synteny in the form of a concentric graphic view (60) and illustrated the overall topology of pan-legume synteny. More recently, Hougaard et al. (35) took the intron-spanning approach a step further by showing that 50% of intron-spanning markers designed from Arabidopsis work successfully in both common bean (P. vulgaris) and peanut (A. hypogaea). This is significant because peanut, although still a papilionoid, is in the dalbergioid clade, which is phylogenetically separate from the more frequently characterized millettoid and galegoid clades.

Figure 1

[Dot-plot image: Medicago truncatula genome (horizontal axis) versus Lotus japonicus genome (vertical axis)]

Whole-genome dot-plot of two cool-season legume species, Medicago truncatula (Mt) ssp. truncatula Jemalong-A17 (horizontal axis) and Lotus japonicus (Lj) cultivar Miyakojima MG-20 (vertical axis). An asterisk next to a chromosome number indicates reverse complement. The numbers/letters on the axes represent the chromosome number and north/south arms, respectively; these have been rearranged so that synteny blocks line up along the center diagonal, which makes the comparison easier to visualize. Many synteny blocks are nearly the lengths of whole chromosome arms (red circle), whereas others are disrupted by rearrangements (green circle). Secondary synteny blocks outside the main diagonal (orange circles) represent the whole-genome duplication in Papilionoideae ∼58 Mya. Two notable genome regions where synteny is totally lacking between the two species (Mt06N with Lj06S and Mt03N/Mt04N with Lj03N) are circled in purple.

Another strategy for comparative genomics 

begins with the mining of existing 

expressed sequence tag (EST) databases to 

search for SNPs or other types of mappable 

polymorphisms. Once positioned genetically, 

the underlying ESTs can be compared with 

one of the sequenced legume genomes as a 

basis for discovering shared synteny. Bertioli 

et al. (5) adopted this approach and extended 





the earlier work of Hougaard et al. (35) by mapping ∼126 cross-species ESTs in Arachis and comparing them with available Mt and Lj sequences. They found that most synteny blocks align to a single region in each genome, an important observation because it implies that a previously predicted papilionoid WGD event (see below) predated the divergence of Arachis from galegoids and phaseoloids, and so occurred very early in the evolution of the subfamily. (Had the duplication arisen only after Arachis diverged, each Arachis block would be expected to match two comparably similar regions in each of the Mt and Lj genomes.) In Muchero et al. (61), more than 10,000 SNPs were discovered within available EST databases of cowpea (V. unguiculata) and then used to construct a genetic map of 928 cowpea loci, genotyped with a medium-throughput Illumina GoldenGate assay. Comparison with Gm revealed

85% macrosynteny, while macrosynteny with 

the more distantly related Mt was still high 

at 82%. In a similar study, McClean et al. 

(56) examined >300 gene-based Phaseolus loci derived from EST and BAC-end sequence data

and compared them with the Gm reference 

genome sequence, discovering 55 synteny 

blocks on 35 of Gm’s 40 chromosome arms. 

Syntenic blocks averaged 32 centimorgans 

in length in Phaseolus, a genetic distance that 

corresponded to an average physical distance of 

4.9 Mb in Gm. Using this set of synteny blocks 

as reference points, they could tentatively position 

another 15,000 Phaseolus gene sequences 

solely based on the Gm genome sequence. 
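Positioning genes solely from the Gm sequence amounts to using shared synteny blocks as anchors and interpolating between them. The sketch below assumes simple linear interpolation within a single block and uses hypothetical anchor coordinates. The study itself may have used a different placement rule.

```python
"""Sketch: place a gene on a Phaseolus genetic map from its Gm physical position.

Assumption: within one synteny block, genetic distance (cM) in Phaseolus
scales linearly with physical distance (bp) in Gm between anchor loci.
All coordinates used with this function here are hypothetical.
"""
from bisect import bisect_right
from typing import List, Optional, Tuple

Anchor = Tuple[int, float]  # (Gm position in bp, Phaseolus position in cM)


def interpolate_cm(anchors: List[Anchor], gm_pos: int) -> Optional[float]:
    """Linearly interpolate a cM position; None if outside the block."""
    if not anchors:
        raise ValueError("need at least one anchor")
    anchors = sorted(anchors)
    xs = [bp for bp, _ in anchors]
    if gm_pos < xs[0] or gm_pos > xs[-1]:
        return None  # do not extrapolate beyond the synteny block
    i = bisect_right(xs, gm_pos)
    if i == len(xs):  # gm_pos coincides with the last anchor
        return anchors[-1][1]
    (x0, y0), (x1, y1) = anchors[i - 1], anchors[i]
    return y0 + (y1 - y0) * (gm_pos - x0) / (x1 - x0)
```

For example, with hypothetical anchors `[(1_000_000, 10.0), (5_900_000, 42.0)]` (a 4.9-Mb block spanning 32 cM, matching the averages quoted above), a gene at Gm position 3,450,000 bp is placed at 26.0 cM. Genes outside every block return `None` rather than being extrapolated.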

Side-by-side comparison of sequenced 

genomes is the most powerful way to learn 

about genome histories. In such comparisons 

it becomes possible to estimate the fraction of 

shared genes, the size distribution of synteny 

blocks, or differences between genomes in gene 

density or organization. Going a step further, 

one can examine genome rearrangements at the 

macroscale, whether they are shared or lineage 

specific, or drill down to the base-pair level to 

dissect the fine structure of conserved colinear 

genes. Ultimately, as more sequenced species 

are added to the analysis, we begin to see the actual 

step-by-step changes that distinguish one 

genome from another. 

One of the first sequence-based comparisons 

in legumes was between Mt and Gm. Focusing 

on a genome region surrounding a nematode-resistance gene in Gm (rhg1) on chromosome

Gm18, Mudge et al. (62) found that 75% of 

genes were colinear between Mt and Gm in a 

region spanning ∼150 genes, including a remarkable 

stretch where 33 of 35 genes (94%) 

were conserved and colinear, a phenomenon 

they termed hypersynteny. Cannon et al. (15) 

later carried out a genome-scale sequence comparison 

based on the partially completed Mt and 

Lj genomes available at the time. In the case of 

one large synteny block between Mt05N and 

Lj02S, they found that 58 of 94 genes (62%) existed 

as colinear orthologous pairs between the 

syntenic segments. Indeed, synteny between Mt 

and Lj was found to extend nearly genome-wide, despite the 40–50 million years that have elapsed since speciation.
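A figure such as "58 of 94 genes (62%) existed as colinear orthologous pairs" can be approximated by asking how many ortholog pairs can be kept while preserving gene order in both segments. The sketch below uses a longest-increasing-subsequence simplification and assumes orthologs have already been assigned. Dedicated tools such as DAGchainer or MCScanX instead chain anchors using scores and gap penalties, so this is illustrative only.

```python
"""Count colinear ortholog pairs between two syntenic segments (sketch).

Input: for each gene in segment A (in order), the index of its assigned
ortholog in segment B, or None if it has none. The maximum number of
pairs whose order is preserved equals the length of the longest strictly
increasing subsequence (forward block) or strictly decreasing one
(inverted block).
"""
from bisect import bisect_left
from typing import List, Optional, Sequence


def lis_length(values: Sequence[int]) -> int:
    """Length of the longest strictly increasing subsequence, O(n log n)."""
    tails: List[int] = []  # tails[k] = smallest tail of an increasing run of length k+1
    for v in values:
        k = bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(tails)


def colinear_pairs(ortholog_index: Sequence[Optional[int]]) -> int:
    """Maximum number of order-preserving ortholog pairs (either orientation)."""
    hits = [i for i in ortholog_index if i is not None]
    forward = lis_length(hits)
    inverted = lis_length([-i for i in hits])
    return max(forward, inverted)


if __name__ == "__main__":
    # Hypothetical segment: 8 genes in A, 6 with assigned orthologs in B.
    segment = [0, 1, None, 5, 2, 3, 4, None]
    print(colinear_pairs(segment))  # 5
```

Dividing the result by the number of genes in the segment gives a colinearity percentage of the kind reported above.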

Figure 1 shows an updated dot-plot 

comparison of Mt and Lj based on versions 

of the genomes available in mid-2011 

(Reference 100 and ftp://ftp.kazusa.or.jp/pub/lotus/lotus_r2.5/pseudomolecule).


Figure 2 

Whole-genome dot-plot of the cool-season legume Medicago truncatula (Mt) ssp. truncatula Jemalong-A17 

(horizontal axis) and the warm-season legume Glycine max (Gm) var. Williams 82 (vertical axis). The 

pericentromeric regions of Gm chromosomes have been removed for this analysis. An asterisk next to a 

chromosome number indicates reverse complement. Chromosome arms have been rearranged so that the 

synteny blocks line up along the center diagonal, which makes the comparison easier to visualize. The 

presence of two synteny diagonals for almost every Mt region indicates an additional recent whole-genome 

duplication (



Here, chromosome arms for both Mt and 

Lj (based on the estimated positions of centromeric 

regions) have been reordered and 

in some cases flipped (noted by an asterisk) 

to align synteny blocks into a single coherent 

line. The result highlights the genome-scale 

synteny observed between the two species. 

If perfect synteny existed between Mt and Lj, the roughly 45° dot-plot line would be straight and continuous, and would reach all the way from one end to the other. The fact that the actual result produces a line that approaches this ideal is strong evidence for genome-scale synteny between the two species. Synteny blocks are nearly the lengths of whole chromosome arms, and together they span more than 75% of both genomes. One striking example, between Mt05N and Lj02S, is circled in red. Still, there are also breaks in synteny, for example, Mt07S and its synteny with Lj01S (circled in green). Here, rather than a contiguous diagonal line, one sees a cloud of shorter synteny blocks, broken into six pieces with two of them flipped around. Apparently, one or both syntenic chromosomes experienced major reorganization events since the separation of Mt and Lj. There are also notable genome regions where synteny is totally lacking between the two species; Mt06N with Lj06S and Mt03N/Mt04N with Lj03N (circled in purple) are striking examples. Notably, these genome regions coincide with higher densities of NBS-LRRs and retrotransposons compared with the remainder of the genome, a relationship that may be biologically significant (5) and that resembles the degraded synteny observed in A. hypogaea (76).
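The reordering and flipping of chromosome arms described above is a coordinate transformation applied before plotting. Below is a minimal sketch. It assumes that gene-pair positions are measured from the start of each arm, that arm lengths use the same units, and that arm labels follow the asterisk convention of Figures 1 and 2 for reverse-complemented arms. Arm names and lengths are hypothetical, and rendering with a plotting library is omitted.

```python
"""Compute dot-plot coordinates after reordering and flipping chromosome arms.

Each axis is described by an ordered list of arm labels; a leading '*'
means the arm is drawn reverse-complemented (positions are mirrored).
Positions and lengths are in arbitrary but consistent units (e.g., bp).
"""
from typing import Dict, Iterable, List, Tuple

Layout = Dict[str, Tuple[int, bool]]  # arm -> (axis offset, flipped?)


def layout(order: List[str], lengths: Dict[str, int]) -> Layout:
    """Assign each arm a cumulative offset along the axis, in display order."""
    offsets: Layout = {}
    cursor = 0
    for label in order:
        flipped = label.startswith("*")
        arm = label.lstrip("*")
        offsets[arm] = (cursor, flipped)
        cursor += lengths[arm]
    return offsets


def to_axis(arm: str, pos: int, offsets: Layout, lengths: Dict[str, int]) -> int:
    """Convert an arm-local position to a global axis coordinate."""
    offset, flipped = offsets[arm]
    local = lengths[arm] - pos if flipped else pos
    return offset + local


def dot_plot_points(pairs: Iterable[Tuple[str, int, str, int]],
                    x_order: List[str], x_len: Dict[str, int],
                    y_order: List[str], y_len: Dict[str, int]) -> List[Tuple[int, int]]:
    """Map (x_arm, x_pos, y_arm, y_pos) gene pairs to global plot coordinates.

    Pairs on arms absent from an ordering are skipped.
    """
    xo, yo = layout(x_order, x_len), layout(y_order, y_len)
    return [
        (to_axis(xa, xp, xo, x_len), to_axis(ya, yp, yo, y_len))
        for xa, xp, ya, yp in pairs
        if xa in xo and ya in yo
    ]
```

Choosing the arm order and flips is the interpretive step. Once they are fixed, a good choice makes the synteny blocks fall along a single diagonal, as in Figure 1.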

Envisioning the Ancestral Legume Genome

Inevitably, as more legumes are sequenced it 

will become possible to reconstruct the ancestral 

legume genome, or at least the ancestral 

papilionoid genome. Such an effort is underway 

by integrating the sequenced legume genomes 

with comparably high-density marker/map data 

from species such as chickpea (C. arietinum) 

and pigeon pea (C. cajan) (D. Cook, personal 

communication). Comparisons of the Gm, Mt, 

and Lj genomes already provide a glimpse into 

the large-scale architecture of the ancestral 

legume genome. Despite the complexities resulting 

from the 13-Mya Glycine WGD event 

(discussed in further detail below), comparisons 

among Gm, Mt, and Lj (Figures 1 and 2) suggest a limited number of ancestral synteny blocks that have been rearranged to generate present-day papilionoid genomes. In both comparisons, a conservative examination reveals just 14 largely coherent blocks that span the majority of all three genomes. Notably, this estimate agrees well with the apparent basal chromosome number of seven for papilionoids (74), since 14 blocks would correspond to the two arms of each of seven ancestral chromosomes.

GENOME DUPLICATIONS IN LEGUME BIOLOGY

Whole-Genome Duplication Events in the History of Legumes

One of the most striking lessons coming out 

of plant comparative genomics has been the 

critical role of genome duplication in the evolutionary 

history of many, if not most, plant 

species (21). This is especially true in the case 

of legumes. Gm provided an early hint of the importance of WGD in genome restructuring

in a study showing that restriction fragment 

length polymorphisms were duplicated on average 

2.55 times and localized to a homoeologous 

segment (paralogous sequences resulting 

from WGD) nearly as long as whole chromosomes 

(84). Later, as large amounts of genome 

sequence data became available, it became clear 

that most present-day plant genomes are the 

products of ancient genome-scale duplication 

events (examples include 3, 40, 41, 91). Subsequent 

studies have gone on to reveal the wide 

range of plant families that have experienced 

genome duplications and the architecture of retained 

duplication blocks, and have established 

reasonably precise estimates for the timing of 

key duplication events (7, 73, 85). We know, 

for example, that many dicots share an ancient 

(130–140 Mya) triploidization event based on 




synteny analysis of Vitis vinifera and the fact that each Vitis region typically shows synteny to

three corresponding regions in other sequenced 

dicots (41). We also know that a surprisingly 

large number of plant WGD events followed 

closely after the Cretaceous-Tertiary boundary 

event ∼65 Mya. This led Fawcett et al. (22) 

to suggest that polyploids might have higher 

adaptability and greater tolerance to extreme 

conditions, traits that would have been advantageous during a time of widespread species

extinction. Finally, we are beginning to discover 

the details about the aftermath of WGD events 

(summarized in 24)—and it is this final point, 

the consequence of genome duplication, that 

is especially relevant to our consideration of 

legume genome biology. 

Genome duplication is easy to see when 

looking at a dot-plot comparison. A closer look 

at Figure 1 reveals numerous secondary synteny 

blocks lying to one side or the other of the 

main diagonal. One notable example is where 

the primary synteny block involving Mt01N 

and Lj05N is paralleled by another synteny 

block lower down, between Mt01N and Lj01N 

(orange circles connected by an orange line). 

There are dozens of such duplicated synteny 

blocks in the comparison between these two 

species, and the simplest interpretation is an ancient 

WGD preceding the speciation between 

Mt and Lj. In a comparison like this, synteny 

blocks lying along the main diagonal represent 

the speciation event, whereas the off-center 

diagonals show regions of synteny resulting 

from one or more shared WGD events. Apparently, 

a WGD event that took place in the 

ancestor of Mt and Lj was followed quickly by 

a period of significant genome rearrangement 

and gene loss before speciation, rapidly degrading 

the duplicate synteny blocks observed today. (Loss of synteny in duplicate blocks

is important in understanding the impact of 

duplication on legume biology and is discussed 

in more detail below.) The existence of such a 

WGD in the legume family has been indicated 

through multiple sources of evidence, especially 

Ks (synonymous substitutions per synonymous site) estimates

between paralogs (6, 73, 80) and topology of 

phylogenetic tree analysis (12, 15). Integrating 

all these different sources of data leads to a 

best estimate of 58 Mya for the timing of this WGD. This date would have preceded the

Mt/Lj split (approximately 50 Mya) as well as 

the split with Gm (54 Mya) (52). Indeed, peanut 

(A. hypogaea), an earlier diverging papilionoid, 

also shares this WGD event (5). By contrast, 

a recent study in Chamaecrista indicates that this species does not share the 58-Mya WGD event, and presumably neither do the Mimosoideae and Caesalpinioideae subfamilies (12). In other words,

we know with remarkable precision both the 

timing and evolutionary window for this pivotal 

WGD event in the history of legumes. Given 

the range of species that share this duplication, 

we will refer to it as the papilionoid WGD. 
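Ks-based dating rests on a simple relationship. After duplication, both paralogs accumulate synonymous substitutions independently, so the time since duplication is approximately T = Ks / (2r), where r is the synonymous substitution rate per site per year. WGDs appear as peaks in the Ks distribution of paralog pairs. The sketch below assumes that Ks values have already been computed for paralog pairs. The bin width, the saturation cutoff and any rate passed in are placeholders, not values from the studies cited above.

```python
"""Sketch: locate a peak in a paralog Ks distribution and convert it to an age.

Assumes Ks values for paralog pairs are precomputed. The substitution rate
is a parameter; any value used with it is a placeholder, not a published rate.
"""
from collections import Counter
from typing import Iterable, Optional


def ks_peak(ks_values: Iterable[float], bin_width: float = 0.05,
            max_ks: float = 2.0) -> Optional[float]:
    """Center of the most populated Ks bin, ignoring Ks <= 0 and Ks > max_ks.

    High Ks values are excluded because synonymous sites saturate,
    making large Ks estimates unreliable.
    """
    counts = Counter(int(k // bin_width) for k in ks_values if 0 < k <= max_ks)
    if not counts:
        return None
    # Highest count wins; ties go to the lower (younger) bin.
    best_bin = max(counts, key=lambda b: (counts[b], -b))
    return (best_bin + 0.5) * bin_width


def ks_to_age_years(ks: float, rate_per_site_per_year: float) -> float:
    """Time since duplication, T = Ks / (2r): both copies accumulate substitutions."""
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return ks / (2.0 * rate_per_site_per_year)
```

Because the estimated date scales inversely with the assumed rate, the same Ks peak can give noticeably different dates under different rate assumptions. This is one reason the 58-Mya estimate integrates Ks with other evidence, such as tree topology.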

But the papilionoid WGD is not the only 

one to play an important role in legume evolution. 

Figure 2 displays a comparison of the 

Mt and Gm genome sequences (based on their 

recently published sequences). This comparison 

illustrates important similarities but also 

striking differences with the Mt/Lj dot-plot in 

Figure 1. Gm and Mt clearly display extensive 

synteny, with many long, coherent synteny 

blocks. A quick count reveals as many as 30 

large-scale synteny blocks running the length 

of chromosome arms or nearly so. However, 

there is not a single 45° diagonal stretching

across the genomes; instead, there are pairs of 

diagonals in Gm corresponding to individual 

chromosome arms of Mt. One example (circled 

in red) highlights synteny between Mt05S and 

two different Gm chromosomes/arms, Gm02S 

and Gm14N/Gm14S. A WGD is again the 

explanation, but this time, one that occurred 

more recently (estimated at 13 Mya) and only 

in the lineage leading to Gm (84). This duplication 

event explains the observation that there 

are two Gm blocks for each Mt genome region. 

Comparable levels of contiguity observed in 

each pair of synteny blocks are explained by the 

fact that both trace back to a single WGD event, 

and so the evolutionary distance between Mt 

and both of the Gm syntenic segments must be 

identical. This Glycine-specific WGD had been 

predicted previously (84), but the publication 




of the Gm genome revealed just how pervasive 

and fundamental it is in understanding the 

architecture of the present-day Gm genome. 
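The "two Gm blocks for each Mt region" pattern can be summarized as syntenic depth: for each reference region, the number of distinct query regions it matches. Below is a minimal sketch. It assumes synteny blocks have already been called and uses a hypothetical minimum block size to ignore small, noisy matches. Region names and counts in any example are invented for illustration.

```python
"""Tally syntenic depth per reference region from precomputed synteny blocks."""
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Block = Tuple[str, str, int]  # (reference region, query region, anchor gene pairs)


def syntenic_depth(blocks: Iterable[Block], min_pairs: int = 10) -> Dict[str, int]:
    """Number of distinct query regions per reference region (blocks >= min_pairs)."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for ref, query, n_pairs in blocks:
        if n_pairs >= min_pairs:
            hits[ref].add(query)
    return {ref: len(queries) for ref, queries in hits.items()}


def depth_histogram(depth: Dict[str, int]) -> Counter:
    """How many reference regions show depth 1, 2, ... (a 1:2 excess suggests a WGD)."""
    return Counter(depth.values())
```

The `min_pairs` cutoff is a blunt tool. Older, degraded blocks from the papilionoid WGD (the orange circles in Figure 2) may fall on either side of it, so depth counts are best read alongside the dot-plot itself.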

Figure 2 also illustrates exceptions to this 

pattern, demonstrating two important points. 

First, synteny blocks like the ones circled in orange are older blocks that trace back to the papilionoid WGD discussed

above. Clearly, Mt and Gm (as well as Lj—and, 

indeed, all papilionoids) are expected to share 

the 58-Mya WGD event. Second, there are 

frequent cases of rearrangements—some that 

are simple, like the one involving Mt01S and 

Gm10N/Gm10S and Gm20S (circled in green), 

but others that are quite complex (one example 

circled in purple). These rearrangements are 

best explained by significant levels of reshuffling 

among the duplicated Glycine genome 

segments after the 13-Mya WGD event. 

THE AFTERMATH OF GENOME DUPLICATION AND ITS IMPACT ON LEGUME BIOLOGY

The Fates of Duplicated Genes 

WGDs obviously have a profound impact on 

genome architecture. However, genome duplications 

play an equally important role in the 

evolution of individual genes and gene families. 

Other types of gene duplication exist— 

tandem gene duplication, segmental duplication, 

transposition—and they are certainly important 

in genomic and biological evolution 

(24). However, WGD events are worthy of special 

consideration because when they occur, every 

gene in the genome is suddenly present in 

two copies. In effect, the entire evolutionary trajectory 

of a lineage becomes primed to move in 

a novel direction. In the case of legumes, there 

is growing evidence that WGD events had an 

especially significant impact on nodulation and 

symbiosis with rhizobial bacteria (100). After 

duplications, there are only a small number of 

potential fates for duplicated gene pairs (24): 

Both paralogs are maintained and they share 

the function of their progenitor; both paralogs 

are maintained and one takes on an entirely new 

function; or one of the two progeny genes is lost 

and only a single copy is maintained. The first 

outcome (both genes maintained with shared 

function) is often called subfunctionalization, 

as the two paralogs have split up the function 

of their ancestor (23). The second (both 

maintained, one taking on a new function) is 

called neofunctionalization, for obvious reasons 

(55). The other possibility (only one gene retained, 

the other deleted) is fractionation (51) 

or, equivalently, diploidization. Still other outcomes 

are possible, such as pseudogenization 

without loss of one of the duplicates, but are 

not considered in detail here. Ultimately, biological 

function is expected to play a critical 

role in the fate of duplicated genes, with some 

functional classes (such as the most highly interconnected proteins) retained more frequently than others (such as proteins that generally act alone) (24). Understanding gene fate following WGDs sheds light on important biological phenomena in legumes, including the generation of novel disease-resistance specificities and the appearance of novel developmental functions.
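The three main outcomes described above can be expressed as a decision rule. The sketch below is a deliberately simplified, hypothetical classifier. It uses sets of expression domains (tissues or conditions) as a stand-in for function, and it assumes the ancestral domain can be inferred, for example from an unduplicated outgroup ortholog. Real analyses weigh sequence divergence, expression levels and functional assays rather than set membership alone.

```python
"""Hypothetical classifier for the fate of a WGD-derived gene pair.

Expression domains stand in for function; None means the copy was lost.
"""
from typing import FrozenSet, Optional

Domain = FrozenSet[str]


def classify_fate(ancestral: Domain, copy_a: Optional[Domain],
                  copy_b: Optional[Domain]) -> str:
    if copy_a is None and copy_b is None:
        return "both lost"
    if copy_a is None or copy_b is None:
        return "fractionation"  # only one copy retained
    if (copy_a | copy_b) - ancestral:
        return "neofunctionalization"  # at least one copy gained a new domain
    if copy_a < ancestral and copy_b < ancestral and (copy_a | copy_b) == ancestral:
        return "subfunctionalization"  # the copies partition the ancestral domain
    return "retained, no clear partition"
```

Under this toy rule, the NFP/LYR1 case discussed later fits the subfunctionalization branch. That case involves an ancestral gene with both nodulation and mycorrhizal roles, inferred from the single Parasponia gene, that is split between two copies.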

To illustrate the fates of duplicated genes in 

legumes, Figure 3 displays a pair of duplicated 

segments in Mt, each roughly 150 kb in size (located on Mt01 and Mt07), shown alongside

the four corresponding syntenic regions of 

Gm. This figure was created using the PLAZA 

genome analysis suite (75) and is based on the 

published sequences of Mt and Gm. The results 

are striking. Each Mt segment exhibits remarkable 

conservation with the pair of most closely 

related Gm segments, but far less conservation with its Mt homoeolog. In this example, just 7 of 19 genes (37%) are retained in both duplicated blocks of Mt. These are homoeologs

(WGD-derived paralogs) that trace back to the 

papilionoid WGD at 58 Mya. By contrast, the 

Mt07 segment shares 13 of 16 genes (81%) with 

either Gm03 or Gm19, whereas the Mt01 segment 

shares 11 of 13 (85%) with either Gm02 

or Gm10. These are orthologous relationships 

that derive from the millettoid/galegoid speciation 

event separating Mt and Gm at ∼54 Mya

(52). It is noteworthy that the time span between 

the papilionoid WGD and the Mt/Gm 

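[Figure 3 schematic: ancestral legume → papilionoid WGD (∼58 Mya) → Mt/Gm split (∼54 Mya) → Gm WGD (∼13 Mya); Mt07 is shown with Gm03 and Gm19 (81% shared), Mt01 with Gm02 and Gm10 (85% shared), and the Mt01/Mt07 homoeologs with 37% retention.]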



Figure 3 

A 150-kb region on the Glycine max (Gm) and Medicago truncatula (Mt) genomes illustrating the differential gene loss between the

duplicated regions, which took place after the split between warm- and cool-season legumes ∼54 Mya. In this example, only 37% of the 

genes are retained in both duplicated blocks of Mt, while the Mt duplicates retain 81%–85% with their Gm counterparts. By contrast, 

the number of retained gene pairs among Gm03/Gm19 (69%) and Gm02/Gm10 (100%) duplicates is much higher, at least in part due 

to the fact that the whole-genome duplication (WGD) in Gm is fairly recent (



translocated into the pericentromeric region of 

the chromosome. Between the two Gm genome 

regions, 77% of gene duplicates were retained. 

However, this high level of retention did not 

extend to NBS-LRRs, which existed as clusters 

in both genome regions, but with significant 

homoeolog-specific duplications and losses. 

The pericentromeric region was especially 

depleted of surviving NBS-LRRs. Evidently,

NBS-LRR genes are subject to much higher 

levels of fractionation than other gene classes. 

Local duplications, deletions, and recombination 

are apparently acting preferentially on 

WGD-derived NBS-LRR clusters, with the 

pericentromeric NBS-LRR cluster experiencing 

much higher levels of fractionation. This 

pattern has been noted in other plant species, 

with NBS-LRRs frequently underrepresented 

in duplicated genome regions (14, 64), potentially 

reflecting a fitness cost associated with 

excess NBS-LRRs (58). 
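Claims that one gene class fractionates faster than others rest on comparing retention rates between classes across a homoeologous pair. Below is a minimal sketch. It assumes each ancestral gene position ("slot") has already been scored for presence in each homoeologous region. Any example slots are invented and do not reproduce the published counts.

```python
"""Sketch: compare duplicate retention between gene classes in two homoeologous regions."""
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Slot = Tuple[str, bool, bool]  # (gene class, present in region 1, present in region 2)


def retention_by_class(slots: Iterable[Slot]) -> Dict[str, float]:
    """Fraction of observable slots retained in both regions, per gene class."""
    kept: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for gene_class, in_1, in_2 in slots:
        if not (in_1 or in_2):
            continue  # absent from both regions: no evidence either way
        total[gene_class] += 1
        if in_1 and in_2:
            kept[gene_class] += 1
    return {c: kept[c] / total[c] for c in total}
```

With only a handful of NBS-LRR slots per region, differences between classes should be checked for significance (for example, with a contingency-table test) before being interpreted.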

In a similar study by Kim et al. (44), a different pair of homoeologous genome regions (1.96–4.60 Mb) on Gm05 and Gm17, centered around the Rxp bacterial leaf pustule–resistance gene, was examined and compared with the homologous Mt genome regions. In this case, fractionation in Mt was observed to extend to the level of gene blocks (in which multiple linked genes were retained in one duplicate but lost from the other), contrasting with the apparent gene-by-gene fractionation illustrated in Figure 3. In the case of Gm and the more recent 13-Mya WGD, duplicates were also retained as blocks rather than individual genes, though some of the gene blocks were not lost, but were instead translocated to a different location in the Gm genome. Notably, the locations of homoeologs coincided with known QTLs for leaf pustule resistance, leading the authors to suggest that duplicated resistance genes may have retained their ancestral function and then diverged in a pathogen strain–specific manner.

Finally, Lin et al. (54) examined two 

∼1-Mb homoeologous regions containing 

NBS-LRR clusters in Gm (on Gm08 and Gm15) 

as well as the orthologous region of common 

bean (P. vulgaris). The level of gene retention 

varied from 81% to 91% among the Gm segments, 

values somewhat higher than observed 

by others (39, 44; Figure 3). As in Innes et al. 

(39), this analysis uncovered significant differences 

in retrotransposon density between the 

two regions, differences that were correlated 

with differing levels of structural variation. Going 

beyond structural analysis, the study examined 

gene expression levels along the two Gm 

segments and found 38% higher transcriptional 

activity on Gm08 compared with Gm15 based 

on a metric that integrated expression across seven different tissues. This difference in expression activity is significant because expression divergence between retained gene pairs is expected under both sub- and neofunctionalization.
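The text does not specify the metric used to integrate expression across seven tissues, so the sketch below uses a hypothetical stand-in. Each segment's activity is taken as the sum, over its genes, of mean expression across tissues. Pair-level divergence is taken as the mean absolute log2 ratio between the two homoeologs across tissues, with a pseudocount to avoid division by zero. Both choices are assumptions made for illustration.

```python
"""Hypothetical metrics for comparing expression between homoeologous segments."""
import math
from statistics import fmean
from typing import Dict, Sequence


def segment_activity(expression: Dict[str, Sequence[float]]) -> float:
    """Sum over genes of mean expression across tissues (same tissue order for all genes)."""
    return sum(fmean(values) for values in expression.values())


def percent_higher(a: float, b: float) -> float:
    """How much higher a is than b, as a percentage of b."""
    return 100.0 * (a - b) / b


def pair_divergence(a: Sequence[float], b: Sequence[float], pseudocount: float = 1.0) -> float:
    """Mean absolute log2 ratio between two homoeologs across matched tissues."""
    if len(a) != len(b) or not a:
        raise ValueError("need matched, non-empty tissue profiles")
    return fmean(abs(math.log2((x + pseudocount) / (y + pseudocount)))
                 for x, y in zip(a, b))
```

A segment-level total can hide opposite shifts in individual genes, so a per-pair measure like `pair_divergence` is closer to the sub- and neofunctionalization question raised above.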

Genome Duplication and the Evolution of Nodulation

The property most striking about legumes is 

their capacity to form symbiotic nitrogen-fixing 

nodules in association with rhizobial bacteria. 

Not surprisingly, detailed analysis of legume 

genomes can provide valuable insights into 

symbiosis, nodulation, and nitrogen fixation. 

At the simplest level, genome sequence data 

make it possible to generate a global inventory 

of nodulation-related genes. This was an 

important contribution of the recent Gm sequence 

(91). Here, genes of interest were identified 

by searching for Gm genes orthologous to 

known nodulation-related genes in any legume 

species. As a result, 34 Gm nodulins (nodule-upregulated proteins) were discovered along

with 23 nodulation-related regulatory genes 

within the Gm genome. This kind of gene 

inventory makes it possible to explore local 

nodulation-related gene clusters, putative homoeologs, 

and membership in related gene 

families. This inventory should be especially 

valuable in dissecting the global regulatory machinery 

controlling plant-rhizobium communication 

and nodule development. 
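The text does not state which orthology method was used to build the Gm inventory. One common approach is reciprocal best hits (RBH). The sketch below assumes precomputed similarity scores in both directions (for example, BLAST bit scores) and uses hypothetical gene identifiers.

```python
"""Sketch: inventory orthologs of known nodulation genes via reciprocal best hits."""
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query gene, subject gene, similarity score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Highest-scoring subject for each query (first one wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def inventory(known_genes: List[str], forward: Iterable[Hit],
              reverse: Iterable[Hit]) -> Dict[str, str]:
    """Map each known gene to its reciprocal-best-hit ortholog, if any."""
    fwd, rev = best_hits(forward), best_hits(reverse)
    return {g: fwd[g] for g in known_genes if g in fwd and rev.get(fwd[g]) == g}
```

RBH is strictly one-to-one, which matters for Gm. Because of the 13-Mya WGD, many Gm genes exist as two homoeologs, and RBH reports at most one of them. An orthogroup-based or synteny-aware method would be needed to recover both copies.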

Analysis of the Mt genome sequence 

focused on the relationship between genome 

duplication and the evolution of nodulation. 




Previous studies had established that legumes 

belong to a clade of rosids, the Fabidae, whose members share a predisposition to nodulate, presumably

derived from their common ancestor (88). In 

analyzing the Mt genome, the question was 

whether the 58-Mya WGD contributed in any 

way to the elaboration of rhizobial nodulation. 

The answer appears to be a qualified yes. 

Multiple lines of evidence indicate that nodulation 

machinery predates the 58-Mya WGD. 

Moreover, many of the known regulatory 

steps in rhizobial nodulation are shared with 

mycorrhizal signaling (66), a symbiosis broadly 

shared among angiosperms (9). Just a few of 

the known recognition steps are exclusively 

associated with rhizobial nodulation, including 

the key receptor-like kinase, NFP (66). In 

analyzing the Mt genome, NFP was found to 

have a homoeolog, LYR1, and genome position 

and Ks data indicate that these duplicated

genes derive from the 58-Mya WGD. NFP is 

nodulation specific in expression and function, 

whereas LYR1 is upregulated in mycorrhizae 

(30). In separate work, a nodulating nonlegume, 

Parasponia andersonii, is known to contain a single 

gene coding for a protein with the functions 

of both NFP and LYR1 (68). Therefore, one 

likely interpretation would be that the 58-Mya 

papilionoid WGD led to subfunctionalization 

of a more ancient gene that previously carried 

out both functions, resulting in two descendant

genes that split the nodulation and mycorrhizal 

recognition functions between them. A separate 

nodulation-related transcription factor, 

ERN1 (96), also possesses a homoeolog (ERN2) 

in Mt. Like NFP/LYR1, ERN1 and ERN2 have 

contrasting nodulation-versus-mycorrhizal 

expression patterns and also derive from the 

58-Mya WGD. Potentially, they are a second 

example of sub- or neofunctionalization 

resulting from the papilionoid WGD event. 

These observations even suggest a potential 

phylogenetic strategy for discovering genes 

that play a role in nodulation. It should be 

possible to mine the products of the 58-Mya 

WGD and search for genes that have nodule-related expression in one or both gene products

of the WGD event. At this point, one 

could examine potentially novel (or at least 

interesting) functions that these genes might be 

playing in nodulation. Indeed, this strategy has 

already been put into practice with the identification 

of a cytokinin response regulator promoting 

the expression of ERN1 (67). Analysis of 

the Mt genome uncovers 51 additional WGD-derived homoeolog pairs with one or both duplicates

upregulated in nodules, including 10 

additional transcription factor genes. 
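The phylogenetic strategy described above reduces to a filter over WGD-derived homoeolog pairs. Below is a minimal sketch. It assumes that homoeolog pairs, with Ks values, have already been identified from synteny, and that per-gene nodule-versus-root log2 fold changes are available. The Ks window and fold-change cutoff are placeholders to be set from the actual data, not values from the Mt genome analysis.

```python
"""Sketch: mine WGD-derived homoeolog pairs for nodule-upregulated members."""
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class HomoeologPair:
    gene_a: str
    gene_b: str
    ks: float  # synonymous divergence between the two copies


def nodule_candidates(pairs: Iterable[HomoeologPair],
                      nodule_log2fc: Dict[str, float],
                      ks_window: Tuple[float, float],
                      min_log2fc: float = 1.0) -> List[Tuple[HomoeologPair, List[str]]]:
    """Pairs whose Ks falls in the WGD window and with >= 1 nodule-upregulated copy.

    Returns each qualifying pair with the list of upregulated gene(s);
    genes without expression data are treated as not upregulated.
    """
    low, high = ks_window
    hits = []
    for pair in pairs:
        if not low <= pair.ks <= high:
            continue  # Ks outside the window: probably not from the targeted WGD
        up = [g for g in (pair.gene_a, pair.gene_b)
              if nodule_log2fc.get(g, 0.0) >= min_log2fc]
        if up:
            hits.append((pair, up))
    return hits
```

Separating pairs in which only one copy is nodule-upregulated from pairs in which both are gives a first, rough split between candidate sub- or neofunctionalization events and shared nodule roles.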

PERSPECTIVES ON LEGUME GENOMICS

It is difficult to believe that massive amounts of 

sequence data have been available in plants for 

such a short time. The pace of change has been 

so rapid that in less than a decade we have gone 

from having only thousands of ESTs in a few 

legume species to having three robust legume 

reference genomes. This review has examined 

ways in which the rapidly growing body of 

genome sequence data sheds light on legume 

biology. At the simplest level, translation of 

genome data between legume species enables 

important practical applications: the discovery 

of genetic markers, the development of linkage 

maps, and the saturation of genome regions 

for positional cloning. This is especially true 

for minor legumes, where many species are 

important to agriculture but supported by 

small research communities. At a more basic 

level, dissection of genome sequence data reveals 

the structure, architecture, and evolution 

of important gene families and enables the 

identification of orthologous versus paralogous 

relationships. Complete genome sequences 

also reveal legume- and species-specific genes 

whose functions remain largely unknown, although they are likely to be important. Gene and

genome duplications, so critical in shaping 

plant genomes, contain intrinsic information 

that can be exploited to predict function and 

the structure of genetic networks. Candidate 

gene discovery based on the papilionoid WGD 

is a promising example. In legumes, applying 

these strategies to nodulation and seed development 

will be especially critical. Additional 


sequencing and resequencing of legume species 

will make this possible, but inevitably, it is 

the research community’s capacity to develop 

imaginative strategies for exploiting massive 

sequence data that will move legume genomics 

from the computer to biology. 

SUMMARY POINTS 



1. The genome sequences of three legumes (Glycine max, Medicago truncatula, and Lotus

japonicus) have recently been completed, and they illustrate a history of whole-genome

duplication with important implications in legume biology. Glycine, in particular, underwent 

a genome duplication event within the past 13 million years that is strikingly 

evident in its genome architecture. 

2. Most agriculturally important legume crops, including so-called orphan species, are phylogenetically 

close to Glycine, Medicago, and Lotus. Consequently, translational genomics in orphan legumes should be straightforward and practically useful. It also means

that major clades of more distant legumes remain largely unexplored from a genomic 

perspective. 

3. Analysis of legume genome sequences reveals hundreds of family-specific genes not observed

in other angiosperms. They include a large group of defensin-like peptide genes 

seen only in Medicago and its close relatives that are exclusively expressed in nodules and 

in some cases play important roles in rhizobial differentiation. 

4. The aftermath of genome duplication in legumes involves extensive gene fractionation, 

especially in the lineage leading to Medicago and Lotus, as well as apparent examples of 

sub- and neofunctionalization. In some cases, products of whole-genome duplication 

have contributed to the elaboration of a preexisting capacity for rhizobial nodulation. 

DISCLOSURE STATEMENT 

N.D.Y. is principal investigator of a National Science Foundation Plant Genome Research Program 

grant that supported the sequencing of M. truncatula and later the development of an 

M. truncatula HapMap platform. 

ACKNOWLEDGMENTS 

We thank Doug Cook, Rene Geurts, and R. Op den Camp for helpful discussions relating to 

unpublished work; Robert Stupar for his review of the manuscript; and Sebastian Proost and Yves 

Van de Peer for preliminary analyses involving the PLAZA platform.

LITERATURE CITED 

1. Ahn S, Tanksley SD. 1993. Comparative linkage maps of the rice and maize genomes. Proc. Natl. Acad. 

Sci. USA 90:7980–84 

2. Alkan C, Sajjadian S, Eichler EE. 2010. Limitations of next-generation genome sequence assembly. Nat. 

Methods 8:61–65 

3. Arabidopsis Genome Init. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis 

thaliana. Nature 408:796–815 




4. Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, et al. 2010. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465:627–31

5. Bertioli DJ, Moretzsohn MC, Madsen LH, Sandal N, Leal-Bertioli SC, et al. 2009. An analysis of synteny of Arachis with Lotus and Medicago sheds new light on the structure, stability and evolution of legume genomes. BMC Genomics 10:45

6. Blanc G, Wolfe KH. 2004. Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell 16:1679–91

7. Blanc G, Wolfe KH. 2004. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16:1667–78

8. Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. 2011. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27:578–79

9. Bonfante P, Genre A. 2008. Plants and arbuscular mycorrhizal fungi: an evolutionary-developmental perspective. Trends Plant Sci. 13:492–98

10. Boutin SR, Young ND, Olson TC, Yu ZH, Vallejos CE, Shoemaker RC. 1995. Genome conservation among three legume genera detected with DNA markers. Genome 38:928–37

11. Branca A, Paape T, Zhou P, Briskine R, Farmer AD, et al. 2011. Whole-genome nucleotide diversity, recombination, and linkage disequilibrium in the model legume Medicago truncatula. Proc. Natl. Acad. Sci. USA 108:E864–70

12. Cannon SB, Ilut D, Farmer AD, Maki SL, May GD, et al. 2010. Polyploidy did not predate the evolution of nodulation in all legumes. PLoS ONE 5:e11630

13. Cannon SB, May GD, Jackson SA. 2009. Three sequenced legume genomes and many crop species: rich opportunities for translational genomics. Plant Physiol. 151:970–77

14. Cannon SB, Mitra A, Baumgarten A, Young ND, May G. 2004. The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol. 4:10

15. Cannon SB, Sterck L, Rombauts S, Sato S, Cheung F, et al. 2006. Legume evolution viewed through the Medicago truncatula and Lotus japonicus genomes. Proc. Natl. Acad. Sci. USA 103:14959–64

16. Choi HK, Mun JH, Kim DJ, Zhu H, Baek JM, et al. 2004. Estimating genome conservation between crop and model legume species. Proc. Natl. Acad. Sci. USA 101:15289–94

17. Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, et al. 2007. Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science 317:338–42

18. Córdoba JM, Chavarro C, Schlueter JA, Jackson SA, Blair MW. 2010. Integration of physical and genetic maps of common bean through BAC-derived microsatellite markers. BMC Genomics 11:436

19. Das S, Bhat PR, Sudhakar C, Ehlers JD, Wanamaker S, et al. 2008. Detection and validation of single feature polymorphisms in cowpea (Vigna unguiculata L. Walp) using a soybean genome array. BMC Genomics 9:107

20. Devos KM, Gale MD. 2000. Genome relationships: the grass model in current research. Plant Cell 12:637–46

21. Doyle JJ, Flagel LE, Paterson AH, Rapp RA, Soltis DE, et al. 2008. Evolutionary genetics of genome merger and doubling in plants. Annu. Rev. Genet. 42:443–61

22. Fawcett JA, Maere S, Van de Peer Y. 2009. Plants with double genomes might have had a better chance to survive the Cretaceous-Tertiary extinction event. Proc. Natl. Acad. Sci. USA 106:5737–42

23. Force A, Lynch M, Pickett FB, Amores A, Yan YL, et al. 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531–45

24. Freeling M. 2009. Bias in plant gene content following different sorts of duplication: tandem, whole-genome, segmental, or by transposition. Annu. Rev. Plant Biol. 60:433–53

25. Friesen ML, Cordeiro MA, Penmetsa RV, Badri M, Huguet T, et al. 2010. Population genomic analysis of Tunisian Medicago truncatula reveals candidates for local adaptation. Plant J. 63:623–35

26. Gale MD, Devos KM. 1998. Comparative genetics in the grasses. Proc. Natl. Acad. Sci. USA 95:1971–74

27. Gao AG, Hakimi SM, Mittanck CA, Wu Y, Woerner BM, et al. 2000. Fungal pathogen protection in potato by expression of a plant defensin peptide. Nat. Biotechnol. 18:1307–10

5. Demonstrates that papilionoid genome duplication is shared with distant Arachis, which shows extensive synteny with sequenced legumes.

12. Shows that legume genome duplication apparently occurred only within the papilionoid lineage, and not within the Mimosoideae or Caesalpinioideae subfamilies.

25. Utilizes a genome association mapping approach to characterize salt tolerance in a natural population.

45. Describes next-generation sequencing of a wild soybean relative and extensive characterization of genome differences between species.

28. Garg R, Patel RK, Jhanwar S, Priya P, Bhattacharjee A, et al. 2011. Gene discovery and tissue-specific transcriptome analysis in chickpea with massively parallel pyrosequencing and Web resource development. Plant Physiol. 156:1661–78

29. Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Burton JN, et al. 2011. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc. Natl. Acad. Sci. USA 108:1513–18

30. Gomez SK, Javot H, Deewatthanawong PD, Torres-Jerez I, Tang Y, et al. 2009. Medicago truncatula and Glomus intraradices gene expression in cortical cells harboring arbuscules in the arbuscular mycorrhizal symbiosis. BMC Plant Biol. 9:10

31. Graham MA, Silverstein KA, Cannon SB, VandenBosch KA. 2004. Computational identification and characterization of novel genes from legumes. Plant Physiol. 135:1179–97

32. Graham PH, Vance CP. 2003. Legumes: importance and constraints to greater use. Plant Physiol. 131:872–77

33. Han Y, Kang Y, Torres-Jerez I, Cheung F, Town CD, et al. 2011. Genome-wide SNP discovery in tetraploid alfalfa using 454 sequencing and high resolution melting analysis. BMC Genomics 12:350

34. Hiremath PJ, Farmer A, Cannon SB, Woodward J, Kudapa H, et al. 2011. Large-scale transcriptome analysis in chickpea (Cicer arietinum L.), an orphan legume crop of the semi-arid tropics of Asia and Africa. Plant Biotechnol. J. 9:922–31

35. Hougaard BK, Madsen LH, Sandal N, de Carvalho Moretzsohn M, Fredslund J, et al. 2008. Legume anchor markers link syntenic regions between Phaseolus vulgaris, Lotus japonicus, Medicago truncatula and Arachis. Genetics 179:2299–312

36. Huang X, Wei X, Sang T, Zhao Q, Feng Q, et al. 2010. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat. Genet. 42:961–67

37. Huang Z-W, Zhao T-J, Yu D-Y, Chen S-Y, Gai J-Y. 2008. Correlation and QTL mapping of biomass accumulation, apparent harvest index, and yield in soybean. Acta Agron. Sin. 34:944–51

38. Imelfort M, Edwards D. 2009. De novo sequencing of plant genomes using second-generation technologies. Brief. Bioinforma. 10:609–18

39. Innes RW, Ameline-Torregrosa C, Ashfield T, Cannon E, Cannon SB, et al. 2008. Differential accumulation of retroelements and diversification of NB-LRR disease resistance genes in duplicated regions following polyploidy in the ancestor of soybean. Plant Physiol. 148:1740–59

40. Int. Rice Genome Seq. Proj. 2005. The map-based sequence of the rice genome. Nature 436:793–800

41. Jaillon O, Aury JM, Nöel B, Policriti A, Clepet C, et al. 2007. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449:463–67

42. Kaló P, Seres A, Taylor SA, Jakab J, Kevei Z, et al. 2004. Comparative mapping between Medicago sativa and Pisum sativum. Mol. Genet. Genomics 272:235–46

43. Kamphuis LG, Williams AH, D’Souza NK, Pfaff T, Ellwood SR, et al. 2007. The Medicago truncatula reference accession A17 has an aberrant chromosomal configuration. New Phytol. 174:299–303

44. Kim KD, Shin JH, Van K, Kim DH, Lee SH. 2009. Dynamic rearrangements determine genome organization and useful traits in soybean. Plant Physiol. 151:1066–76

45. Kim MY, Lee S, Van K, Kim T-H, Jeong S-C, et al. 2010. Whole-genome sequencing and intensive analysis of the undomesticated soybean (Glycine soja Sieb. and Zucc.) genome. Proc. Natl. Acad. Sci. USA 107:22032–37

46. Kim S, Plagnol V, Hu TT, Toomajian C, Clark RM, et al. 2007. Recombination and linkage disequilibrium in Arabidopsis thaliana. Nat. Genet. 39:1151–55

47. Kinzig AP, Socolow RH. 1994. Human impacts on the nitrogen cycle. Phys. Today 47:24–35

48. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, et al. 2009. Circos: an information aesthetic for comparative genomics. Genome Res. 19:1639–45

49. Kulikova O, Gualtieri G, Geurts R, Kim DJ, Cook D, et al. 2001. Integration of the FISH pachytene and genetic maps of Medicago truncatula. Plant J. 27:49–58

50. Lam H-M, Xu X, Liu X, Chen W, Yang G, et al. 2010. Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection. Nat. Genet. 42:1053–59

51. Langham RJ, Walsh J, Dunn M, Ko C, Goff SA, et al. 2004. Genomic duplication, fractionation and the origin of regulatory novelty. Genetics 166:935–45




52. Lavin M, Herendeen PS, Wojciechowski MF. 2005. Evolutionary rates analysis of Leguminosae implicates a rapid diversification of lineages during the tertiary. Syst. Biol. 54:575–94

53. Li H, Liu H, Han Y, Wu X, Teng W, et al. 2010. Identification of QTL underlying vitamin E contents in soybean seed among multiple environments. Theor. Appl. Genet. 120:1405–13

54. Lin JY, Stupar RM, Hans C, Hyten DL, Jackson SA. 2010. Structural and functional divergence of a 1-Mb duplicated region in the soybean (Glycine max) genome and comparison to an orthologous region from Phaseolus vulgaris. Plant Cell 22:2545–61

55. Lynch M, O’Hely M, Walsh B, Force A. 2001. The probability of preservation of a newly arisen gene duplicate. Genetics 159:1789–804

56. McClean PE, Mamidi S, McConnell M, Chikara S, Lee R. 2010. Synteny mapping between common bean and soybean reveals extensive blocks of shared loci. BMC Genomics 11:184

57. Metzker ML. 2009. Sequencing technologies—the next generation. Nat. Rev. Genet. 11:31–46 

58. Meyers BC, Kaushik S, Nandety RS. 2005. Evolving disease resistance genes. Curr. Opin. Plant Biol. 8:129–34

59. Miller JR, Koren S, Sutton G. 2010. Assembly algorithms for next-generation sequencing data. Genomics 95:315–27

60. Moore G, Devos KM, Wang Z, Gale MD. 1995. Grasses, line up and form a circle. Curr. Biol. 5:737–39 

61. Muchero W, Diop NN, Bhat PR, Fenton RD, Wanamaker S, et al. 2009. A consensus genetic map of cowpea [Vigna unguiculata (L) Walp.] and synteny based on EST-derived SNPs. Proc. Natl. Acad. Sci. USA 106:18159–64

62. Mudge J, Cannon SB, Kalo P, Oldroyd GE, Roe BA, et al. 2005. Highly syntenic regions in the genomes of soybean, Medicago truncatula, and Arabidopsis thaliana. BMC Plant Biol. 5:15

63. Munroe DJ, Harris TJ. 2010. Third-generation sequencing fireworks at Marco Island. Nat. Biotechnol. 28:426–28

64. Nordborg M, Weigel D. 2008. Next-generation genetics in plants. Nature 456:720–23

65. Nayak SN, Zhu H, Varghese N, Datta S, Choi HK, et al. 2010. Integration of novel SSR and gene-based SNP marker loci in the chickpea genetic map and establishment of new anchor points with Medicago truncatula genome. Theor. Appl. Genet. 120:1415–41

66. Oldroyd GE, Downie JA. 2008. Coordinating nodule morphogenesis with rhizobial infection in legumes. Annu. Rev. Plant Biol. 59:519–46

67. Op den Camp RHM, De Mita S, Lillo A, Cao Q, Limpens E, et al. 2011. A phylogenetic strategy based on a legume-specific whole genome duplication yields symbiotic cytokinin type-A response regulators. Plant Physiol. 157:2013–22

68. Op den Camp RHM, Streng A, De Mita S, Cao Q, Polone E, et al. 2011. LysM-type mycorrhizal receptor recruited for rhizobium symbiosis in nonlegume Parasponia. Science 331:909–12

69. Ossowski S, Schneeberger K, Clark RM, Lanz C, Warthmann N, et al. 2008. Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res. 18:2024–33

70. Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, et al. 2009. The Sorghum bicolor genome and the diversification of grasses. Nature 457:551–56

71. Paterson AH, Chapman BA, Kissinger JC, Bowers JE, et al. 2006. Many gene and domain families have convergent fates following independent whole-genome duplication events in Arabidopsis, Oryza, Saccharomyces and Tetraodon. Trends Genet. 22:597–602

72. Paterson AH, Freeling M, Tang H, Wang X. 2010. Insights from the comparison of plant genome sequences. Annu. Rev. Plant Biol. 61:349–72

73. Pfeil BE, Schlueter JA, Shoemaker RC, Doyle JJ. 2005. Placing paleopolyploidy in relation to taxon divergence: a phylogenetic analysis in legumes using 39 gene families. Syst. Biol. 54:441–54

74. Polhill RM. 1981. Papilionoideae. In Advances in Legume Systematics, Part 1, ed. RM Polhill, PH Raven, pp. 191–208. Kew, UK: R. Bot. Gard.

75. Proost S, Van Bel M, Sterck L, Billiau K, Van Parys T, et al. 2009. PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell 21:3718–31

76. Ratnaparkhe MB, Wang X, Li J, Compton RO, Rainville LK, et al. 2011. Comparative analysis of peanut NBS-LRR gene clusters suggests evolutionary innovation among duplicated domains and erosion of gene microsynteny. New Phytol. 192:164–78




79. Provides the initial report of the Lotus japonicus genome sequence.

81. Provides the initial report of the Glycine max genome sequence.

87. Gives an overview of an alternative legume, Chamaecrista, found within one of the clades not generally targeted for genomic analysis.

100. Provides the initial report of the Medicago truncatula genome sequence.

77. Rausch T, Koren S, Denisov G, Weese D, Emde AK, et al. 2009. A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads. Bioinformatics 25:1118–24

78. Sato S, Isobe S, Tabata S. 2010. Structural analyses of the genomes in legumes. Curr. Opin. Plant Biol. 13:1–7

79. Sato S, Nakamura Y, Kaneko T, Asamizu E, Kato T, et al. 2008. Genome structure of the legume, Lotus japonicus. DNA Res. 15:1–8

80. Schlueter JA, Dixon P, Granger C, Grant D, Clark L, et al. 2004. Mining EST databases to resolve evolutionary events in major crop species. Genome 47:868–76

81. Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, et al. 2010. Genome sequence of the palaeopolyploid soybean. Nature 463:178–83

82. Schneeberger K, Ossowski S, Ott F, Klein JD, Wang X, et al. 2011. Reference-guided assembly of four diverse Arabidopsis thaliana genomes. Proc. Natl. Acad. Sci. USA 108:10249–54

83. Shin JH, Van K, Kim DH, Kim KD, Jang YE, et al. 2008. The lipoxygenase gene family: a genomic fossil of shared polyploidy between Glycine max and Medicago truncatula. BMC Plant Biol. 8:133

84. Shoemaker RC, Polzin K, Labate J, Specht J, Brummer EC, et al. 1996. Genome duplication in soybean (Glycine subgenus soja). Genetics 144:329–38

85. Shoemaker RC, Schlueter J, Doyle JJ. 2006. Paleopolyploidy and gene duplication in soybean and other legumes. Curr. Opin. Plant Biol. 9:104–9

86. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, et al. 2009. ABySS: a parallel assembler for short read sequence data. Genome Res. 19:1117–23

87. Singer SR, Maki SL, Farmer AD, Ilut D, May GD, et al. 2009. Venturing beyond beans and peas: What can we learn from Chamaecrista? Plant Physiol. 151:1041–47

88. Soltis DE, Soltis PS, Morgan DR, Swensen SM, Mullin BC, et al. 1995. Chloroplast gene sequence data suggest a single origin of the predisposition for symbiotic nitrogen fixation in angiosperms. Proc. Natl. Acad. Sci. USA 92:2647–51

89. Sprent JI. 2008. 60 Ma of legume nodulation: What’s new? What’s changing? J. Exp. Bot. 59:1081–84

90. Tian F, Bradbury PJ, Brown PJ, Hung H, Sun Q, et al. 2011. Genome-wide association study of leaf architecture in the maize nested association mapping population. Nat. Genet. 43:159–62

91. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, et al. 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313:1596–604

92. Van de Velde W, Zehirov G, Szatmari A, Debreczeny M, Ishihara H, et al. 2010. Plant peptides govern terminal differentiation of bacteria in symbiosis. Science 327:1122–26

93. van Oeveren J, de Ruiter M, Jesse T, van der Poel H, Tang J, et al. 2011. Sequence-based physical mapping of complex genomes by whole genome profiling. Genome Res. 21:618–25

94. Varshney RK, Chen W, Li Y, Bharti AK, Saxena RK, et al. 2012. Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat. Biotechnol. 30:83–89

95. Varshney RK, Close TJ, Singh NK, Hoisington DA, Cook DR. 2009. Orphan legume crops enter the genomics era! Curr. Opin. Plant Biol. 12:202–10

96. Vernié T, Moreau S, de Billy F, Plet J, Combier JP, et al. 2008. EFD is an ERF transcription factor involved in the control of nodule number and differentiation in Medicago truncatula. Plant Cell 20:2696–713

97. Wojciechowski MF, Sanderson MJ, Steele KP, Liston A. 2000. Molecular phylogeny of the “temperate herbaceous tribes” of papilionoid legumes: a supertree approach. In Advances in Legume Systematics, Part 9, ed. PS Herendeen, A Bruneau, pp. 277–98. Kew, UK: R. Bot. Gard.

98. Yang S, Feng Z, Zhang X, Jiang K, Jin X, et al. 2006. Genome-wide investigation on the genetic variations of rice disease resistance genes. Plant Mol. Biol. 62:181–83

99. Yang S, Gao M, Xu C, Gao J, Deshpande S, et al. 2008. Alfalfa benefits from Medicago truncatula: the RCT1 gene from M. truncatula confers broad-spectrum resistance to anthracnose in alfalfa. Proc. Natl. Acad. Sci. USA 105:12164–69

100. Young N, Debellé F, Oldroyd G, Geurts R, Cannon SB, et al. 2011. The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature 480:520–24

101. Young ND, Udvardi M. 2009. Translating Medicago truncatula genomics to crop legumes. Curr. Opin. Plant Biol. 12:193–201




102. Zhang M, Wu YH, Lee MK, Liu YH, Rong Y, et al. 2010. Numbers of genes in the NBS and RLK families vary by more than four-fold within a plant species and are regulated by multiple factors. Nucleic Acids Res. 38:6513–25

103. Zhang XC, Wu X, Findley S, Wan J, Libault M, et al. 2007. Molecular evolution of lysin motif-type receptor-like kinases in plants. Plant Physiol. 144:623–36

104. Zhou S, Bechner MC, Place M, Churas CP, Pape L, et al. 2007. Validation of rice genome sequences by optical mapping. BMC Genomics 8:278


