Genome-Enabled Insights into Legume Biology

Nevin D. Young 1,∗ and Arvind K. Bharti 2

1 Department of Plant Pathology and Department of Plant Biology, University of Minnesota, St. Paul, Minnesota 55108; email: neviny@umn.edu
2 National Center for Genome Resources, Santa Fe, New Mexico 87505; email: akb@ncgr.org
∗ Corresponding author.

Annu. Rev. Plant Biol. 2012. 63:283–305
First published online as a Review in Advance on January 30, 2012
The Annual Review of Plant Biology is online at plant.annualreviews.org
This article’s doi: 10.1146/annurev-arplant-042110-103754
Copyright © 2012 by Annual Reviews. All rights reserved
1543-5008/12/0602-0283$20.00
Annu. Rev. Plant Biol. 2012.63:283-305. Downloaded from www.annualreviews.org by University of Minnesota - Twin Cities - Wilson Library on 05/07/12. For personal use only.

Keywords

comparative genomics, genome duplication, microsynteny, nodulation, symbiosis

Abstract

Legumes are the third-largest family of angiosperms, the second-most-important crop family, and a key source of biological nitrogen in agriculture. Recently, the genome sequences of Glycine max (soybean), Medicago truncatula, and Lotus japonicus were substantially completed. Comparisons among legume genomes reveal a key role for duplication, especially a whole-genome duplication event approximately 58 Mya that is shared by most agriculturally important legumes. A second and more recent genome duplication occurred only in the lineage leading to soybean. Outcomes of genome duplication, including gene fractionation and sub- and neofunctionalization, have played key roles in shaping legume genomes and in the evolution of legume-specific traits. Analysis of legume genome sequences also enables the discovery of legume-specific gene families and provides a framework for genome-wide association mapping that will target phenotypes of special importance in legumes. Translating genomic resources from sequenced species to less studied but still important “orphan” legumes will enhance prospects for world food production.


Contents

INTRODUCTION
SEQUENCING LEGUME GENOMES
  Reference Legume Genomes
  What Can We Learn from Sequenced Legume Genomes?
  Sequencing in Nonreference Legumes
  From Genome Sequencing to Resequencing
COMPARATIVE GENOMICS AND THE SEARCH FOR THE PRIMORDIAL LEGUME GENOME
  Strategies for Comparative Genomic Analysis
  Comparing Legume Genomes
  Envisioning the Ancestral Legume Genome
GENOME DUPLICATIONS IN LEGUME BIOLOGY
  Whole-Genome Duplication Events in the History of Legumes
THE AFTERMATH OF GENOME DUPLICATION AND ITS IMPACT ON LEGUME BIOLOGY
  The Fates of Duplicated Genes
  Impacts of Genome Duplication on Legume Biology
  Genome Duplication and the Evolution of Nodulation
PERSPECTIVES ON LEGUME GENOMICS

INTRODUCTION

Legumes (Fabaceae or Leguminosae) are the third-largest family of flowering plants and the second-most-important plant family in agriculture. They are especially interesting because most have the capacity to fix atmospheric nitrogen through mutualistic interactions with rhizobial soil bacteria, a trait that is both ecologically and agriculturally important (32). Indeed, without the nitrogen fixed each year by legumes, humans would need to consume 288 billion kg of additional fuel in the Haber-Bosch process to generate anhydrous ammonia for agriculture (47). Given their importance to people, legumes are now the target of extensive sequence-based genomics research, which is revolutionizing our understanding of legume evolution and its connection to biologically important traits. Of particular significance are the recently completed and annotated genomes of three legume species: Glycine max (soybean) (Gm) (81), Medicago truncatula (Mt) (100), and Lotus japonicus (Lj) (79). This review focuses on genomics research carried out in legume biology, emphasizing comparisons among legume genomes and the critical role of genome duplication and its aftermath in shaping present-day legume genomes and traits.

With the recent publication of three legume genome sequences, plus a fourth very recently (76), and the rapid development of genomics tools for multiple legume species, there are already several excellent scientific reviews available to researchers. These reviews have emphasized the structural analyses of legume genomes (13, 78), translational opportunities provided by reference genome sequences (101), and the prospects for extending genome sequence data to less studied “orphan” legume species (13, 95). Therefore, we endeavor here to complement and expand the scope of these existing reviews with our focus on genome evolution and genome duplication, and on their impact on legume biology.

SEQUENCING LEGUME GENOMES

Reference Legume Genomes

The genome sequences of Gm, Mt, and Lj form the foundation for much of our current understanding about legume genomics. All three species are members of Papilionoideae, a subfamily that diverged from the two other legume subfamilies (Mimosoideae and
Caesalpinioideae) approximately 60 Mya (52). Most cultivated legumes are found within two sister clades of the papilionoids: the millettoid/phaseoloid clade [warm-season legumes, including Gm, pigeon pea (Cajanus cajan), common bean (Phaseolus vulgaris), mung bean (Vigna radiata), and cowpea (Vigna unguiculata)] and the temperate galegoid clade [cool-season legumes, including Mt, Lj, and species such as alfalfa (Medicago sativa), chickpea (Cicer arietinum), clover (Trifolium sp.), lentil (Lens sp.), and garden pea (Pisum sativum)]. Papilionoideae also includes two minor clades: the genistoid (lupin, Lupinus sp.) and the dalbergioid (peanut, Arachis hypogaea). Because all these species are reasonably close phylogenetically, insights from the Gm, Mt, and Lj genomes should be highly relevant when transferred among cultivated legume crops. However, the current emphasis on papilionoids also means that many interesting legume species, especially mimosoids (Mimosa, Acacia, Prosopis, and Chamaecrista, for example) and caesalpinioids (Caesalpinia, Senna, and tamarind, for example), are quite distant evolutionarily from the nexus of genomics research. Researchers have noted this previously and highlighted the importance of developing genomics resources in additional nodes throughout the legume evolutionary tree (87).

The Gm genome sequence, published in 2010, is currently the most thoroughly characterized legume genome (81). More than 950 million base pairs (Mb) of the overall 1,115-Mb genome were completed through the use of 8x Sanger whole-genome shotgun (WGS) sequencing. Many of the resulting pseudomolecules extend all the way from centromeres (as indicated by scaffolds extending into centromeric repeats) out to telomeres (with scaffolds extending into telomeric repeats). The Gm sequence is also impressive for the very large size of the resulting sequence scaffolds. These are the physically defined assemblies of sequence contigs that are built into Gm’s 20 chromosome pseudomolecules. In Gm assembly Glyma 1.0, the so-called L50 (a common metric to describe scaffold size that is calculated by summing the lengths of all scaffolds from longest to shortest and then finding the scaffold size where you reach 50% of the overall length) is 47.8 Mb. By comparison, nearly all other published WGS plant genome sequences have notably shorter L50s [with the notable exception of Sorghum bicolor (70), another very high-quality assembly]. It is especially noteworthy that nearly all of the published Gm sequence (98%) could be anchored to specific chromosomal positions.
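
The L50 calculation described above can be sketched in a few lines of Python. This is only an illustration of the definition given in the text (the same statistic is often reported as N50 elsewhere); the function name `scaffold_l50` and the toy scaffold lengths are hypothetical and do not come from the Glyma 1.0 assembly.

```python
def scaffold_l50(lengths):
    """Return the scaffold length at which the running total, taken from
    longest to shortest, first reaches 50% of the summed length."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb (not taken from any real assembly).
toy_scaffolds = [50, 40, 30, 20, 10, 5, 5]
print(scaffold_l50(toy_scaffolds))  # total is 160; 50 + 40 = 90 passes 80, so 40
```

Because the statistic depends only on the sorted lengths, a few very long scaffolds can raise it sharply, which is why it is usually read alongside the fraction of sequence anchored to chromosomes.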

The Mt genome was sequenced by a combination of Sanger-based bacterial artificial chromosome (BAC) clones (with genomic inserts approximately 80–120 kb in length) and ∼40x Illumina WGS (100). In this case, the sequencing effort was focused on euchromatic arms outside centromeric regions through the use of fluorescence in situ hybridization (FISH) (49) and optical mapping (104) to define physical location. Altogether, 367 Mb of the approximately 470-Mb Mt genome (http://data.kew.org/cvalues) is included in the published assembly. Because of the emphasis on BAC-based sequencing, the quality in BAC-sequenced regions is quite high, although scaffolds tend to be relatively short (overall L50 of 1.27 Mb) and only the BAC-based portion of the sequence (245 Mb, or 67%) could be anchored to specific chromosomal locations. Another 17 Mb of BAC-based sequence could not be anchored. The remaining portion of the Mt sequence consists of Illumina WGS (104 Mb), with the Illumina contigs being quite short (L50 of 2.4 kb, largest 31 kb) and primarily useful as a way to recover missing portions of the genome for gene discovery. Still, Mt chromosome 5 is noteworthy in being a nearly intact BAC-based pseudomolecule that is complete on either side of the centromere. Throughout the entire pseudomolecule of Mt chromosome 5, there are just four sequence gaps, which is comparable in quality to the Arabidopsis thaliana (3) or Oryza sativa (40) genomes. One surprising result of the Mt sequencing project was the discovery of a large chromosomal translocation in the accession used as a template for sequencing (Jemalong-A17) compared with other Mt
accessions. This had been suggested in previous genetic experiments that found biased segregation ratios involving crosses with A17 (43), but the sequencing project was able to pinpoint two breakpoints on chromosomes 4 and 8 to regions roughly the size of BAC clones.

The Lj genome was published in 2008 (79) and was actually the first legume genome to appear, though it is still the most incomplete. As in Mt, the strategy was to focus on gene-rich portions of the genome through the sequencing of large insert clones (in this case, so-called transformation-competent artificial chromosomes). The published Lj genome sequence is 315 Mb in length, corresponding to 67% of the Lj genome (472 Mb), but only 130 Mb is high quality and anchored to chromosomes. A more recent version of the Lj genome sequence is now available through the Web site of the lead sequencing group in Kazusa, Japan (ftp://ftp.kazusa.or.jp/pub/lotus/lotus_r2.5/pseudomolecule), and it provides a much more robust platform for Lj genomics. This updated version (Lj 2.5) contains anchored pseudomolecules 268 Mb in length throughout the euchromatic portion of Lj plus 33 Mb of sequence as yet unanchored.
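
As a quick consistency check on the assembly sizes quoted for the three reference genomes, the short sketch below recomputes the percentages from the Mb figures in the text. The dictionary layout and variable names are illustrative choices made here, and the Gm value is a lower bound because the text gives "more than 950 Mb".

```python
# Sizes in Mb as quoted in the text; "assembled" for Gm is a lower bound.
ASSEMBLIES = {
    "Gm": {"assembled": 950, "genome": 1115},
    "Mt": {"assembled": 367, "genome": 470},
    "Lj": {"assembled": 315, "genome": 472},
}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


for species, size in ASSEMBLIES.items():
    share = percent(size["assembled"], size["genome"])
    print(f"{species}: {share:.1f}% of the estimated genome is in the assembly")

# Share of the 367-Mb Mt assembly that is anchored BAC-based sequence;
# this rounds to the 67% given in the text.
mt_anchored_share = percent(245, 367)

# Total sequence in the updated Lj 2.5 release: anchored plus unanchored (Mb).
lj25_total_mb = 268 + 33
```

Note that the two 67% figures in the text measure different things: for Mt it is the anchored share of the assembly, whereas for Lj it is the share of the whole genome represented in the published sequence.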

What Can We Learn from Sequenced Legume Genomes?

What have we learned about legume genomes from this first generation of sequencing projects? In the broadest sense, sequenced legume genomes look very much like those of other dicots, though comparisons with Arabidopsis can be complicated by its unusually small genome size and complex duplication history (3). A closer look at the Gm genome finds that ∼57% of the overall sequence is found in repeat-rich, low-recombination heterochromatin, while most genes (78%) are found in euchromatic chromosome arms (81). Of course, this also implies that substantial numbers of Gm genes (22%) lie within the pericentromeric heterochromatin, a somewhat surprising and potentially important result. As expected, crossovers are profoundly reduced near centromeres, with the ratio of genetic to physical distance dropping by 27-fold between the euchromatic and pericentromeric portions of the genome. Genome organization in Mt seems largely comparable, though the evidence for this is based on a combination of the BAC-based euchromatin sequence, FISH microscopy, and optical mapping (100). Notably, the estimated proportion of the genome located in pericentromeres is much lower in Mt compared with Gm (∼22% versus ∼57%), something that presumably plays a role in the difference in genome size. In both Gm and Mt, gene density is generally high throughout euchromatic arms, with only limited indications of a gene density gradient rising from centromere to telomere. In Mt, for example, the gene density is estimated at 16.9 per 100 kb (1 gene every 5.9 kb) throughout the euchromatin, with the average gene being 2,211 bp in length and containing four introns. By way of comparison, Mt values are similar to those in Arabidopsis (2,174 bp) and Oryza (3,403 bp).
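
The gene-density figures for Mt can be reproduced with simple arithmetic, sketched below. The conversion from genes per 100 kb to kb per gene follows directly from the text; the genic-fraction estimate is an extra back-of-envelope step added here, and it assumes that genes do not overlap and that the mean gene length applies throughout the euchromatin.

```python
GENES_PER_100_KB = 16.9   # Mt euchromatin, from the text
MEAN_GENE_BP = 2211       # average Mt gene length, from the text

# Average spacing between genes, in kb.
kb_per_gene = 100 / GENES_PER_100_KB

# Rough share of euchromatin occupied by genes, assuming genes do not
# overlap and the mean length applies everywhere (an assumption made here).
genic_fraction = MEAN_GENE_BP / (kb_per_gene * 1000)

print(f"one gene every {kb_per_gene:.1f} kb")
print(f"about {genic_fraction:.0%} of euchromatin lies within genes")
```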

Altogether, the Gm genome is reported to have 46,430 “high-confidence” protein-coding loci, which represents a culled set of gene models from an original set that included ∼20,000 predicted with lower confidence (81). In Mt, a total of 62,152 genes were annotated, a value that drops to 47,845 when retaining only those genes with experimental or database support. The similarity in gene counts between the two systems is surprising and significant, because the lineage leading to present-day soybean is known to have undergone a whole-genome duplication (WGD) at 13 Mya or later, a duplication that is absent in the Mt lineage (there is much more about this important evolutionary event below). Thus, one might have expected higher gene numbers in Gm compared with Mt. The Gm genome is also reported to have 313,125 retrotransposons and 294,937 DNA transposons (spanning 403 Mb and 157 Mb, respectively), whereas the Mt genome has 253,048 retrotransposons and 34,529 DNA transposons (spanning 88 Mb
and 9.4 Mb, respectively). The lower numbers in Mt presumably reflect the lower amount of pericentromeric sequencing (also supported by the twofold difference in genome size), but may also indicate real genomic differences between the two species.

Detailed examination of the genome sequences also provides insights into interesting or unusual gene families. The Gm genome is reported to have 283 legume-specific gene families (81), an estimate that increases to 670 with the analysis of the more recent Mt genome sequence (100). Both Gm and Mt contain higher numbers of disease-resistance genes carrying nucleotide-binding-site leucine-rich repeats (NBS-LRRs, also called NB-ARCs, i.e., nucleotide-binding adaptors shared by APAF-1, R proteins, and CED-4) than other plant genomes sequenced to date. In Mt, for example, there are 764 NBS-LRR-related genes, with at least 550 expressed based on RNA-Seq (100). Outside of legumes, O. sativa is reported to have the largest number so far (519) (98). More than 90% of Mt NBS-LRRs reside in clusters that contain on average 7.4 members, including two megaclusters: one on Mt06 with 30 NBS-LRRs and another on Mt03 with 21. However, the conclusion that NBS-LRRs are overrepresented in legumes (or indeed in any plant family) needs to be tempered by the recent observation that there is considerable variation in NBS-LRR number between different accessions within a single species, including Gm (102). Legumes also have higher numbers and increased complexity in other gene families: lipoxygenases (83), LysM receptor kinases (103), and flavonoid biosynthetic enzymes, such as chalcone synthase (100). It may be significant that LysM receptors and flavonoids are both known to play important roles in nodulation. Finally, all three sequenced legumes contain unusually high numbers of F-box domain genes compared with other plant species, with Mt possessing three times the number of F-box domain genes compared with either Gm or Lj (100).

The Mt genome is also notable for the presence of a large and novel gene family, the nodule-related cysteine-rich peptides (NCRs), which are members of the larger group of defensin-like sequences (DEFLs) (31). Notably, this group of genes has been observed only in members of the so-called inverted repeat-lacking clade (IRLC) [97a; http://tolweb.org/IRLC_(Inverted_Repeatlacking_clade)] of legumes, a subgroup of cool-season legumes that includes genera such as Pisum, Vicia, and Trifolium. The IRLC represents a clade of legumes known to have lost one copy of the 25-kb inverted repeat in its plastid genome, hence its name. Genome analysis demonstrates that the gene family is entirely missing from the sequences of Gm and Lj. DEFLs are known to act as antimicrobials in plants (27), although recently, Mt NCRs were also found to play a role in signaling terminal differentiation in rhizobial bacteria during nodulation (92). Notably, Mt and related genera develop an indeterminate nodule quite different from the one observed in Gm, Lj, or other papilionoids (89). Altogether, there are 593 NCRs in Mt along with 778 genes within the larger DEFL gene family. Like NBS-LRRs, NCRs are tightly clustered within the Mt genome, with 74% found in tandem clusters. Given their absence from the Gm and Lj genome sequences, NCRs must have expanded relatively rapidly within the IRLC clade. If so, some mechanism of propagation, such as ectopic movement followed by tandem duplication, may have led to their expansion.

Sequencing in Nonreference Legumes

Beyond the sequencing of reference species, genome-scale analysis is rapidly moving into less characterized legume species. Indeed, a draft genome sequence of pigeon pea (Cajanus cajan) has recently been published, including scaffolds representing 73% of the pigeon pea genome (94). All this is possible owing to the recent development of next-generation sequencing technologies, where billions of base pairs (Gb) can be sequenced at very high efficiency (57). In chickpea (C. arietinum), both
Hiremath et al. (34) and Garg et al. (28) have used next-generation sequence technology to rapidly sequence the chickpea transcriptome. In the process, they developed an inventory for most chickpea expressed sequences, assigned predicted functions based on homology and gene ontology analysis, and aligned the assembled sequences to the Mt genome sequence. Next-generation sequencing in chickpea also led to the development of hundreds of different single-nucleotide polymorphism (SNP) and conserved genetic marker sequences useful in mapping. Córdoba et al. (18) have taken a very different approach to expanding the set of genomic tools in common bean (P. vulgaris). Analysis of nearly 90,000 BAC clones enabled the discovery of >600 simple sequence repeat markers. Mapping these repeats provided a basis for integrating the physical and genetic maps of Phaseolus. Many of the next-generation transcriptome assemblies and related data for orphan legume species are being collected and made available through the U.S. Department of Agriculture–supported Legume Information System (http://www.comparativelegumes.org) on its “Species” page.

Inevitably, extending the power of whole-genome sequencing to nonreference legumes will require the creation of true whole-genome sequences for those species. This may soon be realistic given the ongoing increase in short-read throughput coupled with the decline in costs. However, de novo assembly of next-generation sequence data at the whole-genome scale remains challenging (2, 38). Nevertheless, there is intense work in this area aimed toward optimum contig assembly and improved scaffolding options (8, 29, 59, 86). Moreover, high-throughput physical mapping by whole-genome profiling (93) together with the launch of third-generation sequencing technologies such as those of Pacific Biosciences (PacBio) (63) will further enhance superscaffolding of genome assemblies into large pseudomolecules. Despite relatively high error rates, PacBio “strobed” multiple reads extending over long physical distances have great potential to contribute toward this goal.

From Genome Sequencing to Resequencing

Sequencing in legumes has not been limited to the development of reference genomes: Next-generation sequencing technologies also enable the resequencing of plant genomes (50, 77, 82), and resequencing opens the door to genome-wide association studies. Here, statistical associations between sequence variation and naturally occurring phenotypic variation, detected at very high density through the process of resequencing, enable the discovery and localization of potential causative loci (4). But to make genome-wide association studies practical, insights into the architecture of sequence variation, haplotype size, population structure, and linkage disequilibrium (LD) are critically important (64). These subject areas, therefore, have been explored extensively in sequenced legume genomes (11, 50), just as in other well-characterized plant genomes (4, 36, 90).
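
To make the notion of LD concrete, the sketch below computes the widely used r² statistic between two biallelic sites from phased haplotypes. It is a minimal illustration under stated assumptions, not the procedure of any study cited here: the function name `ld_r2` is hypothetical, alleles are assumed to be coded 0/1, and phase is assumed to be known.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic sites.

    haplotypes is a list of (allele_at_site_a, allele_at_site_b) pairs,
    with alleles coded 0 or 1 and phase assumed known.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n            # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n            # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                               # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Toy data: alleles always travel together, then fully shuffled.
print(ld_r2([(0, 0), (0, 0), (1, 1), (1, 1)]))  # complete LD
print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))  # no LD
```

Averaging r² over pairs of sites binned by physical distance gives an LD decay curve, which is the kind of summary behind statements about how quickly LD falls off in a genome.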

One example has been deep next-generation sequencing of the wild ancestor of cultivated soybean, Glycine soja, followed by comparison with the published Gm reference (45). Here, researchers generated more than 48 Gb of G. soja sequence, aligned it to the published Gm reference, and obtained more than 97% coverage. In the process, they discovered 2.5 Mb in total SNP variation between the genomes and found that 35.6% of all high-confidence genes contained at least one SNP. Additionally, they observed 406 kb of small insertions or deletions, 32.4 Mb of unaligned and presumably deleted sequence from G. soja, and 8.3 Mb of novel, G. soja–specific sequence compared with Gm. Altogether, then, Gm and G. soja differ by 0.31%, a value less than among Arabidopsis accessions (69) or between O. sativa ssp. indica and O. sativa ssp. japonica (40). Analysis of synonymous substitution (Ks) values involving 6,780 genes also indicates that Gm and G. soja diverged approximately 267,000 years ago, long before the domestication of soybean by humans.
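
Divergence dates of this kind are commonly derived from Ks with the standard molecular-clock relation T = Ks / (2r), where r is the synonymous substitution rate per site per year and the factor of 2 accounts for change along both lineages. The sketch below shows only that relation. The text gives neither the Ks value nor the rate used, so the rate here is an illustrative assumption and the Ks value is simply back-calculated from it.

```python
def divergence_time_years(ks, subs_per_site_per_year):
    """Molecular-clock estimate T = Ks / (2 * r)."""
    if subs_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return ks / (2 * subs_per_site_per_year)


# Illustrative rate only (an assumption, not a value from the text).
assumed_rate = 6.1e-9

# Ks that would correspond to ~267,000 years under the assumed rate.
ks_example = 2 * assumed_rate * 267_000

print(f"Ks = {ks_example:.5f} -> {divergence_time_years(ks_example, assumed_rate):,.0f} years")
```

Because the estimate scales inversely with the assumed rate, halving r doubles the inferred divergence time, so dates of this kind are best read as approximate.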

Focusing on genome variation within cultivated soybean itself, Lam et al. (50) utilized Illumina sequencing technology to survey 14
cultivated and 17 wild Glycine accessions. Here, researchers obtained ∼5x coverage of the Gm genome for each of the 31 accessions. Not surprisingly, the wild accessions had much higher levels of genetic diversity (approximately 56% higher) and smaller LD blocks (approximately twice the frequency of LD blocks less than 20 kb) compared with cultivated accessions. Indeed, they found that LD decays quite slowly in cultivated soybeans, with some LD blocks extending more than 1 Mb. Such results are expected during the domestication process, which presumably resulted in one or more genetic bottlenecks, lower diversity among cultivars, and large LD blocks. Separately, a scan for genome regions with high levels of differentiation between wild and cultivated soybeans and/or very low sequence diversity uncovered candidate regions associated with domestication. Two such regions of special interest were discovered, including a 200-kb region on Gm chromosome 10 that overlaps known quantitative trait loci (QTLs) for harvest index, yield, and vitamin E content (37, 53). Analytical strategies like this involving a search for potential sites of selection are some of the most promising outcomes of genome resequencing research.

The Mt genome has also been the target<br />

<strong>of</strong> genome resequencing (11). Twenty-six<br />

Mt accessions were sequenced to nearly 30x<br />

coverage, discovering more than 3 million<br />

total SNPs at a genome-wide density of 0.004–<br />

0.006 per base pair (i.e., 4–6 sequence variants every 1 kb),<br />

significantly higher than in wild and cultivated<br />

soybeans (50) or in Arabidopsis (17). LD decays<br />

quickly in Mt, reaching half its initial value<br />

within 3–4 kb, quite similar to that <strong>of</strong> Arabidopsis<br />

(46). Two gene families, the NBS-LRRs<br />

and NCRs, were found to harbor significantly<br />

higher levels <strong>of</strong> sequence diversity, especially in<br />

nonsynonymous sites. NBS-LRRs are known<br />

from other studies to be highly diverse (17), but<br />

it is intriguing to find that NCRs are also highly<br />

diverse given their recently discovered role in<br />

rhizobium signaling (92). Finally, resequence<br />

data in Mt revealed four genome regions<br />

as potential sites for selection, this time by<br />

searching for contiguous windows <strong>of</strong> very low<br />

sequence diversity. Three <strong>of</strong> these regions were<br />

located at telomeric ends <strong>of</strong> chromosomes,<br />

though the significance <strong>of</strong> this is unknown, and<br />

only a few examples <strong>of</strong> genes with suggestive<br />

functions (an isolated NBS-LRR, ENOD92)<br />

were found within candidate regions.<br />
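The two summary statistics quoted above, SNP density per kilobase and the distance at which LD falls to half its initial value, can be sketched as follows. This is an illustrative calculation only: the function names are hypothetical, and the half-decay distance is found by simple linear interpolation between binned r² values rather than by the curve fitting a published analysis would likely use.

```python
def per_kb(per_base_density):
    """Convert a per-base variant density to variants per kilobase."""
    return per_base_density * 1000


def half_decay_distance(distances_kb, r2_values):
    """Distance at which r2 first drops to half its initial value.

    distances_kb and r2_values are parallel lists sorted by distance.
    Returns None if r2 never reaches half of r2_values[0].
    """
    target = r2_values[0] / 2
    for i in range(1, len(r2_values)):
        if r2_values[i] <= target:
            d0, d1 = distances_kb[i - 1], distances_kb[i]
            y0, y1 = r2_values[i - 1], r2_values[i]
            # Linear interpolation between the two flanking bins.
            return d0 + (y0 - target) * (d1 - d0) / (y0 - y1)
    return None
```

With the densities quoted in the text, `per_kb(0.004)` and `per_kb(0.006)` correspond to the stated 4–6 variants per kilobase.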

Population genomic analysis can also reveal<br />

candidate regions associated with local<br />

adaptation, as demonstrated by Friesen et al.<br />

(25). In this case, 12 inbreds derived from four<br />

wild Tunisian populations <strong>of</strong> Mt were analyzed<br />

using Affymetrix GeneChip technology.<br />

Here, sequence variation is revealed by analysis<br />

of single-feature polymorphisms (SFPs), detected<br />

with hundreds of thousands of probes located<br />

throughout the genome and all interrogated simultaneously<br />

by hybridization. The underlying<br />

logic <strong>of</strong> the study was to search for SFPs among<br />

inbreds and then to target loci that assorted by<br />

population. Altogether, 7% <strong>of</strong> all Affymetrix<br />

features segregated among inbreds, but only<br />

3% differentiated populations. By design, these<br />

Mt populations could be split <strong>into</strong> two groups<br />

according to their original habitats: two populations<br />

from saline environments versus two from<br />

nonsaline environments. A total <strong>of</strong> 18 genome<br />

regions defined by 52 probes showed consistent<br />

differences between the two habitats, results<br />

that could be validated by assaying a subset<br />

<strong>of</strong> the SFPs on a larger set <strong>of</strong> individuals in<br />

contrasting populations.<br />
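The filtering logic of this study design can be sketched in a few lines. The sketch assumes binary SFP calls per inbred line and a habitat label per line. The data layout and function name are hypothetical, and the real study applied statistical criteria rather than the strict fixed-difference rule used here.

```python
def classify_features(calls, habitat):
    """Split features into those that segregate among inbreds and those
    that also separate the two habitats cleanly.

    calls:   {feature: {inbred: 0 or 1}}
    habitat: {inbred: "saline" or "nonsaline"}
    """
    segregating, habitat_specific = [], []
    for feature, by_inbred in calls.items():
        if len(set(by_inbred.values())) < 2:
            continue  # monomorphic across all inbreds
        segregating.append(feature)
        saline = {a for i, a in by_inbred.items() if habitat[i] == "saline"}
        nonsaline = {a for i, a in by_inbred.items() if habitat[i] == "nonsaline"}
        # Fixed within each habitat group and different between groups.
        if len(saline) == 1 and len(nonsaline) == 1 and saline != nonsaline:
            habitat_specific.append(feature)
    return segregating, habitat_specific
```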

COMPARATIVE GENOMICS AND<br />

THE SEARCH FOR THE<br />

PRIMORDIAL LEGUME GENOME<br />

Strategies for Comparative<br />

Genomic Analysis<br />

It has long been known that species in the<br />

same taxonomic family share extensive tracts <strong>of</strong><br />

homologous genes, often in the same or similar<br />

gene order (1, 20, 26). This is commonly called<br />

synteny, though colinearity is probably a better<br />

term whenever gene order is maintained.<br />

<strong>Legume</strong>s are no exception, with a growing<br />

number <strong>of</strong> studies demonstrating genome-scale<br />

synteny, especially among papilionoids (5, 10,<br />

16). Synteny is discovered either through the<br />

genetic mapping <strong>of</strong> sequence-based markers<br />

segregating in multiple related species or<br />

by large-scale similarity searches between<br />

sequenced genomes. A hybrid approach, where<br />

sequenced genetic markers in one species are<br />

compared with a sequenced genome, is useful<br />

in translating insights from a reference to a less<br />

well-characterized species (5, 61, 65). Comparative<br />

genomics makes it possible to infer the<br />

structural changes that have led to present-day<br />

species while also enabling the reconstruction<br />

<strong>of</strong> primordial genome structure—the architecture<br />

<strong>of</strong> ancestral chromosomes and the<br />

underlying repertoire <strong>of</strong> genes (72). From a<br />

practical point <strong>of</strong> view, comparative genomics<br />

expands the range <strong>of</strong> genomics tools available<br />

for positional gene cloning (99) and discovery<br />

<strong>of</strong> new genetic markers (19, 33, 35), especially<br />

in species with few genomic tools. <strong>Legume</strong>s<br />

fit this description nicely, with dozens <strong>of</strong><br />

agriculturally important but less well-studied<br />

crop species. This list includes valuable food<br />

crops like garden pea (P. sativum), chickpea<br />

(C. arietinum), alfalfa (M. sativa), common bean<br />

(P. vulgaris), and cowpea (V. unguiculata), all<br />

well-positioned phylogenetically with respect<br />

to the sequenced genomes of Gm, Mt, and Lj<br />

(95).<br />

Visualization is key to successful comparative<br />

genomics, and there are various methods<br />

to visualize genome comparisons. One popular<br />

technique involves Circos diagrams (48),<br />

where chromosomes are placed end to end<br />

along the outside <strong>of</strong> a circle, and then colored<br />

arcs connecting homologous segments are<br />

joined within the circle (for notable legume<br />

examples, see References 50 and 100). An<br />

especially attractive feature <strong>of</strong> Circos diagrams<br />

is their ability to visualize multiple genomes<br />

while also illustrating synteny at reasonably<br />

high resolution. Alternatively, synteny can be<br />

visualized through the use <strong>of</strong> dot-plot diagrams<br />

(Figures 1 and 2). Here, one genome (or<br />

genome segment) is laid along the horizontal<br />

axis and a second genome (or segment) is laid<br />

along the vertical axis. A mark is then made at<br />

intersections where the two genomes display<br />

sequence similarity above some cutoff value.<br />

This results in significant stretches <strong>of</strong> synteny<br />

appearing as diagonal lines, with cases <strong>of</strong> parallel<br />

diagonal lines spanning the same portion<br />

<strong>of</strong> a genome indicating a potential duplication<br />

event. Of course, the dot-plot method can<br />

easily be applied to genetic marker comparisons<br />

and does not require sequenced genomes<br />

(42). Notably, both visualization methods can<br />

be used to compare a genome with itself in<br />

a search for within-species synteny, thereby<br />

investigating duplication events and helping to<br />

reveal the genomic history <strong>of</strong> a given species.<br />
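The dot-plot construction described above can be sketched as follows. In this simplified version, an exact shared k-mer stands in for "sequence similarity above some cutoff value", and runs of consecutive matches along one diagonal stand in for synteny blocks. Both function names and the k-mer shortcut are assumptions made for illustration. Genome-scale dot-plots are normally built from gene-level or alignment-level matches, and reverse-complement matches are ignored here.

```python
from collections import defaultdict


def dot_plot_points(seq_x, seq_y, k):
    """Return sorted (x, y) positions where both sequences share a k-mer."""
    index = defaultdict(list)
    for y in range(len(seq_y) - k + 1):
        index[seq_y[y:y + k]].append(y)
    points = []
    for x in range(len(seq_x) - k + 1):
        for y in index.get(seq_x[x:x + k], []):
            points.append((x, y))
    return sorted(points)


def diagonals(points, min_len):
    """Group points into runs along forward diagonals (constant y - x).

    Returns sorted (x_start, y_start, run_length) for runs of at least
    min_len consecutive matching positions.
    """
    by_offset = defaultdict(list)
    for x, y in points:
        by_offset[y - x].append(x)
    runs = []
    for offset, xs in by_offset.items():
        xs.sort()
        start = prev = xs[0]
        for x in xs[1:] + [None]:  # None is a sentinel that closes the last run
            if x is not None and x == prev + 1:
                prev = x
                continue
            if prev - start + 1 >= min_len:
                runs.append((start, start + offset, prev - start + 1))
            if x is not None:
                start = prev = x
    return sorted(runs)
```

Passing the same sequence as both arguments gives the self-comparison mentioned in the text: the main diagonal is trivially present, and any additional diagonal runs point to duplicated segments.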

Comparing Legume Genomes<br />

Although comparative genomics is most powerful<br />

when comparing sequenced genomes,<br />

there are only a few such legume genome<br />

sequences available today. Consequently, most<br />

legume comparative genomics studies to date<br />

have involved comparisons based on genetic<br />

markers. This raises the question, how are<br />

large numbers <strong>of</strong> shared and polymorphic<br />

markers discovered for multiple species? One<br />

successful strategy has been to design exonic<br />

polymerase chain reaction (PCR) primers<br />

that amplify across (shared) introns using<br />

available genomic sequence data as the basis<br />

for primer design. The idea here is that exonic<br />

sequences tend to be highly conserved, whereas<br />

intronic sequences tend to be variable, thereby<br />

providing both the conservation needed across<br />

species for successful PCR as well as the polymorphism<br />

needed for segregation mapping.<br />

As an example, Choi et al. (16) developed<br />

hundreds <strong>of</strong> potential cross-species legume<br />

markers based on Mt and Arabidopsis sequence<br />

data and demonstrated extensive synteny across<br />

papilionoids through detailed analysis <strong>of</strong> ∼50<br />

such markers. These markers demonstrated<br />

conservation that stretched from millettoids<br />

[Gm, mung bean (V. radiata)] all the way to<br />

galegoids [Mt, garden pea (P. sativum), alfalfa<br />

(M. sativa)]. In the process, they established<br />

the first integrated view <strong>of</strong> legume synteny in<br />

the form <strong>of</strong> a concentric graphic view (60) and<br />

illustrated the overall topology <strong>of</strong> pan-legume<br />

synteny.<br />

Figure 1<br />

Whole-genome dot-plot of two cool-season legume species, Medicago truncatula (Mt) ssp. truncatula<br />

Jemalong-A17 (horizontal axis) and Lotus japonicus (Lj) cultivar Miyakojima MG-20 (vertical axis). An asterisk<br />

next to a chromosome number indicates reverse complement. The numbers/letters on the axes represent the<br />

chromosome number and north/south arms, respectively; these have been rearranged so that synteny blocks<br />

line up along the center diagonal, which makes the comparison easier to visualize. Many synteny blocks are<br />

nearly the lengths of whole chromosome arms (red circle), whereas others are disrupted by rearrangements<br />

(green circle). Secondary synteny blocks outside the main diagonal (orange circles) represent the whole-genome<br />

duplication in Papilionoideae ∼58 Mya. Two notable genome regions where synteny is totally<br />

lacking between the two species (Mt06N with Lj06S and Mt03N/Mt04N with Lj03N) are circled in purple.<br />

Horizontal axis (Medicago truncatula genome), chromosome arm labels: 3S 7N 7S *5N 5S *1N 1S 2N *6N *6S *8S *4S 8N *2S 3N 4N<br />

Vertical axis (Lotus japonicus genome), chromosome arm labels: *Lj03N *Lj03S Lj04N *Lj04S *Lj02N *Lj06S Lj05S Lj05N Lj02S Lj01S Lj01N *Lj06N<br />

More recently, Hougaard et al. (35)<br />

took the intron-spanning approach a step further<br />

by showing that 50% <strong>of</strong> intron-spanning<br />

markers designed from Arabidopsis work successfully<br />

in both common bean (P. vulgaris)<br />

and peanut (A. hypogaea). This is significant<br />

because peanut, although still a papilionoid,<br />

is in the dalbergioid clade, which is phylogenetically<br />

separate from the more frequently<br />

characterized millettoid and galegoid clades.<br />

Another strategy for comparative genomics<br />

begins with the mining <strong>of</strong> existing<br />

expressed sequence tag (EST) databases to<br />

search for SNPs or other types <strong>of</strong> mappable<br />

polymorphisms. Once positioned genetically,<br />

the underlying ESTs can be compared with<br />

one <strong>of</strong> the sequenced legume genomes as a<br />

basis for discovering shared synteny.<br />

Figure 2 axis labels. Horizontal axis (Medicago truncatula genome), chromosome arms: 3S 7N 7S *5N 5S *1N 1S 2N *6N *6S *8S *4S 8N *2S 3N 4N<br />

Vertical axis (Glycine max genome), chromosome arms: Gm12S *Gm15S Gm20N Gm05N Gm12N Gm06S Gm07S Gm17N *Gm08N *Gm16S *Gm05S *Gm13N *Gm07N *Gm19N Gm09N *Gm13S Gm15N *Gm20S Gm10S *Gm10N *Gm14N Gm14S *Gm17S Gm02S *Gm11N *Gm02N Gm01S *Gm01N Gm03S *Gm09S *Gm03N Gm16N Gm19S Gm18S Gm06N Gm11S Gm08S Gm04N *Gm04S *Gm18N<br />

Bertioli et al. (5) adopted this approach and extended<br />

the earlier work of Hougaard et al. (35) by<br />

focusing on ∼126 cross-species ESTs mapped<br />

in Arachis and compared with available Mt and<br />

Lj sequences. They found that most synteny<br />

blocks align to a single region in either genome,<br />

an important observation because it implies<br />

that a previously predicted papilionoid WGD<br />

event (see below) predated the divergence <strong>of</strong><br />

Arachis from galegoids and phaseoloids, and<br />

so occurred very early in the evolution <strong>of</strong> the<br />

subfamily. In Muchero et al. (61), more than<br />

10,000 SNPs were discovered within available<br />

EST databases <strong>of</strong> cowpea (V. unguiculata) and<br />

then used in map construction leading to 928<br />

positioned cowpea loci through the use <strong>of</strong> a<br />

medium-throughput Illumina GoldenGate<br />

assay system. Comparison with Gm revealed<br />

85% macrosynteny, while macrosynteny with<br />

the more distantly related Mt was still high<br />

at 82%. In a similar study, McClean et al.<br />

(56) examined >300 gene-based Phaseolus loci<br />

coming from EST and BAC-end sequence data<br />

and compared them with the Gm reference<br />

genome sequence, discovering 55 synteny<br />

blocks on 35 <strong>of</strong> Gm’s 40 chromosome arms.<br />

Syntenic blocks averaged 32 centimorgans<br />

in length in Phaseolus, a genetic distance that<br />

corresponded to an average physical distance <strong>of</strong><br />

4.9 Mb in Gm. Using this set <strong>of</strong> synteny blocks<br />

as reference points, they could tentatively position<br />

another 15,000 Phaseolus gene sequences<br />

solely based on the Gm genome sequence.<br />
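One way to picture how a synteny block lets an unmapped gene be placed tentatively is linear interpolation between mapped anchor loci. The sketch below assumes anchors given as (physical position in Gm, genetic position in Phaseolus) pairs within a single block. The function name and the interpolation rule are illustrative assumptions, not the method reported by McClean et al. (56).

```python
from bisect import bisect_right


def infer_genetic_position(anchors, physical_mb):
    """Interpolate a genetic position (cM) from a physical position (Mb).

    anchors: list of (physical_mb, genetic_cm) pairs from one synteny
    block, sorted by physical position, with distinct positions.
    Returns None when the query lies outside the block.
    """
    xs = [a[0] for a in anchors]
    if not xs[0] <= physical_mb <= xs[-1]:
        return None
    i = bisect_right(xs, physical_mb)
    if i == len(xs):  # query sits exactly on the last anchor
        return anchors[-1][1]
    (x0, y0), (x1, y1) = anchors[i - 1], anchors[i]
    return y0 + (physical_mb - x0) * (y1 - y0) / (x1 - x0)
```

Using the block averages quoted above as a toy example, a block spanning 4.9 Mb in Gm and 32 centimorgans in Phaseolus would place a gene at the physical midpoint near 16 centimorgans. Real blocks contain rearrangements and uneven recombination, so such positions are tentative.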

Side-by-side comparison <strong>of</strong> sequenced<br />

genomes is the most powerful way to learn<br />

about genome histories. In such comparisons<br />

it becomes possible to estimate the fraction <strong>of</strong><br />

shared genes, the size distribution <strong>of</strong> synteny<br />

blocks, or differences between genomes in gene<br />

density or organization. Going a step further,<br />

one can examine genome rearrangements at the<br />

macroscale, whether they are shared or lineage<br />

specific, or drill down to the base-pair level to<br />

dissect the fine structure <strong>of</strong> conserved colinear<br />

genes. Ultimately, as more sequenced species<br />

are added to the analysis, we begin to see the actual<br />

step-by-step changes that distinguish one<br />

genome from another.<br />

One <strong>of</strong> the first sequence-based comparisons<br />

in legumes was between Mt and Gm. Focusing<br />

on a genome region surrounding a nematode-resistance<br />

gene in Gm (rhg1) on chromosome<br />

Gm18, Mudge et al. (62) found that 75% <strong>of</strong><br />

genes were colinear between Mt and Gm in a<br />

region spanning ∼150 genes, including a remarkable<br />

stretch where 33 <strong>of</strong> 35 genes (94%)<br />

were conserved and colinear, a phenomenon<br />

they termed hypersynteny. Cannon et al. (15)<br />

later carried out a genome-scale sequence comparison<br />

based on the partially completed Mt and<br />

Lj genomes available at the time. In the case <strong>of</strong><br />

one large synteny block between Mt05N and<br />

Lj02S, they found that 58 <strong>of</strong> 94 genes (62%) existed<br />

as colinear orthologous pairs between the<br />

syntenic segments. Indeed, synteny between Mt<br />

and Lj was found to extend nearly genome-wide,<br />

despite 40–50 million years having passed since<br />

speciation.<br />
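Percentages such as 33 of 35 (94%) or 58 of 94 (62%) express how many genes in a region have an ortholog in the same relative order in the other genome. A minimal way to compute such a fraction, assuming each gene has already been assigned the rank of its ortholog in the second genome, is to take the longest increasing subsequence of those ranks. The function name is hypothetical, and the sketch handles only forward-oriented colinearity. The cited studies may have scored colinearity differently.

```python
from bisect import bisect_left


def colinear_fraction(ortholog_ranks):
    """Fraction of genes whose orthologs appear in conserved order.

    ortholog_ranks[i] is the rank in genome B of the ortholog of the
    i-th gene in genome A, or None if no ortholog was found.
    """
    ranks = [r for r in ortholog_ranks if r is not None]
    tails = []  # tails[j] = smallest tail of an increasing run of length j + 1
    for r in ranks:
        i = bisect_left(tails, r)
        if i == len(tails):
            tails.append(r)
        else:
            tails[i] = r
    return len(tails) / len(ortholog_ranks)
```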

Figure 1 shows an updated dot-plot<br />

comparison <strong>of</strong> Mt and Lj based on versions<br />

<strong>of</strong> the genomes available in mid-2011<br />

(Reference 100 and ftp://ftp.kazusa.or.<br />

jp/pub/lotus/lotus_r2.5/pseudomolecule).<br />


Figure 2<br />

Whole-genome dot-plot of the cool-season legume Medicago truncatula (Mt) ssp. truncatula Jemalong-A17<br />

(horizontal axis) and the warm-season legume Glycine max (Gm) var. Williams 82 (vertical axis). The<br />

pericentromeric regions of Gm chromosomes have been removed for this analysis. An asterisk next to a<br />

chromosome number indicates reverse complement. Chromosome arms have been rearranged so that the<br />

synteny blocks line up along the center diagonal, which makes the comparison easier to visualize. The<br />

presence of two synteny diagonals for almost every Mt region indicates an additional recent whole-genome<br />

duplication (


Here, chromosome arms for both Mt and<br />

Lj (based on the estimated positions <strong>of</strong> centromeric<br />

regions) have been reordered and<br />

in some cases flipped (noted by an asterisk)<br />

to align synteny blocks <strong>into</strong> a single coherent<br />

line. The result highlights the genome-scale<br />

synteny observed between the two species.<br />

If perfect synteny existed between Mt and<br />

Lj, the roughly 45° dot-plot line would be<br />

straight and continuous, and would reach<br />

all the way from one end to the other. The<br />

fact that the actual result produces a line that<br />

approaches this ideal is overwhelming evidence<br />

for genome-scale synteny between the two<br />

species. Synteny blocks are nearly the lengths<br />

<strong>of</strong> whole chromosome arms, and overall they<br />

span more than 75% <strong>of</strong> both species. One<br />

striking example between Mt05N and Lj02S<br />

is circled in red. Still, there are also breaks in<br />

synteny—for example, Mt07S and its synteny<br />

with Lj01S (circled in green). Here, rather<br />

than a contiguous diagonal line, one sees a<br />

cloud <strong>of</strong> shorter synteny blocks, broken <strong>into</strong><br />

six pieces with two <strong>of</strong> them flipped around.<br />

Apparently, one or both syntenic chromosomes<br />

experienced major reorganization events since<br />

the separation of Mt and Lj. There are also notable<br />

genome regions where synteny is totally<br />

lacking between the two species. Mt06N with<br />

Lj06S and Mt03N/Mt04N with Lj03N (circled<br />

in purple) are striking examples. Significantly,<br />

these genome regions coincide with higher<br />

densities <strong>of</strong> NBS-LRRs and retrotransposons<br />

compared with the remainder <strong>of</strong> the genome, a<br />

relationship that may be biologically significant<br />

(5) and that resembles the degraded synteny<br />

observed in A. hypogaea (76).<br />

Envisioning the Ancestral<br />

Legume Genome<br />

Inevitably, as more legumes are sequenced it<br />

will become possible to reconstruct the ancestral<br />

legume genome, or at least the ancestral<br />

papilionoid genome. Such an effort is underway<br />

by integrating the sequenced legume genomes<br />

with comparably high-density marker/map data<br />

from species such as chickpea (C. arietinum)<br />

and pigeon pea (C. cajan) (D. Cook, personal<br />

communication). Comparisons <strong>of</strong> the Gm, Mt,<br />

and Lj genomes already provide a glimpse <strong>into</strong><br />

the large-scale architecture <strong>of</strong> the ancestral<br />

legume genome. Despite the complexities resulting<br />

from the 13-Mya Glycine WGD event<br />

(discussed in further detail below), comparisons<br />

among Gm, Mt, and Lj (Figures 1 and 2)<br />

suggest a limited number <strong>of</strong> ancestral synteny<br />

blocks that have been rearranged to generate<br />

present-day papilionoid genomes. In both comparisons,<br />

a conservative examination reveals just<br />

14 largely coherent blocks that span the majority<br />

<strong>of</strong> all three genomes. Notably, this estimate<br />

agrees nicely with the apparent basal chromosome<br />

number <strong>of</strong> seven for papilionoids (74).<br />

GENOME DUPLICATIONS<br />

IN LEGUME BIOLOGY<br />

Whole-Genome Duplication Events<br />

in the History of Legumes<br />

One <strong>of</strong> the most striking lessons coming out<br />

<strong>of</strong> plant comparative genomics has been the<br />

critical role <strong>of</strong> genome duplication in the evolutionary<br />

history <strong>of</strong> many, if not most, plant<br />

species (21). This is especially true in the case<br />

<strong>of</strong> legumes. Gm provided an early hint <strong>into</strong> the<br />

importance <strong>of</strong> WGD in genome restructuring<br />

in a study showing that restriction fragment<br />

length polymorphisms were duplicated on average<br />

2.55 times and localized to a homoeologous<br />

segment (paralogous sequences resulting<br />

from WGD) nearly as long as whole chromosomes<br />

(84). Later, as large amounts <strong>of</strong> genome<br />

sequence data became available, it became clear<br />

that most present-day plant genomes are the<br />

products <strong>of</strong> ancient genome-scale duplication<br />

events (examples include 3, 40, 41, 91). Subsequent<br />

studies have gone on to reveal the wide<br />

range <strong>of</strong> plant families that have experienced<br />

genome duplications and the architecture <strong>of</strong> retained<br />

duplication blocks, and have established<br />

reasonably precise estimates for the timing <strong>of</strong><br />

key duplication events (7, 73, 85). We know,<br />

for example, that many dicots share an ancient<br />

(130–140 Mya) triploidization event based on<br />

synteny analysis of Vitis vinifera and the fact<br />

that each Vitis region typically shows synteny to<br />

three corresponding regions in other sequenced<br />

dicots (41). We also know that a surprisingly<br />

large number <strong>of</strong> plant WGD events followed<br />

closely after the Cretaceous-Tertiary boundary<br />

event ∼65 Mya. This led Fawcett et al. (22)<br />

to suggest that polyploids might have higher<br />

adaptability and greater tolerance to extreme<br />

conditions, something that would have come in<br />

quite handy during a time <strong>of</strong> widespread species<br />

extinction. Finally, we are beginning to discover<br />

the details about the aftermath <strong>of</strong> WGD events<br />

(summarized in 24)—and it is this final point,<br />

the consequence <strong>of</strong> genome duplication, that<br />

is especially relevant to our consideration <strong>of</strong><br />

legume genome biology.<br />

<strong>Genome</strong> duplication is easy to see when<br />

looking at a dot-plot comparison. A closer look<br />

at Figure 1 reveals numerous secondary synteny<br />

blocks lying to one side or the other <strong>of</strong> the<br />

main diagonal. One notable example is where<br />

the primary synteny block involving Mt01N<br />

and Lj05N is paralleled by another synteny<br />

block lower down, between Mt01N and Lj01N<br />

(orange circles connected by an orange line).<br />

There are dozens <strong>of</strong> such duplicated synteny<br />

blocks in the comparison between these two<br />

species, and the simplest interpretation is an ancient<br />

WGD preceding the speciation between<br />

Mt and Lj. In a comparison like this, synteny<br />

blocks lying along the main diagonal represent<br />

the speciation event, whereas the off-center<br />

diagonals show regions <strong>of</strong> synteny resulting<br />

from one or more shared WGD events. Apparently,<br />

a WGD event that took place in the<br />

ancestor <strong>of</strong> Mt and Lj was followed quickly by<br />

a period <strong>of</strong> significant genome rearrangement<br />

and gene loss before speciation, rapidly degrading<br />

the quality <strong>of</strong> duplicate synteny blocks<br />

observed. (Loss <strong>of</strong> synteny in duplicate blocks<br />

is important in understanding the impact <strong>of</strong><br />

duplication on legume biology and is discussed<br />

in more detail below.) The existence <strong>of</strong> such a<br />

WGD in the legume family has been indicated<br />

through multiple sources <strong>of</strong> evidence, especially<br />

Ks (synonymous substitution) estimates<br />

between paralogs (6, 73, 80) and topology <strong>of</strong><br />

phylogenetic tree analysis (12, 15). Integrating<br />

all these different sources <strong>of</strong> data leads to a<br />

best estimate for the timing <strong>of</strong> this WGD <strong>of</strong><br />

58 Mya. This date would have preceded the<br />

Mt/Lj split (approximately 50 Mya) as well as<br />

the split with Gm (54 Mya) (52). Indeed, peanut<br />

(A. hypogaea), an earlier diverging papilionoid,<br />

also shares this WGD event (5). By contrast,<br />

a recent study in Chamaecrista indicates that<br />

this species (and presumably the Mimosoideae<br />

and Caesalpinioideae subfamilies) does not share<br />

the 58-Mya WGD event (12). In other words,<br />

we know with remarkable precision both the<br />

timing and evolutionary window for this pivotal<br />

WGD event in the history <strong>of</strong> legumes. Given<br />

the range <strong>of</strong> species that share this duplication,<br />

we will refer to it as the papilionoid WGD.<br />
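The link between Ks estimates and duplication dates rests on a standard molecular-clock relationship: two paralogs each accumulate synonymous substitutions after duplication, so the age is approximately Ks divided by twice the per-lineage substitution rate. The sketch below shows that arithmetic along with the simple ordering check used to decide which lineages share a duplication. The Ks value and substitution rate in the usage example are hypothetical placeholders, not values taken from this review. Only the 58-, 54-, 50-, and 13-Mya dates come from the text.

```python
def duplication_age_mya(ks, subs_per_site_per_year):
    """Approximate duplication age in millions of years.

    Both paralogs diverge from the ancestral copy, hence the factor 2.
    The rate is an assumption that must be supplied by the user.
    """
    return ks / (2 * subs_per_site_per_year) / 1e6


def predates(wgd_mya, split_mya):
    """A WGD is shared by two lineages only if it is older than their split."""
    return wgd_mya > split_mya
```

For example, `duplication_age_mya(0.6, 5e-9)` gives roughly 60 million years under those assumed inputs. With the dates quoted in the text, `predates(58, 54)` and `predates(58, 50)` are both true, whereas `predates(13, 54)` is false, consistent with the 13-Mya event being specific to the Glycine lineage.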

But the papilionoid WGD is not the only<br />

one to play an important role in legume evolution.<br />

Figure 2 displays a comparison <strong>of</strong> the<br />

Mt and Gm genome sequences (based on their<br />

recently published sequences). This comparison<br />

illustrates important similarities but also<br />

striking differences with the Mt/Lj dot-plot in<br />

Figure 1. Gm and Mt clearly display extensive<br />

synteny, with many long, coherent synteny<br />

blocks. A quick count reveals as many as 30<br />

large-scale synteny blocks running the length<br />

<strong>of</strong> chromosome arms or nearly so. However,<br />

there is not a single 45° diagonal stretching<br />

across the genomes; instead, there are pairs <strong>of</strong><br />

diagonals in Gm corresponding to individual<br />

chromosome arms <strong>of</strong> Mt. One example (circled<br />

in red) highlights synteny between Mt05S and<br />

two different Gm chromosomes/arms, Gm02S<br />

and Gm14N/Gm14S. A WGD is again the<br />

explanation, but this time, one that occurred<br />

more recently (estimated at 13 Mya) and only<br />

in the lineage leading to Gm (84). This duplication<br />

event explains the observation that there<br />

are two Gm blocks for each Mt genome region.<br />

Comparable levels <strong>of</strong> contiguity observed in<br />

each pair <strong>of</strong> synteny blocks are explained by the<br />

fact that both trace back to a single WGD event,<br />

and so the evolutionary distance between Mt<br />

and both <strong>of</strong> the Gm syntenic segments must be<br />

identical. This Glycine-specific WGD had been<br />

predicted previously (84), but the publication<br />

<strong>of</strong> the Gm genome revealed just how pervasive<br />

and fundamental it is in understanding the<br />

architecture <strong>of</strong> the present-day Gm genome.<br />

Figure 2 also illustrates exceptions to this<br />

pattern, demonstrating two important points.<br />

First, synteny blocks like the ones circled in<br />

orange show more ancient synteny blocks that<br />

trace back to the papilionoid WGD discussed<br />

above. Clearly, Mt and Gm (as well as Lj—and,<br />

indeed, all papilionoids) are expected to share<br />

the 58-Mya WGD event. Second, there are<br />

frequent cases <strong>of</strong> rearrangements—some that<br />

are simple, like the one involving Mt01S and<br />

Gm10N/Gm10S and Gm20S (circled in green),<br />

but others that are quite complex (one example<br />

circled in purple). These rearrangements are<br />

best explained by significant levels <strong>of</strong> reshuffling<br />

among the duplicated Glycine genome<br />

segments after the 13-Mya WGD event.<br />

THE AFTERMATH OF GENOME<br />

DUPLICATION AND ITS IMPACT<br />

ON LEGUME BIOLOGY<br />

The Fates of Duplicated Genes<br />

WGDs obviously have a profound impact on<br />

genome architecture. However, genome duplications<br />

play an equally important role in the<br />

evolution <strong>of</strong> individual genes and gene families.<br />

Other types <strong>of</strong> gene duplication exist—<br />

tandem gene duplication, segmental duplication,<br />

transposition—and they are certainly important<br />

in genomic and biological evolution<br />

(24). However, WGD events are worthy <strong>of</strong> special<br />

consideration because when they occur, every<br />

gene in the genome is suddenly present in<br />

two copies. In effect, the entire evolutionary trajectory<br />

<strong>of</strong> a lineage becomes primed to move in<br />

a novel direction. In the case <strong>of</strong> legumes, there<br />

is growing evidence that WGD events had an<br />

especially significant impact on nodulation and<br />

symbiosis with rhizobial bacteria (100). After<br />

duplications, there are only a small number <strong>of</strong><br />

potential fates for duplicated gene pairs (24):<br />

Both paralogs are maintained and they share<br />

the function <strong>of</strong> their progenitor; both paralogs<br />

are maintained and one takes on an entirely new<br />

function; or one <strong>of</strong> the two progeny genes is lost<br />

and only a single copy is maintained. The first<br />

outcome (both genes maintained with shared<br />

function) is <strong>of</strong>ten called subfunctionalization,<br />

as the two paralogs have split up the function<br />

<strong>of</strong> their ancestor (23). The second (both<br />

maintained, one taking on a new function) is<br />

called neofunctionalization, for obvious reasons<br />

(55). The other possibility (only one gene retained,<br />

the other deleted) is fractionation (51)<br />

or, equivalently, diploidization. Still other outcomes<br />

are possible, such as pseudogenization<br />

without loss <strong>of</strong> one <strong>of</strong> the duplicates, but are<br />

not considered in detail here. Ultimately, biological<br />

function is expected to play a critical<br />

role in the fate <strong>of</strong> duplicated genes, with some<br />

functional classes (those most interconnected)<br />

retained more frequently than others (proteins<br />

that generally act solo) (24). Understanding<br />

gene fate following WGDs sheds light on important<br />

biological phenomena in legumes, including<br />

properties such as the generation <strong>of</strong><br />

novel disease-resistance specificities and the appearance<br />

<strong>of</strong> novel developmental functions.<br />
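The three fates described above can be written as a small decision rule. The following is only an illustrative sketch: the `Fate` enum, the `classify_fate` function, and the function labels are hypothetical names introduced here, and representing a gene's "function" as a set of labels is a simplifying assumption rather than a method used in the studies cited.<br />

```python
from enum import Enum
from typing import Optional, Set


class Fate(Enum):
    """Fates of a WGD-derived gene pair, following the categories in the text."""
    SUBFUNCTIONALIZATION = "both copies retained, ancestral function shared"
    NEOFUNCTIONALIZATION = "both copies retained, one copy has a new function"
    FRACTIONATION = "only one copy retained"


def classify_fate(
    ancestral: Set[str],
    copy_a: Optional[Set[str]],
    copy_b: Optional[Set[str]],
) -> Fate:
    """Classify a duplicate pair from (hypothetical) function labels.

    A copy that has been lost is passed as None. Functions are modeled as
    plain label sets, which is an assumption made for illustration only.
    """
    if copy_a is None and copy_b is None:
        raise ValueError("at least one copy must be retained")
    if copy_a is None or copy_b is None:
        return Fate.FRACTIONATION
    # Any label absent from the progenitor counts as a new function.
    if (copy_a | copy_b) - ancestral:
        return Fate.NEOFUNCTIONALIZATION
    return Fate.SUBFUNCTIONALIZATION


# Hypothetical progenitor with two functions, split between its descendants.
progenitor = {"function_1", "function_2"}
split_pair = classify_fate(progenitor, {"function_1"}, {"function_2"})
novel_pair = classify_fate(progenitor, progenitor, {"function_3"})
single_copy = classify_fate(progenitor, progenitor, None)
```

Outcomes that the text sets aside, such as pseudogenization without loss, are deliberately not modeled in this sketch.<br />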

To illustrate the fates <strong>of</strong> duplicated genes in<br />

legumes, Figure 3 displays a pair <strong>of</strong> duplicated<br />

segments in Mt roughly 150 kb in size each (located<br />

on Mt01 and Mt07) and shown alongside<br />

the four corresponding syntenic regions <strong>of</strong><br />

Gm. This figure was created using the PLAZA<br />

genome analysis suite (75) and is based on the<br />

published sequences <strong>of</strong> Mt and Gm. The results<br />

are striking. Each Mt segment exhibits remarkable<br />

conservation with the pair <strong>of</strong> most closely<br />

related Gm segments, but far less conservation<br />

with its duplicate Mt pair. In this example, just<br />

7 <strong>of</strong> 19 genes (37%) in the duplicated blocks<br />

<strong>of</strong> Mt are maintained. These are homoeologs<br />

(WGD-derived paralogs) that trace back to the<br />

papilionoid WGD at 58 Mya. By contrast, the<br />

Mt07 segment shares 13 <strong>of</strong> 16 genes (81%) with<br />

either Gm03 or Gm19, whereas the Mt01 segment<br />

shares 11 <strong>of</strong> 13 (85%) with either Gm02<br />

or Gm10. These are orthologous relationships<br />

that derive from the millettoid/galegoid speciation<br />

event separating Mt and Gm at ∼55 Mya<br />

(52). It is noteworthy that the time span between<br />

the papilionoid WGD and the Mt/Gm<br />



[Figure 3 diagram labels: ancestral legume; WGD ∼58 Mya; Mt/Gm split ∼54 Mya; Gm WGD ∼13 Mya; segments Gm03, Gm19, Mt07, Mt01, Gm02, Gm10; shared genes 81% (Mt07 with Gm03/Gm19), 37% (Mt07 with Mt01), 85% (Mt01 with Gm02/Gm10).]<br />


Figure 3<br />

A 150-kb region on the Glycine max (Gm) and Medicago truncatula (Mt) genomes illustrating the differential gene loss between the<br />

duplicated regions, which took place after the split between warm- and cool-season legumes ∼54 Mya. In this example, only 37% <strong>of</strong> the<br />

genes are retained in both duplicated blocks <strong>of</strong> Mt, while the Mt duplicates retain 81%–85% with their Gm counterparts. By contrast,<br />

the number <strong>of</strong> retained gene pairs among Gm03/Gm19 (69%) and Gm02/Gm10 (100%) duplicates is much higher, at least in part due<br />

to the fact that the whole-genome duplication (WGD) in Gm is fairly recent (



translocated <strong>into</strong> the pericentromeric region <strong>of</strong><br />

the chromosome. Between the two Gm genome<br />

regions, 77% <strong>of</strong> gene duplicates were retained.<br />

However, this high level <strong>of</strong> retention did not<br />

extend to NBS-LRRs, which existed as clusters<br />

in both genome regions, but with significant<br />

homoeolog-specific duplications and losses.<br />

The pericentromeric region was especially<br />

reduced in surviving NBS-LRRs. Clearly,<br />

NBS-LRR genes are subject to much higher<br />

levels <strong>of</strong> fractionation than other gene classes.<br />

Local duplications, deletions, and recombination<br />

are apparently acting preferentially on<br />

WGD-derived NBS-LRR clusters, with the<br />

pericentromeric NBS-LRR cluster experiencing<br />

much higher levels <strong>of</strong> fractionation. This<br />

pattern has been noted in other plant species,<br />

with NBS-LRRs frequently underrepresented<br />

in duplicated genome regions (14, 64), potentially<br />

reflecting a fitness cost associated with<br />

excess NBS-LRRs (58).<br />

In a similar study by Kim et al. (44), a different<br />

pair <strong>of</strong> homoeologous genome regions<br />

(1.96–4.60 Mb) on Gm05 and Gm17 and centered around the Rxp<br />

bacterial leaf pustule–<br />

resistance gene were examined and compared<br />

with the homologous Mt genome regions. In<br />

this case, fractionation in Mt was observed to<br />

extend to the level <strong>of</strong> gene blocks (in which<br />

multiple linked genes were retained in one duplicate<br />

but lost from the other), contrasting<br />

with the apparent gene-by-gene fractionation<br />

illustrated in Figure 3. In the case <strong>of</strong> Gm<br />

and the more recent 13-Mya WGD, duplicates<br />

were also retained as blocks rather than individual<br />

genes, though some <strong>of</strong> the gene blocks<br />

were not lost, but were instead translocated to<br />

a different location in the Gm genome. Notably,<br />

the locations <strong>of</strong> homoeologs coincided<br />

with known QTLs for leaf pustule resistance,<br />

leading the authors to suggest that duplicated<br />

resistance genes may have retained their ancestral<br />

function and then diverged in a pathogen<br />

strain–specific manner.<br />

Finally, Lin et al. (54) examined two<br />

∼1-Mb homoeologous regions containing<br />

NBS-LRR clusters in Gm (on Gm08 and Gm15)<br />

as well as the orthologous region <strong>of</strong> common<br />

bean (P. vulgaris). The level <strong>of</strong> gene retention<br />

varied from 81% to 91% among the Gm segments,<br />

values somewhat higher than observed<br />

by others (39, 44; Figure 3). As in Innes et al.<br />

(39), this analysis uncovered significant differences<br />

in retrotransposon density between the<br />

two regions, differences that were correlated<br />

with differing levels <strong>of</strong> structural variation. Going<br />

beyond structural analysis, the study examined<br />

gene expression levels along the two Gm<br />

segments and found 38% higher transcriptional<br />

activity on Gm08 compared with Gm15 based<br />

on a metric that integrated expression among<br />

seven different tissues. This difference in expression<br />

activity is significant because expression<br />

variation between retained gene pairs is an<br />

expectation <strong>of</strong> sub- and neofunctionalization.<br />
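The retention values quoted in this section are simple ratios of shared genes to genes examined. As a minimal sketch, the counts given in the text for Figure 3 can be turned into the reported percentages as follows; the function name `retention_percent` and the dictionary layout are illustrative choices, not part of any cited analysis pipeline.<br />

```python
def retention_percent(shared: int, total: int) -> int:
    """Return the share of retained genes as a whole-number percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= shared <= total:
        raise ValueError("shared must lie between 0 and total")
    return round(100 * shared / total)


# Gene counts as stated in the text for the Figure 3 example:
# (genes shared, genes examined) for each comparison.
figure3_counts = {
    "Mt01 vs Mt07 (homoeologs, 58-Mya WGD)": (7, 19),
    "Mt07 vs Gm03/Gm19 (orthologs)": (13, 16),
    "Mt01 vs Gm02/Gm10 (orthologs)": (11, 13),
}

figure3_retention = {
    comparison: retention_percent(shared, total)
    for comparison, (shared, total) in figure3_counts.items()
}

for comparison, percent in figure3_retention.items():
    print(f"{comparison}: {percent}%")
```

Rounding to whole percentages matches the precision used in the text; comparisons across studies would also need to account for differences in region size and gene annotation, which this sketch ignores.<br />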

<strong>Genome</strong> Duplication and the<br />

Evolution <strong>of</strong> Nodulation<br />

The property most striking about legumes is<br />

their capacity to form symbiotic nitrogen-fixing<br />

nodules in association with rhizobial bacteria.<br />

Not surprisingly, detailed analysis <strong>of</strong> legume<br />

genomes can provide valuable insights <strong>into</strong><br />

symbiosis, nodulation, and nitrogen fixation.<br />

At the simplest level, genome sequence data<br />

make it possible to generate a global inventory<br />

<strong>of</strong> nodulation-related genes. This was an<br />

important contribution <strong>of</strong> the recent Gm sequence<br />

(91). Here, genes <strong>of</strong> interest were identified<br />

by searching for Gm genes orthologous to<br />

known nodulation-related genes in any legume<br />

species. As a result, 34 Gm nodulins<br />

(nodule-upregulated proteins) were discovered along<br />

with 23 nodulation-related regulatory genes<br />

within the Gm genome. This kind <strong>of</strong> gene<br />

inventory makes it possible to explore local<br />

nodulation-related gene clusters, putative homoeologs,<br />

and membership in related gene<br />

families. This inventory should be especially<br />

valuable in dissecting the global regulatory machinery<br />

controlling plant-rhizobium communication<br />

and nodule development.<br />

Analysis <strong>of</strong> the Mt genome sequence<br />

focused on the relationship between genome<br />

duplication and the evolution <strong>of</strong> nodulation.<br />


Previous studies had established that legumes<br />

belong to a clade <strong>of</strong> rosids, Fabidae, that all<br />

share a predisposition to nodulate, presumably<br />

derived from their common ancestor (88). In<br />

analyzing the Mt genome, the question was<br />

whether the 58-Mya WGD contributed in any<br />

way to the elaboration <strong>of</strong> rhizobial nodulation.<br />

The answer appears to be a qualified yes.<br />

Multiple lines <strong>of</strong> evidence indicate that nodulation<br />

machinery predates the 58-Mya WGD.<br />

Moreover, many <strong>of</strong> the known regulatory<br />

steps in rhizobial nodulation are shared with<br />

mycorrhizal signaling (66), a symbiosis broadly<br />

shared among angiosperms (9). Just a few <strong>of</strong><br />

the known recognition steps are exclusively<br />

associated with rhizobial nodulation, including<br />

the key receptor-like kinase, NFP (66). In<br />

analyzing the Mt genome, NFP was found to<br />

have a homoeolog, LYR1, and genome position<br />

and Ks data indicate that these duplicated<br />

genes derive from the 58-Mya WGD. NFP is<br />

nodulation specific in expression and function,<br />

whereas LYR1 is upregulated in mycorrhizae<br />

(30). In separate work, a nodulating nonlegume,<br />

Parasponia andersonii, is known to contain a single<br />

gene coding for a protein with the functions<br />

<strong>of</strong> both NFP and LYR1 (68). Therefore, one<br />

likely interpretation would be that the 58-Mya<br />

papilionoid WGD led to subfunctionalization<br />

<strong>of</strong> a more ancient gene that previously carried<br />

out both functions, resulting in two descendent<br />

genes that split the nodulation and mycorrhizal<br />

recognition functions between them. A separate<br />

nodulation-related transcription factor,<br />

ERN1 (96), also possesses a homoeolog (ERN2)<br />

in Mt. Like NFP/LYR1, ERN1 and ERN2 have<br />

contrasting nodulation-versus-mycorrhizal<br />

expression patterns and also derive from the<br />

58-Mya WGD. Potentially, they are a second<br />

example <strong>of</strong> sub- or neofunctionalization<br />

resulting from the papilionoid WGD event.<br />

These observations even suggest a potential<br />

phylogenetic strategy for discovering genes<br />

that play a role in nodulation. It should be<br />

possible to mine the products <strong>of</strong> the 58-Mya<br />

WGD and search for genes that have<br />

nodule-related expression in one or both gene products<br />

<strong>of</strong> the WGD event. At this point, one<br />

could examine potentially novel (or at least<br />

interesting) functions that these genes might be<br />

playing in nodulation. Indeed, this strategy has<br />

already been put <strong>into</strong> practice with the identification<br />

<strong>of</strong> a cytokinin response regulator promoting<br />

the expression <strong>of</strong> ERN1 (67). Analysis <strong>of</strong><br />

the Mt genome uncovers 51 additional<br />

WGD-derived homoeolog pairs with one or both duplicates<br />

upregulated in nodules, including 10<br />

additional transcription factor genes.<br />
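The candidate-discovery strategy outlined above amounts to a filter over WGD-derived homoeolog pairs: keep any pair in which one or both members are upregulated in nodules. The sketch below illustrates that filter only. The `HomoeologPair` class, the boolean expression flags, and the placeholder pair `geneX`/`geneY` are hypothetical, and the flags assigned to NFP/LYR1 and ERN1/ERN2 are simplified readings of the expression contrasts described in the text, not measured data.<br />

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HomoeologPair:
    """A WGD-derived gene pair with simplified nodule-expression flags."""
    gene_a: str
    gene_b: str
    nodule_up_a: bool  # True if gene_a is treated as upregulated in nodules
    nodule_up_b: bool  # True if gene_b is treated as upregulated in nodules


def nodulation_candidates(pairs: List[HomoeologPair]) -> List[HomoeologPair]:
    """Keep pairs in which one or both homoeologs are nodule-upregulated."""
    return [p for p in pairs if p.nodule_up_a or p.nodule_up_b]


# Illustrative input only; the flags are assumptions, not expression data.
pairs = [
    HomoeologPair("NFP", "LYR1", nodule_up_a=True, nodule_up_b=False),
    HomoeologPair("ERN1", "ERN2", nodule_up_a=True, nodule_up_b=False),
    HomoeologPair("geneX", "geneY", nodule_up_a=False, nodule_up_b=False),
]

candidates = nodulation_candidates(pairs)
candidate_names = [(p.gene_a, p.gene_b) for p in candidates]
```

In practice the inputs would come from a homoeolog inventory and expression measurements; reducing expression to a single boolean per gene is a deliberate simplification for this sketch.<br />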

PERSPECTIVES ON<br />

LEGUME GENOMICS<br />

It is difficult to believe that massive amounts <strong>of</strong><br />

sequence data have been available in plants for<br />

such a short time. The pace <strong>of</strong> change has been<br />

so rapid that in less than a decade we have gone<br />

from having only thousands <strong>of</strong> ESTs in a few<br />

legume species to having three robust legume<br />

reference genomes. This review has examined<br />

ways in which the rapidly growing body <strong>of</strong><br />

genome sequence data sheds light on legume<br />

biology. At the simplest level, translation <strong>of</strong><br />

genome data between legume species enables<br />

important practical applications: the discovery<br />

<strong>of</strong> genetic markers, the development <strong>of</strong> linkage<br />

maps, and the saturation <strong>of</strong> genome regions<br />

for positional cloning. This is especially true<br />

for minor legumes, where many species are<br />

important to agriculture but supported by<br />

small research communities. At a more basic<br />

level, dissection <strong>of</strong> genome sequence data reveals<br />

the structure, architecture, and evolution<br />

<strong>of</strong> important gene families and enables the<br />

identification <strong>of</strong> orthologous versus paralogous<br />

relationships. Complete genome sequences<br />

also reveal legume- and species-specific genes<br />

whose functions remain largely unknown,<br />

although unquestionably important. Gene and<br />

genome duplications, so critical in shaping<br />

plant genomes, contain intrinsic information<br />

that can be exploited to predict function and<br />

the structure <strong>of</strong> genetic networks. Candidate<br />

gene discovery based on the papilionoid WGD<br />

is a promising example. In legumes, applying<br />

these strategies to nodulation and seed development<br />

will be especially critical. Additional<br />



sequencing and resequencing <strong>of</strong> legume species<br />

will make this possible, but inevitably, it is<br />

the research community’s capacity to develop<br />

imaginative strategies for exploiting massive<br />

sequence data that will move legume genomics<br />

from the computer to biology.<br />

SUMMARY POINTS<br />


1. The genome sequences <strong>of</strong> three legumes—Glycine max, Medicago truncatula, and Lotus<br />

japonicus—have recently been completed, and they illustrate a history <strong>of</strong> whole-genome<br />

duplication with important implications in legume biology. Glycine, in particular, underwent<br />

a genome duplication event within the past 13 million years that is strikingly<br />

evident in its genome architecture.<br />

2. Most agriculturally important legume crops, including so-called orphan species, are phylogenetically<br />

close to Glycine, Medicago, and Lotus. Consequently, translational genomics<br />

to orphaned legumes should be straightforward and practically useful. It also means<br />

that major clades <strong>of</strong> more distant legumes remain largely unexplored from a genomic<br />

perspective.<br />

3. Analysis <strong>of</strong> legume genome sequence reveals hundreds <strong>of</strong> family-specific genes not observed<br />

in other angiosperms. They include a large group <strong>of</strong> defensin-like peptide genes<br />

seen only in Medicago and its close relatives that are exclusively expressed in nodules and<br />

in some cases play important roles in rhizobial differentiation.<br />

4. The aftermath <strong>of</strong> genome duplication in legumes involves extensive gene fractionation,<br />

especially in the lineage leading to Medicago and Lotus, as well as apparent examples <strong>of</strong><br />

sub- and neofunctionalization. In some cases, products <strong>of</strong> whole-genome duplication<br />

have contributed to the elaboration <strong>of</strong> a preexisting capacity for rhizobial nodulation.<br />

DISCLOSURE STATEMENT<br />

N.D.Y. is principal investigator <strong>of</strong> a National Science Foundation Plant <strong>Genome</strong> Research Program<br />

grant that supported the sequencing <strong>of</strong> M. truncatula and later the development <strong>of</strong> an<br />

M. truncatula HapMap platform.<br />

ACKNOWLEDGMENTS<br />

We thank Doug Cook, Rene Geurts, and R. Op den Camp for helpful discussions relating to<br />

unpublished work; Robert Stupar for his review <strong>of</strong> the manuscript; and Sebastian Proost and Yves<br />

Van der Peer for preliminary analyses involving the PLAZA platform.<br />

LITERATURE CITED<br />

1. Ahn S, Tanksley SD. 1993. Comparative linkage maps <strong>of</strong> the rice and maize genomes. Proc. Natl. Acad.<br />

Sci. USA 90:7980–84<br />

2. Alkan C, Sajjadian S, Eichler EE. 2010. Limitations <strong>of</strong> next-generation genome sequence assembly. Nat.<br />

Methods 8:61–65<br />

3. Arabidopsis <strong>Genome</strong> Init. 2000. Analysis <strong>of</strong> the genome sequence <strong>of</strong> the flowering plant Arabidopsis<br />

thaliana. Nature 408:796–815<br />


4. Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, et al. 2010. <strong>Genome</strong>-wide association<br />

study <strong>of</strong> 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465:627–31<br />

5. Bertioli DJ, Moretzsohn MC, Madsen LH, Sandal N, Leal-Bertioli SC, et al. 2009. An analysis<br />

<strong>of</strong> synteny <strong>of</strong> Arachis with Lotus and Medicago sheds new light on the structure, stability and<br />

evolution <strong>of</strong> legume genomes. BMC Genomics 10:45<br />

6. Blanc G, Wolfe KH. 2004. Functional divergence <strong>of</strong> duplicated genes formed by polyploidy during<br />

Arabidopsis evolution. Plant Cell 16:1679–91<br />

7. Blanc G, Wolfe KH. 2004. Widespread paleopolyploidy in model plant species inferred from age distributions<br />

<strong>of</strong> duplicate genes. Plant Cell 16:1667–78<br />

8. Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. 2011. Scaffolding pre-assembled contigs<br />

using SSPACE. Bioinformatics 27:578–79<br />

9. Bonfante P, Genre A. 2008. Plants and arbuscular mycorrhizal fungi: an evolutionary-developmental<br />

perspective. Trends Plant Sci. 13:492–98<br />

10. Boutin SR, Young ND, Olson TC, Yu ZH, Vallejos CE, Shoemaker RC. 1995. <strong>Genome</strong> conservation<br />

among three legume genera detected with DNA markers. <strong>Genome</strong> 38:928–37<br />

11. Branca A, Paape T, Zhou P, Briskine R, Farmer AD, et al. 2011. Whole-genome nucleotide diversity,<br />

recombination, and linkage-disequilibrium in the model legume Medicago truncatula. Proc. Natl. Acad.<br />

Sci. USA 108:E864–70<br />

12. Cannon SB, Ilut D, Farmer AD, Maki SL, May GD, et al. 2010. Polyploidy did not predate the<br />

evolution <strong>of</strong> nodulation in all legumes. PLoS ONE 5:e11630<br />

13. Cannon SB, May GD, Jackson SA. 2009. Three sequenced legume genomes and many crop species: rich<br />

opportunities for translational genomics. Plant Physiol. 151:970–77<br />

14. Cannon SB, Mitra A, Baumgarten A, Young ND, May G. 2004. The roles <strong>of</strong> segmental and tandem<br />

gene duplication in the evolution <strong>of</strong> large gene families in Arabidopsis thaliana. BMC Plant Biol. 4:10<br />

15. Cannon SB, Sterck L, Rombauts S, Sato S, Cheung F, et al. 2006. <strong>Legume</strong> evolution viewed through<br />

the Medicago truncatula and Lotus japonicus genomes. Proc. Natl. Acad. Sci. USA 103:14959–64<br />

16. Choi HK, Mun JH, Kim DJ, Zhu H, Baek JM, et al. 2004. Estimating genome conservation between<br />

crop and model legume species. Proc. Natl. Acad. Sci. USA 101:15289–94<br />

17. Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, et al. 2007. Common sequence polymorphisms<br />

shaping genetic diversity in Arabidopsis thaliana. Science 317:338–42<br />

18. Córdoba JM, Chavarro C, Schlueter JA, Jackson SA, Blair MW. 2010. Integration <strong>of</strong> physical and genetic<br />

maps <strong>of</strong> common bean through BAC-derived microsatellite markers. BMC Genomics 11:436<br />

19. Das S, Bhat PR, Sudhakar C, Ehlers JD, Wanamaker S, et al. 2008. Detection and validation <strong>of</strong> single<br />

feature polymorphisms in cowpea (Vigna unguiculata L. Walp) using a soybean genome array. BMC<br />

Genomics 9:107<br />

20. Devos KM, Gale MD. 2000. <strong>Genome</strong> relationships: the grass model in current research. Plant Cell<br />

12:637–46<br />

21. Doyle JJ, Flagel LE, Paterson AH, Rapp RA, Soltis DE, et al. 2008. Evolutionary genetics <strong>of</strong> genome<br />

merger and doubling in plants. Annu. Rev. Genet. 42:443–61<br />

22. Fawcett JA, Maere S, Vandepeer Y. 2009. Plants with double genomes might have had a better chance<br />

to survive the Cretaceous-Tertiary extinction event. Proc. Natl. Acad. Sci. USA 106:5737–42<br />

23. Force A, Lynch M, Pickett FB, Amores A, Yan YL, et al. 1999. Preservation <strong>of</strong> duplicate genes by<br />

complementary, degenerative mutations. Genetics 151:1531–45<br />

24. Freeling M. 2009. Bias in plant gene content following different sorts <strong>of</strong> duplication: tandem, whole-genome,<br />

segmental, or by transposition. Annu. Rev. Plant Biol. 60:433–53<br />

25. Friesen ML, Cordeiro MA, Penmetsa RV, Badri M, Huguet T, et al. 2010. Population genomic<br />

analysis <strong>of</strong> Tunisian Medicago truncatula reveals candidates for local adaptation. Plant J. 63:623–35<br />

26. Gale MD, Devos KM. 1998. Comparative genetics in the grasses. Proc. Natl. Acad. Sci. USA 95:1971–74<br />

27. Gao AG, Hakimi SM, Mittanck CA, Wu Y, Woerner BM, et al. 2000. Fungal pathogen protection in<br />

potato by expression <strong>of</strong> a plant defensin peptide. Nat. Biotechnol. 18:1307–131<br />

5. Demonstrates that papilionoid genome duplication is shared with distant Arachis, which shows extensive synteny with sequenced legumes.<br />

12. Shows that legume genome duplication apparently occurred only within the papilionoid lineage, and not within the Mimosoideae or Caesalpinioideae subfamilies.<br />

25. Utilizes a genome association mapping approach to characterize salt tolerance in a natural population.<br />


45. Describes next-generation sequencing <strong>of</strong> a wild soybean relative and extensive characterization <strong>of</strong> genome differences between species.<br />

28. Garg R, Patel RK, Jhanwar S, Priya P, Bhattacharjee A, et al. 2011. Gene discovery and tissue-specific<br />

transcriptome analysis in chickpea with massively parallel pyrosequencing and Web resource development.<br />

Plant Physiol. 156:1661–78<br />

29. Gnerre S, Maccallum I, Przybylski D, Ribeiro FJ, Burton JN, et al. 2011. High-quality draft assemblies<br />

<strong>of</strong> mammalian genomes from massively parallel sequence data. Proc. Natl. Acad. Sci. USA 108:1513–18<br />

30. Gomez SK, Javot H, Deewatthanawong PD, Torres-Jerez I, Tang Y, et al. 2009. Medicago truncatula and<br />

Glomus intraradices gene expression in cortical cells harboring arbuscules in the arbuscular mycorrhizal<br />

symbiosis. BMC Plant Biol. 9:10<br />

31. Graham MA, Silverstein KA, Cannon SB, VandenBosch KA. 2004. Computational identification and<br />

characterization <strong>of</strong> novel genes from legumes. Plant Physiol. 135:1179–97<br />

32. Graham PH, Vance CP. 2003. <strong>Legume</strong>s: importance and constraints to greater use. Plant Physiol.<br />

131:872–77<br />

33. Han Y, Kang Y, Torres-Jerez I, Cheung F, Town CD, et al. 2011. <strong>Genome</strong>-wide SNP discovery in<br />

tetraploid alfalfa using 454 sequencing and high resolution melting analysis. BMC Genomics 12:350<br />

34. Hiremath PJ, Farmer A, Cannon SB, Woodward J, Kudapa H, et al. 2011. Large-scale transcriptome<br />

analysis in chickpea (Cicer arietinum L.), an orphan legume crop <strong>of</strong> the semi-arid tropics <strong>of</strong> Asia and<br />

Africa. Plant Biotechnol. J. 9:922–31<br />

35. Hougaard BK, Madsen LH, Sandal N, de Carvalho Moretzsohn M, Fredslund J, et al. 2008. <strong>Legume</strong><br />

anchor markers link syntenic regions between Phaseolus vulgaris, Lotus japonicus, Medicago truncatula and<br />

Arachis. Genetics 179:2299–312<br />

36. Huang X, Wei X, Sang T, Zhao Q, Feng Q, et al. 2010. <strong>Genome</strong>-wide association studies <strong>of</strong> 14 agronomic<br />

traits in rice landraces. Nat. Genet. 42:961–67<br />

37. Huang Z-W, Zhao T-J, Yu D-Y, Chen S-Y, Gai J-Y. 2008. Correlation and QTL mapping <strong>of</strong> biomass<br />

accumulation, apparent harvest index, and yield in soybean. Acta Agron. Sin. 34:944–51<br />

38. Imelfort M, Edwards D. 2009. De novo sequencing <strong>of</strong> plant genomes using second-generation technologies.<br />

Brief. Bioinforma. 10:609–18<br />

39. Innes RW, Ameline-Torregrosa C, Ashfield T, Cannon E, Cannon SB, et al. 2008. Differential accumulation<br />

<strong>of</strong> retroelements and diversification <strong>of</strong> NB-LRR disease resistance genes in duplicated regions<br />

following polyploidy in the ancestor <strong>of</strong> soybean. Plant Physiol. 148:1740–59<br />

40. Int. Rice <strong>Genome</strong> Seq. Proj. 2005. The map-based sequence <strong>of</strong> the rice genome. Nature 436:793–800<br />

41. Jaillon O, Aury JM, Nöel B, Policriti A, Clepet C, et al. 2007. The grapevine genome sequence suggests<br />

ancestral hexaploidization in major angiosperm phyla. Nature 449:463–67<br />

42. Kaló P, Seres A, Taylor SA, Jakab J, Kevei Z, et al. 2004. Comparative mapping between Medicago sativa<br />

and Pisum sativum. Mol. Genet. Genomics 272:235–46<br />

43. Kamphuis LG, Williams AH, D’Souza NK, Pfaff T, Ellwood SR, et al. 2007. The Medicago truncatula<br />

reference accession A17 has an aberrant chromosomal configuration. New Phytol. 174:299–303<br />

44. Kim KD, Shin JH, Van K, Kim DH, Lee SH. 2009. Dynamic rearrangements determine genome<br />

organization and useful traits in soybean. Plant Physiol. 151:1066–76<br />

45. Kim MY, Lee S, Van K, Kim T-H, Jeong S-C, et al. 2010. Whole-genome sequencing and<br />

intensive analysis <strong>of</strong> the undomesticated soybean (Glycine soja Sieb. and Zucc.) genome. Proc.<br />

Natl. Acad. Sci. USA 107:22032–37<br />

46. Kim S, Plagnol V, Hu TT, Toomajian C, Clark RM, et al. 2007. Recombination and linkage disequilibrium<br />

in Arabidopsis thaliana. Nat. Genet. 39:1151–55<br />

47. Kinzig AP, Socolow RH. 1994. Human impacts on the nitrogen cycle. Phys. Today 47:24–35<br />

48. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, et al. 2009. Circos: an information aesthetic<br />

for comparative genomics. <strong>Genome</strong> Res. 19:1639–45<br />

49. Kulikova O, Gualtieri G, Geurts R, Kim DJ, Cook D, et al. 2001. Integration <strong>of</strong> the FISH pachytene<br />

and genetic maps <strong>of</strong> Medicago truncatula. Plant J. 27:49–58<br />

50. Lam H-M, Xu X, Lui X, Chen W, Yang G, et al. 2010. Resequencing <strong>of</strong> 31 wild and cultivated soybean<br />

genomes identifies patterns <strong>of</strong> genetic diversity and selection. Nat. Genet. 42:1053–59<br />

51. Langham RJ, Walsh J, Dunn M, Ko C, Goff SA, et al. 2004. Genomic duplication, fractionation and the<br />

origin <strong>of</strong> regulatory novelty. Genetics 166:935–45<br />


52. Lavin M, Herendeen PS, Wojciechowski MF. 2005. Evolutionary rates analysis <strong>of</strong> Leguminosae implicates<br />

a rapid diversification <strong>of</strong> lineages during the tertiary. Syst. Biol. 54:575–94<br />

53. Li H, Liu H, Han Y, Wu X, Teng W, et al. 2010. Identification <strong>of</strong> QTL underlying vitamin E contents<br />

in soybean seed among multiple environments. Theor. Appl. Genet. 120:1405–13<br />

54. Lin JY, Stupar RM, Hans C, Hyten DL, Jackson SA. 2010. Structural and functional divergence <strong>of</strong> a<br />

1-Mb duplicated region in the soybean (Glycine max) genome and comparison to an orthologous region<br />

from Phaseolus vulgaris. Plant Cell 22:2545–61<br />

55. Lynch M, O’Hely M, Walsh B, Force A. 2001. The probability <strong>of</strong> preservation <strong>of</strong> a newly arisen gene<br />

duplicate. Genetics 159:1789–804<br />

56. McClean PE, Mamidi S, McConnell M, Chikara S, Lee R. 2010. Synteny mapping between common<br />

bean and soybean reveals extensive blocks <strong>of</strong> shared loci. BMC Genomics 11:184<br />

57. Metzker ML. 2009. Sequencing technologies—the next generation. Nat. Rev. Genet. 11:31–46<br />

58. Meyers BC, Kaushik S, Nandety RS. 2005. Evolving disease resistance genes. Curr. Opin. Plant Biol.<br />

8:129–134<br />

59. Miller JR, Koren S, Sutton G. 2010. Assembly algorithms for next-generation sequencing data. Genomics<br />

95:315–27<br />

60. Moore G, Devos KM, Wang Z, Gale MD. 1995. Grasses, line up and form a circle. Curr. Biol. 5:737–39<br />

61. Muchero W, Diop NN, Bhat PR, Fenton RD, Wanamaker S, et al. 2009. A consensus genetic map <strong>of</strong><br />

cowpea [Vigna unguiculata (L) Walp.] and synteny based on EST-derived SNPs. Proc. Natl. Acad. Sci.<br />

USA 106:18159–64<br />

62. Mudge J, Cannon SB, Kalo P, Oldroyd GE, Roe BA, et al. 2005. Highly syntenic regions in the genomes<br />

<strong>of</strong> soybean, Medicago truncatula, and Arabidopsis thaliana. BMC Plant Biol. 5:15<br />

63. Munroe DJ, Harris TJ. 2010. Third-generation sequencing fireworks at Marco Island. Nat. Biotechnol. 28:426–28

64. Nordborg M, Weigel D. 2008. Next-generation genetics in plants. Nature 456:720–23

65. Nayak SN, Zhu H, Varghese N, Datta S, Choi HK, et al. 2010. Integration of novel SSR and gene-based SNP marker loci in the chickpea genetic map and establishment of new anchor points with Medicago truncatula genome. Theor. Appl. Genet. 120:1415–41

66. Oldroyd GE, Downie JA. 2008. Coordinating nodule morphogenesis with rhizobial infection in legumes. Annu. Rev. Plant Biol. 59:519–46

67. Op den Camp RHM, De Mita S, Lillo A, Cao Q, Limpens E, et al. 2011. A phylogenetic strategy based on a legume-specific whole genome duplication yields symbiotic cytokinin type-A response regulators. Plant Physiol. 157:2013–22

68. Op den Camp RHM, Streng A, De Mita S, Cao Q, Polone E, et al. 2011. LysM-type mycorrhizal receptor recruited for rhizobium symbiosis in nonlegume Parasponia. Science 331:909–12

69. Ossowski S, Schneeberger K, Clark RM, Lanz C, Warthmann N, et al. 2008. Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res. 18:2024–33

70. Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, et al. 2009. The Sorghum bicolor genome and the diversification of grasses. Nature 457:551–56

71. Paterson AH, Chapman BA, Kissinger JC, Bowers JE, et al. 2006. Many gene and domain families have convergent fates following independent whole-genome duplication events in Arabidopsis, Oryza, Saccharomyces and Tetraodon. Trends Genet. 22:597–602

72. Paterson AH, Freeling M, Tang H, Wang X. 2010. Insights from the comparison of plant genome sequences. Annu. Rev. Plant Biol. 61:349–72

73. Pfeil BE, Schlueter JA, Shoemaker RC, Doyle JJ. 2005. Placing paleopolyploidy in relation to taxon divergence: a phylogenetic analysis in legumes using 39 gene families. Syst. Biol. 54:441–54

74. Polhill RM. 1981. Papilionoideae. In Advances in Legume Systematics, Part 1, ed. RM Polhill, PH Raven, pp. 191–208. Kew, UK: R. Bot. Gard.

75. Proost S, Van Bel M, Sterck L, Billiau K, Van Parys T, et al. 2009. PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell 21:3718–31

76. Ratnaparkhe MB, Wang X, Li J, Compton RO, Rainville LK, et al. 2011. Comparative analysis of peanut NBS-LRR gene clusters suggests evolutionary innovation among duplicated domains and erosion of gene microsynteny. New Phytol. 192:164–78


79. Provides the initial report of the Lotus japonicus genome sequence.

81. Provides the initial report of the Glycine max genome sequence.

87. Gives an overview of an alternative legume, Chamaecrista, found within one of the clades not generally targeted for genomic analysis.

100. Provides the initial report of the Medicago truncatula genome sequence.

77. Rausch T, Koren S, Denisov G, Weese D, Emde AK, et al. 2009. A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads. Bioinformatics 25:1118–24

78. Sato S, Isobe S, Tabata S. 2010. Structural analyses of the genomes in legumes. Curr. Opin. Plant Biol. 13:1–7

79. Sato S, Nakamura Y, Kaneko T, Asamizu E, Kato T, et al. 2008. Genome structure of the legume, Lotus japonicus. DNA Res. 15:1–8

80. Schlueter JA, Dixon P, Granger C, Grant D, Clark L, et al. 2004. Mining EST databases to resolve evolutionary events in major crop species. Genome 47:868–76

81. Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, et al. 2010. Genome sequence of the palaeopolyploid soybean. Nature 463:178–83

82. Schneeberger K, Ossowski S, Ott F, Klein JD, Wang X, et al. 2011. Reference-guided assembly of four diverse Arabidopsis thaliana genomes. Proc. Natl. Acad. Sci. USA 108:10249–54

83. Shin JH, Van K, Kim DH, Kim KD, Jang YE, et al. 2008. The lipoxygenase gene family: a genomic fossil of shared polyploidy between Glycine max and Medicago truncatula. BMC Plant Biol. 8:133

84. Shoemaker RC, Polzin K, Labate J, Specht J, Brummer EC, et al. 1996. Genome duplication in soybean (Glycine subgenus soja). Genetics 144:329–38

85. Shoemaker RC, Schlueter J, Doyle JJ. 2006. Paleopolyploidy and gene duplication in soybean and other legumes. Curr. Opin. Plant Biol. 9:104–9

86. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, et al. 2009. ABySS: a parallel assembler for short read sequence data. Genome Res. 19:1117–23

87. Singer SR, Maki SL, Farmer AD, Ilut D, May GD, et al. 2009. Venturing beyond beans and peas: What can we learn from Chamaecrista? Plant Physiol. 151:1041–47

88. Soltis DE, Soltis PS, Morgan DR, Swensen SM, Mullin BC, et al. 1995. Chloroplast gene sequence data suggest a single origin of the predisposition for symbiotic nitrogen fixation in angiosperms. Proc. Natl. Acad. Sci. USA 92:2647–51

89. Sprent JI. 2008. 60 Ma of legume nodulation: What’s new? What’s changing? J. Exp. Bot. 59:1081–84

90. Tian F, Bradbury PJ, Brown PJ, Hung H, Sun Q, et al. 2011. Genome-wide association study of leaf architecture in the maize nested association mapping population. Nat. Genet. 43:159–62

91. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, et al. 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313:1596–604

92. Van de Velde W, Zehirov G, Szatmari A, Debreczeny M, Ishihara H, et al. 2010. Plant peptides govern terminal differentiation of bacteria in symbiosis. Science 327:1122–26

93. van Oeveren J, de Ruiter M, Jesse T, van der Poel H, Tang J, et al. 2011. Sequence-based physical mapping of complex genomes by whole genome profiling. Genome Res. 21:618–25

94. Varshney RK, Chen W, Li Y, Bharti AK, Saxena RK, et al. 2012. Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat. Biotechnol. 30:83–89

95. Varshney RK, Close TJ, Singh NK, Hoisington DA, Cook DR. 2009. Orphan legume crops enter the genomics era! Curr. Opin. Plant Biol. 12:202–10

96. Vernié T, Moreau S, de Billy F, Plet J, Combier JP, et al. 2008. EFD is an ERF transcription factor involved in the control of nodule number and differentiation in Medicago truncatula. Plant Cell 20:2696–713

97. Wojciechowski MF, Sanderson MJ, Steele KP, Liston A. 2000. Molecular phylogeny of the “temperate herbaceous tribes” of papilionoid legumes: a supertree approach. In Advances in Legume Systematics, Part 9, ed. PS Herendeen, A Bruneau, pp. 277–98. Kew, UK: R. Bot. Gard.

98. Yang S, Feng Z, Zhang X, Jiang K, Jin X, et al. 2006. Genome-wide investigation on the genetic variations of rice disease resistance genes. Plant Mol. Biol. 62:181–83

99. Yang S, Gao M, Xu C, Gao J, Deshpande S, et al. 2008. Alfalfa benefits from Medicago truncatula: the RCT1 gene from M. truncatula confers broad-spectrum resistance to anthracnose in alfalfa. Proc. Natl. Acad. Sci. USA 105:12164–69

100. Young N, Debellé F, Oldroyd G, Geurts R, Cannon SB, et al. 2011. The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature 480:520–24

101. Young ND, Udvardi M. 2009. Translating Medicago truncatula genomics to crop legumes. Curr. Opin. Plant Biol. 12:193–201


102. Zhang M, Wu YH, Lee MK, Liu YH, Rong Y, et al. 2010. Numbers of genes in the NBS and RLK families vary by more than four-fold within a plant species and are regulated by multiple factors. Nucleic Acids Res. 38:6513–25

103. Zhang XC, Wu X, Findley S, Wan J, Libault M, et al. 2007. Molecular evolution of lysin motif-type receptor-like kinases in plants. Plant Physiol. 144:623–36

104. Zhou S, Bechner MC, Place M, Churas CP, Pape L, et al. 2007. Validation of rice genome sequences by optical mapping. BMC Genomics 8:278



