
Environmental Microbiology (2009) 11(6), 1358–1375. doi:10.1111/j.1462-2920.2008.01863.x

Comparative day/night metatranscriptomic analysis of microbial communities in the North Pacific subtropical gyre

Rachel S. Poretsky,1 Ian Hewson,2 Shulei Sun,1 Andrew E. Allen,3 Jonathan P. Zehr2 and Mary Ann Moran1*

1 University of Georgia, Department of Marine Sciences, Athens, GA 30602, USA.

2 University of California Santa Cruz, Department of Ocean Sciences, Santa Cruz, CA 95064, USA.

3 J. Craig Venter Institute, Microbial and Environmental Genomics, San Diego, CA 92121, USA.

Summary

Metatranscriptomic analyses of microbial assemblages (< 5 µm) from surface water at the Hawaiian Ocean Time-Series (HOT) revealed community-wide metabolic activities and day/night patterns of differential gene expression. Pyrosequencing produced 75 558 putative mRNA reads from a day transcriptome and 75 946 from a night transcriptome. Taxonomic binning of annotated mRNAs indicated that Cyanobacteria contributed a greater percentage of the transcripts (54% of annotated sequences) than expected based on abundance (35% of cell counts and 21% of 16S rRNA libraries), and may represent the most actively transcribing cells in this surface ocean community in both the day and night. Major heterotrophic taxa contributing to the community transcriptome included α-Proteobacteria (19% of annotated sequences, most of which were SAR11-related) and γ-Proteobacteria (4%). The composition of transcript pools was consistent with models of prokaryotic gene expression, including operon-based transcription patterns and an abundance of genes predicted to be highly expressed. Metabolic activities that are shared by many microbial taxa (e.g. glycolysis, citric acid cycle, amino acid biosynthesis and transcription and translation machinery) were well represented among the community transcripts. There was an overabundance of transcripts for photosynthesis, C1 metabolism and oxidative phosphorylation in the day compared with night, and evidence that energy acquisition is coordinated with solar radiation levels for both autotrophic and heterotrophic microbes. In contrast, housekeeping activities such as amino acid biosynthesis, membrane synthesis and repair, and vitamin biosynthesis were overrepresented in the night transcriptome. Direct sequencing of these environmental transcripts has provided detailed information on metabolic and biogeochemical responses of a microbial community to solar forcing.

Received 17 September, 2008; accepted 3 December, 2008. *For correspondence. E-mail mmoran@uga.edu; Tel. 706-542-6481; Fax 706-542-5888.

© 2009 The Authors. Journal compilation © 2009 Society for Applied Microbiology and Blackwell Publishing Ltd

Introduction

Oceanic subtropical gyres make up 40% of the Earth’s surface and play critical roles in carbon fixation and nutrient cycling. The Hawaii Ocean Time-Series (HOT) in the North Pacific subtropical gyre was established to provide a long-term perspective on oceanographic properties of such systems (Karl and Lukas, 1996) and has served as the focus of substantial research into the role of marine microorganisms in ocean biogeochemistry (Karl et al., 1997; Cavender-Bares et al., 2001; Zehr et al., 2001). Station ALOHA, the core study site at HOT, is characterized by warm (> 23°C) surface waters with low NO₃⁻ concentrations (< 15 nM), seasonally variable surface mixed-layers (10–120 m), low standing biomass of living organisms (10–15 µg C l⁻¹) and a persistent deep (75–140 m) chlorophyll a maximum layer. Since 1988, regular measurements of physical, chemical and biological parameters have been obtained with monthly ship-based monitoring as well as bottom-moored instruments and buoys. Recent metagenomic sampling efforts at Station ALOHA have provided information about the genes harboured by the bacterioplankton community and how they are distributed with depth (DeLong et al., 2006). Characterizing patterns of expression of these microbial genes and identifying what factors induce their expression is the next critical step in understanding this oceanic ecosystem.

Analogous to metagenomics, environmental transcriptomics (metatranscriptomics) retrieves and sequences environmental mRNAs from a microbial assemblage without prior knowledge of what genes the community might be expressing (Poretsky et al., 2005; Frias-Lopez et al., 2008). Thus it provides a less biased perspective on microbial gene expression in situ compared with other approaches (Wawrik et al., 2002; Bürgmann et al., 2003; Zhou, 2003). Environmental transcriptomics protocols are technically difficult, however, because prokaryotic mRNAs generally lack the poly(A) tails that make isolation of eukaryotic messages relatively straightforward (Liang and Pardee, 1992) and because mRNAs have relatively short half-lives (Belasco, 1993). In addition, mRNAs are much less abundant than rRNAs in total RNA extracts, so an rRNA background often overwhelms mRNA signals.

A first analysis of environmental transcriptomes, which created clone libraries using random primers to reverse-transcribe and amplify environmental mRNAs, was successful in two different natural environments (Poretsky et al., 2005), but results were biased by selection of the random primers used to initiate cDNA synthesis. Techniques to linearly amplify mRNA obviate the need for random primers in the amplification step and make it possible to use less starting material (Gelder et al., 1990), while recently developed pyrosequencing technologies allow direct sequencing without cloning (Margulies et al., 2005). Initial application of this approach at Station ALOHA (Frias-Lopez et al., 2008) and in coastal water mesocosms (Gilbert et al., 2008) demonstrated its utility for characterizing microbial community gene expression.

Here we use environmental transcriptomics to elucidate day/night differences in gene expression in surface waters of the North Pacific subtropical gyre (Karl and Lukas, 1996). This analysis provides information on the dominant metabolic processes within the bacterioplankton assemblages and reveals changes in expression patterns of biogeochemically relevant processes.

Results

cDNA sequence annotation

The cDNAs prepared from amplified RNA (collected from the 0.2–5 µm size fraction) ranged in size from 100 bp to 1 kb, with the majority between 200 and 500 bp. The average picoliter reactor pyrosequencing read length was 99 bp, typical for the GS 20 sequencing platform. Predicted rRNA sequences were removed based on sequence similarity to the nt database using BLASTN. Although more laborious than our initial approach, which used sequence similarity to the RDP II database supplemented with an 18S, 23S and 28S rRNA database from genome sequences, this method identified nearly all of the rRNA sequences in our libraries. Accurate identification of rRNAs is crucial because of numerous misidentified sequences in the RefSeq protein database (i.e. rRNA sequences that are incorrectly annotated as putative proteins). Relatively low rRNA sequence contamination (37%) compared with the rRNA content of prokaryotic cells (> 80%; Ingraham et al., 1983) indicated that the steps for excluding rRNAs through selective degradation and subtractive hybridization were largely successful.

Sequences remaining after deletion of rRNA sequences (75 558 from the day and 75 946 from the night) were categorized as possible protein-encoding sequences and BLASTX-queried against the NCBI curated, non-redundant reference sequence database (RefSeq) to determine putative functions (Fig. 1). About one-third of HOT pyrosequences in each library met the criteria for gene predictions determined empirically by in silico analysis of known functional gene sequences fragmented into 100 bp pieces (see Experimental procedures for more details). This is nearly twice the fraction of reads identified in metagenomic efforts with similar pyrosequencing read lengths (Frias-Lopez et al., 2008; Mou et al., 2008), as might be expected for sequences biased towards coding regions of genomes. These sequences were subsequently assigned to the function of their best hit in RefSeq. Transcript abundance was analysed as relative abundance within the collective community transcriptome rather than per-gene expression levels (see Frias-Lopez et al., 2008). Empirically derived criteria were established in separate in silico analyses for the Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, which contain fewer sequences than RefSeq (Fig. 1). Some of the sequences without hits in RefSeq were similar to proteins in the Global Ocean Sampling database, indicating that similar sequences have been found in marine bacterioplankton communities, but functional annotation is not currently possible.
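The in silico benchmark described above, in which known genes are cut into read-sized pieces before being searched against a database, can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names, the fixed random seed and the toy sequence are assumptions introduced here, and the BLASTX search itself is not shown.

```python
import random


def tile_fragments(seq, length=100):
    """Cut a sequence into consecutive, non-overlapping pieces of `length` bp.

    A trailing piece shorter than `length` is discarded.
    """
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, length)]


def random_fragments(seq, n, length=100, seed=0):
    """Draw `n` fragments of `length` bp at random start positions (with replacement)."""
    if len(seq) < length:
        return []
    rng = random.Random(seed)
    starts = [rng.randrange(len(seq) - length + 1) for _ in range(n)]
    return [seq[s:s + length] for s in starts]


# Hypothetical 350 bp coding sequence, used only to exercise the functions.
toy_gene = "ATGGCTAAGC" * 35
tiled = tile_fragments(toy_gene)
sampled = random_fragments(toy_gene, n=5)
print(len(tiled), [len(f) for f in sampled])
```

Each fragment would then be searched against the reference database, and the score thresholds that recover the correct gene could be taken as the empirical annotation criteria.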

At the end of the annotation pipeline, half of the possible protein-encoding sequences in each library had no significant hits to previously sequenced genes. To examine how sequences from uncultured marine bacterial taxa might decrease annotation success or skew taxonomic assignments, we randomly selected 100 bp sequences from the coding regions of genome fragments from SAR86 and SAR116 cells captured in environmental BAC libraries (SAR86 BAC, AF279106; SAR86 BAC, AY552545; SAR116 BAC, AY744399). Excluding self-hits, approximately 60% of the sequences from the BACs had no hits in RefSeq (Table S1). In a similar analysis of coding sequences from cultured taxa with genome sequences available (Pelagibacter ubique HTCC1062 and Prochlorococcus marinus MIT9312), only ~20% of the sequences had no hits in RefSeq. Many unannotated sequences in the HOT libraries are therefore likely to be transcripts from poorly known taxa, but also include some transcripts from well-known taxa with poor identity to sequence databases for that particular 100 bp fragment. In support of the latter, a preliminary analysis of a marine environmental transcriptome consisting of longer reads (~200 bp; 454 GS FLX sequencing platform; R.S. Poretsky and M.A. Moran, unpublished; and Table S1) resulted in twice the frequency of annotated sequences as the HOT metatranscriptome. When 100 bp genome fragments from uncultured taxa had significant hits in RefSeq, the hits were almost always to a gene from an organism in the same phylum (90%) or subphylum (70%), and thus did not significantly skew the taxonomic assignments (Table S1). SAR86, SAR116 and other currently recognized uncultured groups made up ~4% of the 16S rRNA amplicons from these samples (see below). Finally, to examine the possibility that the unidentified sequences were from non-protein-coding regions, these sequences were BLAST-queried against tRNA genes, 5S rRNA genes and intergenic region sequences from three P. marinus genomes (MIT9301, MIT9312 and AS9601) and two P. ubique genomes (HTCC1002 and HTCC1062). Based on this analysis, ~4% of the 76 327 unidentified sequences were from non-protein-coding regions of these genomes, and these primarily hit intergenic regions.

[Figure 1 image not reproduced. Flow chart nodes recovered from the extraction: 240,422 total 454 sequences; BLASTN against nt; 88,916 rRNA sequences; 151,504 possible protein-encoding sequences; BLASTX against RefSeq; 48,648 identified sequences; BLASTX against COG and BLASTX against KEGG (24,474 and 35,927 sequences); 102,856 unidentified; BLASTX against GOS; 26,366 GOS sequences; BLASTX against nr; 163 sequences; 76,327 unidentified sequences.]

Fig. 1. The mRNA annotation pipeline developed for 454 transcript reads showing combined counts for the day and night transcriptomes. All percentages are relative to the total number of sequences entering the pipeline.
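The proportions quoted in the text can be checked against the counts shown in Fig. 1. The short sketch below only redoes that arithmetic with the published counts; the variable and function names are illustrative assumptions.

```python
# Counts taken from Fig. 1 (day and night libraries combined).
total_reads = 240_422
rrna = 88_916
protein_encoding = 151_504
identified_refseq = 48_648
unidentified_after_refseq = 102_856
gos_hits = 26_366
nr_hits = 163
still_unidentified = 76_327


def fraction(part, whole):
    """Return part/whole as a plain float."""
    return part / whole


# "Relatively low rRNA sequence contamination (37%)"
rrna_share = fraction(rrna, total_reads)

# "About one-third ... met the criteria for gene predictions"
annotated_share = fraction(identified_refseq, protein_encoding)

# "half of the possible protein-encoding sequences ... had no significant hits"
unannotated_share = fraction(still_unidentified, protein_encoding)

print(f"rRNA: {rrna_share:.1%}")
print(f"annotated: {annotated_share:.1%}")
print(f"no hits: {unannotated_share:.1%}")
```

The counts are internally consistent at each branch of the pipeline: identified plus unidentified sequences equal the protein-encoding pool, and the GOS, nr and remaining unidentified sequences sum to the unidentified pool.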

Community composition and taxonomic origin of transcripts

Prochlorococcus are the most abundant Cyanobacteria at Station ALOHA (> 95% of photosynthetic picoplankton cells; Campbell and Vaulot, 1993) and in this study accounted for approximately 2 × 10⁵ cells ml⁻¹ (based on flow cytometric counting; http://hahana.soest.hawaii.edu/hot/hot-dogs/), or ~30% of the total microbial community (Fig. 2). Heterotrophic bacteria (including phototrophs) were numerically dominant with ~5 × 10⁵ cells ml⁻¹, accounting for ~65% of the microbial community present at the time of sampling. Direct counts also indicated the presence of ~800 cells ml⁻¹ of pigmented nanoeukaryotes (0.2%; Fig. 2).

Companion PCR-based 16S rRNA clone libraries were generated from DNA collected in tandem with the RNA samples and demonstrated close agreement with the flow cytometric data in terms of taxonomic composition at Station ALOHA. Cyanobacteria accounted for ~20% of the 16S rRNA sequences, and heterotrophic bacterial groups were ~80% (Fig. 3). Among the heterotrophic 16S rRNA sequences, Proteobacteria were most abundant (41%; Fig. 3) and were dominated by α-Proteobacteria (22%), β-Proteobacteria (8%) and γ-Proteobacteria (8%). Bacteroidetes (8%) and Firmicutes (12%, biased towards the day sample) were also well represented.

Taxonomically binned mRNA sequences were compared with community composition data to ask whether taxa contributed to the HOT community mRNA in proportion to their representation in the microbial assemblage (i.e. whether taxa are equally transcriptionally active on a per-cell basis). Cyanobacteria dominated the transcript libraries (55% of sequences) with about twofold higher representation than in the 16S rRNA amplicons or the cell count data (Fig. 3), indicating that there is more gene expression in these autotrophic bacterioplankton than in co-occurring heterotrophs (or possibly that their transcripts are longer-lived). When relative 16S rRNA abundance was calculated among just the heterotrophic groups (i.e. with cyanobacterial sequences removed), many taxa had similar contributions to the transcript pool and amplicon pool, suggesting comparable levels of transcriptional activity on a per-gene basis within the limits of recognized biases of PCR amplification (Fig. 3).

[Figure 2 image not reproduced. Axes: depth (m), 0–200; abundance, 0–600. Series: chlorophyll a (10⁻³ µg l⁻¹), Prochlorococcus (× 10³ cells ml⁻¹), Synechococcus (× 10² cells ml⁻¹), nanoeukaryotes (× 10² cells ml⁻¹) and heterotrophic bacteria (× 10³ cells ml⁻¹).]

Fig. 2. Depth profiles of Prochlorococcus-like, Synechococcus-like, heterotrophic bacteria and pigmented nanoeukaryotes during the HOT-175 cruise, as determined by flow cytometry. The horizontal line indicates the mixed layer depth. The depth profile for chlorophyll a is also indicated. Data were collected through the HOT project and downloaded from the HOT Data Organization and Graphical System (http://hahana.soest.hawaii.edu/hot/hot-dogs/).

Proteobacteria contributed the second largest number of transcript sequences (28%), most of which were attributed to α-Proteobacteria (19%) and γ-Proteobacteria (4%). Approximately 2% of the total transcripts were of eukaryotic origin. Comparing putative taxonomic assignments of transcripts between day and night, Cyanobacteria contributed equally to the day and night transcriptome (55% versus 56%) as did α-Proteobacteria (40% versus 45% of heterotrophic transcripts) and γ-Proteobacteria (11% versus 8% of heterotrophic transcripts) (Fig. 3).
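The comparison of transcript share with abundance described above reduces to a simple ratio. The sketch below uses the approximate percentages quoted in this section for Cyanobacteria (55% of transcripts, ~20% of 16S rRNA sequences, ~30% of cell counts); the function name is an illustrative assumption, and the ratio is only a rough index of relative transcriptional activity, not a measure used by the authors.

```python
def representation_ratio(transcript_pct, abundance_pct):
    """Ratio of a taxon's share of transcripts to its share of the community."""
    return transcript_pct / abundance_pct


cyano_transcripts = 55.0  # % of annotated transcripts
cyano_16s = 20.0          # approximate % of 16S rRNA sequences
cyano_cells = 30.0        # approximate % of cells by flow cytometry

ratio_vs_16s = representation_ratio(cyano_transcripts, cyano_16s)
ratio_vs_cells = representation_ratio(cyano_transcripts, cyano_cells)
print(f"vs 16S rRNA: {ratio_vs_16s:.2f}x, vs cell counts: {ratio_vs_cells:.2f}x")
```

A ratio near 1 would indicate that a taxon contributes transcripts in proportion to its abundance; both values here are well above 1, in line with the roughly twofold overrepresentation reported.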

More detailed taxonomic assignment of transcripts was carried out for the best represented clades. The Cyanobacteria transcripts were dominated by Prochlorococcus-like sequences most similar to P. marinus AS9601, P. marinus MIT 9301 and P. marinus MIT 9312 (Table 1). The α-Proteobacteria, the most transcriptionally active among the heterotrophic groups, mostly contained sequences with similarity to the SAR11 group members P. ubique HTCC1002 and P. ubique HTCC1062 (~10% of prokaryotic transcripts). Roseobacter-like sequences were also represented and were primarily assigned to Dinoroseobacter shibae DFL 12, Jannaschia sp. CCS1, Silicibacter pomeroyi DSS-3, Roseobacter denitrificans Och 114 and Silicibacter sp. TM1040 (Table 1 and Fig. 4). These assignments do not imply that these actual species were present at the time of sample collection, but rather they represent the best current sequence matches for some of the more abundant environmental transcripts.

[Figure 3 image not reproduced. Panels A and B each compare 16S rRNA genes with mRNA. Labelled values recovered from the extraction (Cyanobacteria/Other): 18%/82%, 55%/45%, 21%/79% and 56%/44%. Legend: Cyanobacteria, Alphaproteobacteria, Gammaproteobacteria, Betaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Other Proteobacteria, Actinobacteria, Bacteroidetes, Chlamydiae, Chlorobi, Chloroflexi, Chrysiogenetes, Acidobacteria, Firmicutes, Lentisphaerae, Planctomycetes, Spirochaetes, Thermotogae, Verrucomicrobia.]

Fig. 3. Contribution of taxa to the 16S rRNA amplicon pool and transcript pool for the day (A) and night (B) samples. Taxonomy is presented to the phylum level (based on NCBI taxonomy) except for Proteobacteria, which is at the subphylum level. The dashed red lines indicate cyanobacterial abundance in the night sample as determined by flow cytometric counting.

Transcriptome coverage

To estimate transcriptome coverage, 16S rRNA clone library data were used to establish a taxon-abundance model for the HOT community at an identity level of 99%. Assuming that each taxon expresses 1000 different genes at any given time (based on the Escherichia coli model; Ingraham et al., 1983) and that genome coverage follows a Lander–Waterman model (Lander and Waterman, 1988), we estimate that the most abundant taxon in the day or night sample had over 90% transcriptome coverage (i.e. 90% of the expressed genes were sequenced at least once), while the 15 most abundant taxa had more than half of their transcriptome represented (Table S2). Alternately, we determined the singletons and doubletons among the COG categories (i.e. the number of COGs containing only one or two sequences) and applied the Chao1 index of diversity to determine the theoretical abundance of COGs in the day and night. The sequencing effort captured about 80% of the COGs predicted to be present in the night transcriptome and 70% of the COGs predicted for the day transcriptome (Table S2).
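Both coverage estimates above rest on short formulas, sketched here under stated assumptions. The first function applies the Lander–Waterman expectation at the level of genes, treating every expressed gene as equally likely to be sampled (a simplification; the authors' exact parameterization is in their Table S2 and is not reproduced here). The second is the standard Chao1 richness estimator. Function names and all example inputs are hypothetical.

```python
import math


def transcriptome_coverage(n_reads, taxon_fraction, genes_expressed=1000):
    """Expected fraction of expressed genes sequenced at least once.

    Lander-Waterman form: coverage = 1 - exp(-c), where c is the mean
    number of reads per expressed gene for this taxon.
    """
    c = n_reads * taxon_fraction / genes_expressed
    return 1.0 - math.exp(-c)


def chao1(s_obs, singletons, doubletons):
    """Chao1 estimate of total richness from observed categories.

    Uses S_obs + F1^2 / (2 * F2), falling back to the bias-corrected
    form S_obs + F1 * (F1 - 1) / 2 when there are no doubletons.
    """
    if doubletons > 0:
        return s_obs + singletons ** 2 / (2 * doubletons)
    return s_obs + singletons * (singletons - 1) / 2


# Hypothetical inputs: 20 000 annotated reads, a taxon making up 15% of them.
cov = transcriptome_coverage(n_reads=20_000, taxon_fraction=0.15)

# Hypothetical COG counts: 1500 observed, 400 singletons, 200 doubletons.
estimated_total = chao1(s_obs=1500, singletons=400, doubletons=200)
captured = 1500 / estimated_total
print(f"coverage {cov:.2f}, COGs captured {captured:.2f}")
```

Under this model, 90% coverage of 1000 expressed genes requires roughly 2300 reads from a taxon, since 1 − e^(−2.3) ≈ 0.90.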

Based on these coverage estimates, increased sequencing depth would have been required to fully capture some specialized processes carried out by rarer members of the HOT community, but frequently transcribed genes from abundant taxa were well represented. In support of this, transcript mapping to the three P. marinus and two P. ubique reference genomes showed sequences with homology to approximately half the genes, at coverage depths ranging from 1 to nearly 500 hits per gene (Fig. 5). Moreover, many of the reference genes with the greatest coverage are those mediating metabolic processes expected to be dominant in the HOT bacterioplankton community (e.g. the photosynthesis genes psaA and psaB, the light-harvesting complex and RuBisCo, ammonium transporters and transcription-related genes; Fig. 5). Other genes on the reference genomes for which there is similarly deep transcript coverage (e.g. proteorhodopsin, Na+/solute symporters, colicin V production and several hypothetical proteins) can be hypothesized to also represent dominant metabolic activities (Fig. 5).

[Figure 4 image not reproduced. Axis labels recovered from the extraction: frequency versus number of adjacent genes (A); % PHX genes (B).]

Fig. 4. Evidence for prokaryotic gene expression patterns in the community transcriptome based on P. marinus, P. ubique and Roseobacter genome bins.

A. Operon-based expression was evaluated by comparing the number of adjacent transcripts (closed circles) to the number of adjacent genes found in 1000 random samples of the same size from the reference genome (black lines).

B. Preferential representation of transcripts from genes predicted to be highly expressed was evaluated by comparing the per cent of PHX genes in the reference genome (grey bar) to the per cent in the transcript pool (black bar). Differences between transcript pools and reference genomes were significant for both operon and PHX analyses (Wilcoxon signed-rank test; P < 0.05).

Table 1. Number of sequences from the community transcriptome with highest homology to the listed reference genomes, as determined by top BLASTX hit to RefSeq.

Reference genome                          Night   Day
Prochlorococcus marinus str. MIT 9301     6309    6292
Prochlorococcus marinus str. AS9601       3214    2849
Pelagibacter ubique HTCC1002              2541    1851
Prochlorococcus marinus str. MIT 9312     1430    1264
Pelagibacter ubique HTCC1062              1308    944
Dinoroseobacter shibae DFL 12             48      34
Jannaschia sp. CCS1                       41      27
Silicibacter pomeroyi DSS-3               39      30
Roseobacter denitrificans Och 114         30      28
Silicibacter sp. TM1040                   19      26

[Figure 5 image not reproduced. Five panels (A–E) plot occurrences (hits per gene) against gene number for MIT9312, MIT9301, AS9601, HTCC 1002 and HTCC 1062. Labelled peaks include photosystem I and II proteins, ribulose bisphosphate carboxylase, ammonium transporter, light-harvesting complex protein, bacteriorhodopsin, Na+/solute symporter, ribosomal proteins, elongation factors and hypothetical proteins.]

Fig. 5. Mapping of transcripts to five reference genomes. A–C are P. marinus strains; D–E are P. ubique strains. The x-axis shows gene number in the reference genome. Shaded areas represent possible hypervariable regions with few mapped transcripts.

Operon signature in environmental transcript pools

Genes that encode steps in the same metabolic pathway are frequently clustered into operons in prokaryotic genomes (Overbeek et al., 1999) to facilitate coordinated transcription. Thus a cell’s transcript pool is anticipated to include more mRNAs from adjacent genes than expected from a random sampling of the genome. We tested this using the transcripts assigned to taxonomic bins for P. marinus, P. ubique and Roseobacter by counting the frequency with which transcripts from two adjacent genes on the reference strain genome (defined as ≤ 1 gene intervening) were both present in the bin, recognizing that the wild and reference organisms will not be fully syntenic. In all cases, the transcript bins had significantly more adjacent genes than a null distribution generated from the reference genomes (Fig. 4A), suggesting that random transcript sequencing captures operon-based expression patterns in natural marine bacterioplankton communities.
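The adjacency test described above can be sketched as a resampling procedure. This is an illustration under assumptions, not the authors' code: genes are represented by their index along a linear reference genome (circularity is ignored), adjacency is counted between neighbouring detected genes with at most one gene intervening, and an empirical p-value replaces the Wilcoxon signed-rank test reported with Fig. 4. All names and the toy data are hypothetical.

```python
import random


def count_adjacent_pairs(gene_indices, max_intervening=1):
    """Count neighbouring detected genes separated by <= max_intervening genes."""
    genes = sorted(set(gene_indices))
    max_gap = max_intervening + 1  # index difference allowed between neighbours
    return sum(1 for a, b in zip(genes, genes[1:]) if b - a <= max_gap)


def null_distribution(genome_size, sample_size, n_samples=1000, seed=0):
    """Adjacent-pair counts for random gene sets of the same size as the bin."""
    rng = random.Random(seed)
    return [
        count_adjacent_pairs(rng.sample(range(genome_size), sample_size))
        for _ in range(n_samples)
    ]


def empirical_p(observed, null_counts):
    """One-sided p-value: share of null samples at least as extreme as observed."""
    hits = sum(1 for x in null_counts if x >= observed)
    return (1 + hits) / (1 + len(null_counts))


# Toy transcript bin: 200 detected genes arranged in 50 runs of 4 neighbours,
# standing in for operon-like clustering on a 2000-gene reference genome.
detected = [start + k for start in range(0, 2000, 40) for k in range(4)]
observed = count_adjacent_pairs(detected)
null_counts = null_distribution(genome_size=2000, sample_size=200)
print(observed, empirical_p(observed, null_counts))
```

Keeping the random samples the same size as the transcript bin matters, because the chance of two detected genes being neighbours rises with the number of genes detected.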

Predicted highly expressed genes in environmental transcript pools

Genes that are frequently transcribed by a cell can be identified based on patterns in codon usage (Karlin and Mrázek, 2000). We identified predicted highly expressed (PHX) genes for the reference genomes, and then assigned PHX status to the transcripts with best hits to that reference genome based on homology. For all taxa, and in accordance with biological expectations, the environmental transcript bins had a significantly higher percentage of PHX genes than the reference genomes (Fig. 4B). This pattern was particularly evident for the Roseobacters (9% of the genes in the reference genomes are PHX versus 30% of the transcripts; 3.1-fold enrichment) and for P. marinus MIT9301 (4.6% versus 12.9%; 2.8-fold enrichment). A larger proportion of PHX transcripts were found in the day for all P. marinus bins and the Roseobacter bin (although not for P. ubique), suggesting that highly expressed genes more frequently mediate daytime-biased processes (data not shown).

Metatranscriptomic comparison of day and night samples

The majority of annotated transcripts (~80%) were assigned to genes related to metabolism, and in particular to three KEGG categories: amino acid transport and metabolism, energy production and conversion (particularly oxidative phosphorylation, carbon fixation and nitrogen metabolism), and carbohydrate transport (Fig. 6). Membrane transport and signal transduction pathways were also common in the community transcriptome, specifically for ABC transporters of amino acids, glycine betaine/L-proline, polyamines (spermidine and putrescine), iron and nutrients in the form of nitrate, phosphate and phosphonate.

The day/night samples allowed comparison of dominant expression patterns in the presence and absence of solar radiation in the bacterioplankton community. Among the 167 KEGG metabolic pathways represented in the annotated sequences, four pathways were better represented at night (including those for glycosphingolipid biosynthesis and nucleotide sugars metabolism) and six were better represented in the day (including photosynthesis and oxidative phosphorylation) (95% confidence level; Table 2). Some KEGG pathways had significant diel differences in frequency for individual taxonomic bins. These include: histidine biosynthesis, with evidence for expression of all or nearly all genes in the pathway (both P. ubique and P. marinus at night; Fig. 7A and Fig. S1A); metabolism of glutathione, a reductant with multiple detoxifying and cytoprotective capabilities (P. marinus at night); the photosynthesis pathway (phycobilisome, photosystem I and II, cytochromes, ATP synthase) and nearly all genes involved in biosynthesis of phytoene and its subsequent conversion into carotenoids (P. marinus in the day; Fig. 7B); nucleotide sugars metabolism, glycosphingolipid biosynthesis, carotenoid biosynthesis and vitamin B6 metabolism (P. ubique at night; Fig. S1B); and transfer of methyl groups for C1 metabolism (P. ubique and Roseobacter in the day) (Table S3).
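A pathway-level day/night comparison of this kind can be illustrated with a generic permutation test on the share of reads assigned to one pathway in each library. The sketch below is not the published procedure cited for these comparisons (Rodriguez-Brito et al., 2006); the function name, the number of resamples and the read counts are hypothetical and chosen only to show the logic.

```python
import random


def resampling_p_value(hits_a, total_a, hits_b, total_b, n_resamples=1000, seed=1):
    """Two-sided empirical p-value for a difference in one pathway's share of reads."""
    rng = random.Random(seed)
    observed = abs(hits_a / total_a - hits_b / total_b)
    all_hits = hits_a + hits_b
    # Pool both libraries: 1 marks a read assigned to the pathway, 0 any other read.
    pooled = [1] * all_hits + [0] * (total_a + total_b - all_hits)
    as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        a = sum(pooled[:total_a])  # pathway reads landing in pseudo-library A
        b = all_hits - a           # the remainder land in pseudo-library B
        if abs(a / total_a - b / total_b) >= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_resamples + 1)


# Hypothetical counts: 60 of 500 night reads versus 30 of 500 day reads in one pathway.
p_night_vs_day = resampling_p_value(60, 500, 30, 500)
print(p_night_vs_day)
```

With 167 pathways tested, a real analysis would also need to consider multiple comparisons; the source reports results at the 95% confidence level without describing a correction in this section.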

Transcript annotation based on the COG database was comparable. Among the 1577 COGs represented, statistical comparisons identified 12 that were better represented at night and 13 that were better represented in the day (Table S4). These included amino acid and nucleotide metabolism, membrane biosynthesis and polyamine dehydrogenation at night, and light-mediated energy production, protein turnover, catalase synthesis and inorganic ion transport and metabolism in the day.

Statistically significant differences in the distribution of transcripts between the day and night samples were also assessed independently of KEGG and COG assignments in order to capture signals from genes not currently classified by these annotation systems. Among the additional significant functions overrepresented in the night transcriptome were those for ABC-type spermidine/putrescine transport system permeases, RNA methyltransferases and signal transduction histidine kinases. For the day transcriptome, genes encoding proteorhodopsin and an aromatic-ring hydroxylase were significantly overrepresented (Table S5).

Eukaryotic sequences

The majority of eukaryotic transcripts were most closely affiliated with sequences from green-lineage organisms (Viridiplantae), such as the picoeukaryotic prasinophytes Ostreococcus spp. (Derelle et al., 2006) and Micromonas spp. A large number of transcripts also appeared to be most closely related to genes in Chromalveolata (Stramenopile or Alveolate) genomes. These groups are major components of the picoeukaryotic phytoplankton (McDonald et al., 2007) and are small enough to pass the 5 µm prefilter used in this study. Gene transcripts that most closely matched reference genomes of photosynthetic eukaryotes were more abundant in the day than in the night sample. Among the most highly expressed genes detected from eukaryotic organisms were those encoding chlorophyll binding proteins, light harvesting reactions and photosynthetic machinery (Fig. 8). These included a photosystem II D1 reaction-centre protein related to that from the diatom Thalassiosira pseudonana, as well as the plastid-encoded photosystem I subunit protein similar to psaB from the diatom Odontella sinensis. Evidence for stramenopile nitrogen metabolism via urea cycle activity was also detected based on several transcripts that most closely matched stramenopile carbamoyl phosphate synthetase III, indicating that the unique diatom urea cycle (Armbrust et al., 2004; Allen et al., 2006) is likely active in natural populations of stramenopile picophytoplankton.

Fig. 6. The 50 most abundant KEGG pathways in the night (black) and day (gray) transcriptomes. The pathways marked with stars were significantly overexpressed in one of the pools as determined by comparisons with P < 0.05 (Rodriguez-Brito et al., 2006).

qPCR quality control

The half-life of microbial transcripts can be as short as 30 s based on studies of mRNAs of cultured bacteria (Belasco, 1993), while processing times for environmental nucleic acid samples can take hours (Fuhrman et al., 1988). Linear amplification of RNA greatly reduces the time between initiation of sampling and capture of transcripts because sample volumes can be reduced, but it has potential to introduce bias into the sequenced mRNA pool. A previous test with mRNA from the cultured marine bacterium S. pomeroyi DSS-3 demonstrated minor bias and good repeatability during linear amplification (Bürgmann et al., 2007). Here, we assessed the full environmental transcriptomic sequencing protocol by comparing qPCR-based ratios of selected genes in day versus night total RNA fractions to the pyrosequencing-based ratio of these same genes in the sequenced transcript pools. Five genes common in the transcriptome (P. marinus-like recA and psaA, P. ubique-like proteorhodopsin and Na+/solute symporter, and P. torquis-like membrane proteinase) showed a strong positive correlation between the night:day ratios in the original RNA pool and those in the pyrosequence data sets (r = 0.94, Fig. S2), indicating that the sequenced metatranscriptome was representative of the unamplified mRNA pool.

Table 2. KEGG pathways significantly overrepresented in the night and day transcriptomes (P < 0.05). The last column stands in for the shading of the original table (grey for night, none for day) and follows the row order and the pathway counts given in the text.

| Pathway ID | Pathway | Category | Overrepresented in |
|---|---|---|---|
| path00520 | Nucleotide sugars metabolism | Carbohydrate Metabolism | Night |
| path00521 | Streptomycin biosynthesis | Biosynthesis of Secondary Metabolites | Night |
| path00602 | Glycosphingolipid biosynthesis – neo-lactoseries | Glycan Biosynthesis and Metabolism | Night |
| path00603 | Glycosphingolipid biosynthesis – globoseries | Glycan Biosynthesis and Metabolism | Night |
| path00190 | Oxidative phosphorylation | Energy Metabolism | Day |
| path00195 | Photosynthesis | Energy Metabolism | Day |
| path03010 | Ribosome | Translation | Day |
| path03020 | RNA polymerase | Transcription | Day |
| path04940 | Chaperonin | N/A | Day |
| path05060 | Chaperonin | N/A | Day |

Discussion

The HOT program provides comprehensive, long-term oceanographic information for the oligotrophic North Pacific Ocean (Karl and Lukas, 1996). In situ dissolved organic constituents at 25 m depth at Station ALOHA are typically 70–110 µM for carbon, 5–6 µM for nitrogen and 0.2–0.3 µM for phosphorus; ammonium concentrations in these waters (~50 nM) are below the detection limit of standard nutrient analysis (http://hahana.soest.hawaii.edu/hot/hot-dogs/). Surface water nutrient data for the month of November (the month in which the community transcriptomes in this study were obtained), taken over the past several decades at various times of day, show no discernible diel differences in organic and inorganic carbon, nitrogen and/or phosphorus concentrations at Station ALOHA.

Building on previous metagenomic and transcriptomic analyses of this system (DeLong et al., 2006; Frias-Lopez et al., 2008), this day/night environmental transcriptomics effort provides insight into the temporal patterns of bacterioplankton metabolic processes and ecological activities (Table 3). Three important caveats of the analysis are that: (i) the composition of the environmental transcriptomes may be inadvertently shaped by collection and filtration manipulations, (ii) mRNAs with intrinsically shorter half-lives are less likely to be stabilized and sequenced and (iii) only 32% of the 151 000 possible transcript sequences could be confidently assigned to a known function (Fig. 1). Despite these concerns, the community transcriptomes provided reasonable coverage of mRNAs from the dominant organisms, and the relative representation of transcripts was corroborated by RT-qPCR-based expression analyses (Fig. S2).
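The RT-qPCR corroboration rests on a correlation between two sets of night:day ratios for five genes (r = 0.94, Fig. S2). As a rough sketch of that kind of check, the snippet below computes a Pearson correlation coefficient from two lists of ratios. The ratio values are invented for illustration and are not data from the study, and whether the ratios were transformed before correlation is not stated in this section, so untransformed values are assumed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical night:day ratios for five genes (illustrative values only).
qpcr_ratio = [0.4, 0.6, 1.1, 1.8, 2.5]        # measured in the unamplified RNA pool
sequencing_ratio = [0.5, 0.5, 1.3, 1.6, 2.9]  # derived from read counts

r = pearson_r(qpcr_ratio, sequencing_ratio)
print(round(r, 2))
```

With only five genes, a coefficient of this kind is sensitive to a single outlying gene, which is worth keeping in mind when reading it as evidence of low amplification bias.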

The community transcriptomes had properties consistent with expected attributes of the HOT ecosystem, including the apparent taxonomic affiliations of transcripts. Closely related P. marinus reference strains that are members of high light clade eMIT9312 comprised the most populated transcript bin. This clade has been shown to dominate in the upper euphotic zone (< 50 m) at low and mid latitudes (below 30°) (Johnson et al., 2006), much like the HOT stations from which our samples were collected. SAR11-like sequences comprised the second largest taxonomic bin. This taxon is the most numerous heterotrophic marine bacterioplankton group, particularly in oligotrophic oceans where it makes up 30–40% of cells in the euphotic zone (Morris et al., 2002).

Fig. 7. Transcript mapping to the KEGG histidine metabolism pathway for P. ubique, overrepresented at night (A) and the biosynthesis of steroids and carotenoids pathway for P. marinus, overrepresented in the day (B). Colour (blue for night, yellow for day) indicates that transcripts were found; grey indicates that genes were present in the reference genome but no transcripts were found; white indicates that genes were not present in the reference genomes.

Studies of taxonomic composition of ocean assemblages consistently show the numerical importance of α- and γ-Proteobacteria, Cyanobacteria, and Bacteroidetes (Morris et al., 2002; DeLong et al., 2006; Rusch et al., 2007), but little is known about how abundance specifically relates to activity levels. Based on comparisons of the relative abundance of taxa (flow cytometry counts and 16S rRNA amplicons) to their representation in the community transcriptome, by far the highest per-cell transcriptional activity level in the HOT ecosystem was seen for the Cyanobacteria. Assuming similar mRNA half-lives across the prokaryotic taxa, dominant autotrophs produced more transcripts per gene than any co-occurring heterotrophic group not only in the day, but also at night (Fig. 3). This may reflect an advantage of autotrophy over heterotrophy for maintaining cellular activity levels given the low concentration and refractory nature of organic carbon fuelling heterotrophic activity in the oligotrophic ocean (Bauer et al., 1992).

[Fig. 8, y-axis categories (Gene Ontology annotations): electron transport; photosynthesis, light reaction; phosphorus metabolic process; oxidative phosphorylation; ion transmembrane transporter activity; energy derivation by oxidation of organic compounds; heme binding; cellular biosynthetic process; protein metabolic process; cellular macromolecule metabolic process; organelle organization and biogenesis; DNA metabolism; organic acid metabolic process; carbon utilization by fixation of carbon dioxide; aldehyde metabolic process; macromolecular complex assembly; cellular component assembly; ribonucleoprotein complex biogenesis and assembly; macromolecule biosynthetic process; intracellular transport; aromatic compound metabolic process; biopolymer metabolic process; amino acid and derivative metabolic process.]

Fig. 8. Number of eukaryotic transcripts in day (top bars) compared with night (bottom bars) samples. The relative contributions of Viridiplantae (green), photosynthetic Chromist algae (yellow), and other Chromist (red) transcripts to each Gene Ontology (GO) annotation category are depicted. [Figure not reproduced; x-axis ticks: 0–180.]

As expected, many transcripts involved in light-mediated processes, such as photosynthesis and proteorhodopsin activity, were among those overrepresented in the community transcriptome in the day. Transcripts involved in protection or repair of light-induced DNA and protein damage (e.g. catalase, chaperones, photolyases, superoxide dismutase and various DNA repair proteins) were also common in the day sample. Evidence of daytime C1 utilization by some heterotrophs suggests a source of C1 compounds or methyl groups in this ecosystem. Compounds such as methanol and formaldehyde (Heikes et al., 2002; Carpenter et al., 2004; Giovannoni et al., 2008), methane (Ward et al., 1987), and methylhalides (Woodall et al., 2001; Schaefer et al., 2002) may be available to heterotrophic bacterioplankton in surface sea water. Dimethylsulphoniopropionate, an organic sulphur compound produced in abundance by marine phytoplankton (Kiene et al., 2000), is a rich source of methyl groups for surface ocean bacterioplankton, and tetrahydrofolate-mediated C1 transfer (i.e. transcripts mapping to the C1 pool by folate and methane metabolism KEGG pathways; Table S5) has been shown to play a role in its metabolism (Howard et al., 2006). Recovery of nearly four times as much mRNA per volume of sea water in the day (~30 ng l⁻¹) compared with night (~8 ng l⁻¹) is consistent with high relative abundance of RNA polymerase transcripts in the day (Table 2) and likely reflects increased gene expression when solar radiation is available.

Night-biased synthesis of vitamin B6, essential for a variety of amino acid conversions including transaminations, decarboxylations and dehydrations, in conjunction with evidence for other night-time activities such as the γ-glutamyl pathway for amino acid uptake, the overrepresentation of amino acid transport and metabolism genes, and the histidine synthesis pathway (Table 3 and Tables S4–S6), indicates that amino acid acquisition in general may be a relatively more important metabolic activity in the night. Prochlorococcus marinus has recently been shown to exhibit diel patterns of amino acid uptake, with acquisition occurring predominantly at dusk (Mary et al., 2008). Our data agree with this and further suggest that heterotrophic taxa also devote a greater percentage of their transcriptome to transporting and synthesizing amino acids at night. Night-time accumulation of amino acids might be a mechanism for nitrogen storage by many organisms, particularly for P. marinus, which undergoes cell division at night. Histidine, the amino acid with the most consistent signal for synthesis at night by both autotrophs and heterotrophs (Fig. 7A and Fig. S1), is one of the most nitrogen-rich amino acids (only arginine has more amino groups).

Table 3. Selected biogeochemically relevant genes in the HOT metatranscriptome.

| Process | Function | Gene(s) | Occurrence (night, day) |
|---|---|---|---|
| Nitrogen | Nitrogenase (N fixation) | nifH, nifU, nifS, nifB | +, + |
| | Ammonium transport | amt | +, +* |
| | Ammonia monooxygenase | amoA | |
| | Assimilatory nitrate reductase | narB | + (one sample) |
| | Hydroxylamine oxidoreductase | hao | |
| | Nitrate permease | napA | + (one sample) |
| | Nitrite reductase | nirA | + (one sample) |
| | Dissimilatory nitrite reductase | nirK, nirS | |
| | Nitric oxide reductase | norQ | + (one sample) |
| | Nitrate transporter | narK | + (one sample) |
| | Urease | ureC, ureE, ureF | +, + |
| Methylotrophy | Serine-glyoxylate aminotransferase | | +, + |
| | Formate dehydrogenase | fdh, fdsD | +, + |
| | Methylene tetrahydrofolate reductase | metF | +, + |
| | Methane monooxygenase | mmo | |
| | Methanol dehydrogenase | mxa | + (one sample) |
| | Methenyltetrahydromethanopterin cyclohydrolase | mch | +, + |
| | Crotonyl-CoA reductase | | +, + |
| | Formaldehyde-activating enzyme | fae | + (one sample) |
| Polyamine degradation | Deoxyhypusine synthase | dys2 | +*, + |
| | Spermidine/putrescine transport system permease | potC | +*, + |
| | Acetylpolyamine aminohydrolase | aphA | |
| Sulphur cycle | Sulphur oxidation | soxB, soxC, soxA, soxZ, soxF | +, + |
| | Dimethylsulphoniopropionate demethylase | dmdA | |
| Glycine betaine | Dimethylglycine dehydrogenase | dmgdh | +, + |
| | Glycine cleavage system (aminomethyltransferase) | gcvT | +*, + |
| Aromatic compounds | Aromatic ring hydroxylase | chlP | +, +* |
| | Protocatechuate 3,4-dioxygenase | pcaH | |
| | Benzoyl-CoA oxygenase | boxA | + (one sample) |
| Carbon monoxide | Carbon monoxide dehydrogenase | coxS, coxM, coxL | +, + |
| Phototrophy and C fixation | Photosystem I | multiple | +, +* |
| | Photosystem II | multiple | +, +* |
| | Rubisco | rbcL, rbcS | +, +* |
| | Photosynthetic reaction centre, M subunit | pufM | + (one sample) |
| | Proteorhodopsin | | +, +* |
| Phosphate assimilation | Phosphonate uptake | phnD, phnC | +, + |
| | Alkaline phosphatase | phoA | +, + |
| | Phosphate uptake | pstA, pstS | +, + |
| Amino acid metabolism | Glutamate synthase | gltB | +, + |
| | Glutathione reductase | gor | +*, + |
| | Histidine kinase | baeS | +*, + |
| | Threonine synthase | thrC | +*, + |
| Trace metal uptake | Selenium | | +*, + |
| | Iron | tonB | +, + |
| | Arsenite | | + (one sample) |
| | Arsenate reductase | arsC | +, + |

A ‘+’ indicates occurrence in the night or day sample. An asterisk indicates significantly higher transcript frequency in that sample. An empty cell indicates no mark in either column. Where a row carries a single ‘+’, the column it belonged to (night or day) is not recoverable from the source layout.

Overall, bacterial community investment in this oligotrophic ocean system was skewed towards energy acquisition and metabolism during the day, while biosynthesis (specifically of membranes, amino acids and vitamins) received relatively greater investments at night. Many microbial processes expected to be differentially expressed over a day/night cycle, such as photosynthesis, oxidative phosphorylation and proteorhodopsin activity, were indeed captured in the sequence data. Less anticipated processes that emerged included the utilization of C1 compounds, the uptake of polyamines and the degradation of aromatic compounds (Table 3). Other metabolic processes ongoing in this microbial community, although without statistical evidence for day/night patterns, included: use of nitrate and urea as nitrogen sources; use of phosphate, phosphonate and carbon-oxygen-phosphorus (C-O-P) compounds as phosphorus sources; oxidation of reduced sulphur compounds; oxidation of carbon monoxide; and uptake of multiple trace metals (Table 3). This comparative analysis of microbial community transcripts has provided an inventory of ongoing metabolic processes, offered insights into their temporal patterns and supplied a new type of data for predictive modelling of environmental controls on ecosystem properties.

Experimental procedures

Sample collection

Samples were collected at the Hawaiian Ocean Time-series (HOT) Station ALOHA, defined by the 6-nautical-mile radius circle centred at 22°45′N, 158°W, in November 2005 (HOT-175). For RNA extraction, sea water was collected from a depth of 25 m using Niskin bottles on a conductivity-temperature-depth rosette sampler. A night sample was collected at 03:00 on 11 November 2005, and a daytime sample was collected at 13:00 on 13 November 2005. During HOT-175, the peak level of photosynthetically active radiation (PAR) was at 12:00, with sunrise occurring around 07:00 and sunset just before 18:00. Sea water (80 l for the night sample and 40 l for the day sample) was prefiltered through a 5 µm, 142 mm polycarbonate filter (GE Osmonics, Minnetonka, MN) followed by a 0.2 µm, 142 mm Durapore (Millipore) filter using positive air pressure. The 0.2 µm filters were placed in a 15 ml tube containing 2 ml Buffer RLT (containing β-mercaptoethanol) from the RNeasy kit (Qiagen, Valencia, CA) and flash-frozen in liquid nitrogen for RNA extraction. For DNA extraction, an additional 20 l of sea water was simultaneously filtered using the protocol outlined above at both time points. The 0.2 µm filters were placed in Whirl-Pak bags and flash-frozen. The total sampling time from initiation of collection until freezing in liquid nitrogen was approximately 1.5 h. We obtained ~1 µg of total RNA from 40 to 80 l of sea water. Following mRNA enrichment and amplification, 30–100 µg of mRNA was available for conversion to cDNA for sequencing. Typically, only 3–5 µg of DNA was required for pyrosequencing.

RNA and DNA preparation

DNA was extracted using a phenol:chloroform-based protocol (Fuhrman et al., 1988). Briefly, frozen filters inside Whirl-Pak bags were transferred to 50 ml Falcon centrifuge tubes. Ten millilitres of extraction buffer [SDS (10% sodium dodecyl sulphate) : STE (100 mM NaCl, 10 mM Tris, 1 mM EDTA), 9:1] was added to the tubes, which were then boiled in a water bath for 5 min. The extraction buffer was then transferred from the tubes into Oak Ridge round-bottom centrifuge tubes, to which 3 ml NaOAc and 28 ml 100% EtOH were added. Organic macromolecules were precipitated overnight at −20°C, before the tubes were centrifuged for 1 h at 15 000 g. The supernatant was decanted, and pellets were air-dried for 30 min. The pellets were resuspended in 600 µl deionized water, and sequentially extracted with 500 µl phenol, 500 µl phenol : chloroform : isoamyl alcohol (24:1:0.1), and 500 µl chloroform : isoamyl alcohol (9:1); after each extraction the organic phase was removed and discarded. The supernatant was transferred to a fresh tube at the end of the last extraction, amended with 150 µl NaOAc and 1.2 ml 100% EtOH, and precipitated overnight. The tube contents were then centrifuged at 15 000 g for 1 h, the supernatant decanted, and pellets dried in a speed vacuum dryer for 10 min. The DNA pellets were resuspended in 100 µl DNase- and RNase-free deionized water (Ambion).

RNA was extracted using a modified version of the RNeasy kit (Qiagen) that results in high RNA yields from material on polycarbonate filters (Poretsky et al., 2008). Frozen samples were first thawed slightly for 2 min in a 40–50°C water bath and then vortexed for 10 min with RNase-free beads from the Mo-Bio RNA PowerSoil kit (Carlsbad, CA). Following centrifugation for 5 min at 3000–5000 g, the supernatant was transferred to a new tube. Beginning with the RNeasy Midi kit, 1 vol. of 70% ethanol was added to the lysate and, in order to shear large-molecular-weight nucleic acids, the lysate was drawn through a 22-gauge needle several (~5) times. RNA extraction then continued with the RNeasy Mini kit according to the manufacturer’s instructions.

Following extraction, RNA was treated with DNase using the TURBO DNA-free kit (Ambion, Austin, TX). Two methods were employed to rid the RNA samples of rRNA. The RNA was first treated enzymatically with the mRNA-ONLY Prokaryotic mRNA Isolation Kit (Epicentre Biotechnologies, Madison, WI), which uses a 5′-phosphate-dependent exonuclease to degrade rRNAs. The MICROBExpress kit (Ambion), which relies on subtractive hybridization with capture oligonucleotides hybridized to magnetic beads, was subsequently used as an additional mRNA enrichment step.

In order to obtain µg quantities of mRNA, approximately 500 ng of RNA was linearly amplified using the MessageAmp II-Bacteria Kit (Ambion) according to the manufacturer’s instructions. Finally, the amplified, antisense RNA (aRNA) was converted to double-stranded cDNA with random hexamers using the Universal RiboClone cDNA Synthesis System (Promega, Madison, WI). The cDNA was purified with the Wizard DNA Clean-up System (Promega). The quality and quantity of the total RNA, mRNA, aRNA and cDNA were assessed by measurement on the NanoDrop-1000 Spectrophotometer (NanoDrop Technologies, Wilmington, DE) and the Experion Automated Electrophoresis System (Bio-Rad, Hercules, CA).

cDNA sequencing and quality control

cDNAs from each sample (night and day) were sequenced using the GS 20 sequencing system by 454 Life Sciences (Branford, CT) (Margulies et al., 2005), resulting in 10 682 120 bp from 106 907 reads for the night sample and 13 255 704 bp from 133 515 reads for the day sample. The average sequence length was 99 bp. The sequences have been deposited in the NCBI Short Read Archive with the Genome Project ID #33463.
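The sequencing totals above can be cross-checked with a few lines of arithmetic. The read and base counts are taken from this section and the putative mRNA read counts from the Summary; the dictionary layout and function name are simply a convenient arrangement chosen for this sketch.

```python
libraries = {
    "night": {"reads": 106_907, "bases": 10_682_120, "putative_mrna": 75_946},
    "day": {"reads": 133_515, "bases": 13_255_704, "putative_mrna": 75_558},
}


def summarize(library):
    """Return mean read length (bp) and the fraction of reads kept as putative mRNA."""
    mean_read_length = library["bases"] / library["reads"]
    mrna_fraction = library["putative_mrna"] / library["reads"]
    return mean_read_length, mrna_fraction


for name, library in libraries.items():
    mean_len, fraction = summarize(library)
    print(f"{name}: mean read length {mean_len:.1f} bp, putative mRNA {fraction:.1%}")

# Combined putative mRNA reads, comparable to the ~151 000 sequences cited in the Discussion.
total_mrna = sum(library["putative_mrna"] for library in libraries.values())
print(total_mrna)
```

Both libraries average just over 99 bp per read, in line with the stated average, and the night library retains a larger share of its reads as putative mRNA than the day library.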

rRNA identification and removal

For rRNA sequence identification, the sequences were clustered at an identity threshold of 98% based on a local alignment (number of identical residues divided by length of alignment) using the program Cd-hit (Li and Godzik, 2006). Ribosomal RNA sequences were identified by BLASTN queries of the reference sequence of each cluster against the noncurated, GenBank nucleotide database (nt) (Benson et al., 2007) using cut-off criteria of E-value ≤ 10⁻³, nucleic acid length ≥ 69 and per cent identity ≥ 40%, previously established with in silico tests for rRNA sequence predictions of short pyrosequences (Frias-Lopez et al., 2008; Mou et al., 2008). We conservatively identified a sequence as rRNA-derived and removed it from the analysis pipeline if any of the top three BLASTN hits were to an rRNA gene.
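The filtering rule described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the `Hit` record, the function names and the keyword match on hit descriptions are hypothetical, and we assume that the top three hits are ranked by E-value among hits that already pass the cut-off criteria.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLASTN hit for a cluster reference sequence (hypothetical record)."""
    query: str
    description: str
    evalue: float
    align_len: int
    pct_identity: float


def passes_cutoffs(hit, max_evalue=1e-3, min_len=69, min_identity=40.0):
    # Cut-off criteria from the text: E-value <= 1e-3, length >= 69, identity >= 40%.
    return (hit.evalue <= max_evalue
            and hit.align_len >= min_len
            and hit.pct_identity >= min_identity)


def is_rrna_description(description):
    # Assumption: an rRNA gene can be recognised from the hit description text.
    text = description.lower()
    return "ribosomal rna" in text or "rrna" in text


def flag_rrna_queries(hits, top_n=3):
    """Return the queries with an rRNA gene among their top_n passing hits."""
    by_query = defaultdict(list)
    for hit in hits:
        if passes_cutoffs(hit):
            by_query[hit.query].append(hit)
    flagged = set()
    for query, query_hits in by_query.items():
        query_hits.sort(key=lambda h: h.evalue)  # best (lowest) E-value first
        if any(is_rrna_description(h.description) for h in query_hits[:top_n]):
            flagged.add(query)
    return flagged
```

Flagging a cluster when any of its top three hits is an rRNA gene, rather than only the best hit, errs on the side of removing sequences, which matches the conservative intent stated above.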

cDNA sequence annotation

The criteria for protein predictions generated using BLASTX against the NCBI curated, non-redundant reference sequence database (RefSeq) (Pruitt et al., 2005) were established with in silico tests to determine suitable cut-off limits for reliable functional prediction. For these tests, 100 arbitrarily selected, known functional gene sequences were fragmented into 20–500 bp fragments and analysed using BLASTX against RefSeq to determine if the best BLAST hit was to the correct gene function, excluding self-hits. Based on these analyses, the cut-off criteria for protein prediction were set as E-value < 0.01, identity > 40% and overlapping length > 23 aa to the corresponding best hit.
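The in silico test and the resulting cut-offs can be sketched as follows. This is an illustration only: the text does not state how fragment positions or lengths were chosen, so uniform random sampling is an assumption, and both function names are hypothetical.

```python
import random


def random_fragments(sequence, n_fragments, min_len=20, max_len=500, seed=0):
    """Cut n_fragments random substrings of min_len..max_len bases.

    Assumption: lengths and start positions are drawn uniformly at random.
    """
    if len(sequence) < min_len:
        raise ValueError("sequence is shorter than the minimum fragment length")
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_fragments):
        length = rng.randint(min_len, min(max_len, len(sequence)))
        start = rng.randint(0, len(sequence) - length)
        fragments.append(sequence[start:start + length])
    return fragments


def accept_protein_hit(evalue, pct_identity, overlap_aa,
                       max_evalue=0.01, min_identity=40.0, min_overlap=23):
    # Cut-offs from the text: E-value < 0.01, identity > 40%, overlap > 23 aa.
    return (evalue < max_evalue
            and pct_identity > min_identity
            and overlap_aa > min_overlap)
```

Each fragment would then be searched with BLASTX, and the best non-self hit scored as correct or incorrect; that search step is outside the scope of this sketch.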

Sequences with hits to RefSeq were assigned functional protein or pathway predictions based on the COG database (Tatusov et al., 2000) or KEGG database (Kanehisa and Goto, 2000). The cut-off criteria for functional protein prediction based on orthologous groups using BLASTX analysis against the COG database were established using the same in silico approach with 100 bp fragments of known functional genes as E-value < 0.1, identity > 40% and overlapping length > 23 aa to the corresponding best hit. The COG cut-off criteria were also applied to the KEGG database for pathway prediction because of the similarity in database size. Taxonomic binning of the sequences was carried out using MEGAN with the default settings for all parameters (Huson et al., 2007); this program assigns likely taxonomic origin to sequences based on the NCBI taxonomy of closest BLAST hits. The taxonomic affiliations of the putative mRNA sequences were predicted using MEGAN to the family level, and the top BLAST hit was used for any higher-resolution taxonomic assignments. All non-rRNA sequences that had no RefSeq hits were BLASTX-queried against the nr database as well as against CAMERA un-assembled ORFs predicted from the Global Ocean Survey reads (http://camera.calit2.net/index.php) (Seshadri et al., 2007).

Eukaryotic sequence annotation

Eukaryotic transcripts were binned by MEGAN. Sequences were queried (BLASTX) against a curated database of protein sequences derived from all available complete eukaryotic organelle and nuclear genomes (currently, 46 eukaryotic genomes). Transcripts that matched a reference protein sequence with > 60% identity and an E-value < e⁻¹⁰ were retained and the reference protein for the cluster was used for functional annotation. Functional annotation was performed using Java-based Blast2go (Conesa et al., 2005) that annotates genes based on similarity searches with statistical analysis and highlighted visualization on directed acyclic graphs.

16S rRNA gene libraries

PCR amplification of ribosomal DNA was carried out using primers 27F and 1522R (Johnson, 1994). The PCR conditions were as follows: 3 min at 96°C, followed by 30 cycles of denaturation at 95°C for 50 s, annealing at 58°C for 50 s, primer extension at 72°C for 1 min and a final extension at 72°C for 10 min. PCR products were cleaned using the QIAquick PCR Purification Kit (Qiagen) and multiple PCR reactions were pooled and cloned into pCR2.1 vector using the TOPO TA cloning kit (Invitrogen, Carlsbad, CA). PCR amplifications included standard no-template controls. Clones from each sample (192) were sequenced at the University of Georgia Sequencing Facility on an ABI 3100 (Applied Biosystems, Foster City, CA).

Predicted highly expressed genes

The PHX genes were determined for cultured representatives of three prokaryotic taxa that were well represented in the transcript libraries (Prochlorococcus, Roseobacter and SAR11) using an algorithm developed by Karlin and Mrázek (2000). The algorithm is based on comparisons with codon usage patterns in genes expected to be frequently transcribed in a prokaryotic genome (ribosomal proteins, chaperone proteins, etc.). Environmental transcript sequences that had best BLAST hits to one of the PHX genes were similarly designated as PHX.

Statistical analysis

A statistical program designed for comparing gene frequency in metagenomic data sets (Rodriguez-Brito et al., 2006) was used to compare the night and day mRNA sequences categorized based on COGs, KEGGs and proteins. The program was run with 20 000 repeated samplings with a sample size of 10 000 for COGs, 9000 for KEGGs and 25 000 for proteins. The significance level (P) was set at < 0.05.
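The general idea of a resampling comparison can be sketched as below. This is a simplified illustration and not the program of Rodriguez-Brito et al. (2006): the function names are hypothetical, the null distribution is built by drawing pairs of samples with replacement from the pooled day and night annotations, and the observed difference is computed on the full lists, all of which are assumptions made for the example.

```python
import random


def category_fraction(labels, category):
    """Fraction of annotated sequences assigned to one category (e.g. a COG)."""
    return sum(1 for label in labels if label == category) / len(labels)


def resampling_p_value(day, night, category, sample_size, n_resamples, seed=0):
    """Estimate how often pooled resamples differ as much as day versus night."""
    rng = random.Random(seed)
    observed = abs(category_fraction(day, category)
                   - category_fraction(night, category))
    pooled = list(day) + list(night)  # null hypothesis: no day/night difference
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        sample_a = rng.choices(pooled, k=sample_size)  # drawn with replacement
        sample_b = rng.choices(pooled, k=sample_size)
        difference = abs(category_fraction(sample_a, category)
                         - category_fraction(sample_b, category))
        if difference >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate above zero for finite resampling.
    return (at_least_as_extreme + 1) / (n_resamples + 1)
```

With the settings reported above, `n_resamples` would be 20 000 and `sample_size` 10 000 for COGs, 9000 for KEGGs and 25 000 for proteins, with a category called significant when the estimate falls below 0.05.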

qPCR verifications

To confirm that the composition of the pyrosequence library was representative of the initial mRNAs, transcripts of five genes that were top hits to multiple sequences in both transcript pools were quantified in the total RNA pool. The qPCR primer sets were designed for the P. marinus str. AS9601 recA and psaA, a proteorhodopsin gene and a Na+/solute symporter (Ssf family) gene from P. ubique HTCC1062, and a probable integral membrane proteinase attributed to Psychroflexus torquis ATCC 700755 (sequences and annealing temperatures in Table S6). Reverse transcription reactions were carried out on 200 ng of RNA using the Omniscript RT kit (Qiagen) in 20 µl volumes containing 1× RT buffer, 0.3 mg ml⁻¹ of random hexamers (Invitrogen), 1 µl of 5 mM dNTPs, 2 U of reverse transcriptase and 20 U of RNase inhibitor (Promega) at 37°C for 1 h, followed by inactivation of the reverse transcriptase at 95°C for 2 min. The day : night ratio of each gene transcript in the RNA pools was determined by qPCR amplification of a serial dilution of cDNAs in triplicate, and calculation of the difference in cycle threshold values (ΔCT) between the two samples. Quantitative amplification was done using the iCycler iQ RT PCR detection system (Bio-Rad) in a 20 µl reaction volume containing 10 µl of iQ SYBR Green Supermix (Bio-Rad), 0.4 µl each of 10 µM of the forward and reverse primers and 1 µl of the cDNA template. PCR conditions included a preliminary denaturation at 95°C for 3 min followed by 45 cycles of 95°C for 15 s, annealing for 1.5 s, 95°C for 1 min and 55°C for 1 min. A melt curve was generated following the PCR, beginning with 55°C and increasing 0.4°C every 10 s until 95°C. A PCR control without an initial RT step was included with every set of reactions.
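The conversion from ΔCT to a day : night transcript ratio can be sketched as follows. The text does not give the formula or the amplification efficiency, so this example assumes the standard relation ratio = E^ΔCT with perfect doubling per cycle (E = 2); the function name and its default are hypothetical.

```python
from statistics import mean


def day_night_ratio(ct_day, ct_night, efficiency=2.0):
    """Day : night transcript ratio from replicate cycle threshold values.

    Assumption: the template is multiplied by `efficiency` each cycle, so a
    lower CT in the day sample means more starting template in that sample.
    """
    delta_ct = mean(ct_night) - mean(ct_day)
    return efficiency ** delta_ct


# Triplicate CT values in which the day sample crosses threshold two cycles earlier.
ratio = day_night_ratio([20, 20, 20], [22, 22, 22])
```

Under these assumptions a two-cycle difference corresponds to a fourfold day : night ratio; a measured primer efficiency below 2 would give a smaller ratio for the same ΔCT.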

Acknowledgements

We thank the Captain and crew of the R/V Kilo Moana and Dr David Karl. Jennifer Oliver assisted with sample processing. Jonathan Badger assisted with data processing. Funding was provided by The Gordon and Betty Moore Foundation, National Science Foundation grants MCB-0702125 (M.A.M.), EF-0722374 (A.E.A.) and OCE-0425363 (J.P.Z.), and the NSF C-MORE Center for Microbial Oceanography.

References

Allen, A.E., Vardi, A., and Bowler, C. (2006) An ecological and evolutionary context for integrated nitrogen metabolism and related signaling pathways in marine diatoms. Curr Opin Plant Biol 9: 264–273.

Armbrust, E.V., Berges, J.A., Bowler, C., Green, B.R., Martinez, D., Putnam, N.H., et al. (2004) The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science 306: 79–86.

Bauer, J.E., Williams, P.M., and Druffel, E.R.M. (1992) 14C activity of dissolved organic carbon fractions in the north-central Pacific and Sargasso Sea. Nature 357: 667–670.

Belasco, J.G. (1993) mRNA degradation in prokaryotic cells: an overview. In Control of Messenger RNA Stability. Belasco, J.G., Brawerman, G. (eds). San Diego, CA, USA: Academic Press, pp. 3–11.

Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and Wheeler, D.L. (2007) GenBank. Nucleic Acids Res 35: D21–D25.

Bürgmann, H., Widmer, F., Sigler, W.V., and Zeyer, J. (2003) mRNA extraction and reverse transcription-PCR protocol for detection of nifH gene expression by Azotobacter vinelandii in soil. Appl Environ Microbiol 69: 1928–1935.

Bürgmann, H., Howard, E.C., Ye, W., Sun, F., Sun, S., Napierala, S., and Moran, M.A. (2007) Transcriptional response of Silicibacter pomeroyi DSS-3 to dimethylsulfoniopropionate (DMSP). Environ Microbiol 9: 2742–2755.

Campbell, L., and Vaulot, D. (1993) Photosynthetic picoplankton community structure in the subtropical North Pacific Ocean near Hawaii (Station ALOHA). Deep Sea Res Part I Oceanogr Res Pap 40: 2043–2060.

Carpenter, L.J., Lewis, A.C., Hopkins, J.R., Read, K.A., Longley, I.D., and Gallagher, M.W. (2004) Uptake of methanol to the North Atlantic Ocean surface. Global Biogeochem Cycles 18: GB4027.

Cavender-Bares, K.K., Karl, D.M., and Chisholm, S.W. (2001) Nutrient gradients in the western North Atlantic Ocean: relationship to microbial community structure and comparison to patterns in the Pacific Ocean. Deep Sea Res Part I Oceanogr Res Pap 48: 2373–2395.

Conesa, A., Gotz, S., Garcia-Gomez, J.M., Terol, J., Talon, M., and Robles, M. (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21: 3674–3676.

DeLong, E.F., Preston, C.M., Mincer, T., Rich, V., Hallam, S.J., Frigaard, N.-U., et al. (2006) Community genomics among stratified microbial assemblages in the ocean’s interior. Science 311: 496–503.

Derelle, E., Ferraz, C., Rombauts, S., Rouze, P., Worden, A.Z., Robbens, S., et al. (2006) Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. Proc Natl Acad Sci USA 103: 11647–11652.

Frias-Lopez, J., Shi, Y., Tyson, G.W., Coleman, M.L., Schuster, S.C., Chisholm, S.W., and DeLong, E.F. (2008) Microbial community gene expression in ocean surface waters. Proc Natl Acad Sci USA 105: 3805–3810.

Fuhrman, J.A., Comeau, D.E., Hagstrom, A., and Chan, A.M. (1988) Extraction from natural planktonic microorganisms of DNA suitable for molecular biological studies. Appl Environ Microbiol 54: 1426–1429.


Gelder, R.N.V., von Zastrow, M.E., Yool, A., Dement, W.C., Barchas, J.D., and Eberwine, J.H. (1990) Amplified RNA synthesized from limited quantities of heterogeneous cDNA. Proc Natl Acad Sci USA 87: 1663–1667.

Gilbert, J.A., Field, D., Huang, Y., Edwards, R., Li, W., Gilna, P., and Joint, I. (2008) Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS ONE 3: e3042.

Giovannoni, S.J., Hayakawa, D.H., Tripp, H.J., Stingl, U., Givan, S.A., Cho, J.-C., et al. (2008) The small genome of an abundant coastal ocean methylotroph. Environ Microbiol 10: 1771–1782.

Heikes, B.G., Chang, W.N., Pilson, M.E.Q., Swift, E., Singh, H.B., Guenther, A., et al. (2002) Atmospheric methanol budget and ocean implication. Global Biogeochem Cycles 16: 80.81–80.80.13.

Howard, E.C., Henriksen, J.R., Buchan, A., Reisch, C.R., Burgmann, H., Welsh, R., et al. (2006) Bacterial taxa that limit sulfur flux from the ocean. Science 314: 649–652.

Huson, D.H., Auch, A.F., Qi, J., and Schuster, S.C. (2007) MEGAN analysis of metagenomic data. Genome Res 17: 377–386.

Ingraham, J.L., Maaløe, O., and Neidhardt, F.C. (1983) Growth of the Bacterial Cell. Sunderland, MA, USA: Sinauer Associates.

Johnson, J.L. (1994) Similarity analysis of rRNAs. In Methods for General and Molecular Bacteriology. Gerhardt, P., Murray, R.G.E., Wood, W.A., and Krieg, N.R. (eds). Washington, DC: American Society for Microbiology, pp. 683–700.

Johnson, Z.I., Zinser, E.R., Coe, A., McNulty, N.P., Woodward, E.M.S., and Chisholm, S.W. (2006) Niche partitioning among Prochlorococcus ecotypes along ocean-scale environmental gradients. Science 311: 1737–1740.

Kanehisa, M., and Goto, S. (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28: 27–30.

Karl, D., Letelier, R., Tupas, L., Dore, J., Christian, J., and Hebel, D. (1997) The role of nitrogen fixation in biogeochemical cycling in the subtropical North Pacific Ocean. Nature 388: 533–538.

Karl, D.M., and Lukas, R. (1996) The Hawaii Ocean Time-series (HOT) program: background, rationale and field implementation. Deep Sea Res Part II Top Stud Oceanogr 43: 129–156.

Karlin, S., and Mrázek, J. (2000) Predicted highly expressed genes of diverse prokaryotic genomes. J Bacteriol 182: 5238–5250.

Kiene, R.P., Linn, L.J., and Bruton, J.A. (2000) New and important roles for DMSP in marine microbial communities. J Sea Res 43: 209–224.

Lander, E.S., and Waterman, M.S. (1988) Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2: 231–239.

Li, W., and Godzik, A. (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22: 1658–1659.

Liang, P., and Pardee, A.B. (1992) Differential display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science 257: 967–971.

McDonald, S.M., Sarno, D., Scanlan, D.J., and Zingone, A. (2007) Genetic diversity of eukaryotic ultraphytoplankton in the Gulf of Naples during an annual cycle. Aquat Microb Ecol 50: 75–89.

Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437: 376–380.

Mary, I., Garczarek, L., Tarran, G.A., Kolowrat, C., Terry, M.J., Scanlan, D.J., et al. (2008) Diel rhythmicity in amino acid uptake by Prochlorococcus. Environ Microbiol 10: 2124–2131.

Morris, R.M., Rappe, M.S., Connon, S.A., Vergin, K.L., Siebold, W.A., Carlson, C.A., and Giovannoni, S.J. (2002) SAR11 clade dominates ocean surface bacterioplankton communities. Nature 420: 806–810.

Mou, X., Sun, S., Edwards, R.A., Hodson, R.E., and Moran, M.A. (2008) Bacterial carbon processing by generalist species in the coastal ocean. Nature 451: 708–711.

Overbeek, R., Fonstein, M., D’Souza, M., Pusch, G.D., and Maltsev, N. (1999) The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA 96: 2896–2901.

Poretsky, R.S., Bano, N., Buchan, A., LeCleir, G., Kleikemper, J., Pickering, M., et al. (2005) Analysis of microbial gene transcripts in environmental samples. Appl Environ Microbiol 71: 4121–4126.

Poretsky, R.S., Bano, N., Buchan, A., Moran, M.A., and Hollibaugh, J.T. (2008) Environmental transcriptomics: a method to access expressed genes in complex microbial communities. In Molecular Microbial Ecology Manual. Kowalchuk, G.A., de Bruijn, F.J., Head, I.M., Akkermans, A.D.L., and van Elsas, J.D. (eds). Dordrecht, Netherlands: Springer, pp. 1892–1904.

Pruitt, K.D., Tatusova, T., and Maglott, D.R. (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33: D501–D504.

Rodriguez-Brito, B., Rohwer, F., and Edwards, R. (2006) An application of statistics to comparative metagenomics. BMC Bioinformatics 7: 162.

Rusch, D.B., Halpern, A.L., Sutton, G., Heidelberg, K.B., Williamson, S., Yooseph, S., et al. (2007) The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biol 5: e77.

Schaefer, J.K., Goodwin, K.D., McDonald, I.R., Murrell, J.C., and Oremland, R.S. (2002) Leisingera methylohalidivorans gen. nov., sp. nov., a marine methylotroph that grows on methyl bromide. Int J Syst Evol Microbiol 52: 851–859.

Seshadri, R., Kravitz, S.A., Smarr, L., Gilna, P., and Frazier, M. (2007) CAMERA: a community resource for metagenomics. PLoS Biol 5: 394–397.

Tatusov, R.L., Galperin, M.Y., Natale, D.A., and Koonin, E.V. (2000) The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 28: 33–36.

Ward, B.B., Kilpatrick, K.A., Novelli, P.C., and Scranton, M.I. (1987) Methane oxidation and methane fluxes in the ocean surface-layer and deep anoxic waters. Nature 327: 226–229.

Wawrik, B., Paul, J.H., and Tabita, F.R. (2002) Real-time PCR quantification of rbcL (ribulose-1,5-bisphosphate carboxylase/oxygenase) mRNA in diatoms and pelagophytes. Appl Environ Microbiol 68: 3771–3779.



Woodall, C.A., Warner, K.L., Oremland, R.S., Murrell, J.C., and McDonald, I.R. (2001) Identification of methyl halide-utilizing genes in the methyl bromide-utilizing bacterial strain IMB-1 suggests a high degree of conservation of methyl halide-specific genes in gram-negative bacteria. Appl Environ Microbiol 67: 1959–1963.

Zehr, J.P., Waterbury, J.B., Turner, P.J., Montoya, J.P., Omoregie, E., Steward, G.F., et al. (2001) Unicellular cyanobacteria fix N₂ in the subtropical North Pacific Ocean. Nature 412: 635–638.

Zhou, J.H. (2003) Microarrays for bacterial detection and microbial community analysis. Curr Opin Microbiol 6: 288–294.

Supporting information

Additional Supporting Information may be found in the online version of this article:

Fig. S1. Transcript mapping to the KEGG histidine metabolism pathway for P. marinus (A) and the vitamin B6 metabolism pathway for P. ubique (B) at night. Blue shading indicates that transcripts were found; grey indicates genes that are present in the genome, but no transcripts were found; white indicates genes that are not present in the reference genomes.

Fig. S2. Quality control of the pyrosequences using qPCR verifications of transcript ratios for five genes: recA and psaA from P. marinus str. AS9601, a bacteriorhodopsin and a Na+/solute symporter (Ssf family) gene from P. ubique HTCC1062, and a probable integral membrane proteinase attributed to P. torquis ATCC 700755. The night : day ratio of transcripts in the pyrosequence libraries is plotted against the same ratio in the original total RNA fraction.


Table S1. Results of bioinformatic pipeline for 100 and 200 bp fragments from groups for which there are no genome sequences currently available. BACs from uncultured marine taxa (two from SAR86 and one from SAR116) were fragmented into random 100 bp pieces, using just the coding regions. Fragments were blasted against RefSeq, not allowing a self-hit. As controls, we did the same for P. ubique HTCC1062 and P. marinus MIT9312.

Table S2. Estimates of coverage using two different models. The Lander–Waterman model uses the 16S rRNA clone library data to establish a taxon-abundance model for the system at a similarity level of 99%, and is based on the assumptions that each taxon produces 1000 transcripts at any given time and all expressed genes are expressed equally. The Chao1 richness estimators for COGs are computed using EstimateS (version 8.0, R. K. Colwell, http://purl.oclc.org/estimates).

Table S3. KEGG pathways for three taxonomic bins (P. marinus, P. ubique and Roseobacters) significantly overrepresented in the night (grey shading) and day (no shading) transcriptomes (P < 0.10).

Table S4. COGs significantly overrepresented in the night (grey shading) and day (no shading) transcriptomes (P < 0.05).

Table S5. Genes significantly overrepresented in the night (grey shading) and day (no shading) transcriptomes (P < 0.05).

Table S6. Primer sets used in qPCR.

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.
