Comparative day/night metatranscriptomic analysis of microbial ...

Environmental Microbiology (2009) 11(6), 1358–1375 doi:10.1111/j.1462-2920.2008.01863.x 

Comparative day/night metatranscriptomic analysis 

of microbial communities in the North Pacific 

subtropical gyre

Rachel S. Poretsky,1 Ian Hewson,2 Shulei Sun,1

Andrew E. Allen,3 Jonathan P. Zehr2 and

Mary Ann Moran1*

1University of Georgia, Department of Marine Sciences, 

Athens, GA 30602, USA. 

2University of California Santa Cruz, Department of 

Ocean Sciences, Santa Cruz, CA 95064, USA. 

3J. Craig Venter Institute, Microbial and Environmental 

Genomics, San Diego, CA 92121, USA. 

Summary 

Metatranscriptomic analyses of microbial assemblages 

(< 5 µm) from surface water at the Hawaii

Ocean Time-Series (HOT) revealed community-wide 

metabolic activities and day/night patterns of differential 

gene expression. Pyrosequencing produced 

75 558 putative mRNA reads from a day transcriptome 

and 75 946 from a night transcriptome. Taxonomic 

binning of annotated mRNAs indicated that Cyanobacteria 

contributed a greater percentage of the transcripts 

(54% of annotated sequences) than expected 

based on abundance (35% of cell counts and 21% of 16S

rRNA libraries), and may represent the most

actively transcribing cells in this surface ocean community 

in both the day and night. Major heterotrophic 

taxa contributing to the community transcriptome 

included α-Proteobacteria (19% of annotated

sequences, most of which were SAR11-related) and

γ-Proteobacteria (4%). The composition of transcript

pools was consistent with models of prokaryotic gene 

expression, including operon-based transcription 

patterns and an abundance of genes predicted to be 

highly expressed. Metabolic activities that are shared 

by many microbial taxa (e.g. glycolysis, citric acid 

cycle, amino acid biosynthesis and transcription and 

translation machinery) were well represented among 

the community transcripts. There was an overabundance 

of transcripts for photosynthesis, C1 

metabolism and oxidative phosphorylation in the 

day compared with night, and evidence that energy

acquisition is coordinated with solar radiation levels

for both autotrophic and heterotrophic microbes. In

contrast, housekeeping activities such as amino acid

biosynthesis, membrane synthesis and repair, and

vitamin biosynthesis were overrepresented in the

night transcriptome. Direct sequencing of these environmental

transcripts has provided detailed information

on metabolic and biogeochemical responses of a

microbial community to solar forcing.

Received 17 September, 2008; accepted 3 December, 2008. *For

correspondence. E-mail mmoran@uga.edu; Tel. 706-542-6481; Fax

706-542-5888.

© 2009 The Authors

Journal compilation © 2009 Society for Applied Microbiology and Blackwell Publishing Ltd

Introduction

Oceanic subtropical gyres make up 40% of the Earth’s 

surface and play critical roles in carbon fixation and nutrient 

cycling. The Hawaii Ocean Time-Series (HOT) in the North 

Pacific subtropical gyre was established to provide a long-term

perspective on oceanographic properties of such 

systems (Karl and Lukas, 1996) and has served as the 

focus of substantial research into the role of marine microorganisms 

in ocean biogeochemistry (Karl et al., 1997; 

Cavender-Bares et al., 2001; Zehr et al., 2001). Station 

ALOHA, the core study site at HOT, is characterized by 

warm (> 23°C) surface waters with low NO₃⁻ concentrations

(< 15 nM), seasonally variable surface mixed-layers

(10–120 m), low standing biomass of living organisms

(10–15 µg C l⁻¹) and a persistent deep (75–140 m) chlorophyll

a maximum layer. Since 1988, regular measurements 

of physical, chemical and biological parameters have been 

obtained with monthly ship-based monitoring as well as 

bottom-moored instruments and buoys. Recent metagenomic 

sampling efforts at Station ALOHA have provided 

information about the genes harboured by the bacterioplankton 

community and how they are distributed with 

depth (DeLong et al., 2006). Characterizing patterns of 

expression of these microbial genes and identifying what 

factors induce their expression is the next critical step in 

understanding this oceanic ecosystem. 

Analogous to metagenomics, environmental transcriptomics 

(metatranscriptomics) retrieves and sequences 

environmental mRNAs from a microbial assemblage 

without prior knowledge of what genes the community 

might be expressing (Poretsky et al., 2005; Frias-Lopez 

et al., 2008). Thus it provides a less biased perspective on

microbial gene expression in situ compared with other 

approaches (Wawrik et al., 2002; Bürgmann et al., 2003; 

Zhou, 2003). Environmental transcriptomics protocols are 

technically difficult, however, as prokaryotic mRNAs generally 

lack the poly(A) tails that make isolation of eukaryotic 

messages relatively straightforward (Liang and 

Pardee, 1992) and because of the relatively short half-lives

of mRNAs (Belasco, 1993). In addition, mRNAs are 

much less abundant than rRNAs in total RNA extracts, 

thus an rRNA background often overwhelms mRNA 

signals. 

A first analysis of environmental transcriptomes by creating 

clone libraries using random primers to reverse-transcribe

and amplify environmental mRNAs was 

successful in two different natural environments 

(Poretsky et al., 2005), but results were biased by selection 

of the random primers used to initiate cDNA synthesis. 

Techniques to linearly amplify mRNA obviate the 

need for random primers in the amplification step and 

make it possible to use less starting material (Gelder 

et al., 1990), while recently developed pyrosequencing 

technologies allow direct sequencing (without cloning) 

(Margulies et al., 2005). Initial application of this 

approach at Station ALOHA (Frias-Lopez et al., 2008) 

and in coastal water mesocosms (Gilbert et al., 2008) 

demonstrated its utility for characterizing microbial community 

gene expression. 

Here we use environmental transcriptomics to elucidate 

day/night differences in gene expression in surface 

waters of the North Pacific subtropical gyre (Karl and 

Lukas, 1996). This analysis provides information on the 

dominant metabolic processes within the bacterioplankton 

assemblages and reveals changes in expression patterns 

of biogeochemically relevant processes. 

Results 

cDNA sequence annotation 

The cDNAs prepared from amplified RNA (collected from 

the 0.2–5 µm size fraction) ranged in size from 100 bp to

1 kb, with the majority between 200 and 500 bp. The 

average picoliter reactor pyrosequencing read length 

was 99 bp, typical for the GS 20 sequencing platform. 

Predicted rRNA sequences were removed based on 

sequence similarity to the nt database using BLASTN. 

While more laborious than our initial approach that used 

sequence similarity to the RDP II database supplemented 

with an 18S, 23S and 28S rRNA database from genome

sequences, it identified nearly all of the rRNA sequences 

in our libraries. Accurate identification of rRNAs is crucial 

because of numerous misidentified sequences in the 

RefSeq protein database (i.e. rRNA sequences that are 

incorrectly annotated as putative proteins). Relatively low 

rRNA sequence contamination (37%) compared with the 


rRNA content of prokaryotic cells (> 80%; Ingraham et al., 

1983) indicated that the steps for excluding rRNAs 

through selective degradation and subtractive hybridization 

were largely successful. 

Sequences remaining after deletion of rRNA 

sequences (75 558 from the day and 75 946 from the 

night) were categorized as possible protein encoding 

sequences and BLASTX-queried against the NCBI 

curated, non-redundant reference sequence database 

(RefSeq) to determine putative functions (Fig. 1). About 

one-third of HOT pyrosequences in each library met the 

criteria for gene predictions determined empirically by in 

silico analysis of known functional gene sequences fragmented 

into 100 bp pieces (see Experimental procedures 

for more details). This is nearly twice the fraction of reads 

identified in metagenomic efforts with similar pyrosequencing 

read lengths (Frias-Lopez et al., 2008; Mou 

et al., 2008), as might be expected for sequences biased 

towards coding regions of genomes. These sequences 

were subsequently assigned to the function of their best 

hit in RefSeq. Transcript abundance was analysed as 

relative abundance within the collective community transcriptome 

rather than per-gene expression levels (see 

Frias-Lopez et al., 2008). Empirically derived criteria were 

established in separate in silico analyses for the Clusters 

of Orthologous Groups (COG) and Kyoto Encyclopedia of 

Genes and Genomes (KEGG) databases, which contain 

fewer sequences than RefSeq (Fig. 1). Some of the 

sequences without hits in RefSeq were similar to proteins 

in the Global Ocean Sampling database, indicating that 

similar sequences have been found in marine bacterioplankton 

communities, but functional annotation is not 

currently possible. 
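The in silico calibration mentioned above starts by cutting known gene sequences into read-sized pieces. A minimal sketch of that fragmentation step is given here, assuming consecutive non-overlapping 100 bp windows and a made-up sequence; the function name and the decision to drop a short trailing piece are illustrative choices, not the authors' script.

```python
def fragment(sequence, size=100):
    """Cut a nucleotide sequence into consecutive, non-overlapping pieces.

    A trailing piece shorter than `size` is dropped so that every fragment
    has the same length (an assumption of this sketch).
    """
    return [sequence[i:i + size]
            for i in range(0, len(sequence) - size + 1, size)]


# Hypothetical 350 bp "gene" built from a repeated motif, for illustration only.
gene = ("ATGGCTAAAGGTCTGACC" * 20)[:350]
pieces = fragment(gene)
print(len(pieces), [len(p) for p in pieces])
```

Each fragment would then be searched against the database in the same way as a real read, so that score thresholds can be tuned on sequences whose true function is known.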

At the end of the annotation pipeline, half of the possible 

protein-encoding sequences in each library had no 

significant hits to previously sequenced genes. To 

examine how sequences from uncultured marine bacterial 

taxa might decrease annotation success or skew 

taxonomic assignments, we randomly selected 100 bp 

sequences from the coding regions of genome fragments 

from SAR86 and SAR116 cells captured in environmental 

BAC libraries (SAR86 BAC, AF279106; SAR86 BAC, 

AY552545; SAR116 BAC, AY744399). Excluding self-hits,

approximately 60% of the sequences from the BACs 

had no hits in RefSeq (Table S1). In a similar analysis of 

coding sequences from cultured taxa with genome 

sequences available (Pelagibacter ubique HTCC1062 

and Prochlorococcus marinus MIT9312), only ~20% of 

the sequences had no hits in RefSeq. Many unannotated 

sequences in the HOT libraries are therefore likely to be 

transcripts from poorly known taxa, but also include 

some transcripts from well-known taxa with poor identity 

to sequence databases for that particular 100 bp fragment. 

In support of the latter, a preliminary analysis of a 


marine environmental transcriptome consisting of longer

reads (~200 bp; 454 GS FLX sequencing platform; R.S.

Poretsky and M.A. Moran, unpublished; and Table S1)

resulted in twice the frequency of annotated sequences

as the HOT metatranscriptome. For the 100 bp genome

fragments from uncultured taxa that had significant hits

in RefSeq, the hits were almost always to a gene from an

organism in the same phylum (90%) or subphylum

(70%), and thus did not significantly skew the taxonomic

assignments (Table S1). SAR86, SAR116 and other currently

recognized uncultured groups made up ~4% of the

16S rRNA amplicons from these samples (see below).

Finally, to examine the possibility that the unidentified

sequences were from non-protein-coding regions, these

sequences were BLAST-queried to tRNA genes, 5S rRNA

genes and intergenic region sequences from three

P. marinus genomes (MIT9301, MIT9312 and AS9601)

and two P. ubique genomes (HTCC1002 and

HTCC1062). Based on this analysis, ~4% of the 76 327

unidentified sequences were from non-protein-coding

regions of these genomes, and these primarily hit intergenic

regions.

Step                     Sequences                              Count     % of total
Input                    Total 454 sequences                    240,422
BLASTN against nt        rRNA sequences                          88,916   37%
                         Possible protein-encoding sequences    151,504   63%
BLASTX against RefSeq    Identified sequences                    48,648   21%
                         Unidentified sequences                 102,856   42%
BLASTX against COG       Sequences                               24,474   10%
KEGG                     Sequences                               35,927   15%
GOS                      GOS sequences                           26,366   11%
BLASTX against nr        Sequences                                  163   0.07%
Remaining                Unidentified sequences                  76,327   32%

Fig. 1. The mRNA annotation pipeline developed for 454 transcript reads showing combined counts for the day and night transcriptomes. All

percentages are relative to the total number of sequences entering the pipeline.
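As a quick check, the fractions quoted in the text can be recomputed from the counts in Fig. 1. The sketch below uses only those counts; the variable and function names are illustrative.

```python
# Counts from Fig. 1 (day and night libraries combined).
total_reads = 240_422
rrna = 88_916
possible_protein = 151_504
identified_refseq = 48_648
unidentified_final = 76_327


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Relative to all reads entering the pipeline (the convention of Fig. 1).
print("rRNA:", pct(rrna, total_reads))
print("possible protein-encoding:", pct(possible_protein, total_reads))
# Relative to the possible protein-encoding reads (the convention of the text).
print("identified in RefSeq:", pct(identified_refseq, possible_protein))
print("still unidentified:", pct(unidentified_final, possible_protein))
```

The last two values (about 32% and 50%) correspond to the "about one-third" of pyrosequences that met the gene prediction criteria and the "half" of possible protein-encoding sequences left without significant hits.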

Community composition and taxonomic origin 

of transcripts 

Prochlorococcus are the most abundant Cyanobacteria at 

Station ALOHA (> 95% of photosynthetic picoplankton 

cells; Campbell and Vaulot, 1993) and in this study 

accounted for approximately 2 × 10⁵ cells ml⁻¹ (based on

flow cytometric counting; http://hahana.soest.hawaii.edu/hot/hot-dogs/),

or ~30% of the total microbial community

(Fig. 2). Heterotrophic bacteria (including phototrophs)

were numerically dominant with ~5 × 10⁵ cells ml⁻¹,

accounting for ~65% of the microbial community present

at the time of sampling. Direct counts also indicated the

presence of ~800 cells ml⁻¹ of pigmented nanoeukaryotes

(0.2%; Fig. 2). 

Companion PCR-based 16S rRNA clone libraries were 

generated from DNA collected in tandem with the RNA 

samples and demonstrated close agreement with the flow 

cytometric data in terms of taxonomic composition at 

Station ALOHA. Cyanobacteria accounted for ~20% of the 

16S rRNA sequences, and heterotrophic bacterial groups 

were ~80% (Fig. 3). Among the heterotrophic 16S rRNA 




sequences, Proteobacteria were most abundant (41%; 

Fig. 3) and were dominated by α-Proteobacteria (22%),

β-Proteobacteria (8%) and γ-Proteobacteria (8%).

Bacteroidetes (8%) and Firmicutes (12%, biased towards 

the day sample) were also well represented. 

Taxonomically binned mRNA sequences were compared 

with community composition data to ask whether 

taxa contributed to the HOT community mRNA in proportion 

to their representation in the microbial assemblage 

(i.e. whether taxa are equally transcriptionally active on a 

per-cell basis). Cyanobacteria dominated the transcript 

libraries (55% of sequences) with about twofold higher 

representation than in the 16S rRNA amplicons or the cell 

count data (Fig. 3), indicating that there is more gene 

expression in these autotrophic bacterioplankton than in 

co-occurring heterotrophs (or possibly that their transcripts 

are longer-lived). When relative 16S rRNA abundance 

was calculated among just the heterotrophic 

groups (i.e. with cyanobacterial sequences removed), 

many taxa had similar contributions to the transcript pool 

and amplicon pool, suggesting comparable levels of 

transcriptional activity on a per-gene basis within the limits 

of recognized biases of PCR amplification (Fig. 3). 

[Fig. 2 plot: y-axis, depth (m), 0–200; x-axis, 0–600, shared by chl a (10⁻³ µg l⁻¹), Prochlorococcus (× 10³ cells ml⁻¹), Synechococcus (× 10² cells ml⁻¹), nanoeukaryotes (× 10² cells ml⁻¹) and heterotrophic bacteria (× 10³ cells ml⁻¹).]

Fig. 2. Depth profiles of Prochlorococcus-like, Synechococcus-like, heterotrophic bacteria and pigmented nanoeukaryotes during the HOT-175

cruise, as determined by flow cytometry. The horizontal line indicates the mixed layer depth. The depth profile for chlorophyll a is also

indicated. Data were collected through the HOT project and downloaded from the HOT Data Organization and Graphical System

(http://hahana.soest.hawaii.edu/hot/hot-dogs/).

Proteobacteria contributed the second largest number of 

transcript sequences (28%), most of which were attributed 

to α-Proteobacteria (19%) and γ-Proteobacteria

(4%). Approximately 2% of the total transcripts were of

eukaryotic origin. Comparing putative taxonomic assignments

of transcripts between day and night, Cyanobacteria

contributed equally to the day and night transcriptome

(55% versus 56%) as did α-Proteobacteria (40% versus

45% of heterotrophic transcripts) and γ-Proteobacteria

(11% versus 8% of heterotrophic transcripts) (Fig. 3). 
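The heterotroph-only percentages can be related to the whole-community values by removing the cyanobacterial share and rescaling what remains. A short worked example follows, using the pooled transcript shares quoted in the text (55% Cyanobacteria, 19% α-Proteobacteria, 4% γ-Proteobacteria). It assumes that all non-cyanobacterial transcripts count as heterotrophic, which is a simplification, and the function name is illustrative.

```python
def share_without(shares, excluded):
    """Rescale percentage shares after removing one group."""
    remaining = 100.0 - shares[excluded]
    return {taxon: 100.0 * value / remaining
            for taxon, value in shares.items() if taxon != excluded}


# Pooled shares of annotated transcripts quoted in the text (per cent).
transcript_shares = {"Cyanobacteria": 55.0,
                     "alpha-Proteobacteria": 19.0,
                     "gamma-Proteobacteria": 4.0}
heterotroph_shares = share_without(transcript_shares, "Cyanobacteria")
for taxon, value in heterotroph_shares.items():
    print(f"{taxon}: {value:.1f}% of non-cyanobacterial transcripts")
```

The rescaled values (about 42% and 9%) fall within the day and night ranges reported above (40–45% and 8–11%).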

More detailed taxonomic assignment of transcripts was 

carried out for the best represented clades. The Cyanobacteria 

transcripts were dominated by Prochlorococcus-like

sequences most similar to P. marinus AS9601, 

P. marinus MIT 9301 and P. marinus MIT 9312 (Table 1). 

The α-Proteobacteria, the most transcriptionally active

among the heterotrophic groups, mostly contained 

sequences with similarity to the SAR11 group members 

P. ubique HTCC1002 and P. ubique HTCC1062 (~10% of 

prokaryotic transcripts). Roseobacter-like sequences 

were also represented and were primarily assigned to 

Dinoroseobacter shibae DFL 12, Jannaschia sp. CCS1, 

Silicibacter pomeroyi DSS-3, Roseobacter denitrificans 





Och 114 and Silicibacter sp. TM1040 (Table 1 and Fig. 4). 

These assignments do not imply that these actual species 

were present at the time of sample collection, but rather 

they represent the best current sequence matches for 

some of the more abundant environmental transcripts. 

[Fig. 3 panel values. A (day): 16S rRNA genes, Cyanobacteria 18%, other 82%; mRNA, Cyanobacteria 55%, other 45%. B (night): 16S rRNA genes, Cyanobacteria 21%, other 79%; mRNA, Cyanobacteria 56%, other 44%. Legend: Cyanobacteria, Alphaproteobacteria, Gammaproteobacteria, Betaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, other Proteobacteria, Actinobacteria, Bacteroidetes, Chlamydiae, Chlorobi, Chloroflexi, Chrysiogenetes, Acidobacteria, Firmicutes, Lentisphaerae, Planctomycetes, Spirochaetes, Thermotogae, Verrucomicrobia.]

Fig. 3. Contribution of taxa to the 16S rRNA amplicon pool and transcript pool for the day (A) and night (B) samples. Taxonomy is presented

to the phylum level (based on NCBI taxonomy) except for Proteobacteria, which is at the subphylum level. The dashed red lines indicate

cyanobacterial abundance in the night sample as determined by flow cytometric counting.

Transcriptome coverage

To estimate transcriptome coverage, 16S rRNA clone

library data were used to establish a taxon-abundance

model for the HOT community at an identity level of 99%.

Assuming that each taxon expresses 1000 different

genes at any given time (based on the Escherichia coli

model; Ingraham et al., 1983) and that genome coverage

follows a Lander–Waterman model (Lander and Waterman,

1988), we estimate that the most abundant taxon in 

the day or night sample had over 90% transcriptome 

coverage (i.e. 90% of the expressed genes were 

sequenced at least once), while the 15 most abundant 

taxa had more than half of their transcriptome represented 

(Table S2). Alternatively, we determined the singletons

and doubletons among the COG categories (i.e. the 

number of COGs containing only one or two sequences) 

and applied the Chao1 index of diversity to determine the 

theoretical abundance of COGs in the day and night. The 

sequencing effort captured about 80% of the COGs predicted 

to be present in the night transcriptome and 70% of 

the COGs predicted for the day transcriptome (Table S2). 
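The two coverage estimates above can be illustrated with textbook formulas. This is a sketch under stated assumptions rather than the authors' calculation: the coverage function uses the Poisson approximation behind the Lander–Waterman model and assumes every expressed gene is sampled with equal probability, the Chao1 function uses the classic estimator with the bias-corrected form when there are no doubletons, and all input numbers are hypothetical.

```python
import math


def expected_coverage(reads, expressed_genes=1000):
    """Expected fraction of expressed genes hit at least once.

    Poisson (Lander-Waterman style) approximation, assuming every expressed
    gene is equally likely to be sampled by each read.
    """
    return 1.0 - math.exp(-reads / expressed_genes)


def chao1(observed, singletons, doubletons):
    """Chao1 richness estimate from observed categories and rare-category counts."""
    if doubletons == 0:
        # Bias-corrected form, used when no doubletons are present.
        return observed + singletons * (singletons - 1) / 2.0
    return observed + singletons ** 2 / (2.0 * doubletons)


# Hypothetical inputs, for illustration only (not values from Table S2).
print(round(expected_coverage(2303), 3))
estimate = chao1(observed=1500, singletons=400, doubletons=200)
print(estimate, round(1500 / estimate, 2))
```

Under the equal-expression assumption, a taxon needs roughly 2300 mRNA reads for 90% of its 1000 expressed genes to be seen at least once; real transcript pools are far from even, so this is an optimistic bound.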



[Fig. 4 axis labels: A, Frequency versus Number of Adjacent Genes; B, % PHX Genes.]

Fig. 4. Evidence for prokaryotic gene expression patterns in the community transcriptome based on P. marinus, P. ubique and Roseobacter

genome bins.

A. Operon-based expression was evaluated by comparing the number of adjacent transcripts (closed circles) to the number of adjacent genes

found in 1000 random samples of the same size from the reference genome (black lines).

B. Preferential representation of transcripts from genes predicted to be highly expressed was evaluated by comparing the per cent of PHX

genes in the reference genome (grey bar) to the per cent in the transcript pool (black bar). Differences between transcript pools and reference

genomes were significant for both operon and PHX analyses (Wilcoxon signed-rank test; P < 0.05).

Table 1. Number of sequences from the community transcriptome

with highest homology to the listed reference genomes, as determined

by top BLASTX hit to RefSeq.

Reference genome                          Night    Day
Prochlorococcus marinus str. MIT 9301      6309   6292
Prochlorococcus marinus str. AS9601        3214   2849
Pelagibacter ubique HTCC1002               2541   1851
Prochlorococcus marinus str. MIT 9312      1430   1264
Pelagibacter ubique HTCC1062               1308    944
Dinoroseobacter shibae DFL 12                48     34
Jannaschia sp. CCS1                          41     27
Silicibacter pomeroyi DSS-3                  39     30
Roseobacter denitrificans Och 114            30     28
Silicibacter sp. TM1040                      19     26

Based on these coverage estimates, increased

sequencing depth would have been required to fully

capture some specialized processes carried out by rarer

members of the HOT community, but frequently transcribed

genes from abundant taxa were well represented.

In support of this, transcript mapping to the three P. marinus

and two P. ubique reference genomes showed

sequences with homology to approximately half the 

genes, at coverage depths ranging from 1 to nearly 500 

hits per gene (Fig. 5). Moreover, many of the reference 

genes with the greatest coverage are those mediating 

metabolic processes expected to be dominant in the HOT 

bacterioplankton community (e.g. the photosynthesis 

genes psaA and psaB, the light-harvesting complex and 

RuBisCo, ammonium transporters and transcription-related

genes; Fig. 5). Other genes on the reference 

genomes for which there is similarly deep transcript coverage 

(e.g. proteorhodopsin, Na+/solute symporters, 

colicin V production and several hypothetical proteins) 

can be hypothesized to also represent dominant metabolic 

activities (Fig. 5). 
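The mapping summary described above (fraction of reference genes with at least one hit, and depth of the most heavily covered gene) can be computed from a list of top-hit gene identifiers. The sketch below is a minimal illustration with a made-up six-gene genome; the function name and the input format are assumptions, not the authors' pipeline.

```python
from collections import Counter


def mapping_summary(best_hit_genes, genome_gene_count):
    """Summarize transcript mapping to one reference genome.

    best_hit_genes: one gene identifier per transcript (its top hit).
    Returns (fraction of genes with >= 1 hit, maximum hits on a single gene).
    """
    depth = Counter(best_hit_genes)
    covered_fraction = len(depth) / genome_gene_count
    deepest = max(depth.values()) if depth else 0
    return covered_fraction, deepest


# Hypothetical toy genome with 6 genes and 8 mapped transcripts.
hits = ["psaA", "psaA", "psaA", "psaB", "psaB", "amt", "rbcL", "psaA"]
print(mapping_summary(hits, genome_gene_count=6))
```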

[Fig. 5 plots: y-axis, occurrences (hits per gene); x-axis, gene number in the reference genome (0–2000 for the P. marinus strains MIT9312, MIT9301 and AS9601; 0–1400 for the P. ubique strains HTCC1002 and HTCC1062). Labelled peaks: hypothetical protein; photosystem II PsbJ protein; ammonium transporter family; photosystem II PsbB (CP47); ribulose bisphosphate carboxylase; photosystem II D2; cytochrome b559, beta subunit; protoporphyrin IX magnesium chelatase, subunit chlH; photosystem II PsbA (D1); lipoprotein precursor; bacteriorhodopsin; light-harvesting complex protein; Na+/solute symporter; AcrB/AcrD/AcrF family protein (acriflavin resistance); chromosome segregation SMC family protein; 30S ribosomal protein S1; excinuclease ABC subunit C; heat shock protein a; integral membrane protein, interacts with FtsH; ribosomal protein L14; ribosomal protein L20; photosystem I PsaA; elongation factor Tu; photosystem I PsaB; 30S ribosomal protein S3; photosystem II reaction center Z; DNA-directed RNA polymerase beta prime chain; octaprenyl-diphosphate synthase; translation elongation factor EF-G; adenylylsulfate reductase.]

Fig. 5. Mapping of transcripts to five reference genomes. A–C are P. marinus strains; D–E are P. ubique strains. The x-axis shows gene

number in the reference genome. Shaded areas represent possible hypervariable regions with few mapped transcripts.

Operon signature in environmental transcript pools

Genes that encode steps in the same metabolic pathway

are frequently clustered into operons in prokaryotic

genomes (Overbeek et al., 1999) to facilitate coordinated

transcription. Thus a cell’s transcript pool is anticipated to 

include more mRNAs from adjacent genes than what is 

expected from a random sampling of the genome. We 

tested this using the transcripts assigned to taxonomic 

bins for P. marinus, P. ubique and Roseobacter by counting 

the frequency with which transcripts from two adjacent 

genes on the reference strain genome (defined as ≤ 1

gene intervening) were both present in the bin, recognizing 

that the wild and reference organisms will not be fully 

syntenic. In all cases, the transcript bins had significantly 

more adjacent genes than a null distribution generated 

from the reference genomes (Fig. 4A), suggesting that 

random transcript sequencing captures operon-based 

expression patterns in natural marine bacterioplankton 

communities. 
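The adjacency comparison can be sketched as a resampling test. The example below counts gene pairs separated by at most one intervening gene and compares the observed count with 1000 random gene sets of the same size. It reports an empirical one-sided P value, which is a substitute for the Wilcoxon signed-rank test named in the Fig. 4 legend; the gene numbering, function names and input data are hypothetical.

```python
import random


def count_adjacent(gene_indices, max_intervening=1):
    """Count gene pairs separated by at most `max_intervening` genes."""
    present = set(gene_indices)
    return sum(1 for g in present
               for step in range(1, max_intervening + 2)
               if g + step in present)


def adjacency_p_value(observed_genes, genome_size, n_samples=1000, seed=0):
    """Empirical one-sided P value against random samples of equal size."""
    rng = random.Random(seed)
    observed = count_adjacent(observed_genes)
    k = len(set(observed_genes))
    exceed = sum(
        count_adjacent(rng.sample(range(genome_size), k)) >= observed
        for _ in range(n_samples)
    )
    # Add-one correction keeps the estimate above zero.
    return (exceed + 1) / (n_samples + 1)


# Hypothetical bin: 100 transcribed genes in one contiguous block of a
# 2000-gene genome, i.e. an extreme operon-like signal.
print(count_adjacent(range(100)), adjacency_p_value(range(100), 2000))
```

Genes are represented by their position in the reference genome, so synteny differences between wild and reference organisms would weaken, not inflate, the observed signal.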

Predicted highly expressed genes in environmental 

transcript pools 

Genes that are frequently transcribed by a cell can be 

identified based on patterns in codon usage (Karlin and 

Mrázek, 2000). We identified predicted highly expressed 

(PHX) genes for the reference genomes, and then 

assigned PHX status to the transcripts with best hits to 

that reference genome based on homology. For all taxa, 

and in accordance with biological expectations, the environmental 

transcript bins had a significantly higher percentage 

of PHX genes than the reference genomes 

(Fig. 4B). This pattern was particularly evident for the 

Roseobacters (9% of the genes in the reference genomes 

are PHX versus 30% of the transcripts; 3.1-fold enrichment) 

and for P. marinus MIT9301 (4.6% versus 12.9%; 

2.8-fold enrichment). A larger proportion of PHX transcripts 

were found in the day for all P. marinus bins and 

the Roseobacter bin (although not for P. ubique), suggesting 

that highly expressed genes more frequently mediate 

daytime-biased processes (data not shown). 

Metatranscriptomic comparison of day and night 

samples 

The majority of annotated transcripts (~80%) were 

assigned to genes related to metabolism, and in particular 

to three KEGG categories: amino acid transport and 

metabolism, energy production and conversion (particularly 

oxidative phosphorylation, carbon fixation and nitrogen 

metabolism), and carbohydrate transport (Fig. 6). 

Membrane transport and signal transduction pathways 

were also common in the community transcriptome, 

specifically for ABC transporters of amino acids, glycine 

betaine/L-proline, polyamines (spermidine and 

putrescine), iron and nutrients in the form of nitrate, phosphate 

and phosphonate. 


The day/night samples allowed comparison of dominant 

expression patterns in the presence and absence of solar 

radiation in the bacterioplankton community. Among the 

167 KEGG metabolic pathways represented in the annotated 

sequences, four pathways were better represented 

at night (including those for glycosphingolipid biosynthesis

and nucleotide sugars metabolism) and six were better 

represented in the day (including photosynthesis and oxidative 

phosphorylation) (95% confidence level; Table 2). 

Some KEGG pathways had significant diel differences in 

frequency for individual taxonomic bins. These include: 

histidine biosynthesis, with evidence for expression of all 

or nearly all genes in the pathway (both P. ubique and 

P. marinus at night; Fig. 7A and Fig. S1A); metabolism of 

glutathione, a reductant with multiple detoxifying and cytoprotective 

capabilities (P. marinus at night); the photosynthesis 

pathway (phycobilisome, photosystem I and II, 

cytochromes, ATP synthase) and nearly all genes 

involved in biosynthesis of phytoene, and subsequent 

conversion into carotenoids (P. marinus in the day; 

Fig. 7B); nucleotide sugars metabolism, glycosphingolipid 

biosynthesis, carotenoid biosynthesis and vitamin B6 

metabolism (P. ubique in the night; Fig. S1B); and transfer 

of methyl groups for C1 metabolism (P. ubique and 

Roseobacter in the day) (Table S3). 
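To make the idea of a day/night frequency comparison concrete, the sketch below tests whether one pathway accounts for a different share of annotated reads in two libraries. It uses a pooled two-proportion z-test as a simple stand-in; this is not the statistical procedure used in this study, and the counts are hypothetical.

```python
import math


def two_proportion_p(count_a, total_a, count_b, total_b):
    """Two-sided P value of a pooled two-proportion z-test."""
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 1.0
    z = (count_a / total_a - count_b / total_b) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical pathway counts out of hypothetical annotated-read totals.
p = two_proportion_p(count_a=900, total_a=12_000, count_b=700, total_b=12_000)
print(p < 0.05)
```

With 167 pathways tested, some correction for multiple comparisons would normally be considered before interpreting individual P values.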

Transcript annotation based on the COG database was 

comparable. Among the 1577 COGs represented, statistical 

comparisons identified 12 that were better represented 

at night and 13 that were better represented in the 

day (Table S4). These included amino acid and nucleotide 

metabolism, membrane biosynthesis and polyamine 

dehydrogenation at night, and light-mediated energy production, 

protein turnover, catalase synthesis and inorganic 

ion transport and metabolism in the day. 

Statistically significant differences in the distribution of 

transcripts between the day and night samples were also 

assessed independently of KEGG and COG assignments 

in order to capture signals from genes not currently classified 

by these annotation systems. Among the additional 

significant functions overrepresented in the night transcriptome 

were those for ABC-type spermidine/putrescine 

transport system permeases, RNA methyltransferases 

and signal transduction histidine kinases. For the day 

transcriptome, genes encoding proteorhodopsin and an 

aromatic-ring hydroxylase were significantly overrepresented 

(Table S5). 

Eukaryotic sequences 

The majority of eukaryotic transcripts were most closely 

affiliated with sequences from green-lineage organisms 

(Viridiplantae), such as the picoeukaryotic prasinophytes 

Ostreococcus spp. (Derelle et al., 2006) and Micromonas 

spp. A large number of transcripts also appeared to be 




Fig. 6. The 50 most abundant KEGG pathways in the night (black) and day (gray) transcriptomes. The pathways marked with stars were 

significantly overexpressed in one of the pools as determined by comparisons with P < 0.05 (Rodriguez-Brito et al., 2006). 

most closely related to genes in Chromalveoltae 

(Stramenopile or Alevolate) genomes. These groups are 

major components of the picoeukaryotic phytoplankton 

(McDonald et al., 2007) and are small enough to pass the 

5 mm prefilter used in this study. Gene transcripts that 

most closely matched reference genomes of photosynthetic 

eukaryotes were more abundant in the day compared 

with night sample. Among the most highly 



Table 2. KEGG pathways significantly overrepresented in the night (grey shading) and day (no shading) transcriptomes (P < 0.05). 

Pathway ID Pathway Category 

expressed genes detected from eukaryotic organisms 

were those encoding chlorophyll binding proteins, light 

harvesting reactions and photosynthetic machinery 

(Fig. 8). These included a photosystem II D1 reactioncentre 

protein related to that from the diatom Thalassiosira 

psuedonana, as well as the plastid-encoded 

photosystem I subunit protein similar to psaB from the 

diatom Odontella sinensis. Evidence for stramenopile 

nitrogen metabolism via urea cycle activity was also 

detected based on several transcripts that most closely 

matched stramenopile carbamoyl phosphate synthetase 

III, indicating that the unique diatom urea cycle (Armbrust 

et al., 2004; Allen et al., 2006) is likely active in natural 

populations of stramenopile picophytoplankton. 

qPCR quality control 

The half-life of microbial transcripts can be as short as 

30 s based on studies of mRNAs of cultured bacteria 

(Belasco, 1993), while processing times for environmental 

nucleic acid samples can take hours (Fuhrman et al., 

1988). Linear amplification of RNA greatly reduces the 

time between initiation of sampling and capture of transcripts 

because sample volumes can be reduced, but it 

has potential to introduce bias into the sequenced mRNA 

pool. A previous test with mRNA from the cultured marine 

bacterium S. pomeroyi DSS-3 demonstrated minor bias 

and good repeatability during linear amplification (Bürgmann 

et al., 2007). Here, we assessed the full environmental 

transcriptomic sequencing protocol by comparing 

qPCR-based ratios of selected genes in day versus night 

total RNA fractions to the pyrosequencing-based ratio of 

these same genes in the sequenced transcript pools. Five 

genes common in the transcriptome (P. marinus-like recA 

and psaA, P. ubique-like proteorhodopsin and Na+/solute 

symporter, and P. torquis-like membrane proteinase) 

showed a strong positive correlation between night and 

day ratios in the original RNA pool and the pyrosequence 

data sets (r = 0.94, Fig. S2), indicating that the sequenced 

metatranscriptome was representative of the unamplified 

mRNA pool. 
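The comparison described above can be sketched as a Pearson correlation between the day : night ratios measured by qPCR and those derived from pyrosequence read counts. The gene labels follow the text, but every ratio below is an invented placeholder (the measured values are in Fig. S2 and are not reproduced here), and the log2 transform is an assumption about how such ratios would typically be compared.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL day:night ratios for the five genes (placeholders only).
qpcr_ratio = {"recA": 0.8, "psaA": 3.1, "proteorhodopsin": 2.2,
              "Na+/solute symporter": 0.6, "membrane proteinase": 1.4}
seq_ratio = {"recA": 0.9, "psaA": 2.7, "proteorhodopsin": 2.5,
             "Na+/solute symporter": 0.5, "membrane proteinase": 1.2}

genes = sorted(qpcr_ratio)
x = [math.log2(qpcr_ratio[g]) for g in genes]   # qPCR-based ratios
y = [math.log2(seq_ratio[g]) for g in genes]    # sequence-based ratios
r = pearson_r(x, y)
print(f"r = {r:.2f}")
```

A value of r close to 1 would indicate that amplification and sequencing preserved the relative day/night abundance of the tested transcripts.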

Pathway ID  Pathway                                           Category
path00520   Nucleotide sugars metabolism                      Carbohydrate Metabolism
path00521   Streptomycin biosynthesis                         Biosynthesis of Secondary Metabolites
path00602   Glycosphingolipid biosynthesis – neo-lactoseries  Glycan Biosynthesis and Metabolism
path00603   Glycosphingolipid biosynthesis – globoseries      Glycan Biosynthesis and Metabolism
path00190   Oxidative phosphorylation                         Energy Metabolism
path00195   Photosynthesis                                    Energy Metabolism
path03010   Ribosome                                          Translation
path03020   RNA polymerase                                    Transcription
path04940   Chaperonin                                        N/A
path05060   Chaperonin                                        N/A

Discussion

The HOT program provides comprehensive, long-term 

oceanographic information for the oligotrophic North 

Pacific Ocean (Karl and Lukas, 1996). In situ dissolved 

organic constituents at 25 m depth at Station ALOHA are 

typically 70–110 µM for carbon, 5–6 µM for nitrogen and

0.2–0.3 µM for phosphorus; ammonium concentrations in

these waters (~50 nM) are below the detection limit of

standard nutrient analysis (http://hahana.soest.hawaii.edu/hot/hot-dogs/).

Surface water nutrient data over the

past several decades for the month of November, the 

month in which the community transcriptomes in this 

study were obtained, and taken during various times of 

day show no discernable differences in organic and inorganic 

carbon, nitrogen, and/or phosphorus concentrations 

at Station ALOHA on a diel basis. 

Building on previous metagenomic and transcriptomic 

analyses of this system (DeLong et al., 2006; Frias-Lopez 

et al., 2008), this day/night environmental transcriptomics 

effort provides insight into the temporal patterns of bacterioplankton 

metabolic processes and ecological activities 

(Table 3). Three important caveats of the analysis are 

that: (i) the composition of the environmental transcriptomes 

may be inadvertently shaped by collection and 

filtration manipulations, (ii) mRNAs with intrinsically 

shorter half-lives are less likely to be stabilized and 

sequenced and (iii) only 32% of the 151 000 possible 

transcript sequences could be confidently assigned to a 

known function (Fig. 1). Despite these concerns, the community 

transcriptomes provided reasonable coverage of 

mRNAs from the dominant organisms, and the relative 

representation of transcripts was corroborated by

RT-qPCR-based expression analyses (Fig. S2).

The community transcriptomes had properties consistent 

with expected attributes of the HOT ecosystem, 

including the apparent taxonomic affiliations of transcripts. 

Closely related P. marinus reference strains that 

are members of high light clade eMIT9312 comprised the 

most populated transcript bin. This clade has been shown 

to dominate in the upper euphotic zone (< 50 m) at low

and mid latitudes (below 30°) (Johnson et al., 2006),

much like the HOT stations from which our samples were

collected. SAR11-like sequences comprised the second

largest taxonomic bin. This taxon is the most numerous

heterotrophic marine bacterioplankton group, particularly

in oligotrophic oceans where it makes up 30–40% of cells

in the euphotic zone (Morris et al., 2002).

Fig. 7. Transcript mapping to the KEGG histidine metabolism pathway for P. ubique, overrepresented at night (A) and the biosynthesis of

steroids and carotenoids pathway for P. marinus, overrepresented in the day (B). Colour (blue for night, yellow for day) indicates that

transcripts were found; grey indicates that genes were present in the reference genome but no transcripts were found; white indicates that

genes were not present in the reference genomes.

Studies of taxonomic composition of ocean assemblages 

consistently show the numerical importance of α- and

γ-Proteobacteria, Cyanobacteria, and Bacteroidetes

(Morris et al., 2002; DeLong et al., 2006; Rusch 

et al., 2007), but little is known about how abundance 

specifically relates to activity levels. Based on comparisons 

of the relative abundance of taxa (flow cytometry 

counts and 16S rRNA amplicons) to their representation 

in the community transcriptome, by far the highest per-cell 

transcriptional activity level in the HOT ecosystem was 

seen for the Cyanobacteria. Assuming similar mRNA half-lives

across the prokaryotic taxa, dominant autotrophs

produced more transcripts per gene than any 

co-occurring heterotrophic group not only in the day, but 

also at night (Fig. 3). This may reflect an advantage of 

autotrophy over heterotrophy for maintaining cellular 

activity levels given the low concentration and refractory 

nature of organic carbon fuelling heterotrophic activity in 

the oligotrophic ocean (Bauer et al., 1992). 
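To make the abundance-versus-activity comparison concrete, the ratio of a taxon's share of annotated transcripts to its share of the community can be computed from the percentages reported in the Summary. The index and the function name below are illustrative only and are not part of the original analysis.

```python
def representation_index(transcript_share, abundance_share):
    """Ratio of a taxon's transcript share to its abundance share.

    Values above 1 mean the taxon contributes more transcripts than
    expected from its abundance alone.
    """
    if abundance_share <= 0:
        raise ValueError("abundance_share must be positive")
    return transcript_share / abundance_share


# Percentages reported for Cyanobacteria in the Summary.
cyano_transcripts = 54.0   # % of annotated mRNA sequences
cyano_cells = 35.0         # % of cell counts
cyano_16s = 21.0           # % of 16S rRNA gene libraries

by_cells = representation_index(cyano_transcripts, cyano_cells)
by_16s = representation_index(cyano_transcripts, cyano_16s)
print(f"relative to cell counts: {by_cells:.2f}")
print(f"relative to 16S libraries: {by_16s:.2f}")
```

Both ratios exceed 1, which is the quantitative basis for describing Cyanobacteria as overrepresented in the transcript pool relative to their abundance.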

As expected, many transcripts involved in light-mediated

processes, such as photosynthesis and proteorhodopsin 

activity, were among those overrepresented in 

the community transcriptome in the day. Transcripts 

involved in protection or repair of light-induced DNA and 

protein damage (e.g. catalase, chaperones, photolyases, 

superoxide dismutase and various DNA repair proteins) 

were also common in the day sample. Evidence 

of daytime C1 utilization by some heterotrophs suggests 

a source of C1 compounds or methyl groups in this

ecosystem. Compounds such as methanol and formaldehyde

(Heikes et al., 2002; Carpenter et al., 2004; Giovannoni

et al., 2008), methane (Ward et al., 1987), and

methyl halides (Woodall et al., 2001; Schaefer et al., 2002)

may be available to heterotrophic bacterioplankton in

surface sea water. Dimethylsulphoniopropionate, an

organic sulphur compound produced in abundance by

marine phytoplankton (Kiene et al., 2000), is a rich source

of methyl groups for surface ocean bacterioplankton, and

tetrahydrofolate-mediated C1 transfer (i.e. transcripts

mapping to the C1 pool by folate and methane metabolism

KEGG pathway; Table S5) has been shown to play a role

in its metabolism (Howard et al., 2006). Recovery of nearly

four times as much mRNA per volume of sea water in the

day (~30 ng l⁻¹) compared with night (~8 ng l⁻¹) is consistent

with high relative abundance of RNA polymerase

transcripts in the day (Table 2) and likely reflects increased

gene expression when solar radiation is available.

Fig. 8. Number of eukaryotic transcripts in day (top bars) compared with night (bottom bars) samples. The relative contribution of

Viridiplantae (green), photosynthetic Chromist algae (yellow), and other Chromist (red) transcripts to each Gene Ontology (GO) annotation

category are depicted.

GO annotation categories shown (x-axis: number of transcripts, 0–180): electron transport; photosynthesis, light reaction; phosphorus metabolic process; oxidative phosphorylation; ion transmembrane transporter activity; energy derivation by oxidation of organic compounds; heme binding; cellular biosynthetic process; protein metabolic process; cellular macromolecule metabolic process; organelle organization and biogenesis; DNA metabolism; organic acid metabolic process; carbon utilization by fixation of carbon dioxide; aldehyde metabolic process; macromolecular complex assembly; cellular component assembly; ribonucleoprotein complex biogenesis and assembly; macromolecule biosynthetic process; intracellular transport; aromatic compound metabolic process; biopolymer metabolic process; amino acid and derivative metabolic process.

Table 3. Selected biogeochemically relevant genes in the HOT metatranscriptome.

Function  Gene(s)  Occurrence marks (Night Day)
Nitrogen
  Nitrogenase (N fixation)  nifH, nifU, nifS, nifB  + +
  Ammonium transport  amt  + +*
  Ammonia monooxygenase  amoA
  Assimilatory nitrate reductase  narB  +
  Hydroxylamine oxidoreductase  hao
  Nitrate permease  napA  +
  Nitrite reductase  nirA  +
  Dissimilatory nitrite reductase  nirK, nirS
  Nitric oxide reductase  norQ  +
  Nitrate transporter  narK  +
  Urease  ureC, ureE, ureF  + +
Methylotrophy
  Serine-glyoxylate aminotransferase  + +
  Formate dehydrogenase  fdh, fdsD  + +
  Methylene tetrahydrofolate reductase  metF  + +
  Methane monooxygenase  mmo
  Methanol dehydrogenase  mxa  +
  Methenyltetrahydromethanopterin cyclohydrolase  mch  + +
  Crotonyl-CoA reductase  + +
  Formaldehyde-activating enzyme  fae  +
Polyamine degradation
  Deoxyhypusine synthase  dys2  +* +
  Spermidine/putrescine transport system permease  potC  +* +
  Acetylpolyamine aminohydrolase  aphA
Sulphur cycle
  Sulphur oxidation  soxB, soxC, soxA, soxZ, soxF  + +
  Dimethylsulphoniopropionate demethylase  dmdA
Glycine betaine
  Dimethylglycine dehydrogenase  dmgdh  + +
  Glycine cleavage system (aminomethyltransferase)  gcvT  +* +
Aromatic compounds
  Aromatic ring hydroxylase  chlP  + +*
  Protocatechuate 3,4-dioxygenase  pcaH
  Benzoyl-CoA oxygenase  boxA  +
Carbon monoxide
  Carbon monoxide dehydrogenase  coxS, coxM, coxL  + +
Phototrophy and C fixation
  Photosystem I  multiple  + +*
  Photosystem II  multiple  + +*
  Rubisco  rbcL, rbcS  + +*
  Photosynthetic reaction centre, M subunit  pufM  +
  Proteorhodopsin  + +*
Phosphate assimilation
  Phosphonate uptake  phnD, phnC  + +
  Alkaline phosphatase  phoA  + +
  Phosphate uptake  pstA, pstS  + +
Amino acid metabolism
  Glutamate synthase  gltB  + +
  Glutathione reductase  gor  +* +
  Histidine kinase  baeS  +* +
  Threonine synthase  thrC  +* +
Trace metal uptake
  Selenium  +* +
  Iron  tonB  + +
  Arsenite  +
  Arsenate reductase  arsC  + +

A '+' indicates occurrence in the night or day sample. An asterisk indicates significantly higher transcript frequency in one.

Night-biased synthesis of vitamin B6, essential for a 

variety of amino acid conversions including transaminations, 

decarboxylations and dehydrations, in conjunction 

with evidence for other night-time activities such as the 

γ-glutamyl pathway for amino acid uptake, the overrepresentation

of amino acid transport and metabolism genes, 

and the histidine synthesis pathway (Table 3 and 

Tables S4–S6), indicate that amino acid acquisition in 



general may be a relatively more important metabolic 

activity in the night. Prochlorococcus marinus has recently 

been shown to exhibit diel patterns of amino acid uptake, 

with acquisition occurring predominantly at dusk (Mary 

et al., 2008). Our data agree with this and further suggest 

that heterotrophic taxa also devote a greater percentage 

of their transcriptome to transporting and synthesizing 

amino acids at night. Night-time accumulation of amino 

acids might be a mechanism for nitrogen storage by many 

organisms, particularly for P. marinus, which undergoes 

cell division at night. Histidine, the amino acid with the 

most consistent signal for synthesis at night by both 

autotrophs and heterotrophs (Fig. 7A and Fig. S1), is one 

of the most nitrogen-rich amino acids (only arginine has 

more amino groups). 
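The nitrogen ranking invoked here can be checked by counting nitrogen atoms in the molecular formulas of the amino acids. The sketch below lists the standard amino acids that carry more than one nitrogen atom, plus glycine as a single-nitrogen reference; the parser is a minimal helper written for this illustration and handles only simple formulas without brackets.

```python
import re

# Molecular formulas of the free amino acids.
FORMULAS = {
    "arginine": "C6H14N4O2",
    "histidine": "C6H9N3O2",
    "lysine": "C6H14N2O2",
    "asparagine": "C4H8N2O3",
    "glutamine": "C5H10N2O3",
    "tryptophan": "C11H12N2O2",
    "glycine": "C2H5NO2",
}


def nitrogen_atoms(formula):
    """Count N atoms in a simple molecular formula such as 'C6H9N3O2'."""
    total = 0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element == "N":
            total += int(count) if count else 1
    return total


# Rank amino acids from most to fewest nitrogen atoms.
ranked = sorted(FORMULAS, key=lambda aa: nitrogen_atoms(FORMULAS[aa]),
                reverse=True)
for aa in ranked:
    print(aa, nitrogen_atoms(FORMULAS[aa]))
```

Arginine (four nitrogen atoms) ranks first and histidine (three) second, consistent with the statement in the text.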

Overall, bacterial community investment in this oligotrophic 

ocean system was skewed towards energy 

acquisition and metabolism during the day, while biosynthesis 

(specifically of membranes, amino acids and vitamins) 

received relatively greater investments at night. 

Many microbial processes expected to be differentially 

expressed over a day/night cycle, such as photosynthesis, 

oxidative phosphorylation and proteorhodopsin activity, 

were indeed captured in the sequence data. Less 

anticipated processes that emerged included the utilization 

of C1 compounds, the uptake of polyamines and the 

degradation of aromatic compounds (Table 3). Other 

metabolic processes ongoing in this microbial community, 

although without statistical evidence for day/night patterns, 

included: use of nitrate and urea as nitrogen 

sources; use of phosphate, phosphonate and carbon-oxygen-phosphorus

(C-O-P) compounds as phosphorus 

sources; oxidation of reduced sulphur compounds; oxidation 

of carbon monoxide; and uptake of multiple trace 

metals (Table 3). This comparative analysis of microbial 

community transcripts has provided an inventory of 

ongoing metabolic processes, offered insights into their 

temporal patterns and supplied a new type of data for 

predictive modelling of environmental controls on ecosystem 

properties. 

Experimental procedures 

Sample collection 

Samples were collected at the Hawaiian Ocean Time-series 

(HOT) Station ALOHA, defined by the 6-nautical-mile radius 

circle centred at 22°45′N, 158°W in November, 2005 (HOT-175).

For RNA extraction, sea water was collected from a

depth of 25 m using Niskin bottles on a conductivity-temperature-depth

rosette sampler. A night sample was collected 

at 03:00 on 11 November 2005, and a daytime 

sample was collected at 13:00 on 13 November 2005. 

During HOT-175, the peak PAR level was at 12:00, with 

sunrise occurring around 07:00 and sunset just before 

18:00. Sea water (80 l for the night sample and 40 l for the 


day sample) was prefiltered through a 5 µm, 142 mm polycarbonate

filter (GE Osmonics, Minnetonka, MN) followed

by a 0.2 µm, 142 mm Durapore (Millipore) filter using

positive air pressure. The 0.2 µm filters were placed in a

15 ml tube containing 2 ml Buffer RLT (containing

β-mercaptoethanol) from the RNeasy kit (Qiagen, Valencia,

CA) and flash-frozen in liquid nitrogen for RNA extraction.

For DNA extraction, an additional 20 l of sea water were

simultaneously filtered using the protocol outlined above at

both time points. The 0.2 µm filters were placed in Whirlpak

bags and flash-frozen. The total sampling time from initiation

of collection until freezing in liquid nitrogen was approximately

1.5 h. We obtained ~1 µg of total RNA from 40 to 80 l

of sea water. Following mRNA enrichment and amplification,

30–100 µg of mRNA was available for conversion to cDNA

for sequencing. Typically, only 3–5 µg of DNA was required

for pyrosequencing. 

RNA and DNA preparation 

DNA was extracted using a phenol : chloroform-based protocol 

(Fuhrman et al., 1988). Briefly, frozen filters inside Whirlpak 

bags were transferred to 50 ml Falcon centrifuge tubes. 

Ten millilitres of extraction buffer [SDS (10% sodium dodecyl

sulphate) : STE (100 mM NaCl, 10 mM Tris, 1 mM EDTA),

9:1] was added to the tubes and boiled in a water bath for 

5 min. The extraction buffer was then removed from the 

tubes, placed into Oak Ridge round-bottom centrifuge tubes, 

to which 3 ml NaOAc and 28 ml 100% EtOH were added. 

Organic macromolecules were precipitated overnight at 

-20°C, before the tubes were centrifuged for 1 h at 15 000 g. 

The supernatant was decanted, and pellets dried for 30 min 

in the air. The pellets were resuspended in 600 µl deionized

water, and sequentially extracted with 500 µl phenol, 500 µl

phenol : chloroform : isoamyl alcohol (24:1:0.1), and 500 µl

chloroform : isoamyl alcohol (9:1); after each extraction the

organic phase was removed and discarded. The supernatant

was removed into a fresh tube at the end of the last extraction,

amended with 150 µl NaOAc and 1.2 ml 100% EtOH, and

precipitated overnight. The tube contents were then centrifuged

at 15 000 g for 1 h, the supernatant decanted, and

pellets dried in a speed vacuum dryer for 10 min. The DNA

pellets were resuspended in 100 µl DNase- and RNase-free

deionized water (Ambion). 

RNA was extracted using a modified version of the RNeasy 

kit (Qiagen) that results in high RNA yields from material on 

polycarbonate filters (Poretsky et al., 2008). Frozen samples 

were first thawed slightly for 2 min in a 40–50°C water bath 

and then vortexed for 10 min with RNase-free beads from the 

Mo-Bio RNA PowerSoil kit (Carlsbad, CA). Following centrifugation 

for 5 min at 3000–5000 g, the supernatant was transferred 

to a new tube. Beginning with the RNeasy Midi kit, 

1 vol. of 70% ethanol was added to the lysate and, in order to 

shear large-molecular-weight nucleic acids, the lysate was 

drawn through a 22-gauge needle several (~5) times. RNA 

extraction then continued with the RNeasy Mini kit according 

to the manufacturer’s instructions. 

Following extraction, RNA was treated with DNase using 

the TURBO DNA-free kit (Ambion, Austin, TX). Two methods 

were employed to rid the RNA samples of rRNA. The RNA 

was first treated enzymatically with the mRNA-ONLY 




Prokaryotic mRNA Isolation Kit (Epicentre Biotechnologies, 

Madison, WI) that uses a 5′-phosphate-dependent exonuclease 

to degrade rRNAs. The MICROBExpress kit (Ambion) 

subtractive hybridization with capture oligonucleotides 

hybridized to magnetic beads was subsequently used as an 

additional mRNA enrichment step. 

In order to obtain µg quantities of mRNA, approximately

500 ng of RNA was linearly amplified using the MessageAmp 

II-Bacteria Kit (Ambion) according to the manufacturer’s 

instructions. Finally, the amplified, antisense RNA (aRNA) 

was converted to double-stranded cDNA with random hexamers 

using the Universal RiboClone cDNA Synthesis 

System (Promega, Madison, WI). The cDNA was purified with 

the Wizard DNA Clean-up System (Promega). The quality 

and quantity of the total RNA, mRNA, aRNA and cDNA were 

assessed by measurement on the NanoDrop-1000 Spectrophotometer 

(NanoDrop Technologies, Wilmington, DE) and 

the Experion Automated Electrophoresis System (Bio-Rad, 

Hercules, CA). 

cDNA sequencing and quality control 

cDNAs from each sample (night and day) were sequenced 

using the GS 20 sequencing system by 454 Life Sciences 

(Branford, CT) (Margulies et al., 2005), resulting in 

10 682 120 bp from 106 907 reads for the night sample and 

13 255 704 bp from 133 515 reads for the day sample. The 

average sequence length was 99 bp. The sequences have 

been deposited in the NCBI Short Read Archive with the 

Genome Project ID #33463. 
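As a quick consistency check, the mean read length can be recomputed from the base-pair and read totals reported above. The function and variable names are illustrative.

```python
def mean_read_length(total_bp, n_reads):
    """Average read length in bp."""
    return total_bp / n_reads


# (total bp, number of reads) as reported for each sample.
runs = {
    "night": (10_682_120, 106_907),
    "day": (13_255_704, 133_515),
}

for sample, (bp, reads) in runs.items():
    print(f"{sample}: {mean_read_length(bp, reads):.1f} bp")

total_bp = sum(bp for bp, _ in runs.values())
total_reads = sum(reads for _, reads in runs.values())
combined = mean_read_length(total_bp, total_reads)
print(f"combined: {combined:.1f} bp")
```

Both samples, and the combined data set, average between 99 and 100 bp per read, in line with the reported figure of 99 bp.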

rRNA identification and removal 

For rRNA sequence identification, the sequences were clustered 

at an identity threshold of 98% based on a local alignment 

(number of identical residues divided by length of 

alignment) using the program Cd-hit (Li and Godzik, 2006). 

Ribosomal RNA sequences were identified by BLASTN queries 

of the reference sequence of each cluster against the non-curated

GenBank nucleotide database (nt) (Benson et al.,

2007) using cut-off criteria of E-value ≤ 10⁻³, nucleic acid

length ≥ 69 and per cent identity ≥ 40% previously established

with in silico tests for rRNA sequence predictions of

short pyrosequences (Frias-Lopez et al., 2008; Mou et al.,

2008). We conservatively identified a sequence as rRNA-derived

and removed it from the analysis pipeline if any of the 

top three BLASTN hits were to an rRNA gene. 
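The decision rule can be sketched as follows, assuming the thresholds are inclusive bounds (E-value at most 10⁻³, alignment length at least 69 nt, identity at least 40%) and that the "top three" hits are the first three that pass these cut-offs. The `BlastnHit` record, the keyword match on the hit description and the example hits are all hypothetical; the original pipeline's way of recognizing an rRNA gene is not described in the text.

```python
from dataclasses import dataclass


@dataclass
class BlastnHit:
    description: str         # definition line of the database sequence
    evalue: float
    alignment_length: int    # nucleotides
    percent_identity: float


def passes_cutoffs(hit, max_evalue=1e-3, min_length=69, min_identity=40.0):
    """Apply the E-value, length and identity cut-offs to one hit."""
    return (hit.evalue <= max_evalue
            and hit.alignment_length >= min_length
            and hit.percent_identity >= min_identity)


def mentions_rrna(hit):
    """Hypothetical rRNA detector: keyword search in the description."""
    text = hit.description.lower()
    return "rrna" in text or "ribosomal rna" in text


def is_rrna_derived(hits, top_n=3):
    """True if any of the top qualifying hits (best first) is an rRNA gene."""
    qualifying = [h for h in hits if passes_cutoffs(h)]
    return any(mentions_rrna(h) for h in qualifying[:top_n])


# Invented example: the second-best hit is to an rRNA gene.
example_hits = [
    BlastnHit("uncultured marine bacterium clone, hypothetical protein gene",
              1e-20, 95, 97.0),
    BlastnHit("Prochlorococcus marinus 23S ribosomal RNA gene", 1e-18, 90, 95.0),
]
print(is_rrna_derived(example_hits))
```

Flagging a cluster when any one of the top hits is ribosomal errs on the side of discarding reads, which matches the conservative intent stated in the text.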

cDNA sequence annotation 

The criteria for protein predictions generated using BLASTX 

against the NCBI curated, non-redundant reference 

sequence database (RefSeq) (Pruitt et al., 2005) were established 

with in silico tests to determine suitable cut-off limits for 

reliable functional prediction. For these tests, 100 arbitrarily 

selected, known functional gene sequences were fragmented 

into 20–500 bp fragments and analysed using BLASTX 

against RefSeq to determine if the best BLAST hit was to the 

correct gene function, excluding self-hits. Based on these 

analyses, the cut-off criteria for protein prediction were 

set as E-value < 0.01, identity > 40% and overlapping 

length > 23 aa to the corresponding best hit. 
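A minimal sketch of applying these cut-offs, assuming the BLASTX results are available in the standard 12-column tabular format (query, subject, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score) with hits listed best-first for each query. The read and subject identifiers and all values in the example lines are invented.

```python
def best_confident_hits(tabular_lines, max_evalue=0.01,
                        min_identity=40.0, min_overlap=23):
    """Map each query to its best hit if that hit passes all cut-offs.

    Only the first (best) hit per query is considered, mirroring the
    'corresponding best hit' wording of the criteria.
    """
    seen = set()
    accepted = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        if query in seen:
            continue          # a better hit for this query was already read
        seen.add(query)
        identity = float(fields[2])
        overlap_aa = int(fields[3])   # alignment length in amino acids
        evalue = float(fields[10])
        if (evalue < max_evalue and identity > min_identity
                and overlap_aa > min_overlap):
            accepted[query] = subject
    return accepted


# Invented example lines (tab-separated).
example = [
    "read_0001\tYP_000001.1\t78.8\t33\t7\t0\t1\t99\t120\t152\t2e-09\t61.2",
    "read_0001\tYP_000002.1\t60.0\t30\t12\t0\t4\t93\t10\t39\t1e-04\t45.0",
    "read_0002\tYP_000003.1\t35.0\t31\t20\t0\t1\t93\t5\t35\t5e-03\t38.0",
    "read_0003\tYP_000004.1\t90.0\t20\t2\t0\t1\t60\t1\t20\t1e-05\t50.0",
]
print(best_confident_hits(example))
```

In this example only read_0001 is retained: read_0002 fails the identity threshold and read_0003 the overlap threshold.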

Sequences with hits to RefSeq were assigned functional 

protein or pathway predictions based on the COG database 

(Tatusov et al., 2000) or KEGG database (Kanehisa and 

Goto, 2000). The cut-off criteria for functional protein prediction 

based on orthologous groups using BLASTX analysis 

against the COG database were established using the same 

in silico approach with 100 bp fragments of known functional 

genes as E-value < 0.1, identity > 40% and overlapping 

length > 23 aa to the corresponding best hit. The COG cut-off 

criteria were also applied to the KEGG database for pathway 

prediction because of the similarity in database size. Taxonomic 

binning of the sequences was carried out using MEGAN 

with the default settings for all parameters (Huson et al., 

2007); this program assigns likely taxonomic origin to 

sequences based on the NCBI taxonomy of closest BLAST 

hits. The taxonomic affiliations of the putative mRNA 

sequences were predicted using MEGAN to the family level, 

and the top BLAST hit for any higher-resolution taxonomic 

assignments. All non-rRNA sequences that had no RefSeq 

hits were BLASTX-queried against the nr database as well as 

against CAMERA un-assembled ORFs predicted from the 

Global Ocean Survey reads (http://camera.calit2.net/index.php)

(Seshadri et al., 2007).
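The taxonomic binning performed by MEGAN is generally described as a lowest-common-ancestor (LCA) assignment over a read's significant BLAST hits. The sketch below illustrates that idea only: it is not MEGAN's implementation, it ignores MEGAN's score and support thresholds, and the lineages are abbreviated, hand-written examples rather than entries taken from the NCBI taxonomy.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages.

    Each lineage is a list of taxon names ordered from the root downwards.
    """
    if not lineages:
        return None
    shared = []
    for ranks in zip(*lineages):
        if all(name == ranks[0] for name in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared[-1] if shared else "root"


# Abbreviated, illustrative lineages for the hits of three imaginary reads.
AS9601 = ["Bacteria", "Cyanobacteria", "Prochlorococcus",
          "Prochlorococcus marinus str. AS9601"]
MIT9312 = ["Bacteria", "Cyanobacteria", "Prochlorococcus",
           "Prochlorococcus marinus str. MIT 9312"]
SYNECHO = ["Bacteria", "Cyanobacteria", "Synechococcus"]
PELAGI = ["Bacteria", "Proteobacteria", "Alphaproteobacteria",
          "Candidatus Pelagibacter ubique"]

print(lowest_common_ancestor([AS9601, MIT9312]))
print(lowest_common_ancestor([AS9601, SYNECHO]))
print(lowest_common_ancestor([AS9601, PELAGI]))
```

Reads whose hits agree are placed at a fine taxonomic level, whereas reads with hits spread across distant taxa are placed at a correspondingly higher level, which is why a top-BLAST-hit rule was needed for the higher-resolution assignments mentioned above.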

Eukaryotic sequence annotation 

Eukaryotic transcripts were binned by MEGAN. Sequences 

were queried (BLASTX) against a curated database of protein 

sequences derived from all available complete eukaryotic 

organelle and nuclear genomes (currently, 46 eukaryotic 

genomes). Transcripts that matched a reference protein 

sequence with > 60% identity and an E-value < e-10 were

retained and the reference protein for the cluster was used for 

functional annotation. Functional annotation was performed 

using the Java-based Blast2GO (Conesa et al., 2005), which annotates

genes based on similarity searches with statistical 

analysis and highlighted visualization on directed acyclic 

graphs. 

16S rRNA gene libraries 

PCR amplification of ribosomal DNA was carried out using 

primers 27F and 1522R (Johnson, 1994). The PCR conditions 

were as follows: 3 min at 96°C, followed by 30 cycles of 

denaturation at 95°C for 50 s, annealing at 58°C for 50 s, 

primer extension at 72°C for 1 min and a final extension at 

72°C for 10 min. PCR products were cleaned using the 

QIAquick PCR Purification Kit (Qiagen) and multiple PCR 

reactions were pooled and cloned into pCR2.1 vector using 

the TOPO TA cloning kit (Invitrogen, Carlsbad, CA). PCR 

amplifications included standard no-template controls. 

Clones from each sample (192) were sequenced at the University 

of Georgia Sequencing Facility on an ABI 3100 

(Applied Biosystems, Foster City, CA). 

Predicted highly expressed genes 

The PHX genes were determined for cultured representatives 

of three prokaryotic taxa that were well represented in the 

transcript libraries (Prochlorococcus, Roseobacter and 



SAR11) using an algorithm developed by Karlin and Mrázek 

(2000). The algorithm is based on comparisons with codon 

usage patterns in genes expected to be frequently transcribed 

in a prokaryotic genome (ribosomal proteins, chaperone 

proteins, etc.). Environmental transcript sequences 

that had best BLAST hits to one of the PHX genes were 

similarly designated as PHX. 

Statistical analysis 

A statistical program designed for comparing gene frequency 

in metagenomic data sets (Rodriguez-Brito et al., 2006) was 

used to compare the night and day mRNA sequences categorized 

based on COGs, KEGGs and proteins. The program 

was run with 20 000 repeated samplings with a sample size 

of 10 000 for COGs, 9000 for KEGGs and 25 000 for proteins. 

The significance level (P) was set at < 0.05. 
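The general idea of a resampling comparison of category frequencies can be sketched as below. This is a simplified stand-in, not the program of Rodriguez-Brito et al. (2006): the choice of statistic (median difference in counts compared with an interval built from pooled resamples) is an assumption, the category labels and counts are invented, and the number of iterations and the sample size are reduced so that the example runs quickly.

```python
import random


def resampling_test(day, night, category, sample_size, n_iter,
                    seed=0, alpha=0.05):
    """Compare how often `category` occurs in two lists of annotations.

    Repeatedly draws `sample_size` annotations (with replacement) from each
    data set and records the difference in counts. The median observed
    difference is compared with the central (1 - alpha) interval of
    differences obtained when both samples come from the pooled data.
    """
    rng = random.Random(seed)
    day, night = list(day), list(night)
    pooled = day + night

    def hits(population):
        sample = rng.choices(population, k=sample_size)
        return sum(1 for label in sample if label == category)

    observed = sorted(hits(day) - hits(night) for _ in range(n_iter))
    null = sorted(hits(pooled) - hits(pooled) for _ in range(n_iter))
    median = observed[n_iter // 2]
    low = null[int((alpha / 2) * (n_iter - 1))]
    high = null[int((1 - alpha / 2) * (n_iter - 1))]
    return {
        "median_difference": median,
        "null_interval": (low, high),
        "significant": median < low or median > high,
    }


# HYPOTHETICAL annotation lists: one label per annotated read.
day_reads = ["photosynthesis"] * 300 + ["other"] * 700
night_reads = ["photosynthesis"] * 100 + ["other"] * 900

result = resampling_test(day_reads, night_reads, "photosynthesis",
                         sample_size=500, n_iter=200, seed=1)
print(result)
```

In the analysis described above, the corresponding settings were 20 000 repeated samplings with sample sizes of 10 000 (COGs), 9000 (KEGGs) and 25 000 (proteins).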

qPCR verifications 

To confirm that the composition of the pyrosequence library 

was representative of the initial mRNAs, transcripts of five 

genes that were top hits to multiple sequences in both transcript 

pools were quantified in the total RNA pool. The qPCR 

primer sets were designed for the P. marinus str. AS9601 

recA and psaA, a proteorhodopsin gene and a Na+/solute 

symporter (Ssf family) gene from P. ubique HTCC1062, and a 

probable integral membrane proteinase attributed to Psychroflexus 

torquis ATCC 700755 (sequences and annealing 

temps in Table S6). Reverse transcription reactions were 

carried out on 200 ng of RNA using the Omniscript RT kit 

(Qiagen) in 20 µl volumes containing 1× RT buffer, 0.3 mg ml⁻¹

of random hexamers (Invitrogen), 1 µl of 5 mM dNTPs, 2 U of

reverse transcriptase and 20 U of RNase inhibitor (Promega)

at 37°C for 1 h, followed by inactivation of the reverse transcriptase

at 95°C for 2 min. The day : night ratio of each gene

transcript in the RNA pools was determined by qPCR amplification

of a serial dilution of cDNAs in triplicate, and calculation

of the difference in cycle threshold values (ΔCT)

between the two samples. Quantitative amplification was

done using the iCycler iQ RT PCR detection system (Bio-Rad)

in a 20 µl reaction volume containing 10 µl of iQ SYBR

Green Supermix (Bio-Rad), 0.4 µl each of 10 µM of the

forward and reverse primers and 1 µl of the cDNA template.

PCR conditions included a preliminary denaturation at 95°C 

for 3 min followed by 45 cycles of 95°C for 15 s, annealing for 

1.5 s, 95°C for 1 min and 55°C for 1 min. A melt curve was 

generated following the PCR, beginning with 55°C and 

increasing 0.4°C every 10 s until 95°C. A PCR control without 

an initial RT step was included with every set of reactions. 
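The conversion from a difference in cycle threshold values to a day : night ratio can be sketched as follows. The calculation assumes that template doubles in every cycle (amplification efficiency of 2), which is a common simplification and not a value reported in the text; the CT values are invented placeholders.

```python
def day_night_ratio(ct_day, ct_night, efficiency=2.0):
    """Day:night template ratio from threshold cycles.

    A lower CT means more starting template, so the ratio is
    efficiency ** (ct_night - ct_day).
    """
    return efficiency ** (ct_night - ct_day)


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# HYPOTHETICAL triplicate CT values for one gene.
ct_day = [21.0, 21.2, 20.8]
ct_night = [23.1, 22.9, 23.0]

ratio = day_night_ratio(mean(ct_day), mean(ct_night))
print(f"day:night ratio = {ratio:.2f}")
```

With these placeholder values the day sample crosses the threshold about two cycles earlier than the night sample, which corresponds to roughly four times more template in the day under the doubling assumption.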

Acknowledgements 

We thank the Captain and crew of the R/V Kilo Moana and Dr 

David Karl. Jennifer Oliver assisted with sample processing. 

Jonathan Badger assisted with data processing. Funding was 

provided by The Gordon and Betty Moore Foundation, 

National Science Foundation grants MCB-0702125 (M.A.M.), 

EF-0722374 (A.E.A) and OCE-0425363 (J.P.Z.), and the NSF 

C-MORE Center for Microbial Oceanography. 

References 


Allen, A.E., Vardi, A., and Bowler, C. (2006) An ecological 

and evolutionary context for integrated nitrogen metabolism 

and related signaling pathways in marine diatoms. 

Curr Opin Plant Biol 9: 264–273. 

Armbrust, E.V., Berges, J.A., Bowler, C., Green, B.R., Martinez, 

D., Putnam, N.H., et al. (2004) The genome of the 

diatom Thalassiosira pseudonana: ecology, evolution, and 

metabolism. Science 306: 79–86. 

Bauer, J.E., Williams, P.M., and Druffel, E.R.M. (1992) 14C 

activity of dissolved organic carbon fractions in the north-central

Pacific and Sargasso Sea. Nature 357: 667–670. 

Belasco, J.G. (1993) mRNA degradation in prokaryotic cells: 

an overview. In Control of Messenger RNA Stability. 

Belasco, J.G., Brawerman, G. (eds). San Diego, CA, USA: 

Academic Press, pp. 3–11. 

Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., 

and Wheeler, D.L. (2007) GenBank. Nucleic Acids Res 35: 

D21–D25. 

Bürgmann, H., Widmer, F., Sigler, W.V., and Zeyer, J. (2003) 

mRNA extraction and reverse transcription-PCR protocol 

for detection of nifH gene expression by Azotobacter vinelandii 

in soil. Appl Environ Microbiol 69: 1928–1935. 

Bürgmann, H., Howard, E.C., Ye, W., Sun, F., Sun, S., Napierala, 

S., and Moran, M.A. (2007) Transcriptional response 

of Silicibacter pomeroyi DSS-3 to dimethylsulfoniopropionate 

(DMSP). Environ Microbiol 9: 2742–2755. 

Campbell, L., and Vaulot, D. (1993) Photosynthetic picoplankton 

community structure in the subtropical North 

Pacific Ocean near Hawaii (Station ALOHA). Deep Sea 

Res. Part I Oceanogr Res Pap 40: 2043–2060. 

Carpenter, L.J., Lewis, A.C., Hopkins, J.R., Read, K.A., 

Longley, I.D., and Gallagher, M.W. (2004) Uptake of 

methanol to the North Atlantic Ocean surface. Global Biogeochem 

Cycles 18: GB4027. 

Cavender-Bares, K.K., Karl, D.M., and Chisholm, S.W. 

(2001) Nutrient gradients in the western North Atlantic 

Ocean: relationship to microbial community structure and 

comparison to patterns in the Pacific Ocean. Deep Sea 

Res. Part I Oceanogr Res Pap 48: 2373–2395. 

Conesa, A., Gotz, S., Garcia-Gomez, J.M., Terol, J., Talon, 

M., and Robles, M. (2005) Blast2GO: a universal tool for 

annotation, visualization and analysis in functional genomics 

research. Bioinformatics 21: 3674–3676. 

DeLong, E.F., Preston, C.M., Mincer, T., Rich, V., Hallam, 

S.J., Frigaard, N.-U., et al. (2006) Community genomics 

among stratified microbial assemblages in the ocean’s 

interior. Science 311: 496–503. 

Derelle, E., Ferraz, C., Rombauts, S., Rouze, P., Worden, 

A.Z., Robbens, S., et al. (2006) Genome analysis of the 

smallest free-living eukaryote Ostreococcus tauri unveils 

many unique features. Proc Natl Acad Sci USA 103: 

11647–11652. 

Frias-Lopez, J., Shi, Y., Tyson, G.W., Coleman, M.L., 

Schuster, S.C., Chisholm, S.W., and DeLong, E.F. (2008) 

Microbial community gene expression in ocean surface 

waters. Proc Natl Acad Sci USA 105: 3805–3810. 

Fuhrman, J.A., Comeau, D.E., Hagstrom, A., and Chan, A.M. 

(1988) Extraction from natural planktonic microorganisms 

of DNA suitable for molecular biological studies. Appl 

Environ Microbiol 54: 1426–1429. 




Gelder, R.N.V., von Zastrow, M.E., Yool, A., Dement, W.C., 

Barchas, J.D., and Eberwine, J.H. (1990) Amplified RNA 

synthesized from limited quantities of heterogeneous 

cDNA. Proc Natl Acad Sci USA 87: 1663–1667. 

Gilbert, J.A., Field, D., Huang, Y., Edwards, R., Li, W., Gilna, 

P., and Joint, I. (2008) Detection of large numbers of novel 

sequences in the metatranscriptomes of complex marine 

microbial communities. PLoS ONE 3: e3042. 

Giovannoni, S.J., Hayakawa, D.H., Tripp, H.J., Stingl, U., 

Givan, S.A., Cho, J.-C., et al. (2008) The small genome of 

an abundant coastal ocean methylotroph. Environ Microbiol 

10: 1771–1782. 

Heikes, B.G., Chang, W.N., Pilson, M.E.Q., Swift, E., Singh, 

H.B., Guenther, A., et al. (2002) Atmospheric methanol 

budget and ocean implication. Global Biogeochem Cycles 

16: 80-1–80-13.

Howard, E.C., Henriksen, J.R., Buchan, A., Reisch, C.R., 

Bürgmann, H., Welsh, R., et al. (2006) Bacterial taxa that

limit sulfur flux from the ocean. Science 314: 649–652. 

Huson, D.H., Auch, A.F., Qi, J., and Schuster, S.C. (2007) 

MEGAN analysis of metagenomic data. Genome Res 17: 

377–386. 

Ingraham, J.L., Maaløe, O., and Neidhardt, F.C. (1983) 

Growth of the Bacterial Cell. Sunderland, MA, USA: 

Sinauer Associates. 

Johnson, J.L. (1994) Similarity analysis of rRNAs. In Methods 

for General and Molecular Bacteriology. Gerhardt, P., 

Murray, R.G.E., Wood, W.A., and Krieg, N.R. (eds). Washington, 

DC: American Society for Microbiology, pp. 683– 

700. 

Johnson, Z.I., Zinser, E.R., Coe, A., McNulty, N.P., Woodward, 

E.M.S., and Chisholm, S.W. (2006) Niche partitioning 

among Prochlorococcus ecotypes along ocean-scale 

environmental gradients. Science 311: 1737–1740. 

Kanehisa, M., and Goto, S. (2000) KEGG: Kyoto encyclopedia 

of genes and genomes. Nucleic Acids Res 28: 27–30. 

Karl, D., Letelier, R., Tupas, L., Dore, J., Christian, J., and 

Hebel, D. (1997) The role of nitrogen fixation in biogeochemical 

cycling in the subtropical North Pacific 

Ocean. Nature 388: 533–538. 

Karl, D.M., and Lukas, R. (1996) The Hawaii Ocean Time-series

(HOT) program: background, rationale and field 

implementation. Deep Sea Res. Part II Top Stud Oceanogr 

43: 129–156. 

Karlin, S., and Mrázek, J. (2000) Predicted highly expressed 

genes of diverse prokaryotic genomes. J Bacteriol 182: 

5238–5250. 

Kiene, R.P., Linn, L.J., and Bruton, J.A. (2000) New and 

important roles for DMSP in marine microbial communities. 

J Sea Res 43: 209–224. 

Lander, E.S., and Waterman, M.S. (1988) Genomic mapping 

by fingerprinting random clones: a mathematical analysis. 

Genomics 2: 231–239. 

Li, W., and Godzik, A. (2006) Cd-hit: a fast program for 

clustering and comparing large sets of protein or nucleotide 

sequences. Bioinformatics 22: 1658–1659. 

Liang, P., and Pardee, A.B. (1992) Differential display of 

eukaryotic messenger RNA by means of the polymerase 

chain reaction. Science 257: 967–971. 

McDonald, S.M., Sarno, D., Scanlan, D.J., and Zingone, A. 

(2007) Genetic diversity of eukaryotic ultraphytoplankton in 

the Gulf of Naples during an annual cycle. Aquat Microb 

Ecol 50: 75–89. 

Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, 

J.S., Bemben, L.A., et al. (2005) Genome sequencing in 

microfabricated high-density picolitre reactors. Nature 437: 

376–380. 

Mary, I., Garczarek, L., Tarran, G.A., Kolowrat, C., Terry, 

M.J., Scanlan, D.J., et al. (2008) Diel rhythmicity in amino 

acid uptake by Prochlorococcus. Environ Microbiol 10: 

2124–2131. 

Morris, R.M., Rappé, M.S., Connon, S.A., Vergin, K.L.,

Siebold, W.A., Carlson, C.A., and Giovannoni, S.J. (2002) 

SAR11 clade dominates ocean surface bacterioplankton 

communities. Nature 420: 806–810. 

Mou, X., Sun, S., Edwards, R.A., Hodson, R.E., and Moran, 

M.A. (2008) Bacterial carbon processing by generalist 

species in the coastal ocean. Nature 451: 708–711. 

Overbeek, R., Fonstein, M., D’Souza, M., Pusch, G.D., and 

Maltsev, N. (1999) The use of gene clusters to infer functional 

coupling. Proc Natl Acad Sci USA 96: 2896–2901. 

Poretsky, R.S., Bano, N., Buchan, A., LeCleir, G., 

Kleikemper, J., Pickering, M., et al. (2005) Analysis of 

microbial gene transcripts in environmental samples. Appl 

Environ Microbiol 71: 4121–4126. 

Poretsky, R.S., Bano, N., Buchan, A., Moran, M.A., and

Hollibaugh, J.T. (2008) Environmental transcriptomics: a 

method to access expressed genes in complex microbial 

communities. In Molecular Microbial Ecology Manual. 

Kowalchuk, G.A., de Bruijn, F.J., Head, I.M., Akkermans, 

A.D.L., and van Elsas, J.D. (eds). Dordrecht, Netherlands: 

Springer, pp. 1892–1904. 

Pruitt, K.D., Tatusova, T., and Maglott, D.R. (2005) NCBI 

Reference Sequence (RefSeq): a curated non-redundant 

sequence database of genomes, transcripts and proteins. 

Nucleic Acids Res 33: D501–D504. 

Rodriguez-Brito, B., Rohwer, F., and Edwards, R. (2006) An 

application of statistics to comparative metagenomics. 

BMC Bioinformatics 7: 162. 

Rusch, D.B., Halpern, A.L., Sutton, G., Heidelberg, K.B., 

Williamson, S., Yooseph, S., et al. (2007) The Sorcerer II 

Global Ocean Sampling Expedition: Northwest Atlantic 

through Eastern Tropical Pacific. PLoS Biol 5: e77. 

Schaefer, J.K., Goodwin, K.D., McDonald, I.R., Murrell, J.C., 

and Oremland, R.S. (2002) Leisingera methylohalidivorans

gen. nov., sp. nov., a marine methylotroph that grows on

methyl bromide. Int J Syst Evol Microbiol 52: 851–859. 

Seshadri, R., Kravitz, S.A., Smarr, L., Gilna, P., and Frazier, 

M. (2007) CAMERA: a community resource for metagenomics. 

PLoS Biol 5: 394–397. 

Tatusov, R.L., Galperin, M.Y., Natale, D.A., and Koonin, E.V. 

(2000) The COG database: a tool for genome-scale analysis 

of protein functions and evolution. Nucleic Acids Res 

28: 33–36. 

Ward, B.B., Kilpatrick, K.A., Novelli, P.C., and Scranton, M.I. 

(1987) Methane oxidation and methane fluxes in the ocean 

surface-layer and deep anoxic waters. Nature 327: 226–229.

Wawrik, B., Paul, J.H., and Tabita, F.R. (2002) Real-time 

PCR quantification of rbcL (ribulose-1,5-bisphosphate 

carboxylase/oxygenase) mRNA in diatoms and pelagophytes. 

Appl Environ Microbiol 68: 3771–3779. 



Woodall, C.A., Warner, K.L., Oremland, R.S., Murrell, J.C., 

and McDonald, I.R. (2001) Identification of methyl halide-utilizing

genes in the methyl bromide-utilizing bacterial 

strain IMB-1 suggests a high degree of conservation of 

methyl halide-specific genes in gram-negative bacteria. 

Appl Environ Microbiol 67: 1959–1963. 

Zehr, J.P., Waterbury, J.B., Turner, P.J., Montoya, J.P., 

Omoregie, E., Steward, G.F., et al. (2001) Unicellular 

cyanobacteria fix N2 in the subtropical North Pacific Ocean.

Nature 412: 635–638. 

Zhou, J.H. (2003) Microarrays for bacterial detection and 

microbial community analysis. Curr Opin Microbiol 6: 288–294.

Supporting information 

Additional Supporting Information may be found in the online 

version of this article: 

Fig. S1. Transcript mapping to the KEGG histidine metabolism 

pathway for P. marinus (A) and the vitamin B6 metabolism 

pathway for P. ubique (B) at night. Blue shading indicates 

that transcripts were found; grey indicates genes that are 

present in the genome, but no transcripts were found; white 

indicates genes that are not present in the reference 

genomes. 

Fig. S2. Quality control of the pyrosequences using qPCR 

verifications of transcript ratios for five genes: recA and psaA 

from P. marinus str. AS9601, a bacteriorhodopsin and a 

Na+/solute symporter (Ssf family) gene from P. ubique 

HTCC1062, and a probable integral membrane proteinase 

attributed to P. torquis ATCC 700755. The night : day ratio of 

transcripts in the pyrosequence libraries is plotted against the 

same ratio in the original total RNA fraction. 


Table S1. Results of bioinformatic pipeline for 100 and 

200 bp fragments from groups for which there are no genome 

sequences currently available. BACs from uncultured marine 

taxa (two from SAR86 and one from SAR116) were fragmented 

into random 100 bp pieces, using just the coding 

regions. Fragments were blasted against RefSeq, not allowing 

a self-hit. As controls, we did the same for P. ubique 

HTCC1062 and P. marinus MIT9312. 

Table S2. Estimates of coverage using two different models. 

The Lander–Waterman model uses the 16S rRNA clone 

library data to establish a taxon-abundance model for the 

system at a similarity level of 99%, and is based on the 

assumptions that each taxon produces 1000 transcripts at 

any given time and all expressed genes are expressed 

equally. The Chao1 richness estimators for COGs are computed 

using EstimateS (version 8.0, R. K. Colwell,

http://purl.oclc.org/estimates).

Table S3. KEGG pathways for three taxonomic bins 

(P. marinus, P. ubique and Roseobacters) significantly overrepresented 

in the night (grey shading) and day (no shading) 

transcriptomes (P < 0.10). 

Table S4. COGs significantly overrepresented in the night 

(grey shading) and day (no shading) transcriptomes 

(P < 0.05). 

Table S5. Genes significantly overrepresented in the night 

(grey shading) and day (no shading) transcriptomes 

(P < 0.05). 

Table S6. Primer sets used in qPCR. 
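
The Chao1 richness estimate named in the caption of Table S2 can be illustrated with a minimal sketch. It assumes the bias-corrected form of the estimator, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of COGs seen exactly once and exactly twice; which variant EstimateS applied to these data is not stated here. The function names and the example counts are hypothetical and are not taken from the study.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-category read counts.

    counts: iterable of non-negative integers, one per category
            (e.g. number of reads assigned to each COG).
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)                       # categories actually seen
    f1 = sum(1 for c in observed if c == 1)     # singletons
    f2 = sum(1 for c in observed if c == 2)     # doubletons
    # Assumption: bias-corrected form, which stays defined when f2 == 0.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def coverage(counts):
    """Fraction of the estimated richness that was observed."""
    s_obs = sum(1 for c in counts if c > 0)
    return s_obs / chao1(counts)


if __name__ == "__main__":
    # Hypothetical read counts for seven COGs (illustrative only).
    example = [1, 1, 1, 2, 2, 5, 10]
    print(chao1(example))     # 8.0
    print(coverage(example))  # 0.875
```

In this sketch the ratio of observed to estimated richness serves as a simple coverage figure; it does not reproduce the Lander–Waterman calculation also reported in Table S2.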

Please note: Wiley-Blackwell are not responsible for the 

content or functionality of any supporting materials supplied 

by the authors. Any queries (other than missing material) 

should be directed to the corresponding author for the 

article.
