Comparative day/night metatranscriptomic analysis of microbial ...
1372 R. S. Poretsky et al.<br />
Prokaryotic mRNA Isolation Kit (Epicentre Biotechnologies,<br />
Madison, WI) that uses a 5′-phosphate-dependent exonuclease<br />
to degrade rRNAs. Subtractive hybridization with the MICROBExpress kit<br />
(Ambion), which uses capture oligonucleotides<br />
hybridized to magnetic beads, was subsequently used as an<br />
additional mRNA enrichment step.<br />
To obtain µg quantities of mRNA, approximately<br />
500 ng of RNA was linearly amplified using the MessageAmp<br />
II-Bacteria Kit (Ambion) according to the manufacturer’s<br />
instructions. Finally, the amplified, antisense RNA (aRNA)<br />
was converted to double-stranded cDNA with random hexamers<br />
using the Universal RiboClone cDNA Synthesis<br />
System (Promega, Madison, WI). The cDNA was purified with<br />
the Wizard DNA Clean-up System (Promega). The quality<br />
and quantity of the total RNA, mRNA, aRNA and cDNA were<br />
assessed by measurement on the NanoDrop-1000 Spectrophotometer<br />
(NanoDrop Technologies, Wilmington, DE) and<br />
the Experion Automated Electrophoresis System (Bio-Rad,<br />
Hercules, CA).<br />
cDNA sequencing and quality control<br />
cDNAs from each sample (night and day) were sequenced<br />
using the GS 20 sequencing system by 454 Life Sciences<br />
(Branford, CT) (Margulies et al., 2005), resulting in<br />
10 682 120 bp from 106 907 reads for the night sample and<br />
13 255 704 bp from 133 515 reads for the day sample. The<br />
average sequence length was 99 bp. The sequences have<br />
been deposited in the NCBI Short Read Archive with the<br />
Genome Project ID #33463.<br />
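As a quick consistency check on the sequencing yields above, the mean read length can be recomputed from the reported base and read counts. The sketch below uses only the numbers given in the text; the dictionary layout and function name are illustrative. Each library, and the two pooled, comes out between 99 and 100 bp, in line with the reported average of 99 bp.

```python
# Base and read counts as reported in the text for each library.
libraries = {
    "night": {"bases": 10_682_120, "reads": 106_907},
    "day": {"bases": 13_255_704, "reads": 133_515},
}


def mean_read_length(bases: int, reads: int) -> float:
    """Mean read length in bp: total bases divided by number of reads."""
    return bases / reads


# Per-library means, then the mean over both libraries pooled together.
per_library = {name: mean_read_length(**counts) for name, counts in libraries.items()}
total_bases = sum(counts["bases"] for counts in libraries.values())
total_reads = sum(counts["reads"] for counts in libraries.values())
pooled = mean_read_length(total_bases, total_reads)

for name, value in per_library.items():
    print(f"{name}: {value:.1f} bp")
print(f"pooled: {pooled:.1f} bp")
```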
rRNA identification and removal<br />
For rRNA sequence identification, the sequences were clustered<br />
at an identity threshold of 98% based on a local alignment<br />
(number of identical residues divided by length of<br />
alignment) using the program Cd-hit (Li and Godzik, 2006).<br />
Ribosomal RNA sequences were identified by BLASTN queries<br />
of the reference sequence of each cluster against the non-curated<br />
GenBank nucleotide database (nt) (Benson et al.,<br />
2007) using cut-off criteria of E-value ≤ 10⁻³, nucleic acid<br />
length ≥ 69 and per cent identity ≥ 40%, previously established<br />
with in silico tests for rRNA sequence predictions of<br />
short pyrosequences (Frias-Lopez et al., 2008; Mou et al.,<br />
2008). We conservatively identified a sequence as rRNA-derived<br />
and removed it from the analysis pipeline if any of the<br />
top three BLASTN hits were to an rRNA gene.<br />
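The rRNA screening rule can be sketched as a small filter. This is a minimal illustration, not the authors' pipeline: the `BlastHit` record, the keyword test for recognising an rRNA gene, and the choice to apply the cut-offs before taking the top three hits are all assumptions made here. The comparison directions (E-value at most 10⁻³, length and identity at least 69 and 40%) are also assumed, being the natural reading of the cut-offs.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical record for one BLASTN hit (not a real parser type)."""
    subject_description: str
    evalue: float
    alignment_length: int
    percent_identity: float


# Cut-offs from the text; comparison directions are assumed.
MAX_EVALUE = 1e-3
MIN_LENGTH = 69
MIN_IDENTITY = 40.0
TOP_N = 3

# Assumed heuristic for recognising an rRNA gene from the hit description.
RRNA_KEYWORDS = ("rrna", "ribosomal rna")


def passes_cutoffs(hit: BlastHit) -> bool:
    return (
        hit.evalue <= MAX_EVALUE
        and hit.alignment_length >= MIN_LENGTH
        and hit.percent_identity >= MIN_IDENTITY
    )


def is_rrna_hit(hit: BlastHit) -> bool:
    description = hit.subject_description.lower()
    return any(keyword in description for keyword in RRNA_KEYWORDS)


def is_rrna_derived(hits: list[BlastHit]) -> bool:
    """True if any of the top three qualifying hits is to an rRNA gene.

    `hits` is assumed to be sorted best-first, as in BLAST output.
    """
    qualifying = [hit for hit in hits if passes_cutoffs(hit)]
    return any(is_rrna_hit(hit) for hit in qualifying[:TOP_N])
```

Flagging a read when any one of the top three hits is ribosomal, rather than only the best hit, errs towards discarding borderline reads, which matches the conservative intent stated in the text.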
cDNA sequence annotation<br />
The criteria for protein predictions generated using BLASTX<br />
against the NCBI curated, non-redundant reference<br />
sequence database (RefSeq) (Pruitt et al., 2005) were established<br />
with in silico tests to determine suitable cut-off limits for<br />
reliable functional prediction. For these tests, 100 arbitrarily<br />
selected, known functional gene sequences were fragmented<br />
into 20–500 bp fragments and analysed using BLASTX<br />
against RefSeq to determine if the best BLAST hit was to the<br />
correct gene function, excluding self-hits. Based on these<br />
analyses, the cut-off criteria for protein prediction were<br />
set as E-value < 0.01, identity > 40% and overlapping<br />
length > 23 aa to the corresponding best hit.<br />
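The in silico test described above can be sketched as follows. The text does not say how the known genes were cut into 20–500 bp pieces, so this example assumes consecutive, non-overlapping fragments of random length; the function names, the fixed seed, and the accuracy helper are illustrative only. The BLASTX search itself is not shown.

```python
import random


def fragment_sequence(sequence: str, min_len: int = 20, max_len: int = 500,
                      seed: int = 0) -> list[str]:
    """Cut a sequence into consecutive fragments of random length.

    Assumption: fragments do not overlap, and a trailing piece shorter
    than `min_len` is discarded.
    """
    rng = random.Random(seed)
    fragments = []
    position = 0
    while position < len(sequence):
        length = rng.randint(min_len, max_len)
        fragment = sequence[position:position + length]
        if len(fragment) >= min_len:
            fragments.append(fragment)
        position += length
    return fragments


def fraction_correct(best_hit_functions: list[str], true_function: str) -> float:
    """Share of fragments whose best (non-self) hit has the correct function."""
    if not best_hit_functions:
        return 0.0
    correct = sum(1 for function in best_hit_functions if function == true_function)
    return correct / len(best_hit_functions)


# Toy 1200 bp "gene" standing in for a known functional gene sequence.
gene = "ACGT" * 300
fragments = fragment_sequence(gene)
```

Each fragment would then be searched with BLASTX against RefSeq, and `fraction_correct` evaluated at candidate E-value, identity, and overlap thresholds to find limits that give reliable predictions.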
Sequences with hits to RefSeq were assigned functional<br />
protein or pathway predictions based on the COG database<br />
(Tatusov et al., 2000) or KEGG database (Kanehisa and<br />
Goto, 2000). The cut-off criteria for functional protein prediction<br />
based on orthologous groups using BLASTX analysis<br />
against the COG database were established using the same<br />
in silico approach with 100 bp fragments of known functional<br />
genes as E-value < 0.1, identity > 40% and overlapping<br />
length > 23 aa to the corresponding best hit. The COG cut-off<br />
criteria were also applied to the KEGG database for pathway<br />
prediction because of the similarity in database size. Taxonomic<br />
binning of the sequences was carried out using MEGAN<br />
with the default settings for all parameters (Huson et al.,<br />
2007); this program assigns likely taxonomic origin to<br />
sequences based on the NCBI taxonomy of closest BLAST<br />
hits. The taxonomic affiliations of the putative mRNA<br />
sequences were predicted using MEGAN to the family level,<br />
and the top BLAST hit was used for any higher-resolution taxonomic<br />
assignments. All non-rRNA sequences that had no RefSeq<br />
hits were BLASTX-queried against the nr database as well as<br />
against CAMERA un-assembled ORFs predicted from the<br />
Global Ocean Survey reads (http://camera.calit2.net/<br />
index.php) (Seshadri et al., 2007).<br />
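The database-specific cut-offs used for annotation differ only in the E-value limit, which is easy to see when they are written as a lookup table. The sketch below encodes the thresholds stated in the text; the dictionary keys and the `accept_best_hit` helper are hypothetical names, and all three comparisons are strict, as written in the text.

```python
# Cut-offs for the best BLASTX hit, per target database, as given in the text.
# KEGG reuses the COG criteria.
COG_CUTOFFS = {"max_evalue": 0.1, "min_identity": 40.0, "min_overlap_aa": 23}
CUTOFFS = {
    "RefSeq": {"max_evalue": 0.01, "min_identity": 40.0, "min_overlap_aa": 23},
    "COG": COG_CUTOFFS,
    "KEGG": COG_CUTOFFS,
}


def accept_best_hit(database: str, evalue: float, percent_identity: float,
                    overlap_aa: int) -> bool:
    """Return True if a best hit passes the cut-offs for `database`."""
    cutoffs = CUTOFFS[database]
    return (
        evalue < cutoffs["max_evalue"]
        and percent_identity > cutoffs["min_identity"]
        and overlap_aa > cutoffs["min_overlap_aa"]
    )
```

For example, a hit with E-value 0.05, 45% identity and a 30 aa overlap would be accepted for COG or KEGG assignment but rejected as a RefSeq protein prediction.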
Eukaryotic sequence annotation<br />
Eukaryotic transcripts were binned by MEGAN. Sequences<br />
were queried (BLASTX) against a curated database of protein<br />
sequences derived from all available complete eukaryotic<br />
organelle and nuclear genomes (currently, 46 eukaryotic<br />
genomes). Transcripts that matched a reference protein<br />
sequence with > 60% identity and an E-value < e⁻¹⁰ were<br />
retained and the reference protein for the cluster was used for<br />
functional annotation. Functional annotation was performed<br />
using the Java-based Blast2go (Conesa et al., 2005), which annotates<br />
genes based on similarity searches with statistical<br />
analysis and highlighted visualization on directed acyclic<br />
graphs.<br />
16S rRNA gene libraries<br />
PCR amplification of ribosomal DNA was carried out using<br />
primers 27F and 1522R (Johnson, 1994). The PCR conditions<br />
were as follows: 3 min at 96°C, followed by 30 cycles of<br />
denaturation at 95°C for 50 s, annealing at 58°C for 50 s,<br />
primer extension at 72°C for 1 min and a final extension at<br />
72°C for 10 min. PCR products were cleaned using the<br />
QIAquick PCR Purification Kit (Qiagen) and multiple PCR<br />
reactions were pooled and cloned into pCR2.1 vector using<br />
the TOPO TA cloning kit (Invitrogen, Carlsbad, CA). PCR<br />
amplifications included standard no-template controls.<br />
Clones from each sample (192) were sequenced at the University<br />
of Georgia Sequencing Facility on an ABI 3100<br />
(Applied Biosystems, Foster City, CA).<br />
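For planning purposes, the programmed hold time of the PCR protocol above can be added up from the stated steps. This is simple arithmetic on the values in the text; it ignores ramp times between temperatures, which depend on the thermal cycler and are not reported. The constant names are illustrative.

```python
# Step durations in seconds, taken from the PCR conditions in the text.
INITIAL_DENATURATION_S = 3 * 60   # 3 min at 96°C
CYCLES = 30
DENATURATION_S = 50               # 95°C
ANNEALING_S = 50                  # 58°C
EXTENSION_S = 60                  # 72°C
FINAL_EXTENSION_S = 10 * 60       # 10 min at 72°C


def total_hold_time_s() -> int:
    """Sum of programmed hold times, excluding temperature ramping."""
    per_cycle = DENATURATION_S + ANNEALING_S + EXTENSION_S
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_EXTENSION_S


total_s = total_hold_time_s()
print(f"{total_s} s = {total_s / 60:.0f} min")  # 5580 s = 93 min
```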
Predicted highly expressed genes<br />
The PHX genes were determined for cultured representatives<br />
of three prokaryotic taxa that were well represented in the<br />
transcript libraries (Prochlorococcus, Roseobacter and<br />
© 2009 The Authors<br />
Journal compilation © 2009 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology, 11, 1358–1375