Comparative day/night metatranscriptomic analysis of microbial ...

1372 R. S. Poretsky et al.

Prokaryotic mRNA Isolation Kit (Epicentre Biotechnologies, Madison, WI) that uses a 5′-phosphate-dependent exonuclease to degrade rRNAs. The MICROBExpress kit (Ambion), which performs subtractive hybridization with capture oligonucleotides hybridized to magnetic beads, was subsequently used as an additional mRNA enrichment step.

In order to obtain µg quantities of mRNA, approximately 500 ng of RNA was linearly amplified using the MessageAmp II-Bacteria Kit (Ambion) according to the manufacturer’s instructions. Finally, the amplified, antisense RNA (aRNA) was converted to double-stranded cDNA with random hexamers using the Universal RiboClone cDNA Synthesis System (Promega, Madison, WI). The cDNA was purified with the Wizard DNA Clean-up System (Promega). The quality and quantity of the total RNA, mRNA, aRNA and cDNA were assessed by measurement on the NanoDrop-1000 Spectrophotometer (NanoDrop Technologies, Wilmington, DE) and the Experion Automated Electrophoresis System (Bio-Rad, Hercules, CA).

cDNA sequencing and quality control

cDNAs from each sample (night and day) were sequenced using the GS 20 sequencing system by 454 Life Sciences (Branford, CT) (Margulies et al., 2005), resulting in 10 682 120 bp from 106 907 reads for the night sample and 13 255 704 bp from 133 515 reads for the day sample. The average sequence length was 99 bp. The sequences have been deposited in the NCBI Short Read Archive with the Genome Project ID #33463.

rRNA identification and removal

For rRNA sequence identification, the sequences were clustered at an identity threshold of 98% based on a local alignment (number of identical residues divided by length of alignment) using the program Cd-hit (Li and Godzik, 2006). Ribosomal RNA sequences were identified by BLASTN queries of the reference sequence of each cluster against the non-curated GenBank nucleotide database (nt) (Benson et al., 2007) using cut-off criteria of E-value ≤ 10⁻³, nucleic acid length ≥ 69 and per cent identity ≥ 40%, previously established with in silico tests for rRNA sequence predictions of short pyrosequences (Frias-Lopez et al., 2008; Mou et al., 2008). We conservatively identified a sequence as rRNA-derived and removed it from the analysis pipeline if any of the top three BLASTN hits were to an rRNA gene.
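The rRNA screening rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the E-value cut-off is an inclusive upper bound and the length and identity cut-offs are inclusive lower bounds, that the cut-offs are applied before the top three hits are taken, and that each hit carries a hypothetical `subject_is_rrna` flag (the text does not say how rRNA subjects were recognised).

```python
from dataclasses import dataclass
from typing import Sequence

# Cut-offs quoted in the text for BLASTN hits against nt.
MAX_EVALUE = 1e-3
MIN_ALIGNMENT_LENGTH = 69      # nucleotides
MIN_PERCENT_IDENTITY = 40.0
TOP_HITS_CHECKED = 3


@dataclass(frozen=True)
class BlastnHit:
    evalue: float
    alignment_length: int
    percent_identity: float
    # Hypothetical flag: how a subject was recognised as an rRNA gene
    # is not described in the text.
    subject_is_rrna: bool


def passes_cutoffs(hit: BlastnHit) -> bool:
    """Return True if the hit satisfies all three cut-off criteria."""
    return (
        hit.evalue <= MAX_EVALUE
        and hit.alignment_length >= MIN_ALIGNMENT_LENGTH
        and hit.percent_identity >= MIN_PERCENT_IDENTITY
    )


def is_rrna_derived(hits: Sequence[BlastnHit]) -> bool:
    """Flag a sequence if any of its top three qualifying hits is an rRNA gene.

    `hits` must be ordered best hit first. Filtering before taking the top
    three is an assumption of this sketch.
    """
    qualifying = [hit for hit in hits if passes_cutoffs(hit)]
    return any(hit.subject_is_rrna for hit in qualifying[:TOP_HITS_CHECKED])
```

The rule is deliberately conservative: a single rRNA subject among the top three hits is enough to remove the sequence, even when the best hit is to a protein-coding gene.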

cDNA sequence annotation

The criteria for protein predictions generated using BLASTX against the NCBI curated, non-redundant reference sequence database (RefSeq) (Pruitt et al., 2005) were established with in silico tests to determine suitable cut-off limits for reliable functional prediction. For these tests, 100 arbitrarily selected, known functional gene sequences were fragmented into 20–500 bp fragments and analysed using BLASTX against RefSeq to determine if the best BLAST hit was to the correct gene function, excluding self-hits. Based on these analyses, the cut-off criteria for protein prediction were set as E-value < 0.01, identity > 40% and overlapping length > 23 aa to the corresponding best hit.
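The in silico test described above can be outlined as follows. This is only a sketch under stated assumptions: the text does not say how the genes were fragmented, so consecutive non-overlapping fragments are assumed here, and `best_hit_function` is a hypothetical stand-in for a BLASTX search against RefSeq that returns the function of the best non-self hit.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def fragment(sequence: str, length: int) -> List[str]:
    """Cut a sequence into consecutive, non-overlapping fragments.

    A trailing piece shorter than `length` is dropped. Non-overlapping
    fragments are an assumption; the text gives only the 20-500 bp range.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    last_start = len(sequence) - length
    return [sequence[i:i + length] for i in range(0, last_start + 1, length)]


def accuracy_by_length(
    genes: Sequence[Tuple[str, str]],
    lengths: Sequence[int],
    best_hit_function: Callable[[str], Optional[str]],
) -> Dict[int, float]:
    """Fraction of fragments whose best hit has the known gene function.

    `genes` holds (nucleotide sequence, known function) pairs.
    `best_hit_function` is hypothetical: it stands in for the BLASTX
    search and returns the function of the best non-self hit, or None.
    Lengths that yield no fragments are omitted from the result.
    """
    results: Dict[int, float] = {}
    for length in lengths:
        correct = 0
        total = 0
        for sequence, known_function in genes:
            for piece in fragment(sequence, length):
                total += 1
                if best_hit_function(piece) == known_function:
                    correct += 1
        if total:
            results[length] = correct / total
    return results
```

Plotting the returned fractions against fragment length, together with the E-value, identity and overlap of each best hit, is one way such a test could be used to choose cut-offs.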

Sequences with hits to RefSeq were assigned functional protein or pathway predictions based on the COG database (Tatusov et al., 2000) or KEGG database (Kanehisa and Goto, 2000). The cut-off criteria for functional protein prediction based on orthologous groups using BLASTX analysis against the COG database were established using the same in silico approach with 100 bp fragments of known functional genes as E-value < 0.1, identity > 40% and overlapping length > 23 aa to the corresponding best hit. The COG cut-off criteria were also applied to the KEGG database for pathway prediction because of the similarity in database size. Taxonomic binning of the sequences was carried out using MEGAN with the default settings for all parameters (Huson et al., 2007); this program assigns likely taxonomic origin to sequences based on the NCBI taxonomy of closest BLAST hits. The taxonomic affiliations of the putative mRNA sequences were predicted using MEGAN to the family level, and using the top BLAST hit for any higher-resolution taxonomic assignments. All non-rRNA sequences that had no RefSeq hits were BLASTX-queried against the nr database as well as against CAMERA un-assembled ORFs predicted from the Global Ocean Survey reads (http://camera.calit2.net/index.php) (Seshadri et al., 2007).
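The database-specific BLASTX cut-offs given in this section can be collected in a small lookup table. The sketch below is illustrative only: the threshold values come from the text, while the names `Cutoff`, `CUTOFFS` and `accept_best_hit` are hypothetical and not part of any tool used in the study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Cutoff:
    max_evalue: float        # best hit must have an E-value below this
    min_identity: float      # per cent identity must exceed this
    min_overlap_aa: int      # overlapping length (aa) must exceed this


# Values as stated in the text; KEGG reuses the COG criteria.
CUTOFFS: Dict[str, Cutoff] = {
    "RefSeq": Cutoff(max_evalue=0.01, min_identity=40.0, min_overlap_aa=23),
    "COG": Cutoff(max_evalue=0.1, min_identity=40.0, min_overlap_aa=23),
    "KEGG": Cutoff(max_evalue=0.1, min_identity=40.0, min_overlap_aa=23),
}


def accept_best_hit(
    database: str, evalue: float, percent_identity: float, overlap_aa: int
) -> bool:
    """Return True if a best BLASTX hit meets the cut-offs for `database`.

    All three comparisons are strict, matching the "<" and ">" in the text.
    """
    cutoff = CUTOFFS[database]
    return (
        evalue < cutoff.max_evalue
        and percent_identity > cutoff.min_identity
        and overlap_aa > cutoff.min_overlap_aa
    )
```

With these values, a best hit with an E-value of 0.05 would be accepted for COG or KEGG but rejected for RefSeq, which reflects the tenfold looser E-value limit used for the smaller databases.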

Eukaryotic sequence annotation

Eukaryotic transcripts were binned by MEGAN. Sequences were queried (BLASTX) against a curated database of protein sequences derived from all available complete eukaryotic organelle and nuclear genomes (currently, 46 eukaryotic genomes). Transcripts that matched a reference protein sequence with > 60% identity and an E-value < e-10 were retained and the reference protein for the cluster was used for functional annotation. Functional annotation was performed using the Java-based Blast2go (Conesa et al., 2005), which annotates genes based on similarity searches with statistical analysis and highlighted visualization on directed acyclic graphs.

16S rRNA gene libraries

PCR amplification of ribosomal DNA was carried out using primers 27F and 1522R (Johnson, 1994). The PCR conditions were as follows: 3 min at 96°C, followed by 30 cycles of denaturation at 95°C for 50 s, annealing at 58°C for 50 s, primer extension at 72°C for 1 min and a final extension at 72°C for 10 min. PCR products were cleaned using the QIAquick PCR Purification Kit (Qiagen) and multiple PCR reactions were pooled and cloned into pCR2.1 vector using the TOPO TA cloning kit (Invitrogen, Carlsbad, CA). PCR amplifications included standard no-template controls. Clones from each sample (192) were sequenced at the University of Georgia Sequencing Facility on an ABI 3100 (Applied Biosystems, Foster City, CA).
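The thermal cycling conditions above can be written out as a simple data structure, which makes the programme easy to check. This is a sketch only: the step labels (for example "initial denaturation" for the 3 min hold at 96°C) are added here for readability, and the total counts hold times alone, ignoring ramp times.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    name: str            # descriptive label (the first label is an assumption)
    temperature_c: int
    seconds: int


# Conditions as stated in the text.
INITIAL = Step("initial denaturation", 96, 3 * 60)
CYCLE = [
    Step("denaturation", 95, 50),
    Step("annealing", 58, 50),
    Step("primer extension", 72, 60),
]
CYCLES = 30
FINAL = Step("final extension", 72, 10 * 60)


def expand_program() -> List[Step]:
    """Return every hold step of the programme in order."""
    return [INITIAL] + CYCLE * CYCLES + [FINAL]


def total_hold_seconds(program: List[Step]) -> int:
    """Sum the hold times of all steps (ramp times are not included)."""
    return sum(step.seconds for step in program)


if __name__ == "__main__":
    program = expand_program()
    print(len(program), "steps,", total_hold_seconds(program), "s of hold time")
```

Under these assumptions the hold times sum to 5580 s (93 min) over 92 steps.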

Predicted highly expressed genes

The PHX genes were determined for cultured representatives of three prokaryotic taxa that were well represented in the transcript libraries (Prochlorococcus, Roseobacter and

© 2009 The Authors

Journal compilation © 2009 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology, 11, 1358–1375
