Characterising the CRISPR immune system in Archaea


ABSTRACT

Archaea, a group of microorganisms distinct from bacteria and eukaryotes, are equipped with an adaptive immune system called the CRISPR system, which relies on an RNA interference mechanism to combat invading viruses and plasmids. Using a genome sequence analysis approach, the four components of archaeal genomic CRISPR loci were analysed, namely repeats, spacers, leaders and cas genes. Based on analysis of spacer sequences, it was predicted that the immune system combats viruses and plasmids by targeting their DNA. Furthermore, analysis of repeats, leaders and cas genes revealed that CRISPR systems exist as distinct families with key differences between them. Closely related organisms were seen to harbour different CRISPR systems, while some distantly related species carried similar systems, indicating frequent horizontal exchange. Moreover, it was found that cas genes of Type I CRISPR systems could be divided into functionally independent modules which occasionally exchange to form new combinations of Type I systems. In addition, Type III systems were found to be genomically associated with various combinations of accessory genes which may play a role in functionally extending the activity of the Type III interference complexes. This dynamic nature of the CRISPR immune systems may be a prerequisite for their continued efficacy against the ever-changing threats from which they protect their hosts.


SUMMARY

Archaea comprise a group of microorganisms distinct from both bacteria and eukaryotes. These organisms are equipped with an adaptive immune system against invading viruses and plasmids. The immune system works by taking up DNA from a virus and saving it on the host's own chromosome as a template to produce interference RNA. The RNA recognises the virus the next time it infects and signals the degradation of its genetic material. All the components of this system are encoded on the chromosomes of the organisms, and by looking into these components using genome sequence analysis, a number of insights were gained. It was found that the immune system kills viruses first and foremost by targeting their DNA. Furthermore, different archaea have different variants of the immune system, and most archaea harbour several variants at the same time, probably to aid them in targeting different types of viruses. The systems themselves are composed of independent modules which are responsible for different stages of the immune response. By combining the modules in various ways, extending them with additional components and exchanging them with other archaea, the organisms ensure that their immune systems are fit to handle diverse and continuously evolving threats.


SUMMARY IN DANISH (SAMMENFATNING)

Archaea constitute a group of organisms that differ from both bacteria and eukaryotes. They are equipped with an adaptive immune system against invading viruses and plasmids. The immune system works by taking up the DNA of the virus, which is stored in the host's own chromosome and thereby used as a template for producing interference RNA. The RNA recognises the virus the next time it infects and thereby signals the degradation of the virus's genetic material. All components of the immune system are encoded in the organisms' DNA, and by examining the components through genome sequence analysis a number of discoveries were made. We found that the immune system kills viruses first and foremost by attacking their DNA. In addition, different archaea have different variants of the immune system, and most archaea possess several of the variants at once, which probably helps them tackle different types of viruses. The immune systems themselves consist of independent modules, each of which handles a different stage of the immune response. By combining the modules in different ways, extending them with additional components and exchanging them with other archaea, the organisms ensure that their immune system is kept up to date to withstand the diverse threats that are constantly evolving.


PREFACE

The work presented in this thesis was carried out at the Danish Archaea Centre at the Department of Biology, University of Copenhagen, from May 2007 to July 2012 under the supervision of Professor Roger A. Garrett.

The initial objective of the Ph.D. study was to characterise the CRISPR immune system in Sulfolobus species using computational methods, especially comparative genome sequence analysis. Sulfolobus species have been extensively studied with regard to the viruses and plasmids which infect them, with many genome sequences available for hosts as well as their extrachromosomal elements. Furthermore, Sulfolobus species harbour extensive and diverse CRISPR immune systems. Thus there was more than enough data to begin the analyses. After exhausting the possible ways of analysing CRISPR spacer, repeat, leader and cas gene sequences from Sulfolobales, the analyses were extended to the rest of the available archaeal genomes in collaboration with Dr. Gisle A. Vestergaard, starting in July 2010. Gisle worked with me on the project until October 2011, after which I took it over. Extending the study to other archaea proved fruitful but was also a big mouthful, and these analyses are still being completed. Preliminary results have, however, been included in this thesis, and the last part in particular is dedicated to them.

The results from this Ph.D. study have been published across many individual research papers, which are all enclosed and most of which have multiple co-authors. Therefore the extent of my own contribution to each of these papers has been stipulated on the sheet preceding every paper.

To clarify the extent of my collaboration with Dr. Gisle A. Vestergaard and its influence on what is presented in this thesis: he was deeply involved in all work concerning the classification of archaeal cas genes into separate functional modules (aCas, iCas, etc.) and the classification of iCmr modules into five families, A through E. In these studies our workload was more or less equal. Some aspects of this work are already published while others remain unpublished. The analysis of archaeal CRISPR repeats and leaders, as well as the definition of iCmr accessory genes, was conducted by myself after Gisle left the project and is still unpublished.


ACKNOWLEDGEMENTS

First and foremost I'd like to thank my supervisor, Professor Roger A. Garrett. We had many long, inspiring discussions. Also, with Roger I saw how experience and wisdom go hand in hand. But most importantly I want to thank him for having patience and maintaining his trust in me during the challenges which were faced.

During my Ph.D. I made friends: Chandra, Gisle, Chao and Ling. Four friends in four years. It's difficult not to be thankful for that.

Also, I had the privilege of working in a lab full of absolutely terrific people. I can't think of a single exception. Despite many of us having to deal with our own day-to-day challenges, between us there was a genuinely positive and understanding atmosphere. When comparing with most other workplaces you realise that this is the kind of thing you mustn't take for granted.


LIST OF TABLES

Table 1 Features of Archaea, Bacteria and Eucarya

Table 2 Extremophile record-holders

Table 3 Sequenced Sulfolobales genomes

Table 4 Properties of Sulfolobus viruses & plasmids

Table 5 cas genes and their functions

Table 6 Overview of accessory iCmr genes


LIST OF FIGURES

Figure 1 Haeckel's tree of life

Figure 2 16S-RNA universal phylogenetic tree

Figure 3 RNA polymerases from the three domains

Figure 4 Archaeal 16S RNA phylogenetic tree

Figure 5 A Sulfolobus cell infected with a virus

Figure 6 Morphologies of select archaeal viruses

Figure 7 CRISPR immunity: mode of action

Figure 8 Gene maps of accessory iCmr genes

Figure 9 Tree of csx1 genes from Sulfolobales


3 DISCUSSION & PERSPECTIVES

Although experimental studies resolving the mechanistic details of the Cas interference complexes have started to gain momentum, with more and more articles being published each month, there are still many holes in our understanding of key parts of the interference process. Despite this, some marked differences between iCas and iCmr are already established, such as the manner in which self vs. non-self nucleic acid is distinguished¹, or the species of mature crRNA (long[10] vs. short[27]) utilised by either system.

In addition to such specific differences, a deeper divergence between the two systems is becoming increasingly apparent. For the iCas protein complex, the findings[32, 67] first made for the system in E. coli (Type E) are, remarkably, being rediscovered in the diverse iCas systems (types A, D and F) of Sulfolobus, Bacillus and Pseudomonas[39, 53, 81]. As for iCmr, the opposite seems to be the case. Here we see surprising mechanistic diversity despite the apparent homology between the systems. For example, the Type A iCmr system of Staphylococcus targets DNA[47], and while two different Type B systems, in Pyrococcus[27] and Sulfolobus[86] respectively, both target RNA, they do so in very different ways². Furthermore, there is now evidence for another Sulfolobus Type B iCmr system targeting DNA (Deng et al., under revision), obscuring the picture even further.

This tendency for iCas systems to be mechanistically conserved, and for iCmr systems to exhibit diversity, is also reflected on the genomic level. Although the sequences of the individual iCas genes have diverged considerably between the subtypes, some even beyond recognition, the overall gene composition is constant, with cas3, cas5, cas7 and cas8 comprising a universal core. iCmr modules, on the other hand, vary in their content of RAMP genes depending on whether they are of type A, B, C or D. Also, and perhaps more importantly, very similar iCmr modules are sometimes seen accompanied by different combinations of accessory genes (Section 2.4.1) which encode proteins that may be responsible for modifying the core functionality of the iCmr complex, possibly accounting for the mechanistic diversity so far
¹ With PAMs[49] as opposed to base-pairing[48], respectively.

² Both types utilise crRNA to target complementary ssRNA, but while the former always cleaves the target RNA at a fixed position employing some kind of ruler mechanism, the latter cleaves the target in a sequence-specific manner at each ‘UA’ dinucleotide encountered.
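The two RNA-cleavage behaviours contrasted in note 2 can be illustrated with a small sketch. It is an illustration only: the function names, the toy target sequence, the choice to cut between the U and the A, and the ruler offset are all assumptions made for the example and are not taken from the cited studies.

```python
from __future__ import annotations


def ua_cleavage_sites(target_rna: str) -> list[int]:
    """Sequence-specific mode: one cut per 'UA' dinucleotide.

    A cut position i means the backbone is cleaved between
    target_rna[i - 1] and target_rna[i] (assumed here to lie
    between the U and the A).
    """
    rna = target_rna.upper()
    return [i + 1 for i in range(len(rna) - 1) if rna[i:i + 2] == "UA"]


def ruler_cleavage_site(target_len: int, offset_from_end: int = 14) -> int | None:
    """Ruler mode: a single cut at a fixed distance from the target's end.

    offset_from_end is a hypothetical parameter. None is returned
    when the target is too short for the ruler to apply.
    """
    if target_len <= offset_from_end:
        return None
    return target_len - offset_from_end


def fragments(target_rna: str, cuts: list[int]) -> list[str]:
    """Split the target RNA at the given cut positions."""
    bounds = [0, *sorted(cuts), len(target_rna)]
    return [target_rna[a:b] for a, b in zip(bounds, bounds[1:])]
```

With the toy target `GGUACCUAGG`, the sequence-specific mode yields one cut per ‘UA’ and therefore several fragments, whereas the ruler mode yields exactly one cut whatever the sequence.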


4 CONCLUSION

During this Ph.D. study the CRISPR systems in Sulfolobales have been extensively characterised using a bioinformatic genome sequence analysis approach. Later the analyses were extended to all available archaeal genomes. The latter work is still being concluded. To summarise:

• Sulfolobales CRISPR spacer sequences were analysed to find matches to the large number of sequenced Sulfolobales extrachromosomal elements. The matches obtained were used to successfully predict the target nucleic acid of the CRISPR system at a time when it was not yet known.

• Analysis of Sulfolobales CRISPR repeats, spacers, leaders and cas genes revealed that CRISPR systems exist in families, where leader types, repeat types, cas gene types and PAM motifs go hand in hand. At the time, this had not been shown for any other organism.

• Analysis of the genomic contexts of Sulfolobales CRISPR/Cas loci revealed that the systems are located in genomic hyper-variable regions and are subject to frequent horizontal gene transfer, where transposable elements and toxin-antitoxin loci play a role in modulating their mobility.

• Extending the analyses to other archaea outside Sulfolobales revealed that the CRISPR interference modules (iCas and iCmr) in particular are very diverse, while the adaptation modules (aCas) are remarkably conserved. Individual modules were also seen to interchange, giving rise to CRISPR/Cas loci with different combinations of functional modules.

• iCmr modules of different types were found to be associated with a rich array of accessory genes, which were also found to exchange between different types of iCmr modules. It was hypothesised that these accessory genes extend the core functionality of the iCmr modules, e.g. by conferring the ability to switch target nucleic acids.
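The spacer-matching step summarised in the first point can be sketched as follows. This is a minimal illustration and not the pipeline used in the thesis: the function names, the mismatch threshold and the toy sequences are assumptions, and a real analysis would rely on an alignment tool and a significance cut-off rather than a plain sliding window.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_spacer_matches(
    spacer: str, element: str, max_mismatches: int = 2
) -> list[tuple[int, str, int]]:
    """Slide a spacer along a virus or plasmid sequence on both strands.

    Returns (start, strand, mismatches) for every window whose number
    of mismatches does not exceed max_mismatches (a hypothetical
    threshold). For '-' hits, start refers to the forward coordinate.
    """
    spacer = spacer.upper()
    element = element.upper()
    n = len(spacer)
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        for start in range(len(element) - n + 1):
            distance = hamming(query, element[start:start + n])
            if distance <= max_mismatches:
                hits.append((start, strand, distance))
    return hits
```

Searching both strands matters for the prediction described above: a spacer that matches either strand of a DNA element equally often is consistent with a DNA target, which is the kind of reasoning the match data supported.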


5 PUBLICATIONS

The publications resulting from this Ph.D. study are included in this chapter in chronological order. As most of the publications here have multiple authors with varying contributions, I have rated my own level of contribution to each publication as ‘major’, ‘substantial’ or ‘minor’. ‘Major’ means that the majority of the work behind the publication was carried out by myself. ‘Substantial’ means that my contribution comprised a smaller but crucial part of the manuscript, while ‘minor’ means that my contribution was small and non-crucial to that particular manuscript, although still a part of my own Ph.D. project. In addition to my level of contribution, the exact nature of my contribution is also stipulated.

A note on iCmr family nomenclature

In conformance with the recent update of cas gene nomenclature[45], the iCmr families referred to in publications 5.7[25], 5.8[20] and 5.10[21] as ‘B’ and ‘C’ are now merged into ‘B’, while ‘E’ is now ‘A’, ‘A’ is now ‘C’, and ‘D’ remains ‘D’. The new nomenclature is used throughout this thesis and in publications to come, whereas the publications listed above contain the old nomenclature. In summary:

In this thesis    In [25], [20] and [21]
A                 E
B                 B and C
C                 A
D                 D
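For readers cross-referencing the enclosed publications, the renaming above can also be written as a lookup. This is a convenience sketch only; the names `OLD_TO_NEW` and `to_thesis_family` are illustrative and do not appear in the thesis or the publications.

```python
# Old iCmr family letters (as used in [25], [20] and [21]) mapped to
# the letters used in this thesis. Old 'B' and 'C' are merged into 'B'.
OLD_TO_NEW = {
    "A": "C",
    "B": "B",
    "C": "B",
    "D": "D",
    "E": "A",
}


def to_thesis_family(old_family: str) -> str:
    """Translate an old iCmr family letter into the thesis nomenclature."""
    return OLD_TO_NEW[old_family.strip().upper()]
```

The mapping is deliberately one-directional: because the old families ‘B’ and ‘C’ were merged, a thesis ‘B’ cannot be translated back to a single old letter.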


JOURNAL OF BACTERIOLOGY, Oct. 2008, p. 6837–6845, Vol. 190, No. 20
0021-9193/08/$08.000 doi:10.1128/JB.00795-08
Copyright © 2008, American Society for Microbiology. All Rights Reserved.

Stygiolobus Rod-Shaped Virus and the Interplay of Crenarchaeal Rudiviruses with the CRISPR Antiviral System†

Gisle Vestergaard,¹ Shiraz A. Shah,¹ Ariane Bize,² Werner Reitberger,³ Monika Reuter,³ Hien Phan,¹ Ariane Briegel,⁴ Reinhard Rachel,³ Roger A. Garrett,¹ and David Prangishvili²*

Danish Archaea Centre and Centre for Comparative Genomics, Department of Biology, Copenhagen University, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark¹; Molecular Biology of the Gene in Extremophiles Unit, Institut Pasteur, rue Dr. Roux 25, 75724 Paris Cedex 15, France²; Department of Microbiology, University of Regensburg, Universitätsstrasse 31, D-93053 Regensburg, Germany³; and Max-Planck-Institut of Biochemistry, Molecular Structural Biology, Am Klopferspitz 21, D-82152 Martinsried, Germany⁴

Received 6 June 2008/Accepted 11 August 2008

A newly characterized archaeal rudivirus, Stygiolobus rod-shaped virus (SRV), which infects a hyperthermophilic Stygiolobus species, was isolated from a hot spring in the Azores, Portugal. Its virions are rod-shaped, 702 (±50) by 22 (±3) nm in size, and nonenveloped and carry three tail fibers at each terminus. The linear double-stranded DNA genome contains 28,096 bp and an inverted terminal repeat of 1,030 bp. The SRV shows morphological and genomic similarities to the other characterized rudiviruses Sulfolobus rod-shaped virus 1 (SIRV1), SIRV2, and Acidianus rod-shaped virus 1, isolated from hot acidic springs of Iceland and Italy. The single major rudiviral structural protein is shown to generate long tubular structures in vitro of similar dimensions to those of the virion, and we estimate that the virion constitutes a single, superhelical, double-stranded DNA embedded into such a protein structure. Three additional minor conserved structural proteins are also identified. Ubiquitous rudiviral proteins with assigned functions include glycosyl transferases and an S-adenosylmethionine-dependent methyltransferase, as well as a Holliday junction resolvase, a transcriptionally coupled helicase and nuclease implicated in DNA replication. Analysis of matches between known crenarchaeal chromosomal CRISPR spacer sequences, implicated in a viral defense system, and rudiviral genomes revealed that about 10% of the 3,042 unique acidothermophile spacers yield significant matches to rudiviral genomes, with a bias to highly conserved protein genes, consistent with the widespread presence of rudiviruses in hot acidophilic environments. We propose that the 12-bp indels which are commonly found in conserved rudiviral protein genes may be generated as a reaction to the presence of the host CRISPR defense system.

Viruses of the hyperthermophilic crenarchaea are extremely diverse in their morphotypes and in the properties of their double-stranded DNA (dsDNA) genomes (reviewed in references 19 and 23). Moreover, some of the virion morphotypes are unique for dsDNA viruses from any domain of life. Many of these viruses have been classified into seven new families that include rod-shaped rudiviruses, filamentous lipothrixviruses, spindle-shaped fuselloviruses, and a bottle-shaped ampullavirus (reviewed in reference 24). The bicaudavirus Acidianus two-tailed virus (ATV) exhibits an exceptional two-tailed morphology and the unique viral property of developing long tail-like appendages independently of the host cell (11). Crenarchaeal viral research is still at an early stage of development, and insights into basic molecular processes, including infection, replication, packaging, and virus-host interactions, are limited. One of the main reasons for this lies in the high proportion of predicted genes with unknown functions (25).

* Corresponding author. Mailing address: Molecular Biology of the Gene in Extremophiles Unit, Institut Pasteur, rue Dr. Roux 25, 75724 Paris Cedex 15, France. Phone: 33-(0)144-38-9119. Fax: 33-(0)145-68-8834. E-mail: prangish@pasteur.fr.

† Supplemental material for this article may be found at http://jb.asm.org/.

Published ahead of print on 22 August 2008.

At present, viruses of the family Rudiviridae are the most promising for detailed studies because they can be obtained in reasonable yields, and there are already some insights into their mechanisms of replication, transcriptional regulation, and host cell adaptation (4, 12, 13, 20, 21). To date, three rudiviruses have been characterized, all from the order Sulfolobales: the closely related Sulfolobus rod-shaped virus 1 (SIRV1) and SIRV2, isolated in Iceland, which infect strains of Sulfolobus islandicus (20, 22), and Acidianus rod-shaped virus 1 (ARV1), isolated at Pozzuoli, Italy, which propagates in Acidianus strains (34). Moreover, rudivirus-like morphotypes and partial rudiviral genome sequences have been detected in environmental samples collected from both acidic and neutrophilic hot aquatic sites (27, 29, 32).

All rudiviruses carry linear dsDNA genomes with long inverted terminal repeats (ITRs) ending in covalently closed hairpin structures with 5′-to-3′ linkages (4, 20). The terminal structure is important for replication, which presumably is initiated by site-specific single-strand nicking within the ITR, with the subsequent formation of head-to-head and tail-to-tail intermediates, and the conversion of genomic concatemers into monomers by a virus-encoded Holliday junction resolvase (20). This basic replication mechanism appears to be similar to that used by the eukaryal poxviruses, Chlorella virus and African swine fever virus, although there is no clear similarity between the sequences of the implicated archaeal and eukaryal proteins (20, 25).
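An inverted terminal repeat means that the start of the genome reads as the reverse complement of its end. As a rough sketch of how the length of such a repeat could be measured (exact matching only, with hypothetical function names and a toy sequence; real ITRs may contain mismatches, which this ignores):

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def itr_length(genome: str) -> int:
    """Length of the perfect inverted terminal repeat of a linear genome.

    Compares the genome, base by base from its start, with the
    reverse complement of the whole genome. The comparison stops at
    the first mismatch or at the midpoint of the genome.
    """
    forward = genome.upper()
    reverse = reverse_complement(forward)
    limit = len(forward) // 2
    length = 0
    while length < limit and forward[length] == reverse[length]:
        length += 1
    return length
```

For a toy genome built as `"ACGGT" + "TTTTTT" + reverse_complement("ACGGT")`, the function reports a 5-bp repeat, and it reports 0 for a sequence without one.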

The transcriptional patterns of rudiviruses SIRV1 and SIRV2 are relatively simple, with few temporal expression differences. An exception is the gene encoding the major structural protein, which binds to DNA; at an early stage of infection it is expressed as part of a polycistronic mRNA, but close to the eclipse period it appears as a single-gene transcript (12). It has also been shown that rudiviral transcription can be activated by a Sulfolobus host-encoded protein, Sta1, that interacts specifically with TATA-like promoter motifs in the viral genome (13).<br />

For SIRV1, a detailed study of the mechanism of adaptation to foreign hosts was conducted. Upon passage of the virus through closely related S. islandicus strains, complex changes were detected that were concentrated within six genomic regions (21, 22). These changes included insertions, deletions, gene duplications, inversions, and transpositions, as well as changes in gene sizes that often involved the insertion or deletion of what appeared to be “12-bp elements.” It was concluded that the virus generated a complex mixture of variants, one or more of which were preferentially propagated when the virus entered a new host (21).<br />

Here we describe a novel rudivirus, Stygiolobus rod-shaped virus (SRV), isolated from the Azores, Portugal, a location geographically distant from the locations of the other characterized rudiviruses (20, 34). SRV shows sufficient differences from the other rudiviruses, both morphologically and genomically, to warrant its classification as a novel species. The structural and genomic properties of the rudiviruses are compared and contrasted, and new data on the conserved virion structural proteins are presented. Different rudiviruses were selected for these studies on the basis of the virion or protein yields that were obtained. Moreover, we analyze matches between the rudiviral genomes and the spacer regions of the crenarchaeal chromosomal CRISPR repeat clusters, which have been implicated in a viral defense system (18) involving processed RNA transcribed from one DNA strand (reviewed in references 16 and 17), and we consider their significance and possible relationships to the 12-bp indels.<br />

MATERIALS AND METHODS<br />

Enrichment culture, isolation of viral hosts, and virus purification. An environmental sample was taken from a hot acidic spring (93°C, pH 2) in the Furnas Basin on São Miguel Island, the Azores, Portugal. An aerobic enrichment culture was established from the environmental sample and maintained at 80°C under conditions described previously for cultivation of members of the Sulfolobales (35). Single strains were isolated by plating on Gelrite (Kelco, San Diego, CA) containing colloidal sulfur (35) and grown in the medium of the enrichment culture. Cell-free supernatants of cultures were analyzed by transmission electron microscopy for the presence of virus particles.<br />

SRV was isolated from the growth culture of its host strain Stygiolobus sp., which was colony purified as described above. After cells were grown to the late exponential phase and harvested by low-speed centrifugation (Sorvall GS3 rotor, 4,500 rpm), virions were precipitated from the supernatant by adding NaCl (1 M) and polyethylene glycol 6000 (10% [wt/vol]) and maintaining the mixture at 4°C overnight. They were purified further by CsCl gradient centrifugation (34).<br />

Transmission electron microscopy. Samples were deposited on carbon-coated copper grids, negatively stained with 2% uranyl acetate (pH 4.5), and examined in a CM12 transmission electron microscope (FEI, Eindhoven, The Netherlands) operated at 120 keV. The magnification was calibrated using catalase crystals negatively stained with uranyl acetate (28). Images were digitally recorded using a slow-scan charge-coupled-device camera connected to a PC running TVIPS software (TVIPS GmbH, Gauting, Germany). To some samples, 0.1% sodium dodecyl sulfate (SDS) was added, and those samples were maintained at 22°C for 30 min in order to study the stability of the virion particles. Electron tomography of intact, negatively stained virions was performed as described previously (10, 26). Visualization of the three-dimensional (3D) data was performed using Amira software (Visage Imaging, Fürth, Germany).<br />

Protein analyses. Proteins of SRV were separated in 13.5% SDS–polyacrylamide gels (14) and stained with Coomassie brilliant blue R-250 (Serva, Heidelberg, Germany). N-terminal protein sequences were determined by Edman degradation using a Procise 492 protein sequencer (Applied Biosystems, Foster City, CA).<br />

SIRV2 proteins were separated in 4 to 12% SDS–polyacrylamide NuPAGE gradient gels by the use of MES (morpholineethanesulfonic acid) buffer (both from Invitrogen, Paisley, United Kingdom). The gels were stained with Sypro Ruby (Invitrogen). Protein bands were analyzed by peptide mass fingerprinting with matrix-assisted laser desorption ionization–time of flight mass spectrometry using a Voyager DE-STR biospectrometry workstation (Applied Biosystems, Framingham, MA) as described earlier (26). The analysis was performed in conjunction with the proteomic platform at the Pasteur Institute.<br />

Cloning and heterologous expression of ARV1-ORF134b and purification of the recombinant protein and its self-assembly. ARV1-ORF134b was amplified from purified viral DNA with primers ARV1ORF134F (GGAATTCCATATGATGGCGAAAGGACACACACC) and ARV1ORF134R (GGAATTCTCGAGACTTACGTATCCGTTAGGAC). The PCR product was purified (PCR purification kit; Roche, Mannheim, Germany) and cloned into pET30a expression vector (Novagen, Madison, WI) between restriction sites for EcoRI and XbaI. The protein was expressed overnight at 20°C in the Escherichia coli Rosetta(DE3)pLysS strain. Protein expression was verified by SDS-polyacrylamide gel electrophoresis analysis and by performing a Western blot analysis using anti-His-tag-specific antibodies (Novagen). The native protein was purified on a Ni²⁺-nitrilotriacetic acid (Ni²⁺-NTA)-agarose column (Novagen) with elution buffers containing 50 to 500 mM imidazole. The accuracy of its sequence was confirmed. Self-assembly of the recombinant protein into filamentous structures was performed at 75°C and pH 3.5 and observed by electron microscopy.<br />

Preparation of cellular and viral DNA and DNA sequencing. DNA was extracted from Stygiolobus azoricus cells as described previously (2), and the 16S rRNA gene was amplified by PCR using primers 8aF and 1512uR (6) and sequenced.<br />

Viral DNA was obtained by disrupting SRV particles with 1% SDS for 1 h at room temperature and extraction with phenol-chloroform (9). A shotgun library was prepared by sonicating viral DNA to generate fragments of 2 to 4 kb and cloning these into the SmaI site of the pUC18 vector. DNA was purified from single colonies by the use of a Biorobot 8000 workstation (Qiagen, Westburg, Germany) and sequenced in MegaBACE 1000 sequenators (Amersham Biotech, Amersham, United Kingdom). The viral sequence was assembled using Sequencher 4.2 software (Gene Code, Ann Arbor, MI). PCR primers for gap closing and resolving sequence ambiguities were designed using Primers for Mac, version 1.0. Sequence alignments were obtained using MUSCLE software (7). Open reading frames (ORFs) were defined with the help of ARTEMIS software (30) and investigated in searches using the EMBL and GenBank (1), 3D-Jury (8), and SMART (15) databases. Genome maps were generated and compared using Mutagen software, version 4.0 (5).<br />

Bioinformatical matching of crenarchaeal CRISPR spacers to rudiviral genomes. CRISPRs were predicted for each of the 14 publicly available crenarchaeal genomes in GenBank (NC_000854 [Aeropyrum pernix K1], NC_002754 [Sulfolobus solfataricus P2], NC_003106 [Sulfolobus tokodaii strain 7], NC_003364 [Pyrobaculum aerophilum strain IM2], NC_007181 [Sulfolobus acidocaldarius DSM 639], NC_008698 [Thermofilum pendens Hrk5], NC_008701 [Pyrobaculum islandicum DSM 4184], NC_008818 [Hyperthermus butylicus DSM 5456], NC_009033 [Staphylothermus marinus F1], NC_009073 [Pyrobaculum calidifontis JCM 11548], NC_009376 [Pyrobaculum arsenaticum DSM 13514], NC_009440 [Metallosphaera sedula DSM 5348], NC_009676 [Cenarchaeum symbiosum], and NC_009776 [Ignicoccus hospitalis KIN4/I]). In addition, the six sequenced repeat clusters from Sulfolobus solfataricus P1 (16) were added to the data set as well as CRISPRs from five incomplete Sulfolobus islandicus genomes publicly available through the Joint Genome Institute (http://genome.jgi.doe.gov/mic_asmb.html) and unpublished genome sequences of Sulfolobus islandicus HVE10/4 and Acidianus brierleyi from the Copenhagen laboratory. The repeat cluster sequences were found using publicly available software (3, 7).<br />

VOL. 190, 2008 CRENARCHAEAL RUDIVIRUSES 6839<br />

FIG. 1. Electron micrographs of SRV virions negatively stained with 3% uranyl acetate. (A) A full virion particle, with a discontinuous central line along the virion. (B) Six virions attached to liposome-like structures. (C) Enlargement of a portion of panel B displaying the terminal fibers. (D to H) Electron tomography images of an SRV virion. (D) Horizontal x-y slice (0.7 nm) showing the accumulated stain in the central part of the virion (white arrow). (E) Vertical y-z slice (0.7 nm) through the 3D data set of the reconstructed part of an SRV particle. (F) Visualization of the 3D data set using Amira software. (G and H) Vertical x-z slice (0.7 nm) through the tomogram showing that the virion particles are embedded in negative stain and that accumulated stain visible in panel D is absent from the plug (black arrows). Bars, 200 nm (A and B); 50 nm (C); 20 nm (D, E, G, and H).<br />

All predictions were curated manually. The orientation of each repeat cluster was inferred from the repeat sequence and by locating the low-complexity flanking sequence that generally resides immediately upstream from the cluster and contains the transcriptional leader (16). All unique spacer sequences of the repeat clusters, corresponding to the processed spacer transcript sequence (16), were aligned to the complete nucleotide sequences on each strand of all four rudiviral genomes (SRV [accession no. FM164764], SIRV1 [AJ414696], SIRV2 [AJ344259], and ARV1 [AJ875026]) by use of Paralign, an MMX-optimized implementation of the Smith-Waterman algorithm (31). Moreover, assuming that the spacer DNA can be incorporated into the oriented CRISPRs in either direction, we also translated the two strands of the spacer DNA into all the reading frames, yielding six amino acid sequences per spacer. Reading frames containing stop codons (ca. 50%) were omitted to make the subsequent search more specific. Each translation was aligned against the amino acid sequences of all the annotated ORFs in each of the four rudiviral genomes. Significant e-value cutoffs were determined for both the nucleotide and amino acid sequence searches using the genome sequence of Saccharomyces cerevisiae as a negative control (data not shown).<br />
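
The two kinds of search input described in this section can be sketched in a few lines of Python. This is only an illustration, not the Paralign implementation: the function names, the scoring values (match, mismatch, gap), and the use of linear gap penalties are assumptions made here, and the standard genetic code is assumed for the translations.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with linear gap penalties.

    The scoring values are hypothetical defaults, not those of Paralign.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


def best_strand_score(spacer, genome):
    """Score a spacer against both strands of a genome; keep the best."""
    return max(local_alignment_score(spacer, genome),
               local_alignment_score(spacer, reverse_complement(genome)))


def translate(seq):
    """Translate complete codons only; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frame_translations(spacer):
    """Map (strand, offset) to the peptide read from that frame."""
    spacer = spacer.upper()
    frames = {}
    for strand, seq in (("+", spacer), ("-", reverse_complement(spacer))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def open_frames(spacer):
    """Keep only the frames without a stop codon, as described in the text."""
    return {k: p for k, p in six_frame_translations(spacer).items() if "*" not in p}
```

For the made-up spacer `ATGGCCTAAGG`, `open_frames` keeps four of the six frames; the two frames that read through `TAA` or `TAG` are dropped. The retained peptides would then be searched against the annotated ORFs with a protein-level alignment.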

RESULTS<br />

SRV isolation and structure. The virus-producing strain was colony purified from an enrichment culture established from a sample collected from an acidic hot spring in the Azores (see Materials and Methods). Its 16S rRNA sequence represented the genus Stygiolobus of the Sulfolobales crenarchaeal order and was closely related to that of Stygiolobus azoricus. However, it differs from S. azoricus, the type species of the genus, in its capacity to grow aerobically, and a description of the new species is in preparation. The virus particles produced constituted flexible rods 702 (±50) by 22 (±3) nm in size, with three short fibers at each terminus (Fig. 1A to C; Table 1). A Fourier analysis of the virion (not shown) revealed the presence of regular features with a periodicity of (4.2 nm)⁻¹, which probably reflect a helical subunit arrangement. This feature is also seen in the tomographic data set (Fig. 1D to H), which revealed more structural details. The helical arrangement in the virion core occurs in two different configurations. In the central region, a zigzag structure with dark contrast, probably arising from uranyl acetate staining, is surrounded by a protein shell (Fig. 1D and E). In contrast, in the terminal plug, which is about 50 nm in length, a helically arranged protein mass, with no obvious uranyl acetate inclusions, is seen (Fig. 1D to F). The three terminal fibers, anchored in the plug-like structure, appear to be built up of multiple subunits ordered in a linear array (Fig. 1D). The side view of the reconstructed virion particle (Fig. 1E), as well as cross-sections of the negatively stained virions obtained from the tomograms (Fig. 1G and H), shows that the virion particles are embedded in negative stain (Fig. 1G and H) and partially collapsed due to staining and air drying; the height of the particles was about half of the apparent diameter. Nevertheless, the accumulated central stain is clearly visible in the cross-section (Fig. 1G) of the central part of the virion (Fig. 1D), while this feature is absent from the plug (Fig. 1G and H). The rod-shaped morphology of SRV, with a regular helical core and tail fibers, is characteristic of rudiviruses.<br />

TABLE 1. Properties of the rudiviruses<br />
Rudivirus | Origin | Virion length (nm) | Genome size (bp) | Total no. of ORFs | GC (%) | ITR length (bp) | Reference<br />
SRV | Azores | 702 | 28,097 | 37 | 29.3 | 1,030 |<br />
ARV1 | Pozzuoli | 610 | 24,655 | 41 | 39.1 | 1,365 | 34<br />
SIRV1 | Iceland | 830 | 32,308 | 45 | 25.3 | 2,032 | 20<br />
SIRV2 | Iceland | 900 | 35,498 | 54 | 25.2 | 1,626 | 20<br />

FIG. 2. Electron micrograph of a portion of an SRV virion after treatment with 0.1% SDS for 30 min (see Materials and Methods). White arrows indicate DNA or DNA-protein fibers lacking the protein core. Bar, 100 nm.<br />

To investigate further the fine structure of the virion, virion particles were incubated in buffer containing 0.1% SDS for 30 min at 22°C. Most of the virion remained undisturbed, with the particles showing the same diameter as native virions and the densely stained, helical core. However, in local regions the protein shell had dissociated (Fig. 2) and a fine fiber with a diameter of 3 to 4 nm that constituted either naked DNA or a DNA-protein complex was visible.<br />

Self-assembly of the major coat protein. The major rudiviral structural protein is highly conserved in sequence and is glycosylated (20, 22, 32a, 34). In order to study its possible self-assembly properties, the ARV1 protein (ORF134b [34]) was expressed heterologously in E. coli (see Materials and Methods) and a His-tagged protein was purified to homogeneity on an Ni²⁺-NTA-agarose column. The protein was shown by transmission electron microscopy to self-assemble to produce filamentous structures of uniform widths and different lengths (Fig. 3). The optimal conditions for the assembly, 75°C and pH 3, were close to those of the natural environment, and no additional energy source was required for this process. The transmission electron microscopy analysis revealed that the filaments had structural parameters similar to those of the native virions, with a diameter of 21 (±3) nm and a periodicity of (4.2 nm)⁻¹. Thus, the data suggest that the single major coat protein alone can generate the body of the virion.<br />

Minor rudiviral virion proteins. To date, the major coat protein is the only rudiviral structural protein to have been characterized. Given the closely similar structures of the different rudiviruses, we attempted to identify minor structural proteins for the SIRV2 virus, which can be produced in high yields. Protein components of SIRV2 virions, separated on a polyacrylamide gel, yielded six distinct major bands (Fig. 4), and all except D2, which is the strongest band and corresponds to ORF134 (gp26), were analyzed by mass spectrometry. Their identities were as follows: band A contained ORF1070 (gp38), band B contained ORF488 (gp33), and band C contained ORF564 (gp39), while bands D1 and D3 both contained ORF134 (gp26), probably in a glycosylated or, in the case of D3, a proteolytically degraded form. Thus, three additional SIRV2 structural proteins were identified, each highly conserved in sequence in all rudiviruses (Table 2).<br />

SRV genome content. A shotgun library of the viral genome was prepared, sequenced, and assembled (see Materials and Methods) to yield an approximately 10-fold coverage of a 26-kb contig. Since 1 to 2 kb of terminal sequence is always absent from shotgun libraries of linear viral genomes, these additional sequences were generated by primer walking using viral DNA, or using PCR products obtained therefrom, until subsequent rounds of walking yielded no further sequence. The total sequence obtained was 28,096 bp, with a GC content of 29% and an ITR of about 1,030 bp (Table 1). An EcoRI restriction digest yielded fragments consistent with the genome size (data not shown).<br />

FIG. 3. Electron micrograph images of the self-assembled major coat protein of ORF134 from ARV1 after negative staining with 3% uranyl acetate. Bar, 100 nm.<br />

FIG. 4. SIRV2 virion proteins separated by SDS-polyacrylamide gel electrophoresis and stained with Sypro Ruby. Molecular masses of protein standards are indicated in kilodaltons on the left.<br />

Thirty-seven ORFs were predicted for which start codons were assigned on the basis of the upstream locations of TATA-like and transcription factor B-responsive element (BRE) promoter motifs and/or Shine-Dalgarno motifs. Details of the putative genes and operon structures are presented in Table S1 in the supplemental material, and a comparative genome map of SRV and rudiviruses SIRV1 and ARV1 is presented in Fig. 5; the genome map of SIRV2, which is closely similar to that of SIRV1, is not included (12, 20). SRV differs from the other rudiviruses in that fewer ORFs are organized in operons, and it has a lower level of gene order conservation (Fig. 5). Moreover, whereas for the other rudiviruses TATA-like motifs are often directly preceded by a conserved GTC triplet (12, 20, 34), in SRV the ensuing triplet sequence was GTA for 10 of the 30 putative TATA-like motifs (see Table S1 in the supplemental material).<br />

Homologs of 17 SRV ORFs are present in all rudiviruses, and a further 10 SRV ORFs are conserved in some rudiviruses. Each virus type carries a few genes which are unique, and these are generally clustered near the ends of the linear genomes and yield no matches to genes in public sequence databases. In SRV, these are ORF145, -116a, -109, -59, -108, -97b, and -92 (left to right in Fig. 5). Although for SIRV1 and SIRV2 some of these nonconserved ORFs have been shown to be transcribed (12), further work is necessary to establish whether they are all protein-coding genes. Some of the proteins carry predicted structural motifs, and putative functions could be assigned to some of the conserved ORFs on the basis of public database searches; most of these are encoded in other crenarchaeal genomes (Table 2).<br />

The host-encoded transcriptional regulator Sta1, a winged helix-turn-helix protein, was shown to bind to some SIRV1 promoters, including those of ORF134 and ORF399, and to enhance their transcription (13). A similar regulation may occur also for SRV, since the promoter regions of the homologs of ORF134 and ORF399 contain putative Sta1 binding sites. In contrast, in ARV1 only the ORF134 homolog is present in an operon for which the first ORF is a putative transcriptional regulator, and its promoter does not carry Sta1 binding motifs.<br />

Genomic features. Sequence heterogeneities and other exceptional properties were detected in the SRV genome and in other rudiviral genomes; these are described below.<br />

(i) ITRs. For SRV, the 1,030-bp ITR is perfect, except for a 36-bp insert at positions 799 to 834 at the left end and inverted tetramer sequences (AAAA [positions 425 to 428] and TTTT [positions 27672 to 27669]). It shows little sequence similarity to ITRs of the other rudiviruses, except for the 21-bp sequence (AATTTAGGAATTTAGGAATTT) located at the terminus that is predicted to be a Holliday junction resolvase binding site occurring in all sequenced rudiviruses (34). The ITRs of SRV and SIRV1 and -2 carry four to five degenerate copies of this direct sequence repeat, while that of ARV1 carries multiple degenerate copies of other diverse repeats of similar sizes.<br />
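
The paired coordinates quoted for the tetramers follow from how positions in one terminal repeat map onto the other. A minimal sketch, assuming 1-based coordinates and the 28,096-bp total given in the text: position p in the left arm pairs with position L + 1 − p in the right arm. The function names and the toy genome in the usage note are hypothetical and serve only to illustrate the arithmetic.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mirror_position(pos, genome_length):
    """Map a 1-based position in one terminal repeat to its partner
    position in the repeat at the opposite end of a linear genome."""
    return genome_length + 1 - pos


def itr_length(genome):
    """Length of the perfect inverted terminal repeat: count how far the
    left end matches the reverse complement of the right end."""
    right_rc = reverse_complement(genome)
    n = 0
    while n < len(genome) // 2 and genome[n] == right_rc[n]:
        n += 1
    return n


# Coordinates quoted in the text, assuming a genome length of 28,096 bp.
assert mirror_position(425, 28096) == 27672
assert mirror_position(428, 28096) == 27669
```

On a toy genome built as `arm + spacer + reverse_complement(arm)`, `itr_length` returns the length of `arm` as long as the spacer does not extend the match. A real ITR with the mismatches described above would need a tolerance for imperfect positions, which this sketch does not attempt.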

(ii) Genome heterogeneity in SRV and 12-bp indels. Sequence heterogeneities were detected in the SRV genome, within the 10-fold sequence coverage, and mutations were localized to groups of subpopulations, including one 180-bp deletion between positions 11896 and 12077 in two out of six clones. Moreover, a 48-bp insertion was observed in one variant (out of 18 clones) precisely at the C terminus of ORF533 (position 20285) that generated a third copy of a 16-amino-acid direct repeat. Some changes corresponding to 12-bp indels were also apparent in overlapping clones, and they are indicated in Table 3 together with those observed earlier for SIRV1 (4, 20, 21). Moreover, sequence comparison of highly conserved ORFs present in the four rudiviral genomes revealed several additional 12-bp indels. The locations of all the identified indels which occur in conserved rudiviral genes or sites corresponding to SRV ORF75, -104, -138, -163, -168, -197, -199, -286, -294, -419, -440, -464, -533, and -1059 (Fig. 5) are indicated in the SIRV1 genome map in Fig. 6.<br />

Rudiviral matches to <strong>CRISPR</strong>s. The availability of four separate<br />

rudiviral genome sequences provided a basis for analyz<strong>in</strong>g<br />

<strong>the</strong> frequency and distribution of <strong>the</strong> matches of <strong>CRISPR</strong><br />

spacer sequences to <strong>the</strong> viral genomes. Therefore, we analyzed<br />

<strong>the</strong> repeat clusters of each of <strong>the</strong> available crenarchaeal genomes<br />

in the public EMBL/GenBank and JGI sequence databases<br />
and in our own unpublished genomes (see Materials and Methods).<br />
Fourteen complete genomes and 8 partial genomes were analyzed.<br />
In total, 82 repeat clusters from complete genomes and 44 clusters,<br />
some incomplete, from partial genomes yielded 4,283 spacer sequences.<br />
Subsequently, 278 sequences that are shared between S. solfataricus<br />
strains P1 and P2 (16) were omitted from the data set, yielding a<br />
total of 4,005 spacer sequences.<br />

Downloaded from<br />

jb.asm.org<br />

by on October 1, 2008

6842 VESTERGAARD ET AL. J. BACTERIOL.<br />

TABLE 2. Rudiviral proteins with predicted functions<br />
SRV ORF category | Rudiviral homolog(s) | Predicted function or description | Analysis tool | E-value or score | Other crenarchaeal virus(es)<br />
Structural proteins<br />
ORF134 | All | Structural protein | | |<br />
ORF464 | All | Structural protein | | |<br />
ORF581 | All | Structural protein | | |<br />
ORF1059 | All | Structural protein | | |<br />
Transcriptional regulators<br />
ORF58 | All | RHH-1 | SMART | 2.0e-08 | Many<br />
ORF95 | None | “Winged helix” repressor DNA binding domain | 3D-Jury | 64.00 | None<br />
Translational regulator<br />
ORF294 | SIRV1 and -2 | tRNA-guanine transglycosylase | 3D-Jury | 167.57 | STSV1<br />
DNA replication<br />
ORF440 | All | RuvB Holliday junction helicase (Lon ATPase) | 3D-Jury | 53.71 | AFV1, AFV2<br />
ORF116c | All | Holliday junction resolvase (archaeal) | SMART | 2.4e-45 |<br />
ORF199 | All | Nuclease | 3D-Jury | 63.86 | AFV1, SIFV<br />
DNA metabolism<br />
ORF168 | SIRV1 and -2 | dUTPase | SMART | 1.5e-12 | STSV1<br />
ORF257 | ARV1 | Thymidylate synthase (Thy1) | SMART | 7.9e-46 | STSV1<br />
ORF159 | All | S-adenosylmethionine-dependent methyltransferase | 3D-Jury | 73.67 | SIFV<br />
Glycosylation<br />
ORF335 | All | Glycosyl transferase group 1 | SMART | 6.7e-09 |<br />
ORF355 | All | Glycosyl transferase | SMART | 5.1e-04 |<br />
Other<br />
ORF419 | SIRV1 and -2 | 11 transmembrane regions | TMHMM | |<br />

FIG. 5. Genome maps of SRV, SIRV1, and ARV1 show<strong>in</strong>g <strong>the</strong> predicted ORFs and <strong>the</strong> ITRs (bold l<strong>in</strong>es). SRV ORFs are identified by <strong>the</strong>ir<br />

am<strong>in</strong>o acid lengths. Homologous genes shared between <strong>the</strong> rudiviruses are color-coded. Genes above <strong>the</strong> horizontal l<strong>in</strong>e are transcribed from left<br />

to right, and those below <strong>the</strong> l<strong>in</strong>e are transcribed <strong>in</strong> <strong>the</strong> opposite direction. Predicted functions or structural characteristics of <strong>the</strong> gene products<br />

are <strong>in</strong>dicated as follows: sp, structural prote<strong>in</strong>; rhh, ribbon-helix-helix prote<strong>in</strong>; wh, w<strong>in</strong>ged helix prote<strong>in</strong>; tm, transmembrane; tgt, tRNA guan<strong>in</strong>e<br />

transglycosylase; hjh, Holliday junction helicase; hjr, Holliday junction resolvase; n, nuclease; du, dUTPase; ts, thymidylate synthase; sm,<br />

S-adenosylmethion<strong>in</strong>e-dependent methyltransferase; gt, glycosyl transferase.<br />



VOL. 190, 2008 CRENARCHAEAL RUDIVIRUSES 6843<br />

TABLE 3. Occurrence of the 12-bp indels in overlapping rudiviral clone libraries<br />
ORF or ITR | No. of 12-bp clones | No. of 12-bp clones | Sequence | Genome position (reference)<br />
SRV ORF58 | 5 | 1 | AATTAAATTATG | 26079–26068<br />
SRV ORF95 | 8 | 8 | TTTTGAATTATG | 7112–7101<br />
SIRV1 ORF335 | 7 | 3 | AACATTCATTAA | Variant (21)<br />
SIRV1 ORF562 | 1 | 4 | ATACAAATTTCA | Variant (21)<br />
SIRV1-ITR | 10 | 29 | TTTAGCAGTTCA | (20)<br />

In the first analysis, each of the 4,005 spacer sequences was<br />
compared to the four rudiviral genomes at the nucleotide level.<br />
In total, 158 spacers yielded 268 rudiviral matches. The latter<br />
number exceeds the former because (i) some spacers match<br />
more than one locus within repeat sequences of a given virus<br />
and (ii) some spacers match more than one virus. Second,<br />
the analysis was performed at the protein level (see Materials<br />
and Methods). This analysis revealed 148 additional matching<br />
spacers and a further 427 rudiviral genome matches exclusively<br />
at the protein level. (An additional 105 matching spacer sequences<br />
from the latter analysis that overlapped, partially or<br />
completely, with 158 of those detected within rudiviral ORFs<br />
at the nucleotide level were not counted.) Only 6 of the 14<br />
completed crenarchaeal genomes carried spacers yielding<br />
matches to rudiviral genomes, and they are listed, together<br />
with the results for the partial genomes, in Table 4. These<br />
results reinforced the choice of criteria employed for determining<br />
the significance of sequence matches (see Materials<br />
and Methods).<br />

The locations of the spacer sequence matches are superimposed<br />
on the genome map of SIRV1 in Fig. 6. The matches are<br />
not evenly distributed along the genome; some genes have no<br />
matches, while others carry up to 18. Although there is no strict<br />
correlation between the level of gene sequence conservation<br />
and the number of matching spacers, the five most conserved<br />
genes, ORF440, ORF1059, ORF134, ORF355, and ORF581,<br />
exhibit the highest number of matches (18, 15, 14, 14, and 13,<br />
respectively) (Fig. 3 and 6).<br />

DISCUSSION<br />

The morphological and genomic data for SRV and the other<br />
characterized rudiviruses are summarized in Table 1. The conservation<br />
of their morphologies and genomic properties contrasts<br />
with that of other crenarchaeal viruses and, in particular,<br />
with that of the filamentous lipothrixviruses, which exhibit a<br />
variety of surface, envelope, and tail structures and much more<br />
heterogeneous genomes (24).<br />

The virion length of SRV, 702 (±50) nm, shows the same<br />
direct proportionality to genome size (28 kb) as those for the<br />
other rudivirus virions, which range in length from 610 (±50)<br />
nm (ARV1) to 900 (±50) nm (SIRV2) (Table 1). A superhelical<br />
core, with a pitch of 4.3 nm and a width of 20 nm,<br />
terminates in 45-nm-long nonhelical “plugs,” and it correlates<br />
with the internal structure observed earlier in electron micrographs<br />
of SIRV1 (22). In order to determine whether a single<br />
superhelical DNA can span the SRV virion length, we applied<br />
the following formulae to estimate the sizes and length of the<br />
superhelical DNA:<br />

$$L_{\text{turn}} = \sqrt{p^{2} + c^{2}}$$<br />

where $L_{\text{turn}}$ represents the arc length of a turn, $p$ represents the<br />
pitch, and $c$ represents the cylinder circumference, and<br />

$$L_{\text{total}} = t \, L_{\text{turn}}$$<br />

where $t$ represents the number of turns and $L_{\text{total}}$ represents<br />
the arc length of the entire helix.<br />
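The two formulae can be checked numerically. The sketch below is an illustration, not the authors' own calculation. It assumes an axial rise of 0.34 nm per base pair for B-form DNA, takes the cylinder circumference as π times the 20-nm width, and assumes one 45-nm plug at each end of the 702-nm virion. All function and variable names are hypothetical.

```python
import math

RISE_PER_BP_NM = 0.34  # assumed axial rise per base pair of B-form DNA


def turn_arc_length(pitch_nm, width_nm):
    """Arc length of one superhelical turn: sqrt(p^2 + c^2), with c = pi * width."""
    circumference = math.pi * width_nm
    return math.hypot(pitch_nm, circumference)


def genome_size_kbp(helix_length_nm, pitch_nm, width_nm):
    """DNA content (kbp) of a superhelix extending helix_length_nm along the virion axis."""
    turns = helix_length_nm / pitch_nm
    total_arc_nm = turns * turn_arc_length(pitch_nm, width_nm)
    return total_arc_nm / RISE_PER_BP_NM / 1000.0


def superhelix_width_nm(genome_kbp, helix_length_nm, pitch_nm):
    """Reciprocal calculation: superhelix diameter needed to hold a genome of the given size."""
    turns = helix_length_nm / pitch_nm
    arc_per_turn = genome_kbp * 1000.0 * RISE_PER_BP_NM / turns
    circumference = math.sqrt(arc_per_turn ** 2 - pitch_nm ** 2)
    return circumference / math.pi


VIRION_NM, PLUG_NM, PITCH_NM, WIDTH_NM = 702.0, 45.0, 4.3, 20.0
core_nm = VIRION_NM - 2 * PLUG_NM  # assumption: one plug at each end

size_core = genome_size_kbp(core_nm, PITCH_NM, WIDTH_NM)    # helix excludes plug regions
size_full = genome_size_kbp(VIRION_NM, PITCH_NM, WIDTH_NM)  # helix spans plug regions too
width_core = superhelix_width_nm(28.0, core_nm, PITCH_NM)
width_full = superhelix_width_nm(28.0, VIRION_NM, PITCH_NM)

print(f"genome size: {size_core:.1f} kbp (core only), {size_full:.1f} kbp (full length)")
print(f"width for 28 kbp: {width_core:.1f} nm (core only), {width_full:.1f} nm (full length)")
```

Under these assumptions a helix confined to the region between the plugs holds about 26 kbp and a helix spanning the whole virion about 30 kbp, while a 28-kbp genome corresponds to widths of about 21.2 nm and 18.5 nm, respectively.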

Calculations using structural parameters for B-form DNA<br />
yielded a genome size of 26 kbp without, and 30 kbp with,<br />
terminal “plugs.” The estimated width (20 nm) is an upper-limit<br />
estimate. A reciprocal calculation, with a 28-kbp genome,<br />
yields a diameter of 21.2 nm without, and 18.5 nm with, the<br />
“plugs.” Given that the major rudiviral coat protein is capable<br />
of self-assembly into filamentous structures similar in width to<br />
the native virion (Fig. 4), it is likely that the rod-shaped body<br />
consists of a single superhelical DNA embedded within this<br />
filamentous protein structure. Thus, the three newly identified<br />
minor structural proteins probably contribute to conserved<br />
terminal features of the virion; consistent with this, the largest<br />
structural protein (corresponding to SRV ORF1059) was localized<br />
within the virion tail fibers of SIRV2 by studying functional<br />
groups using bioconjugation (Steinmetz et al.,<br />
submitted).<br />

FIG. 6. CRISPR spacer sequence matches for SIRV1 are superimposed on the SIRV1 genome map. Protein-coding regions translated from<br />
left to right are shown above the line, and those translated from right to left are shown below the line. Highly conserved coding genes are presented<br />
in dark blue, while less-conserved or nonconserved genes are in light blue. The inverted terminal repeat is shaded in violet. Matches to spacers<br />
are shown as vertical lines and are color-coded as indicated. Matches to the upper DNA strand are placed above the genome, and those to the<br />
lower strand are located below the genome. The red vertical lines correspond to the nucleotide sequence matches, and the green vertical lines<br />
correspond to matching amino acid sequences, after translation of the spacer sequences from both DNA strands. In total, there were 106 matches<br />
to SIRV1 at the nucleotide level, some of them occurring more than once, and an additional 127 matches to SIRV1 ORFs at the amino acid level.<br />
The black arrowheads indicate the positions of the 12-bp indels that occur in one or more conserved rudiviral genes.<br />

TABLE 4. Number of CRISPR spacer sequences from complete and partial crenarchaeal genomes which match rudiviral genomes<sup>a</sup><br />
Strain | No. of matching sequences at the nucleotide level<sup>b</sup> | No. of matching sequences at the amino acid level<sup>b</sup> | Accession no., reference, or source<br />
Complete genomes<br />
S. solfataricus P2 | 22 (14) | 31 (18) | NC_002754<br />
S. tokodaii 7 | 9 | 14 | NC_003106<br />
M. sedula | 5 | 15 | NC_009440<br />
S. acidocaldarius | 5 | 9 | NC_007181<br />
S. marinus F1 | 2 | 1 | NC_009033<br />
H. butylicus | 0 | 1 | NC_008818<br />
Incomplete genomes<br />
S. solfataricus P1 | 20 (14) | 30 (18) | 16<br />
S. islandicus (5 strains) | 39/12/4/2/1 | 26/7/2/2/0 | See text<br />
S. islandicus HVE10/4 | 36 | 11 | Unpublished<br />
A. brierleyi | 15 | 14 | Unpublished<br />
<sup>a</sup> All the acidothermophilic organisms from the family Sulfolobaceae have<br />
spacers matching those of the rudiviral genomes. However, the neutrophilic<br />
hyperthermophiles S. marinus and H. butylicus produced very few matches.<br />
Matches at the amino acid sequence level that overlapped with those at the<br />
nucleotide sequence level were excluded from the data.<br />
<sup>b</sup> Numbers in parentheses in columns 2 and 3 indicate the number of matches<br />
that arose from spacers shared by S. solfataricus strains P1 and P2 (16).<br />

We still have limited <strong>in</strong>sight <strong>in</strong>to functional roles of rudivirus-encoded<br />

prote<strong>in</strong>s (Table 2). The glycosyl transferases have<br />

been implicated <strong>in</strong> <strong>the</strong> glycosylation of <strong>the</strong> structural prote<strong>in</strong>s<br />

(34). Moreover, a few prote<strong>in</strong>s have been l<strong>in</strong>ked to viral replication.<br />

Two of <strong>the</strong>se, ORF440 and ORF199, lie with<strong>in</strong> an<br />

operon and are conserved <strong>in</strong> phylogenetically diverse lipothrixviruses<br />

(33). The former yielded significant matches to RuvB,<br />

<strong>the</strong> helicase facilitat<strong>in</strong>g branch migration dur<strong>in</strong>g Holliday junction<br />

resolution, while ORF199 yielded <strong>the</strong> best matches to<br />

nucleases, <strong>in</strong>clud<strong>in</strong>g Holliday junction resolvases (Table 2).<br />

Thus, <strong>the</strong>y are likely to facilitate rudiviral replication, which, <strong>in</strong><br />

SIRV1, <strong>in</strong>volves site-specific nick<strong>in</strong>g with<strong>in</strong> <strong>the</strong> ITR, formation<br />

of head-to-head and tail-to-tail <strong>in</strong>termediates, and conversion<br />

of genomic concatemers to monomers by a Holliday junction<br />

resolvase (ORF116c) (20). In addition, SRV encodes both a<br />

dUTPase and a thymidylate synthase, which are involved<br />

in thymidylate synthesis, whereas the other rudiviruses<br />

encode only one of these enzymes. Both enzymes are considered<br />

helpful in maintaining a low dUTP/dTTP ratio and thus<br />

in minimizing the detrimental effects of misincorporating uracil<br />

into DNA. Two putative transcriptional regulators have been<br />

identified, toge<strong>the</strong>r with <strong>the</strong> putative tRNA transglycosylase<br />

encoded by SRV, which has homologs <strong>in</strong> SIRV1 and -2 and <strong>in</strong><br />

o<strong>the</strong>r crenarchaeal viruses (Table 2) and is distantly related to<br />

a tRNA-guan<strong>in</strong>e transglycosylase implicated <strong>in</strong> archeos<strong>in</strong>e formation.<br />

The two approaches employed to analyze <strong>CRISPR</strong> spacers<br />

match<strong>in</strong>g <strong>the</strong> four rudiviral genomes demonstrated that about<br />

10% of <strong>the</strong> 3,042 unique acido<strong>the</strong>rmophile spacers yielded<br />

positive matches. Employ<strong>in</strong>g alignments at <strong>the</strong> am<strong>in</strong>o acid<br />

level considerably <strong>in</strong>creased <strong>the</strong> number of positive matches<br />

detected, because nucleotide sequences diverge more rapidly.<br />

Thus, <strong>the</strong> genomes of SRV and SIRV1 share almost no (4%)<br />

similarity at <strong>the</strong> DNA level, whereas most homologous prote<strong>in</strong>s<br />

show, on average, 47% sequence identity or similarity.<br />
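The contrast between the two levels of comparison can be illustrated with a toy example. The sketch below is not the authors' pipeline, whose alignment tools and significance criteria are described in Materials and Methods. As a simplifying assumption it uses exact substring matching on both strands, and exact matching of six-frame translations, on short hypothetical sequences. All names (`TOY_GENOME`, `matches_nucleotide`, `matches_protein`) are invented for illustration, and input is assumed to be uppercase A/C/G/T.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate a DNA string in frame 0, dropping any trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Translations of all three reading frames on both strands."""
    frames = []
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


def matches_nucleotide(spacer, genome):
    """True if the spacer occurs exactly on either strand of the genome."""
    return spacer in genome or reverse_complement(spacer) in genome


def matches_protein(spacer, genome, min_len=4):
    """True if a stop-free translated spacer frame occurs in a translated genome frame.

    min_len is a toy threshold; real spacers would give longer peptides.
    """
    genome_frames = six_frame_translations(genome)
    for peptide in six_frame_translations(spacer):
        if len(peptide) < min_len or "*" in peptide:
            continue
        if any(peptide in frame for frame in genome_frames):
            return True
    return False


# Hypothetical sequences: the second spacer differs from the genome only at
# synonymous codon positions, so it matches at the protein level alone.
TOY_GENOME = "ATGGCTAAAGGTCTGGAACGTTAA"
EXACT_SPACER = "GCTAAAGGTCTG"
SYNONYMOUS_SPACER = "GCCAAGGGCCTC"

for name, spacer in (("exact", EXACT_SPACER), ("synonymous", SYNONYMOUS_SPACER)):
    print(name, matches_nucleotide(spacer, TOY_GENOME), matches_protein(spacer, TOY_GENOME))
```

The synonymous spacer is missed by the nucleotide comparison but recovered after translation, which mirrors why the amino acid level analysis detects matches to more diverged viral sequences.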

When study<strong>in</strong>g <strong>the</strong> distribution of <strong>the</strong> spacer matches <strong>in</strong> <strong>the</strong><br />

rudiviral genomes, some trends are evident. First, <strong>the</strong>re is no<br />

significant bias with regard to <strong>the</strong> DNA strand carry<strong>in</strong>g <strong>the</strong><br />

match<strong>in</strong>g sequence. In SIRV1, for example, 122 matches occur<br />

on one strand and 111 on <strong>the</strong> o<strong>the</strong>r (Fig. 6). This is consistent<br />

with our assumption that <strong>the</strong> <strong>in</strong>corporation of viral or plasmid<br />

DNA <strong>in</strong>to <strong>the</strong> orientated <strong>CRISPR</strong>s is nondirectional. Second,<br />

<strong>in</strong> accordance with earlier analyses (16), for matches to cod<strong>in</strong>g<br />

regions, <strong>the</strong>re is no significant bias to matches occurr<strong>in</strong>g <strong>in</strong> a<br />

sense or antisense direction. Thus, for SIRV1, 39% of <strong>the</strong><br />

matches are <strong>in</strong> <strong>the</strong> sense direction whereas 54% are antisense—<strong>the</strong><br />

rema<strong>in</strong><strong>in</strong>g 7% constitute nucleotide matches to<br />

non-prote<strong>in</strong>-cod<strong>in</strong>g regions (Fig. 6). Third, when <strong>the</strong> latter<br />

nucleotide sequence-based matches are considered, <strong>the</strong> proportion<br />

of matches which occur <strong>in</strong> <strong>in</strong>tergenic regions, as opposed<br />

to those occurr<strong>in</strong>g <strong>in</strong> prote<strong>in</strong>-cod<strong>in</strong>g regions, is not significantly<br />

different from <strong>the</strong> overall cod<strong>in</strong>g percentage of <strong>the</strong><br />

virus. For SIRV1, 19% of the nucleotide matches fall within<br />

intergenic regions, whereas 20% of the genome is non-protein-coding.<br />

F<strong>in</strong>ally, some genes have many matches whereas o<strong>the</strong>rs<br />

have none at all. Five genes have 13 or more matches <strong>in</strong><br />

SIRV1; <strong>the</strong>se genes correspond to SRV ORF440, ORF1059,<br />

ORF134, ORF355, and ORF581. Apart from be<strong>in</strong>g conserved<br />

<strong>in</strong> each rudivirus, <strong>the</strong>ir gene products have important structural<br />

or functional roles (Table 2).<br />
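The strand counts quoted above for SIRV1 (122 and 111 matches) can be put through a simple significance check. The sketch below is an assumption-laden illustration, not a test reported by the authors. It applies an exact two-sided binomial test against an equal probability for either strand, and the function name is hypothetical.

```python
from math import comb


def two_sided_binomial_p(successes, trials):
    """Exact two-sided p-value for H0: p = 0.5 (symmetric, so the upper tail is doubled)."""
    k = max(successes, trials - successes)
    upper_tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * upper_tail)


# Strand counts for SIRV1 taken from the text: 122 matches on one strand, 111 on the other.
p_value = two_sided_binomial_p(122, 122 + 111)
print(f"two-sided p-value for strand bias: {p_value:.2f}")
```

The resulting p-value is about 0.5, far above conventional significance thresholds, which is in line with the statement that the two strands are matched without significant bias.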

The results pose an important question as to how <strong>the</strong> host<br />

dist<strong>in</strong>guishes between more important and less important<br />

genes when add<strong>in</strong>g <strong>the</strong> spacers to its <strong>CRISPR</strong>s. Possibly, although<br />

<strong>the</strong> de novo addition of spacers may well be an unbiased<br />

process with respect to both viral genome position and<br />

direction, <strong>the</strong> selective advantage provided by some spacers<br />

would result <strong>in</strong> a population be<strong>in</strong>g enriched <strong>in</strong> hosts with<br />

<strong>CRISPR</strong>s carry<strong>in</strong>g spacers target<strong>in</strong>g crucial viral genes.<br />

The 12-bp viral <strong>in</strong>dels were orig<strong>in</strong>ally shown to occur commonly<br />

<strong>in</strong> SIRV1 variants that arose as a result of passage of an<br />

SIRV1 isolate through different closely related S. islandicus<br />

stra<strong>in</strong>s from Iceland, and it was <strong>in</strong>ferred that this unusual<br />

activity reflected adaptation of <strong>the</strong> rudivirus to <strong>the</strong> different<br />

hosts (21). The positions of <strong>the</strong> 12-bp <strong>in</strong>dels that have been<br />

identified <strong>in</strong> conserved rudiviral prote<strong>in</strong> genes are shown toge<strong>the</strong>r<br />

with <strong>the</strong> <strong>CRISPR</strong> spacer matches on <strong>the</strong> SIRV1 genome<br />

map <strong>in</strong> Fig. 6. Many of <strong>the</strong> sites are very close or overlap.<br />

This raises <strong>the</strong> possibility that leng<strong>the</strong>n<strong>in</strong>g or shorten<strong>in</strong>g of<br />

conserved prote<strong>in</strong> genes by 12 bp could be a mechanism to<br />

overcome <strong>the</strong> host <strong>CRISPR</strong> defense <strong>system</strong>.<br />
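A toy example shows why a 12-bp change is a plausible escape route. The sketch below is purely illustrative and assumes that a spacer needs an uninterrupted match to its target. The gene fragment, the spacer, and the insertion site are hypothetical. Only the 12-bp string is borrowed from Table 3 (SRV ORF58), as an example sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a DNA string in frame 0, dropping any trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


gene = "ATGGCTAAAGGTCTGGAACGTACCGTTTAA"  # hypothetical gene fragment
spacer = gene[6:24]                      # hypothetical spacer matching the fragment
indel = "AATTAAATTATG"                   # 12-bp sequence listed for SRV ORF58 in Table 3
variant = gene[:16] + indel + gene[16:]  # insertion site chosen arbitrarily

print("spacer matches original:", spacer in gene)
print("spacer matches variant: ", spacer in variant)
print("original protein:", translate(gene))
print("variant protein: ", translate(variant))
```

Because 12 is a multiple of 3, the variant gains four residues but keeps its reading frame and downstream sequence, while the uninterrupted spacer match is lost.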

We conclude that <strong>the</strong> rudiviruses are excellent models for<br />

study<strong>in</strong>g details of viral life cycles and virus-host <strong>in</strong>teractions <strong>in</strong><br />

crenarchaea. These viruses appear to be much more conserved<br />

<strong>in</strong> <strong>the</strong>ir morphologies and genomes than, for example, <strong>the</strong><br />

equally ubiquitous lipothrixviruses. Moreover, <strong>the</strong>y are relatively<br />

stably ma<strong>in</strong>ta<strong>in</strong>ed <strong>in</strong> <strong>the</strong>ir hosts and can be isolated <strong>in</strong><br />

reasonable yields for experimental studies.<br />


ACKNOWLEDGMENTS<br />

We are grateful to Georg Fuchs for provid<strong>in</strong>g <strong>the</strong> environmental<br />

sample from São Miguel Island, the Azores.<br />

The research <strong>in</strong> Copenhagen was supported by grants from <strong>the</strong><br />

Danish Natural Science Research Council, <strong>the</strong> Danish National Research<br />

Foundation, and Copenhagen University. The research <strong>in</strong> Paris<br />

was partly supported by grant NT05-2_41674 from Agence Nationale<br />

de Recherche (Programme Blanc).<br />

REFERENCES<br />

1. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller,<br />

and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation<br />

of prote<strong>in</strong> database search programs. Nucleic Acids Res. 25:3389–3402.<br />

2. Bettstetter, M., X. Peng, R. A. Garrett, and D. Prangishvili. 2003. AFV1, a<br />

novel virus <strong>in</strong>fect<strong>in</strong>g hyper<strong>the</strong>rmophilic archaea of <strong>the</strong> genus Acidianus.<br />

Virology 315:68–79.<br />

3. Bland, C., T. L. Ramsey, F. Sabree, M. Lowe, K. Brown, N. C. Kyrpides, and<br />

P. Hugenholtz. 2007. <strong>CRISPR</strong> recognition tool (CRT): a tool for automatic<br />

detection of clustered regularly <strong>in</strong>terspaced pal<strong>in</strong>dromic repeats. BMC<br />

Bio<strong>in</strong>formatics 8:209.<br />

4. Blum, H., W. Zillig, S. Mallok, H. Domdey, and D. Prangishvili. 2001. The<br />

genome of <strong>the</strong> archaeal virus SIRV1 has features <strong>in</strong> common with genomes<br />

of eukaryal viruses. Virology 281:6–9.<br />

5. Brügger, K., P. Redder, and M. Skovgaard. 2003. MUTAGEN: multi-user<br />

tool for annotat<strong>in</strong>g genomes. Bio<strong>in</strong>formatics 19:2480–2481.<br />

6. Eder, W., W. Ludwig, and R. Huber. 1999. Novel 16S rRNA gene sequences<br />

retrieved from highly sal<strong>in</strong>e br<strong>in</strong>e sediments of Kebrit Deep, Red Sea. Arch.<br />

Microbiol. 172:213–218.<br />

7. Edgar, R. C. 2004. MUSCLE: a multiple sequence alignment method with<br />

reduced time and space complexity. BMC Bio<strong>in</strong>formatics 5:113.<br />

8. G<strong>in</strong>alski, K., A. Elofsson, D. Fischer, and L. Rychlewski. 2003. 3D-Jury: a<br />

simple approach to improve prote<strong>in</strong> structure predictions. Bio<strong>in</strong>formatics<br />

19:1015–1018.<br />

9. Här<strong>in</strong>g, M., X. Peng, K. Brügger, R. Rachel, K. O. Stetter, R. A. Garrett, and<br />

D. Prangishvili. 2004. Morphology and genome organization of <strong>the</strong> virus<br />

PSV of <strong>the</strong> hyper<strong>the</strong>rmophilic archaeal genera Pyrobaculum and Thermoproteus:<br />

a novel virus family, <strong>the</strong> Globuloviridae. Virology 323:233–242.<br />

10. Här<strong>in</strong>g, M., G. Vestergaard, K. Brügger, R. Rachel, R. A. Garrett, and D.<br />

Prangishvili. 2005. Structure and genome organization of AFV2, a novel<br />

archaeal lipothrixvirus with unusual term<strong>in</strong>al and core structures. J. Bacteriol.<br />

187:3855–3858.<br />

11. Här<strong>in</strong>g, M., G. Vestergaard, R. Rachel, L. Chen, R. A. Garrett, and D.<br />

Prangishvili. 2005. Virology: <strong>in</strong>dependent virus development outside a host.<br />

Nature 436:1101–1102.<br />

12. Kessler, A., A. B. Br<strong>in</strong>kman, J. van der Oost, and D. Prangishvili. 2004.<br />

Transcription of <strong>the</strong> rod-shaped viruses SIRV1 and SIRV2 of <strong>the</strong> hyper<strong>the</strong>rmophilic<br />

archaeon Sulfolobus. J. Bacteriol. 186:7745–7753.<br />

13. Kessler, A., G. Sezonov, J. I. Guijarro, N. Desnoues, T. Rose, M. Delepierre,<br />

S. D. Bell, and D. Prangishvili. 2006. A novel archaeal regulatory prote<strong>in</strong>,<br />

Sta1, activates transcription from viral promoters. Nucleic Acids Res. 34:<br />

4837–4845.<br />

14. Laemmli, U. K. 1970. Cleavage of structural prote<strong>in</strong>s dur<strong>in</strong>g <strong>the</strong> assembly of<br />

<strong>the</strong> head of bacteriophage T4. Nature 227:680–685.<br />

15. Letunic, I., R. R. Copley, B. Pils, S. P<strong>in</strong>kert, J. Schultz, and P. Börk. 2006.<br />

SMART 5: doma<strong>in</strong>s <strong>in</strong> <strong>the</strong> context of genomes and networks. Nucleic Acids<br />

Res. 34:D257–D260.<br />

16. Lillestøl, R. K., P. Redder, R. A. Garrett, and K. Brügger. 2006. A putative<br />

viral defence mechanism <strong>in</strong> archaeal cells. <strong>Archaea</strong> 2:59–72.<br />

17. Makarova, K. S., N. V. Grish<strong>in</strong>, S. A. Shabal<strong>in</strong>a, Y. I. Wolf, and E. V. Koon<strong>in</strong>.<br />

2006. A putative RNA-<strong>in</strong>terference-based <strong>immune</strong> <strong>system</strong> <strong>in</strong> prokaryotes:<br />

computational analysis of <strong>the</strong> predicted enzymatic mach<strong>in</strong>ery, functional<br />

analogies with eukaryotic RNAi, and hypo<strong>the</strong>tical mechanisms of action.<br />

Biol. Direct 1:7.<br />

18. Mojica, F. J., C. Diez-Villasenor, J. Garcia-Mart<strong>in</strong>ez, and E. Soria. 2005.<br />

Interven<strong>in</strong>g sequences of regularly spaced prokaryotic repeats derive from<br />

foreign genetic elements. J. Mol. Evol. 60:174–182.<br />

19. Ortmann, A. C., B. Wiedenheft, T. Douglas, and M. Young. 2006. Hot<br />

crenarchaeal viruses reveal deep evolutionary connections. Nat. Rev. Microbiol.<br />

4:520–528.<br />

20. Peng, X., H. Blum, Q. She, S. Mallok, K. Brügger, R. A. Garrett, W. Zillig,<br />

and D. Prangishvili. 2001. Sequences and replication of genomes of <strong>the</strong><br />

archaeal rudiviruses SIRV1 and SIRV2: relationships to <strong>the</strong> archaeal lipothrixvirus<br />

SIFV and some eukaryal viruses. Virology 291:226–234.<br />

21. Peng, X., A. Kessler, H. Phan, R. A. Garrett, and D. Prangishvili. 2004.<br />

Multiple variants of <strong>the</strong> archaeal DNA rudivirus SIRV1 <strong>in</strong> a s<strong>in</strong>gle host and<br />

a novel mechanism of genomic variation. Mol. Microbiol. 54:366–375.<br />

22. Prangishvili, D., H. P. Arnold, D. Gotz, U. Ziese, I. Holz, J. K. Kristjansson,<br />

and W. Zillig. 1999. A novel virus family, the Rudiviridae: structure, virus-host<br />

<strong>in</strong>teractions and genome variability of <strong>the</strong> Sulfolobus viruses SIRV1 and<br />

SIRV2. Genetics 152:1387–1396.<br />

23. Prangishvili, D., P. Forterre, and R. A. Garrett. 2006. Viruses of <strong>the</strong> <strong>Archaea</strong>:<br />

a unify<strong>in</strong>g view. Nat. Rev. Microbiol. 4:837–848.<br />

24. Prangishvili, D., and R. A. Garrett. 2005. Viruses of hyper<strong>the</strong>rmophilic<br />

Crenarchaea. Trends Microbiol. 13:535–542.<br />

25. Prangishvili, D., R. A. Garrett, and E. V. Koon<strong>in</strong>. 2006. Evolutionary genomics<br />

of archaeal viruses: unique viral genomes <strong>in</strong> <strong>the</strong> third doma<strong>in</strong> of life.<br />

Virus Res. 117:52–67.<br />

26. Prangishvili, D., G. Vestergaard, M. Här<strong>in</strong>g, R. Aramayo, T. Basta, R.<br />

Rachel, and R. A. Garrett. 2006. Structural and genomic properties of <strong>the</strong><br />

hyper<strong>the</strong>rmophilic archaeal virus ATV with an extracellular stage of <strong>the</strong><br />

reproductive cycle. J. Mol. Biol. 359:1203–1216.<br />

27. Rachel, R., M. Bettstetter, B. P. Hedlund, M. Här<strong>in</strong>g, A. Kessler, K. O.<br />

Stetter, and D. Prangishvili. 2002. Remarkable morphological diversity of<br />

viruses and virus-like particles <strong>in</strong> hot terrestrial environments. Arch. Virol.<br />

147:2419–2429.<br />

28. Reil<strong>in</strong>, A. 1998. Preparation of catalase crystals. University of Ill<strong>in</strong>ois at Urbana-<br />

Champaign, Urbana, IL. http://www.itg.uiuc.edu/publications/techreports/98-009.<br />

29. Rice, G., K. Stedman, J. Snyder, B. Wiedenheft, D. Willits, S. Brumfield, T.<br />

McDermott, and M. J. Young. 2001. Viruses from extreme <strong>the</strong>rmal environments.<br />

Proc. Natl. Acad. Sci. USA 98:13341–13345.<br />

30. Ru<strong>the</strong>rford, K., J. Parkhill, J. Crook, T. Horsnell, P. Rice, M. A. Rajandream,<br />

and B. Barrell. 2000. ARTEMIS: sequence visualization and annotation.<br />

Bio<strong>in</strong>formatics 16:944–945.<br />

31. Sæbø, P. E., S. M. Andersen, J. Myrseth, J. K. Lærdahl, and T. Rognes. 2005.<br />

PARALIGN: rapid and sensitive sequence similarity searches powered by<br />

parallel comput<strong>in</strong>g technology. Nucleic Acids Res. 33:W535–W539.<br />

32. Snyder, J. C., B. Wiedenheft, M. Lav<strong>in</strong>, F. F. Roberto, J. Spuhler, A. C.<br />

Ortmann, T. Douglas, and M. Young. 2007. Virus movement ma<strong>in</strong>ta<strong>in</strong>s local<br />

virus population diversity. Proc. Natl. Acad. Sci. USA 104:19102–19107.<br />

32a. Ste<strong>in</strong>metz, N. F., A. Bize, K. C. F<strong>in</strong>dlay, G. P. Lomonossoff, M. Manchester,<br />

D. J. Evans, and D. Prangishvili. Site-specific and spatially controlled addressability<br />

of a new viral nanobuild<strong>in</strong>g block: Sulfolobus islandicus rod-shaped<br />

virus 2. Adv. Funct. Mat., <strong>in</strong> press.<br />

33. Vestergaard, G., R. Aramayo, T. Basta, M. Här<strong>in</strong>g, X. Peng, K. Brügger, L.<br />

Chen, R. Rachel, N. Boisset, R. A. Garrett, and D. Prangishvili. 2008.<br />

Structure of <strong>the</strong> Acidianus filamentous virus 3 and comparative genomics of<br />

related archaeal lipothrixviruses. J. Virol. 82:371–381.<br />

34. Vestergaard, G., M. Här<strong>in</strong>g, X. Peng, R. Rachel, R. A. Garrett, and D.<br />

Prangishvili. 2005. A novel rudivirus, ARV1, of <strong>the</strong> hyper<strong>the</strong>rmophilic archaeal<br />

genus Acidianus. Virology 336:83–92.<br />

35. Zillig, W., A. Kletz<strong>in</strong>, C. Schleper, I. Holz, D. Janekovic, H. Ha<strong>in</strong>, M.<br />

Lanzendörfer, and J. K. Kristjansson. 1994. Screen<strong>in</strong>g for Sulfolobales, <strong>the</strong>ir<br />

plasmids and <strong>the</strong>ir viruses <strong>in</strong> Icelandic solfataras. System. Appl. Microbiol.<br />

16:609–628.<br />



Biochemical Society Transactions www.biochemsoctrans.org<br />

Distribution of <strong>CRISPR</strong> spacer matches <strong>in</strong> viruses<br />

and plasmids of crenarchaeal acido<strong>the</strong>rmophiles<br />

and implications for <strong>the</strong>ir <strong>in</strong>hibitory mechanism<br />

Molecular Biology of <strong>Archaea</strong> 23<br />

Shiraz Ali Shah*, Niels R. Hansen† and Roger A. Garrett* 1<br />

*Centre for Comparative Genomics, Department of Biology, Biocenter, Copenhagen University, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark, and<br />

†Department of Ma<strong>the</strong>matical Sciences, Copenhagen University, Universitetsparken 5, DK-2100 Copenhagen Ø, Denmark<br />

Abstract<br />

Transcripts from spacer sequences with<strong>in</strong> chromosomal repeat clusters [<strong>CRISPR</strong>s (clusters of regularly<br />

<strong>in</strong>terspaced pal<strong>in</strong>dromic repeats)] from archaea have been implicated <strong>in</strong> <strong>in</strong>hibit<strong>in</strong>g or regulat<strong>in</strong>g <strong>the</strong><br />

propagation of archaeal viruses and plasmids. For <strong>the</strong> crenarchaeal <strong>the</strong>rmoacidophiles, <strong>the</strong> chromosomal<br />

spacers show a high level of matches (∼30%) with viral or plasmid genomes. Moreover, both <strong>the</strong> distribution of <strong>the</strong>se matches<br />

along <strong>the</strong> virus/plasmid genomes and <strong>the</strong>ir DNA strand specificity appear to be random. This is<br />

consistent with <strong>the</strong> hypo<strong>the</strong>sis that chromosomal spacers are taken up directly and randomly from virus and<br />

plasmid DNA and that <strong>the</strong> spacer transcripts target <strong>the</strong> genomic DNA of <strong>the</strong> extrachromosomal elements<br />

and not <strong>the</strong>ir transcripts.<br />

<strong>Archaea</strong>l <strong>CRISPR</strong> <strong>system</strong><br />

<strong>CRISPR</strong>s (clusters of regularly <strong>in</strong>terspaced pal<strong>in</strong>dromic repeats)<br />

consist of identical repeats separated by unique spacer<br />

sequences of constant length which occur <strong>in</strong> <strong>the</strong> sequenced<br />

chromosomes of almost all archaea and approx. 40% of<br />

bacteria (reviewed <strong>in</strong> [1]). The archaeal repeat clusters are generally<br />

large and can constitute >1% of <strong>the</strong> chromosome. The<br />

orig<strong>in</strong>al observation that some spacers show close sequence<br />

matches with archaeal viral genomes led to <strong>the</strong> hypo<strong>the</strong>sis<br />

that spacer regions have a regulatory effect on viral propagation<br />

[2] and plasmid propagation [1], and this proposal<br />

was subsequently re<strong>in</strong>forced by several studies on both<br />

archaea and bacteria (reviewed <strong>in</strong> [1,3,4]). Moreover, a<br />

mechanism for this putative <strong>in</strong>hibitory effect was suggested,<br />

at an early stage, by <strong>the</strong> f<strong>in</strong>d<strong>in</strong>g that RNA transcripts are<br />

produced, and processed, from at least one strand of <strong>the</strong><br />

archaeal repeat clusters [5,6], with <strong>the</strong> smallest product<br />

correspond<strong>in</strong>g roughly <strong>in</strong> size to a s<strong>in</strong>gle spacer transcript<br />

[1]. This opened up <strong>the</strong> possibility of an antisense RNA<br />

or RNAi (RNA <strong>in</strong>terference)-like mechanism act<strong>in</strong>g ei<strong>the</strong>r<br />

on <strong>the</strong> viral transcripts or directly on <strong>the</strong> viral DNA [1,3].<br />

New spacer-repeat units are added at <strong>the</strong> end of <strong>the</strong> repeat<br />

clusters adjo<strong>in</strong><strong>in</strong>g a low-complexity flank<strong>in</strong>g sequence [1,7],<br />

by a process that probably <strong>in</strong>volves Cas prote<strong>in</strong>s which are<br />

generally encoded adjacent to <strong>the</strong> clusters [3,5,8]. Experimental<br />

evidence for such a virus-<strong>in</strong>duced addition was<br />

recently provided for bacteria on <strong>in</strong>fect<strong>in</strong>g Streptococcus<br />

<strong>the</strong>rmophilus with bacteriophages 858 and 2972 [9].<br />

Key words: acido<strong>the</strong>rmophile, archaeal plasmid, archaeal virus, cluster of regularly <strong>in</strong>terspaced<br />

pal<strong>in</strong>dromic repeats (<strong>CRISPR</strong>), crenarchaeon.<br />

Abbreviations used: ATV, Acidianus two-tailed virus; <strong>CRISPR</strong>, cluster of regularly <strong>in</strong>terspaced<br />

pal<strong>in</strong>dromic repeats; ITR, <strong>in</strong>verted term<strong>in</strong>al repeat; ORF, open read<strong>in</strong>g frame; SIRV1, Sulfolobus<br />

islandicus rod-shaped virus 1; STIV, Sulfolobus turreted icosahedral virus.<br />

1 To whom correspondence should be addressed (email garrett@bio.ku.dk).<br />

Biochem. Soc. Trans. (2009) 37, 23–28; doi:10.1042/BST0370023<br />

Hypo<strong>the</strong>sis<br />

In <strong>the</strong> present article, we explore and <strong>in</strong>terpret trends<br />

which emerge when collectively analys<strong>in</strong>g chromosomal<br />

<strong>CRISPR</strong> spacer matches to viral and plasmid genomes.<br />

The crenarchaeal acido<strong>the</strong>rmophiles were selected for <strong>the</strong><br />

analysis because <strong>the</strong>y carry large and multiple repeat clusters<br />

[1] and because many of <strong>the</strong>ir viruses and plasmids have been<br />

sequenced [10]. The results should yield <strong>in</strong>sights <strong>in</strong>to both <strong>the</strong><br />

mechanism of uptake of new spacer regions <strong>in</strong> <strong>CRISPR</strong>s and<br />

<strong>the</strong> mechanism of <strong>in</strong>hibition or regulation of <strong>the</strong> viruses<br />

and plasmids. We assume that, if chromosomal spacer sequence<br />

matches occur randomly on <strong>the</strong> virus or plasmid<br />

genome, <strong>the</strong>n <strong>the</strong> chromosomal spacer regions are generated<br />

by DNA excision and <strong>in</strong>sertion and not by reverse transcription<br />

from virus/plasmid transcripts. In contrast, a<br />

non-random distribution of matches biased to <strong>the</strong> genes<br />

would favour <strong>the</strong> latter RNA-based mechanism. A random<br />

distribution of spacer matches on <strong>the</strong> virus/plasmid genomes<br />

would also favour a DNA-directed <strong>in</strong>hibitory mechanism<br />

for <strong>the</strong> spacer transcripts, whereas a gene-biased distribution<br />

would support <strong>the</strong> spacer transcripts <strong>in</strong>hibit<strong>in</strong>g virus/plasmid<br />

gene expression.<br />

Previous studies on <strong>the</strong> archaeal <strong>CRISPR</strong>s of related<br />

Sulfolobus solfataricus stra<strong>in</strong>s have suggested that <strong>in</strong>dividual<br />

spacers are quite stable and that any selective pressure acts<br />

on larger blocks of spacers [1], so we <strong>in</strong>fer that any selective<br />

pressures on <strong>CRISPR</strong> spacer contents will not <strong>in</strong>fluence our<br />

results and <strong>in</strong>terpretation significantly.<br />

C○The Authors Journal compilation C○2009 Biochemical Society


24 Biochemical Society Transactions (2009) Volume 37, part 1<br />

Selection of viruses, plasmids and <strong>CRISPR</strong>s<br />

Five crenarchaeal virus families, a class of conjugative<br />

plasmids and a family of cryptic plasmids were selected for <strong>the</strong><br />

study (Table 1). They <strong>in</strong>clude six β-lipothrixviruses, family<br />

Lipothrixviridae; four rudiviruses, family Rudiviridae; seven<br />

fuselloviruses, family Fuselloviridae; a s<strong>in</strong>gle bicaudavirus<br />

ATV (Acidianus two-tailed virus), family Bicaudaviridae;<br />

STIV (Sulfolobus turreted icosahedral virus), an unclassified<br />

icosahedral virus (reviewed <strong>in</strong> [10]), seven members of a conjugative<br />

plasmid family and four members of <strong>the</strong> pRN cryptic<br />

plasmid family (reviewed <strong>in</strong> [11]). Each extrachromosomal<br />

element can propagate <strong>in</strong> members of <strong>the</strong> related crenarchaeal<br />

<strong>the</strong>rmoacidophilic genera Sulfolobus or Acidianus. Spacer<br />

sequences were derived from 13 whole crenarchaeal chromosomal<br />

sequences, from both acido<strong>the</strong>rmophiles and neutro<strong>the</strong>rmophiles,<br />

and <strong>the</strong> partial genomes of Acidianus brierleyi,<br />

S. solfataricus P1 and Sulfolobus islandicus HVE10/4 from our<br />

laboratory and of S. islandicus stra<strong>in</strong>s LD85, YG5714,<br />

YN1551, M164 and U328 which were publicly available <strong>in</strong><br />

May 2008 (Table 1).<br />

Identify<strong>in</strong>g spacer matches<br />

<strong>CRISPR</strong> regions were localized us<strong>in</strong>g publicly available<br />

software [12,13] and exam<strong>in</strong>ed for <strong>the</strong> occurrence of spacer<br />

sequence matches to <strong>the</strong> selected viruses and plasmids. Two<br />

approaches were employed. In one, matches were identified<br />

at a nucleotide sequence level between <strong>the</strong> similarly oriented<br />

spacer sequences (correspond<strong>in</strong>g to <strong>the</strong> processed transcript<br />

sequence [1,5,6]) and ei<strong>the</strong>r strand of <strong>the</strong> virus/plasmid DNA.<br />

In a second approach, we exploited <strong>the</strong> observation that<br />

prote<strong>in</strong> sequences are more highly conserved than gene<br />

sequences and tried to detect significant matches additional<br />

to those identified at a nucleotide sequence level. Each spacer<br />

strand was translated <strong>in</strong>to three am<strong>in</strong>o acid sequences, and,<br />

after remov<strong>in</strong>g sequences conta<strong>in</strong><strong>in</strong>g stop codons (about<br />

50%), each translated sequence was aligned aga<strong>in</strong>st am<strong>in</strong>o<br />

acid sequences of all annotated ORFs (open read<strong>in</strong>g frames)<br />

of all <strong>the</strong> viruses and plasmids. Implicit <strong>in</strong> this approach is<br />

<strong>the</strong> assumption that <strong>the</strong> uptake of spacers <strong>in</strong> <strong>the</strong> oriented<br />

<strong>CRISPR</strong>s is non-directional, and this is borne out by <strong>the</strong><br />

results (see below). A nucleotide sequence approach was<br />

also applied to <strong>the</strong> whole acido<strong>the</strong>rmophile chromosomes<br />

by search<strong>in</strong>g for exact matches to <strong>CRISPR</strong> spacers (Table 1).<br />

Significant e-value cut-offs were determ<strong>in</strong>ed for both <strong>the</strong> nucleotide<br />

and am<strong>in</strong>o acid sequence searches us<strong>in</strong>g <strong>the</strong> genome<br />

sequence of Saccharomyces cerevisiae as a negative control<br />

(results not shown). All sequence alignments were performed<br />

us<strong>in</strong>g Paralign, an MMX-optimized implementation of <strong>the</strong><br />

Smith–Waterman algorithm [14].<br />
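The translation step of the second approach can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the standard codon assignments are assumed, and the later alignment of each peptide against annotated ORFs (done with Paralign in the study) is not shown.

```python
# Hypothetical helper names; a sketch of "translate each spacer strand in
# three frames and discard translations containing stop codons".
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard codon assignments, built from the conventional TCAG ordering.
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def stop_free_translations(spacer):
    """Return (strand, frame, peptide) for every frame without a stop codon."""
    results = []
    strands = (("+", spacer.upper()), ("-", reverse_complement(spacer)))
    for strand, seq in strands:
        for frame in range(3):
            peptide = translate(seq[frame:])
            if peptide and "*" not in peptide:
                results.append((strand, frame, peptide))
    return results
```

Each surviving peptide would then be aligned against the annotated ORF products of the viruses and plasmids.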

Analysis of <strong>the</strong> distribution of<br />

chromosomal spacer matches on<br />

virus/plasmid genomes<br />

In total, 82 repeat clusters, some <strong>in</strong>complete (Table 1), yielded<br />

4005 spacer sequences, after subtract<strong>in</strong>g 278 spacer sequences<br />

shared between S. solfataricus stra<strong>in</strong>s P1 and P2 [1]. Approx.<br />

30% of <strong>the</strong> spacers from <strong>the</strong> acido<strong>the</strong>rmophile genomes<br />

matched <strong>the</strong> virus and plasmid families (Table 1), whereas<br />

only approx. 5% of <strong>the</strong> neutro<strong>the</strong>rmophile spacers did.<br />

This difference probably reflects <strong>the</strong> fact that <strong>the</strong> viruses and plasmids<br />

fall with<strong>in</strong> <strong>the</strong> host specificity range of <strong>the</strong> acido<strong>the</strong>rmophiles only.<br />

The locations of all <strong>the</strong> spacer matches are<br />

superimposed on genome maps of representative genetic elements<br />

<strong>in</strong> Figure 1. Spacers giv<strong>in</strong>g nucleotide sequence matches<br />

to ei<strong>the</strong>r DNA strand (red l<strong>in</strong>es) occur ma<strong>in</strong>ly with<strong>in</strong> genes,<br />

but a few are located <strong>in</strong>tergenically or with<strong>in</strong> <strong>the</strong> non-prote<strong>in</strong>cod<strong>in</strong>g<br />

region of <strong>the</strong> ITR (<strong>in</strong>verted term<strong>in</strong>al repeat).<br />

Translated spacers yield<strong>in</strong>g am<strong>in</strong>o acid sequence matches<br />

<strong>in</strong> addition to <strong>the</strong> nucleotide sequence matches occur<br />

with<strong>in</strong> annotated ORFs on ei<strong>the</strong>r DNA strand (green l<strong>in</strong>es).<br />

In a series of three tests, we attempted to address <strong>the</strong><br />

question of whe<strong>the</strong>r or not <strong>the</strong> spacers present <strong>in</strong> host<br />

chromosomal <strong>CRISPR</strong>s match <strong>the</strong> virus/plasmid genomes<br />

<strong>in</strong> a biased non-random manner. Potential biases <strong>in</strong>clude <strong>the</strong><br />

preferential match<strong>in</strong>g to certa<strong>in</strong> regions of <strong>the</strong> virus/plasmid<br />

genome and DNA strand biases. We exclusively used <strong>the</strong> nucleotide<br />

sequence match<strong>in</strong>g data because it covered <strong>the</strong> whole<br />

genome.<br />

First, we exam<strong>in</strong>ed <strong>the</strong> distribution of spacer sequence<br />

matches, at a nucleotide level, along <strong>the</strong> virus/plasmid<br />

genomes. We assumed that a uniform distribution would<br />

follow, roughly, a homogeneous Poisson process, whereas<br />

an irregular distribution along <strong>the</strong> genome would yield a<br />

deviation from <strong>the</strong> homogeneous Poisson process. We tested<br />

for this us<strong>in</strong>g Kolmogorov–Smirnov test statistics for<br />

each virus and plasmid and we were generally unable to detect<br />

any significant deviations from a homogeneous Poisson<br />

distribution.<br />
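The first test can be illustrated with a short sketch. It relies on the standard fact that, for a homogeneous Poisson process with a fixed number of events, the event positions are independent and uniform along the interval, so a one-sample Kolmogorov–Smirnov test against the uniform distribution applies. The function name, the coordinate convention (match start positions on a linearised genome) and the asymptotic p-value approximation are assumptions made here, not details from the study.

```python
import math


def ks_uniform(positions, genome_length):
    """One-sample KS test of match positions against a uniform distribution.

    Returns (D, approximate p-value). The p-value uses the asymptotic
    Kolmogorov series with a common small-sample correction.
    """
    n = len(positions)
    if n == 0:
        raise ValueError("need at least one match position")
    u = sorted(p / genome_length for p in positions)
    # Largest gap between the empirical CDF and the uniform CDF.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(u))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        # The series is numerically unstable here; the p-value is ~1.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(1.0, max(0.0, p))
```

A large p-value means no detectable deviation from uniformity, which is the outcome reported for most viruses and plasmids. For circular genomes the result depends on where the sequence is linearised, which this sketch ignores.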

Secondly, we tested whe<strong>the</strong>r <strong>the</strong>re was any detectable<br />

bias <strong>in</strong> <strong>the</strong> spacer matches to <strong>the</strong> most conserved viral genes<br />

given that <strong>the</strong>y are more likely to be targets for <strong>in</strong>hibition<br />

of propagation. The number of matches to each gene was<br />

analysed us<strong>in</strong>g a Poisson regression model with <strong>the</strong> gene conservation<br />

and length as explanatory variables. This analysis<br />

showed that <strong>the</strong> number of matches to a given gene did not<br />

depend significantly upon <strong>the</strong> degree of its conservation,<br />

although, for SIRV1 (Sulfolobus islandicus rod-shaped<br />

virus 1), we did observe a weak effect for <strong>the</strong> seven to ten<br />

most conserved genes. Moreover, it was found that <strong>the</strong><br />

expected number of matches was proportional to <strong>the</strong> gene<br />

length, <strong>in</strong> agreement with <strong>the</strong> homogeneous Poisson process.<br />
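A minimal sketch of a Poisson regression of this kind is given below. It is one possible formulation, not necessarily the one used in the study: gene length enters as an exposure (the expected count is taken to be proportional to length), conservation is a single numeric score per gene, and the model is fitted by Newton's method. All names are hypothetical.

```python
import math


def fit_poisson(counts, lengths, conservation, iterations=50, tol=1e-12):
    """Fit log(mu_i) = log(length_i) + b0 + b1 * conservation_i.

    Returns (b0, b1). A value of b1 close to zero means the match count
    does not depend on conservation once gene length is accounted for.
    """
    if sum(counts) <= 0:
        raise ValueError("need at least one match")
    b0 = math.log(sum(counts) / sum(lengths))  # start from the pooled rate
    b1 = 0.0
    for _ in range(iterations):
        mu = [L * math.exp(b0 + b1 * c) for L, c in zip(lengths, conservation)]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum((y - m) * c for y, m, c in zip(counts, mu, conservation))
        # Fisher information (negative Hessian), a 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(m * c for m, c in zip(mu, conservation))
        h11 = sum(m * c * c for m, c in zip(mu, conservation))
        det = h00 * h11 - h01 * h01
        if det <= 0:
            raise ValueError("conservation scores must not all be equal")
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

In practice a statistics package would also report a standard error for `b1`, which is what a significance statement about conservation requires.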

Thirdly, we tested for any bias <strong>in</strong> <strong>the</strong> distribution of<br />

spacer matches <strong>in</strong> cod<strong>in</strong>g compared with non-cod<strong>in</strong>g regions<br />

or to <strong>the</strong> sense compared with antisense strands of <strong>the</strong> virus/<br />

plasmid genes us<strong>in</strong>g a specific alternative of a Poisson process<br />

with different <strong>in</strong>tensities for matches occurr<strong>in</strong>g with<strong>in</strong>,<br />

and outside, prote<strong>in</strong>-cod<strong>in</strong>g regions, treat<strong>in</strong>g each DNA<br />

strand separately. We were unable to detect any significant<br />

deviations from a homogeneous Poisson distribution for <strong>the</strong><br />

match <strong>in</strong>tensities of <strong>the</strong> cod<strong>in</strong>g compared with non-cod<strong>in</strong>g<br />

regions, with <strong>the</strong> exception of STIV, where <strong>the</strong>re is a bias to<br />

<strong>the</strong> antisense strand (Figure 1).
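The strand comparison can be illustrated with a simpler stand-in for the Poisson-process alternative described above. If matches arise on the two strands with equal intensity, then, given the total number of matches within genes, the number on one strand follows a Binomial(n, 1/2) distribution, and an exact two-sided binomial test applies. This is a substitute technique chosen for brevity; the function name is hypothetical.

```python
import math


def strand_bias_p_value(sense, antisense):
    """Exact two-sided binomial test of equal match counts on both strands."""
    n = sense + antisense
    if n == 0:
        return 1.0
    k = min(sense, antisense)
    # Probability of a split at least this uneven in one direction.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    # The null distribution is symmetric, so doubling gives the two-sided value.
    return min(1.0, 2.0 * tail)
```

A small p-value for a single genome, as reported for STIV, would indicate a strand bias; testing many genomes calls for some allowance for multiple comparisons.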



Table 1 Summary of <strong>the</strong> chromosomal spacer matches to <strong>the</strong> virus and plasmid genomes of <strong>the</strong> crenarchaeal acido<strong>the</strong>rmophiles<br />

The numbers of <strong>CRISPR</strong> spacers that significantly match virus/plasmid family genomes at a nucleotide level, toge<strong>the</strong>r with additional matches detected at an am<strong>in</strong>o acid level, are given. Spacer matches to <strong>the</strong><br />

host’s own genome constitute only exact nucleotide matches. The total number of chromosomal spacers match<strong>in</strong>g to virus/plasmid genomes differs from <strong>the</strong> number of spacers that match each plasmid and<br />

virus family because some spacers match more than one family, but have been counted only once. Rudiviruses comprise SIRV1, SIRV2, ARV and SRV1; β-lipothrixviruses constitute AFV3, AFV6, AFV7, AFV8, AFV9<br />

and SIFV, and fuselloviruses <strong>in</strong>clude SSV2, SSV4, SSV5, SSVrh, SSVk1 and SSV1. The pNOB8 family conta<strong>in</strong>s pNOB8, pARN3, pARN4, pHVE14, pING1, pKEF9, pSOG1 and pSOG2, and <strong>the</strong> pRN family consists of pHEN7,<br />

pDL10, pRN1 and pRN2. The 278 spacers which S. solfataricus P1 shares with stra<strong>in</strong> P2 were subtracted dur<strong>in</strong>g <strong>the</strong> analysis, but have been re<strong>in</strong>serted <strong>in</strong> this Table. For <strong>the</strong> partial genomes, <strong>the</strong> total numbers of<br />

spacers are approximate, s<strong>in</strong>ce repeat clusters may not be fully sequenced. Genome sequences for S. solfataricus P1, S. islandicus HVE10/4 and A. brierleyi are unpublished work from our laboratory. Genomes<br />

of S. islandicus stra<strong>in</strong>s LD85, YG5714, YN1551, M164 and U328 are publicly available from <strong>the</strong> JGI (Jo<strong>in</strong>t Genome Institute) database (http://www.jgi.doe.gov/). All neutro<strong>the</strong>rmophile genomes were complete<br />

and obta<strong>in</strong>ed through GenBank ® accession numbers NC_000854 (Aeropyrum pernix K1), NC_008818 (Hyper<strong>the</strong>rmus butylicus DSM5456), NC_009776 (Ignicoccus hospitalis KIN4/I), NC_003364 (Pyrobaculum<br />

aerophilum IM2), NC_009376 (Pyrobaculum arsenaticum DSM 13514), NC_009073 (Pyrobaculum calidifontis JCM11548), NC_008701 (Pyrobaculum islandicum DSM4184), NC_009033 (Staphylo<strong>the</strong>rmus mar<strong>in</strong>us<br />

F1) and NC_008698 (Thermofilum pendens Hrk5).<br />

Stra<strong>in</strong> | Spacers (total) | Rudiviruses | β-Lipothrixviruses | Fuselloviruses | STIV | ATV | pNOB8 family (conjugative) | pRN family (cryptic) | Spacers (total match<strong>in</strong>g) | Matches with own genome | GenBank ® /JGI accession number/reference<br />
Acido<strong>the</strong>rmophiles (total) | 3313 | 331 | 181 | 134 | 81 | 126 | 226 | 63 | 969 | 1 | –<br />
Sulfolobus solfataricus P2 | 415 | 53 | 24 | 15 | 9 | 20 | 26 | 12 | 135 | 0 | NC_002754<br />
Sulfolobus solfataricus P1 | 423 | 50 | 22 | 19 | 9 | 26 | 32 | 7 | 144 | 0 | [1]<br />
Sulfolobus islandicus HVE10/4 | 270 | 47 | 20 | 20 | 4 | 3 | 19 | 9 | 104 | 0 | Unpublished<br />
Sulfolobus tokodaii 7 | 461 | 23 | 19 | 19 | 13 | 2 | 43 | 6 | 108 | 1 | NC_003106<br />
Sulfolobus acidocaldarius DSM639 | 223 | 14 | 5 | 2 | 1 | 2 | 15 | 4 | 38 | 0 | NC_007181<br />
Metallosphaera sedula DSM5348 | 386 | 20 | 9 | 8 | 6 | 59 | 31 | 4 | 110 | 0 | NC_009440<br />
Acidianus brierleyi | 367 | 29 | 21 | 9 | 8 | 5 | 32 | 10 | 100 | 0 | Unpublished<br />
Sulfolobus islandicus LD85 | 287 | 65 | 39 | 10 | 6 | 1 | 6 | 6 | 114 | 0 | 4023472<br />
Four Sulfolobus islandicus stra<strong>in</strong>s (YG5714, YN1551, M164, U328) | 481 | 30 | 22 | 32 | 25 | 8 | 19 | 5 | 116 | 0 | 4023468, 4005359, 4023464, 4023466<br />
Neutro<strong>the</strong>rmophiles (total) | 963 | 6 | 13 | 14 | 1 | 4 | 16 | 0 | 52 | 0 | –<br />
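The approximate percentages quoted in the text (about 30% and about 5%) can be recomputed from the two totals rows of Table 1. The snippet below is only a worked check of that arithmetic; the dictionary layout and function name are illustrative.

```python
# (matching spacers, total spacers) taken from the totals rows of Table 1.
TOTALS = {
    "acidothermophiles": (969, 3313),
    "neutrothermophiles": (52, 963),
}


def match_percentage(matching, total):
    """Percentage of spacers with at least one virus/plasmid match."""
    return 100.0 * matching / total


PERCENTAGES = {
    group: round(match_percentage(m, t), 1) for group, (m, t) in TOTALS.items()
}
```

The rounded values are 29.2% for the acidothermophiles and 5.4% for the neutrothermophiles.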


Figure 1 <strong>CRISPR</strong> spacer matches superimposed on genomes of representative viruses and plasmids<br />

SIRV1, rudiviruses; AFV9 (Acidianus filamentous virus 9), β-lipothrixviruses; SSV2 (Sulfolobus sp<strong>in</strong>dle-shaped virus 2),<br />

fuselloviruses; STIV, unclassified icosahedral virus; ATV, bicaudavirus; pNOB8, conjugative plasmids; pHEN7, cryptic plasmids.<br />

A prelim<strong>in</strong>ary version of <strong>the</strong> rudiviral data was presented <strong>in</strong> [15]. The circular genomes (SSV2, STIV, ATV, pNOB8 and pHEN7)<br />

are presented <strong>in</strong> a l<strong>in</strong>ear format. Prote<strong>in</strong>-cod<strong>in</strong>g regions are boxed and shaded, accord<strong>in</strong>g to <strong>the</strong>ir levels of conservation<br />

for those genomes for which comparative data are available (all except for STIV and ATV). Spacer sequence matches are<br />

<strong>in</strong>dicated by l<strong>in</strong>es above and below <strong>the</strong> genomes for <strong>the</strong> two DNA strands and <strong>the</strong>y are colour-coded accord<strong>in</strong>g to whe<strong>the</strong>r<br />

<strong>the</strong>y occur exclusively at a nucleotide level (red) or additionally at an am<strong>in</strong>o acid level (green).<br />

Similar results for <strong>the</strong> first and third tests were obta<strong>in</strong>ed<br />

when <strong>the</strong> analysis was limited to spacer matches from family<br />

I <strong>CRISPR</strong>s (see below).<br />

Classify<strong>in</strong>g crenarchaeal acido<strong>the</strong>rmophile<br />

<strong>CRISPR</strong> families<br />

<strong>CRISPR</strong>s are oriented and <strong>the</strong>y generally carry a 300–600 bp<br />

low-complexity flank<strong>in</strong>g sequence immediately upstream of<br />

<strong>the</strong> repeat cluster which conta<strong>in</strong>s <strong>the</strong> transcriptional leader<br />

sequence [1]. Sequence analysis of <strong>the</strong> flank<strong>in</strong>g sequences<br />

by multiple alignment [16] and motif analysis [17], along<br />

with sequence comparison of <strong>the</strong> repeat sequence from each<br />

cluster, suggested that <strong>the</strong> <strong>CRISPR</strong>s can be classified <strong>in</strong>to<br />

families. All crenarchaeal flank<strong>in</strong>g sequences share a common<br />

A/T-rich motif adjacent to <strong>the</strong> first repeat of <strong>the</strong> cluster,<br />

whereas <strong>the</strong> rema<strong>in</strong>der of <strong>the</strong> flank<strong>in</strong>g sequence is familyspecific.<br />

At least three dist<strong>in</strong>ct families, each with multiple<br />

members, were found for <strong>the</strong> acido<strong>the</strong>rmophiles by analys<strong>in</strong>g<br />

<strong>the</strong> flank<strong>in</strong>g sequences alone (Figure 2A), and this f<strong>in</strong>d<strong>in</strong>g<br />

was re<strong>in</strong>forced by construct<strong>in</strong>g a multiple alignment of repeat<br />

sequences from <strong>the</strong> clusters (Figure 2B). Thus <strong>the</strong>re is a<br />

clear correlation between <strong>the</strong> nature of <strong>the</strong> flank<strong>in</strong>g sequence<br />

and <strong>the</strong> repeat sequence which constitutes a repeat cluster.<br />

These <strong>CRISPR</strong> families cross species and genus barriers, and<br />

most of <strong>the</strong> acido<strong>the</strong>rmophile genomes conta<strong>in</strong> clusters from


Figure 2 <strong>CRISPR</strong> families of crenarchaeal acido<strong>the</strong>rmophiles<br />

(A) Schematic representation of <strong>the</strong> three types of flank<strong>in</strong>g sequence<br />

associated with <strong>CRISPR</strong> families I, II and III. All three flank<strong>in</strong>g<br />

sequences share a motif adjacent to <strong>the</strong> repeat cluster, whereas<br />

<strong>the</strong> upstream region of <strong>the</strong> flank is specific for each family. (B)<br />

Phylogenetic tree created us<strong>in</strong>g ClustalW [18] based on a multiple<br />

alignment of a repeat from each acido<strong>the</strong>rmophile repeat cluster.<br />

The <strong>CRISPR</strong>s studied are labelled by a four-letter prefix based<br />

on <strong>the</strong> genus and species name <strong>in</strong> addition to <strong>the</strong> number of<br />

repeats carried by <strong>the</strong> repeat cluster. Abri, Acidianus brierleyi; Msed,<br />

Metallosphaera sedula; Saci, Sulfolobus acidocaldarius; Sisl, Sulfolobus<br />

islandicus; Ssol, Sulfolobus solfataricus; Stok, Sulfolobus tokodaii.<br />

S. islandicus HVE10/4 and A. brierleyi repeat clusters were not<br />

completely sequenced and <strong>the</strong> total number of repeats is not given.<br />

The three major repeat cluster families are <strong>in</strong>dicated by differently<br />

shaded boxes. (C) Logo-plot (http://weblogo.berkeley.edu/) of <strong>the</strong><br />

motif located upstream of <strong>the</strong> area on a virus or plasmid genome<br />

matched by a family I spacer. The CC motif was found at approx. 75% of<br />

all match<strong>in</strong>g sites.<br />

different families. Therefore no family is specific to a given<br />

species and no species is limited to a s<strong>in</strong>gle family. These<br />

results strongly re<strong>in</strong>force <strong>the</strong> hypo<strong>the</strong>sis that <strong>CRISPR</strong>–Cas<br />

<strong>system</strong>s are acquired via horizontal gene transfer [1,19].<br />

Over half of <strong>the</strong> acido<strong>the</strong>rmophile repeat clusters belong<br />

to family I, where, generally, <strong>the</strong> sequence just upstream of<br />

<strong>the</strong> virus or plasmid site which matches a family I spacer<br />

carries a CC motif (Figure 2C). Insufficient data precluded<br />

our establish<strong>in</strong>g whe<strong>the</strong>r such motifs occur adjacent to<br />

family II and family III spacer matches.<br />
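The motif check described above can be sketched as a count over matched sites. The coordinate convention is an assumption: each match is given as a 0-based start position on a genome string read in the same orientation as the matched site, so that "upstream" means the bases immediately before that position. Wrap-around on circular genomes is ignored and the function name is hypothetical.

```python
def upstream_motif_fraction(genome, match_starts, motif="CC"):
    """Fraction of matched sites immediately preceded by the given motif."""
    genome = genome.upper()
    k = len(motif)
    # Sites too close to the sequence start have no complete upstream window.
    usable = [s for s in match_starts if s >= k]
    if not usable:
        return 0.0
    hits = sum(1 for s in usable if genome[s - k:s] == motif)
    return hits / len(usable)
```

Applied to all family I spacer matches, a value near 0.75 would correspond to the frequency reported in Figure 2(C).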

Conclusions<br />

The results demonstrate that <strong>CRISPR</strong> spacer matches are<br />

uniformly distributed throughout <strong>the</strong> virus/plasmid genomes,<br />

regardless of both gene location and degree of gene conservation.<br />

Moreover, <strong>the</strong>re is no significant bias to ei<strong>the</strong>r sense<br />

or antisense strands of genes (with <strong>the</strong> exception of STIV):<br />

both strands are targeted to an equal degree. These f<strong>in</strong>d<strong>in</strong>gs<br />

strongly suggest that <strong>the</strong> spacer regions of <strong>the</strong> <strong>CRISPR</strong><br />

are taken up randomly, and non-directionally, from <strong>the</strong><br />

virus or plasmid DNA and are not generated by reverse<br />

transcriptase from virus/plasmid transcripts. The results are<br />

also consistent with <strong>the</strong> hypo<strong>the</strong>sis that <strong>the</strong> <strong>CRISPR</strong> spacer<br />

transcripts target <strong>the</strong> virus/plasmid by hybridiz<strong>in</strong>g directly<br />

to <strong>the</strong>ir DNA, possibly prim<strong>in</strong>g it for degradation.<br />

The results also support a mechanism whereby virus or<br />

plasmid propagation is <strong>in</strong>hibited primarily at a DNA level and<br />

not at a gene-expression level. For example, <strong>the</strong> non-prote<strong>in</strong>cod<strong>in</strong>g<br />

ITR region, which is implicated <strong>in</strong> rudiviral replication<br />

[10], carries seven spacer matches <strong>in</strong> SIRV1 (Figure 1)<br />

and o<strong>the</strong>r spacer matches occur <strong>in</strong> <strong>in</strong>tergenic regions which<br />

appear not to be <strong>in</strong>volved <strong>in</strong> transcriptional regulation<br />

(results not shown).<br />

The <strong>in</strong>hibitory mechanism also appears to be highly<br />

specific for virus/plasmid DNA, s<strong>in</strong>ce only one perfect<br />

spacer sequence match was detected with<strong>in</strong> any of <strong>the</strong> acido<strong>the</strong>rmophile<br />

chromosomal sequences exam<strong>in</strong>ed (Table 1).<br />

This may be crucial for cell survival if <strong>the</strong> <strong>in</strong>hibitory<br />

mechanism <strong>in</strong>volves DNA degradation, but, given that<br />

viruses and plasmids often <strong>in</strong>tegrate reversibly <strong>in</strong>to archaeal<br />

chromosomes [20], it suggests that <strong>the</strong> <strong>CRISPR</strong>–Cas <strong>system</strong><br />

selectively targets DNA of extrachromosomal elements,<br />

whe<strong>the</strong>r circular or l<strong>in</strong>ear.<br />
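The kind of search implied by this argument can be illustrated with a minimal Python sketch that scans both strands of a target sequence for a spacer, allowing a chosen number of mismatches. The function names and toy sequences are hypothetical, and the simple sliding-window comparison only stands in for the alignment software used in the study.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_spacer_matches(spacer, target, max_mismatches=0):
    """Return (position, strand, mismatches) for every hit of spacer in target.

    Positions are 0-based on the target as given; strand "-" means the
    reverse complement of the spacer matched at that position.
    """
    target = target.upper()
    hits = []
    for strand, query in (("+", spacer.upper()), ("-", reverse_complement(spacer))):
        for i in range(len(target) - len(query) + 1):
            window = target[i:i + len(query)]
            mismatches = sum(a != b for a, b in zip(query, window))
            if mismatches <= max_mismatches:
                hits.append((i, strand, mismatches))
    return hits


# Hypothetical example: one spacer searched against a toy "virus" sequence.
print(find_spacer_matches("ACGTGGCA", "GGACGTGGCATTTGCCACGTAA"))
```

Run against a host chromosome and against its extrachromosomal elements, a spacer of the kind discussed here would be expected to return hits only for the latter.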

The <strong>CRISPR</strong>–Cas <strong>system</strong> has been primarily implicated<br />

<strong>in</strong> viral <strong>in</strong>hibition <strong>in</strong> both archaea and bacteria [1,3,4], but it<br />

is clear from <strong>the</strong> present analysis that, at least for archaea, its<br />

role is more complex. The apparatus targets plasmids, both<br />

conjugative and cryptic, with a similar frequency to viruses<br />

(Figure 1). Moreover, some host <strong>CRISPR</strong> spacers match <strong>the</strong>ir<br />

© The Authors Journal compilation © 2009 Biochemical Society


28 Biochemical Society Transactions (2009) Volume 37, part 1<br />

own viruses or plasmids, suggest<strong>in</strong>g a regulatory, ra<strong>the</strong>r than<br />

an <strong>in</strong>hibitory, role, and this possibility is re<strong>in</strong>forced by <strong>the</strong> low<br />

copy numbers, and non-lytic properties, of most crenarchaeal<br />

viruses [10]. F<strong>in</strong>ally, <strong>the</strong> observation that a spacer sequence<br />

<strong>in</strong> <strong>the</strong> repeat cluster of <strong>the</strong> conjugative plasmid pKEF9<br />

[21] matches a rudiviral genome suggests that plasmids<br />

<strong>the</strong>mselves can also <strong>in</strong>hibit/regulate co-<strong>in</strong>fect<strong>in</strong>g viruses.<br />

Acknowledgements<br />

Dr Kim Brügger k<strong>in</strong>dly provided unpublished genome sequence data.<br />

Fund<strong>in</strong>g<br />

Work was supported by <strong>the</strong> Danish National Research Foundation for<br />

a Centre of Comparative Genomics and <strong>the</strong> Danish Natural Science<br />

Research Council [grant number 272-06-0442].<br />

References<br />

1 Lillestøl, R.K., Redder, P., Garrett, R.A. and Brügger, K. (2006) A putative<br />

viral defence mechanism <strong>in</strong> archaeal cells. <strong>Archaea</strong> 2, 59–72<br />

2 Mojica, F.J., Diez-Villasenor, C., Garcia-Mart<strong>in</strong>ez, J. and Soria, E. (2005)<br />

Interven<strong>in</strong>g sequences of regularly spaced prokaryotic repeats derive<br />

from foreign genetic elements. J. Mol. Evol. 60, 174–182<br />

3 Makarova, K.S., Grish<strong>in</strong>, N.V., Shabal<strong>in</strong>a, S.A., Wolf, Y.I. and Koon<strong>in</strong>, E.V.<br />

(2006) A putative RNA-<strong>in</strong>terference-based <strong>immune</strong> <strong>system</strong> <strong>in</strong><br />

prokaryotes: computational analysis of <strong>the</strong> predicted enzymatic<br />

mach<strong>in</strong>ery, functional analogies with eukaryotic RNAi, and hypo<strong>the</strong>tical<br />

mechanisms of action. Biol. Direct 1, 7<br />

4 Sorek, R., Kun<strong>in</strong>, V. and Hugenholtz, P. (2008) <strong>CRISPR</strong>: a widespread<br />

<strong>system</strong> that provides acquired resistance aga<strong>in</strong>st phages <strong>in</strong> bacteria and<br />

archaea. Nat. Rev. Microbiol. 6, 181–186<br />

5 Tang, T.-H., Bachellerie, J.-P., Rozhdestvensky, T., Bortol<strong>in</strong>, M.-L.,<br />

Huber, H., Drungowski, M., Elge, T., Brosius, J. and Hüttenhofer, A. (2002)<br />

Identification of 86 candidates for small non-messenger RNAs from <strong>the</strong><br />

archaeon Archaeoglobus fulgidus. Proc. Natl. Acad. Sci. U.S.A. 99,<br />

7536–7541<br />

6 Tang, T.-H., Polacek, N., Zywicki, M., Huber, H., Brügger, K., Garrett, R.A.,<br />

Bachellerie, J.-P. and Hüttenhofer, A. (2005) Identification of novel<br />

non-cod<strong>in</strong>g RNAs as potential antisense regulators <strong>in</strong> <strong>the</strong> archaeon<br />

Sulfolobus solfataricus. Mol. Microbiol. 55, 469–481<br />

7 Pourcel, C., Salvignol, G. and Vergnaud, G. (2005) <strong>CRISPR</strong> elements <strong>in</strong><br />

Yers<strong>in</strong>ia pestis acquire new repeats by preferential uptake of<br />

bacteriophage DNA, and provide additional tools for evolutionary<br />

studies. Microbiology 151, 653–663<br />


8 Jansen, R., Embden, J.D., Gaastra, W. and Schouls, L.M. (2002)<br />

Identification of genes that are associated with DNA repeats <strong>in</strong><br />

prokaryotes. Mol. Microbiol. 43, 1565–1575<br />

9 Barrangou, R., Fremaux, C., Deveau, H., Richards, M., Boyaval, P.,<br />

Mo<strong>in</strong>eau, S., Romero, D.A. and Horvath, P. (2007) <strong>CRISPR</strong> provides<br />

acquired resistance aga<strong>in</strong>st viruses <strong>in</strong> prokaryotes. Science 315,<br />

1709–1712<br />

10 Prangishvili, D., Forterre, P. and Garrett, R.A. (2006) Viruses of <strong>the</strong><br />

<strong>Archaea</strong>: a unify<strong>in</strong>g view. Nat. Rev. Microbiol. 4, 837–848<br />

11 Lipps, G. (2006) Plasmids and viruses of <strong>the</strong> <strong>the</strong>rmoacidophilic<br />

crenarchaeote Sulfolobus. Extremophiles 10, 17–28<br />

12 Edgar, R.C. (2007) PILER-CR: fast and accurate identification of <strong>CRISPR</strong><br />

repeats. BMC Bio<strong>in</strong>formatics 8, 18–24<br />

13 Bland, C., Ramsey, T.L., Sabree, F., Lowe, M., Brown, K., Kyrpides, N.C.<br />

and Hugenholtz, P. (2007) <strong>CRISPR</strong> Recognition Tool (CRT): a tool for<br />

automatic detection of clustered regularly <strong>in</strong>terspaced pal<strong>in</strong>dromic<br />

repeats. BMC Bio<strong>in</strong>formatics 8, 209–217<br />

14 Saebø, P.E., Andersen, S.M., Myrseth, J., Laerdahl, J.K. and Rognes, T.<br />

(2005) PARALIGN: rapid and sensitive sequence similarity searches<br />

powered by parallel comput<strong>in</strong>g technology. Nucleic Acids Res. 33,<br />

535–539<br />

15 Vestergaard, G., Shah, S.A., Bize, A., Reitberger, W., Reuter, M., Phan, H.,<br />

Briegel, A., Rachel, R., Garrett, R.A. and Prangishvili, D. (2008) SRV, a<br />

new rudiviral isolate from Stygiolobus and <strong>the</strong> <strong>in</strong>terplay of crenarchaeal<br />

rudiviruses with <strong>the</strong> host viral-defence <strong>CRISPR</strong> <strong>system</strong>. J. Bacteriol. 190,<br />

6837–6845<br />

16 Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high<br />

accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797<br />

17 Bailey, T.L., Williams, N., Misleh, C. and Li, W.W. (2006) MEME:<br />

discover<strong>in</strong>g and analyz<strong>in</strong>g DNA and prote<strong>in</strong> sequence motifs.<br />

Nucleic Acids Res. 34, 369–373<br />

18 Thompson, J.D., Higg<strong>in</strong>s, D.G. and Gibson, T.J. (1994) CLUSTAL W:<br />

improv<strong>in</strong>g <strong>the</strong> sensitivity of progressive multiple sequence alignment<br />

through sequence weight<strong>in</strong>g, position-specific gap penalties and weight<br />

matrix choice. Nucleic Acids Res. 22, 4673–4680<br />

19 Godde, J.S. and Bickerton, A. (2006) The repetitive DNA elements called<br />

<strong>CRISPR</strong>s and <strong>the</strong>ir associated genes: evidence of horizontal transfer<br />

among prokaryotes. J. Mol. Evol. 62, 718–729<br />

20 Wang, Y., Duan, Z., Zhu, H., Guo, X., Wang, Z., Zhou, J., She, Q. and<br />

Huang, L. (2007) A novel Sulfolobus non-conjugative extrachromosomal<br />

genetic element capable of <strong>in</strong>tegration <strong>in</strong>to <strong>the</strong> host genome and<br />

spread<strong>in</strong>g <strong>in</strong> <strong>the</strong> presence of a fusellovirus. Virology 363,<br />

124–133<br />

21 Greve, B., Jensen, S., Brügger, K., Zillig, W. and Garrett, R.A. (2004)<br />

Genomic comparison of archaeal conjugative plasmids from Sulfolobus.<br />

<strong>Archaea</strong> 1, 231–239<br />

Received 6 August 2008<br />

doi:10.1042/BST0370023


Molecular Microbiology (2009) 72(1), 259–272 doi:10.1111/j.1365-2958.2009.06641.x<br />

First published onl<strong>in</strong>e 2 March 2009<br />

<strong>CRISPR</strong> families of <strong>the</strong> crenarchaeal genus Sulfolobus:<br />

bidirectional transcription and dynamic properties<br />

Reidun K. Lillestøl, Shiraz A. Shah, Kim Brügger, †<br />

Peter Redder, Hien Phan, Jan Christiansen and<br />

Roger A. Garrett*<br />

Centre for Comparative Genomics, Department of<br />

Biology, University of Copenhagen, Ole Maaløes Vej 5,<br />

2200 Copenhagen N, Denmark.<br />

Summary<br />

Clusters of regularly <strong>in</strong>terspaced short pal<strong>in</strong>dromic<br />

repeats (<strong>CRISPR</strong>s) of Sulfolobus fall <strong>in</strong>to three ma<strong>in</strong><br />

families based on <strong>the</strong>ir repeats, leader regions, associated<br />

cas genes and putative recognition sequences<br />

on viruses and plasmids. Spacer sequence matches<br />

to different viruses and plasmids of <strong>the</strong> Sulfolobales<br />

revealed some bias particularly for family III <strong>CRISPR</strong>s.<br />

Transcription occurs on both strands of <strong>the</strong> five<br />

repeat-clusters of Sulfolobus acidocaldarius and a<br />

repeat-cluster of <strong>the</strong> conjugative plasmid pKEF9.<br />

Leader strand transcripts cover whole repeat-clusters<br />

and are processed ma<strong>in</strong>ly from <strong>the</strong> 3′-end, with<strong>in</strong><br />

repeats, yield<strong>in</strong>g heterogeneous 40–45 nt spacer<br />

RNAs. Process<strong>in</strong>g of <strong>the</strong> pKEF9 leader transcript<br />

occurred partially <strong>in</strong> spacers, and was <strong>in</strong>complete,<br />

probably reflect<strong>in</strong>g defective repeat recognition by<br />

host enzymes. A similar level of transcripts was generated<br />

from complementary strands of each chromosomal<br />

repeat-cluster and <strong>the</strong>y were processed to<br />

yield discrete ~55 nt spacer RNAs. Analysis of <strong>the</strong><br />

partially identical repeat-clusters of Sulfolobus solfataricus<br />

stra<strong>in</strong>s P1 and P2 revealed that spacer-repeat<br />

units are added upstream only when a leader and<br />

certa<strong>in</strong> cas genes are l<strong>in</strong>ked. Downstream ends of <strong>the</strong><br />

repeat-clusters are conserved such that deletions and<br />

recomb<strong>in</strong>ation events occur <strong>in</strong>ternally.<br />

Introduction<br />

Clusters of regularly <strong>in</strong>terspaced short pal<strong>in</strong>dromic<br />

repeats (<strong>CRISPR</strong>s) consist of identical repeats separated<br />

by unique spacer sequences of constant length. They are<br />

Accepted 14 February, 2009. *For correspondence. E-mail garrett@<br />

bio.ku.dk; Tel. (+45) 35322010; Fax (+45) 35322228. † Present<br />

address: Wellcome Trust Sanger Institute, H<strong>in</strong>xton, UK.<br />

© 2009 The Authors<br />

Journal compilation © 2009 Blackwell Publish<strong>in</strong>g Ltd<br />

present <strong>in</strong> <strong>the</strong> sequenced chromosomes of almost all<br />

archaea and about 40% of bacteria, as well as <strong>in</strong> some<br />

plasmids (Lillestøl et al., 2006; Grissa et al., 2007; Sorek<br />

et al., 2008). The orig<strong>in</strong>al observation that some spacers<br />

show close sequence matches to viral genomes and plasmids<br />

(Mojica et al., 2005) led to <strong>the</strong> hypo<strong>the</strong>sis that<br />

spacer regions are <strong>in</strong>corporated <strong>in</strong>to <strong>the</strong> chromosome<br />

from <strong>the</strong> extra-chromosomal element and have a regulatory<br />

or <strong>in</strong>hibitory effect on <strong>the</strong>ir propagation (Bolot<strong>in</strong> et al.,<br />

2005; Mojica et al., 2005; Pourcel et al., 2005; Lillestøl<br />

et al., 2006). Recently, this hypo<strong>the</strong>sis was re<strong>in</strong>forced<br />

experimentally for bacteria by show<strong>in</strong>g that new spacers<br />

deriv<strong>in</strong>g from phage genomes <strong>in</strong>tegrate <strong>in</strong>to <strong>CRISPR</strong>s of<br />

Streptococcus <strong>the</strong>rmophilus <strong>in</strong> response to phage <strong>in</strong>fection,<br />

which <strong>in</strong> turn leads to phage resistance (Barrangou<br />

et al., 2007; Deveau et al., 2008; Horvath et al., 2008a). In<br />

both archaea and bacteria, new spacer-repeat units are<br />

added at <strong>the</strong> end of <strong>the</strong> repeat-clusters adjo<strong>in</strong><strong>in</strong>g a low<br />

complexity leader sequence (Jansen et al., 2002; Tang<br />

et al., 2002; Pourcel et al., 2005; Lillestøl et al., 2006;<br />

Barrangou et al., 2007), presumably facilitated by Cas<br />

prote<strong>in</strong>s which are generally encoded adjacent to <strong>the</strong><br />

clusters (Jansen et al., 2002; Haft et al., 2005; Makarova<br />

et al., 2006).<br />
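The layout described above, identical repeats alternating with unique spacers and new units added next to the leader, can be sketched as follows. This is a simplified Python model that assumes exact (non-degenerate) repeat copies and a cluster written leader-first. The function names and the toy leader and spacer sequences are hypothetical; only the repeat string is taken from the document (the Saci-133/78 repeat as printed for Fig. 2B).

```python
def extract_spacers(cluster, repeat):
    """Return the spacers of a repeat-cluster, leader-proximal spacer first."""
    parts = cluster.split(repeat)
    # parts[0] is the leader side and parts[-1] the distal flank;
    # everything in between lies between two repeats, i.e. is a spacer.
    return parts[1:-1]


def add_spacer(cluster, repeat, new_spacer):
    """Insert a new repeat-spacer unit directly downstream of the leader."""
    first_repeat = cluster.index(repeat)
    return cluster[:first_repeat] + repeat + new_spacer + cluster[first_repeat:]


REPEAT = "GTAATAACGACAAGAAACTAAAAC"   # Saci-133/78 repeat as printed for Fig. 2B
LEADER = "ATATATTTAA"                 # hypothetical leader fragment
cluster = LEADER + REPEAT + "CCGGTTAACCGG" + REPEAT + "TTGGCCAATTGG" + REPEAT + "GGG"

print(extract_spacers(cluster, REPEAT))
print(extract_spacers(add_spacer(cluster, REPEAT, "ACACACACACAC"), REPEAT))
```

Real clusters can carry degenerate repeats, so an exact string split of this kind would need to be replaced by approximate matching before being applied to genome data.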

Despite <strong>the</strong> akaryotic nature of <strong>the</strong> <strong>CRISPR</strong> <strong>system</strong><br />

(Forterre, 1992), <strong>the</strong>re are significant differences between<br />

<strong>the</strong> archaeal and bacterial <strong>system</strong>s studied so far. First,<br />

archaeal repeat-clusters tend to be very extensive and<br />

can constitute more than 1% of <strong>the</strong> chromosome (Lillestøl<br />

et al., 2006). Second, <strong>the</strong>y often exhibit a low level of, or<br />

no, dyad symmetry <strong>in</strong> <strong>the</strong>ir repeat sequences (Lillestøl<br />

et al., 2006; Kun<strong>in</strong> et al., 2007). Third, some of <strong>the</strong> cas<br />

genes implicated <strong>in</strong> RNA process<strong>in</strong>g and spacer<br />

sequence <strong>in</strong>sertion are highly divergent between archaea<br />

and bacteria (Haft et al., 2005). Fourth, <strong>the</strong> many archaeal<br />

spacer sequences which match plasmids or viruses show<br />

no clear bias to viruses (Shah et al., 2009).<br />

A mechanism for <strong>the</strong> putative regulatory or <strong>in</strong>hibitory<br />

effect <strong>in</strong> both euryarchaea and crenarchaea was suggested,<br />

at an early stage, by <strong>the</strong> f<strong>in</strong>d<strong>in</strong>g that RNA transcripts<br />

are produced, and processed, from one strand of<br />

archaeal repeat-clusters (Tang et al., 2002; 2005), with<br />

<strong>the</strong> smallest product correspond<strong>in</strong>g approximately <strong>in</strong><br />

both size and sequence to a s<strong>in</strong>gle spacer transcript<br />

(Lillestøl et al., 2006). Fur<strong>the</strong>rmore, it was demonstrated


260 R. K. Lillestøl et al. <br />

experimentally for a bacterium that a complex of Cas<br />

prote<strong>in</strong>s was responsible for process<strong>in</strong>g <strong>in</strong> <strong>the</strong> repeats to<br />

generate <strong>the</strong> small RNAs encompass<strong>in</strong>g <strong>the</strong> spacer<br />

regions (Brouns et al., 2008) and for <strong>the</strong> euryarchaeon<br />

Pyrococcus furiosus, it was shown that <strong>the</strong> Cas6 prote<strong>in</strong><br />

b<strong>in</strong>ds to <strong>the</strong> 5′-end of <strong>the</strong> repeat transcript and cuts, by<br />

a putative ruler mechanism, with<strong>in</strong> <strong>the</strong> 3′-end (Carte<br />

et al., 2008). In <strong>the</strong> crenarchaeon Sulfolobus acidocaldarius,<br />

evidence was also presented for transcription<br />

occurr<strong>in</strong>g from <strong>the</strong> complementary strand of <strong>the</strong> DNA<br />

spacer (Lillestøl et al., 2006). These results opened up<br />

<strong>the</strong> possibility of an antisense RNA or RNAi-like mechanism<br />

act<strong>in</strong>g ei<strong>the</strong>r on <strong>the</strong> viral/plasmid transcripts or<br />

directly on <strong>the</strong>ir DNA (Lillestøl et al., 2006; Makarova<br />

et al., 2006). Recent studies on P. furiosus have<br />

shown that <strong>the</strong> leader strand spacer RNAs can generate<br />

dist<strong>in</strong>ct RNA–prote<strong>in</strong> complexes (Hale et al., 2008).<br />

Moreover, bio<strong>in</strong>formatical studies on crenarchaeal<br />

<strong>CRISPR</strong>s (Shah et al., 2009), as well as experimental<br />

studies on bacteria (Brouns et al., 2008; Marraff<strong>in</strong>i and<br />

Son<strong>the</strong>imer, 2008), support <strong>the</strong> view that spacer RNAs directly<br />

target DNA of extra-chromosomal elements, ra<strong>the</strong>r than<br />

<strong>the</strong>ir mRNAs.<br />

Here, we characterize <strong>CRISPR</strong> families of <strong>the</strong> model<br />

crenarchaeal genus Sulfolobus, and related members of<br />

<strong>the</strong> Sulfolobales, for which several genomes and numerous<br />

viruses and plasmids have been sequenced (Prangishvili<br />

et al., 2006; Brügger, 2007). The families are<br />

classified on <strong>the</strong> basis of <strong>the</strong>ir repeat sequences, leader<br />

region motifs, associated cas genes, and conserved<br />

d<strong>in</strong>ucleotide motifs adjo<strong>in</strong><strong>in</strong>g spacer match<strong>in</strong>g sequences<br />

on viruses and plasmids. Properties of transcripts from<br />

each strand of repeat-clusters <strong>in</strong> S. acidocaldarius chromosomes<br />

and <strong>the</strong> conjugative plasmid pKEF9 are exam<strong>in</strong>ed,<br />

as well as <strong>the</strong> possible formation of double-stranded<br />

spacer RNAs. Moreover, sequenc<strong>in</strong>g and bio<strong>in</strong>formatical<br />

analyses of <strong>the</strong> six repeat-clusters of Sulfolobus solfataricus<br />

stra<strong>in</strong>s P1 and P2 were performed and conclusions<br />

are drawn concern<strong>in</strong>g <strong>the</strong> dynamics of repeat-cluster<br />

development and functions of <strong>the</strong> different <strong>CRISPR</strong><br />

families.<br />

Results<br />

<strong>CRISPR</strong> families <strong>in</strong> Sulfolobales<br />

The repeat-clusters of <strong>the</strong> Sulfolobales are quite diverse<br />

structurally and we attempted to classify <strong>the</strong>m <strong>in</strong>to families<br />

on <strong>the</strong> basis of <strong>the</strong>ir repeat sequences, leader properties,<br />

associated cas genes and conserved sequences<br />

adjo<strong>in</strong><strong>in</strong>g spacer sequence matches on viruses/plasmids.<br />

A total of 48 complete and eight <strong>in</strong>complete repeat-clusters<br />

were identified for <strong>the</strong> Sulfolobales, of which at<br />

least 51 carried putative leader sequences, and <strong>the</strong>y<br />

yielded 3685 spacer sequences. Phylogenetic tree build<strong>in</strong>g<br />

based on repeat sequences revealed three ma<strong>in</strong> families<br />

and some m<strong>in</strong>or ones where family I dom<strong>in</strong>ates<br />

(Fig. 1A). Each species typically carries representatives<br />

of two repeat families (Fig. 1A). Analyses of all cas genes<br />

associated with <strong>the</strong> repeat-clusters of <strong>the</strong> Sulfolobales<br />

re<strong>in</strong>forced <strong>the</strong> family divisions. A phylogenetic tree built<br />

from alignments of <strong>the</strong> most conserved cas1 gene, encod<strong>in</strong>g<br />

a predicted <strong>in</strong>tegrase or nuclease (Makarova et al.,<br />

2006), yielded essentially <strong>the</strong> same family tree as <strong>in</strong><br />

Fig. 1A (data not shown). Moreover, <strong>in</strong> an all-aga<strong>in</strong>st-all<br />

comparison of cas genes adjo<strong>in</strong><strong>in</strong>g repeat-clusters, each<br />

gene generally yielded best matches to o<strong>the</strong>r genes of <strong>the</strong><br />

same family, despite family I genes be<strong>in</strong>g over-represented<br />

(data not shown).<br />
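The all-against-all test described above can be expressed compactly. The following Python sketch assumes that pairwise similarity scores are already available as a nested dictionary and reports the fraction of genes whose best-scoring partner belongs to the same CRISPR family. The gene names, scores and function name are invented for illustration and are not data from the study.

```python
def best_hit_same_family_fraction(scores, family):
    """Fraction of genes whose best-scoring other gene is in the same family.

    scores[a][b] is a similarity score between genes a and b;
    family[a] is the family label of gene a.
    """
    same = 0
    for gene, row in scores.items():
        others = {g: s for g, s in row.items() if g != gene}
        best = max(others, key=others.get)
        if family[best] == family[gene]:
            same += 1
    return same / len(scores)


# Hypothetical toy data: two family I genes, two family II genes, one family III gene.
family = {"a1": "I", "a2": "I", "b1": "II", "b2": "II", "c1": "III"}
scores = {
    "a1": {"a2": 90, "b1": 40, "b2": 35, "c1": 50},
    "a2": {"a1": 90, "b1": 42, "b2": 30, "c1": 45},
    "b1": {"a1": 40, "a2": 42, "b2": 80, "c1": 30},
    "b2": {"a1": 35, "a2": 30, "b1": 80, "c1": 25},
    "c1": {"a1": 50, "a2": 45, "b1": 30, "b2": 25},
}
print(best_hit_same_family_fraction(scores, family))
```

In the toy data the lone family III gene has no same-family partner and therefore pairs with a family I gene, which shows how an over-represented family can attract best hits from sparsely sampled ones.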

For <strong>the</strong> leader regions, alignment of 300 bp of each<br />

sequence revealed a large fairly conserved downstream<br />

region which carries multiple dist<strong>in</strong>ct sequence motifs,<br />

most of which are specific for a given <strong>CRISPR</strong> family<br />

(Fig. 1B). Moreover, <strong>the</strong>se classes of motifs show significant<br />

levels of sequence conservation despite some of<br />

<strong>the</strong>m exhibit<strong>in</strong>g low sequence complexity. Of <strong>the</strong>se, motif A<br />

carries 70% aden<strong>in</strong>es, motif B exhibits 95% pur<strong>in</strong>es, motifs<br />

C, F, G and J conta<strong>in</strong> 50–60% thym<strong>in</strong>es, while motifs D, H<br />

and I are more complex. For all families some motifs are<br />

repeated (Fig. 1B). We <strong>in</strong>fer that <strong>the</strong>se motifs are likely to<br />

provide, directly or <strong>in</strong>directly, assembly sites for Cas prote<strong>in</strong>s<br />

<strong>in</strong>volved <strong>in</strong> process<strong>in</strong>g RNA and/or <strong>in</strong> extend<strong>in</strong>g <strong>the</strong><br />

repeat-clusters. Lastly, exam<strong>in</strong>ation of spacer sequence<br />

matches on viruses/plasmids of <strong>the</strong> Sulfolobales revealed<br />

Fig. 1. Family classification of Sulfolobales <strong>CRISPR</strong>s.<br />

A. Phylogenetic tree based on a multiple alignment of repeat sequences show<strong>in</strong>g three ma<strong>in</strong> families I, II and III. <strong>CRISPR</strong>s are labelled by a<br />

four-letter prefix denot<strong>in</strong>g <strong>the</strong> species, and <strong>the</strong> number of repeats.<br />

B. Motif maps for leader regions of <strong>the</strong> three ma<strong>in</strong> <strong>CRISPR</strong> families. The motifs constitute conserved sequences, 30–100 bp <strong>in</strong> length,<br />

show<strong>in</strong>g on average 80% sequence identity. Sequence motifs A, B and C occur <strong>in</strong> more than one family [motif C occurs <strong>in</strong> some unclassified<br />

leaders (Fig. 1A)], whereas <strong>the</strong> o<strong>the</strong>r motifs are family specific. Thus motifs D, E, F and G occur only <strong>in</strong> family I leaders, motifs H and I are<br />

present only <strong>in</strong> family II leaders and motif J is exclusive to family III leaders. Leaders of each family show some variation <strong>in</strong> <strong>the</strong> number and<br />

order of <strong>the</strong> motifs present. Motif A overlaps with <strong>the</strong> transcriptional leader region.<br />

C. Logo-plot (http://weblogo.berkeley.edu/) of <strong>the</strong> motif located immediately upstream of <strong>the</strong> spacer match on viral/plasmid genomes where CC<br />

predom<strong>in</strong>ates <strong>in</strong> 129 matches for family I, TC <strong>in</strong> 23 matches for family II, and GT <strong>in</strong> 19 matches for family III <strong>CRISPR</strong>s, where one bit<br />

corresponds to about 75% presence and two bits correspond to 100%. The logo plots are based exclusively on spacer matches which show a<br />

maximum of five nucleotide mismatches.<br />
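The bit values quoted for the logo plot follow from the usual definition of information content at a DNA position, namely 2 minus the Shannon entropy of the base frequencies. The short Python sketch below applies that definition. It ignores any small-sample correction that logo software may apply, and the function name is hypothetical.

```python
import math


def information_bits(freqs):
    """Information content (bits) of one DNA position: 2 - Shannon entropy.

    freqs holds the frequencies of A, C, G and T (in any order), summing to 1.
    """
    entropy = -sum(p * math.log2(p) for p in freqs if p > 0)
    return 2.0 - entropy


# One base always present, all bases equally frequent, and two 75% cases
# that differ only in how the remaining 25% is distributed.
for freqs in ([1, 0, 0, 0], [0.25] * 4, [0.75, 0.25 / 3, 0.25 / 3, 0.25 / 3], [0.75, 0.25, 0, 0]):
    print(freqs, round(information_bits(freqs), 2))
```

With 75% of one base, the value ranges from roughly 0.8 bits (remainder spread evenly) to roughly 1.2 bits (remainder concentrated in one other base), which brackets the one-bit figure given in the caption.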


<strong>Archaea</strong>l <strong>CRISPR</strong>s of Sulfolobus 261



conserved upstream d<strong>in</strong>ucleotide motifs: CC for family I,<br />

TC for family II and GT for family III which may direct DNA<br />

<strong>in</strong>corporation <strong>in</strong>to <strong>CRISPR</strong>s (Fig. 1C). These may constitute<br />

an archaeal parallel to <strong>the</strong> AGAAA and GGNG motifs<br />

located downstream from bacterial proto-spacers of<br />

S. <strong>the</strong>rmophilus (Horvath et al., 2008a).<br />
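A tally of the kind summarised in Fig. 1C can be sketched as below. The Python function counts the dinucleotide immediately preceding each spacer match on a target sequence. It assumes that match start positions (0-based, on the strand as given) have already been determined, and both the function name and the toy data are hypothetical.

```python
from collections import Counter


def upstream_dinucleotides(genome, match_starts):
    """Count the two bases immediately upstream of each spacer match."""
    counts = Counter()
    for start in match_starts:
        if start >= 2:  # skip matches too close to the sequence start
            counts[genome[start - 2:start].upper()] += 1
    return counts


# Hypothetical target sequence with three spacer matches starting at 4, 10 and 12.
toy_genome = "AACCGGGGTTCCAAAA"
counts = upstream_dinucleotides(toy_genome, [4, 10, 12])
print(counts.most_common())
```

Matches on the opposite strand would first need to be mapped to the reverse complement of the target so that "upstream" is read in the same orientation for every match.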

Genome contexts of <strong>the</strong> repeat-clusters<br />

The S. acidocaldarius chromosome carries five repeat-clusters<br />

with 133, 78, 11, 5 and 2 repeats which fall <strong>in</strong>to<br />

<strong>CRISPR</strong> families II and III (Fig. 1A). Saci-133 and Saci-78<br />

(family III) are physically l<strong>in</strong>ked, with shared cas genes.<br />

They exhibit 95% identical leader sequences adjo<strong>in</strong><strong>in</strong>g<br />

<strong>the</strong> first repeat and carry identical, non-pal<strong>in</strong>dromic<br />

repeats (Fig. 2A and B). Saci-11 and Saci-2 (family II) are<br />

physically l<strong>in</strong>ked by cas genes (Fig. 2A) and carry identical<br />

leader sequences while <strong>the</strong> more divergent Saci-5<br />

(family II) exhibits a repeat with two base pair changes<br />

and a leader sequence show<strong>in</strong>g 75% sequence identity<br />

(Chen et al., 2005; Lillestøl et al., 2006). All of <strong>the</strong> family II<br />

repeats carry a 5 bp <strong>in</strong>verted repeat (Fig. 2B). Repeat-clusters<br />

Saci-5 and Saci-2 each carry a degenerate<br />

repeat, distal to <strong>the</strong> leader region. The repeat-cluster<br />

(pKEF-7) of conjugative plasmid pKEF9 carries no leader<br />

sequence and no associated cas genes (Fig. 2A) (Greve<br />

et al., 2004).<br />

Fig. 2. A. Diagram show<strong>in</strong>g <strong>the</strong> genomic context of <strong>the</strong> S. acidocaldarius repeat-clusters, and of <strong>the</strong> pKEF9 cluster. Saci-133 and Saci-78

are physically l<strong>in</strong>ked on <strong>the</strong> chromosome, as are Saci-11 and Saci-2. Saci-5 and <strong>the</strong> plasmid cluster pKEF-7 are separate. L denotes <strong>the</strong>

leader region. Identities of genes border<strong>in</strong>g <strong>the</strong> clusters or <strong>the</strong>ir GenBank/EMBL assignments are given and <strong>the</strong>ir directions of transcription are

<strong>in</strong>dicated.

B. Repeat sequences where <strong>in</strong>verted repeat sequences are underl<strong>in</strong>ed, and experimentally identified process<strong>in</strong>g sites are marked with ‘ ’s.

Repeat cluster: Repeat sequence<br />

Saci-133/78: GTAATAACGACAAGAAACTAAAAC<br />

Saci-11/2: GATGAATCCCAAAAGGGATTGAAAG<br />

Saci-5: A T<br />

pKEF-7: GTTGCAATTCCCTAAATGTGCGGG<br />

Repeat-clusters generate s<strong>in</strong>gle transcripts cover<strong>in</strong>g <strong>the</strong><br />

whole cluster<br />

In order to <strong>in</strong>vestigate transcripts formed dur<strong>in</strong>g <strong>the</strong><br />

growth cycle, RNA was extracted from S. acidocaldarius,<br />

and from S. solfataricus P2 conjugated with pKEF9, harvested<br />

at different stages of exponential growth and,<br />

for <strong>the</strong> former, stationary phase. Oligonucleotide probes<br />

complementary to spacers of <strong>the</strong> repeat-clusters were<br />

tested <strong>in</strong> Nor<strong>the</strong>rn blot analyses. Initially, Saci-78 transcripts<br />

were probed for spacer 4, adjacent to <strong>the</strong> leader<br />

region, and <strong>the</strong> results demonstrate that process<strong>in</strong>g<br />

<strong>in</strong>creased progressively as stationary phase was<br />

approached (Fig. 3). The maximum transcript size, about<br />

5000 nt, exceeds <strong>the</strong> size of <strong>the</strong> 4624 bp repeat-cluster,<br />

<strong>in</strong>dicat<strong>in</strong>g that <strong>the</strong> whole cluster was transcribed (Fig. 3).<br />

However, <strong>the</strong> majority of detected transcripts fall <strong>in</strong> <strong>the</strong><br />

size range 3000–3500 nt suggest<strong>in</strong>g that endogenous<br />

degradation, process<strong>in</strong>g or premature term<strong>in</strong>ation had<br />

occurred towards <strong>the</strong> 3′-end. Evidence for <strong>the</strong> formation of<br />

whole transcripts was also obta<strong>in</strong>ed for each of <strong>the</strong> small<br />

repeat-clusters Saci-5 (Lillestøl et al., 2006), Saci-11 and<br />

Saci-2 (data not shown), and pKEF-7 (see below).<br />

Transcription from <strong>the</strong> leader strand<br />


Fig. 3. Nor<strong>the</strong>rn blot of Saci-78 transcripts us<strong>in</strong>g an<br />

oligonucleotide probe aga<strong>in</strong>st spacer 4. Ten microgram RNA was<br />

isolated from S. acidocaldarius cells harvested at: (1) early log<br />

phase, (2) late log phase, (3) early stationary phase and (4) late<br />

stationary phase. RNA size markers (0.5–9 kb) were<br />

co-electrophoresed and excised from <strong>the</strong> gel prior to RNA blott<strong>in</strong>g.<br />

In order to test whe<strong>the</strong>r transcription <strong>in</strong>itiated at s<strong>in</strong>gle or<br />

multiple sites, we determ<strong>in</strong>ed start sites at <strong>the</strong> leader of<br />

Saci-133 and Saci-5 by identify<strong>in</strong>g RNA fragments carry<strong>in</strong>g<br />

5′-term<strong>in</strong>al triphosphates us<strong>in</strong>g tobacco acid phosphatase<br />

<strong>in</strong> 5′-RLM RACE procedures. The results<br />

demonstrate that start sites occur immediately upstream<br />

from <strong>the</strong> first repeat sequence for both repeat-clusters<br />

(Fig. 4A and B) and <strong>the</strong> start sites are preceded upstream<br />

by archaeal BRE/TATA motifs (Torar<strong>in</strong>sson et al., 2005).<br />

For Saci-133, transcription <strong>in</strong>itiated at <strong>the</strong> sequence<br />

GATGG, 17 nt upstream from <strong>the</strong> first repeat (Fig. 4A;<br />

Table 1). An identical sequence/motif pattern occurs for<br />

Saci-78. A different pattern was found for <strong>the</strong> family II<br />

clusters Saci-11, Saci-5 and Saci-2 where transcription<br />

<strong>in</strong>itiates at <strong>the</strong> sequence AAGGG, 21 nt upstream from<br />

<strong>the</strong> first repeat and is also preceded by archaeal promoter<br />

motifs (Fig. 4B; Table 1).<br />

We probed for transcripts <strong>in</strong>itiat<strong>in</strong>g at <strong>the</strong> leader of<br />

Saci-133 us<strong>in</strong>g oligonucleotides complementary to spacers 5,<br />

6, 59 and 131. Strong signals were obta<strong>in</strong>ed for each<br />

spacer (Fig. 5A) consistent with <strong>the</strong> whole cluster be<strong>in</strong>g<br />

transcribed <strong>in</strong> fairly high yield, as was demonstrated for<br />

Saci-78 (Fig. 3). The low level of larger transcripts<br />

detected with probes aga<strong>in</strong>st spacers 59 and 131 sug-<br />

[Fig. 4 gel images. Panel A (Saci-133): lanes M, - and +; labelled bands Start, 4(9), 5(3), 5(23), 6(3) and 6(15). Panel B (Saci-5): lanes M, - and +; labelled bands Start, 1(8) and 1(17). Panel C (Saci-133): lane M; labelled bands 133(10), 132(11) and 131(22). Size markers 100–500.]<br />

Fig. 4. Determ<strong>in</strong>ation of <strong>the</strong> transcriptional start sites, and<br />

process<strong>in</strong>g sites of RNA products generated from Saci-133 and<br />

Saci-5 us<strong>in</strong>g <strong>the</strong> 5′-RLM RACE and 3′-RLM RACE procedures.<br />

A. Determ<strong>in</strong>ation of 5′-ends of transcripts from Saci-133 where<br />

RNA was treated with (+) and without (-) tobacco acid phosphatase<br />

to remove 5′-phosphates from <strong>the</strong> 5′-end of <strong>the</strong> <strong>in</strong>itial transcript,<br />

and an oligonucleotide primer specific for spacer 7 was employed.<br />

Bands exclusive to <strong>the</strong> + lane reta<strong>in</strong> <strong>the</strong> transcriptional start site<br />

whereas bands present <strong>in</strong> both + and - lanes represent transcripts<br />

which have been processed at <strong>the</strong> 5′-end.<br />

B. Determ<strong>in</strong>ation of 5′-ends of transcripts from Saci-5 us<strong>in</strong>g an<br />

oligonucleotide primer specific for spacer 1. The band show<strong>in</strong>g <strong>in</strong><br />

<strong>the</strong> + lane of about 160 bp is an artefact, where sequenc<strong>in</strong>g<br />

revealed that two adapters had ligated to each o<strong>the</strong>r and to <strong>the</strong><br />

start site.<br />

C. 3′-RLM RACE experiment performed on transcripts from<br />

Saci-133 us<strong>in</strong>g a primer specific for spacer 130. The three bands<br />

represent transcripts which have been processed with<strong>in</strong> <strong>the</strong><br />

term<strong>in</strong>al repeats 131, 132 and 133. In each experiment process<strong>in</strong>g<br />

sites are <strong>in</strong>dicated by number of <strong>the</strong> repeat (from <strong>the</strong> leader) where<br />

<strong>the</strong> position of <strong>the</strong> 5′-nucleotide with<strong>in</strong> <strong>the</strong> repeat is given <strong>in</strong><br />

brackets.


264 R. K. Lillestøl et al. <br />

Table 1. Overview of promoters, transcriptional start sites and process<strong>in</strong>g sites <strong>in</strong> repeat-clusters Saci-133, Saci-5 and pKEF-7 identified by <strong>the</strong><br />

5′-RLM RACE method.<br />

Cluster BRE-TATA Start<br />

gests that transcript process<strong>in</strong>g occurs primarily from <strong>the</strong><br />

3′-end (Fig. 5A). Prob<strong>in</strong>g of <strong>the</strong> repeat also revealed a<br />

similar series of bands except for <strong>the</strong> smallest RNAs<br />

(Fig. 5A).<br />

Distance from<br />

first repeat Process<strong>in</strong>g sites<br />

Saci-133 GAAAATATTTATAAA GATGG +17 nt 4 (9), 5 (3), 5 (23), 6 (3), 6 (18)<br />

Saci-5 GCAAAAGTTTATTAA AAGGG +21 nt 1 (8), 1 (17)<br />

pKEF-7 GAAAAAGTTTATTA AATCT +32 nt +23, 1 (24)<br />

Putative BRE and TATA motif sequences are located approximately 25 bp upstream from transcription start sites and <strong>the</strong> process<strong>in</strong>g sites with<strong>in</strong><br />

<strong>the</strong> repeats (numbered from <strong>the</strong> leader region) give <strong>the</strong> position of <strong>the</strong> 5′-nucleotide <strong>in</strong> brackets.<br />

[Fig. 5 blot images. Panel A lanes: M1, repeat, 133(5), 133(6), 133(59), 133(131), M2. Panel B lanes: M1, repeat, 133(5), 133(6), 133(60), 133(131), M2. Size markers shown up to 150 (M1) and 100–500 (M2). Annotated size ranges: 40-50 and 50-60.]<br />

Fig. 5. Northern blot analyses of Saci-133 transcripts. The repeat<br />

sequence and spacers at positions 5 (37 nt), 6 (42 nt), 59 (41 nt)<br />

and 131 (36 nt) from the leader were probed with oligonucleotides<br />

to detect: (A) transcripts initiating within the leader sequence, and<br />

(B) transcripts generated from the complementary strand. Twenty<br />

micrograms of RNA was isolated from cells grown to stationary phase.<br />

RNA size markers of 10–150 nt (M1) and 100–2000 nt (M2) are<br />

aligned approximately with the transcript lanes.<br />

The larger products observed with both spacer and<br />

repeat probes correspond in size to transcripts of multiple<br />

repeat-spacer units (Fig. 5A) whereas the smallest products<br />

seen when using spacer probes range in size from a<br />

single repeat-spacer unit (62–68 nt) to the spacer (40 nt),<br />

suggesting that progressive exoribonuclease trimming<br />

occurs within repeats flanking the spacer, consistent with<br />

the inability to detect the smallest RNAs with the repeat<br />

probe (Fig. 5A) and the earlier observation for Saci-5<br />

(Lillestøl et al., 2006). Saci-11 and Saci-2 were also<br />

probed with spacer-specific oligonucleotides, and Northern<br />

blots yielded closely comparable patterns (data not<br />

shown).<br />

5′-RLM RACE analyses of Saci-133 and Saci-5 also<br />

revealed processing sites (Fig. 4A and B). Multiple processing<br />

sites were identified throughout the repeats for<br />

Saci-133 but were confined to the inverted repeat for Saci-5 at<br />

positions 8 and 17 (Table 1; Fig. 2B).<br />
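The lane logic stated in the Fig. 4 legend (bands seen only in the + lane retain the transcriptional start site, bands seen in both lanes are processed 5′-ends) can be written as a small set comparison. The following is a minimal sketch only: the function name and the band sizes are hypothetical and are not read from the gels.<br />

```python
def classify_race_bands(plus_lane, minus_lane):
    """Split 5'-RLM RACE band sizes into primary and processed 5'-ends.

    plus_lane / minus_lane: band sizes seen with (+) and without (-)
    the phosphatase treatment described in the Fig. 4 legend.
    """
    plus, minus = set(plus_lane), set(minus_lane)
    return {
        # only ligatable after treatment: retains the transcription start
        "start_site": sorted(plus - minus),
        # ligatable either way: 5'-end already processed
        "processed": sorted(plus & minus),
    }


# Hypothetical band sizes (bp), chosen for illustration only.
bands = classify_race_bands(plus_lane=[310, 250, 180], minus_lane=[250, 180])
print(bands)
```

Bands that appear only in the - lane are ignored by this sketch; in practice they would need separate inspection, as would artefacts such as the adapter-dimer band noted for Saci-5.<br />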

3′-Termini of Saci-133 transcripts were also determined<br />

by the 3′-RLM RACE method employing a probe against<br />

spacer 130. Three main bands were produced which,<br />

on sequencing, revealed processing sites distributed<br />

throughout terminal repeats 131, 132 and 133, at positions<br />

10, 11 and 22 (Fig. 4C). The absence of further<br />

downstream bands suggested that the transcript terminus<br />

had been efficiently excised.<br />

To confirm that processing occurred exclusively<br />

within repeat sequences, Saci-133 transcripts on<br />

one membrane were probed, successively, by spacer<br />

5-specific, and then repeat-specific, probes. Both probes<br />

yielded similar patterns for the larger transcripts but the<br />

smallest transcripts were only detected with spacer-specific<br />

probes (Fig. 5A). Thus, the final processing step<br />

occurs in the repeat, leaving the spacer intact.<br />

Complementary strand is transcribed<br />

In a preliminary experiment, we demonstrated that Saci-5<br />

transcripts are produced from both DNA strands (Lillestøl<br />

et al., 2006). As this raised the possibility that dsRNA<br />

intermediates could be formed, we studied these effects<br />

more systematically for Saci-133, Saci-78, Saci-11, Saci-5<br />

and Saci-2. Transcripts from the complementary DNA<br />

strand of Saci-133 were probed for spacers 5, 6, 60 and<br />

131 (numbered from the leader). Each showed strong<br />

signals (Fig. 5B) but they differed from those of leader<br />

strand transcripts in that products were less regular in size<br />

and larger transcripts prevailed. Nevertheless, the<br />

smallest product for each spacer probe was a discrete<br />

band of about 55 nt (Fig. 5B). Similarly sized RNAs<br />

were observed when probing each of the other four<br />

S. acidocaldarius repeat-clusters (data not shown), consistent<br />

with the earlier observation for Saci-5 (Lillestøl<br />

et al., 2006). These small RNAs must contain all or most<br />

of the spacer sequence because the strong band<br />

observed with each spacer probe was not detected with a<br />

repeat probe (Fig. 5B).<br />

Northern analyses of each of the chromosomal repeat-clusters<br />

showed strong signals for all tested spacer<br />

probes, indicating that transcription from the complementary<br />

strand occurred throughout each cluster, as is illustrated<br />

for Saci-133 (Fig. 5B). This result was reinforced by<br />

a Northern blot analysis in which the complementary<br />

strand transcripts from the Saci-5 cluster were probed for<br />

spacer 1, adjacent to the leader region, and the largest<br />

transcript (430 nt) exceeds the minimal size of the repeat-cluster<br />

(300 bp) (Fig. 6). Moreover, each of the five clusters<br />

carries at least one putative promoter BRE/TATA<br />

motif within 50 bp of the terminal repeat of the repeat-cluster.<br />

In addition, there are no open reading frames<br />

(ORFs) within at least 3 kb of the putative promoters, on<br />

the complementary DNA strand, for any chromosomal<br />

repeat-clusters, except Saci-2 (Fig. 2B).<br />

[Fig. 6 blot image. Lanes: M and Saci-5; size markers 0–500; labelled bands 430, 190 and 52.]<br />

Fig. 6. Northern blot analysis of transcripts from the<br />

complementary strand of Saci-5, probing for spacer 1, adjacent to<br />

the leader region. RNA size markers of 100–2000 nt (M1) are<br />

aligned approximately with the transcript lanes. The size of the<br />

smallest transcript was estimated using an independent,<br />

co-electrophoresed size marker as shown in Fig. 5.<br />

A comparison of transcript yields from the leader and<br />

complementary strands indicated qualitatively similar<br />

expression levels from both strands of Saci-133 (Fig. 5A<br />

and B). This is difficult to quantify accurately because of<br />

the complexity and diversity of the RNA fragment patterns<br />

(Fig. 5A and B) but qualitatively similar transcription levels<br />

were observed for all five repeat-clusters.<br />

The possibility that functional dsRNAs were generated<br />

between spacer transcripts from each DNA strand was<br />

tested by a ribonuclease digestion approach using the<br />

ssRNA-specific enzymes RNase T1 and RNase U2, which<br />

cleave preferentially 3′- to G and A residues respectively,<br />

but do not cleave regular dsRNA (Christiansen et al.,<br />

1990). Total RNA from S. acidocaldarius was treated with<br />

increasing concentrations of each ribonuclease and<br />

Northern blots were obtained by probing for spacer 6 of<br />

Saci-133 transcripts from each strand. The results<br />

revealed progressive cleavage of both the leader and<br />

complementary strand transcripts at increasing ribonuclease<br />

concentrations but no resistant dsRNA band of<br />

about 40 bp was detected (data not shown). This may<br />

reflect that specific protein–ssRNA complexes form, as<br />

was shown for the leading strand spacer RNA of<br />

P. furiosus (Hale et al., 2008).<br />
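To illustrate the digestion logic, the sketch below cuts a single-stranded RNA immediately 3′ of every G (as stated for RNase T1) or every A (as stated for RNase U2). It is a simplified illustration under stated assumptions: the sequence is hypothetical, digestion is treated as complete, and the enzymes' preferences are reduced to strict rules. A fully base-paired region would, by contrast, remain uncut, which is what the experiment looked for.<br />

```python
def cleavage_sites(rna, residue):
    """Return 1-based positions of `residue`; cleavage is taken to occur 3' of each."""
    return [i + 1 for i, base in enumerate(rna.upper()) if base == residue]


def fragments(rna, residue):
    """Cut `rna` immediately 3' of every occurrence of `residue`."""
    pieces, start = [], 0
    for pos in cleavage_sites(rna, residue):
        pieces.append(rna[start:pos])
        start = pos
    if start < len(rna):
        pieces.append(rna[start:])  # trailing piece with no cut site
    return pieces


# Hypothetical single-stranded RNA, not a sequence from the study.
spacer_rna = "AUCGGAUUCAGC"
print(fragments(spacer_rna, "G"))  # RNase T1-like rule: cut 3' of G
print(fragments(spacer_rna, "A"))  # RNase U2-like rule: cut 3' of A
```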

pKEF-7 transcripts are processed in both repeats<br />

and spacers<br />

Despite its lack of associated cas genes and leader<br />

region, we considered the pKEF9 repeat-cluster to be a<br />

CRISPR system because three of the six spacers match<br />

Sulfolobus viruses, spacer 3 to rudiviruses and<br />

spacers 5 and 6 to fuselloviruses. This is consistent with<br />

the conjugative plasmid regulating the viruses intracellularly.<br />

Therefore, RNA was isolated from S. solfataricus<br />

P2 14 h after conjugating with pKEF9, before plasmid<br />

levels rapidly decline. For the predicted leader strand<br />

(Fig. 2A), 5′-ends were determined by 5′-RLM RACE<br />

analyses using a primer specific for spacer 1. The<br />

results revealed a single transcript start site, 32 nt<br />

upstream from the first repeat, preceded by promoter<br />

motifs (Table 1). Processing sites were also identified<br />

23 nt upstream from the first repeat, and at the junction<br />

of the first repeat and spacer (Fig. 7A; Table 1). Northern<br />

blotting experiments were then performed probing each<br />

half of each spacer sequence, as well as the repeat<br />

(Fig. 7B). The results for each probe revealed a largest<br />

product of about 465 nt, corresponding in size to a transcript<br />

from the whole repeat-cluster. The transcript patterns<br />

indicated that smaller products disappeared,<br />

stepwise, as one probed along the transcript in a 5′ to 3′<br />

direction (Fig. 7B). The experiment was repeated, after<br />

conjugating S. solfataricus P2 for 20 h, when the smaller<br />

transcripts observed for spacer 1 were also seen for the<br />

other spacers, consistent with increased processing<br />

having occurred as stationary phase was approached<br />

(data not shown).<br />

[Fig. 7 gel and blot images. Panel A: lanes M, - and +; size markers 100–400; band labels Start, +32, +23 and 1(24). Panel B: labelled band sizes 465, 410, 345, 285, 245, 210, 183, 165, 148, 145, 139, 104 and 95.]<br />

Fig. 7. Transcription from the pKEF-7 cluster.<br />

A. 5′-RLM RACE analyses of the transcriptional start site and processing sites near the start of the transcript. RNA was treated with (+) and<br />

without (-) tobacco acid phosphatase.<br />

B. Northern blot analyses of transcripts from the pKEF-7 cluster using oligonucleotide probes specific for the left (L) and right (R) halves of<br />

spacers 1, 2, 3, 4, 5, 6 and the repeat sequence respectively.<br />

C. Northern blot analyses of transcripts from the complementary strand from the pKEF-7 cluster using oligonucleotide probes specific for<br />

spacer 6 and the repeat sequence. RNA was isolated 14 h after conjugation in A, B and C. RNA size markers of 100–500 nt (M) are aligned<br />

and approximate fragment sizes are given.<br />

The transcript patterns are complicated by the presence<br />

of sets of weak and strong signals (Fig. 7B) where the<br />

former match those of the S. acidocaldarius clusters<br />

(above) in size and putative processing in repeats<br />

(Table 2). For example, the 95 nt and 104 nt transcripts<br />

observed for the spacer 1 probe are consistent in size with<br />

their extending from the start site, or processing site 9 nt<br />

downstream (Fig. 7A, Table 1), to a processing site in<br />

repeat 2 (Table 2). For the stronger transcripts, differences<br />

were observed when probing each half of the<br />

spacer transcripts (Fig. 7B). Probing upstream halves (L)<br />

revealed smaller fragments than probing downstream<br />

halves (R), seen most dramatically for probes against<br />

spacer 2 (139–148 nt) and spacer 3 (210 nt) (Fig. 7B).<br />

The strong transcripts are consistent in size with their<br />

extending from the initiation, or downstream processing,<br />

site to the downstream (R) spacer halves (Table 2).<br />
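The size argument above rests on simple arithmetic: the start site (+32) and the upstream processing site (+23) lie 9 nt apart, so two transcripts sharing a 3′-end but differing in which of these 5′-ends they carry should differ by about 9 nt. A minimal sketch of that check on the paired sizes listed in Table 2 follows; the variable and function names are illustrative, and the sizes are gel estimates, so a mismatch does not by itself contradict the interpretation.<br />

```python
START_SITE = 32        # nt upstream of the first repeat (Table 1)
PROCESSING_SITE = 23   # nt upstream of the first repeat (Table 1)
OFFSET = START_SITE - PROCESSING_SITE  # expected size difference, 9 nt

# Paired transcript sizes (nt) as listed in Table 2.
paired_sizes = {
    "weak, stop in repeat 2": (95, 104),
    "weak, stop in repeat 3": (165, 183),
    "strong, stop in spacer 2": (139, 148),
}


def pair_differences(pairs):
    """Size difference (longer minus shorter) for each transcript pair."""
    return {label: longer - shorter for label, (shorter, longer) in pairs.items()}


diffs = pair_differences(paired_sizes)
for label, diff in diffs.items():
    verdict = "matches" if diff == OFFSET else "differs from"
    print(f"{label}: {diff} nt, {verdict} the {OFFSET} nt offset")
```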

The repeat probe yielded a transcript pattern similar to that<br />

for spacer 2 and differed from that for spacer 1 (Fig. 7B)<br />

probably because of non-annealing of the primer to the<br />

degenerate first repeat. This was reinforced by the lack of<br />

processing in the first repeat sequence, as determined by<br />

the 5′-RLM RACE method (Fig. 7A). Transcripts from the<br />

complementary DNA strand were also detected probing<br />

for spacer 6 and the repeat sequence (Fig. 7C), and transcripts<br />

in the size range 185–480 nt were observed for the<br />

former and 145–480 nt for the latter, similar in size to<br />

transcripts observed from the leader strand. The absence<br />

of spacer-sized RNAs from either DNA strand could<br />

reflect that the final RNA processing enzymes are activated<br />

mainly in stationary phase (Fig. 3) or incompatibility<br />

between processing enzymes and the plasmid repeat<br />

sequence (Carte et al., 2008).<br />

Table 2. Summary of transcriptional start sites and estimated sizes<br />

and processing sites for transcripts deriving from pKEF-7 as illustrated<br />

in Fig. 7B.<br />

Transcript | Start | Stop<br />
Weak (normal) | | Repeat (position)<br />
95/104 | +23/+32 | 2 (9)<br />
165/183 | +23/+32 | 3 (22)<br />
245 | +23/+32 | 4 (19)<br />
Strong (abnormal) | | Spacer (position)<br />
139/148 | +23/+32 | 2 (29)<br />
210 | +23/+32 | 3 (25)<br />
285 | +23/+32 | 4 (35)<br />
345 | +23/+32 | 5 (23)<br />
410 | +23/+32 | 6 (30)<br />
Weak (abnormal) | |<br />
145 | 1 (24) | 3 (16)<br />

Transcripts included in the normal/weak category appear to be<br />

processed in the same manner as the S. acidocaldarius clusters.<br />

Processing sites are localized by the repeat number and the<br />

estimated nucleotide position in brackets. Transcripts in the abnormal<br />

category are processed in the right half of spacers (position denoted<br />

by the spacer number and the estimated nucleotide position in<br />

brackets). +32 denotes the transcriptional initiation site and +23 and<br />

1 (24) indicate processing sites identified by the 5′-RLM RACE<br />

method.<br />

Fig. 8. Patterns of repeat-spacer units in repeat-clusters A–F of S. solfataricus strain P1 are aligned with those from S. solfataricus strain P2<br />

(She et al., 2001), where each arrowhead represents a single spacer-repeat unit, and the number to the right indicates the total number of<br />

units. Grey boxed regions indicate sequences that are identical for a given pair of clusters. Blackened units lie within these conserved regions<br />

but yield no matches to viruses/plasmids. Spacers which yield significant matches to viruses or plasmids are colour-coded as indicated on the<br />

figure. Boxes to the left of the clusters represent leader regions that are coloured according to the leader family, blue – family I, purple –<br />

family II (Fig. 1B). The larger arrowhead in cluster D of strain P1 represents an 899 bp pNOB8-like fragment, and the large arrowhead in cluster<br />

F denotes a 106 bp insert with two atypical repeat sequences and abnormal spacer regions. Preliminary data on clusters B, C and E were<br />

presented earlier (Lillestøl et al., 2006).<br />

Functional properties of the CRISPR families<br />

For S. acidocaldarius the 297 spacer sequences yield<br />

only 44 (15%) significant matches to virus/plasmid<br />

sequences, relatively few compared with up to 40% for<br />

other Sulfolobales genomes (Lillestøl et al., 2006; Shah<br />

et al., 2009). Therefore, to gain more insight into the functional<br />

diversity of different CRISPR families, we completed<br />

the sequencing of the six repeat-clusters A–F of<br />

S. solfataricus strain P1 because, although repeat-clusters<br />

B, C and E share regions of perfectly conserved<br />

spacer-repeat sequences with S. solfataricus strain P2,<br />

they also yielded many additional virus/plasmid sequence<br />

matches (Lillestøl et al., 2006). The primary structures of<br />

repeat-clusters A–F of strain P1 are displayed together<br />

with those of strain P2 (She et al., 2001) in Fig. 8, where<br />

the locations and distributions of virus/plasmid matches<br />

are indicated.<br />

Repeat-clusters A and B represent family II CRISPRs,<br />

while C, D, E and F belong to family I (Fig. 1A). Each<br />

repeat-cluster of strains P1 and P2 shares identical<br />

regions of sequence enclosed in grey boxes (Fig. 8).<br />

While cluster pairs E and F are identical, the others all<br />

show evidence of repeat-spacer units having been added<br />

at the leader region, after separation of the strains,<br />

although the repeat-cluster sizes and apparent rates of<br />

extension differ greatly. Repeat-clusters B from strain P1<br />

and D from strain P2 show evidence of putative deletions<br />

of 21 and 45 repeat-spacer units respectively, and there<br />

are minor differences within the conserved regions of<br />

cluster A. Moreover, cluster A shares a sequence of four<br />

repeat-spacer units with cluster B of strain P1, suggesting<br />

that homologous recombination has occurred between<br />

different clusters of the same family. Importantly, the<br />

downstream ends of each pair of repeat-clusters are conserved,<br />

which suggests that the clusters lose their repeat-spacer<br />

units primarily by internal deletions (Fig. 8).<br />

Fig. 9. Pie plots for the main CRISPR families I, II and III of the Sulfolobales where the percentages of spacer sequence matches are given<br />

for the different crenarchaeal viral families and plasmid classes which are colour-coded. Spacer matches investigated for each family: family I<br />

(2031 spacers tested, 771 significant matches), family II (710 spacers tested, 230 significant matches) and family III (298 spacers tested, 88<br />

significant matches).<br />

Cluster D of strain P1 and cluster F of both strains carry<br />

anomalous inserts. The former is an 899 bp region<br />

showing a significant sequence match to the conjugative<br />

plasmid pNOB8 (She et al., 1998) while the latter region<br />

carries a degenerate repeat-spacer region with a different<br />

repeat sequence and an abnormally sized spacer, possibly<br />

also of plasmid origin.<br />

The absence of newly added repeat-spacer units in<br />

cluster F, and the lack of a leader region (Fig. 8), raised<br />

the question as to whether the cluster was active. Therefore,<br />

we probed for spacer 11 of cluster F of strain P2<br />

using a Northern blot analysis. The fragment pattern<br />

obtained was similar to that for the S. acidocaldarius clusters<br />

(Fig. 5A) except that small spacer RNAs (< 66 nt) were<br />

absent (data not shown). This indicated, as for pKEF-7<br />

(Fig. 7B), a defective final processing stage which could<br />

be caused by the lack of a leader region and/or the<br />

absence of some physically linked cas genes.<br />

The proportion of significant spacer matches to viruses/<br />

plasmids was 39% and 38% for strains P1 and P2 respectively,<br />

which carry a total of 431 and 417 spacers<br />

respectively. The colour coding of the matches (Fig. 8)<br />

reveals some apparent biases. For example, there is a<br />

high proportion of bicaudaviral matches in the newly<br />

added spacers of cluster D (family I), for both strains,<br />

which contrasts with the high proportion of rudiviral<br />

matches in clusters A and B (family II) and suggests that<br />

individual CRISPR families exhibit a preference for certain<br />

extra-chromosomal elements. To test this hypothesis<br />

further, data for significant spacer sequence matches to<br />

viruses/plasmids for the three main CRISPR families of<br />

the Sulfolobales were summarized in pie plots (Fig. 9).<br />

The overall ratios of viral to plasmid spacer matches,<br />

for families I, II and III, are 3.5, 2.0 and 3.5 respectively,<br />

suggesting that the family II CRISPRs have a relative bias<br />

to plasmids. Although no absolute biases are apparent<br />

(Fig. 9), rudiviral matches dominate for family III and conjugative<br />

plasmid matches are enhanced for family II<br />

CRISPRs. The rudiviruses, lipothrixviruses and conjugative<br />

plasmids, which predominate in the pie plot, are all<br />

abundant environmentally (Greve et al., 2004; Bize et al.,<br />

2008; Vestergaard et al., 2008).<br />
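The match proportions quoted in this section follow directly from the spacer counts. As a minimal sketch, the counts given in the Fig. 9 legend and the S. acidocaldarius totals from the text can be converted to percentages as below; the dictionary layout and function name are illustrative choices, and the split of matches into viral and plasmid classes is not reproduced because those counts are not given in the text.<br />

```python
# (spacers tested, significant matches) per CRISPR family, from the Fig. 9 legend.
fig9_counts = {
    "family I": (2031, 771),
    "family II": (710, 230),
    "family III": (298, 88),
}

# S. acidocaldarius totals quoted in the text: 44 matches among 297 spacers.
saci_counts = (297, 44)


def match_percentage(tested, matched):
    """Percentage of tested spacers with a significant virus/plasmid match."""
    return 100.0 * matched / tested


for family, (tested, matched) in fig9_counts.items():
    print(f"{family}: {match_percentage(tested, matched):.1f}% of {tested} spacers")

print(f"S. acidocaldarius: {match_percentage(*saci_counts):.1f}% of {saci_counts[0]} spacers")
```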

Discussion

Biogenesis of small archaeal RNAs appears to proceed from a full-length single-stranded primary transcript that is cleaved by endoribonucleases, as was recently reported for the Cas6 protein in P. furiosus (Carte et al., 2008). This suggests that the mechanism of cleavage in archaea is distinct from the Dicer endoribonuclease-dependent mechanism generating si- and miRNAs in eukarya. However, eukarya also generate small RNAs by Dicer-independent mechanisms, such as that seen for piRNA-like species, and although the mechanism of biogenesis of the latter in terms of trans-acting factors is unresolved, certain aspects are reminiscent of the process observed in this study. In particular, the presence of an independently processed complementary RNA strand has been reported (reviewed in Klattenhoff and Theurkauf, 2008). As there is no evidence for an RNA-dependent RNA polymerase in Sulfolobus, transcription of the complementary strand is likely to be dictated by the putative promoter elements located immediately downstream from the CRISPR loci. Inspection of downstream elements of all CRISPR clusters in S. acidocaldarius reveals BRE/TATA promoter

© 2009 The Authors

Journal compilation © 2009 Blackwell Publishing Ltd, Molecular Microbiology, 72, 259–272


regions that are likely to initiate full-length complementary strand RNA products, as shown for the Saci-5 cluster (Fig. 6). Further processing of the complementary transcripts is likely to proceed by an endoribonuclease distinct from that generating spacer RNAs from the leader strand transcript, because the ‘handles’ in the repeats must be different given their different RNA sizes (about 55 nt versus 40–45 nt). What is the functional significance of the complementary small RNAs? One possibility is that they neutralize the leader spacer RNAs in the absence of invading extra-chromosomal elements, although we failed to detect dsRNAs in the expected size range. Another possibility is that loading of leader-spacer RNAs onto an Argonaute-containing complex has to proceed via a dsRNA intermediate, as observed for the si- and miRNA pathways. The presence of Argonautes in archaea may facilitate a distinct mode of guide RNA presentation from that seen in bacteria, where there is no evidence of the participation of a complementary RNA strand in CRISPR function (Brouns et al., 2008; Marraffini and Sontheimer, 2008).

Cellular activity of CRISPRs

The observation that more than one CRISPR family is generally present in one organism suggested that they may provide added versatility in regulating or inhibiting invading viruses or plasmids, and this supposition received some support from the finding that putative recognition signals upstream from predicted proto-spacer sequences on viruses/plasmids are different for different CRISPR families (Fig. 1C). Analysis of the repeat-clusters of the two CRISPR families of S. solfataricus strains P1 and P2 revealed biases to bicaudaviruses for family I CRISPRs and to rudiviruses for family II CRISPRs (Fig. 8). A study of 3039 spacers from the three main families of all the Sulfolobales also showed significant biases, in particular a preference of family III spacers for rudiviruses (Fig. 9). This supports the view that the presence of multiple CRISPR families may produce a more versatile response to invading genetic elements.
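One way such family-specific biases could be tabulated is to compute, for each CRISPR family, the fraction of spacer matches that fall on each class of genetic element. The following is a minimal sketch only: the function name `match_fractions` is hypothetical, and the input records are toy placeholders chosen to exercise the code, not counts from this study.

```python
from collections import Counter, defaultdict


def match_fractions(matches):
    """Return {family: {target_class: fraction}} from (family, target_class) pairs."""
    counts = defaultdict(Counter)
    for family, target_class in matches:
        counts[family][target_class] += 1
    fractions = {}
    for family, counter in counts.items():
        total = sum(counter.values())
        # Normalise within each family so families of different size are comparable.
        fractions[family] = {t: n / total for t, n in counter.items()}
    return fractions


# Toy placeholder records (hypothetical, NOT data from the study).
toy_matches = [
    ("family I", "bicaudavirus"),
    ("family I", "bicaudavirus"),
    ("family I", "rudivirus"),
    ("family II", "rudivirus"),
]
toy_fractions = match_fractions(toy_matches)
```

Normalising within each family, rather than comparing raw counts, is a design choice that keeps a family with many spacers from appearing biased merely because it is large.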

The results also show that the CRISPR system of S. acidocaldarius is primed to react rapidly to invasion, in that the large cluster transcripts are present despite the absence of viruses and plasmids. The system only requires that the RNA processing enzymes are rapidly activated. The observation that processing of the leader transcript strongly increases in the stationary phase (Fig. 3) is also consistent with these cells being more susceptible to external attack.

Generation of spacer RNAs

Transcripts on the leader strand initiate just upstream from the first repeat, independently of the presence of a


<strong>Archaea</strong>l <strong>CRISPR</strong>s of Sulfolobus 269<br />

leader region. Processing occurs primarily from the 3′-end of a single transcript of the whole repeat-cluster, although limited processing also occurs at the 5′-end (Fig. 4A), and repeats are targeted to generate a series of fragments. The small spacer RNAs from exponentially growing and stationary phase cells, ranging in size from 40 to 52 nt and 35 to 52 nt respectively, represent a spectrum of fragments which anneal with spacer-specific, but not repeat-specific, probes (Fig. 5A), consistent with earlier observations for archaea and bacteria (Lillestøl et al., 2006; Brouns et al., 2008; Hale et al., 2008). Processing activity initiates mainly at stationary phase, at least for cells lacking extra-chromosomal elements (Fig. 3).

Recently, it was shown that the Cas6 endoribonuclease binds to the 5′-end of a P. furiosus repeat, which can generate a hairpin structure, and cuts near the 3′-end (Carte et al., 2008). This result could explain the anomalous processing of the pKEF-7 transcript (Fig. 7B; Table 2) which exhibits an unusual 3′-terminal repeat sequence (Fig. 2B). Nevertheless, given the wide sequence and secondary structural diversity of repeat RNAs (Peng et al., 2003; Kunin et al., 2007), the enzymes must exhibit a wide range of recognition mechanisms.

Transcripts of the complementary strand were invariably produced from each repeat-cluster and they ranged in size from larger fragments to spacer RNAs of about 55 nt, about 16 nt larger than the probed spacer, consistent with the earlier observation for Saci-5 (Lillestøl et al., 2006). Although no reproducible RNA expression was observed from the complementary spacer strand for the euryarchaeon P. furiosus (Hale et al., 2008), this could have a technical explanation. For the cDNA libraries only fragments < 50 nt were screened, and in the Northern blot analysis the 12% polyacrylamide gels used would not have resolved the large transcripts observed for Sulfolobus (Fig. 5B).

Regular and irregular development of repeat-clusters

The pairs of repeat-clusters E and F from S. solfataricus P1 and P2 are each identical between the two strains and have not undergone structural changes since the strains diverged (Fig. 8). Cluster E (Ssol-8) carries a family I leader but a degenerate first repeat, which may inhibit cognate enzyme recognition of the repeat and, thereby, subsequent extension of the repeat-cluster. In contrast, cluster F (Ssol-91) lacks a leader sequence, which could provide an assembly site for DNA enzymes involved in cluster extension. In addition, clusters E and F lack physically linked cas genes which could be important for DNA insertion functions (Cas1) or RNA processing (Cas2 and Cas4) (Makarova et al., 2006; Beloglazova et al., 2008).

Irregularities in archaeal repeat-clusters are extremely rare (Lillestøl et al., 2006). However, in cluster F of both


270 R. K. Lillestøl et al. <br />

strains, a 106 bp region containing a half spacer preceded by two atypical repeat sequences is followed by a regular repeat sequence and no spacer (Fig. 8). These structures maintain the precise size of the spacer-repeat units in the cluster, suggesting that some kind of ruler mechanism regulates the insertion of new spacer-repeat units. Another exceptional irregularity occurs in cluster D of strain P1, where an 899 bp fragment carrying a pNOB8-like conjugative plasmid sequence (She et al., 1998) is flanked by repeats. Both examples may reflect a mechanistic defect whereby large plasmid regions, the former carrying repeats, have been incorrectly excised and incorporated into the repeat-cluster. Further examination of this region may yield some insight into how DNA is obtained from extra-chromosomal elements.

Mechanism of CRISPR transfer

The commonality of CRISPR families in different Sulfolobus strains suggests that they can be transferred horizontally (Lillestøl et al., 2006; Horvath et al., 2008b), but the mechanism by which this could occur is unclear. For some bacteria it was proposed that large plasmids could carry and transmit the CRISPR apparatus (Godde and Bickerton, 2006), but known crenarchaeal cryptic plasmids are quite small (5–10 kb) and the largest conjugative plasmids are only 40–50 kb (Greve et al., 2004), too small to carry complex CRISPR systems. One possibility is that the system is transferred by chromosomal conjugation. Both S. acidocaldarius and Sulfolobus tokodaii chromosomes carry captured conjugative plasmids in which the genes implicated in the conjugative process are maintained (Greve et al., 2004) and, for S. acidocaldarius at least, conjugative transfer of chromosomal DNA has been demonstrated experimentally (Aagaard et al., 1995; Grogan, 1996).

Experimental procedures

Growth of Sulfolobus cells and preparation of DNA

Sulfolobus acidocaldarius cells were grown at 78°C in complex medium containing 2% tryptone (Schleper et al., 1995) and harvested at exponential or stationary phase by centrifuging at 4000 r.p.m. and 4°C for 15 min. Cells of S. solfataricus strains P1 and P2 were grown at 80°C in complex medium containing 2% tryptone (Schleper et al., 1995). Total DNA used for repeat-cluster sequencing was isolated from S. solfataricus strain P1 using the DNeasy Kit (Qiagen, Westberg, Germany). Conjugation was initiated by mixing a culture of S. islandicus strain Hi165, which harbours the conjugative plasmid pKEF9, with S. solfataricus P2 cells at a ratio of 1:10 000 at A600 = 0.17 (Schleper et al., 1995). Cells were harvested 14 h after conjugation and centrifuged at 4000 r.p.m. for 6 min at 4°C. pKEF9 was isolated using the Plasmid Mini Kit (Qiagen) and digested with EcoRI to verify its presence (Greve et al., 2004).

RNA preparation, RNase digestion and Northern blotting

Total RNA from S. acidocaldarius cells, and S. solfataricus cells conjugated with pKEF9, was prepared using Trizol (Invitrogen, Paisley, UK) according to the Invitrogen protocol and treated with DNase I (Applied Biosystems/Ambion, Austin, TX) according to the protocol from Ambion, essentially as used for extracting plant si-RNAs (Sunkar et al., 2005). To detect dsRNA, 20 µg of RNA was treated with various concentrations of RNase T1 (Ambion) in RNase-digestion III buffer (Ambion), and RNase U2 (Sankyo, Japan) in digestion buffer [20 mM Na acetate (pH 4.6), 2 mM MgCl2, 100 mM KCl], at 37°C for 30 min. RNase was inactivated and the RNA was precipitated with 225 µl of RNase inactivation/precipitation solution III (Ambion) together with 150 µl ethanol at −20°C for 1 h or overnight. For Northern blotting of small RNAs, 20 µg RNA was mixed with 10 µl Gel Loading Buffer II (Applied Biosystems/Ambion) and fractionated in a 6–10% polyacrylamide gel containing 7 M urea, 90 mM Tris, 90 mM boric acid, 2 mM EDTA, pH 8.3, together with a 10–150 nt ladder (Decade Marker System, Ambion, Huntingdon, UK) or a 0.1–2.0 kb RNA ladder (Invitrogen). RNA was transferred onto Hybond N+ nylon membranes (GE Healthcare, Amersham, UK) or GeneScreen plus nylon membranes (PerkinElmer Life Sciences, Boston, USA) using the Bio-Rad semidry blotting apparatus (Bio-Rad, Hercules, CA) and 0.5× TBE (45 mM Tris, 45 mM boric acid, 1 mM EDTA, pH 8.3) as the blotting buffer. For Northern blotting with large RNAs, 12 µg RNA was mixed with NorthernMax-Gly Sample Loading Dye (Applied Biosystems/Ambion) and fractionated in a 1.5% agarose-BPTE (10 mM PIPES, pH 6.5, 30 mM Bis-Tris, 1 mM EDTA) gel, together with a 0.5–9 kb Millennium Marker (Applied Biosystems/Ambion). The RNA was transferred onto Hybond N+ nylon membranes (GE Healthcare) by capillary blotting with 0.2 M NaH2PO4, pH 7.4, 3.0 M NaCl, 0.02 M EDTA. After immobilizing the RNAs using a UV Crosslinker (Stratagene, La Jolla, USA), the nylon membranes were pre-hybridized for 1 h in 6× SSPE buffer (0.9 M NaCl, 60 mM NaH2PO4, 4.6 mM EDTA, pH 7.4), 0.5% SDS and 5× Denhardt’s solution at 5°C lower than the Tm of the probe (TH). Oligonucleotides (24–26-mers) complementary to a spacer, or the repeat, on either strand, were end-labelled with [γ-32P]-ATP and T4 polynucleotide kinase. Hybridization was performed at the TH of the probe in 6× SSPE, 0.5% SDS, 3× Denhardt’s solution for 18 h. The samples were washed three times at room temperature with 6× SSPE buffer and 0.1% SDS for 15 min each and, subsequently, at the TH in the same buffer. Membranes were exposed to Ultra UV-G X-ray film (Dupharma, Kastrup, Denmark) for 1 h to 3 days.
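The buffer strengths quoted in this section are simple multiples of a stock composition, and the hybridization temperature is defined relative to the probe Tm. As a minimal sketch of that arithmetic (the helper names `scale_buffer` and `hybridization_temp` are hypothetical, 1× TBE is assumed to be the gel buffer composition given in the text, and the Tm passed in the example is an arbitrary placeholder rather than a value from the study):

```python
def scale_buffer(stock_mM, factor):
    """Return component concentrations (mM) after scaling a buffer by `factor`."""
    return {name: conc * factor for name, conc in stock_mM.items()}


def hybridization_temp(tm_celsius, offset=5.0):
    """TH as defined in the text: 5 degrees C below the probe Tm."""
    return tm_celsius - offset


# Compositions taken from the text (all values in mM).
TBE_1X = {"Tris": 90.0, "boric acid": 90.0, "EDTA": 2.0}
SSPE_6X = {"NaCl": 900.0, "NaH2PO4": 60.0, "EDTA": 4.6}

tbe_half = scale_buffer(TBE_1X, 0.5)      # blotting buffer, 0.5x TBE
sspe_1x = scale_buffer(SSPE_6X, 1 / 6)    # implied 1x SSPE composition

# Placeholder Tm (assumption); real probes need their own Tm estimate.
example_th = hybridization_temp(60.0)
```

Scaling the gel buffer by 0.5 reproduces the 45 mM Tris, 45 mM boric acid, 1 mM EDTA blotting buffer stated above, which is a quick consistency check on the quoted recipes.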

Determination of transcript ends

The RLM-RACE kit (Applied Biosystems/Ambion) was used to determine the ends of transcripts generated from repeat-clusters in S. acidocaldarius and pKEF9, with some modifications to the kit protocol. To identify 5′-ends, 5 µg RNA was treated with tobacco acid pyrophosphatase (TAP) according to the protocol. Both TAP-treated and untreated RNA were then linked to a 5′-RLM RACE adapter with RNA ligase, followed by reverse transcription from a spacer-specific primer according to the protocol. Products were then amplified by PCR with a 5′-RLM RACE adapter-specific primer containing a BamHI restriction site at the 5′ end and a spacer-specific primer carrying an EcoRI restriction site, in order to facilitate cloning of the PCR products into pUC18. The PCR products were run on a 2% low melting agarose gel and purified with the QIAquick Gel Extraction Kit (Qiagen). The fragments were cloned into BamHI- and EcoRI-digested pUC18 at a molar ratio of 4:1 and sequenced.
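The comparison of TAP-treated and untreated RNA can be read as a simple decision rule: the adapter ligates only to 5′-monophosphate ends, and TAP converts 5′-triphosphate ends of primary transcripts into 5′-monophosphates. The sketch below illustrates that interpretation under stated assumptions; the function name `classify_5prime_end` and the category labels are hypothetical and are not taken from the kit protocol or from the study.

```python
def classify_5prime_end(in_tap_treated: bool, in_untreated: bool) -> str:
    """Interpret a 5'-RACE product from its presence in +TAP and -TAP samples."""
    if in_tap_treated and in_untreated:
        # Ligatable without TAP: the end already carried a 5'-monophosphate.
        return "processed end (5'-monophosphate)"
    if in_tap_treated:
        # Ligatable only after TAP: consistent with a transcription start site.
        return "primary end (5'-triphosphate)"
    if in_untreated:
        # Seen only without TAP: not explained by the rule, flag for re-testing.
        return "inconsistent"
    return "not detected"


# Hypothetical observations for three candidate ends (illustrative only).
examples = {
    "end_A": classify_5prime_end(True, False),
    "end_B": classify_5prime_end(True, True),
    "end_C": classify_5prime_end(False, False),
}
```

In practice band intensities rather than presence or absence are usually compared, so a real analysis would replace the booleans with a quantitative enrichment threshold.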

Sequencing of clusters in S. solfataricus P1

Long range PCR products were obtained across the chromosomal cluster regions of S. solfataricus strain P1 using the Herculase II kit (Stratagene, La Jolla, CA) according to the protocol, with 300 ng genomic DNA in 50 µl reactions. DNA fragments were purified using the QIAquick PCR purification kit (Qiagen) and sequenced. Sequences were analysed with Sequencher (Gene Codes, Ann Arbor, MI). BLAST searches were performed against the Sulfolobus Database (http://sulfolobus.org).

Bioinformatical analysis of CRISPRs of the Sulfolobales

Repeat-clusters were identified using publicly available software (Edgar, 2007; Bland et al., 2007) in all available Sulfolobales genomes [S. solfataricus P2, S. tokodaii 7, S. acidocaldarius DSM 639, Metallosphaera sedula DSM5348 from GenBank (http://www.ncbi.nlm.nih.gov/Genbank/), Sulfolobus islandicus strains LD85, YG5714, YN1551, M164 and U328 from JGI (http://genome.jgi.doe.gov/mic_asmb.html), and S. islandicus strains HVE10/4 and REY15A and Acidianus brierleyi (unpublished data)]. Repeat-cluster names identify the species and number of repeats. Repeat-cluster orientations were determined by locating the upstream leader sequence and/or by examining the repeat sequence. Leader sequences, when present, were limited to 300 bp for the multiple alignment analyses (Edgar, 2004) and motif analyses (Bailey et al., 2006). Representative repeat sequences from each identified repeat-cluster were aligned (Edgar, 2004) and a phylogenetic tree was generated (Higgins et al., 1994). Spacer sequences from each repeat-cluster were aligned (Sæbø et al., 2005) against the genomes of extra-chromosomal elements of the Sulfolobales (http://sulfolobus.org/; Brügger, 2007) at a nucleotide level (Shah et al., 2009). Additionally, spacers were aligned against amino acid sequences of annotated ORFs of the extra-chromosomal elements, at an amino acid level (Shah et al., 2009; Vestergaard et al., 2008). Significance cut-offs were determined for both alignment types by using the genome sequence of Saccharomyces cerevisiae as a negative control.
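The text does not state how the cut-offs were derived from the negative control. One plausible reading, shown here purely as an assumption, is that any alignment score a spacer achieves against the unrelated yeast genome is treated as noise, so only matches scoring above the best negative-control score are retained. The function names, the tuple layout and the scores below are all hypothetical.

```python
def significance_cutoff(negative_scores, margin=0.0):
    """Cut-off = best score seen against the negative control, plus a margin."""
    if not negative_scores:
        raise ValueError("need at least one negative-control score")
    return max(negative_scores) + margin


def significant_matches(hits, cutoff):
    """Keep (spacer_id, target, score) hits scoring strictly above the cut-off."""
    return [hit for hit in hits if hit[2] > cutoff]


# Hypothetical scores (illustrative only, not values from the study).
control_scores = [18.0, 22.5, 20.0]
candidate_hits = [
    ("spacer_1", "virus", 30.0),
    ("spacer_2", "plasmid", 22.5),
]
cutoff = significance_cutoff(control_scores)
kept = significant_matches(candidate_hits, cutoff)
```

Using a strict inequality means a hit that merely ties the best negative-control score is discarded; a separate cut-off would be computed for the nucleotide-level and amino-acid-level alignments, since the two score scales are not comparable.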

Acknowledgements

The work was supported by grants from the Danish Natural Science Research Council and the Danish National Research Foundation.

References

Aagaard, C., Dalgaard, J., and Garrett, R.A. (1995) Intercellular mobility and homing of an archaeal rDNA intron confers selective advantage over intron⁻ cells of Sulfolobus acidocaldarius. Proc Natl Acad Sci USA 92: 12285–12289.

Bailey, T.L., Williams, N., Misleh, C., and Li, W.W. (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 34: 369–373.

Barrangou, R., Fremaux, C., Deveau, H., Richards, M., Boyaval, P., Moineau, S., et al. (2007) CRISPR provides acquired resistance against viruses in prokaryotes. Science 315: 1709–1712.

Beloglazova, N., Brown, G., Zimmerman, M.D., Proudfoot, M., Makarova, K.S., Kudritska, M., et al. (2008) A novel family of sequence-specific endoribonucleases associated with the Clustered Regularly Interspaced Short Palindromic Repeats. J Biol Chem 29: 20361–20371.

Bize, A., Peng, X., Prokofeva, M., Maclellan, K., Lucas, S., Forterre, P., et al. (2008) Viruses in acidic geothermal environments of the Kamchatka Peninsula. Res Microbiol 159: 358–366.

Bland, C., Ramsey, T.L., Sabree, F., Lowe, M., Brown, K., Kyrpides, N.C., and Hugenholtz, P. (2007) CRISPR Recognition Tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics 8: 209.

Bolotin, A., Quinquis, B., Sorokin, A., and Ehrlich, S.D. (2005) Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology 151: 2551–2561.

Brouns, S.J., Jore, M.M., Lundgren, M., Westra, E.R., Slijkhuis, R.J., Snijders, A.P., et al. (2008) Small CRISPR RNAs guide antiviral defense in prokaryotes. Science 321: 960–964.

Brügger, K. (2007) The Sulfolobus database. Nucleic Acids Res 35: D413–D415.

Carte, J., Wang, R., Li, H., Terns, R.M., and Terns, M.P. (2008) Cas6 is an endoribonuclease that generates guide RNAs for invader defense in prokaryotes. Genes Dev 22: 3489–3496.

Chen, L., Brügger, M., Skovgaard, M., Redder, P., She, Q., Torarinsson, E., et al. (2005) The genome of Sulfolobus acidocaldarius, a model organism of the Crenarchaeota. J Bacteriol 187: 4992–4999.

Christiansen, J., Egebjerg, J., Larsen, N., and Garrett, R.A. (1990) Analysis of rRNA structure: experimental and theoretical considerations. In Ribosomes and Protein Synthesis. Spedding, G. (ed.). Oxford: Oxford University Press, pp. 229–252.

Deveau, H., Barrangou, R., Garneau, J.E., Labonté, J., Fremaux, C., Boyaval, P., et al. (2008) Phage response to CRISPR-encoded resistance in Streptococcus thermophilus. J Bacteriol 190: 1390–1400.

Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792–1797.

Edgar, R.C. (2007) PILER-CR: fast and accurate identification of CRISPR repeats. BMC Bioinformatics 8: 18.

Forterre, P. (1992) Neutral terms. Nature 355: 305.

Godde, J.S., and Bickerton, A. (2006) The repetitive DNA elements called CRISPRs and their associated genes: evidence of horizontal transfer among prokaryotes. J Mol Evol 62: 718–729.

Greve, B., Jensen, S., Brügger, K., Zillig, W., and Garrett, R.A. (2004) Genomic comparison of archaeal conjugative plasmids from Sulfolobus. Archaea 1: 231–239.

Grissa, I., Vergnaud, G., and Pourcel, C. (2007) The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats. Bioinformatics 8: 172.

Grogan, D.W. (1996) Exchange of genetic markers at extremely high temperatures in the archaeon Sulfolobus acidocaldarius. J Bacteriol 178: 3207–3211.

Haft, D.H., Selengut, J., Mongodin, E.F., and Nelson, K.E. (2005) A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS Comput Biol 1: 474–483.

Hale, C., Kleppe, K., Terns, R.M., and Terns, M.P. (2008) Prokaryotic silencing (psi) RNAs in Pyrococcus furiosus. RNA 14: 1–8.

Higgins, D., Thompson, J., Gibson, T., Thompson, J.D., Higgins, D.G., and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673–4680.

Horvath, P., Romero, D.A., Coûté-Monvoisin, A.C., Richards, M., Deveau, H., Moineau, S., et al. (2008a) Diversity, activity, and evolution of CRISPR loci in Streptococcus thermophilus. J Bacteriol 190: 1401–1412.

Horvath, P., Coûté-Monvoisin, A.C., Romero, D.A., Boyaval, P., Fremaux, C., and Barrangou, R. (2008b) Comparative analysis of CRISPR loci in lactic acid bacteria genomes. Int J Food Microbiol doi:10.1016/j.ijfoodmicro.2008.05.030.

Jansen, R., Embden, J.D., Gaastra, W., and Schouls, L.M. (2002) Identification of genes that are associated with DNA repeats in prokaryotes. Mol Microbiol 43: 1565–1575.

Klattenhoff, C., and Theurkauf, W. (2008) Biogenesis and germline functions of piRNAs. Development 135: 3–9.

Kunin, V., Sorek, R., and Hugenholtz, P. (2007) Evolutionary conservation of sequence and secondary structures in CRISPR repeats. Genome Biol 8: R61.

Lillestøl, R.K., Redder, P., Garrett, R.A., and Brügger, K. (2006) A putative viral defence mechanism in archaeal cells. Archaea 2: 59–72.

Makarova, K.S., Grishin, N.V., Shabalina, S.A., Wolf, Y.I., and Koonin, E.V. (2006) A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action. Biol Direct 1: 7.

Marraffini, L.A., and Sontheimer, E.J. (2008) CRISPR interference limits horizontal gene transfer in Staphylococci by targeting DNA. Science 322: 1843–1845.

Mojica, F.J., Diez-Villasenor, C., Garcia-Martinez, J., and Soria, E. (2005) Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 60: 174–182.

Peng, X., Brügger, K., Shen, B., Chen, L., She, Q., and Garrett, R.A. (2003) Genus-specific protein binding to the large clusters of DNA repeats (Short Regularly Spaced Repeats) present in Sulfolobus genomes. J Bacteriol 185: 2410–2417.

Pourcel, C., Salvignol, G., and Vergnaud, G. (2005) CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies. Microbiology 151: 653–663.

Prangishvili, D., Forterre, P., and Garrett, R.A. (2006) Viruses<br />

of <strong>the</strong> <strong>Archaea</strong>: a unify<strong>in</strong>g view. Nat Rev Microbiol 4:<br />

837–848.<br />

Sæbø, P.E., Andersen, S.M., Myrseth, J., Laerdahl, J.K.,<br />

and Rognes, T. (2005) PARALIGN: rapid and sensitive<br />

sequence similarity searches powered by parallel comput<strong>in</strong>g<br />

technology. Nucleic Acids Res 33: 535–539.<br />

Schleper, C., Holz, I., Janekovic, D., Murphy, J., and Zillig, W.<br />

(1995) A multicopy plasmid of <strong>the</strong> extremely <strong>the</strong>rmophilic<br />

archaeon Sulfolobus effects its transfer to recipients by<br />

mat<strong>in</strong>g. J Bacteriol 177: 4417–4426.<br />

Shah, S.A., Hansen, N.R., and Garrett, R.A. (2009) Distributions<br />

of <strong>CRISPR</strong> spacer matches <strong>in</strong> viruses and plasmids<br />

of crenarchaeal acido<strong>the</strong>rmophiles and implications for<br />

<strong>the</strong>ir <strong>in</strong>hibitory mechanism. Biochem Soc Trans 37: 23–28.<br />

She, Q., Phan, H., Garrett, R.A., Albers, S.-V., Stedman,<br />

K.M., and Zillig, W. (1998) Genetic profile of pNOB8 from<br />

Sulfolobus: <strong>the</strong> first conjugative plasmid from an archaeon.<br />

Extremophiles 2: 417–425.<br />

She, Q., S<strong>in</strong>gh, R.K., Confalonieri, F., Zivanovic, Y., Gordon,<br />

P., Allard, G., et al. (2001) The complete genome of <strong>the</strong><br />

crenarchaeon Sulfolobus solfataricus P2. Proc Natl Acad<br />

Sci USA 98: 7835–7840.<br />

Sorek, R., Kun<strong>in</strong>, V., and Hugenholtz, P. (2008) <strong>CRISPR</strong> – a<br />

widespread <strong>system</strong> that provides acquired resistance<br />

aga<strong>in</strong>st phages <strong>in</strong> bacteria and archaea. Nat Rev<br />

Microbiol 6: 181–186.<br />

Sunkar, R., Girke, T., and Zhu, J.K. (2005) Identification and<br />

characterization of endogenous small <strong>in</strong>terfer<strong>in</strong>g RNAs<br />

from rice. Nucleic Acids Res 33: 4443–4454.<br />

Tang, T.-H., Bachellerie, J.-P., Rozhdestvensky, T., Bortol<strong>in</strong>,<br />

M.-L., Huber, H., Drungowski, M., et al. (2002) Identification<br />

of 86 candidates for small non-messenger RNAs from<br />

<strong>the</strong> archaeon Archaeoglobus fulgidus. Proc Natl Acad Sci<br />

USA 99: 7536–7541.<br />

Tang, T.-H., Polacek, N., Zywicki, M., Huber, H., Brügger, K.,<br />

Garrett, R.A., et al. (2005) Identification of novel noncod<strong>in</strong>g<br />

RNAs as potential antisense regulators <strong>in</strong> <strong>the</strong><br />

archaeon Sulfolobus solfataricus. Mol Microbiol 55: 469–<br />

481.<br />

Torar<strong>in</strong>sson, E., Klenk, H.P., and Garrett, R.A. (2005) Divergent<br />

transcriptional and translational signals <strong>in</strong> <strong>Archaea</strong>.<br />

Environ Microbiol 7: 47–54.<br />

Vestergaard, G., Shah, S.A., Bize, A., Reitberger, W., Reuter,<br />

M., Phan, H., et al. (2008) SRV, a new rudiviral isolate from<br />

Stygiolobus and <strong>the</strong> <strong>in</strong>terplay of crenarchaeal rudiviruses<br />

with <strong>the</strong> host viral-defence <strong>CRISPR</strong> <strong>system</strong>. J Bacteriol<br />

190: 6837–6845.<br />

© 2009 The Authors<br />

Journal compilation © 2009 Blackwell Publish<strong>in</strong>g Ltd, Molecular Microbiology, 72, 259–272


Environmental Microbiology (2009) doi:10.1111/j.1462-2920.2009.02009.x<br />

Four newly isolated fuselloviruses from extreme<br />

geo<strong>the</strong>rmal environments reveal unusual morphologies<br />

and a possible <strong>in</strong>terviral recomb<strong>in</strong>ation mechanism<br />

Peter Redder, 1 * Xu Peng, 2 Kim Brügger, 2<br />

Shiraz A. Shah, 2 Ferd<strong>in</strong>and Roesch, 1 Bo Greve, 2<br />

Qunx<strong>in</strong> She, 2 Christa Schleper, 3 Patrick Forterre, 1<br />

Roger A. Garrett 2 and David Prangishvili 1<br />

1 Unité de Biologie Moléculaire du Gène chez les<br />

Extrêmophiles, Institut Pasteur, 25, rue du Dr Roux,<br />

F-75015 Paris, France.<br />

2 Danish <strong>Archaea</strong> Centre, Department of Biology,<br />

Biocenter, Ole Maaløesvej 5, Copenhagen University,<br />

DK-2200 Copenhagen N, Denmark.<br />

3 Department of Genetics <strong>in</strong> Ecology, University of<br />

Vienna, Althanstrasse 14, A-1090 Vienna, Austria.<br />

Summary<br />

Sp<strong>in</strong>dle-shaped virus-like particles are abundant <strong>in</strong><br />

extreme geo<strong>the</strong>rmal environments, from which five<br />

sp<strong>in</strong>dle-shaped viral species have been isolated to<br />

date. They <strong>in</strong>fect members of <strong>the</strong> hyper<strong>the</strong>rmophilic<br />

archaeal genus Sulfolobus, and constitute <strong>the</strong> Fuselloviridae,<br />

a family of double-stranded DNA viruses.<br />

Here we present four new members of this family, all<br />

from terrestrial acidic hot spr<strong>in</strong>gs. Two of <strong>the</strong> new<br />

viruses exhibit a novel morphotype for <strong>the</strong>ir proposed<br />

attachment structures, and specific features of <strong>the</strong>ir<br />

genome sequences strongly suggest <strong>the</strong> identity of<br />

<strong>the</strong> host-attachment prote<strong>in</strong>. All fuselloviral genomes<br />

are highly conserved at <strong>the</strong> nucleotide level, although<br />

<strong>the</strong> regions of conservation differ between virus pairs,<br />

consistent with a high frequency of homologous<br />

recomb<strong>in</strong>ation hav<strong>in</strong>g occurred between <strong>the</strong>m.<br />

We propose a fusellovirus-specific mechanism for<br />

<strong>in</strong>terviral recomb<strong>in</strong>ation, and show that <strong>the</strong> spacers of<br />

<strong>the</strong> Sulfolobus <strong>CRISPR</strong> antiviral <strong>system</strong> are not<br />

biased to <strong>the</strong> highly similar regions of <strong>the</strong> fusellovirus<br />

genomes.<br />

Received 2 April, 2009; accepted 18 June, 2009. *For correspondence.<br />

E-mail peterredder@gmail.com; Tel. (+41) 774000253; Fax<br />

(+41) 223795108.<br />

© 2009 Society for Applied Microbiology and Blackwell Publish<strong>in</strong>g Ltd<br />

Introduction<br />

In contrast to <strong>the</strong> ra<strong>the</strong>r uniform landscape of virion<br />

morphotypes <strong>in</strong> aquatic <strong>system</strong>s under moderate environmental<br />

conditions, ma<strong>in</strong>ly represented by tailed bacteriophages<br />

(reviewed by Prangishvili, 2003), virus-like<br />

particles observed <strong>in</strong> ecological niches at high temperatures,<br />

low pH or high sal<strong>in</strong>ity reveal a high diversity of<br />

complex morphotypes (Guixa-Boixareu et al., 1996; Oren<br />

et al., 1997; Rice et al., 2001; Rachel et al., 2002; Här<strong>in</strong>g<br />

et al., 2005; Porter et al., 2007; Bize et al., 2008). About<br />

40 virus species isolated from such environments, all<br />

carry<strong>in</strong>g double-stranded (ds) DNA genomes, have been<br />

described, which <strong>in</strong>fect members of <strong>the</strong> third doma<strong>in</strong> of<br />

life, <strong>the</strong> <strong>Archaea</strong> (reviewed <strong>in</strong> Prangishvili et al., 2006a).<br />

Most common are viruses with an overall sp<strong>in</strong>dle-shaped<br />

morphology, ei<strong>the</strong>r tail-less, tailed or even two-tailed,<br />

which taxonomically have been assigned to <strong>the</strong> viral<br />

families Fuselloviridae (SSV1, SSV2, SSV4, SSVrh and<br />

SSVk1, s<strong>in</strong>gle-tailed), Bicaudaviridae (ATV, two-tailed)<br />

and <strong>the</strong> genus Salterprovirus (His 1 and His 2) while some<br />

still require classification (STSV1 and PAV1) (Schleper<br />

et al., 1992; Bath and Dyall-Smith, 1998; Arnold et al.,<br />

1999; Gesl<strong>in</strong> et al., 2003; Wiedenheft et al., 2004; Xiang<br />

et al., 2005; Bath et al., 2006; Prangishvili et al., 2006b;<br />

Peng, 2008).<br />

Five fuselloviruses have so far been isolated from<br />

acidic geo<strong>the</strong>rmal environments <strong>in</strong> different locations <strong>in</strong><br />

Asia, Europe and North America, and <strong>the</strong>y replicate <strong>in</strong><br />

species of <strong>the</strong> hyper<strong>the</strong>rmophilic archaeal genus Sulfolobus,<br />

which represents a significant percentage of <strong>the</strong><br />

microbial population <strong>in</strong> most acidic terrestrial hot spr<strong>in</strong>gs.<br />

Ano<strong>the</strong>r major player <strong>in</strong> <strong>the</strong>se environments is <strong>the</strong> genus<br />

Acidianus, from which several viruses have been isolated,<br />

<strong>in</strong>clud<strong>in</strong>g <strong>the</strong> l<strong>in</strong>ear filamentous and rod-shaped viruses<br />

AFV1 and ARV1, respectively, which have close viral relatives<br />

that also <strong>in</strong>fect Sulfolobus (Prangishvili et al., 2006a;<br />

Snyder et al., 2007). Although <strong>the</strong> two genera coexist, no<br />

fusellovirus has yet been isolated from Acidianus, even<br />

though fuselloviruses appear to be <strong>the</strong> predom<strong>in</strong>ant Sulfolobus<br />

viral type.<br />

The circular dsDNA genomes of <strong>the</strong> five known fuselloviruses<br />

are highly similar at both nucleotide and am<strong>in</strong>o acid<br />

sequence levels, with <strong>the</strong> majority of gene products be<strong>in</strong>g



Fig. 1. A. Representative electron microscopy images of SSV6, SSV7 and ASV1. The end-filaments of SSV7 are very sticky, and <strong>the</strong> virus is<br />

almost always observed <strong>in</strong> ‘rosettes’ or attached to vesicles (white arrows). A rare s<strong>in</strong>gle SSV7 is also shown (dotted white arrow). SSV6 and<br />

ASV1 do not have sticky ends and are always s<strong>in</strong>gle, even when ly<strong>in</strong>g close toge<strong>the</strong>r. Fur<strong>the</strong>rmore, SSV6 and ASV1 exhibit a wide range of<br />

morphotypes, vary<strong>in</strong>g from <strong>the</strong> standard sp<strong>in</strong>dle shape to an elongated sausage shape (<strong>in</strong>dicated by dotted black arrows for SSV6).<br />

B. Magnification of <strong>the</strong> end-filaments of <strong>the</strong> three viruses. The filaments of SSV6 and ASV1 are thick, and seem to form a crown around <strong>the</strong><br />

virus tips (black arrows) whereas SSV7 carries th<strong>in</strong>ner filaments, that protrude directly from <strong>the</strong> virus tips. All samples were negatively sta<strong>in</strong>ed<br />

with 2% uranyl acetate, and all scale bars are 100 nm.<br />

of unknown function and lack<strong>in</strong>g homologues <strong>in</strong> public<br />

sequence databases o<strong>the</strong>r than <strong>in</strong> o<strong>the</strong>r archaeal viruses<br />

(Wiedenheft et al., 2004). The viral DNA is protected<br />

aga<strong>in</strong>st <strong>the</strong> harsh environment, at temperatures above<br />

80°C and pH values below 2, with<strong>in</strong> a sp<strong>in</strong>dle-shaped<br />

virion about 100 nm long and 60 nm wide, with a bunch of<br />

short, th<strong>in</strong> fibres at one of <strong>the</strong> po<strong>in</strong>ted ends (Mart<strong>in</strong> et al.,<br />

1984; Stedman et al., 2003; Wiedenheft et al., 2004;<br />

Peng, 2008). In electron micrographs, <strong>the</strong> body is<br />

sometimes observed to be slightly elongated and more<br />

‘cigar-shaped’, and <strong>the</strong> tail fibres appear to be quite sticky,<br />

readily attach<strong>in</strong>g to cellular fragments, as well as l<strong>in</strong>k<strong>in</strong>g<br />

virions to produce rosette-like aggregates (Fig. 1A –<br />

SSV7).<br />

SSV1 is <strong>the</strong> best studied fusellovirus, and <strong>the</strong> virion has<br />

been shown to conta<strong>in</strong> prote<strong>in</strong>s VP1, VP2, VP3 and small<br />

amounts of SSV1_D244 and SSV1_C792 (Reiter et al.,<br />

1987a; Menon et al., 2008). VP1 and VP3 are thought to<br />

be capsid prote<strong>in</strong>s, whereas VP2 has been assigned a<br />

DNA-b<strong>in</strong>d<strong>in</strong>g role, organiz<strong>in</strong>g DNA, but it is not encoded<br />

by <strong>the</strong> o<strong>the</strong>r previously described fuselloviruses (Stedman et al., 2003; Wiedenheft<br />

et al., 2004). Four non-structural SSV1 prote<strong>in</strong>s have<br />

been characterized. SSV1_D63 is considered to l<strong>in</strong>k<br />

two different prote<strong>in</strong> complexes, while SSV1_F93 and<br />



Fig. 2. A. Graphical alignment of <strong>the</strong> n<strong>in</strong>e circular fuselloviral genomes, l<strong>in</strong>earized at <strong>the</strong> first nucleotide after <strong>the</strong> VP3 stop codon (follow<strong>in</strong>g<br />

<strong>the</strong> convention of Wiedenheft et al., 2004). All ORFs larger than 50 am<strong>in</strong>o acids are <strong>in</strong>dicated by arrows. Shades of blue and green: 13 ‘core’<br />

genes. Dark grey: ORFs found <strong>in</strong> two or more fuselloviruses. Light grey: ORFs only found <strong>in</strong> one fusellovirus. Black: VP2. Yellow:<br />

SSV1_C792 homologues, both full length and partial. Red: SSV6_B1232 homologues. Orange: SSV1_B78 homologues. Light p<strong>in</strong>k:<br />

SSV1_D244 homologues associated with <strong>the</strong> Integrase operon <strong>in</strong> all but ASV1 and SSVk1. Dark violet and light violet: Rad3-like helicase and<br />

Msed_2283 homologues substitut<strong>in</strong>g for a large part of <strong>the</strong> Integrase operon <strong>in</strong> ASV1, SSV7 and SSVk1. Dark p<strong>in</strong>k: SSV1_F93 homologues.<br />

Brown: Highly conserved SSV1_C84 homologue overlapp<strong>in</strong>g with some of <strong>the</strong> o<strong>the</strong>r ‘core’ genes. Magenta: SSV1_C80 homologues and<br />

ASV1_A59. The transcripts identified by Fröls and colleagues (2007) are <strong>in</strong>dicated below SSV1.<br />

B. The two different putative end-filament modules, exemplified by SSV1 and SSV6.<br />

SSV1_F112 are DNA b<strong>in</strong>d<strong>in</strong>g prote<strong>in</strong>s implicated <strong>in</strong> transcriptional<br />

regulation (Kraft et al., 2004a,b; Menon et al.,<br />

2008). The fourth prote<strong>in</strong> is an <strong>in</strong>tegrase of <strong>the</strong> tyros<strong>in</strong>e<br />

recomb<strong>in</strong>ase family, which catalyses site-specific <strong>in</strong>tegration<br />

of <strong>the</strong> viral genome <strong>in</strong>to <strong>the</strong> host chromosome. As <strong>the</strong><br />

viral recomb<strong>in</strong>ation site (attP) is located with<strong>in</strong> <strong>the</strong> <strong>in</strong>tegrase<br />

gene, <strong>in</strong>tegration leads to gene partition (Palm<br />

et al., 1991; Muskhelishvili et al., 1993). Despite this<br />

highly specialized adaptation, <strong>the</strong> <strong>in</strong>tegrase was recently<br />


shown to be non-essential for virus replication and basic<br />

viral functions (Clore and Stedman, 2007).<br />

Replication of SSV1 and SSV2 can be <strong>in</strong>duced by UV<br />

irradiation (Yeats et al., 1982; Stedman et al., 2003). The<br />

SSV1 transcription cycle, follow<strong>in</strong>g UV <strong>in</strong>duction, has also<br />

been elucidated by Nor<strong>the</strong>rn analysis, physical mapp<strong>in</strong>g<br />

and DNA microarrays, and transcripts were classified as<br />

early (T5, T6 and T9), late (T3, Tx and T8) and UV <strong>in</strong>ducible<br />

(T<strong>in</strong>d) (Fig. 2) (Reiter et al., 1987b; Fröls et al., 2007).



The prote<strong>in</strong>s encoded <strong>in</strong> <strong>the</strong> early transcripts of SSV1,<br />

and <strong>the</strong>ir homologues <strong>in</strong> o<strong>the</strong>r fuselloviruses, are often<br />

cyste<strong>in</strong>e-rich compared with prote<strong>in</strong>s encoded <strong>in</strong> <strong>the</strong> late<br />

transcripts (Palm et al., 1991; Stedman et al., 2003;<br />

Wiedenheft et al., 2004). This has recently been proposed<br />

to be due to <strong>in</strong>tra- and extra-cellular localization of <strong>the</strong><br />

early and late prote<strong>in</strong>s respectively (Menon et al., 2008).<br />

Here we report on <strong>the</strong> isolation and properties of four<br />

novel members of <strong>the</strong> Fuselloviridae, <strong>in</strong>fect<strong>in</strong>g species of<br />

<strong>the</strong> hyper<strong>the</strong>rmophilic archaeal genera Sulfolobus and<br />

Acidianus, almost doubl<strong>in</strong>g <strong>the</strong> number of known fuselloviruses<br />

and extend<strong>in</strong>g <strong>the</strong>ir host-range to a new genus,<br />

Acidianus. This merited a revised comparative genomic<br />

analysis of fuselloviruses, which provided <strong>in</strong>sights <strong>in</strong>to<br />

functions of some viral prote<strong>in</strong>s and addressed general<br />

questions concern<strong>in</strong>g <strong>the</strong> evolution of <strong>the</strong> viruses and<br />

<strong>in</strong>teractions with <strong>the</strong>ir hosts.<br />

Results<br />

Isolation of virus–host <strong>system</strong>s<br />

Three different methods were used to acquire <strong>the</strong> new<br />

viruses reported <strong>in</strong> this communication. SSV5 was discovered<br />

as an extrachromosomal element with<strong>in</strong> cells of<br />

S. solfataricus P2 (DSM1617), <strong>in</strong>fected as a result of<br />

mix<strong>in</strong>g <strong>the</strong> cells with an Icelandic HVE14 enrichment<br />

culture (see Experimental procedures). This traditional<br />

method of isolat<strong>in</strong>g new viruses allows a large number of<br />

viruses to be screened, but it restricts <strong>the</strong> search to<br />

viruses able to <strong>in</strong>fect <strong>the</strong> chosen host.<br />

A different approach was used for SSV6 and SSV7,<br />

where transmission electron microscopy analysis of <strong>the</strong><br />

supernatant from an enrichment of <strong>the</strong> G4 site at<br />

Hveragerdi, Iceland, revealed a large number of fuselloviruses.<br />

Attempts to isolate s<strong>in</strong>gle virus–host <strong>system</strong>s by<br />

colony purification resulted <strong>in</strong> two pure stra<strong>in</strong>s, each harbour<strong>in</strong>g<br />

a different fusellovirus. Stra<strong>in</strong> G4T-1 was a host<br />

for Sulfolobus sp<strong>in</strong>dle-shaped virus 7 (SSV7) while stra<strong>in</strong><br />

G4ST-T-11 was <strong>the</strong> natural producer of a pleiomorphic<br />

virus named Sulfolobus sp<strong>in</strong>dle-shaped virus 6, SSV6.<br />

SSV7 was found to be produced <strong>in</strong> very low amounts<br />

under normal growth conditions, but it was possible to<br />

<strong>in</strong>crease SSV7 production about 10-fold (as estimated by<br />

count<strong>in</strong>g viral particles <strong>in</strong> <strong>the</strong> electron microscope), ei<strong>the</strong>r<br />

by shift<strong>in</strong>g <strong>the</strong> culture to a medium with lower tryptone<br />

concentration, or by <strong>in</strong>duc<strong>in</strong>g with UV light. Stra<strong>in</strong>s G4T-1<br />

and G4ST-T-11 may <strong>in</strong> fact belong to <strong>the</strong> same species, as <strong>the</strong>ir<br />

partial 16S rRNA sequences were both identical to that of S. islandicus<br />

stra<strong>in</strong> I7 (AY247894.1), with a s<strong>in</strong>gle base substitution<br />

dist<strong>in</strong>guish<strong>in</strong>g <strong>the</strong>m from S. solfataricus P2. This virus<br />

isolation approach did not impose any bias on <strong>the</strong> choice<br />

of viral host (except for choos<strong>in</strong>g <strong>the</strong> growth conditions),<br />

and it provided a ‘natural’ virus–host <strong>system</strong>. However, a<br />

bias is imposed on <strong>the</strong> virus, because s<strong>in</strong>gle colonies<br />

of <strong>the</strong> host cannot be isolated if <strong>the</strong> virus is<br />

highly lytic under <strong>the</strong> chosen conditions.<br />

F<strong>in</strong>ally, <strong>the</strong> fourth virus described here, Acidianus<br />

sp<strong>in</strong>dle-shaped virus 1, ASV1, was discovered as an<br />

extrachromosomal and <strong>in</strong>tegrated element <strong>in</strong> <strong>the</strong> course<br />

of sequenc<strong>in</strong>g <strong>the</strong> genome of Acidianus brierleyi<br />

DSM1651, and <strong>the</strong> production of virions was subsequently<br />

confirmed by electron microscopy (Fig. 1). While<br />

this method for isolat<strong>in</strong>g new viruses is not generally<br />

applicable, <strong>the</strong> detection of extrachromosomal elements<br />

dur<strong>in</strong>g <strong>the</strong> sequenc<strong>in</strong>g of genomes from stra<strong>in</strong> collections<br />

is likely to become more common.<br />

Morphology<br />

The sp<strong>in</strong>dle-shaped virion of SSV7, ~90 nm long and<br />

~50 nm wide, resembles virions of all previously known<br />

fuselloviruses morphologically, as well as by its tendency<br />

to form ‘rosettes’ by stick<strong>in</strong>g to neighbour<strong>in</strong>g viral particles<br />

(Fig. 1). In contrast, negatively sta<strong>in</strong>ed virions of SSV6<br />

and ASV1 both appear much more pleiomorphic than <strong>the</strong><br />

o<strong>the</strong>r fuselloviruses, and assume shapes rang<strong>in</strong>g from<br />

th<strong>in</strong> cigar-like to pear-like, with tail fibres at <strong>the</strong> end correspond<strong>in</strong>g<br />

to where <strong>the</strong> pear ‘stalk’ would be (Fig. 1).<br />

The virion bodies and tail fibres of ASV1 and SSV6<br />

seem to differ from those of <strong>the</strong> o<strong>the</strong>r fuselloviruses.<br />

Instead of multiple th<strong>in</strong> fibres, <strong>the</strong>se virions carry three or four<br />

thicker, slightly curved fibres that appear to protrude<br />

sideways, not from <strong>the</strong> particle apex but from a po<strong>in</strong>t<br />

slightly more towards <strong>the</strong> body (Fig. 1B). Fur<strong>the</strong>rmore, <strong>the</strong><br />

ASV1 and SSV6 fibres seem to be less ‘sticky’ than <strong>the</strong>ir<br />

th<strong>in</strong> counterparts, and <strong>the</strong> characteristic ‘rosettes’ were<br />

never observed for ASV1 and SSV6.<br />

To exclude <strong>the</strong> possibility that <strong>the</strong> observed pleiomorphicity of SSV6<br />

virions was an artifact caused by <strong>the</strong> purification process,<br />

or by uranyl-acetate sta<strong>in</strong><strong>in</strong>g, two control experiments<br />

were carried out. (i) The virions were analysed by EM<br />

directly after removal of host cells by mild centrifugation at<br />

4000 r.p.m. (Jouan S40 rotor), and although omitt<strong>in</strong>g <strong>the</strong><br />

concentration step yielded few virions, <strong>the</strong>y exhibited <strong>the</strong><br />

normal pleiomorphicity. (ii) The virion pleiomorphicity was<br />

also observed when we used phosphotungstate as an<br />

alternative contrast<strong>in</strong>g agent (not shown), confirm<strong>in</strong>g that<br />

<strong>the</strong> heterogeneity of <strong>the</strong> shape was an <strong>in</strong>tegral property of<br />

<strong>the</strong> virions ra<strong>the</strong>r than a result of <strong>the</strong> experimental treatment.<br />

Moreover, SSV7 virions, for which little to no pleiomorphicity<br />

was observed, were rout<strong>in</strong>ely treated <strong>in</strong> exactly<br />

<strong>the</strong> same manner as SSV6 and ASV1 virions (Fig. 1A).<br />

Genomic organization and comparison<br />

Ow<strong>in</strong>g to <strong>the</strong>ir special structural properties, we orig<strong>in</strong>ally<br />

suspected that ASV1 and SSV6 were representatives of<br />



a new sp<strong>in</strong>dle-shaped viral family. However, genome<br />

analyses revealed that <strong>the</strong>y, and <strong>the</strong> SSV5 and SSV7<br />

isolates, are all closely related to known members of <strong>the</strong><br />

family Fuselloviridae, and we <strong>the</strong>refore assign <strong>the</strong> four<br />

newly isolated viruses to this family. The similarities are<br />

evident, both <strong>in</strong> terms of overall gene synteny and<br />

sequence similarity (Table 1, Figs 2 and 3), and also<br />

extend to <strong>the</strong> distribution of <strong>the</strong> cyste<strong>in</strong>e codons <strong>in</strong> a<br />

manner that supports <strong>the</strong> f<strong>in</strong>d<strong>in</strong>gs of Menon and<br />

colleagues (2008).<br />
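As a rough illustration of how a cysteine-codon distribution can be compared between ORFs, the following Python sketch computes the fraction of codons in an ORF that encode cysteine (TGT or TGC). The function name and the example sequence are hypothetical and are not taken from the fuselloviral genomes; the sketch assumes an in-frame DNA sequence on the coding strand and is not the procedure used in this study.

```python
# Codons that encode cysteine in the standard genetic code.
CYS_CODONS = {"TGT", "TGC"}


def cysteine_codon_fraction(orf: str) -> float:
    """Return the fraction of complete codons in `orf` that encode cysteine.

    `orf` is assumed to be an in-frame coding-strand DNA sequence;
    trailing bases that do not fill a complete codon are ignored.
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return 0.0
    return sum(codon in CYS_CODONS for codon in codons) / len(codons)


# Hypothetical toy ORF: ATG TGT TGC TAA, i.e. two cysteine codons out of four.
toy_fraction = cysteine_codon_fraction("ATGTGTTGCTAA")
```

Applied to the ORFs of early and late transcripts separately, such fractions would give a simple per-gene measure to compare between the two groups.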

Sequence similarity among <strong>the</strong> fuselloviruses<br />

The genome of ASV1 comprises 24 186 bp and is by far <strong>the</strong><br />

largest of <strong>the</strong> fuselloviruses, and one or two gene duplications<br />

appear to have occurred (ASV1_B91 and<br />

ASV1_C137), as well as <strong>the</strong> acquisition of new genes.<br />

Most of <strong>the</strong> ASV1 genome is closely related to <strong>the</strong> o<strong>the</strong>r<br />

fuselloviruses, with several regions of more than 75%<br />

identity at <strong>the</strong> nucleotide level (Fig. 3). One 5.6 kb region<br />

that is similar to SSV6 starts <strong>in</strong> <strong>the</strong> middle of ASV1_C213<br />

and ends <strong>in</strong> ASV1_B90 (Fig. 3D).<br />

The close relationship between some fuselloviruses<br />

is illustrated by an extreme example, SSV4 and SSV5,<br />

where a 7.9 kb region is almost 100% identical (Fig. 3B),<br />

consistent with a recent recomb<strong>in</strong>ation event hav<strong>in</strong>g<br />

occurred between <strong>the</strong> viruses. Moreover, <strong>the</strong> junctions of<br />

nucleotide similarity regions are generally <strong>in</strong>tragenic, such<br />

that sections of high sequence similarity are mostly short,<br />

distributed all over <strong>the</strong> genomes, and often start and stop<br />

<strong>in</strong> <strong>the</strong> middle of open read<strong>in</strong>g frames (ORFs) (Fig. 3).<br />

These patterns of similarity raise <strong>in</strong>terest<strong>in</strong>g questions<br />

concern<strong>in</strong>g <strong>in</strong>terplay and recomb<strong>in</strong>ation between fuselloviral<br />

genomes.<br />
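The regional similarity patterns described above can be explored with a sliding-window identity calculation. The sketch below is only illustrative: it assumes two sequences that have already been aligned to equal length (gaps written as '-'), and the window size, step and function names are assumptions. Only the 75% threshold echoes the value quoted in the text, and this is not necessarily the method used to produce Fig. 3.

```python
def window_identity(a: str, b: str, window: int, step: int):
    """Percent identity per window for two pre-aligned, equal-length sequences.

    Returns a list of (window_start, percent_identity) tuples.
    Gap characters ('-') never count as matches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(a) - window + 1, step):
        pairs = zip(a[start:start + window], b[start:start + window])
        matches = sum(x == y and x != "-" for x, y in pairs)
        results.append((start, 100.0 * matches / window))
    return results


def conserved_windows(a: str, b: str, window: int, step: int,
                      threshold: float = 75.0):
    """Start positions of windows whose identity exceeds `threshold` percent."""
    return [start for start, pid in window_identity(a, b, window, step)
            if pid > threshold]


# Hypothetical toy alignment: the first half is identical, the second differs.
toy = window_identity("AAAAAAAA", "AAAATTTT", window=4, step=4)
```

Because the windows are placed independently of gene boundaries, conserved stretches that begin or end inside an ORF show up directly as window start positions that fall within annotated genes.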

The presence of regions of nucleotide identity between<br />

<strong>the</strong> fuselloviruses raises <strong>the</strong> question as to how <strong>the</strong>y avoid<br />

<strong>the</strong> extensive antiviral <strong>CRISPR</strong> <strong>system</strong>s present <strong>in</strong> all<br />

sequenced Sulfolobus genomes. Therefore, we analysed<br />

<strong>the</strong> correlation between sequence match<strong>in</strong>g of <strong>CRISPR</strong> spacers<br />

and fuselloviral genomes. A total of 3420<br />

<strong>CRISPR</strong> spacer sequences were obta<strong>in</strong>ed from four complete<br />

and n<strong>in</strong>e <strong>in</strong>complete Sulfolobales genomes (after<br />

subtract<strong>in</strong>g <strong>the</strong> 278 spacers which S. solfataricus P1 and<br />

P2 have <strong>in</strong> common). N<strong>in</strong>ety-one of <strong>the</strong>se spacers match<br />

one or more of <strong>the</strong> fuselloviruses at <strong>the</strong> nucleotide<br />

sequence level. An additional 101 spacers were found<br />

to match one or more fuselloviruses when <strong>the</strong> search was extended<br />

to <strong>the</strong> am<strong>in</strong>o acid sequence level. Thus, out of<br />

<strong>the</strong> 3420 Sulfolobales spacers, a total of 192 (91 + 101) yield<br />

436 significant matches to fuselloviral genomes. The latter<br />

number exceeds <strong>the</strong> former because many spacers,<br />

especially at <strong>the</strong> am<strong>in</strong>o acid sequence level, yield<br />

matches to more than one virus, and because some<br />

spacers match repeats with<strong>in</strong> <strong>the</strong> same viral genome.<br />

We found no bias of spacer matches towards conserved<br />

regions, and it is possible that fuselloviruses<br />

recomb<strong>in</strong>e frequently enough to reduce <strong>the</strong><br />

effectiveness of <strong>the</strong> <strong>CRISPR</strong> <strong>system</strong>. The results are<br />

summarized <strong>in</strong> Fig. 4, exemplified by SSV2 which has <strong>the</strong><br />

highest number of spacer matches, and by <strong>the</strong> most distantly<br />

related fusellovirus, ASV1. The spacer matches<br />

occur on both strands of <strong>the</strong> viruses, consistent with DNA<br />

recognition by <strong>the</strong> spacer transcripts, as recently proposed<br />

(Marraff<strong>in</strong>i and Son<strong>the</strong>imer, 2008; Shah et al.,<br />

2009).<br />
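A simplified sketch of spacer matching at the nucleotide level follows. It assumes exact matching, which is stricter than the similarity search reported here, and it treats the genome as circular so that matches spanning the linearization point are found. Both strands are searched by also querying the reverse complement of the spacer. All function names and the toy sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spacer_matches(genome: str, spacer: str):
    """Exact occurrences of `spacer` on either strand of a circular genome.

    Returns (start, strand) tuples, where `start` is the 0-based position
    on the plus strand of the linearized genome.
    """
    # Append the first len(spacer) - 1 bases so that matches spanning
    # the origin of the linearized sequence are still found.
    circular = genome + genome[:len(spacer) - 1]
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        start = circular.find(query)
        while start != -1:
            hits.append((start, strand))
            start = circular.find(query, start + 1)
    return hits


# Hypothetical toy genome and spacers, for illustration only.
toy_genome = "ACGTTGCAAT"
origin_hit = find_spacer_matches(toy_genome, "AATAC")    # spans the origin
both_strands = find_spacer_matches(toy_genome, "TGCAA")  # hit on each strand

# Spacer tally as reported in the text: 91 nucleotide-level matches plus
# 101 additional matches found only at the amino acid level.
total_matching_spacers = 91 + 101
```

Counting plus- and minus-strand hits separately is what allows the strand distribution of matches to be examined; matches on both strands are what the text interprets as consistent with DNA recognition.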

Encoded prote<strong>in</strong>s<br />


Many of <strong>the</strong> ORFs encoded on ASV1 yield no, or very<br />

weak, matches <strong>in</strong> public sequence databases, especially<br />

ORFs found <strong>in</strong> <strong>the</strong> ‘extra’ ~6 kb that are not present <strong>in</strong><br />

o<strong>the</strong>r fuselloviruses. Exceptions are ASV1_B276 and<br />

C106, which are homologous to genes from an <strong>in</strong>tegrated<br />

virus <strong>in</strong> <strong>the</strong> Sulfolobus tokodaii chromosome (ST1724 and<br />

ST1725), and ASV1_A59, which exhibits sequence similarity<br />

to CopG transcriptional regulators <strong>in</strong> M. sedula and<br />

S. acidocaldarius. Both SSV6 and ASV1 encode a homologue<br />

of <strong>the</strong> structural prote<strong>in</strong> SSV1_VP2, which is absent<br />

from <strong>the</strong> o<strong>the</strong>r six fuselloviruses. Fur<strong>the</strong>rmore, SSV6 and<br />

ASV1 encode nei<strong>the</strong>r a full-length SSV1_C792 homologue<br />

nor an SSV1_B78 homologue, unlike all o<strong>the</strong>r fuselloviruses<br />

(Fig. 2B). Instead, <strong>the</strong>y carry two o<strong>the</strong>r genes: a<br />

small gene (SSV6_C213 and ASV1_B208) homologous<br />

only to <strong>the</strong> C-term<strong>in</strong>al 170 aa of SSV1_C792, and follow<strong>in</strong>g<br />

this gene, a large ORF (ASV1_A1231 and SSV6_<br />

B1232), which is similar to Saci_1002 from Sulfolobus<br />

acidocaldarius (49% identity, 65% similarity, for<br />

SSV6_B1232). No o<strong>the</strong>r sequence similarity is found <strong>in</strong><br />

databases, but a clue to <strong>the</strong> function of both <strong>the</strong><br />

SSV1_C792 and SSV6_B1232 homologues is given by<br />

<strong>the</strong> Phyre fold-prediction-server (Kelley and Sternberg,<br />

2009), which suggest <strong>the</strong>y both have a fold similar to<br />

<strong>the</strong> adsorption prote<strong>in</strong> P2 from bacteriophage prd1<br />

(E-value < 0.5, estimated precision 85%).<br />

ASV1, SSV7 and SSVk1 differ from the other fuselloviruses by lacking all genes of the SSV1_T5 operon except the integrase and, for ASV1 and SSVk1, a predicted helix–turn–helix transcriptional regulator (Fig. 2). Instead, the three viruses carry a set of ORFs on the plus-strand, which encode a putative Rad3-like helicase, an Msed_2283 homologue (hypothetical protein) and a few small proteins (Fig. 2).

Besides these peculiarities of the individual genomes, analyses have revealed 13 genes that are conserved in all nine fuselloviruses. These ‘core’ genes include VP1 and VP3, the integrase and three putative transcriptional regulators, including one helix–turn–helix and two zinc-finger proteins (Fig. 2). The attP sites within the integrase genes


6 P. Redder et al.<br />

Table 1. Genes in SSV5, SSV6, SSV7 and ASV1, as well as the homologues from other fuselloviruses.

Virus | Genome size (bp) | Isolated from | Matching S. solfataricus P2 tRNA of the attP site in the integrase gene (anticodon)
SSV1 | 15 465 | Japan | Arg (CCG)
SSV2 | 14 796 | Iceland | Gly (CCC)
SSV4 | 15 135 | Iceland | Glu (TTC)
SSV5 | 15 330 | Iceland | Gln (CTG)
SSVk1 | 17 385 | Kamchatka, Russia | Asp (GTC), Glu (CTC), Glu (TTC)
SSVrh | 16 473 | USA | Leu (GAG)
SSV6 | 15 684 | Iceland | Gln (CTG)
SSV7 | 17 602 | Iceland | Gly (CCC)
ASV1 | 24 186 | USA | Lys (TTT)

Gene columns (left to right): SSV1, SSV2, SSV4, SSV5, SSVk1, SSVrh, SSV6, SSV7, ASV1, Size-range (aa), Comments

VP2 C76 A82a 74–82 VP2 protein detected in the SSV1 virion and thought to be the DNA binding protein (Reiter et al., 1987a)
A82 ORF83 ORF82 gp07 B83 A83 C83 C82 A83 82–83 Putative membrane protein a a
C84 ORF88c ORF81 gp08 B90 C78 B81 B83 C97 81–104
A92 ORF90 ORF89 gp09 A82 A93 C90 C90 A94 89–94 Overlaps other genes
B277 ORF276 ORF280 gp10 C279 C277 A269 A281 C263 269–281 Putative membrane protein a
A154 ORF153 ORF152 gp11 C157 C154 B149 C150 C155 149–157 Also found in pSSVx a
B251 ORF233 ORF233 gp12 A231 A247 C234 A255 A232 231–255 DnaA-like (Koonin, 1992). Also in pSSVx, ATV and A. pernix a
D335 ORF328 ORF330 Integrase F340 D355 F354 D336 D347 328–355 Integrase a
E79 79
C176 176
A66* C72 A58a 58–72 Also found in AFV2 (gp06)
B204 A171 171–204
C74 B80 74–80
B494 A583 C559 494–583 Rad3-like helicase
A460 B471 C674 460–674 Similar to Metallosphaera sedula protein Msed_2283
B192 192 Similar to C-terminal of ASV1_C674
A136 B119 119–154
B64 B102 64–102
D244 ORF211 ORF209 gp15 D212 F215 209–244 Similar to Saci_0475
D108 108 Similar to SIRV2gp12
F90 90 Similar to ORFs from pARN3 and pSOG1
E94 94
F93 E81 F110 D95 81–110 Putative HTH transcriptional regulator (Kraft et al., 2004b)
D63 ORF57 ORF63 gp16 F61 E60 57–63 3D X-ray structure from SSV1 (Kraft et al., 2004a)
ORF159b gp18 E152 F185 152–185
ORF61 ORF61 gp21 F62 E61 61–62
ORF79a ORF73 gp23 E73 D77 73–79
A49 49 C-terminal similar to SSV7_B76
A100 ORF96 ORF96 gp24 C96 C93 C106 C96 93–106 Weak hit to ARV1
C48 C49 48–49
ORF88a B87 87–88
B92 92
A59 59 Similar to CopG from M. sedula and Saci_0942. Possible functional homologue of SSV1_C80



C80 ORF82A ORF79 gp26 C82 B64 A78 C80 64–82 RHH protein, CopG-like a
A109 109 Paralogue of ASV1_B91
A79 ORF82B ORF80B gp27 A80 B79 B82 B82 B91 79–91 Zinc finger motif. Similar to ATV_gp28 and pHVE14–51. ASV1_B91 is a paralogue of ASV1_A109 a
C54 54
C102a ORF100 ORF100 gp29 B98 A102b C100 A101 98–102 B-block_TFIIC-domain, Zinc finger
ORF205 ORF206 gp30 A204 C287 B206 204–287 Similar to CRISPR associated gene Cas4 in Staphylothermus marinus.
B129 ORF155 ORF124 gp31 B158 C150 B123 C128 C137 124–173 Two Zinc finger motifs. ASV1_C137 is a paralogue of ASV1_C125
B99 99
ORF107b gp32 B111 C113 C113 107–113 Similar to ST1721 from S. tokodaii b
ORF311 gp33 B252 252–311 Similar to ST1722 from S. tokodaii
ORF111 gp35 C108 108–111 Similar to ST1723 from S. tokodaii
B85 C62 62–85
C247 A298 247–298
B74 C67 67–74
B276 276 Similar to ST1724 from S. tokodaii
C106 106 Similar to ST1725 from S. tokodaii
C125 125 Paralogue of ASV1_C137
A367 367
A137 137 Similar to STS262 from S. tokodaii
C806 806 558–785 similar to APE_0858 from Aeropyrum pernix
A96 96
C792 ORF809 ORF808 gp01 B793 B812 C213 C811 B208 208–812 ASV1_B208 and SSV6_C213 are similar to the C-terminal of the C792 homologues
B78 ORF79 ORF80a gp02 A79 A79 B79 79–80 Part of the SSV1_C792 module
B68 A58b 58–68
B1232 A1231 1231–1232 Similar to Saci_1002 from S. acidocaldarius
C166 ORF176 ORF167 gp03 B169 B170 C134 C170 B130 130–176 Gapped in ASV1 and SSV6. Putative membrane protein
B115 ORF112 ORF107a gp04 A123 A113 A88 B112 A82b 82–123 Putative HTH transcriptional regulator. Shorter in ASV1 and SSV6 a
VP1 ORF88b ORF136 VP1 B137 A89 A143 C88 A140 88–143 VP1 structural protein in SSV1 (Reiter et al., 1987a) a
VP3 ORF92 ORF92 VP3 A93 C96 B94 C97 B90 92–96 VP3 structural protein in SSV1 (Reiter et al., 1987a)


A, B and C indicate genes on the three reading frames of the plus-strand, and D, E and F indicate genes on the minus-strand. The number following the letter is the number of encoded amino acids. The 13 ‘core’ genes are in boldface, and proteins for which experimental data are available are underlined. The asterisk indicates an ad hoc ORF name for a gene which is not present in the NCBI annotation.

a. Core gene in Held and Whitaker (2009).

b. The upstream 40 bp of the SSV7_C113 homologues are highly conserved in all fuselloviruses, with two copies in ASV1. In SSV1, this motif is immediately next to the BRE+TATA-box of the T3 transcript.



Fig. 3. Similarity at the nucleotide level between selected representative pairs of fuselloviral genomes.

A. Comparison between SSV1 and SSV5.

B. Between SSV5 and SSV4.

C. Between SSV4 and SSV6.

D. Between SSV6 and ASV1.

E. Between ASV1 and SSVk1.

F. Between SSVk1 and SSV7.

Regions of high (> 70%) pairwise identity at the nucleotide level (light grey boxes) are interspersed with regions with no detectable similarity (white boxes). The dark grey box indicates an exceptional example of similarity between SSV4 and SSV5, where a 7.9 kb region is almost 100% identical between the two genomes. The junctions between similar and dissimilar regions (indicated by dotted lines) often occur in the middle of genes, and are not confined to intergenic regions. Short regions (< 100 bp) of similarity or dissimilarity are not shown. Black arrows denote ‘core’ genes, dark grey arrows denote ORFs that are found in more than one fusellovirus, and light grey arrows denote ORFs that have no homologues in the database, some of which may not be protein-coding.

all have their best hits to tRNAs from S. solfataricus, with Gln, Gln, Gly and Lys for SSV5, SSV6, SSV7 and ASV1 respectively. Table 1 shows an overview of the genes in SSV5, SSV6, SSV7 and ASV1, as well as the corresponding homologues in the other fuselloviruses.

Discussion<br />

In this paper we describe four new members of the family Fuselloviridae, SSV5, SSV6, SSV7 and ASV1, isolated from acidic hot springs of Iceland and the USA, which infect members of the hyperthermophilic archaeal genera Sulfolobus and Acidianus.

Until now, fuselloviruses had only been found to replicate in Sulfolobus species. Our discovery of ASV1 in Acidianus brierleyi shows that fuselloviruses can propagate in both of the major culturable genera from aerobic, acidic hot springs. Therefore, it is likely that fuselloviruses also infect other host species from these environments, such as Caldococcus, Vulcanisaeta and Stygiolobus (Snyder et al., 2007). Furthermore, the family Fuselloviridae presumably also extends its host range into the vast number of currently uncultured species found in other extreme environments, such as the acid mine drainage ecosystem, where a VP2 homologue, recently found by community genomics



Fig. 4. CRISPR spacer sequence matches for ASV1 and SSV2 are superimposed on linearized genome maps of ASV1 and SSV2 respectively. ORFs are shown as arrows above and below the line. Sequence matches to spacers are shown as vertical lines. The black vertical lines denote the nucleotide sequence matches, and the grey vertical lines show matching amino acid sequences, after translation of the spacer sequences from both DNA strands. The dark boxes below the genome maps indicate areas > 50 bp with nucleotide level sequence similarity to other fuselloviruses (the relevant fusellovirus is indicated to the left of the dark boxes). In total, there are 12 spacer matches to ASV1 and 22 matches to SSV2 at a nucleotide level. At an amino acid sequence level, there are 42 spacer matches to ASV1 and 28 matches to SSV2.

(Andersson and Banfield, 2008), indicates the presence of fuselloviruses.

‘Core’ genes<br />

By almost doubling the number of described fuselloviruses, we are refining the definition of ‘core’ genes of the family. The 18 conserved, or ‘core’, genes that were defined for SSV1, SSV2, SSVk1 and SSVrh (Wiedenheft et al., 2004) can now be reduced to 13 (Table 1). This number may have to be revised further as more fuselloviruses are sequenced, but our findings correlate well with a recent analysis of fuselloviral proviruses in S. islandicus strains (Held and Whitaker, 2009). We exclude the SSV1_C792 homologues from the list of ‘core’ genes, because we do not consider SSV6_C213 and ASV1_B208 to be able to fully complement the proteins found in other fuselloviruses, which are about four times larger (Table 1).
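
The way the ‘core’ set shrinks as genomes are added can be illustrated with a small presence/absence calculation. The following is only a sketch: it uses six of the genes discussed in the text and Table 1 rather than the full table, the dictionary layout and the function name `core_genes` are illustrative assumptions, and SSV1_C792 is counted as present only where a full-length homologue exists, mirroring the exclusion described above.

```python
# Presence/absence of a few genes, keyed by their SSV1 names.
# Only a hand-picked subset of Table 1 is encoded here, for illustration.
ALL_VIRUSES = {"SSV1", "SSV2", "SSV4", "SSV5", "SSVk1",
               "SSVrh", "SSV6", "SSV7", "ASV1"}

presence = {
    "VP1": set(ALL_VIRUSES),
    "VP3": set(ALL_VIRUSES),
    "integrase": set(ALL_VIRUSES),
    "VP2": {"SSV1", "SSV6", "ASV1"},
    "C80": ALL_VIRUSES - {"ASV1"},
    # Assumption: only full-length C792 homologues count as present,
    # so SSV6_C213 and ASV1_B208 are left out.
    "C792": ALL_VIRUSES - {"SSV6", "ASV1"},
}


def core_genes(presence, viruses):
    """Return the genes that are present in every virus of `viruses`."""
    return sorted(gene for gene, hosts in presence.items() if viruses <= hosts)


# Core set for the four genomes analysed earlier, then for all nine.
earlier_four = {"SSV1", "SSV2", "SSVk1", "SSVrh"}
print(core_genes(presence, earlier_four))
print(core_genes(presence, ALL_VIRUSES))
```

With this subset, C80 and C792 are conserved among the four earlier genomes but drop out once all nine are considered, which is the same effect that reduces the full list from 18 to 13.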

Six of the ‘core’ genes have no discernible function based on their primary sequence, except that some of them carry predicted transmembrane segments (Table 1), and experimental data will be needed to determine their functional roles. Of the remaining seven, the integrase function has been characterized experimentally (Muskhelishvili et al., 1993; Muskhelishvili, 1994; Serre et al., 2002; Letzelter et al., 2004; Clore and Stedman, 2007). Moreover, VP1 and VP3 are components of SSV1 virions, and VP1 is processed from the N-terminus in SSV1, to a length of 73 aa (Reiter et al., 1987a), which may explain the significant size difference we observe among the VP1 genes (Table 1). The remaining C-terminus of VP1 is similar in both length and sequence to the VP3 protein, and their roles might be partially interchangeable in the virion matrix. Bioinformatic analyses predict DnaA-like activity for SSV1_B251 homologues (Koonin, 1992) and transcriptional regulation activity for three other ‘core’ genes: SSV1_A79 and SSV1_B129, which are transcribed early during infection and are probably involved in controlling the host’s transcriptional apparatus, and SSV1_B115, which is co-transcribed with VP1, VP2, VP3 and SSV1_C792 later in infection, and may be involved in controlling the assembly and/or packaging of virions.

‘Non-core’ genes<br />


Genes that are highly conserved but present in only a subset of the fuselloviruses could provide a possible way of classifying the fuselloviruses into subgroups, albeit subgroups that overlap.

Thus, ASV1, SSV6 and SSV1 encode a VP2 homologue, indicating that they all share a DNA packaging system. However, the difference from the SSV1 protein in the C-terminus may indicate an alternative mode of interaction of the protein and viral DNA with the major virion proteins, VP1 and VP3.

Another subgroup would be the SSVs, which all encode a highly conserved homologue of SSV1_C80, a protein containing the RHH 1 CopG domain. ASV1 does not encode any gene with obvious sequence similarity to SSV1_C80. However, ASV1_A59 also has an RHH 1 CopG domain, although it groups with other RHH1-containing genes, including a few Sulfolobus chromosomal genes (e.g. Saci_0942). Furthermore, ASV1_A59 occupies the same genomic position as the SSV1_C80



homologues do in the SSV genomes, and it very likely acts as a functional homologue of SSV1_C80.

A third subgroup consists of ASV1, SSV7 and SSVk1, which all encode the Rad3-like helicase protein and the neighbouring Msed_2283 homologue (Fig. 2). The presence of the helicase strongly suggests that these two proteins are involved in DNA replication or recombination, and it is possible that the other fuselloviruses recruit host proteins to fulfil the same function.
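
The overlap between the three subgroups described in this section can be made explicit with a few set operations. This is a minimal sketch: the memberships are taken from the text above, while the labels and the helper `overlaps` are illustrative names, not part of the original analysis.

```python
from itertools import combinations

# Subgroup membership as described in the text; labels are illustrative.
subgroups = {
    "VP2 homologue": {"ASV1", "SSV6", "SSV1"},
    "SSV1_C80 homologue": {"SSV1", "SSV2", "SSV4", "SSV5",
                           "SSVk1", "SSVrh", "SSV6", "SSV7"},
    "Rad3-like helicase": {"ASV1", "SSV7", "SSVk1"},
}


def overlaps(subgroups):
    """Map each pair of subgroup labels to the viruses they share."""
    shared = {}
    for (name_a, set_a), (name_b, set_b) in combinations(subgroups.items(), 2):
        common = set_a & set_b
        if common:
            shared[(name_a, name_b)] = sorted(common)
    return shared


for pair, viruses in overlaps(subgroups).items():
    print(pair, viruses)
```

Every pair of subgroups shares at least one virus, so a classification based on these genes cannot be a simple partition of the family.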

A possible filament protein

The most striking genomic difference among the fuselloviruses is the ‘replacement’ of the SSV1_C792 module with the SSV6_B1232 module (Fig. 2B). It seems that the C-terminal 170 aa of SSV1_C792 are essential, since they are retained as a small separate gene in both the ASV1 and SSV6 genomes; however, the remaining ~620 aa of SSV1_C792 and the whole of SSV1_B78 are substituted by SSV6_B1232. The presence of the SSV6_B1232 module correlates with a difference in the number and structure of the sticky terminal filaments of the SSV6 and ASV1 virions, when compared with the SSV1_C792 module viruses (Fig. 1B). Possibly, there is a phenotype–genotype link, with the SSV1_C792 module being responsible for the multiple, thin, sticky filaments and the SSV6_B1232 module for the few, thick, less sticky filaments. In support of this hypothesis, small amounts of SSV1_C792 were recently found by mass spectrometry in SSV1 virions (Menon et al., 2008). Moreover, the Phyre prediction tool suggested that both SSV1_C792 and SSV6_B1232 have a fold similar to that of the P2 receptor-binding protein of PRD1, and it was recently shown that a large protein is responsible for the sticky end-fibres in the rudivirus SIRV2 (Steinmetz et al., 2008). Nevertheless, further studies will be needed to determine the exact functions of the SSV1_C792 and SSV6_B1232 modules in fuselloviruses.

Fuselloviral nucleotide similarity and a putative mechanism for interviral recombination

The multiple regions of high nucleotide similarity, or even identity, between the fuselloviral genomes do not represent a ‘core’ fusello-genome, since the regions of similarity differ between the various pairs of viruses, and often do not include the ‘core’ genes (Fig. 3). Instead, the pattern of similar and non-similar sections of DNA indicates frequent recombination events between fuselloviruses, similar to that observed for some bacteriophages (Hendrix et al., 1999). Possibly this occurs between pairs of fuselloviruses present in the same host; however, we do not see a similar pattern of sequence similarity for the linear non-integrating archaeal viruses (Vestergaard et al., 2008a). Therefore, we suggest that a different mechanism, one that involves integration, is more likely.

Integrated fusellovirus genomes have been found in the Sulfolobus solfataricus P2 chromosome and in four S. islandicus chromosomes, where no trace of the covalently closed circular DNA (cccDNA) form was detected (Stedman et al., 2003; Held and Whitaker, 2009). Once a virus has been ‘caught’ in this way, a second, slightly different, fusellovirus might infect the same host and insert itself into the same tRNA gene, resulting in a concatamer of the two fuselloviruses in the host chromosome (Fig. 5). This structure might be maintained for a couple of generations, but it would be inherently unstable if the two viral genomes are reasonably similar, as there would be a high chance of homologous recombination between the two integrated viruses. Such a recombination event would lead to the formation of one cccDNA virus and one inserted virus, both of which would consist of a part of each of the original two viruses (Fig. 5). Owing to the very short sequence similarity required for homologous recombination in Sulfolobus (Grogan, 2009), the cross-over point could potentially be in many different places, and each of these recombination events would form a unique mixture of the two viruses, similar to meiosis in eukaryotes. Thus, this offers a mechanism for rapidly generating a large number of diverse viral offspring. Our model does not exclude direct recombination between the cccDNA forms of fuselloviruses, but we propose that this type of ‘tandem insertion’ event happens frequently (on an evolutionary scale) in nature, and that repeated events, each involving a different pair of ‘parent’ fuselloviruses, would eventually produce the patchwork viral genomes we see today (Fig. 3).
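
The outcome of a single ‘tandem insertion’ recombination event can be sketched in a few lines of code. This is a toy illustration under strong assumptions that are not part of the model as published: each genome is reduced to a list of segment labels in the same order, the two integrated copies lie head to tail, and cross-overs occur only at segment boundaries. The function names and segment labels are hypothetical.

```python
def recombine(first, second, crossover):
    """Resolve a tandem insertion [first][second] by one cross-over.

    `first` and `second` are lists of segment labels in the same gene
    order; `crossover` is the boundary (1 .. len - 1) at which the two
    copies recombine. Returns (integrated, excised): the copy left in
    the chromosome and the excised circle, written linearly starting
    at the cross-over point.
    """
    if len(first) != len(second):
        raise ValueError("toy model assumes equally segmented genomes")
    if not 0 < crossover < len(first):
        raise ValueError("crossover must lie strictly inside the genome")
    integrated = first[:crossover] + second[crossover:]
    excised = first[crossover:] + second[:crossover]
    return integrated, excised


def all_offspring(first, second):
    """Enumerate the outcome of every possible cross-over boundary."""
    return [recombine(first, second, i) for i in range(1, len(first))]


ssva = ["a1", "a2", "a3", "a4"]  # hypothetical segments of SSVa
ssvb = ["b1", "b2", "b3", "b4"]  # hypothetical segments of SSVb
for integrated, excised in all_offspring(ssva, ssvb):
    print("integrated:", integrated, "cccDNA:", excised)
```

Each cross-over point yields a different pair of chimeras, and both products contain one copy of every segment position, which is why a single event can release a complete but mosaic cccDNA genome.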

Our model also serves to explain why fuselloviruses have developed an integrase that is inactivated upon integration. The integrase is not essential for viral propagation (Clore and Stedman, 2007), but if the proposed recombination mechanism is correct, then the unique SSV-type integrase will help the virus in the long term by promoting recombination with closely related viruses, since the inactivation provides a high chance of the viral genome being ‘caught’ in an integrated form in the chromosome. Nevertheless, inactivation of the integrase is not required for recombination between tandem insertions. Studies of the Sulfolobus plasmids pARN3 and pARN4 reveal stretches of nucleotide identity, which might have been generated by tandem insertions, even though these plasmids carry non-inactivatable integrases (Greve et al., 2004).

The inherent instability of a tandem insertion makes it difficult, if not impossible, to detect in nature. However, a concatamer of inserted viral genomes, similar to the one proposed in our model, was recently discovered in the chromosome of Methanococcus voltae A3. There, the two viral genomes integrated into the same tRNA gene are



very different, preventing homologous recombination, thus ‘trapping’ the viral concatamer in the host chromosome (Krupovic and Bamford, 2008). The attP sites of SSV2 and SSV7 match the same tRNA in S. solfataricus P2, as do those of SSV5 and SSV6 (Table 1), making it likely that fuselloviruses are also able to integrate into the same tRNA, forming concatamers, which are unstable due to the similarity between the fuselloviruses. Moreover, it was shown that SSVk1 is able to integrate into several different sites in the host genome (Wiedenheft et al., 2004), increasing the likelihood of finding a ‘partner’ for recombination. Finally, examples of related viruses infecting the same host at the same time are known for Sulfolobales, such as AFV6, AFV7 and AFV8 in Acidianus convivator (Vestergaard et al., 2008b).

If the ‘tandem insertion’ model is correct, then an evolutionary tree of an entire viral genome has no meaning, nor would one built from individual ‘core’ genes (since two halves of the same gene might originate from different ‘parent’ viruses). One might instead analyse the genes, described in the previous section, that are not shared by all fuselloviruses, since these genes cannot serve as cross-over points for homologous recombination. Although, for the moment, the data set is too small for a phylogenetic analysis based on these genes, the presence or absence of certain genes in a subset of the viruses has provided important clues to understanding protein functions in the fuselloviruses, including the putative filament proteins SSV1_C792 and SSV6_B1232.

With the current understanding of the CRISPR antiviral system, high nucleotide similarity between viruses should be disadvantageous, since a single spacer matching a conserved region will provide a host with immunity to several virus strains (Lillestøl et al., 2009). Nevertheless, the puzzling fact remains that fuselloviruses do possess highly similar, sometimes identical, nucleotide regions, and it is possible that the integration and/or the frequent recombination somehow provide the fuselloviruses with the means to evade the CRISPR system in their hosts.
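
Why a conserved region is a liability under CRISPR targeting can be shown with a toy spacer search. The sketch below is not the matching procedure used in this study: it assumes exact nucleotide matches on either DNA strand, whereas real analyses typically tolerate mismatches and may also compare translated sequences. All sequences, names and the function `viruses_hit` are invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def viruses_hit(spacer, genomes):
    """List the genomes that contain the spacer on either strand.

    Assumption: exact substring matching only.
    """
    rc = reverse_complement(spacer)
    return sorted(name for name, seq in genomes.items()
                  if spacer in seq or rc in seq)


# Hypothetical toy genomes; virus_A and virus_B share a conserved stretch.
genomes = {
    "virus_A": "ATGCCGTTAGGCTAACGT",
    "virus_B": "GGGCCGTTAGGCTTTTTT",
    "virus_C": "ATATATATATGCGCGCGC",
}
conserved_spacer = "CCGTTAGGC"  # lies in the shared region
unique_spacer = "TAACGT"        # found in virus_A only

print(viruses_hit(conserved_spacer, genomes))
print(viruses_hit(unique_spacer, genomes))
```

In this toy case the spacer from the shared stretch targets both related genomes, while the spacer from a unique region targets only one.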

It has been proposed that thermoacidophilic archaeal viruses are highly mobile, even between distant hot springs in the same geothermal area, and that different fuselloviruses continuously infect a more-or-less stable population of host species (Snyder et al., 2007). The high nucleotide similarity we have found, even between fuselloviruses isolated on different continents, seems to confirm that they do manage to exchange genetic material over the intercontinental distances that separate some of the geothermal ‘islands’ in the cold ‘ocean’.

Experimental procedures

Sulfolobus and Acidianus medium

© 2009 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology


Fig. 5. Proposed model for recombination between integrated fuselloviruses.
A. The first fusellovirus (SSVa) infects the host, and integrates into the chromosome.
B. The second fusellovirus (SSVb) infects the host, and integrates into the same tRNA as SSVa.
C. The ‘tandem integration’ of SSVa and SSVb. The dashed arrows indicate examples of homologous recombination sites.
D. Examples of ‘offspring’ cccDNA fuselloviruses from the recombination of SSVa and SSVb.

Z medium: 25 mM (NH4)2SO4, 3 mM K2SO4, 1.5 mM KCl, 20 mM glycine, 4.0 mM MnCl2, 10.4 mM Na2B4O7, 0.38 mM ZnSO4, 0.13 mM CuSO4, 62 nM Na2MoO4, 59 nM VOSO4, 18 nM CoSO4, 19 nM NiSO4, 0.1 mM HCl, 1 mM MgCl2, 0.3 mM Ca(NO3)2 adjusted to pH 3.5 with H2SO4.

T medium: Identical to Z medium, but with 0.2% Tryptone added.

ST medium: Identical to T medium, but with small amounts of elemental sulphur added.
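As a worked illustration of how the millimolar concentrations in the recipe translate into weighed amounts, the following sketch converts a few of the Z medium components to grams per litre. The molar masses are standard rounded values for the anhydrous compounds and are an assumption here: the source does not state which salt forms (for example hydrates) were used, and the function names are hypothetical.

```python
# Rounded molar masses (g/mol) of the anhydrous compounds; assumed, not from the source.
MOLAR_MASS_G_PER_MOL = {
    "(NH4)2SO4": 132.14,
    "K2SO4": 174.26,
    "KCl": 74.55,
    "glycine": 75.07,
}

# Concentrations (mM) of the same components as listed for Z medium.
Z_MEDIUM_MM = {
    "(NH4)2SO4": 25.0,
    "K2SO4": 3.0,
    "KCl": 1.5,
    "glycine": 20.0,
}


def grams_per_litre(conc_mm, molar_mass):
    """Convert a concentration in mM to g/l for a given molar mass in g/mol."""
    return conc_mm / 1000.0 * molar_mass


def weigh_out(recipe_mm, volume_l):
    """Return the grams of each component needed for volume_l litres of medium."""
    return {
        name: grams_per_litre(conc, MOLAR_MASS_G_PER_MOL[name]) * volume_l
        for name, conc in recipe_mm.items()
    }


for name, grams in weigh_out(Z_MEDIUM_MM, volume_l=1.0).items():
    print(f"{name}: {grams:.3f} g")
```

The remaining components could be added to both dictionaries in the same way once the salt forms actually used are known.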



Isolation and purification of hosts and viruses

Samples were collected from the Hveragerdi hot-spring area in south-western Iceland and 1 ml was used to establish an enrichment culture, by incubating in 50 ml ST medium for 9 days at 80°C, after which 1 ml of the enrichment was transferred to fresh ST medium and incubated for a further 4 days. Four millilitres of the enrichment (designated G4ST) was then centrifuged at 4000 r.p.m. for 20 min (Jouan S40 rotor) to remove cells, whereupon the supernatant was spun further at 38 000 r.p.m. for 3 h to pellet virions (Beckman SW60 rotor). Finally, the pellet was resuspended in 50 ml of the supernatant. The resuspension was then examined by electron microscopy, and several different morphotypes of virus-like particles were in evidence. Among these was a group of fusellovirus-like particles, but with different filament structures at the end, and a large diversity in their morphotypes, ranging from sausage-shaped to an almost spindle-like pear-shape (Fig. 1).

To isolate single host–virus systems, G4ST was spread on a plate containing ST medium solidified with Gel-rite (Sigma-Aldrich, St Louis, USA). After 10 days of incubation at 80°C, 30 colonies of representative sizes, shapes and colours were transferred to 5 ml liquid ST medium and incubated with vigorous shaking for 4 days. Each of the growing strains was examined for virus in the electron microscope, and SSV6 and SSV7 were detected in the supernatants of strains G4ST-T-11 and G4T-1 respectively. The 16S rRNA genes of G4T-1 and G4ST-T-11 were amplified using the primers 8aF: TCYGGTTGATCCTGCC and 1512uR: ACGGHTACCTTGTTACGACTT (Accession number FJ870913 for G4ST-T-11 and FJ870914 for G4T-1).

SSV5 was present in HVE14, an enrichment culture established from a natural sample collected near the G4 site 10 years previously (Zillig et al., 1996). It was propagated in S. solfataricus P2, by mixing a small amount of HVE14 with a well-grown S. solfataricus P2 culture (1:1000), which was then harvested and used for DNA isolation of extrachromosomal elements using a plasmid miniprep kit from Qiagen. Acidianus brierleyi was cultured at 70°C in ST medium and ASV1 was recovered from the supernatant by ultracentrifugation (38 000 r.p.m. for 3 h in a Beckman SW41 rotor).

Electron microscopy

Ten microlitres of the samples was deposited on a carbon and formvar coated grid (Ted Pella, Redding, CA, USA) and left for 2 min before removing excess fluid. Ten microlitres of 2% uranyl acetate or phosphotungstate (Sigma-Aldrich) was allowed to stain the samples negatively for 10 s. Images were taken on a JEOL1200EXII microscope with an 80 kV beam, using a CCD camera.

DNA isolation and sequencing

Six litres of G4ST-T-11 was grown in a fermentor, and after removing cells by centrifuging twice at 4000 r.p.m. for 20 min (Sorvall GS-3 rotor), the virions in the supernatant were concentrated using a Sartorius Vivaflow 200 filter cartridge (Sartorius, Goettingen, Germany). The resulting 15 ml was further concentrated by spinning at 38 000 r.p.m. for 3 h using a SW41 Beckman rotor, and finally the virions were treated with Protease K and the DNA was extracted by successive phenol, phenol/chloroform and chloroform extractions. The SSV6 DNA was then treated as described below for SSV7.

In order to sequence SSV7, 5 ml of an exponential G4T-1 culture was pelleted by centrifugation, and resuspended in Z medium. SSV7 production was induced by 50 J cm⁻² UV radiation (254 nm) under constant mild agitation, and the cells were then transferred to 45 ml T medium for overnight incubation. Five millilitres was used for a miniprep (QIAprep Spin Miniprep Kit, QIAGEN SA, Courtaboeuf, France), which was used for amplification and subsequent library construction based on the Linker Amplified Shotgun Library method described at http://www.sci.sdsu.edu/PHAGE/LASL/index.htm.

Shot-gun library construction of SSV5 and SSV6, as well as ASV1-containing A. brierleyi total DNA, was performed as described previously using SmaI digested pUC18 as cloning vector (Peng, 2008). Plasmid DNA of clones from all four libraries was purified using a Model 8000 Biorobot (Qiagen, Westburg, Germany) and sequenced in MegaBACE 1000 Sequenators (Amersham Biotech, Amersham, UK). Sequences were assembled using Sequencher 4.5 (http://www.genecodes.com). Genome annotations and comparisons were done using the MUTAGEN software (Brügger et al., 2003) with a minimum ORF-length set to 50 aa and allowing AUG, GUG and UUG as possible start codons. Accession numbers are EU030939, FJ870915, FJ870916 and FJ870917 for SSV5, SSV6, SSV7 and ASV1 respectively.
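The ORF criteria stated above (at least 50 aa, with AUG, GUG or UUG as start codon) can be sketched as a simple six-frame scan. This is not the MUTAGEN implementation, whose internals are not described here; the function names are hypothetical, the sequence is treated as linear (so an ORF spanning the origin of a circular genome would be missed), and only the longest ORF per stop codon is reported.

```python
START_CODONS = {"ATG", "GTG", "TTG"}   # DNA equivalents of AUG, GUG, UUG
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_on_strand(seq, min_aa):
    """Return (start, end) pairs of ORFs in the three frames of one strand.

    start is the index of the first base of the start codon, end is one past
    the last base of the stop codon. The ORF length in amino acids counts the
    start codon but not the stop codon.
    """
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i  # first start codon after the previous stop
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_aa:
                    found.append((start, i + 3))
                start = None
    return found


def find_orfs(seq, min_aa=50):
    """Return (strand, start, end) for ORFs on both strands, in forward coordinates."""
    seq = seq.upper()
    n = len(seq)
    hits = [("+", s, e) for s, e in orfs_on_strand(seq, min_aa)]
    hits += [("-", n - e, n - s)
             for s, e in orfs_on_strand(reverse_complement(seq), min_aa)]
    return hits


# Toy sequence (invented): one start codon, 60 alanine codons, one stop codon.
toy = "ATG" + "GCT" * 60 + "TAA"
print(find_orfs(toy))
```

Allowing GTG and TTG as starts generally lengthens predicted ORFs compared with an ATG-only scan, so the 50 aa threshold applies to this more permissive definition.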

CRISPR spacer analysis

To obtain a list of spacer sequences from Sulfolobales, the following partial or full genomes were used: S. solfataricus P2, S. tokodaii 7, S. acidocaldarius DSM 639, Metallosphaera sedula DSM5348 from GenBank (http://www.ncbi.nlm.nih.gov/Genbank/), Sulfolobus islandicus strains LD85, YG5714, YN1551, M164 and U328 from JGI (http://www.jgi.doe.gov/genome-projects/), and S. islandicus strains HVE10/4 and REY15A and Acidianus brierleyi (K. Brügger and Q. She, unpubl. data). CRISPRs were identified using publicly available software (Edgar, 2007; Bland et al., 2007). Spacer sequences from each repeat-cluster were aligned (Sæbø et al., 2005) against the fuselloviral genomes at a nucleotide level (Shah et al., 2009). Additionally, spacers were aligned against amino acid sequences of annotated ORFs of the fuselloviruses, at an amino acid level (Vestergaard et al., 2008a; Shah et al., 2009). Significance cut-offs were determined for both alignment types by using the genome sequence of Saccharomyces cerevisiae as a negative control.
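The idea of deriving a significance cut-off from a negative-control genome can be sketched as follows. This is a simplified stand-in, not the PARALIGN-based procedure cited above: the score is just the number of identical bases in the best ungapped placement of a spacer on either strand, the cut-off is taken as the best score any spacer reaches against the control, and all names and toy sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_ungapped_identity(spacer, genome):
    """Highest count of identical bases over all ungapped placements, both strands."""
    best = 0
    for strand in (genome, reverse_complement(genome)):
        for i in range(len(strand) - len(spacer) + 1):
            window = strand[i:i + len(spacer)]
            best = max(best, sum(a == b for a, b in zip(spacer, window)))
    return best


def negative_control_cutoff(spacers, control_genome):
    """Best score any spacer achieves against a genome assumed to hold no true targets."""
    return max(best_ungapped_identity(s, control_genome) for s in spacers.values())


def significant_hits(spacers, targets, control_genome):
    """Return (spacer, target, score) for matches scoring above the control cut-off."""
    cutoff = negative_control_cutoff(spacers, control_genome)
    hits = []
    for sp_name, sp_seq in spacers.items():
        for t_name, t_seq in targets.items():
            score = best_ungapped_identity(sp_seq, t_seq)
            if score > cutoff:
                hits.append((sp_name, t_name, score))
    return hits


# Toy data (invented): one spacer, one target containing it, one unrelated control.
spacers = {"sp1": "ACGTTGCAAGGCTTAC"}
targets = {"virusX": "GGGG" + "ACGTTGCAAGGCTTAC" + "CCCC"}
control = "AT" * 20
print(significant_hits(spacers, targets, control))
```

Using an unrelated genome such as that of S. cerevisiae as the control gives an empirical estimate of how well spacers match by chance, so only matches better than that background are reported.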

Acknowledgements

P.R. was funded by grant VIRAR (NT05-2_41674) from the Agence Nationale de la Recherche, France. The research in Copenhagen was supported by grants from the Grundforskningsfond and the Research Council for Natural Sciences. We would also like to thank the Electron Microscopy Platform at Institut Pasteur for helpful advice and use of their JEOL1200EXII microscope.



References

Andersson, A.F., and Banfield, J.F. (2008) Virus population dynamics and acquired virus resistance in natural microbial communities. Science 320: 1047–1050.

Arnold, H.P., She, Q., Phan, H., Stedman, K., Prangishvili, D., Holz, I., et al. (1999) The genetic element pSSVx of the extremely thermophilic crenarchaeon Sulfolobus is a hybrid between a plasmid and a virus. Mol Microbiol 34: 217–226.

Bath, C., and Dyall-Smith, M.L. (1998) His1, an archaeal virus of the Fuselloviridae family that infects Haloarcula hispanica. J Virol 72: 9392–9395.

Bath, C., Cukalac, T., Porter, K., and Dyall-Smith, M.L. (2006) His1 and His2 are distantly related, spindle-shaped haloviruses belonging to the novel virus group, Salterprovirus. Virology 350: 228–239.

Bize, A., Peng, X., Prokofeva, M., Maclellan, K., Lucas, S., Forterre, P., et al. (2008) Viruses in acidic geothermal environments of the Kamchatka Peninsula. Res Microbiol 159: 358–366.

Bland, C., Ramsey, T.L., Sabree, F., Lowe, M., Brown, K., Kyrpides, N.C., and Hugenholtz, P. (2007) CRISPR Recognition Tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics 8: 209–217.

Brügger, K., Redder, P., and Skovgaard, M. (2003) MUTAGEN: multi-user tool for annotating genomes. Bioinformatics 19: 2480–2481.

Clore, A.J., and Stedman, K.M. (2007) The SSV1 viral integrase is not essential. Virology 361: 103–111.

Edgar, R.C. (2007) PILER-CR: fast and accurate identification of CRISPR repeats. BMC Bioinformatics 8: 18–24.

Fröls, S., Gordon, P.M., Panlilio, M.A., Schleper, C., and Sensen, C.W. (2007) Elucidating the transcription cycle of the UV-inducible hyperthermophilic archaeal virus SSV1 by DNA microarrays. Virology 365: 48–59.

Geslin, C., Le Romancer, M., Erauso, G., Gaillard, M., Perrot, G., and Prieur, D. (2003) PAV1, the first virus-like particle isolated from a hyperthermophilic euryarchaeote, ‘Pyrococcus abyssi’. J Bacteriol 185: 3888–3894.

Greve, B., Jensen, S., Brügger, K., Zillig, W., and Garrett, R.A. (2004) Genomic comparison of archaeal conjugative plasmids from Sulfolobus. Archaea 1: 231–239.

Grogan, D.W. (2009) Homologous recombination in Sulfolobus acidocaldarius: genetic assays and functional properties. Biochem Soc Trans 37 (Pt 1): 88–91.

Guixa-Boixareu, N., Calderon-Paz, J.I., Heldal, M., Bratbak, G., and Pedros-Alio, C. (1996) Viral lysis and bacterivory as prokaryotic loss factors along a salinity gradient. Aquat Microb Ecol 11: 215–227.

Häring, M., Rachel, R., Peng, X., Garrett, R.A., and Prangishvili, D. (2005) Viral diversity in hot springs of Pozzuoli, Italy, and characterization of a unique archaeal virus, Acidianus bottle-shaped virus, from a new family, the Ampullaviridae. J Virol 79: 9904–9911.

Held, N.L., and Whitaker, R.J. (2009) Viral biogeography revealed by signatures in Sulfolobus islandicus genomes. Environ Microbiol 11: 457–466.

Hendrix, R.W., Smith, M.C.M., Burns, R.N., Ford, M.E., and Hatfull, G.F. (1999) Evolutionary relationships among diverse bacteriophages and prophages: all the world’s a phage. Proc Natl Acad Sci USA 96: 2192–2197.

Kelley, L.A., and Sternberg, M.J.E. (2009) Protein structure prediction on the web: a case study using the Phyre server. Nature Protocols 4: 363–371.

Koonin, E.V. (1992) Archaebacterial virus SSV1 encodes a putative DnaA-like protein. Nucleic Acids Res 20: 1143.

Kraft, P., Kümmel, D., Oeckinghaus, A., Gauss, G.H., Wiedenheft, B., Young, M., and Lawrence, C.M. (2004a) Structure of D-63 from Sulfolobus spindle-shaped virus 1: surface properties of the dimeric four-helix bundle suggest an adaptor protein function. J Virol 78: 7438–7442.

Kraft, P., Oeckinghaus, A., Kümmel, D., Gauss, G.H., Gilmore, J., Wiedenheft, B., et al. (2004b) Crystal structure of F-93 from Sulfolobus spindle-shaped virus 1, a winged-helix DNA binding protein. J Virol 78: 11544–11550.

Krupovic, M., and Bamford, D.H. (2008) Archaeal proviruses TKV4 and MVV extend the PRD1-adenovirus lineage to the phylum Euryarchaeota. Virology 375: 292–300.

Letzelter, C., Duguet, M., and Serre, M.C. (2004) Mutational analysis of the archaeal tyrosine recombinase SSV1 integrase suggests a mechanism of DNA cleavage in trans. J Biol Chem 279: 28936–28944.

Lillestøl, R.K., Shah, S.A., Brügger, K., Redder, P., Phan, H., Christiansen, J., and Garrett, R.A. (2009) CRISPR families of the crenarchaeal genus Sulfolobus: bidirectional transcription and dynamic properties. Mol Microbiol 72: 259–272.

Marraffini, L.A., and Sontheimer, E.J. (2008) CRISPR interference limits horizontal gene transfer in Staphylococci by targeting DNA. Science 322: 1843–1845.

Martin, A., Yeats, S., Janekovic, D., Reiter, W.-D., Aicher, W., and Zillig, W. (1984) SAV1, a temperate u.v.-inducible DNA virus-like particle from archaebacterium Sulfolobus acidocaldarius isolate B12. EMBO J 3: 2165–2168.

Menon, S.K., Maaty, W.S., Corn, G.J., Kwok, S.C., Eilers, B.J., Kraft, P., et al. (2008) Cysteine usage in Sulfolobus spindle-shaped virus 1 and extension to hyperthermophilic viruses in general. Virology 376: 270–278.

Muskhelishvili, G. (1994) The archaeal SSV integrase promotes intermolecular excisive recombination in vitro. Syst Appl Microbiol 16: 605–608.

Muskhelishvili, G., Palm, P., and Zillig, W. (1993) SSV1-encoded site-specific recombination system in Sulfolobus shibatae. Mol Gen Genet 237: 334–342.

Oren, A., Bratbak, G., and Heldal, M. (1997) Occurrence of virus-like particles in the Dead Sea. Extremophiles 1: 143–149.

Palm, P., Schleper, C., Grampp, B., Yeats, S., McWilliam, P., Reiter, W.D., and Zillig, W. (1991) Complete nucleotide sequence of the virus SSV1 of the archaebacterium Sulfolobus shibatae. Virology 185: 242–250.

Peng, X. (2008) Evidence for the horizontal transfer of an integrase gene from a fusellovirus to a pRN-like plasmid within a single strain of Sulfolobus and the implications for plasmid survival. Microbiol 154 (Pt 2): 383–391.

Porter, K., Russ, B.E., and Dyall-Smith, M.L. (2007) Virus–host interactions in salt lakes. Curr Opin Microbiol 10: 418–424.

Prangishvili, D. (2003) Evolutionary insights from studies on viruses of hyperthermophilic archaea. Res Microbiol 154: 289–294.



Prangishvili, D., Garrett, R.A., and Koonin, E.V. (2006a) Evolutionary genomics of archaeal viruses: unique viral genomes in the third domain of life. Virus Res 117: 52–67.

Prangishvili, D., Vestergaard, G., Häring, M., Aramayo, R., Basta, T., Rachel, R., and Garrett, R.A. (2006b) Structural and genomic properties of the hyperthermophilic archaeal virus ATV with an extracellular stage of the reproductive cycle. J Mol Biol 359: 1203–1216.

Rachel, R., Bettstetter, M., Hedlund, B.P., Häring, M., Kessler, A., Stetter, K.O., and Prangishvili, D. (2002) Remarkable morphological diversity of viruses and virus-like particles in hot terrestrial environments. Arch Virol 147: 2419–2429.

Reiter, W.-D., Palm, P., Henschen, A., Lottspeich, F., Zillig, W., and Grampp, B. (1987a) Identification and characterization of the genes encoding three structural proteins of the Sulfolobus virus-like particle SSV1. Mol Gen Genet 206: 144–153.

Reiter, W.D., Palm, P., Yeats, S., and Zillig, W. (1987b) Gene expression in archaebacteria: physical mapping of constitutive and UV-inducible transcripts from the Sulfolobus virus-like particle SSV1. Mol Gen Genet 209: 270–275.

Rice, G., Stedman, K., Snyder, J., Wiedenheft, B., Willits, D., et al. (2001) Viruses from extreme thermal environments. Proc Natl Acad Sci USA 98: 13341–13345.

Sæbø, P.E., Andersen, S.M., Myrseth, J., Laerdahl, J.K., and Rognes, T. (2005) PARALIGN: rapid and sensitive sequence similarity searches powered by parallel computing technology. Nucleic Acids Res 33: 535–539.

Schleper, C., Kubo, K., and Zillig, W. (1992) The particle SSV1 from the extremely thermophilic archaeon Sulfolobus is a virus: demonstration of infectivity and of transfection with viral DNA. Proc Natl Acad Sci USA 89: 7645–7649.

Serre, M.-C., Letzelter, C., Garel, J.-R., and Duguet, M. (2002) Cleavage properties of an archaeal site-specific recombinase, the SSV1 integrase. J Biol Chem 277: 16758–16767.

Shah, S.A., Hansen, N.R., and Garrett, R.A. (2009) Distributions of CRISPR spacer matches in viruses and plasmids of crenarchaeal acidothermophiles and implications for their inhibitory mechanism. Biochem Soc Trans 37: 23–28.

Snyder, J.C., Wiedenheft, B., Lavin, M., Roberto, F.F., Spuhler, J., Ortmann, A.C., et al. (2007) Virus movement maintains local virus population diversity. Proc Natl Acad Sci USA 104: 19102–19107.

Stedman, K.M., She, Q., Phan, H., Arnold, H.P., Holz, I., Garrett, R.A., and Zillig, W. (2003) Relationships between fuselloviruses infecting the extremely thermophilic archaeon Sulfolobus: SSV1 and SSV2. Res Microbiol 154: 295–302.

Steinmetz, N.F., Bize, A., Kindlay, K.C., Lomonosoff, G.P., Manchester, M., Evans, D.J., and Prangishvili, D. (2008) Site-specific and spatially controlled addressability of a new viral nanobuilding block: Sulfolobus islandicus rod-shaped virus 2. Adv Funct Mater 18: 1–9.

Vestergaard, G., Shah, S.A., Bize, A., Reitberger, W., Reuter, M., Phan, H., et al. (2008a) SRV, a new rudiviral isolate from Stygiolobus and the interplay of crenarchaeal rudiviruses with the host viral-defence CRISPR system. J Bacteriol 190: 6837–6845.

Vestergaard, G., Aramayo, R., Basta, T., Häring, M., Peng, X., Brügger, K., et al. (2008b) Structure of the acidianus filamentous virus 3 and comparative genomics of related archaeal lipothrixviruses. J Virol 82: 371–381.

Wiedenheft, B., Stedman, K., Roberto, F., Willits, D., Gleske, A.K., Zoeller, L., et al. (2004) Comparative genomic analysis of hyperthermophilic archaeal Fuselloviridae viruses. J Virol 78: 1954–1961.

Xiang, X., Chen, L., Huang, X., Luo, Y., She, Q., and Huang, L. (2005) Sulfolobus tengchongensis spindle-shaped virus STSV1: virus–host interactions and genomic features. J Virol 79: 8677–8686.

Yeats, S., McWilliam, P., and Zillig, W. (1982) A plasmid in the archaebacterium Sulfolobus acidocaldarius. EMBO J 1: 1035–1038.

Zillig, W., Prangishvili, D., Schleper, C., Elferink, M., Holz, I., Albers, S., et al. (1996) Viruses, plasmids and other genetic elements of thermophilic and hyperthermophilic Archaea. FEMS Microbiol Rev 18: 225–236.



Environmental Microbiology (2010) 12(11), 2918–2930 doi:10.1111/j.1462-2920.2010.02266.x

Metagenomic analyses of novel viruses and plasmids from a cultured environmental sample of hyperthermophilic neutrophiles

Roger A. Garrett,1* David Prangishvili,2 Shiraz A. Shah,1 Monika Reuter,2,3 Karl O. Stetter3 and Xu Peng1

1 Archaea Centre, Department of Biology, Copenhagen University, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark.

2 Institut Pasteur, Molecular Biology of the Gene in Extremophiles Unit, rue Dr. Roux 25, 75724 Paris Cedex 15, France.

3 Department of Microbiology, Archaea Centre, University of Regensburg, D-93053 Regensburg, Germany.

Summary

Two novel viral genomes and four plasmids were assembled from an environmental sample collected from a hot spring at Yellowstone National Park, USA, and maintained anaerobically in a bioreactor at 85°C and pH 6. The double-stranded DNA viral genomes are linear (22.7 kb) and circular (17.7 kb), and apparently derive from the archaeal viruses HAV1 and HAV2. Genomic DNA was obtained from samples enriched in filamentous and tadpole-shaped virus-like particles respectively. They yielded few significant matches in public sequence databases, further reinforcing the wide diversity of archaeal viruses. Several variants of HAV1 exhibit major genomic alterations, presumed to arise from viral adaptation to different hosts. They include insertions up to 350 bp, deletions up to 1.5 kb, and genes with extensively altered sequences. Some result from recombination events occurring at low complexity direct repeats distributed along the genome. In addition, a 33.8 kb archaeal plasmid pHA1 was characterized, encoding a possible conjugative apparatus, as well as three cryptic plasmids of thermophilic bacterial origin, pHB1 of 2.1 kb and two closely related variants pHB2a and pHB2b, of 5.2 and 4.8 kb respectively. Strategies are considered for assembling genomes of smaller genetic elements from complex environmental samples, and for establishing possible host identities on the basis of sequence similarity to host CRISPR immune systems.

Received 10 February, 2010; accepted 20 April, 2010. *For correspondence: E-mail garrett@bio.ku.dk; Tel. (+45) 35322010; Fax (+45) 35322128.

© 2010 Society for Applied Microbiology and Blackwell Publishing Ltd

Introduction

Archaeal viruses exhibit a wide variety of morphotypes and genomic properties. They have been isolated and characterized primarily from terrestrial acidic hot springs or hypersaline lakes, in many different geographical locations. Several viruses from terrestrial acidic hot springs have now been classified into new viral families while others, together with a few haloarchaeal viruses from the euryarchaeal kingdom, remain unclassified (Prangishvili et al., 2006a; Porter et al., 2007; Lawrence et al., 2009). Although some crenarchaeal and euryarchaeal virions share similar morphotypes, their genomic properties show little in common (Ortmann et al., 2006; Porter et al., 2007) nor, with the exception of a few head-tail euryarchaeal viruses, do they share many homologous genes with either bacterial or eukaryal viruses (Prangishvili et al., 2006b). Despite the broad diversity of characterized archaeal viruses, as a group they probably constitute a biased sample because most of them exclusively infect thermoacidophilic members of the order Sulfolobales or a few haloarchaeal strains.

Few studies, to date, have addressed the relative abundance of different viral morphotypes in archaea-rich environments. Electron microscopy studies of samples from terrestrial hot springs suggest that spindles, filaments, rods and spheres predominate (Rachel et al., 2002; Bize et al., 2008), while other morphotypes are much less common. In hypersaline environments spindle-shaped and spherical forms predominate (Oren et al., 1997; Diez et al., 2000; Porter et al., 2007) while head-tail virus-like particles (VLPs) are quite common and their proviruses have been detected in some sequenced genomes of halo- and methanoarchaea (Porter et al., 2007; Krupovič and Bamford, 2008; Krupovič et al., 2010).

Only four crenarchaeal viruses from extreme geo<strong>the</strong>rmal<br />

environments at neutral pH values have been fully<br />

characterized to date, <strong>the</strong> rod-shaped Thermoproteus<br />

tenax lipothrixvirus, TTV1 (Janekovic et al., 1983), Pyrobaculum<br />

spherical virus, PSV (Här<strong>in</strong>g et al., 2004), <strong>the</strong><br />

closely related T. tenax spherical virus 1, TTSV1 (Ahn


et al., 2006), and <strong>the</strong> Aeropyrum pernix bacilliform virus 1,<br />

APBV1 (Mochizuki et al., 2010). However, electron<br />

microscopy studies of an enrichment culture from a<br />

sample collected from Obsidian Pool, Yellowstone<br />

National Park, USA, ma<strong>in</strong>ta<strong>in</strong>ed at 85°C and pH 6 under<br />

anaerobic conditions, revealed five morphologically<br />

diverse VLPs (fig. 1 <strong>in</strong> Rachel et al., 2002), <strong>in</strong>clud<strong>in</strong>g<br />

spherical virions of <strong>the</strong> virus PSV which was characterized<br />

earlier (Här<strong>in</strong>g et al., 2004). The enrichment culture<br />

also carried a variety of genera, <strong>in</strong>clud<strong>in</strong>g crenarchaeal<br />

Thermofilum, Thermoproteus and Thermosphaera, euryarchaeal<br />

Archaeoglobus and <strong>the</strong> bacterial genera<br />

Thermus, Geo<strong>the</strong>rmobacterium and Thermodesulfobacterium<br />

(Rachel et al., 2002). The enrichment culture was<br />

ma<strong>in</strong>ta<strong>in</strong>ed <strong>in</strong> a bioreactor over a 2-year period and aliquots<br />

were extracted at regular intervals over this time<br />

and screened for VLPs by electron microscopy. The different<br />

VLP morphotypes observed, including the spherical<br />

PSV, varied considerably in their relative yields over time<br />

(Fig. 1).<br />

In this study, we attempted to obta<strong>in</strong> genome sequences<br />

associated with <strong>the</strong> rema<strong>in</strong><strong>in</strong>g unidentified VLPs. To this<br />

end, <strong>the</strong> samples extracted from <strong>the</strong> bioreactor at different<br />

time <strong>in</strong>tervals were <strong>in</strong>vestigated, as well as mixtures of<br />

samples. A variety of approaches were used to generate<br />

clone libraries and to dist<strong>in</strong>guish viral from plasmid DNA,<br />

and circular from l<strong>in</strong>ear DNA genomes and, for <strong>the</strong> VLPs, to<br />

correlate genome-type with morphotype.<br />

S<strong>in</strong>ce attempts to f<strong>in</strong>d hosts for <strong>the</strong> VLPs were unsuccessful,<br />

we <strong>in</strong>vestigated potential hosts for <strong>the</strong> archaeal<br />

viruses and plasmids by match<strong>in</strong>g <strong>the</strong>ir genome<br />

sequences to spacer sequences of <strong>the</strong> chromosomal<br />

<strong>immune</strong> <strong>CRISPR</strong>/Cas <strong>system</strong> (Van der Oost et al., 2009).<br />

These chromosomal spacers derive from <strong>in</strong>fect<strong>in</strong>g viruses<br />

or plasmids (Barrangou et al., 2007) and are present<br />

with<strong>in</strong> all <strong>the</strong> available sequenced genomes of <strong>the</strong>rmophilic<br />

neutrophiles. The spacers represent a history of<br />

<strong>in</strong>vad<strong>in</strong>g viruses and plasmids and a close sequence<br />

match implies that <strong>the</strong> host has been <strong>in</strong>fected by a similar<br />

virus or plasmid (Lillestøl et al., 2006; Andersson and<br />

Banfield, 2008; Shah et al., 2009).<br />
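The host-assignment idea described above can be illustrated with a minimal sketch that scans both strands of a genetic element for near-exact matches to a CRISPR spacer. The function names, the toy sequences and the mismatch threshold are illustrative assumptions and are not taken from this study.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_spacer_hits(spacer: str, element: str, max_mismatches: int = 0):
    """Scan both strands of `element` for windows matching `spacer`.

    Returns a list of (start, strand, mismatches) tuples, where `start`
    is the 0-based position on the forward strand of the element.
    """
    element = element.upper()
    hits = []
    for strand, query in (("+", spacer.upper()), ("-", reverse_complement(spacer))):
        for start in range(len(element) - len(query) + 1):
            mismatches = hamming(query, element[start:start + len(query)])
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return hits


# Toy example (hypothetical sequences, not real spacers or viral genomes).
toy_element = "TTTTGATTACAGGGG"
print(find_spacer_hits("GATTACA", toy_element))                    # forward-strand match
print(find_spacer_hits("TGTAATC", toy_element))                    # reverse-strand match
print(find_spacer_hits("GATTTCA", toy_element, max_mismatches=1))  # one mismatch allowed
```

A hit on either strand would suggest that the organism carrying the spacer has encountered a similar element; real analyses would normally use an alignment tool rather than this brute-force scan.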

Results<br />

An enrichment culture established from a sample collected<br />

from Obsidian Pool at Yellowstone National Park,<br />

USA, was ma<strong>in</strong>ta<strong>in</strong>ed <strong>in</strong> a bioreactor at 85°C and pH 6.<br />

Virus-like particles were concentrated from supernatant<br />

aliquots taken from <strong>the</strong> bioreactor and subjected to CsCl<br />

density gradient ultracentrifugation. Initially, shot-gun<br />

clone libraries were prepared from a mixture of bioreactor<br />

samples (bioreactor-mix) (Fig. 1A) which were deprote<strong>in</strong>ized<br />

after density gradient centrifugation without any pretreatment<br />

but most clones were found to derive from<br />

Novel viruses and plasmids from hyper<strong>the</strong>rmoneutrophiles 2919<br />

contam<strong>in</strong>at<strong>in</strong>g chromosomal DNA fragments. Therefore,<br />

DNase I treatment was <strong>in</strong>troduced to remove chromosomal<br />

contam<strong>in</strong>ation before deprote<strong>in</strong>ization. Moreover,<br />

we exam<strong>in</strong>ed samples collected at different times over a<br />

2-year period, for which a given VLP type was dom<strong>in</strong>ant <strong>in</strong><br />

electron micrographs (Fig. 1B and C), <strong>in</strong> order to correlate<br />

viral genome types with morphotypes. Fur<strong>the</strong>rmore, strategies<br />

were developed for dist<strong>in</strong>guish<strong>in</strong>g viral from plasmid<br />

DNA, and l<strong>in</strong>ear from circular DNA genomes (Table 1; see<br />

also Experimental procedures).<br />

Five ma<strong>in</strong> libraries were prepared to generate viral DNA<br />

and plasmid clones (Table 1). These <strong>in</strong>clude <strong>the</strong> larger<br />

library of supernatant DNA from <strong>the</strong> mix of bioreactor<br />

samples collected at different time <strong>in</strong>tervals (4000<br />

sequences) (Fig. 1A), and an earlier library that was used<br />

to sequence <strong>the</strong> partially purified Pyrobaculum spherical<br />

virus PSV, isolated from <strong>the</strong> same bioreactor (Här<strong>in</strong>g<br />

et al., 2004). Moreover, samples enriched <strong>in</strong> two of <strong>the</strong><br />

novel VLPs were obta<strong>in</strong>ed (Fig. 1B and C) and used to<br />

generate clone libraries. Thus, a shot-gun filament library<br />

was prepared from two samples rich <strong>in</strong> short filamentous<br />

VLPs (Fig. 1B), and fur<strong>the</strong>r clone libraries were prepared<br />

from samples rich <strong>in</strong> tadpole-shaped particles (Fig. 1C)<br />

after select<strong>in</strong>g for (i) circular plasmids which were preferentially<br />

amplified (tadpole-1) and (ii) circular viral<br />

genomes after degrad<strong>in</strong>g chromosomal and plasmid DNA<br />

and <strong>the</strong>n deprote<strong>in</strong>iz<strong>in</strong>g virions and treat<strong>in</strong>g with circular<br />

DNA-safe nucleases (tadpole-2). We also screened,<br />

unsuccessfully, for RNA viral genomes by generat<strong>in</strong>g<br />

cDNA libraries (data not shown).<br />

Complete genomes from two putative archaeal viruses<br />

HAV1 and HAV2 were assembled, <strong>the</strong> former 22.7 kb and<br />

l<strong>in</strong>ear, and <strong>the</strong> latter 17.7 kb and circular, and, <strong>in</strong> addition,<br />

four plasmids were sequenced: pHA1 (33.8 kb), pHB1<br />

(2.1 kb), and two variants, pHB2a and pHB2b, of 4.8 kb<br />

and 5.4 kb respectively. The approximate percentages of<br />

clones from <strong>the</strong> five ma<strong>in</strong> libraries that were <strong>in</strong>corporated<br />

<strong>in</strong>to each assembled genetic element are given (Table 1),<br />

and <strong>the</strong> numbers are consistent with <strong>the</strong> strategy<br />

employed for dist<strong>in</strong>guish<strong>in</strong>g viral from plasmid genomes,<br />

except that <strong>the</strong> relatively high percentage of clones of<br />

plasmids pHB2a and pHB2b (20%), obta<strong>in</strong>ed from <strong>the</strong><br />

tadpole-2 library, probably reflects incomplete DNase I<br />

digestion of non-viral circular DNA (Table 1). The average<br />

genome coverage was about fivefold for each element,<br />

unless o<strong>the</strong>rwise stated, and all sequence ambiguities<br />

were resolved by primer walk<strong>in</strong>g on clones. The identities<br />

and general properties of <strong>the</strong> sequenced genetic elements<br />

are summarized <strong>in</strong> Table 2.<br />

Filamentous VLPs<br />

Two bioreactor samples rich <strong>in</strong> short filamentous VLPs<br />

(Fig. 1B) were pooled and treated with DNase I at 37°C<br />

© 2010 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology, 12, 2918–2930


2920 R. A. Garrett et al.<br />


Fig. 1. Electron micrographs show<strong>in</strong>g VLP morphotypes observed <strong>in</strong> <strong>the</strong> analysed bioreactor culture.<br />

A. A mixture of all preparations of VLPs collected from <strong>the</strong> bioreactor.<br />

B. Preparation enriched <strong>in</strong> filamentous VLPs.<br />

C. Preparation enriched <strong>in</strong> tadpole-shaped VLPs.<br />

The size marker corresponds to 500 nm.<br />



Table 1. Pre-treatment of viral samples before library construction and <strong>the</strong> approximate percentage of clone sequences assembled from each<br />

library for each genetic element.<br />

Clone libraries<br />

Treatment Bioreactor mix PSV Filament Tadpole-1 Tadpole-2<br />

Element Size (bp) i, iii i, iii i, iii, v i, iii, v i, ii, iii, iv, v<br />

HAV1 22 743 5 95<br />

HAV2 17 666 5 95<br />

pHA1 33 795 40 57 3<br />

pHB1 2 099 100<br />

pHB2a/2b 4 780/5 370 80 20<br />

Bioreactor supernatant extracts were subjected to <strong>the</strong> follow<strong>in</strong>g treatments: i. CsCl gradient centrifugation of virions. ii. DNase I treatment of <strong>the</strong><br />

virion band from CsCl gradients. iii. Deprote<strong>in</strong>ization with SDS and prote<strong>in</strong>ase K followed by phenol extraction of DNA. iv. Plasmid-safe DNase<br />

treatment of DNA. v. In vitro amplification of DNA.<br />

The total numbers of sequenced clones that were assembled into the virus and plasmid genomes (prior to sequence polishing) were HAV1 – 956,<br />

HAV2 – 195, pHA1 – 188, pHB1 – 49, pHB2a/2b which were co-assembled – 55.<br />

for 15 m<strong>in</strong>, to remove extraneous chromosomal and<br />

plasmid DNA before extract<strong>in</strong>g DNA from VLPs by phenol<br />

treatment. A clone library was generated and DNA<br />

sequenc<strong>in</strong>g yielded a non-circular contig of about 20 kb,<br />

consistent with a l<strong>in</strong>ear genome. S<strong>in</strong>ce term<strong>in</strong>al<br />

sequences are <strong>in</strong>variably absent from shot-gun clone<br />

libraries of l<strong>in</strong>ear genomes (e.g. Vestergaard et al.,<br />

2008a), libraries were produced us<strong>in</strong>g <strong>the</strong> L<strong>in</strong>ker Amplified<br />

Shotgun Library method (see Experimental procedures)<br />

which yielded a high sequence coverage of <strong>the</strong><br />

DNA term<strong>in</strong>i. The complete l<strong>in</strong>ear DNA genome consists<br />

of 22 743 bp with a 21 bp <strong>in</strong>verted term<strong>in</strong>al repeat (ITR) of<br />

sequence 5′-CGTCTCTCTGTGTGTATGGGA-3′. We<br />

<strong>in</strong>fer that both term<strong>in</strong>i are free, blunt and unmodified,<br />

because <strong>the</strong>y were efficiently ligated with <strong>the</strong> blunt end of<br />

<strong>the</strong> adaptor dur<strong>in</strong>g library construction (see Experimental<br />

procedures). S<strong>in</strong>ce only one major contig was assembled<br />

from <strong>the</strong> filament-library sequences, we <strong>in</strong>ferred that it<br />

derived from <strong>the</strong> filamentous virus (Fig. 1B). Genome<br />

analyses <strong>in</strong>dicated that <strong>the</strong> virus was of archaeal orig<strong>in</strong><br />

(Torar<strong>in</strong>sson et al., 2005), and all of <strong>the</strong> predicted genes<br />

lie on one strand of <strong>the</strong> genome, similar to <strong>the</strong> highly<br />

biased strand usage of <strong>the</strong> crenarchaeal <strong>the</strong>rmoneutrophilic<br />

viruses PSV and TTSV1 (Här<strong>in</strong>g et al., 2004; Ahn<br />

et al., 2006). A few clone sequences from <strong>the</strong> filament<br />

library assembled <strong>in</strong>to <strong>the</strong> PSV genome, <strong>in</strong>dicat<strong>in</strong>g that<br />

small amounts of that virus had co-purified with HAV1,<br />

Table 2. Genomic properties of <strong>the</strong> <strong>the</strong>rmoneutrophilic viruses and plasmids.<br />


consistent with <strong>the</strong> low levels of spherical particles<br />

observed <strong>in</strong> electron micrographs (Fig. 1B).<br />
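The 21 bp inverted terminal repeat reported above can be checked with a short sketch: for a linear genome with an ITR, one terminus should carry the repeat and the other its reverse complement. Only the ITR sequence comes from the text; the function names and the toy genome are hypothetical, and the orientation convention (repeat at the 5′ end of the top strand) is an assumption.

```python
ITR = "CGTCTCTCTGTGTGTATGGGA"  # 21 bp ITR sequence quoted in the text


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_inverted_terminal_repeat(genome: str, itr: str) -> bool:
    """True if `genome` starts with `itr` and ends with its reverse complement."""
    genome = genome.upper()
    return genome.startswith(itr.upper()) and genome.endswith(reverse_complement(itr))


# Toy linear genome: ITR, arbitrary filler, inverted copy of the ITR.
toy_genome = ITR + "ACGT" * 10 + reverse_complement(ITR)
print(len(ITR))
print(has_inverted_terminal_repeat(toy_genome, ITR))
```

Applied to the deposited HAV1 sequence (GenBank GU722196), the same check would be expected to succeed if the record is stored in the orientation assumed here.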

No genes yielded highly significant matches <strong>in</strong> public<br />

sequence databases; only weak but persistent matches<br />

were observed to a Cas4-like prote<strong>in</strong> (DUF83) (ORF218),<br />

possibly a nuclease, for which matches were also<br />

observed <strong>in</strong> several crenarchaeal fuselloviruses (Redder<br />

et al., 2009) and the filamentous lipothrixvirus AFV1<br />

(Bettstetter et al., 2003), a parB-like partition prote<strong>in</strong><br />

(ORF253) and a transcriptional regulator (ORF146)<br />

(Table 3). Several ORFs carry putative transmembrane<br />

motifs, some with predicted signal peptides, as illustrated<br />

<strong>in</strong> Fig. 2A. The very low level of gene matches to public<br />

sequence databases is a characteristic of <strong>the</strong> o<strong>the</strong>r<br />

sequenced <strong>the</strong>rmoneutrophilic viruses PSV, TTSV1 and<br />

TTV1 (Janekovic et al., 1983; Bettstetter et al., 2003; Ahn<br />

et al., 2006), and appears to be a general feature of many<br />

crenarchaeal viral genomes (Prangishvili et al., 2006b).<br />

HAV1 genomic variants<br />

Although <strong>the</strong>re is a low level of sequence heterogeneity<br />

throughout <strong>the</strong> genome, <strong>the</strong>re are numerous local heterogeneity<br />

‘hot-spots’, present <strong>in</strong> almost half of <strong>the</strong> predicted<br />

genes (17 out of 40) as <strong>in</strong>dicated <strong>in</strong> Fig. 2A. In addition,<br />

several genomic variants of HAV1 were assembled with<br />

major alterations <strong>in</strong>clud<strong>in</strong>g gene <strong>in</strong>sertions of up to 350 bp<br />

Element ds DNA (bp) Form G+C content (%) Domain GenBank accession number<br />

HAV1 22 743 L<strong>in</strong>ear 46.2 <strong>Archaea</strong> GU722196<br />

HAV2 17 666 Circular 52.1 <strong>Archaea</strong> GU722197<br />

pHA1 33 795 Circular 45.4 <strong>Archaea</strong> GU722198<br />

pHB1 2 099 Circular 54.7 Bacteria GU722199<br />

pHB2a 4 780 Circular 61.6 Bacteria GU722200<br />

pHB2b 5 370 Circular 60.2 Bacteria GU722201<br />




Table 3. Significant ORF matches with<strong>in</strong> public sequence databases.<br />

ORF e-value Match Orig<strong>in</strong><br />

HAV1<br />

ORF253 4e-06 parB-like partition Dethiobacter alkaliphilus AHT 1<br />

ORF218 9e-05 Cas4-like (DUF83) Sulfolobus fusellovirus SSV2<br />

ORF170 1e-06 Hypo<strong>the</strong>tical – Tpen_1879 Thermofilum pendens Hrk 5<br />

ORF146 5e-05 CopG/Arc/MetJ family transcriptional regulator Pyrobaculum aerophilum str. IM2<br />

HAV2<br />

ORF1767 2e-11 Hypo<strong>the</strong>tical – ATV_gp60 (ORF710 – C-term<strong>in</strong>al 375 aa) Acidianus bicaudavirus ATV<br />

ORF909 1e-125 Primase/DNA polymerase Sulfolobus neozealandicus pORA1<br />

ORF506 2e-15 AAA-ATPase, CDC48-type (ORF618 N-term<strong>in</strong>al 210 aa) Acidianus bicaudavirus ATV<br />

ORF263 2e-04 ORF731 N-term<strong>in</strong>al 100 aa Sulfolobus bicaudavirus STSV1<br />

ORF122 3e-45 IS element Dka2 OrfA Desulfurococcus kamchatkensis<br />

ORF420 2e-160 IS element Dka2 OrfB Desulfurococcus kamchatkensis<br />

pHA1<br />

ORF575 2e-08 Phage/plasmid primase COG3378 (C-term<strong>in</strong>al 300 aa) P4 family<br />

ORF396 2e-25 C5-cytos<strong>in</strong>e-specific methylase Thermus phage P23-45<br />

ORF375 8e-42 Type III restriction enzyme, res subunit Thermofilum pendens Hrk 5<br />

ORF337 2e-07 DEAD/DEAH box helicase Thermofilum pendens Hrk 5<br />

ORF320 5e-18 Abortive <strong>in</strong>fection prote<strong>in</strong> Thermofilum pendens Hrk 5<br />

ORF282 4e-74 Integrase Thermofilum pendens Hrk 5<br />

ORF93 6e-06 Holliday junction resolvase Methanocaldococcus vulcanius M7<br />

pHB1<br />

ORF477 8e-14 Rep prote<strong>in</strong>-roll<strong>in</strong>g circle Bacterial plasmid pAB49<br />

pHB2a+b<br />

ORF399/557 8e-75 TraA-like, conjugal transfer Polaromonas naphthalenivorans CJ2<br />

ORF269 2e-47 RepB prote<strong>in</strong> Ac<strong>in</strong>etobacter baumannii ACICU<br />

pHB2a<br />

ORF116 3e-18 Hypo<strong>the</strong>tical – Veis_1406 Verm<strong>in</strong>ephrobacter eiseniae EF01-2<br />

pHB2b<br />

ORF115 2e-19 Hypo<strong>the</strong>tical – StreC_09508 Streptomyces sp. C<br />

and deletions of up to 1.5 kb, genes with altered<br />

sequences, and duplications. The number of clone<br />

sequences that assembled <strong>in</strong>to each of <strong>the</strong> variant<br />

regions (Table 4), relative to <strong>the</strong> number of clones <strong>in</strong> <strong>the</strong><br />

dom<strong>in</strong>ant genome, <strong>in</strong>dicated that <strong>the</strong> orig<strong>in</strong>al viral population<br />

was very heterogeneous, and this was re<strong>in</strong>forced by<br />

the preparative gel electrophoresis pattern of the viral DNA<br />

which revealed a broad heterogeneous band between<br />

DNA size markers of 19.4 and 24 kb (Fig. 3).<br />

Ten assembled contigs of HAV1 variants showed major<br />

genomic changes with some carry<strong>in</strong>g two to three <strong>in</strong>dependent<br />

alterations (Table 4). Most of <strong>the</strong> deletions and<br />

o<strong>the</strong>r major genomic changes occur at one or more of <strong>the</strong><br />

11 adjo<strong>in</strong><strong>in</strong>g pyrimid<strong>in</strong>e-rich and pur<strong>in</strong>e-rich sequences,<br />

most of which are <strong>in</strong>tergenic (Fig. 2A; Table 5). These<br />

sites constitute partially conserved, low-complexity direct<br />

repeats along <strong>the</strong> genome, and some carry <strong>in</strong>verted<br />

repeats (Table 5). Only a quarter of <strong>the</strong> viral genes are<br />

affected by <strong>the</strong>se genomic changes. Of <strong>the</strong>se, ORFs<br />

123a, 156, 284, 102, 78a and 170 appear dispensable for<br />

<strong>the</strong> virion, while ORFs 140, 174, 276 and 352 can<br />

undergo large sequence variations, and ORF585, which<br />

conta<strong>in</strong>s two putative recomb<strong>in</strong>ation sites (Fig. 2A;<br />

Table 5), has undergone <strong>in</strong>sertions, partial deletions<br />

and/or extensive sequence changes, and exhibits altered<br />

start codon positions.<br />

Tadpole-shaped VLPs<br />

DNA was extracted from a purified viral preparation that<br />

was rich <strong>in</strong> tadpole-shaped VLPs (Fig. 1C), and was<br />

amplified using the φ29 polymerase, before preparing a<br />

shot-gun clone library (Table 1; see Experimental procedures).<br />

Sequences were assembled, toge<strong>the</strong>r with some<br />

sequences from <strong>the</strong> bioreactor mix library (Table 1), <strong>in</strong>to<br />

a circular double-stranded (ds) DNA genome of 17 666<br />

bp (Fig. 2B), where the predicted genes are preceded by<br />

archaea-specific motifs (Torar<strong>in</strong>sson et al., 2005). There<br />

was little sequence heterogeneity in the HAV2 genome,<br />

which almost certainly reflects the DNA amplification<br />

step prior to clon<strong>in</strong>g, such that an <strong>in</strong>itial dom<strong>in</strong>at<strong>in</strong>g component<br />

was preferentially amplified. As for HAV1, only<br />

one major contig was assembled and we <strong>in</strong>ferred <strong>the</strong>refore<br />

that it derived from <strong>the</strong> tadpole-shaped VLPs<br />

(Fig. 1C).<br />

In contrast to HAV1, a few significant matches to public<br />

sequence databases were found (Table 3). Highly significant<br />

matches occurred for an archaea-specific bifunctional<br />

DNA primase-polymerase encoded on two plasmids<br />

of Sulfolobus neozealandicus (Lipps et al., 2004; Greve<br />

et al., 2005), and for an IS element of <strong>the</strong> IS 200/650<br />

family present <strong>in</strong> Desulfurococcus kamchatkensis.<br />

Moreover, two matches occurred to the crenarchaeal<br />

bicaudavirus ATV which exhibits a similar spindle-shaped<br />

morphology but with two tails (Prangishvili et al., 2006c).<br />

Thus, <strong>the</strong> C-term<strong>in</strong>al 350 am<strong>in</strong>o acids of HAV2-ORF1767<br />

showed significant sequence similarity to <strong>the</strong> correspond<strong>in</strong>g<br />

region of ATV-ORF710, and HAV2-ORF506 also<br />

carries an AAA-ATPase doma<strong>in</strong> of <strong>the</strong> CDC48 type,<br />

similar to that present <strong>in</strong> ATV-ORF618 (Fig. 2B).<br />

<strong>Archaea</strong>l and bacterial plasmids<br />

Plasmid sequences were assembled ma<strong>in</strong>ly from <strong>the</strong><br />

clone libraries of samples lack<strong>in</strong>g DNase I treatment<br />


Fig. 2. Genome maps of <strong>the</strong> HAV1 (A) and HAV2 (B) viruses where predicted genes are <strong>in</strong>dicated by arrows and denoted by <strong>the</strong>ir am<strong>in</strong>o acid<br />

lengths. Significant predictions of gene product functions are <strong>in</strong>dicated. Striated genes encode predicted transmembrane motifs. In (A) red<br />

sections <strong>in</strong>dicate gene regions carry<strong>in</strong>g hot-spots for s<strong>in</strong>gle-site mutations. Putative recomb<strong>in</strong>ation sites are <strong>in</strong>dicated (•).<br />

before viral DNA extraction (Table 1). The 33 795 bp<br />

pHA1 (Hyper<strong>the</strong>rmophilic Archaeon) is of archaeal orig<strong>in</strong><br />

and was assembled from different libraries of nonamplified<br />

DNA, <strong>in</strong>clud<strong>in</strong>g that of <strong>the</strong> bioreactor mix and<br />

<strong>the</strong> PSV virus (Här<strong>in</strong>g et al., 2004) (Table 1). M<strong>in</strong>or<br />

sequence heterogeneities occurred throughout <strong>the</strong><br />

genome but no larger genomic changes were observed.<br />

About one-third of <strong>the</strong> 59 predicted genes are homologous<br />

to genes <strong>in</strong> <strong>the</strong> 31 504 bp plasmid TPEN01 from<br />

Thermofilum pendens Hrk5 (Anderson et al., 2008), and<br />

<strong>the</strong>y are clustered <strong>in</strong> <strong>the</strong> pHA1 genome (Fig. 4A). Several<br />

genes encod<strong>in</strong>g hypo<strong>the</strong>tical prote<strong>in</strong>s carry putative<br />




Table 4. Properties of <strong>the</strong> HAV1 genomic variants.<br />

HAV1 variant Number of clones Viral position Genome change ORF changes<br />

1 2 492–1827 Deleted 1337 bp, 80–143 replaced Deleted ORFs123a/156<br />

2 14 1032 65 bp partial duplication, T-C-rich region No ORF<br />

1659–3188 Deleted 1529 bp Deleted ORFs284/102<br />

3 6 3761 Altered gene ORF174 (ORF183, 36% identity/59% similarity)<br />

4 9 8042 Insert 92 bp – 60 bp repeat <strong>in</strong> start ORF141 end extends 10 aa, ORF94 start extends 36 aa<br />

5 41 8050 Insert 49 bp <strong>in</strong> C-rich region ORF141 end extends 9 aa, ORF94 start extends 10 aa<br />

8868–8755 Altered gene ORF218 (ORF218, 88% identity/92% similarity)<br />

9946–10757 Deleted 813 bp Deleted ORFs78a/170<br />

6 19 14430–14699 270 bp duplication No ORF<br />

15000 Insert 350 bp Variant ORF585, C-term<strong>in</strong>al half (ORF653)<br />

7 7 14360–14498,14737–14885 Altered gene Heterogeneities ORF585<br />

8 2 14995–15460 Deleted 375 bp Truncated ORF585<br />

15717–16062 Altered gene ORF585, altered 345 bp centrally<br />

9 8 17352–17899 Altered gene ORF276, altered central 170 aa (ORF259, 61% identity/71% similarity)<br />

10 16 19439–20029 Altered gene ORF325 (ORF315, 72% identity/80% similarity)<br />

Fig. 3. Characterization of DNA isolated from the purified<br />

VLP-enriched preparation of the filamentous virus<br />

(HAV1), after removal of plasmid and chromosomal DNA, and prior<br />

to generat<strong>in</strong>g <strong>the</strong> filament library (Table 1). M – DNA size markers.<br />

transmembrane motifs, some also exhibit<strong>in</strong>g predicted<br />

signal peptides, and <strong>the</strong>se <strong>in</strong>clude a cluster of 10 tightly<br />

l<strong>in</strong>ked genes some of which are probably co-transcribed<br />

(Fig. 4A). Although <strong>the</strong>re is no significant sequence similarity,<br />

<strong>the</strong>se prote<strong>in</strong>s may generate a novel conjugative<br />

apparatus, by analogy with a group of conjugative membrane<br />

prote<strong>in</strong>s encoded by a conserved gene cluster of<br />

conjugative plasmids of <strong>the</strong> crenarchaeal <strong>the</strong>rmoacidophiles<br />

(Greve et al., 2004).<br />

The three smaller plasmids were assembled exclusively<br />

from <strong>the</strong> tadpole-1/2 libraries of amplified DNA (Table 1)<br />

and <strong>the</strong> sequences are relatively homogeneous. Each<br />

plasmid is of bacterial orig<strong>in</strong>, as judged by promoter and<br />

ribosome b<strong>in</strong>d<strong>in</strong>g motifs (Torar<strong>in</strong>sson et al., 2005). The<br />

2099 bp pHB1 (Hyper<strong>the</strong>rmophilic Bacterium) encodes a<br />

large replication prote<strong>in</strong>, probably of <strong>the</strong> roll<strong>in</strong>g circle type<br />

(Table 3), and o<strong>the</strong>r predicted genes overlap on <strong>the</strong> two<br />

DNA strands (Fig. 4B). pHB2a and pHB2b, of 4780 and<br />

5370 bp, respectively, are variants shar<strong>in</strong>g 3780 bp of<br />

highly similar sequence but with two altered regions as<br />

illustrated (Fig. 4B). Whereas <strong>the</strong> shorter altered regions<br />

exhibit no sequence similarity, <strong>the</strong> larger regions of 781 bp<br />

and 1257 bp for pHB2a and pHB2b, respectively, carry<br />

about 300 bp with a low but significant level of sequence<br />

similarity. These altered sequences resulted <strong>in</strong> ORF125<br />

be<strong>in</strong>g exclusive to pHB2a, and ORFs 58, 60, 67, 68 and<br />

97 be<strong>in</strong>g specific to pHB2b (Fig. 4B). In addition, ORFs<br />

399 and 157 <strong>in</strong> pHB2a are fused <strong>in</strong> pHB2b (ORF557) and<br />



Table 5. Putative recomb<strong>in</strong>ation sites associated with alterations <strong>in</strong> <strong>the</strong> variant HAV1 genomes.<br />


Variant number Genome change Genome positions Recomb<strong>in</strong>ation sites<br />

1a 1337 bp deletion 474–496 CCCTCCCCTTTTTCTATGAAGTCGAAGGTGGA<br />

1b Recomb<strong>in</strong>ation 1805–1833 TTTTTTCTTTTTCCTCTTTTTTTCCCTTCGGAGAAAAG<br />

2a 65 bp partial duplication 1014–1052 TCTTTTTTCCCCTCTTTTCCTTTCTTCATGATGAAAGGA<br />

2b 1529 bp deletion 1650–1677 CCTCTTTTTTTCTAGCCGCACCTCCTTTGGAGAAAAA<br />

2c 1529 bp deletion 3183–3193 TCTGACCCTTCGGAGAAAAA<br />

5a 49 bp <strong>in</strong>sertion 8060 CCCGTTCCCGGCGTCTCGGTGGAA<br />

6b 350 bp altered 15000 CTCCTCACTCTTCTTCTCGCTGTTCAGGAGGAGGA<br />

8b 345 bp replacement 16040–16065 CTTTGCTGTATCTATTGCGAGGAAGA<br />

Similar sites exist also at genome positions 559–585, 2728–2748 and 2766–2778. Inverted repeats (underl<strong>in</strong>ed) are present <strong>in</strong> some recomb<strong>in</strong>ation<br />

sites. Details of <strong>the</strong> variants are given <strong>in</strong> Table 4.<br />

Fig. 4. Genome maps of <strong>the</strong> circular plasmids (A) archaeal pHA1 and (B) three bacterial plasmids pHB1, pHB2a and pHB2b, where arrows<br />

<strong>in</strong>dicate predicted genes denoted by <strong>the</strong>ir am<strong>in</strong>o acid lengths. Striated genes encode predicted transmembrane motifs while grey shaded<br />

genes are homologous to genes <strong>in</strong> TPEN01. Shaded areas <strong>in</strong>side <strong>the</strong> circles for pHB2a and pHB2b <strong>in</strong>dicate regions of different sequence.<br />




ORF96 (pHB2a) and ORF98 (pHB2b), as well as ORF115<br />

(pHB2a) and ORF116 (pHB2b), show limited sequence<br />

differences (Fig. 4B). Both plasmid variants encode a replication<br />

prote<strong>in</strong> and a Tra-like conjugal prote<strong>in</strong> and carry a<br />

G+C-rich region which may constitute an RNA gene<br />

(Fig. 4B).<br />

As for HAV2, <strong>the</strong>re was little sequence heterogeneity for<br />

<strong>the</strong> bacterial plasmids, which probably also reflects <strong>the</strong>ir<br />

amplification prior to clon<strong>in</strong>g. Detection of variants pHB2a<br />

and pHB2b suggests that both were substantial components<br />

<strong>in</strong> <strong>the</strong> orig<strong>in</strong>al DNA preparation.<br />

<strong>CRISPR</strong> spacer matches<br />

RNAs transcribed from CRISPR repeat clusters, and processed to spacer RNAs, can target and inactivate extrachromosomal elements (reviewed in Van der Oost et al., 2009). Thus, host repeat clusters maintain a record of invading genetic elements. In principle, therefore, it should be possible to determine the host of an isolated genetic element by comparing its genome sequence with CRISPR spacer sequences from the chromosomes of potential hosts. We attempted to do this for the newly characterized viruses and plasmids by comparing their sequences, and those of other available thermoneutrophilic viruses and plasmids, with the 1321 spacer sequences in the CRISPR clusters of the 13 sequenced thermoneutrophilic genomes (see Experimental procedures). Although a sequence comparison at the nucleotide level yielded no significant matches, a few significant matches were found when searching at the more conserved amino acid sequence level, after translating the spacers in all six reading frames, essentially as described earlier (Shah et al., 2009). At an e-value cut-off of 0.12, there are seven significant matches to the viruses and plasmids, which are listed in Table 6, and 35 matches to annotated crenarchaeal proteins in the 13 genomes (some of which may be matches to integrated viruses or plasmids that were not removed from the data set). Given that viral/plasmid ORFs constituted only 0.8% of the sequences present in the search (corresponding to 54 684 out of a total of 6 317 506 amino acids), the results show a 20-fold preference for spacers matching viral/plasmid ORFs over crenarchaeal genome ORFs, which reinforces the significance of the matches (Table 6). A similar, statistically meaningful analysis of the bacterial plasmids was not possible because of their small sizes and the paucity of available bacterial thermophile CRISPR sequences.
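The fold preference quoted above can be checked with a short calculation. The sketch below assumes that the preference is the ratio of matches per amino acid of database sequence in the two classes; this normalisation and the function name `fold_preference` are assumptions made for illustration, not a method stated in the text. With the figures given, the viral/plasmid share of the database comes out near 0.87% and the ratio near 23, the same order as the roughly 20-fold preference reported.

```python
def fold_preference(viral_hits, genome_hits, viral_aa, total_aa):
    """Ratio of per-residue match rates: viral/plasmid ORFs vs genome ORFs.

    Assumption: matches are normalised by the number of amino acids
    searched in each class.
    """
    genome_aa = total_aa - viral_aa
    viral_rate = viral_hits / viral_aa
    genome_rate = genome_hits / genome_aa
    return viral_rate / genome_rate


# Figures quoted in the text.
VIRAL_AA = 54_684      # amino acids in viral/plasmid ORFs
TOTAL_AA = 6_317_506   # amino acids in the whole search database
VIRAL_HITS = 7         # significant matches to viruses and plasmids
GENOME_HITS = 35       # matches to annotated crenarchaeal proteins

share = VIRAL_AA / TOTAL_AA
fold = fold_preference(VIRAL_HITS, GENOME_HITS, VIRAL_AA, TOTAL_AA)
print(f"viral/plasmid share of database: {share:.2%}")
print(f"fold preference: {fold:.1f}")
```

Because the seven and 35 matches are small counts, the ratio is best read as an order-of-magnitude indicator rather than a precise estimate.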

Of the four published genetic elements, PSV, TTSV1 and TPEN01 yielded one or more significant spacer matches to a known host genus (Table 6). Moreover, HAV1 gave a good match to a Pyrobaculum, while HAV2 yielded good matches to Desulfurococcus and Thermoproteus, consistent with the high sequence similarity of the HAV2 genome to a Desulfurococcus IS element (Table 3; Fig. 2B). The next two most significant matches (not included in the table) were both between pHA1 ORF68 and single spacers in T. pendens and T. neutrophilus, with e-values of 0.64 and 0.67, respectively, also consistent with the extensive gene homology between pHA1 and the T. pendens plasmid TPEN01 (Fig. 4A). Thus, this approach appears to yield useful insights into possible hosts for the newly characterized archaeal genetic elements, and it should be more generally applicable for such metagenomic studies as more archaeal CRISPR repeat-cluster sequences, or whole genome sequences, become available.

Table 6. Significant CRISPR spacer matches to crenarchaeal thermoneutrophilic viruses and plasmids.

Crenarchaeal genome | Total spacers | CRISPR | Spacer | Virus/plasmid | Host genus | ORF | e-value | Alignment (translated spacer / matching ORF region)
Pyrobaculum arsenaticum | 126 | 90 | 47 | HAV1 | - | 253 | 0.012 | 1 LGRSYDTIRKYQ 12 / 114 AKILGREYDTVRKYRNAA 131
Desulfurococcus kamchatkensis | 94 | 88 | 10 | HAV2 | - | 909 | 0.023 | 1 WLHWLYIYGASHTG 14 / 568 RGKWIRWLYLYGSSKTGKTT 587
Thermoproteus neutrophilus | 225 | 26 | 13 | HAV2 | - | 420 | 0.067 | 1 VVYVDETYTSATCP 14 / 329 GITAVYVDEAYTSSKCPIHG 348
Thermoproteus neutrophilus | 225 | 16 | 5 | PSV | Pyrobaculum, Thermoproteus | 97 | 0.017 | 1 DIWKIRWPEAIKS 13 / 75 RNFDIWKVKWPTALRAQIA 93
Thermoproteus neutrophilus | 225 | 38 | 5 | TTSV1 | Thermoproteus | 100a | 0.041 | 1 RCDLCGRRVSYET 13 / 40 PDTRCDICGRKIGYGPYMV 58
Thermoproteus neutrophilus | 225 | 38 | 6 | TTSV1 | Thermoproteus | 100a | 0.041 | 1 RCDLCGRRVSYET 13 / 40 PDTRCDICGRKIGYGPYMV 58
Thermofilum pendens | 182 | 27 | 17 | pTPEN01 | Thermofilum | 633 | 0.079 | 1 AQYNSWLESRL 11 / 112 EVEAQYNSWLESRLAVL 128

CRISPR repeat clusters are identified by the total number of repeats, and spacers are numbered from the leader end. e-values derive from searching translated spacers against a database with a length of 6.3 million amino acids, where the viral/plasmid ORFs comprise 0.8%, and the crenarchaeal genome ORFs constitute 99.2%, of the sequence database. A total of 1321 CRISPR spacers were extracted from the 13 genomes, which ranged from 39 repeats for Caldivirga maquilingensis IC-167 to 225 for T. neutrophilus.

Discussion<br />

We characterized <strong>the</strong> genomic diversity of viruses and<br />

plasmids <strong>in</strong> a bioreactor established from a sample from a<br />

hot spr<strong>in</strong>g at Yellowstone National Park (Obsidian Pool)<br />

and ma<strong>in</strong>ta<strong>in</strong>ed at 85°C and pH 6 for 2 years (Rachel<br />

et al., 2002). Us<strong>in</strong>g a variety of clon<strong>in</strong>g strategies to select<br />

for l<strong>in</strong>ear or circular genomes, and to dist<strong>in</strong>guish viruses<br />

from plasmids, <strong>the</strong> analyses yielded two novel viral<br />

genomes, HAV1 and HAV2, from samples highly enriched<br />

<strong>in</strong> filamentous and tadpole-shaped VLPs, respectively,<br />

where <strong>the</strong> former yielded several genomic variants. No<br />

additional longer genomic contigs were assembled, from<br />

ei<strong>the</strong>r sample, which could correspond to <strong>the</strong> o<strong>the</strong>r elongated<br />

VLP that was observed <strong>in</strong> <strong>the</strong> orig<strong>in</strong>al sample<br />

(Fig. 1) (Rachel et al., 2002). Nei<strong>the</strong>r viral genome shows<br />

any clear similarity to o<strong>the</strong>r known archaeal viruses; only<br />

HAV2 shows morphological similarities with <strong>the</strong> two-tailed<br />

bicaudavirus ATV, and limited sequence similarity<br />

between two genes (Prangishvili et al., 2006c), and <strong>the</strong>y<br />

may be distantly related.<br />

Electron microscopic visualization of bioreactor<br />

samples taken at regular <strong>in</strong>tervals <strong>in</strong>dicated that <strong>the</strong> levels<br />

of <strong>in</strong>dividual types of VLPs dramatically rose and fell over<br />

time. This was also true for <strong>the</strong> HAV1 variants which<br />

showed different yields with time as revealed by gel electrophoresis<br />

(data not shown). Presumably this reflects a<br />

reaction to: (i) <strong>the</strong> availability of receptive host cells, and<br />

(ii) <strong>the</strong> ability to overcome <strong>the</strong> archaeal cellular <strong>CRISPR</strong><br />

<strong>immune</strong> <strong>system</strong>s (Lillestøl et al., 2006; Shah et al., 2009;<br />

Van der Oost et al., 2009). We <strong>in</strong>fer that <strong>the</strong> extensive<br />

variety of HAV1 variants, which carry numerous sequence<br />

changes and major genomic structural alterations, reflects<br />

adaptation of <strong>the</strong> virus to <strong>the</strong>se constra<strong>in</strong>ts. Moreover, <strong>the</strong><br />

fact that <strong>the</strong>y were isolated as virions (Table 1) suggests<br />

that <strong>the</strong>y are all functional. These observations may be<br />

relevant to an earlier study which demonstrated a selective<br />

bias of viruses <strong>in</strong> laboratory cultures of environmental<br />

samples which conta<strong>in</strong>ed diverse crenarchaeal fuselloviruses (Snyder et al., 2004). Possibly those that were<br />

undetectable by viral DNA amplification <strong>in</strong> <strong>the</strong> laboratory<br />

cultures had undergone genome rearrangements or were<br />

present <strong>in</strong> very low amounts, or <strong>in</strong> <strong>in</strong>tegrated form, at <strong>the</strong><br />

time of test<strong>in</strong>g.<br />

Attempts were made to identify putative viral hosts by<br />

isolat<strong>in</strong>g stra<strong>in</strong>s from <strong>the</strong> bioreactor us<strong>in</strong>g a laser microscope<br />

and cell sorter but none of <strong>the</strong>m were <strong>in</strong>fected with<br />

viruses and, moreover, no crenarchaeal stra<strong>in</strong>s were<br />

found which were <strong>in</strong>fected by <strong>the</strong> crude virus preparations<br />

(Fig. 1B and C) except for <strong>the</strong> spherical PSV, characterized<br />

earlier, which <strong>in</strong>fected two Pyrobaculum and<br />

Thermoproteus stra<strong>in</strong>s (Här<strong>in</strong>g et al., 2004). Moreover, at<br />

present no reliable practical procedures have been<br />

developed for transfect<strong>in</strong>g viral DNA <strong>in</strong>to neutrophilic<br />

crenarchaea.<br />

Sequence heterogeneities occur throughout <strong>the</strong> HAV1<br />

genome, such that <strong>the</strong> f<strong>in</strong>al sequence is necessarily a<br />

consensus, where <strong>the</strong> dom<strong>in</strong>ant nucleotide is taken at<br />

each position. Moreover, nearly half <strong>the</strong> predicted genes<br />

carry regions that were particularly susceptible to<br />

sequence change (Fig. 2A), and some of <strong>the</strong>se also <strong>in</strong>cur<br />

deletions or <strong>in</strong>sertions. We <strong>in</strong>fer that <strong>the</strong>ir gene products<br />

are most likely to be <strong>in</strong>volved <strong>in</strong> virus–host <strong>in</strong>teractions,<br />

cell adhesion or viral extrusion mechanisms. These genes<br />

<strong>in</strong>clude ORFs 140, 174, 276, 325 and 585 (Fig. 2A) and<br />

ORF585 is by far <strong>the</strong> most susceptible to change and is<br />

<strong>the</strong>refore a strong candidate for recognition of cellular<br />

receptors. The latter is rem<strong>in</strong>iscent of <strong>the</strong> hypervariable<br />

ORFTPX of <strong>the</strong> Thermoproteus virus TTV1, although<br />

ORFTPX sequence changes occurred by a different<br />

mechanism (Neumann and Zillig, 1990a,b).<br />
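The consensus rule described above, taking the dominant nucleotide at each position, can be illustrated with a minimal sketch. It assumes the reads are already aligned to a common length with `-` marking gaps; the function name `consensus` and the example reads are hypothetical and are not taken from the study, which assembled sequences with Sequencher.

```python
from collections import Counter


def consensus(aligned_reads):
    """Return the most frequent base at each column of aligned reads.

    Assumption: all reads have the same length and '-' marks a gap.
    Gaps are ignored when counting; a column made only of gaps yields '-'.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    result = []
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base != "-")
        result.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(result)


# Hypothetical reads covering the same region of a heterogeneous genome.
reads = ["ACGTA", "ACGTT", "ACCTA"]
print(consensus(reads))
```

A consensus of this kind hides the minority variants at each position, which is why the heterogeneity itself has to be reported separately.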

The most conserved genes of HAV1 (Fig. 2A) are<br />

strong candidates for participation <strong>in</strong> <strong>the</strong> basic viral<br />

mechanisms of DNA replication, transcriptional regulation<br />

and virion packag<strong>in</strong>g. Earlier studies on <strong>the</strong> filamentous<br />

and rod-shaped viruses of crenarchaeal <strong>the</strong>rmoacidophiles<br />

concluded that <strong>the</strong> conserved core viral genes tend<br />

to be concentrated at <strong>the</strong> centre of l<strong>in</strong>ear genomes (Vestergaard<br />

et al., 2008a,b) and this is consistent with <strong>the</strong><br />

variants carry<strong>in</strong>g deletions of comb<strong>in</strong>ations of <strong>the</strong> four<br />

genes at <strong>the</strong> left end of <strong>the</strong> genome (Fig. 2A).<br />

There is a precedent for <strong>the</strong> formation of multiple<br />

genomic variants of a crenarchaeal virus. Earlier, <strong>the</strong> crenarchaeal<br />

rudivirus SIRV1 was isolated and passed<br />

through a series of closely related Sulfolobus islandicus<br />

stra<strong>in</strong>s, before reisolat<strong>in</strong>g <strong>the</strong> virions and sequenc<strong>in</strong>g <strong>the</strong>ir<br />

genomes. Several SIRV1 variants were detected which<br />

also exhibited localized regions of <strong>in</strong>sertions, deletions,<br />

duplications and extensive gene sequence changes<br />

(Peng et al., 2004). However, at least some of <strong>the</strong> underly<strong>in</strong>g<br />

mechanisms of genomic change appear to be different.<br />

For example, HAV1 carries recomb<strong>in</strong>ation sites<br />

constitut<strong>in</strong>g low-complexity direct repeats, some of which<br />


can generate hairp<strong>in</strong> structures (Table 5) and are possibly<br />

related to <strong>the</strong> recomb<strong>in</strong>ation sites characterized for plasmids<br />

of Sulfolobus which can generate regular hairp<strong>in</strong><br />

structures (Peng et al., 2000; Greve et al., 2004). In contrast,<br />

SIRV1 variants <strong>in</strong>curred multiple 12 bp <strong>in</strong>dels,<br />

ma<strong>in</strong>ly with<strong>in</strong> genes (Peng et al., 2004) and <strong>the</strong>y were not<br />

observed for HAV1. Despite some mechanistic differences,<br />

<strong>the</strong> overall genomic changes <strong>in</strong> <strong>the</strong> viral variants<br />

are quite similar with some genes be<strong>in</strong>g conserved,<br />

o<strong>the</strong>rs dispensable and deleted, and a few<br />

radically changed <strong>in</strong> sequence.<br />

In contrast to classical studies on virus characterization,<br />

a degree of uncerta<strong>in</strong>ty necessarily exists <strong>in</strong> <strong>the</strong> <strong>in</strong>terpretation<br />

of metagenomic data. It is difficult to confirm<br />

unambiguously a genome-type–morphotype relationship,<br />

especially when so few archaeal viral families are characterized<br />

although, as shown here, <strong>the</strong> uncerta<strong>in</strong>ty can be<br />

m<strong>in</strong>imized by first enrich<strong>in</strong>g VLPs. Moreover, attempts to<br />

identify potential archaeal hosts, on <strong>the</strong> basis of <strong>the</strong><br />

<strong>CRISPR</strong> <strong>immune</strong> <strong>system</strong>, will become more robust as<br />

more archaeal host chromosomes are sequenced but it<br />

will always be limited by <strong>the</strong> ability of some crenarchaeal<br />

viruses to <strong>in</strong>fect a broader range of host species (Lillestøl<br />

et al., 2006; 2009; Vestergaard et al., 2008b).<br />

Experimental procedures<br />

DNA isolation and sequenc<strong>in</strong>g<br />

All virion preparations from CsCl density gradients were dialysed against 10 mM Tris-acetate, pH 6, overnight. For some libraries (Table 1), chromosomal and plasmid DNA contamination was removed from viral samples (tadpole-2 and filament) by treating first with DNase I (50 units ml⁻¹) at 37°C for 15 min, followed by heat inactivation of the DNase I at 85°C for 15 min. Nucleic acid was isolated from virions as described earlier (Peng et al., 2004); briefly, virions were disrupted by incubation with 1% SDS and 0.5 mg ml⁻¹ proteinase K at 50°C for 1 h, and DNA was extracted by phenol and phenol-chloroform treatment before precipitating with 0.1 vol. of 3 M sodium acetate, pH 5.3, and 0.8 vol. of isopropanol. The DNA pellet was washed with 70% ethanol, air-dried and resuspended in an appropriate volume of 10 mM Tris-HCl, pH 8.0, 1 mM EDTA. Clone libraries were prepared by sonicating DNA to produce fragments of 2–3 kb and then constructing shotgun libraries using SmaI-digested pUC18 as cloning vector (Peng, 2008) and, also, using the Linker Amplified Shotgun Library method described at http://www.sci.sdsu.edu/PHAGE/LASL/. DNA was extracted using a Model 8000 Biobot (Qiagen, Westburg, Germany) and sequenced in MegaBACE 1000 sequenators (Amersham Biotech, Amersham, UK). Viral and plasmid sequences were assembled using Sequencher 4.9 (http://www.genecodes.com/). Genome analyses and gene annotations were performed using Artemis (http://www.sanger.ac.uk/Software/Artemis/). Gene sequence searches were made in GenBank/EMBL (http://www.ncbi.nlm.nih.gov/blast) and motifs were identified using the SMART facility (http://smart.embl-heidelberg.de/).

Identify<strong>in</strong>g spacer matches<br />

CRISPR spacer sequences were extracted from the available crenarchaeal thermoneutrophilic genomes: A. pernix K1 (NC_000854), Caldivirga maquilingensis IC-167 (NC_009954), D. kamchatkensis 1221n (NC_011766), Hyperthermus butylicus DSM 5456 (NC_008818), Ignicoccus hospitalis KIN4/I (NC_009776), Nitrosopumilus maritimus SCM1 (NC_010085), Pyrobaculum aerophilum IM2 (NC_003364), Pyrobaculum arsenaticum DSM 13514 (NC_009376), Pyrobaculum calidifontis JCM 11548 (NC_009073), Pyrobaculum islandicum DSM 4184 (NC_008701), Staphylothermus marinus F1 (NC_009033), T. pendens Hrk 5 (NC_008698) and T. neutrophilus V24Sta (NC_010525). They were aligned against HAV1, HAV2, pHA1, pHB1, pHB2a and pHB2b, and published genomes of the viruses PSV (AJ635161), TTSV1 (AY722806) and TTV1 (X14855), and the plasmid pTPEN01 (NC_008696), using an MMX-optimized Smith-Waterman implementation (Saebø et al., 2005). Alignments were performed at both the nucleotide level and the amino acid sequence level, the latter by translating the spacers in all six reading frames essentially as described earlier (Shah et al., 2009). The false positive level was estimated by aligning the spacers against all the above crenarchaeal genomes (minus CRISPR repeat regions) and using this as a negative control.
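The six-frame translation step can be sketched as follows: three frames are read from the spacer as given and three from its reverse complement. This is a minimal illustration and not the authors' pipeline; it assumes the standard genetic code, and the function names and the example spacer are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption
# that the standard code applies to the sequences being translated).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(spacer):
    """Map frame labels (+1..+3, -1..-3) to translated peptides."""
    spacer = spacer.upper()
    frames = {}
    for strand, seq in (("+", spacer), ("-", reverse_complement(spacer))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Hypothetical spacer, far shorter than a real one, for illustration only.
for label, peptide in six_frame_translations("ATGGCCAAA").items():
    print(label, peptide)
```

Each translated frame would then be aligned against the viral and plasmid ORFs; the same search against the host genomes provides the negative control described above.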

Acknowledgements<br />

We thank Ariane Bize, Lanm<strong>in</strong>g Chen, Hien Phan, John<br />

Smyth, Gisle Vestergaard and Kim Brügger for much help <strong>in</strong><br />

<strong>the</strong> early stages of this work. The research <strong>in</strong> Copenhagen<br />

was supported by <strong>the</strong> Natural Science Research Council.<br />

References<br />

Ahn, D.G., Kim, S.I., Rhee, J.K., Kim, K.P., Pan, J.G., and Oh,<br />

J.W. (2006) TTSV1, a new virus-like particle isolated from<br />

<strong>the</strong> hyper<strong>the</strong>rmophilic crenarchaeote Thermoproteus<br />

tenax. Virology 351: 280–290.<br />

Anderson, I., Rodriguez, J., Susanti, D., Porat, I., Reich, C.,<br />

Ulrich, L.E., et al. (2008) Genome sequence of Thermofilum<br />

pendens reveals an exceptional loss of biosyn<strong>the</strong>tic<br />

pathways without genome reduction. J Bacteriol 190:<br />

2957–2965.<br />

Andersson, A.F., and Banfield, J.F. (2008) Virus population<br />

dynamics and acquired virus resistance <strong>in</strong> natural microbial<br />

communities. Science 320: 1047–1050.<br />

Barrangou, R., Fremaux, C., Deveau, H., Richards, M.,<br />

Boyaval, P., Mo<strong>in</strong>eau, S., Romero, D.A., and Horvath, P.<br />

(2007) <strong>CRISPR</strong> provides acquired resistance aga<strong>in</strong>st<br />

viruses <strong>in</strong> prokaryotes. Science 315: 1709–1712.<br />

Bettstetter, M., Peng, X., Garrett, R.A., and Prangishvili, D.<br />

(2003) AFV1, a novel virus <strong>in</strong>fect<strong>in</strong>g hyper<strong>the</strong>rmophilic<br />

archaea of <strong>the</strong> genus Acidianus. Virology 315: 68–79.<br />

Bize, A., Peng, X., Prokofeva, M., Maclellan, K., Lucas, S.,<br />

Forterre, P., et al. (2008) Viruses <strong>in</strong> acidic geo<strong>the</strong>rmal environments<br />

of <strong>the</strong> Kamchatka pen<strong>in</strong>sula. Res Microbiol 159:<br />

358–366.<br />

Diez, B., Anton, J., Guixa-Boixereu, N., Pedros-Alio, C., and<br />

Rodriguez-Valera, F. (2000) Pulse-field gel electrophoresis<br />



of virus assemblages present <strong>in</strong> a hypersal<strong>in</strong>e environment.<br />

Int Microbiol 3: 159–164.<br />

Greve, B., Jensen, S., Brügger, K., Zillig, W., and Garrett,<br />

R.A. (2004) Genomic comparison of archaeal conjugative<br />

plasmids from Sulfolobus. <strong>Archaea</strong> 1: 231–239.<br />

Greve, B., Jensen, S., Phan, H., Brügger, K., Zillig, W., She,<br />

Q., and Garrett, R.A. (2005) Novel plasmids pTAU4,<br />

pORA1 and pTIK4 from Sulfolobus neozealandicus.<br />

<strong>Archaea</strong> 1: 319–325.<br />

Här<strong>in</strong>g, M., Peng, X., Brügger, K., Rachel, R., Stetter, K.O.,<br />

Garrett, R.A., and Prangishvili, D. (2004) Morphology and<br />

genome organisation of <strong>the</strong> virus PSV of <strong>the</strong> hyper<strong>the</strong>rmophilic<br />

archaeal genera Pyrobaculum and Thermoproteus: a<br />

novel virus family, <strong>the</strong> Globuloviridae. Virology 323: 233–<br />

242.<br />

Janekovic, D., Wunderl, S., Holz, I., Zillig, W., Gierl, A., and<br />

Neumann, H. (1983) TTV1, TTV2 and TTV3, a family of<br />

viruses of <strong>the</strong> extremely <strong>the</strong>rmophilic anaerobic, sulphur<br />

reduc<strong>in</strong>g, archaebacterium Thermoproteus tenax. Mol Gen<br />

Genet 192: 39–45.<br />

Krupovic, M., and Bamford, D.H. (2008) <strong>Archaea</strong>l proviruses<br />

TKV4 and MVV extend <strong>the</strong> PRD1-adenovirus l<strong>in</strong>eage to<br />

<strong>the</strong> phylum Euryarchaeota. Virology 375: 292–300.<br />

Krupovic, M., Forterre, P., and Bamford, D.H. (2010) Comparative<br />

analysis of <strong>the</strong> mosaic genomes of tailed archaeal<br />

viruses and proviruses suggests common <strong>the</strong>mes for<br />

virion architecture and assembly with tailed viruses of bacteria.<br />

J Mol Biol 397: 144–160.<br />

Lawrence, C.M., Menon, S., Eilers, B.J., Bothner, B., Khayat,<br />

R., Douglas, T., and Young, M.J. (2009) Structural and<br />

functional studies of archaeal viruses. J Biol Chem 284:<br />

12599–12603.<br />

Lillestøl, R.K., Redder, P., Garrett, R.A., and Brügger, K.<br />

(2006) A putative viral defence mechanism <strong>in</strong> archaeal<br />

cells. <strong>Archaea</strong> 2: 59–72.<br />

Lillestøl, R.K., Shah, S.A., Brügger, K., Redder, P., Phan, H.,<br />

Christiansen, J., and Garrett, R.A. (2009) <strong>CRISPR</strong> families<br />

of <strong>the</strong> crenarchaeal genus Sulfolobus: bidirectional transcription<br />

and dynamic properties. Mol Microbiol 72: 259–<br />

272.<br />

Lipps, G., We<strong>in</strong>ierzl, A.O., von Scheven, G., Buchen, C. and<br />

Cramer, P. (2004) Structure of a bifunctional DNA primase-polymerase.<br />

Nat Struct Mol Biol 11: 157–162.<br />

Mochizuki, T., Yoshida, T., Tanaka, R., Forterre, P., Sako, Y.,<br />

and Prangishvili, D. (2010) Diversity of viruses of <strong>the</strong> hyper<strong>the</strong>rmophilic<br />

archaeal genus Aeropyrum, and isolation of<br />

<strong>the</strong> Aeropyrum pernix bacilliform virus 1, APBV1, <strong>the</strong> first<br />

representative of <strong>the</strong> family ‘Clavaviridae’. Virology 402:<br />

347–352.<br />

Neumann, H., and Zillig, W. (1990a) Structural variability <strong>in</strong><br />

<strong>the</strong> genome of Thermoproteus tenax virus TTV1. Mol Gen<br />

Genet 222: 435–437.<br />

Neumann, H., and Zillig, W. (1990b) The TTV1-encoded viral<br />

prote<strong>in</strong> TPX: primary structure of <strong>the</strong> gene and <strong>the</strong> prote<strong>in</strong>.<br />

Nucleic Acids Res 18: 195.<br />

Oren, A., Bratbak, G., and Hendal, M. (1997) Occurrence of<br />

virus-like particles <strong>in</strong> <strong>the</strong> Dead Sea. Extremophiles 1: 143–<br />

149.<br />

Ortmann, A.C., Wiedenheft, B., Douglas, T., and Young, M.<br />

(2006) Hot crenarchaeal viruses reveal deep evolutionary<br />

connections. Nat Rev Microbiol 4: 520–528.<br />

Novel viruses and plasmids from hyper<strong>the</strong>rmoneutrophiles 2929<br />

Peng, X. (2008) Evidence for <strong>the</strong> horizontal transfer of an<br />

<strong>in</strong>tegrase gene from a fusellovirus to a pRN-like plasmid<br />

with<strong>in</strong> a s<strong>in</strong>gle stra<strong>in</strong> of Sulfolobus and <strong>the</strong> implications for<br />

plasmid survival. Microbiology 154: 383–391.<br />

Peng, X., Holz, I., Zillig, W., Garrett, R. A., and She, Q.<br />

(2000) Evolution of <strong>the</strong> family of pRN plasmids and <strong>the</strong>ir<br />

<strong>in</strong>tegrase-mediated <strong>in</strong>sertion <strong>in</strong>to <strong>the</strong> chromosome of <strong>the</strong><br />

Crenarchaeon Sulfolobus solfataricus. J Mol Biol 303:<br />

449–454.<br />

Peng, X., Kessler, A., Phan, H., Garrett, R.A., and Prangishvili,<br />

D. (2004) Multiple variants of <strong>the</strong> archaeal DNA rudivirus<br />

SIRV1 <strong>in</strong> a s<strong>in</strong>gle host and a novel mechanism of<br />

genome variation. Mol Microbiol 54: 366–375.<br />

Porter, K., Russ, B.E., and Dyall-Smith, M.L. (2007) Virus–<br />

host <strong>in</strong>teractions <strong>in</strong> salt lakes. Curr Op<strong>in</strong> Microbiol 10:<br />

418–424.<br />

Prangishvili, D., Forterre, P., and Garrett, R.A. (2006a)<br />

Viruses of <strong>the</strong> archaea: a unify<strong>in</strong>g view. Nat Rev Microbiol<br />

4: 837–848.<br />

Prangishvili, D., Garrett, R.A., and Koon<strong>in</strong>, E.V. (2006b)<br />

Evolutionary genomics of archaeal viruses: unique viral<br />

genomes <strong>in</strong> <strong>the</strong> third doma<strong>in</strong> of life. Virus Res 117: 52–<br />

67.<br />

Prangishvili, D., Vestergaard, G., Här<strong>in</strong>g, M., Aramayo, R.,<br />

Basta, T., Rachel, R., and Garrett, R.A. (2006c) Structural<br />

and genomic properties of <strong>the</strong> hyper<strong>the</strong>rmophilic archaeal<br />

virus ATV with an extracellular stage of <strong>the</strong> reproductive<br />

cycle. J Mol Biol 359: 1203–1216.<br />

Rachel, R., Bettstetter, M., Hedlund, B.P., Här<strong>in</strong>g, M.,<br />

Kessler, A., Stetter, K.O., and Prangishvili, D. (2002)<br />

Remarkable morphological diversity of viruses and virus-like<br />

particles <strong>in</strong> terrestrial hot environments. Arch Virol 147:<br />

2419–2429.<br />

Redder, P., Peng, X., Brügger, K., Shah, S.A., Roesch, F.,<br />

Greve, B., She, Q., Schleper, C., Forterre, P., Garrett, R.A.,<br />

and Prangishvili, D. (2009) Four newly isolated fuselloviruses<br />

from extreme geo<strong>the</strong>rmal environments reveal<br />

unusual morphologies and a possible <strong>in</strong>terviral recomb<strong>in</strong>ation<br />

mechanism. Environ Microbiol 11: 2849–2862.<br />

Saebø, P.E., Andersen, S.M., Myrseth, J., Laerdahl, J.K., and<br />

Rognes, T. (2005) PARALIGN: rapid and sensitive<br />

sequence similarity searches powered by parallel comput<strong>in</strong>g<br />

technology. Nucleic Acids Res 33: 535–539.<br />

Shah, S.A., Hansen, N.R., and Garrett, R.A. (2009) Distributions<br />

of <strong>CRISPR</strong> spacer matches <strong>in</strong> viruses and plasmids<br />

of crenarchaeal acido<strong>the</strong>rmophiles and implications for<br />

<strong>the</strong>ir <strong>in</strong>hibitory mechanism. Biochem Soc Trans 37: 23–<br />

28.<br />

Snyder, J.C., Spuhler, J., Wiedenheft, B., Roberto, F.F.,<br />

Douglas, T., and Young, M.J. (2004) Effects of cultur<strong>in</strong>g on<br />

<strong>the</strong> population structure of a hyper<strong>the</strong>rmophilic virus.<br />

Microb Ecol 48: 561–566.<br />

Torar<strong>in</strong>sson, E., Klenk, H.-P., and Garrett, R.A. (2005) Divergent<br />

transcriptional and translational signals <strong>in</strong> <strong>Archaea</strong>.<br />

Environ Microbiol 7: 47–54.<br />

Van der Oost, J., Jore, M.M., Westra, E.R., Lundgren, M.,<br />

and Brouns, S.J. (2009) <strong>CRISPR</strong>-based adaptive and heritable<br />

immunity <strong>in</strong> prokaryotes. Trends Biochem Sci 34:<br />

401–407.<br />

Vestergaard, G., Aramayo, R., Basta, T., Här<strong>in</strong>g, M., Peng,<br />

X., Brügger, K., Chen, L., Rachel, R., Boisset, N., Garrett,<br />


R.A., and Prangishvili, D. (2008a) Structure of <strong>the</strong> Acidianus<br />

filamentous virus 3 and comparative genomics of<br />

related archaeal lipothrixviruses Acidianus. J Virol 82:<br />

371–381.<br />

Vestergaard, G., Shah, S.A., Bize, A., Reitberger, W., Reuter,<br />

M., Phan, H., Briegel, A., Rachel, R., Garrett, R.A., and<br />

Prangishvili, D. (2008b) SRV, a new rudiviral isolate from<br />

Stygiolobus and <strong>the</strong> <strong>in</strong>terplay of crenarchaeal rudiviruses<br />

with <strong>the</strong> host viral-defence <strong>CRISPR</strong> <strong>system</strong>. J Bacteriol<br />

190: 6837–6845.<br />



CRISPR/Cas and Cmr modules, mobility and evolution of adaptive immune systems

Shiraz A. Shah 1, Roger A. Garrett*,1

Archaea Centre, Department of Biology, Copenhagen University, DK2200 Copenhagen N, Denmark

Received 17 May 2010; accepted 22 July 2010

Available online 21 September 2010

Abstract

<strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr <strong>immune</strong> mach<strong>in</strong>eries of archaea and bacteria provide an adaptive and effective defence mechanism directed<br />

specifically aga<strong>in</strong>st viruses and plasmids. Present data suggest that both <strong>CRISPR</strong>/Cas and Cmr modules can behave like <strong>in</strong>tegral genetic<br />

elements. They tend to be located <strong>in</strong> <strong>the</strong> more variable regions of chromosomes and are displaced by genome shuffl<strong>in</strong>g mechanisms <strong>in</strong>clud<strong>in</strong>g<br />

transposition. <strong>CRISPR</strong> loci may be broken up and dispersed <strong>in</strong> chromosomes by transposons with <strong>the</strong> potential for creat<strong>in</strong>g genetic novelty. Both<br />

<strong>CRISPR</strong>/Cas and Cmr modules appear to exchange readily between closely related organisms where <strong>the</strong>y may be subjected to strong selective<br />

pressure. It is likely that this process occurs primarily via conjugative plasmids or chromosomal conjugation. It is <strong>in</strong>ferred that <strong>in</strong>terdoma<strong>in</strong><br />

transfer between archaea and bacteria has occurred, albeit very rarely, despite <strong>the</strong> significant barriers imposed by <strong>the</strong>ir differ<strong>in</strong>g conjugative,<br />

transcriptional and translational mechanisms. There are parallels between <strong>the</strong> <strong>CRISPR</strong> crRNAs and eukaryal siRNAs, most notably to germ cell<br />

piRNAs which are directed, with <strong>the</strong> help of effector prote<strong>in</strong>s, to silence or destroy transposons. No homologous prote<strong>in</strong>s are identifiable at<br />

a sequence level between eukaryal siRNA prote<strong>in</strong>s and those of archaeal or bacterial <strong>CRISPR</strong>/Cas and Cmr modules.<br />

© 2010 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.<br />

Keywords: <strong>CRISPR</strong>/Cas; <strong>CRISPR</strong>/Cmr; crRNA; Evolution; Mobile elements; siRNA<br />

1. Introduction<br />

The <strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr <strong>system</strong>s provide <strong>the</strong><br />

basis for an adaptive and heritable <strong>immune</strong> <strong>system</strong> directed<br />

aga<strong>in</strong>st <strong>the</strong> DNA and RNA, respectively, of <strong>in</strong>vad<strong>in</strong>g elements.<br />

The former consists of <strong>CRISPR</strong> loci and physically l<strong>in</strong>ked<br />

cassettes of cas genes which toge<strong>the</strong>r appear to constitute<br />

<strong>in</strong>tegral genetic modules. The cmr genes of Cmr modules are<br />

also clustered and are sometimes l<strong>in</strong>ked directly to <strong>the</strong><br />

<strong>CRISPR</strong>/Cas modules. The <strong>CRISPR</strong>/Cas <strong>immune</strong> <strong>system</strong><br />

occurs <strong>in</strong> most archaea and about 70% of <strong>the</strong>se also carry Cmr<br />

modules, whereas only about 40% of bacteria conta<strong>in</strong><br />

<strong>CRISPR</strong>/Cas modules and about 30% of <strong>the</strong>se exhibit Cmr<br />

modules. Moreover, <strong>the</strong> archaeal <strong>CRISPR</strong> loci consist of

* Correspond<strong>in</strong>g author. Tel.: +45 35322010.<br />

E-mail address: garrett@bio.ku.dk (R.A. Garrett).<br />

1 The two authors contributed equally to <strong>the</strong> work.<br />

Research <strong>in</strong> Microbiology 162 (2011) 27–38<br />

0923-2508/$ - see front matter © 2010 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.<br />

doi:10.1016/j.resmic.2010.09.001<br />

www.elsevier.com/locate/resmic<br />

clusters of spacer-repeat units and can vary <strong>in</strong> size from one to<br />

more than a hundred spacer-repeat units where each unit is<br />

about 60–90 bp with repeats and spacers of, on average, 30 bp<br />

and 40 bp, respectively (reviewed <strong>in</strong> Karg<strong>in</strong>ov and Hannon,<br />

2010). The <strong>CRISPR</strong> loci are preceded by non-prote<strong>in</strong> cod<strong>in</strong>g<br />

leader regions of about 150–550 bp (Tang et al., 2002; Jansen<br />

et al., 2002; Lillestøl et al., 2006, 2009), and <strong>the</strong>y are generally<br />

physically l<strong>in</strong>ked to a group of cas genes encod<strong>in</strong>g Cas<br />

prote<strong>in</strong>s of diverse functions (Jansen et al., 2002; Haft et al.,<br />

2005; Makarova et al., 2006).<br />

Critical for <strong>the</strong> function<strong>in</strong>g of <strong>the</strong> <strong>immune</strong> <strong>system</strong>s are <strong>the</strong><br />

spacer sequences which derive from foreign <strong>in</strong>vad<strong>in</strong>g elements<br />

(Mojica et al., 2005; Pourcel et al., 2005; Bolot<strong>in</strong> et al., 2005;<br />

Barrangou et al., 2007). Whole transcripts are produced from<br />

<strong>CRISPR</strong> loci which <strong>in</strong>itiate with<strong>in</strong> <strong>the</strong> leader sequence adjacent<br />

to <strong>the</strong> first repeat (Lillestøl et al., 2009), and <strong>the</strong>y are<br />

subsequently processed <strong>in</strong> <strong>the</strong> repeat regions to yield end-products<br />

correspond<strong>in</strong>g to s<strong>in</strong>gle spacer crRNAs (Tang et al.,<br />

2002, 2005; Lillestøl et al., 2006). Regulation of formation



of <strong>the</strong> whole <strong>CRISPR</strong> transcript is probably required to prevent<br />

<strong>in</strong>terference from promoter and term<strong>in</strong>ator regions which are<br />

randomly taken up <strong>in</strong> <strong>the</strong> spacers (Shah et al., 2009). The<br />

process<strong>in</strong>g is effected by specific Cas or Cmr prote<strong>in</strong>s which,<br />

at least for <strong>the</strong> latter, generate two discrete crRNAs each<br />

carry<strong>in</strong>g 8 bp of repeat at <strong>the</strong> 5′-end and lack<strong>in</strong>g 2 nt and 8 nt<br />

from <strong>the</strong> 3′-end of each spacer (Hale et al., 2009). Comb<strong>in</strong>ations<br />

of prote<strong>in</strong>s <strong>the</strong>n transport <strong>the</strong> processed crRNAs to target<br />

and <strong>in</strong>activate <strong>in</strong>vad<strong>in</strong>g genetic elements for both <strong>CRISPR</strong>/Cas<br />

and <strong>CRISPR</strong>/Cmr <strong>system</strong>s (Brouns et al., 2008; Hale et al.,<br />

2008, 2009; Carte et al., 2008). Base pair<strong>in</strong>g mismatches<br />

occurr<strong>in</strong>g between <strong>the</strong> 5′ 8-nt repeat sequence of <strong>the</strong> crRNA<br />

and <strong>the</strong> sequence adjacent to <strong>the</strong> targeted protospacer of <strong>the</strong><br />

<strong>in</strong>vad<strong>in</strong>g DNA are essential for subsequent degradation of <strong>the</strong><br />

latter and for ensur<strong>in</strong>g that <strong>the</strong> chromosomal <strong>CRISPR</strong> locus,<br />

itself, is not targeted (Marraff<strong>in</strong>i and Son<strong>the</strong>imer, 2010).<br />
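The self versus non-self discrimination described above can be illustrated with a short sketch. It assumes that both sequences are written 5′ to 3′, that the crRNA tag pairs antiparallel with the DNA flank, and that a single mismatch is enough to permit targeting. The function names, the example sequences and the `min_mismatches` threshold are hypothetical and are not taken from Marraffini and Sontheimer (2010).

```python
# Watson-Crick pairs between an RNA base (first) and a DNA base (second).
RNA_DNA_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def tag_mismatches(crrna_tag: str, dna_flank: str) -> int:
    """Count positions where the crRNA 5' tag cannot pair with the DNA flank.

    Both sequences are given 5'->3'; pairing is antiparallel, so the
    flank is read in reverse while walking along the tag.
    """
    if len(crrna_tag) != len(dna_flank):
        raise ValueError("tag and flank must have the same length")
    paired = zip(crrna_tag.upper(), reversed(dna_flank.upper()))
    return sum((r, d) not in RNA_DNA_PAIRS for r, d in paired)


def is_targeted(crrna_tag: str, dna_flank: str, min_mismatches: int = 1) -> bool:
    """Assumed rule: a fully pairing flank marks 'self' (the CRISPR locus)."""
    return tag_mismatches(crrna_tag, dna_flank) >= min_mismatches


# Hypothetical 8-nt tag; the first flank pairs perfectly (self),
# the second carries three mismatches (invader).
print(is_targeted("AUUGAAAG", "CTTTCAAT"))  # False
print(is_targeted("AUUGAAAG", "GGGTCAAT"))  # True
```

In this toy model the chromosomal locus is spared because the repeat adjoining each spacer always pairs with the tag, whereas a protospacer in an invading element is flanked by unrelated sequence.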

Cas and Cmr prote<strong>in</strong>s are phylogenetically and functionally<br />

very diverse and are <strong>in</strong>volved <strong>in</strong> at least two mechanistic pathways<br />

which target <strong>in</strong>vad<strong>in</strong>g genetic elements via <strong>the</strong> crRNAs.<br />

The <strong>CRISPR</strong>/Cas <strong>system</strong> specifically targets DNA (Marraff<strong>in</strong>i<br />

and Son<strong>the</strong>imer, 2008; Shah et al., 2009), while <strong>the</strong> <strong>CRISPR</strong>/<br />

Cmr <strong>system</strong> targets RNA (Hale et al., 2009). The two pathways<br />

require <strong>the</strong> products of <strong>the</strong> cas gene cassette adjo<strong>in</strong><strong>in</strong>g a <strong>CRISPR</strong><br />

locus or <strong>the</strong> products of <strong>the</strong> Cmr module which is ei<strong>the</strong>r directly<br />

l<strong>in</strong>ked to a <strong>CRISPR</strong>/Cas module or lies separately on <strong>the</strong> chromosome<br />

(Fig. 1) (Jansen et al., 2002; Makarova et al., 2006).<br />

Although most bacterial <strong>CRISPR</strong>/Cas modules are unpaired,<br />

different comb<strong>in</strong>ations of <strong>CRISPR</strong>/Cas and Cmr modules,<br />

<strong>in</strong>clud<strong>in</strong>g paired <strong>CRISPR</strong> loci, are common amongst <strong>the</strong> crenarchaea<br />

(Fig. 1D and E). Phylogenetic studies have demonstrated<br />

that homologs of a few Cas prote<strong>in</strong>s occur widely<br />

throughout <strong>the</strong> archaeal and bacterial doma<strong>in</strong>s, while o<strong>the</strong>rs are<br />

predom<strong>in</strong>antly archaeal or bacterial <strong>in</strong> character (Haft et al.,<br />

2005; Makarova et al., 2006).<br />

This article will consider <strong>the</strong> follow<strong>in</strong>g issues relat<strong>in</strong>g to <strong>the</strong><br />

mobility and evolution of <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong>, generally<br />

us<strong>in</strong>g crenarchaeal <strong>CRISPR</strong> <strong>system</strong>s as representative examples:<br />

(1) Whe<strong>the</strong>r <strong>CRISPR</strong>/Cas modules constitute <strong>in</strong>tegral<br />

genetic units. (2) Phylogenetic relationships between <strong>CRISPR</strong>/<br />

Cas and Cmr modules. (3) Diversification and degeneration of<br />

<strong>CRISPR</strong>/Cas modules. (4) Mobilisation and loss of <strong>CRISPR</strong>/<br />

Cas modules. (5) Transfer of <strong>CRISPR</strong>/Cas modules between<br />

organisms. (6) Co-evolution of <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong> <strong>in</strong> <strong>the</strong><br />

archaeal and bacterial doma<strong>in</strong>s. (7) A possible common<br />

ancestry with <strong>the</strong> diverse eukaryal siRNA <strong>system</strong>s.<br />

2. Methods<br />

Am<strong>in</strong>o acid sequences of Cas1 prote<strong>in</strong>s were collected from<br />

all publicly available archaeal and bacterial genomes by runn<strong>in</strong>g<br />

an <strong>in</strong>-house-constructed Cas1-specific HMM aga<strong>in</strong>st NCBI’s<br />

“non-redundant” prote<strong>in</strong> database. All sequences were extracted<br />

and an all-aga<strong>in</strong>st-all Smith–Waterman sequence comparison<br />

was made us<strong>in</strong>g <strong>the</strong> FASTA package (Pearson, 2000). After<br />

tak<strong>in</strong>g <strong>in</strong>to account <strong>the</strong> distribution of <strong>the</strong> result<strong>in</strong>g<br />

Smith–Waterman scores, all match<strong>in</strong>g sequence pairs were<br />

assigned weights between 0 and 1, with 0 correspond<strong>in</strong>g to<br />

a Smith–Waterman score of 200 or less, and 1 correspond<strong>in</strong>g to<br />

1200 or more. These weights were used as <strong>the</strong> <strong>in</strong>put for Markov cluster<strong>in</strong>g<br />

(MCL) (Enright et al., 2002) with <strong>the</strong> default options (<strong>in</strong>flation<br />

factor = 2), and <strong>the</strong> result<strong>in</strong>g clusters were visualised with BioLayout (Goldovsky et al., 2005).<br />

Repeat sequences were clustered by a similar approach, but us<strong>in</strong>g<br />

Smith–Waterman DNA sequence alignments. Leader sequences<br />

were clustered us<strong>in</strong>g <strong>the</strong> same approach but with an MCL <strong>in</strong>flation<br />

factor of 1.2 due to <strong>the</strong>ir very low sequence conservation.<br />
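The weighting step can be sketched as follows. The text gives only the two end points (a score of 200 or less maps to 0, and 1200 or more maps to 1), so the linear interpolation between them is an assumption, as are the function names and the choice to drop zero-weight pairs from the edge list.

```python
def sw_score_to_weight(score: float, low: float = 200.0, high: float = 1200.0) -> float:
    """Map a Smith-Waterman score onto [0, 1].

    Scores at or below `low` give 0 and scores at or above `high` give 1,
    as stated in the Methods. The linear ramp in between is an assumption.
    """
    if score <= low:
        return 0.0
    if score >= high:
        return 1.0
    return (score - low) / (high - low)


def weighted_edges(pair_scores):
    """Turn {(id_a, id_b): score} into a list of (id_a, id_b, weight).

    Pairs whose weight is 0 are dropped here (an assumption), since they
    carry no similarity signal for the clustering step.
    """
    edges = []
    for (a, b), score in sorted(pair_scores.items()):
        weight = sw_score_to_weight(score)
        if weight > 0.0:
            edges.append((a, b, weight))
    return edges


# Hypothetical sequence identifiers and scores.
scores = {("cas1_A", "cas1_B"): 700, ("cas1_A", "cas1_C"): 150, ("cas1_B", "cas1_C"): 1500}
print(weighted_edges(scores))
```

The resulting three-column list (two labels and a weight) is the kind of input a graph-clustering program expects.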

With <strong>the</strong> exception of <strong>the</strong> genomes of Sulfolobus islandicus<br />

stra<strong>in</strong>s HVE10/4 and Rey15A and Acidianus brierleyi from our<br />

own lab, all o<strong>the</strong>r genomes are publicly available with <strong>the</strong><br />

accession numbers NC_009135, NC_009975, NC_009637,<br />

NC_005791, NC_013769, NC_012589, NC_012588,<br />

NC_012632, NC_012726, NC_012622, NC_012623, 4023466<br />

(JGI project) CP001800, NC_002754. Dot-plots were constructed<br />

us<strong>in</strong>g <strong>the</strong> MUMmer package (Kurtz et al., 2004).<br />

Fig. 1. Scheme show<strong>in</strong>g different arrangements of <strong>CRISPR</strong>/Cas and Cmr modules. A. Typical monomeric <strong>CRISPR</strong>/Cas structure. B. L<strong>in</strong>ked Cmr and <strong>CRISPR</strong>/Cas<br />

modules. C. Separated Cmr and <strong>CRISPR</strong>/Cas modules. D. Paired family I <strong>CRISPR</strong>/Cas modules carry<strong>in</strong>g <strong>in</strong>verted <strong>CRISPR</strong> loci. Typical gene contents and order<br />

for E. a paired crenarchaeal family I <strong>CRISPR</strong>/Cas module, and F. a crenarchaeal Cmr module.<br />

<strong>CRISPR</strong> clusters were found us<strong>in</strong>g publicly available<br />

software (Bland et al., 2007) and Cmr modules were found<br />

us<strong>in</strong>g HMMs constructed <strong>in</strong>-house. The core genomes of<br />

Sulfolobus solfataricus and S. islandicus stra<strong>in</strong>s were determ<strong>in</strong>ed<br />

by f<strong>in</strong>d<strong>in</strong>g all orthologous genes occurr<strong>in</strong>g only once <strong>in</strong><br />

all <strong>the</strong> genomes. Orthologs were found by perform<strong>in</strong>g an all-aga<strong>in</strong>st-all<br />

sequence similarity search for all <strong>the</strong> encoded<br />

prote<strong>in</strong>s with subsequent cluster<strong>in</strong>g us<strong>in</strong>g MCL (Enright et al.,<br />

2002). A multiple alignment was made of <strong>the</strong> DNA sequence<br />

correspond<strong>in</strong>g to each ortholog (Edgar, 2004). All multiple<br />

alignments, with gaps removed, were concatenated and <strong>the</strong><br />

result<strong>in</strong>g alignment was used to build a phylogenetic tree<br />

(Thompson et al., 1994). The length of each family I leader<br />

was determ<strong>in</strong>ed us<strong>in</strong>g sequence alignments of different leaders<br />

before construct<strong>in</strong>g <strong>the</strong> leader tree.<br />
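The core-genome criterion (orthologous genes occurring exactly once in every genome) can be sketched as a filter over ortholog clusters. The data layout, the function name and the toy identifiers below are assumptions made for illustration. The clusters themselves would come from the MCL step described above.

```python
from collections import Counter


def core_clusters(clusters, genomes):
    """Keep clusters that hold exactly one gene from every genome.

    clusters: iterable of lists of (genome, gene_id) tuples, for example
              the ortholog groups produced by a clustering step.
    genomes:  collection of all genome names under comparison.
    """
    wanted = set(genomes)
    core = []
    for cluster in clusters:
        per_genome = Counter(genome for genome, _ in cluster)
        single_copy = all(count == 1 for count in per_genome.values())
        if set(per_genome) == wanted and single_copy:
            core.append(sorted(cluster))
    return core


# Toy input with hypothetical gene identifiers.
genomes = ["P2", "98/2", "REY15A"]
clusters = [
    [("P2", "g1"), ("98/2", "h1"), ("REY15A", "k1")],                # core
    [("P2", "g2"), ("P2", "g3"), ("98/2", "h2"), ("REY15A", "k2")],  # paralog in P2
    [("P2", "g4"), ("98/2", "h4")],                                  # missing in REY15A
]
print(core_clusters(clusters, genomes))
```

Only the first cluster passes. Clusters with paralogs or with a missing genome are excluded, which keeps the concatenated alignment restricted to unambiguous orthologs.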

3. Results and discussion<br />

3.1. Do <strong>CRISPR</strong>/Cas modules constitute <strong>in</strong>tegral genetic<br />

units?<br />

Several studies have detected a broad phylogenetic correlation<br />

between selected Cas prote<strong>in</strong>s and repeat sequences of<br />

<strong>CRISPR</strong> loci, with <strong>the</strong> reservation that <strong>the</strong> repeats are of<br />

limited and variable size (Haft et al., 2005; Kun<strong>in</strong> et al., 2007).<br />

For <strong>the</strong> Sulfolobales, phylogenetic analyses of sequences of<br />

repeats, leaders and Cas1 prote<strong>in</strong>s demonstrated that <strong>the</strong><br />

<strong>CRISPR</strong>/Cas modules could be classified <strong>in</strong>to at least three<br />

dist<strong>in</strong>ct families (Lillestøl et al., 2009). Here we extend this<br />

analysis and present comparative results for <strong>the</strong> Cas1 prote<strong>in</strong>,<br />

<strong>the</strong> leader and <strong>the</strong> repeat sequences us<strong>in</strong>g unsupervised cluster<strong>in</strong>g.<br />

MCL classifies nodes <strong>in</strong>to clusters based on pairwise<br />

distances to o<strong>the</strong>r nodes (Enright et al., 2002). Here, <strong>the</strong> nodes<br />

comprise <strong>the</strong> sequences of Cas1, <strong>the</strong> repeat and <strong>the</strong> leader and<br />

<strong>the</strong> distances correspond to <strong>the</strong> sequence alignment scores<br />

between <strong>the</strong>m. This approach is preferable to <strong>the</strong> use of<br />

phylogenetic trees for <strong>the</strong> follow<strong>in</strong>g reasons. Firstly, <strong>the</strong><br />

problem of del<strong>in</strong>eat<strong>in</strong>g boundaries between neighbour<strong>in</strong>g<br />

families is determ<strong>in</strong>ed by <strong>the</strong> algorithm itself, avoid<strong>in</strong>g <strong>the</strong><br />

potential error and bias of manual def<strong>in</strong>ition. Moreover, more<br />

than 1000 Cas1 sequences are available <strong>in</strong> public sequence<br />

databases and <strong>the</strong>y cannot be readily presented <strong>in</strong> phylogenetic<br />

trees, whereas <strong>the</strong>y can be visualised <strong>in</strong> a two- or three-dimensional<br />

space us<strong>in</strong>g <strong>the</strong> BioLayout program (Goldovsky<br />

et al., 2005). F<strong>in</strong>ally, leader sequences share significant<br />

sequence similarity with<strong>in</strong>, but not across, families such that<br />

all leaders cannot be represented <strong>in</strong> one phylogenetic tree.<br />

Thus, MCL cluster<strong>in</strong>g is <strong>the</strong> best approach for automated<br />

classification of <strong>CRISPR</strong> leader sequences, and by us<strong>in</strong>g <strong>the</strong><br />

same method for Cas1 and repeat sequences, potential<br />

<strong>in</strong>consistencies aris<strong>in</strong>g from us<strong>in</strong>g different methodologies are<br />

avoided.<br />
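To make the clustering principle concrete, the following is a toy version of the Markov clustering idea (alternating expansion and inflation of a column-stochastic similarity matrix). It is a minimal sketch only and is not the MCL program of Enright et al. (2002): it omits pruning and convergence checks, and the self-loop weight, the number of rounds and the way clusters are read off the final matrix are assumptions.

```python
def normalise_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] for c in range(n)] for r in range(n)]


def mcl(n, edges, inflation=2.0, rounds=30, eps=1e-6):
    """Toy Markov clustering over nodes 0..n-1.

    edges maps (i, j) pairs to similarity weights in (0, 1].
    """
    m = [[0.0] * n for _ in range(n)]
    for (i, j), w in edges.items():
        m[i][j] = m[j][i] = w
    for i in range(n):
        m[i][i] = 1.0  # self-loops keep every column non-empty
    m = normalise_columns(m)
    for _ in range(rounds):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[r][k] * m[k][c] for k in range(n)) for c in range(n)]
             for r in range(n)]
        # Inflation: raise entries to a power, then renormalise columns.
        m = normalise_columns([[x ** inflation for x in row] for row in m])
    # Each row that retains flow defines a cluster of the columns it reaches.
    clusters = {frozenset(c for c in range(n) if m[r][c] > eps) for r in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Two groups of mutually similar sequences with no links between them.
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0, (3, 4): 1.0}
print(mcl(5, edges))  # [[0, 1, 2], [3, 4]]
```

Raising the inflation value generally yields finer clusters, which is consistent with the lower value of 1.2 chosen in the Methods for the poorly conserved leader sequences.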

The results are illustrated <strong>in</strong> Fig. 2 for <strong>the</strong> Sulfolobales and<br />

<strong>the</strong>y show closely similar cluster<strong>in</strong>g patterns for <strong>the</strong> Cas1, leader<br />

and repeat sequences of <strong>the</strong> <strong>CRISPR</strong>/Cas families I to IV<br />


(Fig. 2B–D), consistent with earlier results (Lillestøl et al.,<br />

2009), and <strong>the</strong>y strongly suggest that <strong>the</strong> four <strong>CRISPR</strong>/Cas<br />

families have evolved <strong>in</strong>dependently and that <strong>the</strong>y do <strong>in</strong>deed<br />

constitute discrete genetic modules. The results <strong>in</strong> Fig. 2A reveal<br />

that each of <strong>the</strong> Sulfolobales families I–IV are components of an<br />

earlier def<strong>in</strong>ed group of families, CASS1 + 5 + 6 + 7 (Haft et al.,<br />

2005; Makarova et al., 2006) that <strong>in</strong> Fig. 2A can be seen to merge<br />

<strong>in</strong>to a superfamily.<br />

For bacteria, a comparative genomic analysis of stra<strong>in</strong>s<br />

of Streptococcus <strong>the</strong>rmophilus also revealed a putative<br />

co-evolution of Cas prote<strong>in</strong>s and <strong>CRISPR</strong> loci with<strong>in</strong> <strong>the</strong><br />

<strong>CRISPR</strong>/Cas modules (Horvath et al., 2008), and a more<br />

extensive study of <strong>CRISPR</strong> loci <strong>in</strong> 47 genomes of a variety of<br />

genera and species of lactic acid bacteria revealed 8 different<br />

classes of <strong>CRISPR</strong>/Cas modules with evidence for a phylogenetic<br />

congruence between Cas1 prote<strong>in</strong> sequences, <strong>the</strong> repeat<br />

sequences, and <strong>the</strong> cas gene content and synteny but, with one<br />

partial exception, no phylogenetic l<strong>in</strong>k was detected between<br />

<strong>the</strong> leader regions and <strong>the</strong> rest of <strong>the</strong> <strong>CRISPR</strong>/Cas modules<br />

(Horvath et al., 2009). Whe<strong>the</strong>r <strong>the</strong> latter reflects a real<br />

difference <strong>in</strong> <strong>the</strong> significance of <strong>the</strong> leader between <strong>the</strong>se<br />

bacteria and <strong>the</strong> crenarchaea requires fur<strong>the</strong>r clarification.<br />

Amongst crenarchaea, <strong>the</strong>re is a preference for paired<br />

<strong>CRISPR</strong> loci which are <strong>in</strong>verted with respect to one ano<strong>the</strong>r,<br />

generally (see below) result<strong>in</strong>g <strong>in</strong> <strong>in</strong>ternalised leader regions<br />

and some cas genes located between <strong>the</strong> leaders (Fig. 1D and<br />

E) (Lillestøl et al., 2009). Moreover, for <strong>the</strong> Sulfolobales, at<br />

least, <strong>the</strong> paired modules belong to <strong>the</strong> same family and share<br />

a s<strong>in</strong>gle set of cas genes. Family I <strong>CRISPR</strong>/Cas modules are<br />

<strong>the</strong> most common amongst <strong>the</strong> Sulfolobales and o<strong>the</strong>r crenarchaea<br />

and <strong>the</strong>y are also <strong>the</strong> most conserved <strong>in</strong> structure. The<br />

cas genes are partitioned, with one group located between <strong>the</strong><br />

leaders and ano<strong>the</strong>r ly<strong>in</strong>g externally at one end of <strong>the</strong> module<br />

(Fig. 1D and E). This separation may be functionally significant<br />

with <strong>the</strong> <strong>in</strong>ternal cas genes adjacent to both leader regions<br />

encod<strong>in</strong>g prote<strong>in</strong>s <strong>in</strong>volved <strong>in</strong> process<strong>in</strong>g and <strong>in</strong>sertion of<br />

DNA spacer-repeat units, while <strong>the</strong> external cas genes encode<br />

RNA process<strong>in</strong>g and guid<strong>in</strong>g prote<strong>in</strong>s. There are fewer identified<br />

examples of paired family II and III <strong>CRISPR</strong>/Cas<br />

modules and <strong>the</strong>y appear to be less conserved <strong>in</strong> <strong>the</strong>ir genetic<br />

organisation than <strong>the</strong> family I modules and, at this stage, it is<br />

premature to propose a consensus structure. A similar family-specific<br />

cas gene content and synteny has also been observed<br />

for <strong>CRISPR</strong>/Cas modules of lactic acid bacteria (Horvath<br />

et al., 2009). Presumably, <strong>the</strong> pair<strong>in</strong>g of <strong>the</strong> <strong>CRISPR</strong>/Cas<br />

modules reflects a compromise between limit<strong>in</strong>g <strong>the</strong> sizes of<br />

<strong>in</strong>dividual <strong>CRISPR</strong> loci and avoid<strong>in</strong>g <strong>the</strong> necessity of<br />

produc<strong>in</strong>g very long transcripts while us<strong>in</strong>g only one set of cas<br />

genes. Moreover, if one <strong>CRISPR</strong> locus becomes <strong>in</strong>activated as<br />

a result of, for example, mutations at <strong>the</strong> leader-repeat junction,<br />

<strong>the</strong> o<strong>the</strong>r locus will still be active.<br />

Fig. 2. Results of MCL cluster<strong>in</strong>g of components of <strong>CRISPR</strong>/Cas modules visualised us<strong>in</strong>g BioLayout (Goldovsky et al., 2005). A. Cluster<strong>in</strong>g of all Cas1 prote<strong>in</strong>s<br />

found <strong>in</strong> public databases where 5 large clusters and 11 smaller ones emerge and are colour-coded. Sequences with<strong>in</strong> a given cluster show as little as 25% am<strong>in</strong>o<br />

acid sequence identity. Three of <strong>the</strong> large clusters correspond directly to previously def<strong>in</strong>ed families, labelled CASS2 to 4 (Haft et al., 2005; Makarova et al., 2006).<br />

B to D. Cluster<strong>in</strong>g of Sulfolobales <strong>CRISPR</strong>/Cas families I, II, III and IV: B - Cas1 prote<strong>in</strong>s; C - leaders where leaders from <strong>the</strong> same family share about 70%<br />

nucleotide sequence identity and little or no nucleotide sequence conservation occurs between different families; D - repeats which show about 80% sequence<br />

identity with<strong>in</strong> a given family. The results for <strong>the</strong> four families <strong>in</strong> B to D show similar patterns. Colour-cod<strong>in</strong>g for <strong>the</strong> Sulfolobales <strong>CRISPR</strong>/Cas families: I - blue,<br />

II - purple, III - yellow, and IV - green. Family IV represents <strong>the</strong> <strong>CRISPR</strong>/Cas modules <strong>in</strong> Metallosphaera sedula and Acidianus brierleyi which were previously<br />

unclassified (Lillestøl et al., 2009).<br />

3.2. Phylogenetic relationships between <strong>CRISPR</strong>/Cas<br />

and Cmr modules<br />

The Cmr module has been implicated <strong>in</strong> direct<strong>in</strong>g processed<br />

crRNAs to target <strong>the</strong> RNA of <strong>in</strong>vad<strong>in</strong>g genetic<br />

elements, although whe<strong>the</strong>r <strong>the</strong>se are RNA genomes, transcripts, or both,<br />

rema<strong>in</strong>s unclear (Hale et al., 2009). The cmr genes are<br />

apparently co-transcribed <strong>in</strong> a dist<strong>in</strong>ct cassette which is<br />

sometimes physically l<strong>in</strong>ked to <strong>the</strong> <strong>CRISPR</strong>/Cas module<br />

(Fig. 1). It occurs less widely than <strong>CRISPR</strong>/Cas modules, and<br />

is particularly prevalent <strong>in</strong> <strong>the</strong>rmophilic archaea and bacteria.<br />

Comparison of phylogenetic trees for <strong>the</strong> <strong>CRISPR</strong>/Cas and<br />

Cmr modules, based on sequences of Cas1 or Cas3 prote<strong>in</strong>s<br />

(<strong>the</strong> former is not present <strong>in</strong> all <strong>CRISPR</strong>/Cas modules) and<br />

a predicted polymerase, respectively, revealed two major<br />

branches for <strong>the</strong> Cmr modules, carry<strong>in</strong>g dist<strong>in</strong>ctive gene<br />

syntenies, but show<strong>in</strong>g little congruence with <strong>the</strong> Cas1/Cas3-based<br />

tree (Makarova et al., 2006). This suggests that despite<br />

<strong>the</strong>ir be<strong>in</strong>g <strong>in</strong>terdependent mechanistically and sometimes<br />

physically coupled, <strong>the</strong> DNA- and RNA-directed <strong>system</strong>s<br />

have evolved <strong>in</strong>dependently. Both module types tend to be<br />

located <strong>in</strong> variable genomic regions and <strong>the</strong>ir positions, and<br />

copy numbers, vary even for <strong>the</strong> closely related Sulfolobus<br />

species (see below).<br />

3.3. Diversification and degeneration of <strong>CRISPR</strong>/Cas<br />

modules<br />

<strong>CRISPR</strong> loci vary considerably <strong>in</strong> size extend<strong>in</strong>g from<br />

a s<strong>in</strong>gle spacer bordered by repeats to a maximum, to date, of<br />

375 spacers (Lillestøl et al., 2006; Grissa et al., 2008). All such<br />

<strong>CRISPR</strong> loci that have been tested, <strong>in</strong>clud<strong>in</strong>g those lack<strong>in</strong>g<br />

leader regions, have been shown to produce transcripts which<br />

are processed (Tang et al., 2002, 2005; Brouns et al., 2008;<br />

Carte et al., 2008; Lillestøl et al., 2006, 2009). There is<br />

evidence from studies of both archaea and bacteria that<br />

<strong>CRISPR</strong> loci commonly undergo deletions without impair<strong>in</strong>g<br />

overall <strong>CRISPR</strong>/Cas functionality, and that <strong>the</strong> deletions can<br />

range <strong>in</strong> size from s<strong>in</strong>gle to several repeat-spacer units,<br />

presumably result<strong>in</strong>g from recomb<strong>in</strong>ation at <strong>the</strong> identical<br />

direct repeats. There is a tendency to lose <strong>the</strong> central and<br />

downstream regions of <strong>the</strong> <strong>CRISPR</strong> loci far<strong>the</strong>st from <strong>the</strong><br />

leader region, where <strong>the</strong> earliest spacer <strong>in</strong>serts are located, and<br />

which are likely to be less important for <strong>the</strong> <strong>immune</strong> <strong>system</strong>,


on average, than <strong>the</strong> more recently <strong>in</strong>serted spacers (Lillestøl<br />

et al., 2006, 2009; Tyson and Banfield, 2007; Deveau et al.,<br />

2008; Horvath et al., 2008). However, <strong>in</strong> addition to <strong>the</strong><br />

spacer-repeat units added at <strong>the</strong> leader-repeat junction<br />

(Pourcel et al., 2005; Lillestøl et al., 2006, 2009), <strong>the</strong>re are<br />

a few putative examples of duplications of spacer-repeat units,<br />

or small groups <strong>the</strong>reof, occurr<strong>in</strong>g <strong>in</strong> mycobacteria and<br />

methanoarchaea (Van Embden et al., 2000; Lillestøl et al.,<br />

2006). Moreover, it has also been claimed, for two out of<br />

four derivatives of S. <strong>the</strong>rmophilus stra<strong>in</strong> SMQ-301, that<br />

a s<strong>in</strong>gle new spacer-repeat unit was <strong>in</strong>serted <strong>in</strong>ternally with<strong>in</strong><br />

<strong>the</strong> <strong>CRISPR</strong> locus at <strong>the</strong> exact position where seven spacer-repeat<br />

units had been deleted, suggest<strong>in</strong>g that <strong>the</strong> <strong>in</strong>sertion-deletion<br />

events had occurred concurrently (Deveau et al.,<br />

2008). A related phenomenon occurs <strong>in</strong> <strong>the</strong> <strong>CRISPR</strong> loci of<br />

S. solfataricus stra<strong>in</strong>s. Pairwise alignments of <strong>CRISPR</strong> locus<br />

A of stra<strong>in</strong>s P1, P2 and 98/2 <strong>in</strong> Fig. 3 show shared spacers<br />

(shaded), as well as different spacers adjo<strong>in</strong><strong>in</strong>g <strong>the</strong> leader<br />

region and considered to have been added after <strong>the</strong> stra<strong>in</strong>s<br />

diverged. Deletions are apparent when pairs of <strong>CRISPR</strong> locus<br />

A are compared, but <strong>the</strong>re is one site <strong>in</strong> <strong>the</strong> P1 locus where six<br />

spacer-repeat units (a) have been replaced by four (b) from <strong>the</strong><br />

<strong>CRISPR</strong> locus B, presumably <strong>in</strong> a s<strong>in</strong>gle recomb<strong>in</strong>ation event<br />

(Fig. 3).<br />
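A pairwise comparison of CRISPR loci of the kind summarised in Fig. 3 can be sketched by treating each locus as an ordered list of spacers and aligning the two lists. The sketch below uses Python's standard `difflib.SequenceMatcher`. The spacer labels are hypothetical, and treating spacers as shared only when they are identical is an assumption.

```python
from difflib import SequenceMatcher


def compare_loci(locus_a, locus_b):
    """Align two ordered spacer lists, leader-proximal spacer first.

    Returns (tag, spacers_in_a, spacers_in_b) tuples where tag is one of
    'equal' (shared spacers), 'delete', 'insert' or 'replace'.
    """
    matcher = SequenceMatcher(None, locus_a, locus_b, autojunk=False)
    return [(tag, locus_a[i1:i2], locus_b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


# Hypothetical loci: different new spacers at the leader end, a shared
# block, and an internal stretch that has been replaced in the first locus.
locus_1 = ["n1", "s1", "s2", "b1", "b2", "s6"]
locus_2 = ["m1", "m2", "s1", "s2", "s3", "s4", "s5", "s6"]
for tag, in_a, in_b in compare_loci(locus_1, locus_2):
    print(tag, in_a, in_b)
```

In this toy example the leading `replace` block corresponds to spacers added independently after the strains diverged, while the internal `replace` block resembles the substitution of one run of spacer-repeat units by another, as described for the P1 locus.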

Earlier studies suggested that mobile elements or <strong>in</strong>tegrative<br />

elements rarely target <strong>CRISPR</strong>/Cas modules <strong>in</strong> ei<strong>the</strong>r<br />

archaea or bacteria (Van Embden et al., 2000; Haft et al.,<br />

2005; Lillestøl et al., 2006). Moreover, <strong>in</strong> <strong>the</strong> three closely<br />

related stra<strong>in</strong>s of S. solfataricus P1, P2 and 98/2, which are<br />

rich <strong>in</strong> active transposable elements and where extensive<br />

genomic shuffl<strong>in</strong>g has been observed (Brügger et al., 2004;<br />

Redder and Garrett, 2006), no IS <strong>in</strong>sertions were detected <strong>in</strong><br />

<strong>the</strong>ir extensive <strong>CRISPR</strong> loci (350–450 spacer-repeat units)<br />

(Fig. 3). Thus, although <strong>the</strong>y do occur occasionally <strong>in</strong>tergenically<br />

<strong>in</strong> <strong>the</strong> cas and cmr gene clusters, <strong>the</strong>re appears to be<br />

a strong selective pressure to ma<strong>in</strong>ta<strong>in</strong> <strong>the</strong> <strong>in</strong>tegrity of <strong>CRISPR</strong><br />

loci <strong>in</strong> crenarchaea. Never<strong>the</strong>less, recent studies of environmental<br />

bacterial samples suggest that transpositions occur<br />

commonly <strong>in</strong> some <strong>system</strong>s. In a study of two biofilms<br />

carry<strong>in</strong>g acidophilic Leptospirillum group II bacteria, for one<br />

biofilm about 20% of <strong>the</strong> partially sequenced <strong>CRISPR</strong> loci<br />

carried IS elements (Tyson and Banfield, 2007) and <strong>in</strong> a recent<br />

study of many lactic acid bacterial stra<strong>in</strong>s, several <strong>CRISPR</strong><br />

loci and cas gene cassettes were found to be <strong>in</strong>terrupted by IS<br />

S.A. Shah, R.A. Garrett / Research <strong>in</strong> Microbiology 162 (2011) 27–38<br />

elements, with some flank<strong>in</strong>g ei<strong>the</strong>r end of a <strong>CRISPR</strong> locus<br />

(Horvath et al., 2009). Therefore, IS elements are likely to<br />

generate changes <strong>in</strong> active <strong>CRISPR</strong> loci, possibly particularly<br />

<strong>in</strong> biofilms, or environments with low virus and/or plasmid<br />

levels.<br />

Thus, IS elements, or o<strong>the</strong>r transposable elements, may<br />

<strong>in</strong>duce shorten<strong>in</strong>g and/or degeneration of <strong>CRISPR</strong> loci by<br />

<strong>in</strong>sert<strong>in</strong>g <strong>in</strong>to <strong>CRISPR</strong> loci and caus<strong>in</strong>g transposition of<br />

spacer-repeat clusters to o<strong>the</strong>r chromosomal sites. Many<br />

chromosomes, with or without <strong>CRISPR</strong>/Cas modules, carry<br />

short <strong>CRISPR</strong>-like clusters lack<strong>in</strong>g associated leader regions<br />

and cas genes (Grissa et al., 2008). Their repeats are often<br />

phylogenetically divergent from <strong>the</strong> <strong>CRISPR</strong> loci <strong>in</strong> a given<br />

genome, or <strong>in</strong> closely related genomes. Although <strong>the</strong>re is no<br />

consensus view as to <strong>the</strong>ir orig<strong>in</strong>(s) or function(s), if <strong>the</strong>y are<br />

preceded by promoters, <strong>the</strong>ir transcripts can, <strong>in</strong> pr<strong>in</strong>ciple, be<br />

processed and activated by Cas and/or Cmr prote<strong>in</strong>s, if<br />

present. For example, Sulfolobus conjugative plasmids<br />

carry<strong>in</strong>g <strong>CRISPR</strong>-like loci lack cas genes and leader<br />

sequences (She et al., 1998; Greve et al., 2004) but for at least<br />

one of <strong>the</strong>m, pKEF9, <strong>the</strong> repeat cluster is transcribed and <strong>the</strong><br />

RNA is processed, which <strong>in</strong>dicates that <strong>the</strong> active crRNAs can<br />

be produced <strong>in</strong>tracellularly if a complementary set of cas<br />

genes (or cmr genes) is present <strong>in</strong> <strong>the</strong> host (Lillestøl et al.,<br />

2009). S<strong>in</strong>ce three of <strong>the</strong> six spacers <strong>in</strong> <strong>the</strong> pKEF9 repeat<br />

cluster have good sequence matches to archaeal fuselloviruses<br />

(2) and a rudivirus (1), it was proposed that <strong>the</strong> genetic<br />

elements may also exploit <strong>the</strong> host’s <strong>CRISPR</strong>/Cas (or<br />

<strong>CRISPR</strong>/Cmr) <strong>immune</strong> <strong>system</strong>, to compete with co-<strong>in</strong>vad<strong>in</strong>g<br />

foreign elements (Lillestøl et al., 2009). This hypo<strong>the</strong>sis is<br />

consistent with <strong>the</strong> demonstration that <strong>in</strong>fection of an Acidianus<br />

stra<strong>in</strong>, carry<strong>in</strong>g <strong>the</strong> conjugative plasmid pAH1 (lack<strong>in</strong>g<br />

a <strong>CRISPR</strong> locus), with <strong>the</strong> lipothrixvirus AFV1, led to <strong>in</strong>hibition<br />

of plasmid replication (Basta et al., 2009).<br />

3.4. Mobilisation and loss of <strong>CRISPR</strong>/Cas modules<br />

Genome analyses of closely related members of <strong>the</strong> Sulfolobales<br />

revealed <strong>CRISPR</strong>/Cas modules at different positions<br />

<strong>in</strong> genomes which show high levels of gene synteny, rais<strong>in</strong>g<br />

<strong>the</strong> question as to whe<strong>the</strong>r <strong>the</strong>y have moved with<strong>in</strong> <strong>the</strong><br />

genome, or been lost and/or ga<strong>in</strong>ed. There are also differences<br />

<strong>in</strong> <strong>the</strong> contents of <strong>the</strong> <strong>CRISPR</strong>/Cas module families <strong>in</strong> <strong>the</strong><br />

sequenced genomes. For example, S. solfataricus carries<br />

Fig. 3. Pairwise comparison of repeat-spacer units of <strong>CRISPR</strong> locus A of three stra<strong>in</strong>s of S. solfataricus P1, P2 and 98/2. Shaded spacer-repeat units which are<br />

l<strong>in</strong>ked are identical <strong>in</strong> sequence between pairs of <strong>CRISPR</strong> loci. Six spacer-repeat units, <strong>in</strong>dicated by a, are deleted from stra<strong>in</strong> P1, while four spacer-repeat units,<br />

denoted b, have apparently been acquired from <strong>CRISPR</strong> locus B (Lillestøl et al., 2009). Leader regions are <strong>in</strong>dicated by L.<br />


family I and II modules while Sulfolobus acidocaldarius<br />

carries modules of family II and III (Lillestøl et al., 2009).<br />

To determ<strong>in</strong>e whe<strong>the</strong>r <strong>CRISPR</strong>/Cas modules are readily<br />

mobilised, we <strong>in</strong>vestigated <strong>the</strong> presence or absence of<br />

<strong>CRISPR</strong>/Cas modules <strong>in</strong> genomes of pairs of closely related<br />

Sulfolobus stra<strong>in</strong>s, each pair show<strong>in</strong>g >99% DNA sequence identity<br />

(Fig. 4). The Sulfolobus stra<strong>in</strong>s (She et al., 2001a; Reno<br />

et al., 2009) exhibit differences <strong>in</strong> <strong>the</strong> numbers of modules, for<br />

example, for <strong>the</strong> pair of closely related S. islandicus stra<strong>in</strong>s<br />

HVE10/4 and REY15A; <strong>the</strong> former carries two <strong>CRISPR</strong>/Cas<br />

modules and one Cmr module, whereas <strong>the</strong> latter has one<br />

<strong>CRISPR</strong>/Cas module and two Cmr modules. <strong>CRISPR</strong> loci of<br />

<strong>the</strong> five pairs of S. islandicus stra<strong>in</strong>s, <strong>in</strong> contrast to those of <strong>the</strong><br />

S. solfataricus stra<strong>in</strong>s (Fig. 3), share no common spacers.<br />

However, each stra<strong>in</strong> carries one paired family I <strong>CRISPR</strong>/Cas<br />

module so that it was possible to test whe<strong>the</strong>r <strong>the</strong> module had<br />

persisted s<strong>in</strong>ce <strong>the</strong> 10 stra<strong>in</strong>s diverged. The genomic position<br />

of <strong>the</strong> module was compared between each stra<strong>in</strong> pair and was<br />

shown not to be conserved <strong>in</strong> position for 4 out of 5 pairs<br />

(Fig. 4). However, <strong>the</strong> displacements, even for <strong>the</strong> most closely<br />

related stra<strong>in</strong>s, could be attributed to <strong>the</strong> genomic region<br />

carry<strong>in</strong>g <strong>the</strong> module be<strong>in</strong>g variable and hav<strong>in</strong>g undergone<br />

complex rearrangements, ra<strong>the</strong>r than <strong>the</strong> module itself hav<strong>in</strong>g<br />

been mobilised. At present, <strong>the</strong>re are <strong>in</strong>sufficient closely<br />

related genomes available, for archaea and bacteria, which<br />

carry <strong>CRISPR</strong>/Cas modules to test for <strong>the</strong> generality of <strong>the</strong>se<br />


results, although <strong>in</strong> an earlier study of two Thermotoga<br />

genomes, <strong>CRISPR</strong> loci were found to be located close to<br />

variable sites where chromosomal <strong>in</strong>versions had occurred<br />

(DeBoy et al., 2006).<br />

There are examples of <strong>CRISPR</strong>/Cas modules be<strong>in</strong>g lost from<br />

genomes. For example, a variant stra<strong>in</strong> of S. solfataricus P2<br />

(P2A) was characterised that had lost four of <strong>the</strong> six <strong>CRISPR</strong>/<br />

Cas modules (A, B, C and D) which were physically l<strong>in</strong>ked, <strong>in</strong><br />

total 124 kb, apparently via a s<strong>in</strong>gle recomb<strong>in</strong>ation event<br />

between two border<strong>in</strong>g IS elements (Redder and Garrett, 2006),<br />

and S. solfataricus 98/2 lacks two whole clusters (C and F)<br />

(Lillestøl et al., 2009). Border<strong>in</strong>g IS elements also have <strong>the</strong><br />

potential to generate transposons carry<strong>in</strong>g whole <strong>CRISPR</strong>/Cas<br />

or Cmr modules and, rarely, paired family II <strong>CRISPR</strong>/Cas<br />

modules are bordered by identical <strong>in</strong>verted leaders (e.g.,<br />

<strong>CRISPR</strong> loci A and B of S. solfataricus) which could recomb<strong>in</strong>e,<br />

lead<strong>in</strong>g to loss of <strong>the</strong> whole module. Examples of closely related<br />

stra<strong>in</strong>s apparently los<strong>in</strong>g <strong>CRISPR</strong>/Cas modules have also been<br />

reported for some bacteria (e.g., Godde and Bickerton, 2006;<br />

Horvath et al., 2008).<br />

For S. solfataricus P2A, loss of <strong>CRISPR</strong>/Cas modules was<br />

attributed to its be<strong>in</strong>g a laboratory stra<strong>in</strong> where <strong>the</strong> <strong>CRISPR</strong>/<br />

Cas <strong>immune</strong> <strong>system</strong> had become an unnecessary burden on <strong>the</strong><br />

cell’s energy resources <strong>in</strong> <strong>the</strong> absence of <strong>in</strong>vad<strong>in</strong>g genetic<br />

elements. Possibly <strong>in</strong> niches relatively poor <strong>in</strong> viruses and<br />

plasmids, <strong>in</strong>clud<strong>in</strong>g numerous bacterial endosymbionts, <strong>the</strong>re<br />

Fig. 4. Dot-plots show<strong>in</strong>g <strong>the</strong> degree of variability <strong>in</strong> gene synteny at <strong>the</strong> genomic sites of <strong>the</strong> <strong>CRISPR</strong> loci for closely related pairs of Sulfolobus stra<strong>in</strong>s (A–E). At<br />

<strong>the</strong> top and right sides of each plot, I, II, III <strong>in</strong>dicate <strong>the</strong> position of a <strong>CRISPR</strong>/Cas family and C denotes a Cmr module. For <strong>the</strong> Sulfolobus stra<strong>in</strong>s, <strong>CRISPR</strong>/Cas<br />

modules and Cmr modules are <strong>in</strong>variably located with<strong>in</strong> an approximately 0.75 Mb variable region conta<strong>in</strong><strong>in</strong>g many IS elements. In general, <strong>the</strong> gene synteny<br />

border<strong>in</strong>g <strong>the</strong> modules, and <strong>the</strong> genomic locations of <strong>the</strong> modules, have changed, possibly due to transpositional activity.


is a tendency to offload <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong>. For example,<br />

many human/animal pathogens <strong>in</strong>clud<strong>in</strong>g Borrelia, Brucella,<br />

Buchnera, Burkholderia, Chlamydia and Rickettsia lack<br />

<strong>CRISPR</strong> loci while o<strong>the</strong>rs, <strong>in</strong>clud<strong>in</strong>g Pseudomonas stra<strong>in</strong>s and<br />

Staphylococcus aureus, ei<strong>the</strong>r lack <strong>CRISPR</strong> loci or carry<br />

apparently degenerate copies. This may partly expla<strong>in</strong> why<br />

about 60% of bacteria lack <strong>the</strong> <strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr<br />

<strong>immune</strong> <strong>system</strong>s (Grissa et al., 2008; Mojica et al., 2009).<br />

3.5. Transfer of <strong>CRISPR</strong>/Cas modules between<br />

organisms<br />

Various l<strong>in</strong>es of evidence suggest that <strong>CRISPR</strong> loci have<br />

been transferred between organisms. One example is <strong>the</strong> variety<br />

and comb<strong>in</strong>ations of different families of <strong>CRISPR</strong>/Cas modules<br />

that occur <strong>in</strong> closely related crenarchaeal genomes, with<br />

a similar pattern for <strong>the</strong> lactic acid bacterial genomes (Horvath<br />

et al., 2009; Lillestøl et al., 2009; Shah et al., 2009). This<br />

underl<strong>in</strong>es that exchange does occur between closely related<br />

organisms. O<strong>the</strong>r evidence derives from an analysis of <strong>the</strong><br />

euryarchaeon Pyrococcus furiosus, where a 155 kb fragment<br />

bordered by a <strong>CRISPR</strong> locus and a repeat shows significantly<br />

different properties of G + C content, third codon position and<br />

codon usage from <strong>the</strong> rest of <strong>the</strong> genome (Portillo and<br />

Gonzalez, 2009). Similarly, <strong>the</strong> lactic acid bacterium Bifidobacterium<br />

adolescentis was shown to carry a cas gene cassette<br />

with a much lower G + C content (47%) than <strong>the</strong> average<br />

chromosomal G + C content (59.2%) (Horvath et al., 2009).<br />
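The G + C comparison described above (a cassette at 47% against a chromosomal average of 59.2%) can be illustrated with a short sketch. This is not the method used in the cited studies; the function names, window size, step and threshold below are hypothetical choices made only for illustration.

```python
def gc_content(seq: str) -> float:
    """Return the G + C percentage of a DNA sequence, ignoring non-ACGT symbols."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def atypical_windows(genome: str, window: int, step: int, min_diff: float):
    """Yield (start, gc) for windows whose G + C content differs from the
    genome-wide value by at least min_diff percentage points.

    window, step and min_diff are illustrative parameters (assumptions),
    not values taken from the studies cited in the text.
    """
    overall = gc_content(genome)
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_content(genome[start:start + window])
        if abs(gc - overall) >= min_diff:
            yield start, gc
```

Applied to a chromosome sequence, a run of flagged windows overlapping a cas gene cassette would be compatible with, but would not prove, recent acquisition from another organism.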

In order to exam<strong>in</strong>e <strong>the</strong> degree to which <strong>CRISPR</strong>/Cas<br />

modules are subject to structural changes, we exam<strong>in</strong>ed paired<br />

family I <strong>CRISPR</strong>/Cas modules <strong>in</strong> several closely related Sulfolobus<br />

stra<strong>in</strong>s. Phylogenetic trees were constructed for <strong>the</strong><br />

external and <strong>in</strong>ternal cas gene cassettes and <strong>the</strong> leader region<br />

(Fig. 1E) and <strong>the</strong>y were compared with a tree of <strong>the</strong> core<br />

genomes (Fig. 5A). The tree of <strong>the</strong> external cas gene cassette<br />

is similar to <strong>the</strong> core genome tree, suggest<strong>in</strong>g that <strong>the</strong>se genes<br />

were reta<strong>in</strong>ed <strong>in</strong> <strong>the</strong> genome after divergence of <strong>the</strong> stra<strong>in</strong>s<br />

(Fig. 5C). However, <strong>the</strong> trees for <strong>the</strong> <strong>in</strong>ternal cas gene cassette<br />

located between <strong>the</strong> two leaders (Fig. 5D) and <strong>the</strong> leader<br />

regions (Fig. 5B), match one ano<strong>the</strong>r fairly closely, and <strong>the</strong>y<br />

also match a tree derived from <strong>the</strong> repeat sequences (Lillestøl<br />

et al., 2009). This <strong>in</strong>dicates that <strong>the</strong> external cas genes, putatively<br />

<strong>in</strong>volved <strong>in</strong> RNA process<strong>in</strong>g and crRNA mobility, have<br />

been reta<strong>in</strong>ed with<strong>in</strong> <strong>the</strong> stra<strong>in</strong>s, whereas <strong>the</strong> <strong>in</strong>ternal cas gene<br />

cassettes, which are functionally implicated <strong>in</strong> spacer addition<br />

at <strong>the</strong> leader-repeat junction, seem to co-evolve, and be<br />

mobilised with, <strong>the</strong> <strong>CRISPR</strong> loci.<br />

Mechanisms of transfer of <strong>CRISPR</strong>/Cas modules are less<br />

clear and may be diverse. <strong>CRISPR</strong>/Cas loci can vary <strong>in</strong> size<br />

from about 7 kb for a cas gene cassette, a leader region and<br />

a small <strong>CRISPR</strong> locus, to 25 kb or more for <strong>the</strong> paired family I<br />

crenarchaeal <strong>CRISPR</strong>/Cas modules. Indirect evidence for <strong>the</strong><br />

transfer of <strong>CRISPR</strong>/Cas modules on conjugative plasmids<br />

arose from <strong>the</strong> observation that a few bacterial conjugative<br />

plasmids from Thermus <strong>the</strong>rmophilus, Synechocystis and<br />

Shewanella, carried <strong>CRISPR</strong> loci, sometimes associated with<br />

a few cas genes (Godde and Bickerton, 2006) and, moreover,<br />


small <strong>CRISPR</strong> loci have been detected <strong>in</strong> two crenarchaeal<br />

conjugative plasmids (She et al., 1998; Peng et al., 2003;<br />

Greve et al., 2004). For <strong>the</strong> latter, however, no physical<br />

proximity of <strong>in</strong>tegrated conjugative plasmids and <strong>CRISPR</strong> loci<br />

occurs with<strong>in</strong> Sulfolobus chromosomes (Chen et al., 2005;<br />

Kawarabayashi et al., 2001). To date, <strong>CRISPR</strong> loci have not<br />

been detected <strong>in</strong> viral genomes, although <strong>the</strong>y do occur with<strong>in</strong><br />

prophages of <strong>the</strong> human pathogen Clostridium difficile<br />

(Sebaihia et al., 2006).<br />

At least <strong>the</strong> paired crenarchaeal <strong>CRISPR</strong>/Cas modules<br />

were considered to be too large to be borne on extrachromosomal<br />

elements (Lillestøl et al., 2009). Ano<strong>the</strong>r more<br />

likely mechanism for transferr<strong>in</strong>g such large <strong>CRISPR</strong>/Cas<br />

modules between closely related organisms is via chromosomal<br />

conjugation. The archaea-specific <strong>in</strong>tegration mechanism,<br />

generat<strong>in</strong>g a partitioned <strong>in</strong>tegrase gene, provides<br />

a mechanism favour<strong>in</strong>g encaptur<strong>in</strong>g genetic elements <strong>in</strong><br />

chromosomes (Muskhelishvili et al., 1993; She et al., 2001b)<br />

and some Sulfolobus species that carry encaptured <strong>in</strong>tegrated<br />

conjugative plasmids are also capable of conjugat<strong>in</strong>g <strong>the</strong>ir<br />

chromosomal DNA (Aagaard et al., 1995; Grogan, 1996).<br />

Unknown transmission mechanisms may also operate, for<br />

example, with<strong>in</strong> biofilms.<br />

3.6. Co-evolution of <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong> <strong>in</strong> <strong>the</strong><br />

archaeal and bacterial doma<strong>in</strong>s<br />

Ever s<strong>in</strong>ce <strong>the</strong> earliest studies on <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong>,<br />

<strong>the</strong> prevail<strong>in</strong>g view has been that <strong>the</strong> archaeal and bacterial<br />

<strong>system</strong>s are closely related. This view was underp<strong>in</strong>ned by <strong>the</strong><br />

similar order<strong>in</strong>g of repeat-spacer units <strong>in</strong> <strong>the</strong> <strong>CRISPR</strong> loci and<br />

by extensive comparative sequence studies of selected Cas<br />

prote<strong>in</strong>s (Haft et al., 2005; Godde and Bickerton, 2006;<br />

Makarova et al., 2006). Moreover, it has been fur<strong>the</strong>r re<strong>in</strong>forced<br />

by <strong>the</strong> mechanism of elongation of <strong>CRISPR</strong> loci at <strong>the</strong><br />

leader-repeat junction as well as by process<strong>in</strong>g and maturation<br />

mechanisms of crRNAs <strong>in</strong> both doma<strong>in</strong>s (Tang et al., 2002,<br />

2005; Brouns et al., 2008; Hale et al., 2008, 2009).<br />

Never<strong>the</strong>less, <strong>the</strong>re are dist<strong>in</strong>ctive features of <strong>the</strong> two<br />

<strong>system</strong>s. <strong>CRISPR</strong> loci are much more common amongst<br />

archaea and tend to be larger, more complex and more labile<br />

(Lillestøl et al., 2006; Grissa et al., 2008). In addition, most<br />

repeat sequences of bacterial <strong>CRISPR</strong> loci carry <strong>in</strong>verted<br />

repeat motifs which can generate hairp<strong>in</strong> structures <strong>in</strong> transcripts;<br />

<strong>the</strong>se are less common amongst archaeal repeats<br />

which, <strong>in</strong> turn, suggests that different RNA process<strong>in</strong>g signals<br />

occur with<strong>in</strong> repeat regions of <strong>the</strong> transcripts (Lillestøl et al.,<br />

2006; Kun<strong>in</strong> et al., 2007). Moreover, phylogenetic relationships<br />

based on Smith–Waterman alignments show that most<br />

families of archaeal and bacterial repeat sequences exhibit<br />

m<strong>in</strong>imal overlap (Kun<strong>in</strong> et al., 2007). A similar pattern arises<br />

from sequence alignments of Cas prote<strong>in</strong>s where phylogenetic<br />

trees of Cas prote<strong>in</strong>s show many archaeal genes cluster<strong>in</strong>g <strong>in</strong><br />

separate groups (Fig. 2A) (Haft et al., 2005; Godde and<br />

Bickerton, 2006; Makarova et al., 2006). In addition, <strong>the</strong><br />

average synteny of <strong>the</strong> cas and cmr genes is quite conserved<br />

with<strong>in</strong>, but not between, major phyla (Haft et al., 2005). There<br />


Fig. 5. Phylogenetic trees of S. solfataricus and S. islandicus stra<strong>in</strong>s based on: (A) nucleotide sequence alignments of core genomes of <strong>the</strong> host organisms,<br />

(B) leader sequences, (C) concatenated external cas genes, and (D) concatenated <strong>in</strong>ternal cas genes for paired family I <strong>CRISPR</strong> loci. Only bootstrap values less<br />

than 100% are given. All 12 stra<strong>in</strong>s were too closely related to be dist<strong>in</strong>guished on <strong>the</strong> basis of 16S rDNA sequences. For S. islandicus stra<strong>in</strong>s LS, LD, YG, YN and<br />

M14 (Reno et al., 2009), it is evident that s<strong>in</strong>ce <strong>the</strong>y diverged from <strong>the</strong>ir closest relative, identical changes have occurred <strong>in</strong> both copies of <strong>the</strong> leader, possibly due<br />

to <strong>the</strong> whole <strong>CRISPR</strong>/Cas module, or a part of it, hav<strong>in</strong>g been replaced. In contrast, <strong>the</strong> external cas cassette appears to have resided on <strong>the</strong> genome s<strong>in</strong>ce all stra<strong>in</strong>s<br />

diverged because of <strong>the</strong> similarity of trees A and C. The <strong>in</strong>ternal cas cassette tree (D) follows that of <strong>the</strong> leader consistent with tight functional coupl<strong>in</strong>g.<br />

is also a <strong>CRISPR</strong> repeat b<strong>in</strong>d<strong>in</strong>g prote<strong>in</strong>, of elusive function,<br />

that has only been detected amongst <strong>the</strong> crenarchaea (Peng<br />

et al., 2003). O<strong>the</strong>r mechanistic differences may surface as<br />

<strong>the</strong> <strong>system</strong>s are studied more widely and <strong>in</strong> more depth.<br />
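The Smith–Waterman alignments mentioned above for comparing repeat sequences can be sketched as a minimal local-alignment scorer. The scoring values (match, mismatch, gap) and the function name are assumptions chosen for illustration; they are not the parameters used by Kunin et al. (2007), and a linear gap penalty is used for simplicity.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Uses the standard Smith-Waterman recurrence with a linear gap penalty;
    only the previous row of the score matrix is kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]  # first column is always zero for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

Pairwise scores of this kind, computed for all pairs of repeat sequences, could serve as input to a clustering step that groups repeats into families.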

Importantly, however, crenarchaeal viruses have radically<br />

different virus–host relationships from those of bacteria and<br />

eukarya (Prangishvili et al., 2006; Bize et al., 2009). Consistent<br />

with this, <strong>the</strong>re are putative archaeal virus-specific anti-<br />

<strong>CRISPR</strong> <strong>system</strong>s (Peng et al., 2004; Vestergaard et al., 2008;<br />

Garrett et al., 2010) and bacteria-specific <strong>CRISPR</strong> regulat<strong>in</strong>g<br />

<strong>system</strong>s (Pühl et al., 2010). Therefore, it is likely that <strong>the</strong><br />

<strong>CRISPR</strong>/Cas and Cmr <strong>system</strong>s have ma<strong>in</strong>ta<strong>in</strong>ed and/or<br />

undergone doma<strong>in</strong>-specific adaptations dur<strong>in</strong>g evolution.<br />

Assum<strong>in</strong>g that a <strong>CRISPR</strong>/Cas-like <strong>system</strong> evolved prior to<br />

<strong>the</strong> separation of <strong>the</strong> archaeal and bacterial l<strong>in</strong>eages, at a time<br />

when one assumes <strong>the</strong> activity of exchange of genetic<br />

elements was rife, we are left with two ma<strong>in</strong> scenarios for <strong>the</strong>ir<br />

subsequent development: (1) that <strong>the</strong> <strong>system</strong>s have rema<strong>in</strong>ed<br />

relatively conserved, and separated, and have gradually<br />

developed specific archaeal, or bacterial, characteristics; or<br />

(2) <strong>the</strong>re has been periodic <strong>in</strong>terdoma<strong>in</strong> exchange, and <strong>the</strong>reby<br />

co-evolution of <strong>the</strong> archaeal and bacterial <strong>system</strong>s.<br />

Clearly, cross<strong>in</strong>g doma<strong>in</strong> boundaries would be a very<br />

complex process given <strong>the</strong> basic differences between archaea<br />

and bacteria <strong>in</strong> <strong>the</strong>ir transcriptional <strong>in</strong>itiation, elongation and<br />

term<strong>in</strong>ation mechanisms, and <strong>the</strong>ir translational <strong>in</strong>itiation<br />

mechanisms (Torar<strong>in</strong>sson et al., 2005; Santangelo et al., 2009)<br />

and would be very unlikely to occur for modern cells. Transfer<br />

by conjugation would also be unlikely given <strong>the</strong> differ<strong>in</strong>g<br />

conjugative <strong>system</strong>s and <strong>the</strong> different membrane and cell wall


structures of archaea and bacteria (Greve et al., 2004; Veith<br />

et al., 2009). For <strong>the</strong> <strong>CRISPR</strong>/Cas and Cmr modules, <strong>in</strong>terdoma<strong>in</strong><br />

transfer would seriously compromise both expression<br />

of <strong>the</strong> numerous essential cas and cmr genes as well as transcription<br />

of <strong>the</strong> <strong>CRISPR</strong> loci. In this context, <strong>the</strong> <strong>in</strong>fluential<br />

claim of <strong>the</strong> large uptake of function<strong>in</strong>g archaeal genes <strong>in</strong> <strong>the</strong><br />

genome of <strong>the</strong> bacterium Thermotoga maritima (24% of <strong>the</strong><br />

total <strong>in</strong>clud<strong>in</strong>g a <strong>CRISPR</strong> locus) (Nelson et al., 1999), was<br />

always highly controversial, not least because it would have<br />

required <strong>the</strong> wholesale reprogramm<strong>in</strong>g of a large part of <strong>the</strong><br />

chimeric genome for transcriptional and translational signals.<br />

A recent reevaluation of this genome, toge<strong>the</strong>r with those of<br />

four o<strong>the</strong>r members of <strong>the</strong> Thermotogales, has provided<br />

a much more nuanced and cautious view of <strong>the</strong> phylogenetic<br />

orig<strong>in</strong>s of <strong>the</strong>se bacteria (Zhaxybayeva et al., 2009), <strong>the</strong>reby<br />

underl<strong>in</strong><strong>in</strong>g <strong>the</strong> perils of <strong>in</strong>ferr<strong>in</strong>g phylogeny from BLAST<br />

sequence searches. On <strong>the</strong> o<strong>the</strong>r hand, co-evolution of <strong>the</strong><br />

archaeal and bacterial <strong>CRISPR</strong>/Cas <strong>system</strong>s would only<br />

require cross-doma<strong>in</strong> events to succeed very rarely, after<br />

which <strong>the</strong> transferred <strong>system</strong> could be under strong selective<br />

pressure. Some limited <strong>in</strong>terdoma<strong>in</strong> transfer would be<br />

consistent with <strong>the</strong> phylogenetic trees produced for Cas1 or<br />

Cas3 prote<strong>in</strong>s of <strong>the</strong> <strong>CRISPR</strong>/Cas modules and Cmr2 of <strong>the</strong><br />

Cmr module (Haft et al., 2005; Godde and Bickerton, 2006;<br />

Makarova et al., 2006). <strong>Archaea</strong>-specific Cas prote<strong>in</strong>s (Haft<br />

et al., 2005) may be associated with <strong>CRISPR</strong>/Cas or Cmr<br />

<strong>system</strong>s that have evolved more <strong>in</strong>dependently <strong>in</strong> environments<br />

of high temperature, extremes of pH or hypersal<strong>in</strong>e<br />

conditions where bacterial levels tend to be relatively low, and<br />

gradually become functionally <strong>in</strong>compatible with those liv<strong>in</strong>g<br />

<strong>in</strong> less extreme, bacteria-rich environments where limited<br />

genetic exchange between archaea and bacteria is more likely.<br />

3.7. A common ancestry with <strong>the</strong> diverse eukaryal siRNA<br />

<strong>system</strong>s?<br />

Diverse small <strong>in</strong>terfer<strong>in</strong>g RNA (siRNA) <strong>system</strong>s are<br />

widespread <strong>in</strong> eukarya. Thus, <strong>in</strong> plants, small RNAs are<br />

important for antiviral defence and regulation of transposons<br />

and similar functions are common amongst <strong>in</strong>vertebrates<br />

(Hannon, 2002; J<strong>in</strong>ek and Doudna, 2009). Moreover, <strong>the</strong>y<br />

have been implicated <strong>in</strong> controll<strong>in</strong>g repeat and transposon<br />

contents of somatic nuclei <strong>in</strong> protozoa (Mochizuki and<br />

Gorovsky, 2004). Although some mechanisms are conf<strong>in</strong>ed<br />

to certa<strong>in</strong> eukaryal l<strong>in</strong>eages, <strong>the</strong>y all essentially provide<br />

a mechanism for discrim<strong>in</strong>at<strong>in</strong>g and target<strong>in</strong>g “foreign”<br />

genetic elements or transposons. Moreover, <strong>the</strong>re are broad<br />

mechanistic similarities between <strong>the</strong> eukaryal siRNA <strong>system</strong>s<br />

and <strong>the</strong> DNA- and RNA-target<strong>in</strong>g <strong>CRISPR</strong> <strong>system</strong>s. They all<br />

have to discrim<strong>in</strong>ate foreign DNA from self-DNA, and target<br />

nucleic acids which both show little sequence similarity and<br />

can undergo cont<strong>in</strong>ual sequence change.<br />

There is a limited parallel between <strong>the</strong> <strong>CRISPR</strong>/Cmr RNA-target<strong>in</strong>g<br />

and eukaryal antiviral <strong>system</strong>s. The latter cut and<br />

process <strong>in</strong>vad<strong>in</strong>g dsRNA viruses <strong>in</strong>to small 21–22 bp dsRNAs by<br />

an endonuclease (Dicer), and <strong>the</strong>se are converted <strong>in</strong>to ssRNAs<br />

by <strong>the</strong> Argonaute prote<strong>in</strong>–RISC complex. The prote<strong>in</strong>–RNA<br />


complex locates and anneals to a viral mRNA carry<strong>in</strong>g<br />

a complementary sequence which is <strong>the</strong>n <strong>in</strong>activated by ano<strong>the</strong>r<br />

endonuclease (Slicer). However, <strong>the</strong> <strong>in</strong>itial process<strong>in</strong>g step<br />

<strong>in</strong>volv<strong>in</strong>g <strong>the</strong> Dicer endonuclease seems to be quite different <strong>in</strong><br />

<strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong>.<br />

The closest parallel to <strong>the</strong> crRNAs and <strong>CRISPR</strong> loci<br />

amongst <strong>the</strong> eukaryal siRNA <strong>system</strong>s are <strong>the</strong> Argonaute Piwi-<strong>in</strong>teract<strong>in</strong>g<br />

RNAs (piRNAs) processed from piRNA cluster<br />

transcripts which also do not require a Dicer-like endonuclease<br />

(Lillestøl et al., 2009; Karg<strong>in</strong>ov and Hannon, 2010). This<br />

eukaryal <strong>system</strong> has been studied primarily <strong>in</strong> <strong>in</strong>sects, fish and<br />

mammals and strong evidence has been provided for its<br />

<strong>in</strong>volvement <strong>in</strong> ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g germl<strong>in</strong>e <strong>in</strong>tegrity and development<br />

(Arav<strong>in</strong> et al., 2008; Klattenhoff and Theurkauf, 2008).<br />

The piRNA clusters are rich <strong>in</strong> transposons and repeatsequence<br />

elements and occur at specific chromosomal sites, as<br />

for <strong>the</strong> <strong>CRISPR</strong> loci. The piRNA clusters <strong>in</strong>crease <strong>the</strong>ir<br />

<strong>in</strong>formational capacity by <strong>the</strong> <strong>in</strong>sertion of transposon<br />

sequences which provide novel sequence content and become<br />

fixed <strong>in</strong> <strong>the</strong> piRNA clusters by selection. Thus, cont<strong>in</strong>ual<br />

expansion of piRNA clusters occurs, as for <strong>CRISPR</strong> loci, but<br />

<strong>the</strong> process is passive ra<strong>the</strong>r than directed. Moreover, as for <strong>the</strong><br />

<strong>CRISPR</strong>/Cas <strong>system</strong>, <strong>the</strong> newly <strong>in</strong>corporated DNA derives<br />

exclusively from genetic elements that are to be targeted. Both<br />

piRNA clusters and <strong>CRISPR</strong> loci yield large transcripts prior<br />

to process<strong>in</strong>g <strong>in</strong>to smaller RNAs. The processed piRNAs are<br />

24e30 nt <strong>in</strong> length while <strong>the</strong> crRNAs lie <strong>in</strong> <strong>the</strong> range 39e45<br />

nt. piRNAs complex with <strong>the</strong> Argonaute Piwi/RISC prote<strong>in</strong><br />

complex, similarly to crRNAs assembl<strong>in</strong>g <strong>in</strong> Cas or Cmr<br />

prote<strong>in</strong> complexes, and <strong>the</strong>y target and control mobile<br />

endogenous genetic elements primarily <strong>in</strong> germ cells. To date,<br />

piRNA complexes have been exclusively associated with target<strong>in</strong>g<br />

RNAs but this may reflect <strong>the</strong> fact that retrotransposons<br />

predom<strong>in</strong>ate <strong>in</strong> those germ cells under study.<br />

No homologous proteins have been detected from sequence analyses between proteins of the eukaryal siRNA systems and those of the CRISPR system, although similarities may appear at a tertiary structural level. Moreover, despite Argonaute Piwi-like domain proteins occurring in many archaea and bacteria (Cerutti et al., 2000), they have not been implicated in crRNA-targeting. There is also very limited evidence for a functional targeting overlap between the two systems. A few sequence matches have been observed between archaeal and bacterial CRISPR spacers and transposons, consistent with the CRISPR/Cas system targeting mobile elements (Lillestøl et al., 2006; Held and Whitaker, 2009; Mojica et al., 2009; Shah et al., 2009). However, those reported have generally been carried on virus or plasmid genomes including, for example, spacer matches to each of the four transposase genes carried by the bicaudavirus ATV (Shah et al., 2009). These transposase genes/IS elements are presumably indistinguishable from any other viral/plasmid genomic target if they carry appropriate sequence motifs adjacent to protospacer sites. Moreover, in the archaeon S. solfataricus P2, which carries about 350 putative mobile elements (Brügger et al., 2002), there is evidence that chromosomal transpositional activity is regulated, at least partly, by antisense RNAs (Tang et al., 2005), and very few close sequence matches were found to any of the 417 CRISPR spacers (Lillestøl et al., 2006; Shah et al., 2009).

Finally, the piRNA system, like the CRISPR/Cas and CRISPR/Cmr systems, may be very ancient. Evolution of genomic parasites occurred concurrently with the emergence of self-replicating genomes. Thus, the development of adaptive and heritable systems would be important for maintaining fitness.

4. Conclusion

The CRISPR/Cas and CRISPR/Cmr immune machineries provide an effective defence mechanism in most archaea and some bacteria. The system is dynamic and heritable, although the benefit for the cell in evolutionary terms is transitional because DNA from extrachromosomal elements taken up as spacers in CRISPR loci has a rapid turnover and is lost again via recombination at repeats and/or transpositional events. Current evidence suggests that CRISPR/Cas and Cmr modules behave like integral genetic elements. They tend to be located in the most variable regions of chromosomes and are frequently displaced as a result of genome shuffling, possibly including transposition of whole modules. CRISPR loci may be broken up and dispersed in chromosomes, with the potential for creating genetic novelty. Small leaderless CRISPR-like loci are commonly found in chromosomes and in plasmids, and some can be transcribed, but it remains unclear whether they derive from CRISPR loci or whether they have other origins and/or other functions. The CRISPR/Cas and Cmr modules appear to exchange readily between closely related organisms, where they may be subjected to strong selective pressure. It is likely that this can occur via conjugative plasmids or chromosomal conjugation. While universal phylogenetic trees for Cas1/Cas3 proteins of the CRISPR/Cas module and Cmr2 of the Cmr module suggest that interdomain transfers between archaea and bacteria have occurred, the relatively large number of archaea-specific Cas/Cmr proteins suggests that these may have been very rare events, consistent with the incompatibility of the transcription, translation and conjugative systems.

There are parallels to the eukaryal siRNAs, most notably for the germ cell piRNAs, which are also directed by effector proteins to silence or destroy invading foreign DNA and transposons. While some common effector proteins are utilized in different eukaryal siRNA systems, no homologous proteins are identifiable between the eukaryal siRNA proteins and those of the archaeal and bacterial CRISPR/Cas and Cmr modules. Possibly very distant phylogenetic relationships will appear as more crystal structures of the siRNA and crRNA effector proteins are determined.

Acknowledgements

Research at the Archaea Centre was supported by grants from the Danish Natural Science Research Council and the Danish Foundation for Basic Research. We appreciate helpful discussions with Qunxin She, Soley Gudbergsdottir, Ling Deng, Guo Li and Xu Peng.

References

Aagaard, C., Dalgaard, J., Garrett, R.A., 1995. Inter-cellular mobility and homing of an archaeal rDNA intron confers selective advantage over intron-cells of Sulfolobus acidocaldarius. Proc. Natl. Acad. Sci. U.S.A. 92, 12285–12289.

Aravin, A.A., Hannon, G.J., Brennecke, J., 2008. The Piwi-piRNA pathway provides an adaptive defense in the transposon arms race. Science 318, 761–764.

Barrangou, R., Fremaux, C., Deveau, H., Richards, M., Boyaval, P., Moineau, S., Romero, D.A., Horvath, P., 2007. CRISPR provides acquired resistance against viruses in prokaryotes. Science 315, 1709–1712.

Basta, T., Smyth, J., Forterre, P., Prangishvili, D., Peng, X., 2009. Novel archaeal plasmid pAH1 and its interaction with the lipothrixvirus AFV1. Mol. Microbiol. 71, 23–34.

Bize, A., Karlsson, E.A., Ekefjärd, K., Quax, T.E., Pina, M., Prevost, M.C., Forterre, P., Tenaillon, O., Bernander, R., Prangishvili, D., 2009. A unique virus release mechanism in the Archaea. Proc. Natl. Acad. Sci. U.S.A. 106, 11306–11311.

Bland, C., Ramsey, T.L., Sabree, F., Lowe, M., Brown, K., Kyrpides, N.C., Hugenholtz, P., 2007. CRISPR Recognition Tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinform. 8, 209.

Bolotin, A., Quinquis, B., Sorokin, A., Ehrlich, S.D., 2005. Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology 151, 2551–2561.

Brouns, S.J., Jore, M.M., Lundgren, M., Westra, E.R., Slijkhuis, R.J., Snijders, A.P., Dickman, M.J., Makarova, K.S., Koonin, E.V., van der Oost, J., 2008. Small CRISPR RNAs guide antiviral defense in prokaryotes. Science 321, 960–964.

Brügger, K., Redder, P., She, Q., Confalonieri, F., Zivanovic, Y., Garrett, R.A., 2002. Mobile elements in archaeal genomes. FEMS Microbiol. Lett. 206, 131–141.

Brügger, K., Torarinsson, E., Chen, L., Garrett, R.A., 2004. Shuffling of Sulfolobus genomes by autonomous and non-autonomous mobile elements. Biochem. Soc. Trans. 32, 179–183.

Carte, J., Wang, R., Li, H., Terns, R.M., Terns, M.P., 2008. Cas6 is an endoribonuclease that generates guide RNAs for invader defense in prokaryotes. Genes Dev. 22, 3489–3496.

Cerutti, L., Mian, N., Bateman, A., 2000. Domains in gene silencing and cell differentiation proteins: the novel PAZ domain and redefinition of the Piwi domain. Trends Biochem. Sci. 25, 481–482.

Chen, L., Brügger, K., Skovgaard, M., Redder, P., She, Q., Torarinsson, E., Greve, B., Awayez, M., Zibat, A., Klenk, H.P., Garrett, R.A., 2005. The genome of Sulfolobus acidocaldarius, a model organism of the Crenarchaeota. J. Bacteriol. 187, 4992–4999.

DeBoy, R.T., Mongodin, E.F., Emerson, J.B., Nelson, K.E., 2006. Chromosome evolution in the Thermotogales: large-scale inversions and strain diversification of CRISPR sequences. J. Bacteriol. 188, 2364–2374.

Deveau, H., Barrangou, R., Garneau, J.E., Labonté, J., Fremaux, C., Boyaval, P., Romero, D.A., Horvath, P., Moineau, S., 2008. Phage response to CRISPR-encoded resistance in Streptococcus thermophilus. J. Bacteriol. 190, 1390–1400.

Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797.

Enright, A.J., Van Dongen, S., Ouzounis, C.A., 2002. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584.

Garrett, R.A., Prangishvili, D., Shah, S.A., Reuter, M., Stetter, K., Peng, X., 2010. Metagenomic analyses of novel viruses, plasmids, and their variants, from an environmental sample of hyperthermophilic neutrophiles cultured in a bioreactor. Environ. Microbiol., doi:10.1111/j.1462-2920.2010.02266.x.

Godde, J.S., Bickerton, A., 2006. The repetitive DNA elements called CRISPRs and their associated genes: evidence of horizontal transfer among prokaryotes. J. Mol. Evol. 62, 718–729.


Goldovsky, L., Cases, I., Enright, A.J., Ouzounis, C.A., 2005. BioLayout (Java): versatile network visualisation of structural and functional relationships. Appl. Bioinform. 4, 71–74.

Greve, B., Jensen, S., Brügger, K., Zillig, W., Garrett, R.A., 2004. Genomic comparison of archaeal conjugative plasmids from Sulfolobus. Archaea 1, 231–239.

Grissa, I., Vergnaud, G., Pourcel, C., 2008. CRISPRcompar: a website to compare clustered regularly interspaced short palindromic repeats. Nucleic Acids Res. 36, 145–148.

Grogan, D.W., 1996. Exchange of genetic markers at extremely high temperatures in the archaeon Sulfolobus acidocaldarius. J. Bacteriol. 178, 3207–3211.

Haft, D.H., Selengut, J., Mongodin, E.F., Nelson, K.E., 2005. A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS Comput. Biol. 1, 474–483.

Hale, C., Kleppe, K., Terns, R.M., Terns, M.P., 2008. Prokaryotic silencing (psi)RNAs in Pyrococcus furiosus. RNA 14, 1–8.

Hale, C.R., Zhao, P., Olson, S., Duff, M.O., Graveley, B.R., Wells, L., Terns, R.M., Terns, M.P., 2009. RNA-guided RNA cleavage by a CRISPR RNA-Cas protein complex. Cell 139, 945–956.

Hannon, G.J., 2002. RNA interference. Nature 418, 244–251.

Held, N.L., Whitaker, R.J., 2009. Viral biogeography revealed by signatures in Sulfolobus islandicus genomes. Environ. Microbiol. 11, 457–466.

Horvath, P., Romero, D.A., Coûté-Monvoisin, A.-C., Richards, M., Deveau, H., Moineau, S., Boyaval, P., Fremaux, C., Barrangou, R., 2008. Diversity, activity, and evolution of CRISPR loci in Streptococcus thermophilus. J. Bacteriol. 190, 1401–1412.

Horvath, P., Coûté-Monvoisin, A.-C., Romero, D.A., Boyaval, P., Fremaux, C., Barrangou, R., 2009. Comparative analysis of CRISPR loci in lactic acid bacteria genomes. Int. J. Food Microbiol. 131, 62–70.

Jansen, R., Embden, J.D., Gaastra, W., Schouls, L.M., 2002. Identification of genes that are associated with DNA repeats in prokaryotes. Mol. Microbiol. 43, 1565–1575.

Jinek, M., Doudna, J.A., 2009. A three-dimensional view of the molecular machinery of RNA interference. Nature 457, 405–412.

Karginov, F.V., Hannon, G.J., 2010. The CRISPR system: small RNA-guided defense in bacteria and archaea. Mol. Cell 37, 7–19.

Kawarabayashi, Y., Hino, Y., Horikawa, H., Jin-no, K., Takahashi, M., Sekine, M., Baba, S., Ankai, A., Kosugi, H., Hosoyama, A., Fukui, S., Nagai, Y., Nishijima, K., Otsuka, R., Nakazawa, H., Takamiya, M., Kato, Y., Yoshizawa, T., Tanaka, T., Kudoh, Y., Yamazaki, J., Kushida, N., Oguchi, A., Aoki, K., Masuda, S., Yanagii, M., Nishimura, M., Yamagishi, A., Oshima, T., Kikuchi, H., 2001. Complete genome sequence of an aerobic thermoacidophilic crenarchaeon, Sulfolobus tokodaii strain7. DNA Res. 8, 123–140.

Klattenhoff, C., Theurkauf, W., 2008. Biogenesis and germline functions of piRNAs. Development 135, 3–9.

Kunin, V., Sorek, R., Hugenholtz, P., 2007. Evolutionary conservation of sequence and secondary structures in CRISPR repeats. Genome Biol. 8, R611–R617.

Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., Salzberg, S.L., 2004. Versatile and open software for comparing large genomes. Genome Biol. 5, R12.

Lillestøl, R.K., Redder, P., Garrett, R.A., Brügger, K., 2006. A putative viral defence mechanism in archaeal cells. Archaea 2, 59–72.

Lillestøl, R.K., Shah, S.A., Brügger, K., Redder, P., Phan, H., Christiansen, J., Garrett, R.A., 2009. CRISPR families of the crenarchaeal genus Sulfolobus: bidirectional transcription and dynamic properties. Mol. Microbiol. 72, 259–272.

Makarova, K.S., Grishin, N.V., Shabalina, S.A., Wolf, Y.I., Koonin, E.V., 2006. A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action. Biol. Direct 1, 7.

Marraffini, L.A., Sontheimer, E.J., 2008. CRISPR interference limits horizontal gene transfer in Staphylococci by targeting DNA. Science 322, 1843–1845.


Marraffini, L.A., Sontheimer, E.J., 2010. Self versus non-self discrimination during CRISPR RNA-directed immunity. Nature 463, 568–571.

Mochizuki, K., Gorovsky, M.A., 2004. Small RNAs in genome rearrangements in Tetrahymena. Curr. Opin. Genet. Dev. 14, 181–187.

Mojica, F.J., Diez-Villasenor, C., Garcia-Martinez, J., Soria, E., 2005. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J. Mol. Evol. 60, 174–182.

Mojica, F.J., Diez-Villasenor, C., Garcia-Martinez, J., Almendros, C., 2009. Short motif sequences determine the targets of the prokaryotic CRISPR system. Microbiology 155, 733–740.

Muskhelishvili, G., Palm, P., Zillig, W., 1993. SSV1-encoded site-specific recombination system in Sulfolobus shibatae. Mol. Gen. Genet. 237, 334–342.

Nelson, K.E., Clayton, E., Gill, S.R., Gwinn, M.L., Dodson, R.J., Haft, D.H., Hickey, E.K., Peterson, J.D., Nelson, W.C., Ketchum, K.A., et al., 1999. Evidence for lateral gene transfer between archaea and bacteria from genome sequence of Thermotoga maritima. Nature 399, 323–329.

Pearson, W.R., 2000. Flexible sequence similarity searching with the FASTA3 program package. Methods Mol. Biol. 132, 185–219.

Peng, X., Brügger, K., Shen, B., Chen, L., She, Q., Garrett, R.A., 2003. Genus-specific protein binding to the large clusters of DNA repeats (short regularly spaced repeats) present in Sulfolobus genomes. J. Bacteriol. 185, 2410–2417.

Peng, X., Kessler, A., Phan, H., Garrett, R.A., Prangishvili, D., 2004. Multiple variants of the archaeal DNA rudivirus SIRV1 in a single host and a novel mechanism of genomic variation. Mol. Microbiol. 54, 366–375.

Portillo, M.C., Gonzalez, J.M., 2009. CRISPR elements in the thermococcales: evidence for associated horizontal gene transfer in Pyrococcus furiosus. J. Appl. Genet. 50, 421–430.

Pourcel, C., Salvignol, G., Vergnaud, G., 2005. CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies. Microbiology 151, 653–663.

Prangishvili, D., Forterre, P., Garrett, R.A., 2006. Viruses of the Archaea: a unifying view. Nat. Rev. Microbiol. 11, 837–848.

Pühl, Ü., Wurm, R., Arslan, Z., Geissen, R., Hofmann, N., Wagner, R., 2010. Identification and characterisation of E. coli CRISPR-cas promoters and their silencing by H-NS. Mol. Microbiol. 75, 1495–1512.

Redder, P., Garrett, R.A., 2006. Mutations and rearrangements in the genome of Sulfolobus solfataricus P2. J. Bacteriol. 188, 4198–4206.

Reno, M.L., Held, N.L., Fields, C.J., Burke, P.V., Whitaker, R.J., 2009. Biogeography of the Sulfolobus islandicus pan-genome. Proc. Natl. Acad. Sci. U.S.A. 106, 8605–8610.

Santangelo, T.J., Cubonová, L., Skinner, K.M., Reeve, J.N., 2009. Archaeal intrinsic transcription termination in vivo. J. Bacteriol. 191, 7102–7108.

Sebaihia, M., Wren, B.W., Mullany, P., Fairweather, N.F., Minton, N., Stabler, R., Thomson, N.R., Roberts, A.P., Cerdeño-Tárraga, A.M., Wang, H., et al., 2006. The multidrug resistant human pathogen Clostridium difficile has a highly mobile mosaic genome. Nat. Genet. 38, 779–786.

Shah, S.A., Hansen, N.R., Garrett, R.A., 2009. Distributions of CRISPR spacer matches in viruses and plasmids of crenarchaeal acidothermophiles and implications for their inhibitory mechanism. Biochem. Soc. Trans. 37, 23–28.

She, Q., Phan, H., Garrett, R.A., Albers, S.-V., Stedman, K.M., Zillig, W., 1998. Genetic profile of pNOB8 from Sulfolobus: the first conjugative plasmid from an archaeon. Extremophiles 2, 417–425.

She, Q., Singh, R.K., Confalonieri, F., Zivanovic, Y., Gordon, P., Allard, G., Awayez, M.J., Chan-Weiher, C.C., Clausen, I.G., Curtis, B.A., et al., 2001a. The complete genome of the crenarchaeon Sulfolobus solfataricus P2. Proc. Natl. Acad. Sci. U.S.A. 98, 7835–7840.

She, Q., Peng, X., Zillig, W., Garrett, R.A., 2001b. Gene capture events in archaeal chromosomes. Nature 409, 478.

Tang, T.-H., Bachellerie, J.-P., Rozhdestvensky, T., Bortolin, M.-L., Huber, H., Drungowski, M., Elge, T., Brosius, J., Hüttenhofer, A., 2002. Identification of 86 candidates for small non-messenger RNAs from the archaeon Archaeoglobus fulgidus. Proc. Natl. Acad. Sci. U.S.A. 99, 7536–7541.

Tang, T.-H., Polacek, N., Zywicki, M., Huber, H., Brügger, K., Garrett, R., Bachellerie, J.P., Hüttenhofer, A., et al., 2005. Identification of novel non-coding RNAs as potential antisense regulators in the archaeon Sulfolobus solfataricus. Mol. Microbiol. 55, 469–481.

Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.

Torarinsson, E., Klenk, H.P., Garrett, R.A., 2005. Divergent transcriptional and translational signals in Archaea. Environ. Microbiol. 7, 47–54.

Tyson, G.W., Banfield, J.F., 2007. Rapidly evolving CRISPRs implicated in acquired resistance of microorganisms to viruses. Environ. Microbiol. 10, 200–208.

Van Embden, J.D.A., Van Gorkom, T., Kremer, K., Jansen, R., Van Der Zeijst, B.A.M., Schouls, L.M., 2000. Genetic variation and evolutionary origin of the direct repeat locus of Mycobacterium tuberculosis complex bacteria. J. Bacteriol. 182, 2393–2401.

Veith, A., Klingl, A., Zolghadr, B., Lauber, K., Mentele, R., Lottspeich, F., Rachel, R., Albers, S.V., Kletzin, A., 2009. Acidianus, Sulfolobus and Metallosphaera surface layers: structure, composition and gene expression. Mol. Microbiol. 73, 58–72.

Vestergaard, G., Shah, S.A., Bize, A., Reitberger, W., Reuter, M., Phan, H., Briegel, A., Rachel, R., Garrett, R.A., Prangishvili, D., 2008. SRV, a new rudiviral isolate from Stygiolobus and the interplay of crenarchaeal rudiviruses with the host viral-defence CRISPR system. J. Bacteriol. 190, 6837–6845.

Zhaxybayeva, O., Swithers, K.S., Lapierre, P., Fournier, G.P., Bickhart, D.M., DeBoy, R.T., Nelson, K.E., Nesbø, C.L., Doolittle, W.F., Gogarten, J.P., Noll, K.M., 2009. On the chimeric nature, thermophilic origin, and phylogenetic placement of the Thermotogales. Proc. Natl. Acad. Sci. U.S.A. 106, 5865–5870.


JOURNAL OF BACTERIOLOGY, Apr. 2011, p. 1672–1680, Vol. 193, No. 7

0021-9193/11/$12.00 doi:10.1128/JB.01487-10

Copyright © 2011, American Society for Microbiology. All Rights Reserved.

Genome Analyses of Icelandic Strains of Sulfolobus islandicus, Model Organisms for Genetic and Virus-Host Interaction Studies

Li Guo,1† Kim Brügger,2† Chao Liu,2† Shiraz A. Shah,2† Huajun Zheng,3 Yongqiang Zhu,3 Shengyue Wang,3 Reidun K. Lillestøl,2 Lanming Chen,2 Jeremy Frank,2 David Prangishvili,4 Lars Paulin,5 Qunxin She,2‡ Li Huang,1‡* and Roger A. Garrett2‡*

State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No. 1 West Beichen Road, Chaoyang District, Beijing 100101, China 1; Archaea Centre, Department of Biology, Copenhagen University, Ole Maaløes Vej 5, DK-2200N Copenhagen, Denmark 2; Shanghai-MOST Key Laboratory of Disease and Health Genomics, Chinese National Human Genome Center at Shanghai, Shanghai 201203, China 3; Molecular Biology of the Gene in Extremophiles Unit, Institut Pasteur, rue Dr Roux 25, 75724 Paris Cedex, France 4; and DNA Sequencing and Genomics Laboratory, Institute of Biotechnology, University of Helsinki, 00790 Helsinki, Finland 5

Received 10 December 2010/Accepted 16 January 2011

The genomes of two Sulfolobus islandicus strains obtained from Icelandic solfataras were sequenced and analyzed. Strain REY15A is a host for a versatile genetic toolbox. It exhibits a genome of minimal size, is stable genetically, and is easy to grow and manipulate. Strain HVE10/4 shows a broad host range for exceptional crenarchaeal viruses and conjugative plasmids and was selected for studying their life cycles and host interactions. The genomes of strains REY15A and HVE10/4 are 2.5 and 2.7 Mb, respectively, and each genome carries a variable region of 0.5 to 0.7 Mb where major differences in gene content and gene order occur. These include gene clusters involved in specific metabolic pathways, multiple copies of VapBC antitoxin-toxin gene pairs, and in strain HVE10/4, a 50-kb region rich in glycosyl transferase genes. The variable region also contains most of the insertion sequence (IS) elements and high proportions of the orphan orfB elements and SMN1 miniature inverted-repeat transposable elements (MITEs), as well as the clustered regularly interspaced short palindromic repeat (CRISPR)-based immune systems, which are complex and diverse in both strains, consistent with them having been mobilized both intra- and intercellularly. In contrast, the remainder of the genomes are highly conserved in their protein and RNA gene syntenies, closely resembling those of other S. islandicus and Sulfolobus solfataricus strains, and they exhibit only minor remnants of a few genetic elements, mainly conjugative plasmids, which have integrated at a few tRNA genes lacking introns. This provides a possible rationale for the presence of the introns.

Iceland has been a rich source of hyper<strong>the</strong>rmophilic crenarchaea<br />

over <strong>the</strong> past 3 decades and especially of acido<strong>the</strong>rmophilic<br />

members of <strong>the</strong> order Sulfolobales. Many Sulfolobus islandicus<br />

strains (“Island” is German for “Iceland”) have also<br />

yielded many novel viruses show<strong>in</strong>g varied and sometimes<br />

unique morphologies and exceptional genome contents. These<br />

properties are consistent with <strong>the</strong>se viruses constitut<strong>in</strong>g an<br />

archaeal l<strong>in</strong>eage dist<strong>in</strong>ct from those of bacteria and eukarya,<br />

and <strong>the</strong>y have now been classified <strong>in</strong>to several new viral families<br />

(38, 63). In addition, a family of conjugative plasmids has<br />

been characterized, with most members deriv<strong>in</strong>g from Iceland,<br />

which appear to conjugate by a mechanism unique to <strong>the</strong><br />

archaeal doma<strong>in</strong> (18, 37).<br />

* Corresponding author. Mailing address for R. A. Garrett: Archaea<br />

Centre, Department of Biology, Copenhagen University, Ole Maaløes<br />

Vej 5, DK-2200N Copenhagen, Denmark. Phone: 045-353-22010. Fax:<br />

045-353-22128. E-mail: garrett@bio.ku.dk. Mailing address for L.<br />

Huang: State Key Laboratory of Microbial Resources, Institute of<br />

Microbiology, Chinese Academy of Sciences, No. 1 West Beichen<br />

Road, Chaoyang District, Beijing 100101, China. Phone: 086-10-64807430.<br />

Fax: 086-10-64807429. E-mail: huangl@sun.im.ac.cn.<br />

† These authors contributed equally.<br />

‡ The last three authors are joint senior authors.<br />

Published ahead of print on 28 January 2011.<br />

Although the availability of genome sequences of Sulfolobus<br />

strains and their genetic elements has yielded important insights<br />

<strong>in</strong>to <strong>the</strong> biology of <strong>the</strong>se model crenarchaea, a major<br />

impediment to more detailed <strong>in</strong>sights has been <strong>the</strong> paucity of<br />

robust and versatile vector-host <strong>system</strong>s for genetic studies. A<br />

few Sulfolobus species have been successfully employed as<br />

hosts for such <strong>system</strong>s, <strong>in</strong>clud<strong>in</strong>g Sulfolobus solfataricus stra<strong>in</strong>s<br />

P1 and 98/2 (22, 58), Sulfolobus acidocaldarius (57), and S.<br />

islandicus stra<strong>in</strong> REY15A (54). To date, <strong>the</strong> genetic tools developed<br />

for <strong>the</strong> latter host are <strong>the</strong> most versatile and <strong>in</strong>clude<br />

<strong>the</strong> follow<strong>in</strong>g: (i) Sulfolobus-Escherichia coli shuttle vectors<br />

carry<strong>in</strong>g ei<strong>the</strong>r viral or plasmid replication orig<strong>in</strong>s (50); (ii)<br />

conventional and novel gene knockout methodologies (14, 62),<br />

and (iii) a D-arab<strong>in</strong>ose-<strong>in</strong>ducible expression <strong>system</strong> with a lacS<br />

reporter gene <strong>system</strong> (35). The S. islandicus <strong>system</strong> has also<br />

been employed successfully to demonstrate <strong>the</strong> dynamic character<br />

of the clustered regularly interspaced short palindromic<br />

repeat (<strong>CRISPR</strong>)-based <strong>immune</strong> <strong>system</strong>s of Sulfolobus when<br />

challenged with genetic elements carrying matching viral genes<br />

and protospacers ma<strong>in</strong>ta<strong>in</strong>ed under selection (20). These developments<br />

necessitated <strong>the</strong> determ<strong>in</strong>ation of <strong>the</strong> genome sequence<br />

of S. islandicus stra<strong>in</strong> REY15A as a prerequisite for<br />

successful exploitation of <strong>the</strong> genetic <strong>system</strong>s.<br />

VOL. 193, 2011 GENOME ANALYSES OF ICELANDIC STRAINS OF S. ISLANDICUS 1673<br />

FIG. 1. (A) Dot plot of the two Icelandic genomes showing the approximate levels of sequence synteny. The large variable regions extend from<br />

about 0.35 to 1.0 Mb. Transposase genes are denoted by black lines along the axes. Putative origins of replication adjacent to the cdc6 and whiP<br />

genes are indicated with red circles, while the families of the CRISPR/Cas (I and III) and Cmr (B) modules are indicated by blue squares. (B) Dot<br />

plot of the S. islandicus REY15A and S. solfataricus P2 genomes.<br />

A second Icelandic strain, S. islandicus strain HVE10/4, has<br />

been employed as a broad laboratory host for propagating<br />

diverse Sulfolobus viruses and conjugative plasmids (63) and<br />

was selected for in-depth studies of their life cycles and host<br />

<strong>in</strong>teractions. This effort received added impetus with <strong>the</strong> demonstration<br />

that some genetic elements show exceptional and<br />

sometimes unique properties of <strong>the</strong>ir viral life cycles or conjugative<br />

mechanisms (3, 8, 18, 40). Therefore, <strong>the</strong> genome<br />

sequence of S. islandicus stra<strong>in</strong> HVE10/4 was also determ<strong>in</strong>ed.<br />

The genome sequences of two Icelandic stra<strong>in</strong>s, REY15A<br />

and HVE10/4, were analyzed and compared and contrasted<br />

with one ano<strong>the</strong>r and with genomes of o<strong>the</strong>r S. solfataricus and<br />

S. islandicus stra<strong>in</strong>s isolated from different geographical locations,<br />

<strong>in</strong>clud<strong>in</strong>g Naples, Italy; Kamchatka, Russia; Lassen Volcanic<br />

National Park; and Yellowstone National Park (44, 53).<br />

MATERIALS AND METHODS<br />

Genome sequenc<strong>in</strong>g. S. islandicus stra<strong>in</strong>s REY15A and HVE10/4 were colony<br />

purified three times and cultured essentially as described earlier (11). Total DNA<br />

was extracted from <strong>the</strong> cells us<strong>in</strong>g phenol-chloroform and fur<strong>the</strong>r purified by<br />

CsCl density-gradient centrifugation. For stra<strong>in</strong> REY15A, sequenc<strong>in</strong>g of shotgun<br />

libraries with a 454 GS FLX sequenator yielded 324,123 reads with 31-fold<br />

genome coverage. For stra<strong>in</strong> HVE10/4, DNA was sonicated to yield fragments <strong>in</strong><br />

<strong>the</strong> size range of 1.5 to 4.0 kb, and clone libraries were generated <strong>in</strong> pUC18 us<strong>in</strong>g<br />

<strong>the</strong> SmaI site. Sequenc<strong>in</strong>g was performed on MegaBace 1000 sequenators to<br />

yield approximately 3-fold sequence coverage, and <strong>the</strong> sequenc<strong>in</strong>g data were<br />

comb<strong>in</strong>ed with a sequenc<strong>in</strong>g run us<strong>in</strong>g a 454 FLX sequenator to yield approximately<br />

10- to 15-fold coverage. The genome sequences were assembled us<strong>in</strong>g <strong>the</strong><br />

phred/phrap/consed package, contigs were l<strong>in</strong>ked by comb<strong>in</strong>atorial PCR us<strong>in</strong>g<br />

primers match<strong>in</strong>g to each contig end, and <strong>the</strong> PCR products were sequenced to<br />

close <strong>the</strong> gaps. Rema<strong>in</strong><strong>in</strong>g ambiguous sequence regions <strong>in</strong> <strong>the</strong> genome were<br />

identified and resolved by generat<strong>in</strong>g and sequenc<strong>in</strong>g PCR products. Both genomes<br />

were annotated automatically and ref<strong>in</strong>ed manually.<br />
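The coverage figures quoted above follow the usual relation: coverage = (number of reads × mean read length) / genome size. As a rough consistency check, the sketch below inverts that relation to estimate the mean read length implied by 324,123 reads at 31-fold coverage. It assumes the rounded genome size of 2.5 Mb stated for strain REY15A; the function names are illustrative and are not part of the pipeline the authors used.

```python
def fold_coverage(n_reads: int, mean_read_len: float, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome length."""
    return n_reads * mean_read_len / genome_size


def implied_read_length(coverage: float, n_reads: int, genome_size: int) -> float:
    """Invert the coverage relation to recover the mean read length."""
    return coverage * genome_size / n_reads


# Values reported for strain REY15A; 2.5 Mb is the rounded genome size (assumption).
reads, depth, genome = 324_123, 31.0, 2_500_000
mean_len = implied_read_length(depth, reads, genome)
print(f"implied mean read length: {mean_len:.0f} bp")
```

Under these assumptions the implied mean read length is roughly 240 bp, which is a plausible order of magnitude for pyrosequencing reads of that period.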

Sequence analyses. Open read<strong>in</strong>g frames (ORFs) were predicted with Glimmer<br />

(13). Frameshifts were detected and checked by sequenc<strong>in</strong>g after manual<br />

annotation, and <strong>the</strong> rema<strong>in</strong><strong>in</strong>g frameshifts were considered to be au<strong>the</strong>ntic.<br />

Functional assignments of ORFs are based on searches against GenBank<br />

(http://www.ncbi.nlm.nih.gov/) and the Conserved Domain Database (CDD)<br />

(www.ncbi.nlm.nih.gov/cdd/). tRNA genes were located with tRNAscan-SE (26). Potential<br />

noncod<strong>in</strong>g RNAs were predicted by comparison with <strong>the</strong> untranslated<br />

RNAs characterized for S. solfataricus and S. acidocaldarius, <strong>in</strong> terms of sequence<br />

similarity and gene context (see Results). Putative <strong>in</strong>sertion sequence<br />

(IS) elements were identified by BLASTN search aga<strong>in</strong>st <strong>the</strong> IS F<strong>in</strong>der database<br />

(http://www-is.biotoul.fr/). All annotations were manually curated us<strong>in</strong>g Artemis<br />

software (47).<br />
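To make the notion of an open reading frame concrete, the following is a deliberately naive six-frame scan for ATG-to-stop stretches. It is not how Glimmer works (Glimmer scores candidates with interpolated Markov models), and it assumes ATG as the only start codon, which is a simplification for archaeal genomes. The function name, the coordinate convention, and the toy sequence are all illustrative.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[str, int, int]]:
    """Return (strand, start, end) for each ATG...stop ORF.

    Coordinates are 0-based and end-exclusive, measured on the strand
    that was scanned; the stop codon is included in the length.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None                   # look for the next ORF
    return orfs


# Toy sequence (hypothetical): one short ORF on the forward strand.
print(find_orfs("CCATGAAATTTTAGGG"))
```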

RESULTS<br />

Genome general properties. Genomes of <strong>the</strong> two Icelandic<br />

stra<strong>in</strong>s were sequenced us<strong>in</strong>g a comb<strong>in</strong>ation of sequenc<strong>in</strong>g<br />

strategies. S. islandicus REY15A was determ<strong>in</strong>ed primarily by<br />

454 sequenc<strong>in</strong>g, while stra<strong>in</strong> HVE10/4 was obta<strong>in</strong>ed by a comb<strong>in</strong>ation<br />

of Sanger and 454 sequencing at approximately 30-fold<br />

and 10-fold coverage, respectively. Protein-coding genes<br />

were annotated <strong>in</strong> Artemis (47), where start codons for s<strong>in</strong>gle<br />

genes and first genes of Sulfolobus operons were generally<br />

located 25 to 30 bp downstream from <strong>the</strong> archaeal hexameric<br />

TATA-like box and only genes with<strong>in</strong> operons were preceded<br />

by Sh<strong>in</strong>e-Dalgarno motifs, of which GGUG predom<strong>in</strong>ates<br />

(56). Where alternative start codons were juxtaposed, we<br />

selected <strong>the</strong> most probable on <strong>the</strong> basis of its position relative<br />

to <strong>the</strong> putative promoter and/or Sh<strong>in</strong>e-Dalgarno motifs or experimental<br />

data from closely related organisms.<br />
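The positional rule described above can be written as a small ranking function. This is only a sketch of one way to apply the 25 to 30 bp criterion. The coordinates, the function name, and the choice to measure distance from the end of the TATA-like box are assumptions, and the sketch ignores the Shine-Dalgarno and experimental evidence that the authors also weighed.

```python
def pick_start_codon(tata_end: int, candidates: list[int],
                     window: tuple[int, int] = (25, 30)) -> int:
    """Choose the candidate start whose distance downstream of the
    TATA-like box lies inside, or closest to, the expected window."""
    lo, hi = window

    def penalty(pos: int) -> int:
        d = pos - tata_end              # bp between box end and start codon
        if d < 0:
            return 10**9                # upstream of the box: effectively excluded
        if lo <= d <= hi:
            return 0                    # inside the expected window
        return min(abs(d - lo), abs(d - hi))

    return min(candidates, key=penalty)


# Hypothetical coordinates: box ends at 1000; three candidate start codons.
best = pick_start_codon(1000, [1012, 1027, 1060])
print(best)
```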

Dot plots of <strong>the</strong> two genomes demonstrate long sections of<br />

gene synteny. One region of about 0.5 to 0.7 Mb exhibits<br />

extensive gene shuffl<strong>in</strong>g, and <strong>the</strong>re is a smaller region with a<br />

200-kb <strong>in</strong>version bordered by shuffled genes (Fig. 1A). Some of<br />

<strong>the</strong> m<strong>in</strong>or irregularities <strong>in</strong> <strong>the</strong> dot plot were attributable to<br />

<strong>in</strong>sertion or <strong>in</strong>tegration events. The synteny is ma<strong>in</strong>ta<strong>in</strong>ed, to a<br />

large degree, when each genome is compared to that of S.<br />

solfataricus P2, despite <strong>the</strong> occurrence of a large <strong>in</strong>version <strong>in</strong><br />

<strong>the</strong> latter, and this is illustrated <strong>in</strong> a dot plot for <strong>the</strong> genomes<br />

of stra<strong>in</strong> REY15A and S. solfataricus P2 (Fig. 1B). This extensive<br />

gene synteny is surpris<strong>in</strong>g, given <strong>the</strong> high level of transpositional<br />

activity occurr<strong>in</strong>g <strong>in</strong> S. solfataricus (Table 1) (7, 30, 41).<br />
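The text does not say which program produced the dot plots, so the following is only a minimal sketch of the underlying idea: shared k-mers between two sequences are recorded as points, with same-strand matches tracing diagonals (synteny) and reverse-complement matches tracing anti-diagonals (inversions). The sequences, the word size `k`, and all names are illustrative assumptions.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def dot_plot(a: str, b: str, k: int = 5):
    """Return (forward, reverse) lists of (i, j) points where the k-mer at
    a[i] matches the k-mer at b[j] directly or as its reverse complement."""
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)

    forward, reverse = [], []
    for i in range(len(a) - k + 1):
        word = a[i:i + k]
        forward.extend((i, j) for j in index.get(word, []))
        reverse.extend((i, j) for j in index.get(reverse_complement(word), []))
    return forward, reverse


# Toy genomes (hypothetical): b carries the middle block of a in inverted form.
left, mid, right = "ATGGCTAAAC", "GGATCTTCAG", "TTGACCGTAA"
a = left + mid + right
b = left + reverse_complement(mid) + right
forward, reverse = dot_plot(a, b, k=5)
print(len(forward), "forward points;", len(reverse), "reverse points")
```

Plotting `forward` and `reverse` in two colours would show the conserved flanks on the main diagonal and the inverted block on an anti-diagonal, which is the pattern the authors describe for the 200-kb inversion.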

1674 GUO ET AL. J. BACTERIOL.<br />

TABLE 1. Summary of genetic properties obtained from genomes of two Icelandic S. islandicus strains and other available S. solfataricus and S. islandicus strains<br />
Genetic properties obtained from genomes of b:<br />
Characteristic | SsolP2 | Ssol98/2 | REY15A | HVE10/4 | LD8.5 | LS2.15 | M16.4 | M16.27 | M14.25 | YG57.14 | YN15.51<br />
Origin | Naples, Italy | Unknown | Reykjanes, Iceland | Hvergaardi, Iceland | Lassen, USA | Lassen, USA | Kamchatka, Russia | Kamchatka, Russia | Kamchatka, Russia | Yellowstone, USA | Yellowstone, USA<br />
GenBank accession no. | AE006641 | CP001402 | CP002425 | CP002426 | CP001731 | CP001399 | CP001402 | CP001401 | CP001400 | CP001403 | CP001404<br />
Genome size (Mb) | 3.0 | 2.7 | 2.5 | 2.7 | 2.7 | 2.7 | 2.6 | 2.7 | 2.6 | 2.7 | 2.8<br />
No. of:<br />
Conserved genes (total, 1,679) | 675 | 656 | 765 | 847 | 837 | 842 | 848 | 823 | 797 | 869 | 840<br />
Unique single genes (total, 1,346) | 190 | 138 | 118 | 114 | 209 | 100 | 100 | 75 | 49 | 113 | 140<br />
Transporters (total, 15) | 11 | 11 | 13 | 14 | 11 | 11 | 12 | 11 | 14 | 12 | 10<br />
VapBC antitoxin-toxins | 20 | 18 | 16 | 18 | 21 | 24 | 21 | 21 | 21 | 20 | 19<br />
Glycosyl transferases (50-kb region) | Absent | Absent | Absent | 15 | Absent | 15 | 15 | 15 | 15 | Absent | 15<br />
Conserved noncoding RNAs | 123 | 81 | 44 | 42 | 42 | 44 | 39 | 42 | 43 | 42 | 42<br />
Transposases/IS elements | 168 | 158 | 75 | 65 | 68 | 60 | 34 | 47 | 45 | 103 | 130<br />
MITEs (families) | 155 (6) | 133 (6) | 9 (2) | 11 (2) | 4 (1) | 4 (1) | 10 (2) | 5 (2) | 7 (2) | 5 (1) | 5 (1)<br />
Cmr family(ies) a: D, 2 B (B) B 2 B B B 2B, D B B, D B, D 3 B E<br />
CRISPR/Cas family(ies) a: I, II (2 I) II (2 I) I I, III I, III (I) I (II) I I, II I, II I I<br />
a Letters and numbers in parentheses for the Cmr and CRISPR/Cas modules families (25, 48) denote the numbers and families of putatively defective modules generally lacking essential genes.<br />
b Lassen, Lassen Volcanic National Park; Yellowstone, Yellowstone National Park.<br />

FIG. 2. Neighbor-joining tree based on a gene content matrix, including<br />

the conserved, core, and unique genes for each available S.<br />

islandicus and S. solfataricus genome (Table 1). The branch lengths<br />

represent the number of differences between the strains in terms of the<br />

presence or absence of individual genes. The data for the tree were<br />

prepared using methods described earlier (44, 48). Only bootstrap<br />

values below 100% for the individual branches are given.<br />

A similar pattern was also observed when other pairs of S.<br />

islandicus genomes from different geographical locations were<br />

compared (48), consistent with a high level of conservation of<br />

gene synteny for all <strong>the</strong> S. solfataricus and S. islandicus genomes.<br />

A phylogenetic tree derived from <strong>the</strong> available genomes<br />

clusters toge<strong>the</strong>r S. islandicus stra<strong>in</strong>s from different geographical<br />

locations (44), with S. solfataricus stra<strong>in</strong>s P2 and 98/2 be<strong>in</strong>g<br />

more distantly related (Fig. 2 and Table 1). The nucleotide<br />

sequence identity for <strong>the</strong> concatenated core genes of <strong>the</strong> two S.<br />

islandicus genomes (Fig. 1A) is 99.6%, and between all <strong>the</strong> S.<br />

islandicus genomes, it is about 99%. The relatively long<br />

branches for <strong>in</strong>dividual stra<strong>in</strong>s (Fig. 2) arise ma<strong>in</strong>ly from differences<br />

<strong>in</strong> gene content of <strong>the</strong> large variable regions (Fig. 1A).<br />

The degree of sequence identity between <strong>the</strong> concatenated<br />

core genes of <strong>the</strong> S. islandicus and S. solfataricus genomes is<br />

about 90% (Fig. 2).<br />
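According to the Fig. 2 legend, branch lengths in the tree reflect the number of genes present in one genome but absent from another. The exact procedure is given in the cited methods (44, 48) and is not reproduced here, so the following is only a sketch of how such a gene-content distance matrix, the usual input to neighbor joining, might be computed. The strain names and gene sets are hypothetical.

```python
from itertools import combinations


def gene_content_distances(presence: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Pairwise count of genes present in exactly one of the two genomes."""
    return {
        (a, b): len(presence[a] ^ presence[b])   # symmetric difference
        for a, b in combinations(sorted(presence), 2)
    }


# Hypothetical gene sets; real input would be presence/absence calls per genome.
toy = {
    "strainA": {"g1", "g2", "g3", "g4"},
    "strainB": {"g1", "g2", "g3", "g5"},
    "strainC": {"g1", "g6", "g7"},
}
distances = gene_content_distances(toy)
for pair, d in distances.items():
    print(pair, d)
```

Because only differences in gene content are counted, two genomes with nearly identical core-gene sequences can still be separated by long branches, which matches the authors' explanation that the long branches arise mainly from the variable regions.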

Three orig<strong>in</strong>s of chromosome replication, demonstrated experimentally<br />

for S. solfataricus and S. acidocaldarius (27, 46),<br />

are well conserved with respect to both <strong>the</strong> DNA sequence and<br />

flank<strong>in</strong>g gene organization <strong>in</strong> both of <strong>the</strong> genomes, albeit with<br />

<strong>the</strong> orig<strong>in</strong> oriC2 be<strong>in</strong>g <strong>in</strong>verted relative to <strong>the</strong> genomes of S.<br />

solfataricus P2 and S. islandicus strain YN15.51 (Fig. 1B). Origin<br />

oriC1 lies immediately upstream of cdc6-1, oriC2 is close to<br />

cdc6-3, while oriC3 is positioned downstream of <strong>the</strong> whiP gene<br />

(Fig. 1A). The two cdc6 genes and <strong>the</strong> whiP gene encode<br />

putative replication <strong>in</strong>itiators (45).<br />

Large variable region. The genomes carry two types of variable<br />

regions. The large region, constituting 20 to 25% of each<br />

genome, extends approximately from positions encompassing<br />

0.3 to 0.8 Mb and 0.3 to 1.0 Mb for strains REY15A and<br />

HVE10/4, respectively (Fig. 1A). The other class is represented<br />

mainly by regions downstream from tRNA genes, where<br />

integration events have occurred (Table 1; also, see below).<br />

The large variable region contains about 60% of the potentially<br />

transposable IS elements and most of the nonautonomous<br />

mobile elements, as well as many degenerate copies of the<br />

former (Fig. 1A). It carries some gene clusters, which are<br />

present in one or more of the Sulfolobus genomes, including<br />

operons and gene cassettes associated with metabolic pathways,<br />

and it contains the diverse CRISPR/Cas and Cmr modules<br />

(Table 1; also, see below). It generally lacks essential<br />

genes; for example, no tRNA genes or replication origins are<br />

present, and thus, it appears to constitute a region where<br />

nonessential genes are collected, interchanged, and exchanged<br />

intercellularly and where genetic innovation occurs.<br />

Integration sites. tRNA gene integration events in Sulfolobus<br />

genomes predominantly involve conjugative plasmids and<br />

fuselloviruses, and these were also the genetic elements most<br />

commonly isolated from acidic hot springs in Iceland (63).<br />

Most integration events occur via an archaea-specific mechanism,<br />

whereby a viral/plasmid integrase gene recombines into a<br />

host tRNA gene and partitions into intN and intC fragments (32). The capture of a genetic<br />

element in a chromosome leaves a trace because the intN<br />

fragment overlapping the tRNA gene is generally maintained,<br />

even if the remainder of the genetic element degenerates or is<br />

deleted (51, 52) (Table 2).<br />

For strains REY15A and HVE10/4, remnants of integrated<br />

elements adjoin eight and five tRNA genes, respectively<br />

(Table 2). Most of the integrated genes derive from conjugative<br />

plasmids, and fuselloviral genes were detected only at<br />

tRNA Thr [GGT] in each strain, with an integrated region of<br />

unknown origin at tRNA Met [CAT] in strain REY15A. All of<br />

the integrated elements are highly degenerate, with IS elements<br />

or miniature inverted-repeat transposable elements (MITEs) inserted<br />

downstream from the tRNA genes (Table 2). Given the<br />

possibility of multiple integrations of genetic elements occurring<br />

at a given tRNA gene, it is difficult to analyze unambiguously the<br />

origins of residual integrated genes (42).<br />

TABLE 2. Integration events at tRNA genes showing the sizes and origins of the residual integrated genes<br />
tRNA | Intron present | Integration event a, REY15A | Integration event a, HVE10/4 | Conserved?<br />
Val-TAC | No | SiRe1242-1247 conj plasmid | No insert | No<br />
Phe-GAA | No | SiRe1321-1323 conj plasmid | SiH1399-1402 conj plasmid | Yes<br />
Met-CAT | Yes | SiRe1465-1479, 12 kb IS, pNOB8 integrase, unknown | No insert | No<br />
Glu-TTC | No | SiRe1484-1490 conj plasmid | SiH1561-1574 conj plasmid | Yes<br />
Ala-GGC | No | intN fragment | intN fragment | Yes<br />
Thr-GGT | No | SiRe2413-2417 IS/MITEs SSV | SiH2464-24672 SSV | Partly<br />
Pro-GGG | Yes | intN fragment | intN fragment | Yes<br />
His-GTG | No | SiRe1787-1792, 7 kb IS | No insert | No<br />
a SSV, spindle-shaped fusellovirus; conj, conjugative; int, integrase.<br />

In contrast to the two Icelandic strains, the other S. solfataricus<br />

and S. islandicus genomes carry intact genetic elements<br />

bordered by intN and intC fragments that are all potentially<br />

excisable (44, 52). They each show evidence of 2 to 7 tRNA<br />

gene integration events, in which the most conserved sites are<br />

tRNA Pro [GGG] and tRNA Ala [GGC], with less common events<br />

at tRNA Leu [GAG] and different alleles of tRNA Arg (Table 2).<br />

For <strong>the</strong> <strong>in</strong>tegrated tRNA genes of <strong>the</strong> Icelandic stra<strong>in</strong>s, <strong>the</strong>re<br />

was no significant correlation between <strong>the</strong> identity of <strong>the</strong><br />

tRNA anticodon and <strong>the</strong> frequency of codon usage or between<br />

<strong>the</strong> encoded am<strong>in</strong>o acid and <strong>the</strong> average number of am<strong>in</strong>o<br />

acids <strong>in</strong> <strong>the</strong> genome-encoded prote<strong>in</strong>s.<br />
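The counts quoted in this section can be re-derived from Table 2 with a short tally. The sketch below transcribes the table as presence/absence flags, treating a residual intN fragment as evidence of an integration event (an assumption made for this illustration) and writing the tRNA labels with plain hyphens.

```python
# (tRNA, intron present, insert in REY15A, insert in HVE10/4), transcribed from Table 2.
# An "intN fragment" entry is counted as an integration remnant (assumption).
TABLE2 = [
    ("Val-TAC", False, True, False),
    ("Phe-GAA", False, True, True),
    ("Met-CAT", True,  True, False),
    ("Glu-TTC", False, True, True),
    ("Ala-GGC", False, True, True),
    ("Thr-GGT", False, True, True),
    ("Pro-GGG", True,  True, True),
    ("His-GTG", False, True, False),
]

rey15a_sites = sum(row[2] for row in TABLE2)      # tRNA genes with remnants in REY15A
hve104_sites = sum(row[3] for row in TABLE2)      # tRNA genes with remnants in HVE10/4
intron_sites = [row[0] for row in TABLE2 if row[1]]

print(rey15a_sites, hve104_sites, intron_sites)
```

With these flags the tally gives eight sites for REY15A and five for HVE10/4, and only the Met-CAT and Pro-GGG genes combine an intron with an integration event, in line with the figures stated in the text.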

Anti-<strong>in</strong>tegration role for tRNA <strong>in</strong>trons. Each genome carries<br />

45 tRNA genes and 2 to 3 pseudo-tRNA genes all located <strong>in</strong><br />

conserved regions. Sixteen of <strong>the</strong> tRNA genes conta<strong>in</strong> <strong>in</strong>trons<br />

immediately 3′ to the anticodon, varying in size from 12 to 65<br />

bp, and <strong>in</strong> contrast to many archaeal tRNA genes, none were<br />

detected at o<strong>the</strong>r sites (29), although putatively degenerate<br />

<strong>in</strong>trons, lack<strong>in</strong>g <strong>the</strong> capacity to form splic<strong>in</strong>g sites, occur <strong>in</strong><br />

D-loop regions of tRNA Glu [CTC] and tRNA Glu [TTC]. Moreover,<br />

<strong>the</strong> tRNA genes and <strong>in</strong>trons are highly conserved <strong>in</strong><br />

sequence between <strong>the</strong> two genomes, and also with <strong>the</strong> o<strong>the</strong>r six<br />

S. islandicus genomes, with very few base changes occurr<strong>in</strong>g<br />

between <strong>the</strong> <strong>in</strong>trons of a given tRNA. This high level of tRNA<br />

and <strong>in</strong>tron sequence conservation extends to S. solfataricus P2,<br />

with only very m<strong>in</strong>or differences observed for about one-third<br />

of <strong>the</strong> genes, and it re<strong>in</strong>forces <strong>the</strong> concept that <strong>the</strong> RNA<br />

<strong>in</strong>trons are functionally important (5).<br />

A possible function for the tRNA introns, suggested by<br />

<strong>the</strong> above-described analyses, is that <strong>the</strong>y provide protection<br />

aga<strong>in</strong>st <strong>in</strong>tegration of genetic elements <strong>in</strong>to tRNA genes.<br />

Integration can be disadvantageous <strong>in</strong> that pre-tRNA transcription<br />

can be impaired. Only two <strong>in</strong>tron-carry<strong>in</strong>g tRNA<br />

genes showed evidence of <strong>in</strong>tegration events (Table 2). For<br />

<strong>the</strong> tRNA Met [CAT] gene copies, an <strong>in</strong>tact <strong>in</strong>tegrase gene is<br />

located downstream from <strong>the</strong> tRNA gene, while for <strong>the</strong><br />

tRNA Pro [GGG], an overlapp<strong>in</strong>g <strong>in</strong>tN fragment is present,<br />

but <strong>the</strong> overlapp<strong>in</strong>g sequence does not extend to <strong>the</strong> <strong>in</strong>tron,<br />

suggest<strong>in</strong>g that <strong>the</strong> <strong>in</strong>tron entered after <strong>the</strong> <strong>in</strong>tegration<br />

event. This is consistent with <strong>the</strong> latter <strong>in</strong>tegration event<br />

be<strong>in</strong>g <strong>the</strong> most conserved, and probably <strong>the</strong> most ancient,<br />

among Sulfolobus species.<br />

IS elements and <strong>the</strong> versatile orfB element. Each genome<br />

carries a limited range of IS element types, with some <strong>in</strong> multiple<br />

copies (Table 3). The IS elements are clustered <strong>in</strong> <strong>the</strong><br />

variable genomic region and also downstream from tRNA<br />

genes that have undergone <strong>in</strong>tegration events (Fig. 1A). Many<br />

of <strong>the</strong>se elements appear to be <strong>in</strong>tact, carry<strong>in</strong>g <strong>the</strong> <strong>in</strong>verted<br />

term<strong>in</strong>al repeats (ITRs) required for transposition, but exhibit<br />

fragmented transposase genes, which are unlikely to be restored<br />

by programmed translational frameshift<strong>in</strong>g, as was observed<br />

for some bacterial transposases of the IS1 and IS3<br />

families (28). Although some of these elements may be mobilizable<br />

by transposases acting in trans, for over one-third of the<br />

IS families present, there is no encoded transposase (Table 3).<br />

Potentially, the most active elements are ISC1200 and ISC1234<br />

in both genomes and ISC1229 in strain HVE10/4 (Table 3).<br />

The two Icelandic S. islandicus strains, together with those<br />

from Kamchatka, Russia, carry the lowest number of IS elements<br />

(Table 1), many of which are inactive.<br />

orfB elements of family IS605, together with elements of the<br />

IS6 family (Table 3), are considered to represent the few<br />

classes of transposable elements that are ancestral to the archaeal<br />

domain (16). orfB occurs alone, or together with a<br />

transposase gene, orfA, in the IS200/605 family of transposable<br />

elements. They lack ITRs, and both element types occur<br />

commonly in viruses and conjugative plasmids of the Sulfolobales<br />

(18, 40) (Table 3). Exceptionally, strain REY15A and<br />

HVE10/4 genomes carry 11 and 16 nearly identical copies of<br />

the single orfB elements in unconserved genomic positions,<br />

respectively. This is consistent with these being the most active<br />

transposable elements in each genome (Table 3), although it<br />

remains uncertain whether they are autonomous or require an<br />

OrfA in trans for mobility (16). In addition, the orfB elements<br />

are exceptionally adaptable, because a further 8 and 2 copies<br />

are physically coupled to copies of ISC1200 for strains<br />

REY15A and HVE10/4, respectively (Table 3), and are potentially<br />

cotransposable.<br />

TABLE 3. Properties of the IS elements, transposases, and MITEs in the Icelandic genomes a<br />
Element | Family | No. of ORFs | No. of REY15A copies | No. of intact TPases in REY15A | No. of HVE10/4 copies | No. of intact TPases in HVE10/4 | No. of conserved genome positions<br />
ISC796 | IS1 | 1 | 5 | 0 | 4 | 1 | 1<br />
ISC1043 | ISL3 | 1 | 1 | 0 | 1 | 0 | 1<br />
ISC1048 | IS630 | 1 | 10 | 0 | 12 | 0 | 3<br />
ISC1058 | IS5 | 1 | 2 | 0 | 1 | 0 | 1<br />
ISC1078 | IS630 | 1 | 1 | 0 | 1 | 0 | 0<br />
ISC1190 | IS110 | 1 | 3 | 1 | 1 | 0 | 0<br />
ISC1200 | ISH3 | 1 | 22 | 11 | 7 | 3 | 0<br />
ISC1205 | ISCNY | 2 | 3 | 0 | 4 | 2 | 1<br />
ISC1229 | IS110 | 1 | 4 | 2 | 10 | 9 | 1<br />
ISC1234 | IS5 | 1 | 8 | 6 | 7 | 5 | 1<br />
ISC1332 | IS256 | 2 | 1 | 1 | 1 | 1 | 0<br />
ISC1395 | IS630 | 1/2 | 2 | 1 | 0 | 0 | 0<br />
ISC1733 | IS200/IS605 | 2 | 8 | 8 | 2 | 2 | 1<br />
ISC1921 | IS607 | 2 | 1 | 1 | 0 | 0 | 0<br />
ISSis1 (pARN4) | IS6 | 1 | 4 | 2 | 5 | 4 | 0<br />
ISSto2 | IS6 | 1 | 2 | 1 | 2 | 2 | 0<br />
OrfB | IS605 | 1 | 19 (8) | 18 | 18 (2) | 17 | 0<br />
SM3A | | 0 | 2 | 0 | 2 | 0 | 2<br />
SMN1 | | 0 | 7 | 7 | 9 | 9 | 0<br />
a The nomenclature used for IS elements and MITEs follows that which was used previously (2, 6, 16). For the OrfB elements, the numbers in parentheses indicate<br />
the numbers of copies that are physically linked to ISC1200 elements. TPase, transposase.<br />

Sulfolobus MITEs. Only two MITE types were detected in<br />

multiple copies in each genome, SMN1 (320 bp) and SM3A<br />

(164 bp) (Table 3), both of which are capable of nonautonomous<br />

transposition in different S. islandicus strains, facilitated<br />

by transposases of ISC1733 and ISC1058, respectively<br />

(2, 4, 43). All SMN1 copies are located immediately downstream<br />

from the sequence TTTAA, but none occur at conserved<br />

positions within the two genomes. Clearly, the SMN1<br />

MITEs are active in both of the genomes, as is ISC1733, which<br />

encodes <strong>the</strong> mobiliz<strong>in</strong>g transposase (Table 3), and <strong>the</strong>y appear<br />

to be cleanly excised when mobilized, <strong>in</strong> agreement with <strong>the</strong><br />

results of an earlier <strong>in</strong>duced excision <strong>in</strong> <strong>the</strong> S. islandicus stra<strong>in</strong><br />

REN1H1 (2). Although most SMN1 copies lie <strong>in</strong> <strong>in</strong>tergenic<br />

regions, and may or may not affect regulatory signals, some<br />

appear to <strong>in</strong>activate or alter genes. Thus, <strong>in</strong> stra<strong>in</strong> REY15A,<br />

an AAA ATPase (SiRe0883) and a hypo<strong>the</strong>tical gene<br />

(SiRe0925) have <strong>in</strong>curred <strong>in</strong>sertions <strong>in</strong> <strong>the</strong>ir promoters, and <strong>in</strong><br />

stra<strong>in</strong> HVE10/4, SMN1 copies partially overlap with two genes<br />

(SiH0773/2472), generat<strong>in</strong>g altered ORF sequences.<br />

In contrast, the two SM3A copies are conserved in position<br />
in each genome, consistent with the mobilizing transposase<br />
encoded in ISC1058 being degenerate in both genomes. Nevertheless,<br />
each SM3A copy retains the conserved 8-bp inverted<br />
terminal repeat of the ISC1058 element (and unconserved<br />
9-bp direct repeats resulting from the transposition event) and<br />
can potentially be mobilized if a transposase-encoding<br />
ISC1058 element enters the cell. Their maintenance as intact<br />
elements may result from one SM3A copy overlapping with the<br />
start of a conserved C/D box RNA gene (3), which may alter its<br />
transcriptional properties, while the other lies between promoters<br />
of two conserved protein genes and may influence their<br />
relative transcriptional levels. SM3A occurs in a few copies in<br />
each of the sequenced S. islandicus genomes, whereas SMN1 is<br />
limited to the Icelandic and three Kamchatka strains, where it<br />
occurs in 1 to 5 copies (Table 1).<br />
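
The two structural criteria mentioned for SM3A, an 8-bp inverted terminal repeat and 9-bp direct repeats flanking the element, can be illustrated with a minimal Python sketch.<br />
It assumes the element and its flanking sequences are available as plain nucleotide strings; the function names and the toy sequences are hypothetical and are not taken from the study.<br />

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def has_inverted_terminal_repeat(element: str, length: int = 8) -> bool:
    """True if the first `length` bases equal the reverse complement of the last `length` bases."""
    element = element.upper()
    return element[:length] == reverse_complement(element[-length:])


def has_direct_repeats(left_flank: str, right_flank: str, length: int = 9) -> bool:
    """True if the `length` bases just before the element equal the `length` bases just after it."""
    return left_flank[-length:].upper() == right_flank[:length].upper()


# Toy element (hypothetical): 8-bp terminus, arbitrary core, reverse-complemented terminus.
toy_element = "GGCATTAC" + "ACGTACGTAC" + reverse_complement("GGCATTAC")
print(has_inverted_terminal_repeat(toy_element))
print(has_direct_repeats("TTGACCTAGGATC", "CCTAGGATCAAGT"))
```

An element that passes both checks is only a candidate; whether it can be mobilized still depends on an intact transposase being present, as noted above.<br />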

Strain-specific metabolic pathways. Each Icelandic strain<br />
shows a few specific metabolic properties. Thus, the REY15A<br />
strain carries an operon (SiRe0441-0445) encoding enzymes<br />
implicated in nitrate reduction and nitrite extrusion, suggesting<br />
that it can use nitrate as a terminal electron acceptor for<br />
anaerobic respiration. The operon is located in the variable<br />
region and has been observed previously only for two other


VOL. 193, 2011 GENOME ANALYSES OF ICELANDIC STRAINS OF S. ISLANDICUS 1677<br />

archaea, S. islandicus strains M.14.25 and M.16.27. The larger<br />
genome of strain HVE10/4 exclusively carries a urease operon<br />
(SiH0978-0983) predicted to encode enzymes involved in the<br />
hydrolysis of urea to NH4 and CO2 and previously found only<br />
in the archaea Sulfolobus tokodaii, Metallosphaera sedula, and<br />
Cenarchaeum symbiosum. Moreover, uniquely for a Sulfolobus<br />
species, strain HVE10/4 also carries several genes predicted to<br />
encode hydrogenases and hydrogenase maturation enzymes<br />
(SiH0883-0892) in the variable region, which suggests that the<br />
strain may be able to grow anaerobically.<br />

A 50-kb segment of the variable region of strain HVE10/4<br />
(SiH0447-0489) is bordered by IS elements and carries 15<br />
predicted glycosyl transferase genes (group 1 and family 2),<br />
constituting about half of the copies in the genome, interspersed almost<br />
exclusively with genes of unknown function and a gene<br />
encoding a predicted polysaccharide biosynthesis enzyme. It is<br />
well established that Sulfolobus S-layer proteins SlaA and SlaB<br />
(SiRe1612/1 and SiH1691/0, respectively) are heavily glycosylated<br />
(36), but the relatively low GC content of the region<br />
suggests that it has been inserted and has an alternative unknown<br />
function. The genome region is absent from strain<br />
REY15A and from some of the other S. islandicus strains<br />
(Table 1).<br />
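
The GC-content argument can be made concrete with a short Python sketch that compares the GC fraction of a candidate region with that of the whole genome.<br />
This is an illustration under stated assumptions rather than the procedure used in the study: sequences are assumed to be plain nucleotide strings, and the function names and toy sequences are hypothetical.<br />

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous A/C/G/T bases of a sequence."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A/C/G/T bases")
    return sum(base in "GC" for base in counted) / len(counted)


def gc_difference(region: str, genome: str) -> float:
    """GC fraction of the region minus that of the genome (negative means GC-poor region)."""
    return gc_content(region) - gc_content(genome)


# Toy sequences (hypothetical), chosen only to show the sign convention.
print(gc_difference("ATATATGC", "ATGCATGC"))
```

A clearly negative difference is the kind of signal referred to above, although a threshold for calling a region foreign would have to be chosen separately.<br />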

Transporters. Sulfolobus strains utilize different sugars and<br />
carbohydrates as carbon and energy sources (19), consistent<br />
with their coding capacity for solute ABC transporters. A total<br />
of 15 different ABC transporters were identified, of which<br />
strain REY15A carries 12 and strain HVE10/4 contains 14. Of<br />
these, 11 ABC transporters are present in S. solfataricus P2<br />
(53), 6 in S. tokodaii (23), but only 3 in S. acidocaldarius (9).<br />
The other S. islandicus genomes each carry 10 to 14 ABC<br />
transporters (44) (Table 1). In both of the Icelandic genomes,<br />
many ABC transporter genes are located in the variable region<br />
(Fig. 1A) and are often flanked by transposons, consistent with<br />
their being subjected to loss or gain events.<br />
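
The overlap between the two Icelandic strains follows from the counts given above by inclusion-exclusion.<br />
The worked line below assumes that the 15 transporters are the union of the two repertoires; the set symbols R (REY15A) and H (HVE10/4) are introduced here only for illustration.<br />

```latex
\[
|R \cap H| = |R| + |H| - |R \cup H| = 12 + 14 - 15 = 11
\]
```

Under that assumption, 11 transporters are shared, 1 is carried only by REY15A, and 3 are carried only by HVE10/4.<br />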

The ABC transporters are diverse, and some of their solute<br />
specificities have been identified for other Sulfolobus strains<br />
(15, 24). Cellobiose, maltose, and arabinose transporters are<br />
present in both of the Icelandic genomes and most other sequenced<br />
S. solfataricus and S. islandicus genomes, although a<br />
few S. islandicus strains lack one of the systems, as follows: the<br />
arabinose system is absent from strain YG5714, while the maltose<br />
system is not present in strains YN1551 and LD215. Strikingly,<br />
the transporter of glucose, the preferred carbon source<br />
for many microbes, is present only in the Icelandic strains, S.<br />
islandicus strains M1415 and YG5714, and in S. solfataricus P2.<br />
The lack of glucose-specific ABC transporters in the other strains suggests either that<br />
glucose is an uncommon nutrient in hot environments or that<br />
another ABC transporter can facilitate glucose transport. One<br />
ABC transporter encoded in the variable region of strain<br />
HVE10/4 (SiH0899-0903), flanked by IS elements, appears to<br />
be unique in public sequence databases.<br />

Toxin-antitoxin systems. Four of the eight families of antitoxin-toxin<br />
complexes characterized for free-living bacteria also<br />
occur in archaea, of which the VapBC family is by far the most<br />
abundant (34) and is the main antitoxin-toxin family that we<br />
detected in the Sulfolobus strains. The Icelandic strains REY15A<br />
and HVE10/4 carry 17 and 18 vapBC gene pairs, respectively<br />
(Table 1), as well as 2 vapC-like gene copies coupled to other<br />
genes. They are distributed throughout the genomes, with several<br />
located in the variable region, and only five gene pairs are conserved<br />
in sequence and gene contexts in both strains (SiRe0698/<br />
SiH0636, SiRe2073/SiH2137, SiRe2171/SiH2227, SiRe2294/<br />
SiH2344, and SiRe2626/SiH2689). Sequence alignments and tree-building<br />
exercises demonstrated that the sequences of both<br />
antitoxins and toxins within each genome are very diverse and can<br />
be classified into subtypes (data not shown), consistent with their<br />
functional diversity and targeting of different cellular sites. These<br />
data also indicate, for given gene pairs, that the subtypes of VapB<br />
and VapC do not always correspond, implying that some gene<br />
pairs may have exchanged partners.<br />

TABLE 4. Summary of Sulfolobus conserved noncoding RNA genes located in the two Icelandic genomes<br />
RNA | Function/modification | No. of indicated RNAs in genome of strain REY15A | No. of indicated RNAs in genome of strain HVE10/4<br />
C/D box | rRNA | 18 | 16<br />
C/D box | tRNA | 4 | 4<br />
C/D box | Unknown | 3 | 3<br />
H/ACA box | tRNA | 2 | 2<br />
Noncoding | Unknown | 31 | 27<br />
Total | | 58 | 53<br />

Reading frame shifts and mRNA intron splicing. Examples<br />
of translational reading frame shifts yielding single polypeptides<br />
have been demonstrated experimentally for S. solfataricus<br />
P2 (10). For two of these, a predicted transketolase<br />
(SiRe1696/8 and SiH1776/8) and a putative O-sialoglycoprotein<br />
endopeptidase (SiRe1569/70 and SiH1648/9), the S. islandicus<br />
genes overlap in a similar way and are likely to undergo<br />
reading frame shifts. In contrast to S. solfataricus P2, α-fucosidase<br />
(SiRe2185 and SiH2241) is a single gene, as is the<br />
predicted dihydrolipoamide acyltransferase gene (SiH0582),<br />
located only in strain HVE10/4. Very few transposase genes<br />
present in IS elements (Table 3) carry a single reading frame<br />
shift that could be bypassed by translational frameshifting<br />
to yield a single protein (28).<br />

Transcripts of the intron-carrying cbf5 genes (SiRe1607/8<br />
and SiH1686/7) have been demonstrated to be spliced by the<br />
archaeal splicing enzyme at the mRNA level in some crenarchaea<br />
(60). Other mRNAs, including those encoding the XPD<br />
helicase (SiRe1685/SiH1765), have been predicted to undergo<br />
splicing, but experimental support is lacking (5).<br />

Noncoding RNAs. Many untranslated RNAs, as well as numerous<br />
antisense RNAs, have been characterized for S. solfataricus and<br />
S. acidocaldarius using a variety of techniques, including probing<br />
cell extracts for RNA with K-turn binding motifs and generating<br />
cDNA libraries of total cellular RNA extracts<br />
(33, 55, 59, 61). Most of these RNAs were characterized<br />
for nucleotide length and partial sequence, and several were<br />
detected by more than one experimental approach. We have<br />
reanalyzed all these different RNA entities and have annotated<br />
the S. islandicus RNA homologs which are conserved in both<br />
sequence and gene contexts. The total number of RNA genes<br />
and their putative functions are given (Table 4).<br />

As for other archaeal hyperthermophiles, each genome carries<br />
many C/D box RNAs that methylate primarily rRNAs and<br />
tRNAs (Table 4). In strains REY15A and HVE10/4, 18 and 16


1678 GUO ET AL. J. BACTERIOL.<br />

C/D box RNAs target rRNAs, respectively, while 4 modify<br />
tRNAs and a further 3 have unknown targets. Two copies of<br />
H/ACA RNA genes are present in each genome which, together<br />
with the aPus7 protein (SiRe1836 and SiH1908), generate pseudouridine-35<br />
in pre-tRNA(Tyr) transcripts (31). Each of these C/D<br />
box and H/ACA box RNA genes can be detected in the other<br />
available S. islandicus genomes, which underlines their functional<br />
importance. Only three RNA genes characterized for<br />
other Sulfolobus strains, Sso-sR4, Sso-sR8, and Sso-92, were not<br />
located in any S. islandicus genome (33, 55). For the numerous<br />
noncoding RNAs of unknown function, similar contents were<br />
found for the two Icelandic strains (Table 4) and for the other S.<br />
islandicus strains, with only a few variations (Table 1), thereby<br />
underlining their functional importance.<br />

FIG. 3. (A) Phylogenetic tree of Cmr2 and its homologues in all sequenced archaeal genomes generates 5 families, A to E. The two Icelandic strains<br />
carry family B Cmr modules, for which the gene order is shown. Other sequenced S. islandicus and S. solfataricus strains also carry Cmr modules of family<br />
B and, less frequently, families D or E, as indicated on the tree. (B) Schematic representations of the CRISPR/Cas cassettes in the two Icelandic strains,<br />
together with the contents of their CRISPR loci. Strain REY15A carries a single family I CRISPR/Cas cassette (blue), whereas HVE10/4 carries cassettes<br />
from families III and I (orange and blue, respectively). Compositions of the individual CRISPR loci are shown, where each triangle represents a<br />
spacer-repeat unit. Significant spacer matches to sequenced viruses and plasmids are color coded (red, rudiviruses; orange, lipothrixviruses; yellow,<br />
fuselloviruses; green, bicaudaviruses; turquoise, turreted icosahedral viruses; blue, conjugative plasmids; and violet, cryptic plasmids).<br />

Diversity of the CRISPR-based immune systems. The<br />
CRISPR/Cas and Cmr modules all lie within the large variable<br />
regions. They show marked heterogeneity in number and<br />
family (25, 48) and are unconserved in position between the<br />
genomes (Fig. 1A). Whereas REY15A carries one paired<br />
CRISPR/Cas module of the family I type and two family B<br />
Cmr modules, HVE10/4 contains two paired CRISPR/Cas<br />
modules of family I and III types and a single family B Cmr<br />
module (48) (Fig. 3A and B). This diversity of CRISPR-based<br />
systems also extends to the other S. solfataricus and S. islandicus<br />
genomes (Table 1). Although the gene content and organization<br />
of the paired family I CRISPR/Cas modules are quite<br />
conserved among crenarchaea (48), exceptionally, for strain<br />
HVE10/4, the internal group of cas genes located between the<br />
two leader regions is inverted (Fig. 3B), indicative of a rearrangement<br />
having occurred within the module, possibly via the<br />
identical inverted repeat sequences of the bordering leader<br />
regions (Fig. 3B).<br />

The CRISPR loci of strain REY15A carry 115 and 93 spacer-repeat<br />
units centered at position 733,000, while those of<br />
HVE10/4 contain 116 and 101 spacer-repeat units and 35 and<br />
14 spacer-repeat units centered at positions 364,000 and<br />
745,000, respectively (Fig. 1A). No spacer sequence identity<br />
was detected within, or between, the two Icelandic strains or<br />
with the other S. solfataricus and S. islandicus genomes. None<br />
of the available fully sequenced S. islandicus genomes (Table<br />
1) have any spacers in common, in contrast to the S. solfataricus<br />
strains P1, P2, and 98/2, which all share many identical<br />
spacers (17, 25) despite their being as distant from one another,<br />
phylogenetically, as the S. islandicus strains (Fig. 2).<br />
Thus, it seems that diversification of genomic CRISPR loci can<br />
occur either by simple spacer turnover or by horizontal transfer<br />
of whole or partial CRISPR/Cas cassettes. There is increasing<br />
evidence for the latter mechanism being the more common one<br />
in S. islandicus strains (17, 21).<br />
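
A comparison of spacer contents between two strains can be sketched in a few lines of Python.<br />
The sketch assumes exact-identity matching and treats a spacer and its reverse complement as the same sequence, because a CRISPR locus may be annotated on either strand; the function names and toy spacers are hypothetical, and the study itself does not specify its comparison procedure at this level of detail.<br />

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def canonical(spacer: str) -> str:
    """Strand-independent form: the lexicographically smaller of a spacer and its reverse complement."""
    spacer = spacer.upper()
    return min(spacer, reverse_complement(spacer))


def shared_spacers(spacers_a, spacers_b) -> set:
    """Canonical forms of the spacers found, on either strand, in both collections."""
    return {canonical(s) for s in spacers_a} & {canonical(s) for s in spacers_b}


# Toy spacers (hypothetical): "CAACGT" is the reverse complement of "ACGTTG".
strain_a = ["ACGTTG", "GGGAAA"]
strain_b = ["CAACGT", "TTTTTT"]
print(shared_spacers(strain_a, strain_b))
```

An empty result for every pair of strains corresponds to the absence of shared spacers reported above for the S. islandicus genomes.<br />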

Since many of the characterized viruses and plasmids of Sulfolobus<br />
derive from Iceland, we analyzed the degree to which<br />
CRISPR spacer sequences of the Icelandic strains yielded significant<br />
matches to genetic element sequences using an earlier approach<br />
examining nucleotide and translated sequences of the<br />
spacers (25, 49). Several significant sequence matches were detected<br />
for both of the genomes, primarily to rudiviruses, fuselloviruses,<br />
and conjugative plasmids, all of which are abundant in<br />
Icelandic hot springs (63), with smaller numbers of matches<br />
to other viruses and cryptic plasmids (Fig. 3B).<br />

DISCUSSION<br />

The genome analyses underline the potential importance of<br />
S. islandicus strain REY15A as a model organism for molecular<br />
genetic studies of the Sulfolobales, and crenarchaea in<br />
general, for a variety of reasons. The genome size of 2.5 Mb is<br />
minimal for a Sulfolobus species; moreover, the incidence of<br />
mobile elements is relatively low (Table 1), and stable deletion<br />
mutants can be readily isolated (14, 20). Furthermore, the high<br />
incidence of diverse ABC transporter systems (Table 1) may<br />
explain why S. islandicus (and S. solfataricus) is most commonly<br />
isolated from enrichment cultures obtained from terrestrial<br />
acidic hot springs, which is in contrast to, for example, S.<br />
acidocaldarius, which carries only three ABC transporters (9,<br />
44, 63).<br />

The relatively high incidence of deletion mutants obtained<br />
from strain REY15A occurs despite the presence of several<br />
transposable elements. However, in both of the Icelandic<br />
strains, many of the IS elements are degenerate or carry disrupted<br />
transposase genes (Table 3), consistent with the “copy-and-paste”<br />
transpositional mechanism of most classes of Sulfolobus<br />
IS elements and their undetectably low reversibility<br />
rate (4, 41). The inability to remove the elements by spontaneous<br />
deletion, which does occur in many bacteria (16), may<br />
also explain the presence of antisense RNAs in Sulfolobus<br />
species to regulate transposase activity (55). The Icelandic<br />
strains do, however, carry many copies of orphan orfB elements<br />
and SMN1 MITEs, which are mobilized by a “cut-and-paste”<br />
mechanism presumably through OrfA encoded<br />
in IS element ISC1733 (2, 16). The SMN1 MITEs appear to<br />
be specific to the Icelandic and Kamchatka strains (Table 1),<br />
and they can generate genetic novelty, reversibly, by extending<br />
open reading frames, in contrast to the other Sulfolobus<br />
MITEs, which carry many potential stop codons in all reading<br />
frames (43). The absence of most of the known Sulfolobus<br />
MITEs, except SM3A, probably reflects the much lower<br />
diversity of the mobilizing transposases present (Table 3).<br />
Many of these elements are located in the large variable<br />
region where genetic diversification occurs, including the<br />
uptake and loss of operons and gene cassettes and rearrangements<br />
of mainly nonessential genes. A similar variable<br />
genetic region in many genetic elements of Sulfolobus has<br />
also been observed (e.g., see reference 18).<br />

Many questions concerning the exceptional molecular and<br />
cellular properties of crenarchaeal organisms remain to be<br />
resolved. They include the functions of the multiple and highly<br />
diverse gene pairs encoding VapBC antitoxin-toxins. For hyperthermophilic<br />
Sulfolobus species, in particular, their presence<br />
and variety could be a prerequisite for adaptation to life<br />
under extreme, and sometimes rapidly varying, temperature<br />
and pH conditions, as well as to survival in nutrient-poor environments,<br />
possibly by optimizing the quality control of gene<br />
expression (12, 34). They may also be related to the sulfolobicins<br />
implicated in killing competitor Sulfolobus cells (39).<br />
The crystal structure of a VapC toxin from the crenarchaeal<br />
hyperthermophile Pyrobaculum aerophilum implicated the protein<br />
in exonuclease activity (1), but the multiplicity and wide<br />
sequence diversity of the vapBC genes suggest that the toxins<br />
target different cellular or molecular sites.<br />

Strain HVE10/4 has been used as a host for a variety of<br />
genetic elements, mainly from Iceland, which were likely to be<br />
genetically close to the Icelandic host (63). The genome analyses<br />
provide few insights into why it is a good host, especially<br />
since it appears to carry a type 1 restriction-modification system<br />
(SiH1435 to SiH1437). Moreover, the CRISPR/Cas and<br />
CRISPR/Cmr modules of strain HVE10/4 are relatively complex,<br />
as they also are for strain REY15A and other Sulfolobus<br />
strains. Their activities have also been demonstrated, at least<br />
for strain REY15A, by challenging the CRISPR/Cas systems<br />
with vector-borne matching protospacers maintained under<br />
selection, which produced deletions of the matching spacers<br />
(20). The puzzle remains as to why the Sulfolobus CRISPR-based<br />
systems are so complex, given that many of the viruses<br />
and plasmids coexist at low copy numbers and are nonlytic.<br />
One possibility is that the CRISPR/Cmr system primarily has a<br />
regulatory role, with antisense crRNAs (CRISPR RNAs) targeting<br />
viral mRNAs. Whatever the reason, the genetic closeness<br />
of strains REY15A and HVE10/4 suggests that the former<br />
may also be a broad host for viruses and plasmids, with the<br />
added advantage that genetic manipulation systems are now<br />
available, and our preliminary studies with fuselloviruses and<br />
conjugative plasmids support this supposition.<br />

ACKNOWLEDGMENTS<br />

This research was supported by grants from the National Natural<br />
Science Foundation of China (grants 306210165, 30730003, and<br />
30870058) to L.H., a grant from the Danish Research Council for<br />
Technology and Production (grant 09-062932) to Q.S., and grants from<br />
the Danish Natural Science Research Council (grant 272-08-0391) and<br />
Danish National Research Foundation to R.A.G.<br />

REFERENCES<br />

1. Arcus, V. L., K. Bäckbro, A. Roost, E. L. Daniel, and E. N. Baker. 2004.<br />
Distant structural homology leads to the functional characterisation of an<br />
archaeal PIN domain as an exonuclease. J. Biol. Chem. 279:16471–16478.<br />
2. Berkner, S., and G. Lipps. 2007. An active nonautonomous mobile element<br />
in Sulfolobus islandicus REN1H1. J. Bacteriol. 189:2145–2149.<br />
3. Bize, A., et al. 2009. A unique virus release mechanism in archaea. Proc. Natl.<br />
Acad. Sci. U. S. A. 106:11306–11311.<br />
4. Blount, Z. D., and D. W. Grogan. 2005. New insertion sequences of Sulfolobus:<br />
functional properties and implications for genome evolution in hyperthermophilic<br />
archaea. Mol. Microbiol. 55:312–325.<br />
5. Brügger, K., X. Peng, and R. A. Garrett. 2007. Sulfolobus genomes: mechanisms<br />
of rearrangement and change, p. 95–104. In R. A. Garrett and H.-P.<br />
Klenk (ed.), Archaea: evolution, physiology, and molecular biology. Blackwell<br />
Publishing, Oxford, United Kingdom.<br />
6. Brügger, K., et al. 2002. Mobile elements in archaeal genomes. FEMS<br />
Microbiol. Lett. 206:131–141.<br />
7. Brügger, K., E. Torarinsson, P. Redder, L. Chen, and R. A. Garrett. 2004.<br />
Shuffling of Sulfolobus genomes by autonomous and non-autonomous mobile<br />
elements. Biochem. Soc. Trans. 32:179–183.<br />
8. Brumfield, S. K., et al. 2009. Particle assembly and ultrastructural features<br />
associated with the replication of the lytic archaeal virus Sulfolobus turreted<br />
icosahedral virus. J. Virol. 83:5964–5970.<br />
9. Chen, L., et al. 2005. The genome of Sulfolobus acidocaldarius, a model<br />
organism of the Crenarchaeota. J. Bacteriol. 187:4992–4999.<br />
10. Cobucci-Ponzano, B., et al. 2010. Functional characterisation and high-throughput<br />
proteomic analysis of interrupted genes in the archaeon Sulfolobus<br />
solfataricus. J. Proteome Res. 9:2496–2507.<br />
11. Contursi, P., et al. 2006. Characterisation of the Sulfolobus host-SSV2 virus<br />
interaction. Extremophiles 10:615–627.<br />
12. Cooper, C. R., A. J. Daugherty, S. Tachdjian, P. H. Blum, and R. M. Kelly.<br />
2009. Role of vapBC toxin-antitoxin loci in the thermal stress response of<br />
Sulfolobus solfataricus. Biochem. Soc. Trans. 37:123–126.<br />
13. Delcher, A. L., K. A. Bratke, E. C. Powers, and S. L. Salzberg. 2007. Identifying<br />
bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics<br />
23:673–679.<br />
14. Deng, L., H. Zhu, Z. Chen, Y. X. Liang, and Q. She. 2009. Unmarked gene<br />
deletion and host-vector system for the hyperthermophilic crenarchaeon<br />
Sulfolobus islandicus. Extremophiles 13:735–746.<br />
15. Elferink, M. G., S. V. Albers, W. N. Konings, and A. J. Driessen. 2001. Sugar<br />
transport in Sulfolobus solfataricus is mediated by two families of binding<br />
protein-dependent ABC transporters. Mol. Microbiol. 39:1494–1503.<br />
16. Filée, J., P. Siguier, and M. Chandler. 2007. Insertion sequence diversity in<br />
archaea. Microbiol. Mol. Biol. Rev. 71:121–157.<br />
17. Garrett, R. A., et al. 2011. CRISPR-based immune systems of the Sulfolobales:<br />
complexity and diversity. Biochem. Soc. Trans. 39:51–57.<br />
18. Greve, B., S. Jensen, K. Brügger, W. Zillig, and R. A. Garrett. 2004. Genomic<br />
comparison of archaeal conjugative plasmids from Sulfolobus. Archaea<br />
1:231–239.<br />

19. Grogan, D. W. 1989. Phenotypic characterization of <strong>the</strong> archaebacterial<br />

genus Sulfolobus: comparison of five wild-type stra<strong>in</strong>s. J. Bacteriol. 171:6710–<br />

6719.<br />

20. Gudbergsdottir, S., et al. 2011. Dynamic properties of <strong>the</strong> Sulfolobus<br />

<strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr <strong>system</strong>s when challenged with vector-borne<br />

viral and plasmid genes and protospacers. Mol. Microbiol. 79:35–49.<br />

21. Held, N. L., A. Herrera, H. Cadillo-Quiroz, and R. J. Whitaker. 2010.<br />

<strong>CRISPR</strong> associated diversity with<strong>in</strong> a population of Sulfolobus islandicus.<br />

PLoS One 5:e12988.<br />

22. Jonuscheit, M., E. Martusewitsch, K. M. Stedman, and C. Schleper. 2003. A<br />

reporter gene <strong>system</strong> for <strong>the</strong> hyper<strong>the</strong>rmophilic archaeon Sulfolobus solfataricus<br />

based on a selectable and <strong>in</strong>tegrative shuttle vector. Mol. Microbiol.<br />

48:1241–1252.<br />

23. Kawarabayasi, Y., et al. 2001. Complete genome sequence of an aerobic<br />

<strong>the</strong>rmoacidophilic crenarchaeon, Sulfolobus tokodaii stra<strong>in</strong> 7. DNA Res.<br />

8:123–140.<br />

24. Kon<strong>in</strong>g, S. M., S. V. Albers, W. N. Kon<strong>in</strong>gs, and A. J. Driessen. 2002. Sugar<br />

transport <strong>in</strong> (hyper)<strong>the</strong>rmophilic archaea. Res. Microbiol. 153:61–67.<br />

25. Lillestøl, R. K., et al. 2009. <strong>CRISPR</strong> families of <strong>the</strong> crenarchaeal genus<br />

Sulfolobus: bidirectional transcription and dynamic properties. Mol. Microbiol.<br />

72:259–272.<br />

26. Lowe, T. M., and S. R. Eddy. 1997. tRNAscan-SE: a program for improved<br />

detection of transfer RNA genes <strong>in</strong> genomic sequence. Nucleic Acids Res.<br />

25:955–964.<br />

27. Lundgren, M., A. Andersson, L. Chen, P. Nilsson, and R. Bernander. 2004.<br />

Three replication orig<strong>in</strong>s <strong>in</strong> Sulfolobus species: synchronous <strong>in</strong>itiation of<br />

chromosome replication and asynchronous term<strong>in</strong>ation. Proc. Natl. Acad.<br />

Sci. U. S. A. 101:7046–7051.<br />

28. Mahillon, J., and M. Chandler. 1998. Insertion sequences. Microbiol. Mol.<br />

Biol. Rev. 62:725–774.<br />

29. Marck, C., and H. Grosjean. 2003. Identification of BHB splicing motifs in<br />

intron-containing tRNAs from 18 archaea: evolutionary implications. RNA<br />

9:1516–1531.<br />

30. Martusewitsch, E., C. W. Sensen, and C. Schleper. 2000. High spontaneous<br />

mutation rate <strong>in</strong> <strong>the</strong> hyper<strong>the</strong>rmophilic archaeon Sulfolobus solfataricus is<br />

mediated by transposable elements. J. Bacteriol. 182:2574–2581.<br />

31. Muller, S., et al. 2009. Deficiency of the tRNA(Tyr):Ψ35-synthase aPus7 in<br />

Archaea of the Sulfolobales order might be rescued by the H/ACA sRNA-guided<br />

mach<strong>in</strong>ery. Nucleic Acids Res. 37:1308–1322.<br />

32. Muskhelishvili, G., P. Palm, and W. Zillig. 1993. SSV1-encoded site-specific<br />

recombination system in Sulfolobus shibatae. Mol. Gen. Genet. 237:334–342.<br />

33. Omer, A. D., M. Zago, A. Chang, and P. P. Dennis. 2006. Prob<strong>in</strong>g <strong>the</strong><br />

structure and function of an archaeal C/D-box methylation guide sRNA.<br />

RNA 12:1708–1720.<br />

34. Pandey, D. P., and K. Gerdes. 2005. Tox<strong>in</strong>-antitox<strong>in</strong> loci are highly abundant<br />

<strong>in</strong> free-liv<strong>in</strong>g but lost from host-associated prokaryotes. Nucleic Acids Res.<br />

33:966–976.<br />

35. Peng, N., Q. Xia, Z. Chen, Y. X. Liang, and Q. She. 2009. An upstream<br />

activation element exert<strong>in</strong>g differential transcription activation on an<br />

archaeal promoter. Mol. Microbiol. 74:928–939.<br />

36. Peyfoon, E., et al. 2010. The S-layer glycoprote<strong>in</strong> of <strong>the</strong> crenarchaeote Sulfolobus<br />

acidocaldarius is glycosylated at multiple sites with chitobiose-l<strong>in</strong>ked<br />

N-glycans. <strong>Archaea</strong> pii:754101.<br />

37. Prangishvili, D., et al. 1998. Conjugation <strong>in</strong> archaea: frequent occurrence of<br />

conjugative plasmids <strong>in</strong> Sulfolobus. Plasmid 40:190–202.<br />

38. Prangishvili, D., P. P. Forterre, and R. A. Garrett. 2006. Viruses of <strong>the</strong><br />

<strong>Archaea</strong>: a unify<strong>in</strong>g view. Nat. Rev. Microbiol. 4:837–848.<br />

39. Prangishvili, D., et al. 2000. Sulfolobic<strong>in</strong>s, specific prote<strong>in</strong>aceous tox<strong>in</strong>s produced<br />

by stra<strong>in</strong>s of <strong>the</strong> extremely <strong>the</strong>rmophilic archaeal genus Sulfolobus. J.<br />

Bacteriol. 182:2985–2988.<br />

40. Prangishvili, D., et al. 2006. Structural and genomic properties of <strong>the</strong> hyper<strong>the</strong>rmophilic<br />

archaeal virus ATV with an extracellular stage of <strong>the</strong> reproductive<br />

cycle. J. Mol. Biol. 359:1203–1216.<br />

41. Redder, P., and R. A. Garrett. 2006. Mutations and rearrangements <strong>in</strong> <strong>the</strong><br />

genome of Sulfolobus solfataricus P2. J. Bacteriol. 188:4198–4206.<br />

42. Redder, P., et al. 2009. Four newly isolated fuselloviruses from extreme<br />

geo<strong>the</strong>rmal environments reveal unusual morphologies and a possible <strong>in</strong>terviral<br />

recomb<strong>in</strong>ation mechanism. Environ. Microbiol. 11:2849–2862.<br />

43. Redder, P., Q. She, and R. A. Garrett. 2001. Non-autonomous elements <strong>in</strong><br />

<strong>the</strong> crenarchaeon Sulfolobus solfataricus. J. Mol. Biol. 306:1–6.<br />

44. Reno, M. L., N. L. Held, C. J. Fields, P. V. Burke, and R. J. Whitaker. 2009.<br />

Sulfolobus islandicus pan-genome. Proc. Natl. Acad. Sci. U. S. A. 106:8605–<br />

8610. (Erratum, 106:18873.)<br />

45. Rob<strong>in</strong>son, N. P., and S. D. Bell. 2007. Extrachromosomal element capture<br />

and <strong>the</strong> evolution of multiple replication orig<strong>in</strong>s <strong>in</strong> archaeal chromosomes.<br />

Proc. Natl. Acad. Sci. U. S. A. 104:5806–5811.<br />

46. Rob<strong>in</strong>son, N. P., et al. 2004. Identification of two orig<strong>in</strong>s of replication <strong>in</strong> <strong>the</strong><br />

s<strong>in</strong>gle chromosome of <strong>the</strong> archaeon Sulfolobus solfataricus. Cell 116:25–38.<br />

47. Ru<strong>the</strong>rford, K., et al. 2000. Artemis: sequence visualization and annotation.<br />

Bio<strong>in</strong>formatics 16:944–945.<br />

48. Shah, S. A., and R. A. Garrett. 2011. <strong>CRISPR</strong>/Cas and Cmr modules,<br />

mobility and evolution of adaptive <strong>immune</strong> <strong>system</strong>s. Res. Microbiol. 162:<br />

27–38.<br />

49. Shah, S. A., N. R. Hansen, and R. A. Garrett. 2009. Distributions of <strong>CRISPR</strong><br />

spacer matches <strong>in</strong> viruses and plasmids of crenarchaeal acido<strong>the</strong>rmophiles<br />

and implications for their inhibitory mechanism. Biochem. Soc. Trans. 37:<br />

23–28.<br />

50. She, Q., et al. 2008. Host-vector <strong>system</strong>s for hyper<strong>the</strong>rmophilic archaeon<br />

Sulfolobus, p. 151–156. In S.-J. Liu and H. L. Drake (ed.), Microbes and <strong>the</strong><br />

environment: perspective and challenges. Science Press, Beij<strong>in</strong>g, Ch<strong>in</strong>a.<br />

51. She, Q., X. Peng, W. Zillig, and R. A. Garrett. 2001. Gene capture <strong>in</strong> archaeal<br />

chromosomes. Nature 409:478.<br />

52. She, Q., B. Shen, and L. Chen. 2004. <strong>Archaea</strong>l <strong>in</strong>tegrases and mechanisms of<br />

gene capture. Biochem. Soc. Trans. 32:222–226.<br />

53. She, Q., et al. 2001. The complete genome of <strong>the</strong> crenarchaeon Sulfolobus<br />

solfataricus P2. Proc. Natl. Acad. Sci. U. S. A. 98:7835–7840.<br />

54. She, Q., et al. 2009. Genetic analyses <strong>in</strong> <strong>the</strong> hyper<strong>the</strong>rmophilic archaeon<br />

Sulfolobus islandicus. Biochem. Soc. Trans. 37:92–96.<br />

55. Tang, T.-H., et al. 2005. Identification of novel non-cod<strong>in</strong>g RNAs as potential<br />

antisense regulators <strong>in</strong> <strong>the</strong> archaeon Sulfolobus solfataricus. Mol. Microbiol.<br />

55:469–481.<br />

56. Torar<strong>in</strong>sson, E., H.-P. Klenk, and R. A. Garrett. 2005. Divergent transcriptional<br />

and translational signals <strong>in</strong> <strong>Archaea</strong>. Environ. Microbiol. 7:47–54.<br />

57. Wagner, M., et al. 2009. Expand<strong>in</strong>g and understand<strong>in</strong>g <strong>the</strong> genetic toolbox of<br />

<strong>the</strong> hyper<strong>the</strong>rmophilic genus Sulfolobus. Biochem. Soc. Trans. 37:97–101.<br />

58. Worth<strong>in</strong>gton, P., V. Hoang, F. Perez-Pomares, and P. Blum. 2003. Targeted<br />

disruption of <strong>the</strong> alpha-amylase gene <strong>in</strong> <strong>the</strong> hyper<strong>the</strong>rmophilic archaeon<br />

Sulfolobus solfataricus. J. Bacteriol. 185:482–488.<br />

59. Wurtzel, O., et al. 2010. A s<strong>in</strong>gle-base resolution map of an archaeal transcriptome.<br />

Genome Res. 20:133–141.<br />

60. Yokobori, S., et al. 2009. Ga<strong>in</strong> and loss of an <strong>in</strong>tron <strong>in</strong> a prote<strong>in</strong>-cod<strong>in</strong>g gene<br />

<strong>in</strong> <strong>Archaea</strong>: <strong>the</strong> case of an archaeal RNA pseudourid<strong>in</strong>e synthase gene. BMC<br />

Evol. Biol. 9:198.<br />

61. Zago, M. A., P. P. Dennis, and A. D. Omer. 2005. The expand<strong>in</strong>g world of<br />

small RNAs <strong>in</strong> <strong>the</strong> hyper<strong>the</strong>rmophilic archaeon Sulfolobus solfataricus. Mol.<br />

Microbiol. 55:1812–1828.<br />

62. Zhang, C., et al. 2010. Reveal<strong>in</strong>g <strong>the</strong> essentiality of multiple archaeal pcna<br />

genes us<strong>in</strong>g a mutant propagation assay based on an improved knockout<br />

method. Microbiology 156:3386–3397.<br />

63. Zillig, W., et al. 1998. Genetic elements <strong>in</strong> <strong>the</strong> extremely <strong>the</strong>rmophilic<br />

archaeon Sulfolobus. Extremophiles 2:131–140.


Biochemical Society Transactions www.biochemsoctrans.org<br />

<strong>CRISPR</strong>-based <strong>immune</strong> <strong>system</strong>s of <strong>the</strong><br />

Sulfolobales: complexity and diversity<br />

Roger A. Garrett 1, Shiraz A. Shah, Gisle Vestergaard, Ling Deng, Soley Gudbergsdottir, Chandra S. Kenchappa,<br />

Susanne Erdmann and Qunx<strong>in</strong> She<br />

<strong>Archaea</strong> Centre, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200N Copenhagen K, Denmark<br />

Abstract<br />

<strong>CRISPR</strong> (cluster of regularly <strong>in</strong>terspaced pal<strong>in</strong>dromic repeats)/Cas and <strong>CRISPR</strong>/Cmr <strong>system</strong>s of Sulfolobus,<br />

targeting DNA and RNA respectively of invading viruses or plasmids, are complex and diverse. We address<br />

<strong>the</strong>ir classification and functional diversity, and <strong>the</strong> wide sequence diversity of RAMP (repeat-associated<br />

mysterious prote<strong>in</strong>)-motif conta<strong>in</strong><strong>in</strong>g prote<strong>in</strong>s encoded <strong>in</strong> Cmr modules. Factors <strong>in</strong>fluenc<strong>in</strong>g ma<strong>in</strong>tenance<br />

of partially impaired <strong>CRISPR</strong>-based <strong>system</strong>s are discussed. The capacity for whole <strong>CRISPR</strong> transcripts to be<br />

generated despite <strong>the</strong> uptake of transcription signals with<strong>in</strong> spacer sequences is considered. Target<strong>in</strong>g of<br />

protospacer regions of invading elements by Cas protein–crRNA (CRISPR RNA) complexes exhibits relatively<br />

low sequence str<strong>in</strong>gency, but <strong>the</strong> <strong>in</strong>tegrity of protospacer-associated motifs appears to be important.<br />

Different mechanisms for circumvent<strong>in</strong>g or <strong>in</strong>activat<strong>in</strong>g <strong>the</strong> <strong>immune</strong> <strong>system</strong>s are presented.<br />

Introduction<br />

The discovery of <strong>the</strong> widespread occurrence of <strong>CRISPR</strong><br />

(cluster of regularly <strong>in</strong>terspaced pal<strong>in</strong>dromic repeat)-based<br />

<strong>immune</strong> <strong>system</strong>s <strong>in</strong> archaea and bacteria has provided<br />

important insights into how hosts can inactivate and/or<br />

regulate invading foreign DNA and, probably, RNA<br />

genetic elements. In addition, these systems are likely to<br />

affect how co-invading genetic elements influence one<br />

another [1,2]. The two main molecular apparatuses involved<br />

are structurally complex, partially independent and have<br />

diversified functionally. Moreover, their capacity to facilitate<br />

the continual uptake of foreign DNA into host chromosomes,<br />

and their propensity for transfer between organisms, have<br />

important implications for cellular evolution.<br />

The genus Sulfolobus provides an important model <strong>system</strong><br />

for study<strong>in</strong>g <strong>the</strong>se <strong>immune</strong> <strong>system</strong>s. Most Sulfolobus species<br />

carry complex and diverse <strong>CRISPR</strong>-based <strong>system</strong>s and appear<br />

to be particularly active <strong>in</strong> <strong>the</strong> uptake of foreign DNA <strong>in</strong>serts<br />

<strong>in</strong>to <strong>the</strong>ir <strong>CRISPR</strong> loci. Fur<strong>the</strong>rmore, a broad collection<br />

of Sulfolobus genetic elements is available that can be used<br />

to challenge <strong>the</strong> <strong>CRISPR</strong>-based <strong>system</strong>s [3]. It <strong>in</strong>cludes<br />

numerous diverse viruses, many of which have been classified<br />

into eight new viral families [4,5], as well as a family of<br />

plasmids encod<strong>in</strong>g an archaeal-specific conjugative apparatus<br />

[6,7].<br />

Many insights into the complexity of the CRISPR-based<br />

immune systems, and their mechanistic diversity,<br />

have emerged from detailed experimental studies of CRISPR/Cas<br />

and CRISPR/Cmr systems of the archaeal genera<br />

Sulfolobus and Pyrococcus respectively, and from investigation<br />

of bacterial CRISPR/Cas systems of Streptococcus<br />

thermophilus [8,9], Staphylococcus epidermidis [10,11] and<br />

Escherichia coli [12]. In the present article, we focus primarily<br />

on current knowledge and ideas deriving from, and relating<br />

to, the Sulfolobus immune systems.<br />

Key words: archaeal virus, cluster of regularly interspaced palindromic repeats/Cas module<br />

(CRISPR/Cas module), Cmr module, CRISPR RNA (crRNA), protospacer-associated motif (PAM).<br />

Abbreviations used: CRISPR, cluster of regularly interspaced palindromic repeats; crRNA,<br />

CRISPR RNA; IS, insertion sequence; PAM, protospacer-associated motif; RAMP, repeat-associated<br />

mysterious protein; SIRV1, Sulfolobus islandicus rod-shaped virus 1.<br />

1 To whom correspondence should be addressed (email garrett@bio.ku.dk).<br />

Biochem. Soc. Trans. (2011) 39, 51–57; doi:10.1042/BST0390051<br />

Molecular Biology of Archaea II 51<br />

<strong>CRISPR</strong>/Cas families: complexity,<br />

classification and versatility<br />

It was clear at an early stage that the CRISPR/Cas and<br />

Cmr systems are highly complex: approx. 45 different<br />

proteins were implicated in their function [13], and the<br />

number has continued to rise [14]. Genes of the two systems<br />

are clustered <strong>in</strong>to cas and cmr cassettes which are sometimes<br />

l<strong>in</strong>ked physically. These cassettes encode a few core prote<strong>in</strong>s,<br />

but <strong>the</strong>y also carry different comb<strong>in</strong>ations of o<strong>the</strong>r genes,<br />

some occurr<strong>in</strong>g more commonly than o<strong>the</strong>rs. Thus cassettes<br />

vary markedly <strong>in</strong> <strong>the</strong>ir overall gene contents. To illustrate this,<br />

core gene structures of <strong>the</strong> archaeal cas cassettes are shown<br />

toge<strong>the</strong>r with a more complex family I cas cassette from<br />

Sulfolobus islandicus HVE10/4 (Figures 1A and 1B). The core<br />

cas genes classify into cas group 1, implicated in the<br />

acquisition of foreign DNA and its insertion into CRISPR loci,<br />

and cas group 2, associated with crRNA (CRISPR RNA)<br />

processing and guidance (Figure 1A).<br />

Families of <strong>CRISPR</strong>/Cas modules have been classified<br />

on <strong>the</strong> basis of gene content and gene order with<strong>in</strong> cas<br />

cassettes, and on <strong>the</strong> basis of conserved sequences of cas genes,<br />

leader regions and repeats with<strong>in</strong> <strong>CRISPR</strong>/Cas modules. For<br />

archaea, about eight families have been proposed, whereas<br />

among <strong>the</strong> Sulfolobales, three are common (I–III) and one<br />

less so (IV) [2,15,16,17].<br />

© The Authors Journal compilation © 2011 Biochemical Society


52 Biochemical Society Transactions (2011) Volume 39, part 1<br />

Figure 1 Core genes of archaeal cas cassettes<br />

(A) Core genes are divided into putative functional cas groups 1 and 2 (see the text) and the cas6 gene, which encodes<br />

an RNA-processing enzyme [18]. (B) Genetic map of a family I CRISPR/Cas module of S. islandicus strain HVE10/4 carrying<br />

several non-core cas genes.<br />

Cmr modules carry two conserved core genes, cmr2<br />

and cmr5 (Figure 2A), and a variable number of genes<br />

encoding diverse proteins which carry RAMP (repeat-associated<br />

mysterious protein) motifs. The Cmr modules<br />

can be classified <strong>in</strong>to five ma<strong>in</strong> families A, B, C, D and E<br />

for archaea on <strong>the</strong> basis of phylogenetic tree build<strong>in</strong>g us<strong>in</strong>g<br />

sequences of Cmr2 and its homologues Csm1 and Csx11<br />

(Figure 2B), where most Sulfolobus Cmr modules fall with<strong>in</strong><br />

families B or D. Fur<strong>the</strong>r classification is complicated by<br />

<strong>the</strong> presence of multiple diverse copies of genes cod<strong>in</strong>g for<br />

RAMP-motif-conta<strong>in</strong><strong>in</strong>g prote<strong>in</strong>s. Although <strong>the</strong>se prote<strong>in</strong>s<br />

can be classified <strong>in</strong>to families on <strong>the</strong> basis of <strong>the</strong>se motifs,<br />

<strong>the</strong> rema<strong>in</strong>der of <strong>the</strong> prote<strong>in</strong> sequences tend to be highly<br />

divergent, as illustrated for four prote<strong>in</strong>s encoded <strong>in</strong> a Cmr<br />

family B module of Sulfolobus solfataricus P2 (Figure 2C).<br />

Most Sulfolobus species carry multiple <strong>CRISPR</strong>/Cas<br />

and/or Cmr modules and, given <strong>the</strong> high energy cost of<br />

ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g and express<strong>in</strong>g <strong>the</strong>m, <strong>the</strong>y must confer major<br />

advantages on the cell. Given the molecular<br />

and mechanistic complexities of the systems, they can be<br />

inactivated readily by a defect in a single component or<br />

critical sequence motif. Moreover, the systems are potential<br />

targets for incoming genetic elements, which may attempt<br />

to integrate into essential cas or cmr genes (as has been<br />

observed for a viral integration in a csa3 gene of S. islandicus<br />

strain M.16.4; see below), modify their protein products,<br />

or otherwise interfere with transcription or maturation of<br />

crRNAs. Therefore multiple systems will provide added security<br />

aga<strong>in</strong>st unwanted <strong>in</strong>vasion. The pair<strong>in</strong>g of many family<br />

I <strong>CRISPR</strong>/Cas modules may reflect a compromise between<br />

provid<strong>in</strong>g added security and generat<strong>in</strong>g more compact and<br />

efficient <strong>system</strong>s which can potentially be mobilized and<br />

transferred between organisms as s<strong>in</strong>gle units [2].<br />

A fur<strong>the</strong>r advantage may arise from <strong>the</strong> presence<br />

of different families of <strong>CRISPR</strong>/Cas modules which is<br />

commonly observed for Sulfolobus (e.g. S. solfataricus carries<br />

family I and II modules, whereas Sulfolobus acidocaldarius<br />

carries those of family II and III) [16]. Their presence may<br />

<strong>in</strong>crease versatility <strong>in</strong> both <strong>the</strong> uptake of spacers and target<strong>in</strong>g<br />

of protospacers with different PAMs (protospacer-associated<br />

motifs).<br />
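The combination of relaxed spacer matching and a required flanking motif can be illustrated with a short Python sketch. It is an illustration only, not the authors' method. The function names, the mismatch threshold, the choice to read the motif from the two bases immediately upstream of the match, the single-strand scan and the toy sequences are all assumptions made for this example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_protospacers(target, spacer, max_mismatches=3, pam_len=2):
    """Scan one strand of `target` for windows resembling `spacer`.

    Returns (start, mismatches, flank) tuples, where `flank` is the
    `pam_len` bases immediately upstream of the window. Treating that
    upstream flank as the PAM is an assumption of this sketch.
    """
    target = target.upper()
    spacer = spacer.upper()
    n = len(spacer)
    hits = []
    for i in range(len(target) - n + 1):
        mismatches = hamming(target[i:i + n], spacer)
        if mismatches <= max_mismatches:
            flank = target[max(0, i - pam_len):i]
            hits.append((i, mismatches, flank))
    return hits


# Toy sequences, invented for illustration only.
TOY_SPACER = "ACGTTGCAAGCTTAGGCTAACGTTAGCCTA"
# Target: 6-base flank ending in "CC", a copy of the spacer with one
# substitution, then a short 3' tail.
TOY_TARGET = "TTTTCC" + "ACGTTTCAAGCTTAGGCTAACGTTAGCCTA" + "GGGG"

hits = find_protospacers(TOY_TARGET, TOY_SPACER)
```

A caller would then keep only the hits whose `flank` equals the motif expected for the CRISPR/Cas family under study. That motif is deliberately left as an input here rather than hard-coded.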

The presence of multiple Cmr modules is also likely to<br />

confer functional versatility, although <strong>the</strong>y are subject to <strong>the</strong><br />

constra<strong>in</strong>t that some encoded prote<strong>in</strong>s must be able to<br />

recognize part of <strong>the</strong> repeat sequence of <strong>the</strong> co-<strong>in</strong>habit<strong>in</strong>g<br />

<strong>CRISPR</strong>/Cas module [18,19]. Cmr modules are sometimes<br />

l<strong>in</strong>ked directly to <strong>CRISPR</strong>/Cas modules on chromosomes<br />

and, given <strong>the</strong>ir functional <strong>in</strong>terdependence, <strong>the</strong>re is likely<br />

to have been some co-evolution of <strong>the</strong> coupled <strong>system</strong>s.<br />

Consistent with this view, analysis of <strong>the</strong> Sulfolobales<br />

suggests that Cmr family D modules (Figure 2B) are<br />

commonly, but not exclusively, found toge<strong>the</strong>r with family<br />

II <strong>CRISPR</strong>/Cas modules.<br />

<strong>CRISPR</strong> loci: structural and functional<br />

complexity<br />

<strong>CRISPR</strong> loci consist of regularly spaced direct repeat<br />

sequences with <strong>in</strong>terven<strong>in</strong>g spacers deriv<strong>in</strong>g from <strong>in</strong>vad<strong>in</strong>g<br />

foreign DNA elements. <strong>Archaea</strong>l repeats fall <strong>in</strong> <strong>the</strong> size<br />

range 23–37 bp and most spacers are 25–50 bp long [20].<br />

<strong>CRISPR</strong> loci are preceded by a leader region which varies<br />

<strong>in</strong> size from approx. 150 to 550 bp and shows levels of<br />

sequence conservation which are only considered significant<br />

with<strong>in</strong> specific families of <strong>CRISPR</strong>/Cas modules. <strong>CRISPR</strong><br />

locus sizes can also vary considerably, suggest<strong>in</strong>g that rates<br />

of spacer turnover differ markedly for different <strong>CRISPR</strong><br />

loci with<strong>in</strong> a given archaeon. But <strong>the</strong>re is no support for<br />

differences occurr<strong>in</strong>g between <strong>the</strong> <strong>CRISPR</strong>/Cas families of<br />

<strong>the</strong> Sulfolobales, s<strong>in</strong>ce large and small clusters exist for <strong>the</strong><br />

most common families I, II and III.<br />
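The repeat-spacer organisation described above can be sketched in Python as follows. This is a minimal illustration, assuming that the repeat sequence is already known and occurs as exact copies. Real loci may contain degenerate repeats, which this sketch does not handle. The function names and the toy sequences are hypothetical, and the default length ranges are simply the ones quoted in the text.

```python
def split_crispr_array(array_seq, repeat):
    """Return the spacers lying between consecutive exact copies of `repeat`."""
    array_seq = array_seq.upper()
    repeat = repeat.upper()
    # Record the start of every non-overlapping repeat copy.
    starts = []
    pos = array_seq.find(repeat)
    while pos != -1:
        starts.append(pos)
        pos = array_seq.find(repeat, pos + len(repeat))
    # A spacer runs from the end of one repeat to the start of the next.
    return [array_seq[a + len(repeat):b] for a, b in zip(starts, starts[1:])]


def check_lengths(repeat, spacers, repeat_range=(23, 37), spacer_range=(25, 50)):
    """Compare repeat and spacer lengths with the ranges quoted in the text."""
    repeat_ok = repeat_range[0] <= len(repeat) <= repeat_range[1]
    unusual = [s for s in spacers
               if not spacer_range[0] <= len(s) <= spacer_range[1]]
    return repeat_ok, unusual


# Toy data, invented for illustration (not taken from any genome).
TOY_REPEAT = "GTTACCGGATTCAAGCTTGGCAATC"
TOY_SPACERS = ["ACGT" * 8, "TTGCA" * 7]
TOY_ARRAY = (TOY_REPEAT + TOY_SPACERS[0] + TOY_REPEAT
             + TOY_SPACERS[1] + TOY_REPEAT)

spacers = split_crispr_array(TOY_ARRAY, TOY_REPEAT)
```

Counting `len(spacers)` for each locus gives the number of spacer-repeat units, which is the quantity used in the text to compare large and small clusters.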

In organisms carry<strong>in</strong>g several <strong>CRISPR</strong>/Cas modules,<br />

<strong>in</strong>clud<strong>in</strong>g S. solfataricus stra<strong>in</strong>s P1 and P2 with six, and S.<br />

acidocaldarius with five, <strong>the</strong>y may not all be fully functional.<br />

The <strong>CRISPR</strong>/Cas <strong>system</strong> exhibits two partially <strong>in</strong>dependent<br />

functions with one group of Cas prote<strong>in</strong>s responsible for<br />

uptake of <strong>in</strong>vader DNA <strong>in</strong>to <strong>CRISPR</strong> loci and <strong>the</strong> o<strong>the</strong>r<br />

for generat<strong>in</strong>g crRNAs and guid<strong>in</strong>g <strong>the</strong>m to <strong>the</strong> <strong>in</strong>vad<strong>in</strong>g<br />

genetic element (Figure 1). Only <strong>the</strong> latter prote<strong>in</strong>s are<br />

essential for the CRISPR/Cas system to function. Thus non-extending<br />

CRISPR loci may still be useful to cells as long as<br />

crRNAs are generated. S. acidocaldarius carries two large<br />

loci and three smaller ones of 11, five and two spacer-repeat<br />

units. All five clusters were transcribed and processed<br />

to mature crRNAs [16], but possibly <strong>the</strong> spacer addition<br />

functions are defective for <strong>the</strong> small clusters. Similarly, for<br />

S. solfataricus P1 and P2, of <strong>the</strong> six <strong>CRISPR</strong> loci, only four<br />

appear to be active <strong>in</strong> elongation. Of <strong>the</strong> o<strong>the</strong>r two, <strong>the</strong><br />

smallest (locus E) carries six spacer-repeat units with a leader<br />

and no cas genes [16] and does not appear to be transcribed<br />

[21]. It carries spacers match<strong>in</strong>g rudiviruses and a conjugative<br />

plasmid and is conserved <strong>in</strong> three S. solfataricus stra<strong>in</strong>s (two


Figure 2 Classification of archaeal Cmr modules<br />

(A) Gene map of an archaeal Cmr module show<strong>in</strong>g <strong>the</strong> conserved core prote<strong>in</strong>s Cmr2 and Cmr5, and <strong>the</strong> grey boxes represent<br />

genes encod<strong>in</strong>g different prote<strong>in</strong>s which carry RAMP motifs. (B) Phylogenetic tree for archaeal Cmr modules based on <strong>the</strong><br />

Cmr2 prote<strong>in</strong> sequence show<strong>in</strong>g five ma<strong>in</strong> families: A, B, C, D and E. The total number of different prote<strong>in</strong>s <strong>in</strong> each family<br />

carry<strong>in</strong>g RAMP motifs is given <strong>in</strong> paren<strong>the</strong>ses. Trees were prepared us<strong>in</strong>g <strong>the</strong> MUSCLE and ClustalW programs as described<br />

previously [17]. (C) MapsoffourRAMPmotif-conta<strong>in</strong><strong>in</strong>gprote<strong>in</strong>swith<strong>in</strong>as<strong>in</strong>gleCmrfamilyBmoduleofS. solfataricus P2.<br />

They illustrate <strong>the</strong> diverse locations of <strong>the</strong> two conserved am<strong>in</strong>o acid sequence regions (1 and 2), determ<strong>in</strong>ed us<strong>in</strong>g <strong>the</strong><br />

MEME program [45]. The rema<strong>in</strong><strong>in</strong>g sequence regions show very low levels of sequence similarity.<br />

Molecular Biology of <strong>Archaea</strong> II 53<br />

C○The Authors Journal compilation C○2011 Biochemical Society


54 Biochemical Society Transactions (2011) Volume 39, part 1<br />

Figure 3 A map of <strong>the</strong> <strong>CRISPR</strong> locus E<br />

Locus E is found <strong>in</strong> S. solfataricus stra<strong>in</strong>s P1, P2 and 98/2 and <strong>the</strong><br />

S. islandicus stra<strong>in</strong> L.D.8.5 [22]. Triangles represent spacer-repeat units<br />

that are colour-coded for match<strong>in</strong>g sequences: red, rudivirus and blue,<br />

conjugative plasmid. The shaded spacer-repeat units carry identical<br />

sequences. L represents <strong>the</strong> leader region. The 36 kb genomic region<br />

flank<strong>in</strong>g <strong>the</strong> locus (grey region) is conserved at >99% sequence identity<br />

<strong>in</strong> all four stra<strong>in</strong>s.<br />

from Naples, Italy) with only <strong>the</strong> f<strong>in</strong>al downstream spacer<br />

differ<strong>in</strong>g between <strong>the</strong> P1/P2 stra<strong>in</strong>s and stra<strong>in</strong> 98/2 (Figure 3).<br />

Moreover, it is also found on a highly conserved 36 kb<br />

chromosomal fragment (99% sequence identity) <strong>in</strong> <strong>the</strong> S.<br />

islandicus stra<strong>in</strong> L.D.8.5 (from Lassen, CA, U.S.A.) [22],<br />

with an almost identical leader region (one mismatch) and<br />

identical repeat sequence but different spacers (Figure 3). The<br />

ma<strong>in</strong>tenance and spread<strong>in</strong>g of locus E, lack<strong>in</strong>g a cas cassette,<br />

would suggest that <strong>the</strong> <strong>CRISPR</strong> module can be activated and<br />

generate crRNAs. The <strong>in</strong>ference that Cas prote<strong>in</strong>s encoded<br />

<strong>in</strong> one <strong>CRISPR</strong>/Cas module can activate o<strong>the</strong>r <strong>CRISPR</strong> loci<br />

is also consistent with <strong>the</strong> proposal that <strong>the</strong> group<br />

1 cas genes (Figure 1A) can exchange between <strong>CRISPR</strong>/Cas<br />

modules [2].<br />

The large <strong>in</strong>active locus F, with 88 spacer-repeat units,<br />

is completely conserved <strong>in</strong> sequence between S. solfataricus<br />

stra<strong>in</strong>s P1 and P2, but it lacks a leader region, and, although<br />

transcription occurs <strong>in</strong>ternally with<strong>in</strong> <strong>the</strong> <strong>CRISPR</strong> locus,<br />

mature crRNAs are not generated [21,23]. Thus locus F,<br />

which has been lost from S. solfataricus stra<strong>in</strong> 98/2, may be<br />

of little use when a viral <strong>in</strong>fection occurs.<br />

Generally for Sulfolobus species, loss of mobile DNA<br />

elements is difficult; thus IS (<strong>in</strong>sertion sequence) elements<br />

tend to degenerate ra<strong>the</strong>r than be deleted [24]. This<br />

may also apply to <strong>CRISPR</strong>/Cas and Cmr modules<br />

and expla<strong>in</strong> <strong>the</strong> ma<strong>in</strong>tenance of defective <strong>CRISPR</strong> <strong>system</strong>s<br />

over long periods. However, <strong>in</strong> a variant stra<strong>in</strong> of S. solfataricus<br />

P2 (P2A), four physically l<strong>in</strong>ked <strong>CRISPR</strong>/Cas modules (A–<br />

D) were apparently lost via a s<strong>in</strong>gle recomb<strong>in</strong>ation event<br />

between border<strong>in</strong>g IS elements [25].<br />

Transcription of <strong>CRISPR</strong> loci and process<strong>in</strong>g<br />

Processed <strong>CRISPR</strong> transcripts were first observed for<br />

<strong>the</strong> euryarchaeon Archaeoglobus fulgidus and crenarchaeon<br />

S. solfataricus, and <strong>the</strong>se studies revealed <strong>the</strong> regular pattern<br />

of <strong>the</strong> RNA process<strong>in</strong>g, us<strong>in</strong>g probes specific for repeat<br />

sequences [26,27]. Subsequently, <strong>the</strong> smallest Sulfolobus<br />

RNA product of approx. 40 bp was identified cover<strong>in</strong>g<br />

primarily a s<strong>in</strong>gle spacer sequence [20]. S. acidocaldarius<br />

<strong>CRISPR</strong> loci are transcribed upstream from <strong>the</strong> first repeat<br />

with<strong>in</strong> <strong>the</strong> leader region and term<strong>in</strong>ation occurs downstream<br />

from <strong>the</strong> f<strong>in</strong>al repeat. Even for <strong>the</strong> locus carry<strong>in</strong>g 78 spacer-repeat<br />

units (4930 bp), a substantial proportion of transcripts<br />

were approx. 5000 nt long with ano<strong>the</strong>r large portion <strong>in</strong> <strong>the</strong><br />

size range 3000–3500 nt [16].<br />

This raised an important question as to how transcription<br />

cont<strong>in</strong>ues throughout <strong>CRISPR</strong> loci apparently unimpeded by<br />

<strong>the</strong> presence of spacers carry<strong>in</strong>g archaea-specific promoter or<br />

term<strong>in</strong>ator motifs, given that <strong>the</strong> DNA uptake mechanism<br />

is essentially statistically random [15]. A compilation of<br />

potential promoter and term<strong>in</strong>ator motifs on <strong>the</strong> leader<br />

(crRNA) strand of <strong>the</strong> available Sulfolobus genomes revealed,<br />

for a total of 4505 spacers, 2560 carry<strong>in</strong>g archaeal-type<br />

hexameric TATA boxes (at least six consecutive A and Ts<br />

with at least two As) and 725 with T-rich pyrimid<strong>in</strong>e motifs<br />

(at least six consecutive T and Cs with at least five Ts) [28,29].<br />

Although many of <strong>the</strong>se may at best be weakly effective,<br />

never<strong>the</strong>less, given <strong>the</strong> high gene density <strong>in</strong> <strong>the</strong> Sulfolobus<br />

viral and plasmid genomes and <strong>the</strong> low frequency of operon<br />

structures, <strong>the</strong> probability of tak<strong>in</strong>g up such active motifs is<br />

significant. The conclusion that transcripts do not normally<br />

start with<strong>in</strong> <strong>CRISPR</strong> loci is also supported by exam<strong>in</strong>ation<br />

of <strong>CRISPR</strong> transcripts from S. solfataricus P2 transcriptome<br />

data [21], which <strong>in</strong>dicate that most of <strong>the</strong> detectable 5′-ends<br />

are attributable to process<strong>in</strong>g sites with<strong>in</strong> repeats [21]. A<br />

possible explanation for <strong>the</strong> unimpeded transcription through<br />

<strong>the</strong> <strong>CRISPR</strong> loci could be <strong>the</strong> presence of <strong>the</strong> <strong>CRISPR</strong>-b<strong>in</strong>d<strong>in</strong>g<br />

prote<strong>in</strong> of Sulfolobus and o<strong>the</strong>r crenarchaea [30]; it<br />

could act as a transcription factor <strong>in</strong>hibit<strong>in</strong>g transcriptional<br />

starts and stops with<strong>in</strong> <strong>the</strong> spacer sequences and repeats.<br />
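The motif definitions quoted above are simple enough to restate as a scan. The following Python sketch is illustrative only and is not the procedure used in [28,29]: it assumes that each motif is assessed in a 6-nt window on the crRNA strand, and the function names (has_window, has_tata_like, has_t_rich) are hypothetical.<br />

```python
def has_window(seq, alphabet, required, min_count, width=6):
    """Return True if any `width`-nt window of `seq` uses only bases in
    `alphabet` and contains at least `min_count` copies of `required`.

    Assumption: the motif is judged per 6-nt window, not per full run.
    """
    seq = seq.upper()
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) <= alphabet and window.count(required) >= min_count:
            return True
    return False


def has_tata_like(spacer):
    # At least six consecutive A/T bases with at least two As.
    return has_window(spacer, {"A", "T"}, "A", 2)


def has_t_rich(spacer):
    # At least six consecutive T/C bases with at least five Ts.
    return has_window(spacer, {"T", "C"}, "T", 5)
```

Applied to every spacer in a set of loci, the two predicates give counts comparable in kind to the 2560 and 725 reported above, although the exact numbers depend on how the windows are defined.<br />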

Full-length transcripts are also produced from <strong>the</strong> opposite<br />

DNA strand of <strong>CRISPR</strong> loci of S. acidocaldarius which yield<br />

discrete 50–60 bp fragments carry<strong>in</strong>g spacer sequences, albeit<br />

at lower molar levels than for <strong>the</strong> crRNAs [16], and antisense<br />

RNA transcripts were also detected for <strong>CRISPR</strong> loci of<br />

S. solfataricus P2 [21]. Failure to detect similar transcripts<br />

<strong>in</strong> <strong>the</strong> euryarchaeon Pyrococcus and bacterium E. coli [12,19]<br />

suggests that this may be a specific property of Sulfolobus<br />

or crenarchaea. Analyses of cDNA libraries of S. solfataricus<br />

demonstrated previously that antisense RNAs are commonly<br />

produced especially aga<strong>in</strong>st transposase mRNAs [27], and<br />

several o<strong>the</strong>r antisense RNAs have been detected for this<br />

organism [21]. Given that mature crRNAs are produced <strong>in</strong> <strong>the</strong><br />

absence of <strong>in</strong>fect<strong>in</strong>g genetic elements <strong>in</strong> different Sulfolobus<br />

species [16,20,23], one possible explanation is that <strong>the</strong>se<br />

antisense RNAs protect at least a fraction of <strong>the</strong> crRNAs<br />

aga<strong>in</strong>st degradation before <strong>the</strong>ir activation.<br />

Maturation of crRNAs and str<strong>in</strong>gency of<br />

target<strong>in</strong>g mechanisms<br />

Details of <strong>the</strong> RNA-process<strong>in</strong>g mechanism have been elucidated<br />

for a euryarchaeal <strong>CRISPR</strong>/Cmr <strong>system</strong> and an E. coli<br />

<strong>CRISPR</strong>/Cas <strong>system</strong> where Cas6 homologues cut <strong>in</strong> <strong>the</strong><br />

repeat, 8 nt 5′ from <strong>the</strong> start of <strong>the</strong> spacer sequence, whereas<br />

<strong>the</strong> 3′-process<strong>in</strong>g sites differ [12,18]. For S. solfataricus, many<br />

5′-ends, and putative process<strong>in</strong>g sites, are detectable 6–8 nt<br />

from <strong>the</strong> spacer start [21], suggest<strong>in</strong>g that a similar mechanism


operates. Process<strong>in</strong>g at <strong>the</strong> 3′-end of <strong>the</strong> crRNA is less clearly<br />

def<strong>in</strong>ed, but for <strong>the</strong> <strong>CRISPR</strong>/Cmr <strong>system</strong> of Pyrococcus, a<br />

14 nt ruler mechanism enables <strong>the</strong> process<strong>in</strong>g ribonuclease to<br />

generate dual cuts at 5 and 11 nt <strong>in</strong>to <strong>the</strong> spacer sequence<br />

[31]. Presumably, crRNA-b<strong>in</strong>d<strong>in</strong>g Cas and Cmr prote<strong>in</strong>s<br />

dist<strong>in</strong>guish between <strong>the</strong> different crRNA products before<br />

target<strong>in</strong>g <strong>the</strong> foreign DNA or RNA respectively.<br />
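As a rough illustration of the 5′ cut described above, the following Python sketch cuts a primary transcript 8 nt upstream of each spacer. It is a simplified model under stated assumptions: the transcript is taken to be repeat-spacer-...-repeat with no leader, the repeat and spacer sequences supplied by the caller are arbitrary, the function name process_transcript is hypothetical, and the 3′ trimming, which differs between systems, is not modelled.<br />

```python
def process_transcript(repeat, spacers, tag_len=8):
    """Cut a repeat-spacer array transcript inside each repeat,
    `tag_len` nt upstream (5') of the following spacer.

    Returns the intermediate products, each consisting of a
    `tag_len`-nt repeat-derived 5' tag, one spacer, and the remaining
    part of the downstream repeat. No 3' trimming is applied.
    """
    if tag_len > len(repeat):
        raise ValueError("tag_len cannot exceed the repeat length")

    # Primary transcript: repeat, then (spacer + repeat) for each spacer.
    transcript = repeat + "".join(s + repeat for s in spacers)

    cuts = []
    spacer_start = len(repeat)
    for s in spacers:
        cuts.append(spacer_start - tag_len)
        spacer_start += len(s) + len(repeat)
    # The final repeat is cut at the equivalent position.
    cuts.append(len(transcript) - tag_len)

    return [transcript[a:b] for a, b in zip(cuts, cuts[1:])]
```

Each product therefore carries the last 8 nt of the upstream repeat at its 5′-end, which is the feature shared by the systems cited above.<br />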

Until recently, attention focused on target<strong>in</strong>g of double-stranded<br />

DNA elements, but probably s<strong>in</strong>gle-stranded DNA<br />

will also be targeted by <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong>. It rema<strong>in</strong>s<br />

an open question whe<strong>the</strong>r <strong>the</strong> <strong>CRISPR</strong>/Cmr <strong>system</strong> targets<br />

both mRNA and viral RNA, and <strong>in</strong>corporation of viral RNA<br />

<strong>in</strong>to <strong>CRISPR</strong> loci would require reverse transcriptase activity.<br />

Never<strong>the</strong>less, all evidence suggests that <strong>the</strong> primary targets<br />

of <strong>the</strong> Sulfolobus <strong>immune</strong> <strong>system</strong>s are viruses and plasmids<br />

and, probably, <strong>the</strong>ir mRNAs. There is no support for a<br />

general target<strong>in</strong>g of transposable elements. Spacers match<strong>in</strong>g<br />

transposase genes are occasionally found <strong>in</strong> <strong>CRISPR</strong> loci<br />

[16,20,32], but <strong>the</strong>y can generally be attributed to transposase<br />

genes present <strong>in</strong> viruses or plasmids, <strong>in</strong> particular orphan orfB<br />

elements (family IS605/200) for Sulfolobus [2,15].<br />

Effective target<strong>in</strong>g of genetic elements requires that <strong>the</strong><br />

mature crRNA anneals to <strong>the</strong> protospacer DNA region.<br />

Although, for <strong>the</strong> bacterium S. <strong>the</strong>rmophilus, a perfect<br />

sequence match was required to elicit a response from <strong>the</strong><br />

<strong>CRISPR</strong>/Cas <strong>system</strong> [9], studies on different Sulfolobus stra<strong>in</strong>s<br />

have shown that a less str<strong>in</strong>gent recognition <strong>system</strong> prevails.<br />

Challeng<strong>in</strong>g Sulfolobus cells with viral genes carry<strong>in</strong>g one<br />

to three mismatches still produced a strong response from<br />

<strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong> [23]. Ano<strong>the</strong>r important factor is<br />

<strong>the</strong> motif known as PAM. Targeted genetic elements carry<br />

this short sequence motif which creates a mismatch with<br />

<strong>the</strong> 5′-end of <strong>the</strong> crRNA [16,33,34]. For Sulfolobus, this was<br />

def<strong>in</strong>ed as a family-specific d<strong>in</strong>ucleotide, displaced 1 nt from<br />

<strong>the</strong> spacer sequence [15,16]. Potentially, this can be <strong>in</strong>volved<br />

<strong>in</strong> both selection of protospacers for excision by Cas prote<strong>in</strong>s<br />

and crRNA target<strong>in</strong>g. Whereas a study of <strong>the</strong> bacterium<br />

S. epidermidis concluded that <strong>the</strong> PAM was not important for<br />

protospacer target<strong>in</strong>g and that any mismatched base pair<strong>in</strong>g<br />

would suffice [11], for S. islandicus stra<strong>in</strong> REY15A, alter<strong>in</strong>g<br />

<strong>the</strong> PAM led to a loss of crRNA target<strong>in</strong>g [23].<br />
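The two requirements discussed above, a tolerant spacer match and an intact PAM, can be combined in a small search. The Python sketch below is a toy model and not the assay used in [23]: the threshold of three mismatches is taken from the range quoted above, the placement of the PAM on the 5′ side of the protospacer as written is an assumption, and the function names and any PAM dinucleotide supplied by the caller are hypothetical.<br />

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_targets(target, spacer, pam, max_mismatches=3, gap=1):
    """Return start positions in `target` of candidate protospacers.

    A position qualifies if the protospacer differs from `spacer` at no
    more than `max_mismatches` positions and `pam` lies upstream of it,
    separated from it by `gap` nt (assumed orientation).
    """
    hits = []
    offset = len(pam) + gap
    for i in range(offset, len(target) - len(spacer) + 1):
        if target[i - offset:i - gap] != pam:
            continue  # PAM absent or altered: no targeting
        if hamming(target[i:i + len(spacer)], spacer) <= max_mismatches:
            hits.append(i)
    return hits
```

In this model a protospacer with one to three mismatches is still reported, whereas changing the PAM removes the hit, mirroring the qualitative behaviour described for Sulfolobus.<br />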

Anti-immune <strong>system</strong>s<br />

Although a few archaeal viruses have been shown to be<br />

lytic and to elicit strong <strong>immune</strong> responses, many Sulfolobus<br />

viruses and plasmids coexist <strong>in</strong> a stable relationship, at low<br />

copy numbers, over longer periods. Although <strong>the</strong>se genetic<br />

elements do not appear to be targeted by <strong>the</strong> host <strong>CRISPR</strong><br />

<strong>system</strong>s, <strong>the</strong> latter could never<strong>the</strong>less have a regulatory role<br />

possibly by target<strong>in</strong>g mRNAs.<br />

Ano<strong>the</strong>r special feature of archaeal genetic elements is<br />

that <strong>the</strong>y often carry an <strong>in</strong>tegrase gene which partitions<br />

on chromosomal <strong>in</strong>tegration. Consequently, <strong>the</strong> <strong>in</strong>tegrated<br />

element can only be excised when <strong>the</strong> free element is<br />

present to generate an <strong>in</strong>tact <strong>in</strong>tegrase/excision enzyme [35].<br />

Thus target<strong>in</strong>g and degradation of <strong>the</strong> free genetic element<br />

by <strong>the</strong> host <strong>CRISPR</strong>/Cas <strong>system</strong> could actually favour<br />

entrapment of <strong>the</strong> <strong>in</strong>tegrated element, and such a process<br />

could enhance viral and plasmid evolution <strong>in</strong> archaea. The<br />

Redder Model [36] for archaeal viral evolution hypo<strong>the</strong>sized<br />

that, s<strong>in</strong>ce more than one type of fusellovirus can <strong>in</strong>tegrate<br />

at a given att site with<strong>in</strong> a tRNA gene, <strong>the</strong> encaptured<br />

concatenated viruses would tend to recomb<strong>in</strong>e <strong>the</strong>reby<br />

generat<strong>in</strong>g, and subsequently releas<strong>in</strong>g, hybrid fuselloviruses<br />

[36]. A similar process may occur for Sulfolobus-specific<br />

conjugative plasmids. They are also <strong>in</strong>tegrative, and <strong>the</strong>ir<br />

DNA is regularly <strong>in</strong>corporated <strong>in</strong>to <strong>CRISPR</strong> loci as<br />

spacers [16,20]. Moreover, this could expla<strong>in</strong> why some of<br />

<strong>the</strong> different Icelandic conjugative plasmids cultivated <strong>in</strong><br />

Wolfram Zillig’s laboratory [37] often carry large regions of<br />

almost identical nucleotide sequence [6,7]. Thus, <strong>in</strong>directly,<br />

<strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong>s could be fuell<strong>in</strong>g production of<br />

new viral and plasmid variants which <strong>the</strong>y may subsequently<br />

be required to <strong>in</strong>activate.<br />

Some <strong>in</strong>sights <strong>in</strong>to how genetic elements underm<strong>in</strong>e or<br />

avoid <strong>the</strong> <strong>CRISPR</strong> <strong>immune</strong> <strong>system</strong>s were ga<strong>in</strong>ed by pass<strong>in</strong>g<br />

<strong>the</strong> rudivirus SIRV1 (Sulfolobus islandicus rod-shaped virus<br />

1) through a series of closely related S. islandicus stra<strong>in</strong>s.<br />

This generated many sequence changes <strong>in</strong> <strong>the</strong> viral genes,<br />

but strik<strong>in</strong>g was <strong>the</strong> frequent occurrence of genes that were<br />

altered by 12 bp <strong>in</strong>dels, probably deletions [38]. When similar<br />

12 bp <strong>in</strong>dels were observed among related lipothrixviruses,<br />

it was <strong>in</strong>ferred that <strong>the</strong>se might occur at crRNA-target<strong>in</strong>g<br />

protospacers on <strong>the</strong> viral genomes [39]. In ano<strong>the</strong>r study of a<br />

hyper<strong>the</strong>rmophilic archaeal virus, HAV1 (hyper<strong>the</strong>rmophilic<br />

archaeal virus 1), cultured <strong>in</strong> a bioreactor over a 2-year period,<br />

samples taken at different times showed genome sequence<br />

changes, not unlike those observed earlier for SIRV1, but also<br />

a series of recomb<strong>in</strong>ation sites were detected along <strong>the</strong> l<strong>in</strong>ear<br />

genome at which frequent rearrangements had occurred to<br />

generate viral variants with altered sequences [40].<br />

Although accumulat<strong>in</strong>g specific sequence changes <strong>in</strong><br />

genetic elements is an effective way of avoid<strong>in</strong>g, at least<br />

temporarily, crRNA target<strong>in</strong>g, more direct methods must<br />

also have evolved. Thus, for <strong>the</strong> S. islandicus stra<strong>in</strong> M.16.4,<br />

an M164 provirus 1 has <strong>in</strong>serted <strong>in</strong>to, and disrupted, <strong>the</strong><br />

csa3 gene considered to encode <strong>the</strong> transcriptional regulator<br />

of <strong>the</strong> group 1 cas genes (Figure 1A) associated with new<br />

spacer uptake [17]. This has <strong>the</strong> advantage for <strong>the</strong> virus that<br />

o<strong>the</strong>r <strong>in</strong>fect<strong>in</strong>g viruses will still be attacked by crRNAs if<br />

match<strong>in</strong>g spacers are already present <strong>in</strong> <strong>the</strong> <strong>CRISPR</strong> locus, but<br />

new spacers cannot be generated from M164 provirus itself.<br />

O<strong>the</strong>r possible mechanisms were discerned from a study<br />

<strong>in</strong> which <strong>CRISPR</strong> <strong>system</strong>s of Sulfolobus were challenged<br />

directly by vectors carry<strong>in</strong>g viral genes or protospacers<br />

show<strong>in</strong>g various degrees of match<strong>in</strong>g to host <strong>CRISPR</strong> spacers<br />

which mimicked, to a degree, <strong>the</strong> cont<strong>in</strong>ual <strong>in</strong>fection of a host<br />

cell with a given virus [23]. In many viable transformants,<br />

<strong>CRISPR</strong> locus deletions, <strong>in</strong>clud<strong>in</strong>g <strong>the</strong> match<strong>in</strong>g spacer, had<br />

occurred, whereas <strong>in</strong> o<strong>the</strong>rs, whole <strong>CRISPR</strong>/Cas cassettes<br />

were lost. However, several transformants revealed no<br />

changes <strong>in</strong> ei<strong>the</strong>r <strong>CRISPR</strong>/Cas modules or vector constructs,<br />

suggest<strong>in</strong>g that o<strong>the</strong>r unknown regulatory mechanisms can<br />

<strong>in</strong>activate <strong>the</strong> <strong>immune</strong> <strong>system</strong> [23].<br />

<strong>CRISPR</strong>/Cas and Cmr module mobility<br />

Sulfolobus <strong>CRISPR</strong>/Cas and Cmr modules generally occur<br />

with<strong>in</strong> variable chromosomal regions where extensive gene<br />

shuffl<strong>in</strong>g has occurred [2,41], often attributable to high levels<br />

of transposable elements. Recomb<strong>in</strong>ation at border<strong>in</strong>g IS<br />

elements can also lead to loss of <strong>CRISPR</strong>/Cas or Cmr<br />

modules [25]. There is also strong evidence <strong>in</strong> support of<br />

<strong>the</strong> transfer of whole modules between organisms based on<br />

comparative studies of <strong>CRISPR</strong>/Cas module families and<br />

<strong>the</strong>ir locations, although <strong>the</strong> transfer mechanisms rema<strong>in</strong><br />

unclear [2]. For bacteria, evidence was provided for transfer<br />

of <strong>the</strong>se modules on large plasmids [42], but many archaeal<br />

<strong>CRISPR</strong>/Cas modules are large, up to 25 kb, and <strong>the</strong><br />

largest conjugative plasmids are only approx. 40 kb [6].<br />

Chromosomal conjugation may provide a vehicle, possibly<br />

facilitated by encaptured Sulfolobus conjugative plasmids<br />

[43,44] or presently unknown mechanisms may operate,<br />

possibly with<strong>in</strong> biofilms. F<strong>in</strong>ally, although phylogenetic<br />

analyses support <strong>the</strong> transfer of <strong>CRISPR</strong>/Cas and Cmr<br />

modules between archaea and bacteria, <strong>the</strong> basic differences<br />

<strong>in</strong> archaeal and bacterial transcriptional and translational<br />

mechanisms and <strong>in</strong> <strong>the</strong> unique cell wall, membrane structures<br />

and conjugative <strong>system</strong> of archaea provide formidable<br />

barriers to transfer between doma<strong>in</strong>s [2].<br />

Fund<strong>in</strong>g<br />

Research was supported by grants from <strong>the</strong> Danish Natural Science<br />

Research Council [grant number 272-08-0391], <strong>the</strong> Danish Research<br />

Council for Technology and Production [grant number 274-07-0116]<br />

and <strong>the</strong> Danish National Research Foundation.<br />

References<br />

1 Karg<strong>in</strong>ov, F.V. and Hannon, G.J. (2010) The <strong>CRISPR</strong> <strong>system</strong>: small<br />

RNA-guided defense <strong>in</strong> bacteria and archaea. Mol. Cell 37, 7–19<br />

2 Shah, S.A. and Garrett, R.A. (2010) <strong>CRISPR</strong>/Cas and Cmr modules,<br />

mobility and evolution of an adaptive <strong>immune</strong> <strong>system</strong>. Res. Microbiol.,<br />

doi:10.1016/j.resmic.2010.09.001<br />

3 Zillig, W., Arnold, H.P., Holz, I., Prangishvili, D., Schweier, A., Stedman, K.,<br />

She, Q., Phan, H., Garrett, R. and Kristjansson, J.K. (1998) Genetic<br />

elements <strong>in</strong> <strong>the</strong> extremely <strong>the</strong>rmophilic archaeon Sulfolobus.<br />

Extremophiles 2, 131–140<br />

4 Prangishvili, D., Forterre, P. and Garrett, R.A. (2006) Viruses of <strong>the</strong><br />

<strong>Archaea</strong>: a unify<strong>in</strong>g view. Nat. Rev. Microbiol. 11, 837–848<br />

5 Lawrence, C.M., Menon, S., Eilers, B.J., Bothner, B., Khayat, R., Douglas, T.<br />

and Young, M.J. (2009) Structural and functional studies of archaeal<br />

viruses. J. Biol. Chem. 284, 12599–12603<br />

6 Greve, B., Jensen, S., Brügger, K., Zillig, W. and Garrett, R.A. (2004)<br />

Genomic comparison of archaeal conjugative plasmids from Sulfolobus.<br />

<strong>Archaea</strong> 1, 231–23<br />

7 Erauso, G., Stedman, K.M., van de Werken, H.J.G., Zillig, W. and van der<br />

Oost, J. (2006) Two novel conjugative plasmids from a s<strong>in</strong>gle stra<strong>in</strong> of<br />

Sulfolobus. Microbiology 152, 1951–1968<br />

8 Barrangou, R., Fremaux, C., Deveau, H., Richards, M., Boyaval, P.,<br />

Mo<strong>in</strong>eau, S., Romero, D.A. and Horvath, P. (2007) <strong>CRISPR</strong> provides<br />

acquired resistance aga<strong>in</strong>st viruses <strong>in</strong> prokaryotes. Science 315,<br />

1709–1712<br />

9 Horvath, P., Romero, D.A., Coûté-Monvois<strong>in</strong>, A.-C., Richards, M.,<br />

Deveau, H., Mo<strong>in</strong>eau, S., Boyaval, P., Fremaux, C. and Barrangou, R.<br />

(2008) Diversity, activity, and evolution of <strong>CRISPR</strong> loci <strong>in</strong> Streptococcus<br />

<strong>the</strong>rmophilus. J. Bacteriol. 190, 1401–1412<br />

10 Marraff<strong>in</strong>i, L.A. and Son<strong>the</strong>imer, E.J. (2008) <strong>CRISPR</strong> <strong>in</strong>terference limits<br />

horizontal gene transfer <strong>in</strong> staphylococci by target<strong>in</strong>g DNA. Science 322,<br />

1843–1845<br />

11 Marraff<strong>in</strong>i, L.A. and Son<strong>the</strong>imer, E.J. (2010) Self versus non-self<br />

discrim<strong>in</strong>ation dur<strong>in</strong>g <strong>CRISPR</strong> RNA-directed immunity. Nature 463,<br />

568–571<br />

12 Brouns, S.J., Jore, M.M., Lundgren, M., Westra, E.R., Slijkhuis, R.J.,<br />

Snijders, A.P., Dickman, M.J., Makarova, K.S., Koon<strong>in</strong>, E.V. and van der<br />

Oost, J. (2008) Small <strong>CRISPR</strong> RNAs guide antiviral defense <strong>in</strong> prokaryotes.<br />

Science 321, 960–964<br />

13 Haft, D.H., Selengut, J., Mongod<strong>in</strong>, E.F. and Nelson, K.E. (2005) A guild of<br />

45 <strong>CRISPR</strong>-associated (Cas) prote<strong>in</strong> families and multiple <strong>CRISPR</strong>/Cas<br />

subtypes exist <strong>in</strong> prokaryotic genomes. PLoS Comput. Biol. 1, 474–483<br />

14 Makarova, K.S., Grish<strong>in</strong>, N.V., Shabal<strong>in</strong>a, S.A., Wolf, Y.I. and Koon<strong>in</strong>, E.V.<br />

(2006) A putative RNA-<strong>in</strong>terference-based <strong>immune</strong> <strong>system</strong> <strong>in</strong><br />

prokaryotes: computational analysis of <strong>the</strong> predicted enzymatic<br />

mach<strong>in</strong>ery, functional analogies with eukaryotic RNAi, and hypo<strong>the</strong>tical<br />

mechanisms of action. Biol. Direct 1, 7<br />

15 Shah, S.A., Hansen, N.R. and Garrett, R.A. (2009) Distributions of <strong>CRISPR</strong><br />

spacer matches <strong>in</strong> viruses and plasmids of crenarchaeal<br />

acido<strong>the</strong>rmophiles and implications for <strong>the</strong>ir <strong>in</strong>hibitory mechanism.<br />

Biochem. Soc. Trans. 37, 23–28<br />

16 Lillestøl, R.K., Shah, S.A., Brügger, K., Redder, P., Phan, H., Christiansen, J.<br />

and Garrett, R.A. (2009) <strong>CRISPR</strong> families of <strong>the</strong> crenarchaeal genus<br />

Sulfolobus: bidirectional transcription and dynamic properties. Mol.<br />

Microbiol. 72, 259–272<br />

17 Shah, S.A., Vestergaard, G. and Garrett, R.A. (2011) <strong>CRISPR</strong>/Cas and<br />

<strong>CRISPR</strong>/Cmr <strong>immune</strong> <strong>system</strong>s of archaea. In Regulatory RNAs <strong>in</strong><br />

Prokaryotes (Marchfelder, A. and Hess, W., eds), Spr<strong>in</strong>ger, Berl<strong>in</strong>,<br />

<strong>in</strong> <strong>the</strong> press<br />

18 Carte, J., Wang, R., Li, H., Terns, R.M. and Terns, M.P. (2008) Cas6 is an<br />

endoribonuclease that generates guide RNAs for <strong>in</strong>vader defense <strong>in</strong><br />

prokaryotes. Genes Dev. 22, 3489–3496<br />

19 Hale, C., Kleppe, K., Terns, R.M. and Terns, M.P. (2008) Prokaryotic<br />

silenc<strong>in</strong>g (psi)RNAs <strong>in</strong> Pyrococcus furiosus. RNA 14, 1–8<br />

20 Lillestøl, R.K., Redder, P., Garrett, R.A. and Brügger, K. (2006) A putative<br />

viral defence mechanism <strong>in</strong> archaeal cells. <strong>Archaea</strong> 2, 59–72<br />

21 Wurtzel, O., Sapra, R., Chen, F., Zhu, Z.Y., Simmons, B.A. and Sorek, R.<br />

(2010) A s<strong>in</strong>gle-base resolution map of an archaeal transcriptome.<br />

Genome Res. 20, 133–141<br />

22 Reno, M.L., Held, N.L., Fields, C.J., Burke, P.V. and Whitaker, R.J. (2009)<br />

Biogeography of <strong>the</strong> Sulfolobus islandicus pan-genome. Proc. Natl. Acad.<br />

Sci. U.S.A. 106, 8605–8610<br />

23 Gudbergsdottir, S., Deng, L., Chen, Z., Jensen, J.V.K., Jensen, L.R., She, Q.<br />

and Garrett, R.A. (2011) Dynamic properties of <strong>the</strong> Sulfolobus<br />

<strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr <strong>system</strong>s when challenged with<br />

vector-borne viral and plasmid genes and protospacers. Mol. Microbiol.<br />

79, 35–49<br />

24 Blount, Z.D. and Grogan, D.W. (2005) New <strong>in</strong>sertion sequences of<br />

Sulfolobus: functional properties and implications for genome evolution<br />

<strong>in</strong> hyper<strong>the</strong>rmophilic archaea. Mol. Microbiol. 55, 312–325<br />

25 Redder, P. and Garrett, R.A. (2006) Mutations and rearrangements <strong>in</strong> <strong>the</strong><br />

genome of Sulfolobus solfataricus P2. J. Bacteriol. 188, 4198–4206<br />

26 Tang, T.-H., Bachellerie, J.-P., Rozhdestvensky, T., Bortol<strong>in</strong>, M.-L.,<br />

Huber, H., Drungowski, M., Elge, T., Brosius, J. and Hüttenhofer, A. (2002)<br />

Identification of 86 candidates for small non-messenger RNAs from <strong>the</strong><br />

archaeon Archaeoglobus fulgidus. Proc. Natl. Acad. Sci. U.S.A. 99,<br />

7536–7541<br />

27 Tang, T.-H., Polacek, N., Zywicki, M., Huber, H., Brügger, K., Garrett, R.A.,<br />

Bachellerie, J.P. and Hüttenhofer, A. (2005) Identification of novel<br />

non-cod<strong>in</strong>g RNAs as potential antisense regulators <strong>in</strong> <strong>the</strong> archaeon<br />

Sulfolobus solfataricus. Mol. Microbiol. 55, 469–481


28 Torar<strong>in</strong>sson, E., Klenk, H.P. and Garrett, R.A. (2005) Divergent<br />

transcriptional and translational signals <strong>in</strong> <strong>Archaea</strong>. Environ. Microbiol. 7,<br />

47–54<br />

29 Santangelo, T.J., Cubonová, L., Sk<strong>in</strong>ner, K.M. and Reeve, J.N. (2009)<br />

<strong>Archaea</strong>l <strong>in</strong>tr<strong>in</strong>sic transcription term<strong>in</strong>ation <strong>in</strong> vivo. J. Bacteriol. 191,<br />

7102–7108<br />

30 Peng, X., Brügger, K., Shen, B., Chen, L., She, Q. and Garrett, R.A. (2003)<br />

Genus-specific prote<strong>in</strong> b<strong>in</strong>d<strong>in</strong>g to <strong>the</strong> large clusters of DNA repeats (short<br />

regularly spaced repeats) present <strong>in</strong> Sulfolobus genomes. J. Bacteriol.<br />

185, 2410–2417<br />

31 Hale, C.R., Zhao, P., Olson, S., Duff, M.O., Graveley, B.R., Wells, L., Terns,<br />

R.M. and Terns, M.P. (2009) RNA-guided RNA cleavage by a <strong>CRISPR</strong><br />

RNA–Cas prote<strong>in</strong> complex. Cell 139, 945–956<br />

32 Held, N.L. and Whitaker, R.J. (2009) Viral biogeography revealed by<br />

signatures <strong>in</strong> Sulfolobus islandicus genomes. Environ. Microbiol. 11,<br />

457–466<br />

33 Deveau, H., Barrangou, R., Garneau, J.E., Labonté, J., Fremaux, C.,<br />

Boyaval, P., Romero, D.A., Horvath, P. and Mo<strong>in</strong>eau, S. (2008) Phage<br />

response to <strong>CRISPR</strong>-encoded resistance <strong>in</strong> Streptococcus <strong>the</strong>rmophilus.<br />

J. Bacteriol. 190, 1390–1400<br />

34 Mojica, F.J., Diez-Villasenor, C., Garcia-Mart<strong>in</strong>ez, J. and Almendros, C.<br />

(2009) Short motif sequences determ<strong>in</strong>e <strong>the</strong> targets of <strong>the</strong> prokaryotic<br />

<strong>CRISPR</strong> <strong>system</strong>. Microbiology 155, 733–740<br />

35 She, Q., Peng, X., Zillig, W. and Garrett, R.A. (2001) Gene capture events<br />

<strong>in</strong> archaeal chromosomes. Nature 409, 478<br />

36 Redder, P., Peng, X., Brügger, K., Shah, S.A., Roesch, F., Greve, B.,<br />

She, Q., Schleper, C., Forterre, P., Garrett, R.A. and Prangishvili, D. (2009)<br />

Four newly isolated fuselloviruses from extreme geo<strong>the</strong>rmal<br />

environments reveal unusual morphologies and a possible <strong>in</strong>terviral<br />

recomb<strong>in</strong>ation mechanism. Environ. Microbiol. 11, 2849–2862<br />

37 Prangishvili, D., Albers, S.V., Holz, I., Arnold, H.P., Stedman, K., Kle<strong>in</strong>, T.,<br />

S<strong>in</strong>gh, H., Hiort, J., Schweier, A., Kristjansson, J.K. and Zillig, W. (1998)<br />

Conjugation <strong>in</strong> archaea: frequent occurrence of conjugative plasmids <strong>in</strong><br />

Sulfolobus. Plasmid 40, 190–202<br />

38 Peng, X., Kessler, A., Phan, H., Garrett, R.A. and Prangishvili, D. (2004)<br />

Multiple variants of <strong>the</strong> archaeal DNA rudivirus SIRV1 <strong>in</strong> a s<strong>in</strong>gle host<br />

and a novel mechanism of genomic variation. Mol. Microbiol. 54,<br />

366–375<br />

39 Vestergaard, G., Shah, S.A., Bize, A., Reitberger, W., Reuter, M., Phan, H.,<br />

Briegel, A., Rachel, R., Garrett, R.A. and Prangishvili, D. (2008) SRV, a<br />

new rudiviral isolate from Stygiolobus and <strong>the</strong> <strong>in</strong>terplay of crenarchaeal<br />

rudiviruses with <strong>the</strong> host viral-defence <strong>CRISPR</strong> <strong>system</strong>. J. Bacteriol. 190,<br />

6837–6845<br />

40 Garrett, R.A., Prangishvili, D., Shah, S.A., Reuter, M., Stetter, K. and Peng,<br />

X. (2010) Metagenomic analyses of novel viruses, plasmids, and <strong>the</strong>ir<br />

variants, from an environmental sample of hyper<strong>the</strong>rmophilic<br />

neutrophiles cultured <strong>in</strong> a bioreactor. Environ. Microbiol. 12, 2918–2930<br />

41 Brügger, K., Torar<strong>in</strong>sson, E., Chen, L. and Garrett, R.A. (2004) Shuffl<strong>in</strong>g of<br />

Sulfolobus genomes by autonomous and non-autonomous mobile<br />

elements. Biochem. Soc. Trans. 32, 179–183<br />

42 Godde, J.S. and Bickerton, A. (2006) The repetitive DNA elements called<br />

<strong>CRISPR</strong>s and <strong>the</strong>ir associated genes: evidence of horizontal transfer<br />

among prokaryotes. J. Mol. Evol. 62, 718–729<br />

43 Aagaard, C., Dalgaard, J. and Garrett, R.A. (1995) Inter-cellular mobility<br />

and hom<strong>in</strong>g of an archaeal rDNA <strong>in</strong>tron confers selective advantage over<br />

<strong>in</strong>tron-cells of Sulfolobus acidocaldarius. Proc. Natl. Acad. Sci. U.S.A. 92,<br />

12285–12289<br />

44 Grogan, D.W. (1996) Exchange of genetic markers at extremely high<br />

temperatures <strong>in</strong> <strong>the</strong> archaeon Sulfolobus acidocaldarius. J. Bacteriol.<br />

178, 3207–3211<br />

45 Bailey, T.L., Williams, N., Misleh, C. and Li, W.W. (2006) MEME:<br />

discover<strong>in</strong>g and analyz<strong>in</strong>g DNA and prote<strong>in</strong> sequence motifs. Nucleic<br />

Acids Res. 34, 369–373<br />

Received 30 September 2010<br />

doi:10.1042/BST0390051<br />

© The Authors Journal compilation © 2011 Biochemical Society


Extremophiles (2011) 15:487–497<br />

DOI 10.1007/s00792-011-0379-y<br />

ORIGINAL PAPER<br />

Genomic analysis of Acidianus hospitalis W1 a host for study<strong>in</strong>g<br />

crenarchaeal virus and plasmid life cycles<br />

Xiao-Yan You • Chao Liu • Sheng-Yue Wang • Cheng-Y<strong>in</strong>g Jiang • Shiraz A. Shah •<br />

David Prangishvili • Qunx<strong>in</strong> She • Shuang-Jiang Liu • Roger A. Garrett<br />

Received: 4 March 2011 / Accepted: 26 April 2011 / Published onl<strong>in</strong>e: 24 May 2011<br />

© The Author(s) 2011. This article is published with open access at Springerlink.com<br />

Communicated by L. Huang.<br />

X.-Y. You and C. Liu contributed equally to this work.<br />

X.-Y. You • C.-Y. Jiang • S.-J. Liu (✉)<br />

State Key Laboratory of Microbial Resources and Center<br />

for Environmental Microbiology, Institute of Microbiology,<br />

Chinese Academy of Sciences,<br />

Bei-Chen-Xi-Lu No. 1 Chao-Yang District,<br />

Beijing 100101, People’s Republic of China<br />

e-mail: liusj@sun.im.ac.cn<br />

C. Liu • S. A. Shah • Q. She • R. A. Garrett (✉)<br />

Archaea Centre, Department of Biology,<br />

Copenhagen University, Ole Maaløes Vej 5,<br />

2200 N Copenhagen, Denmark<br />

e-mail: garrett@bio.ku.dk<br />

S.-Y. Wang<br />

Shanghai-MOST Key Laboratory of Health and Disease<br />

Genomics, Chinese National Human Genome Center,<br />

Shanghai, People’s Republic of China<br />

D. Prangishvili<br />

Molecular Biology of the Gene in Extremophiles Unit,<br />

Institut Pasteur, rue Dr Roux 25, 75724 Paris Cedex, France<br />

Abstract The Acidianus hospitalis W1 genome consists<br />

of a minimally sized chromosome of about 2.13 Mb and a<br />

conjugative plasmid pAH1 and it is a host for the model<br />

filamentous lipothrixvirus AFV1. The chromosome carries<br />

three putative replication origins in conserved genomic<br />

regions and two large regions where non-essential genes are<br />

clustered. Within these variable regions, a few orphan orfB<br />

and other elements of the IS200/607/605 family are concentrated<br />

with a novel class of MITE-like repeat elements.<br />

There are also 26 highly diverse vapBC antitoxin–toxin gene<br />

pairs proposed to facilitate maintenance of local chromosomal<br />

regions and to minimise the impact of environmental<br />

stress. Complex and partially defective CRISPR/Cas/Cmr<br />

immune systems are present and interspersed with five<br />

vapBC gene pairs. Remnants of integrated viral genomes<br />

and plasmids are located at five intron-less tRNA genes and<br />

several non-coding RNA genes are predicted that are conserved<br />

in other Sulfolobus genomes. The putative metabolic<br />

pathways for sulphur metabolism show some significant<br />

differences from those proposed for other Acidianus and<br />

Sulfolobus species. The small and relatively stable genome<br />

of A. hospitalis W1 renders it a promising candidate for<br />

developing the first Acidianus genetic systems.<br />

Keywords Toxin–antitoxin · VapBC · CRISPR ·<br />

Sulphur metabolism · OrfB element · MITE<br />

Introduction<br />

The Acidianus genus consists of acido<strong>the</strong>rmophiles which<br />

grow optimally and slowly <strong>in</strong> <strong>the</strong> temperature range<br />

65–95°C and at pH 2–4 and belongs to <strong>the</strong> order Sulfolobales.<br />

Acidianus species are chemolithoautotrophic and<br />

facultatively anaerobic and are generally versatile physiologically.<br />

Depend<strong>in</strong>g on <strong>the</strong> cultur<strong>in</strong>g conditions, <strong>the</strong>y can<br />

either reduce S⁰ to H2S, catalysed by a sulphur reductase<br />

and hydrogenase, or oxidise S⁰ to H2SO4 utilising the<br />

sulphur oxygenase-reductase holoenzyme (Kletz<strong>in</strong> 1992,<br />

2007). In contrast to several Sulfolobus species, <strong>the</strong> genomic<br />

properties of an Acidianus species have not been<br />

analysed. The Sulfolobales have been a rich source of<br />

genetic elements, <strong>in</strong>clud<strong>in</strong>g novel conjugative plasmids<br />

(Prangishvili et al. 1998; Greve et al. 2004) and several<br />

exceptional and diverse viruses many of which have now<br />

been classified <strong>in</strong>to eight new viral families (Rachel et al.<br />

2002; Prangishvili et al. 2006; Lawrence et al. 2009).<br />

Acidianus hospitalis W1 is <strong>the</strong> first Acidianus stra<strong>in</strong> to be<br />

isolated carry<strong>in</strong>g a conjugative plasmid pAH1 which is a<br />

member of <strong>the</strong> plasmid family predicted to generate an<br />

archaea-specific conjugative apparatus (Greve et al. 2004;<br />

Basta et al. 2009). These plasmids are also <strong>in</strong>tegrative<br />

elements and <strong>in</strong> an encaptured state have been implicated <strong>in</strong><br />

facilitat<strong>in</strong>g chromosomal DNA conjugation for some Sulfolobus<br />

species (Chen et al. 2005b). A. hospitalis is also a<br />

viable host for <strong>the</strong> model Acidianus alpha lipothrixvirus<br />

AFV1, a filamentous virus carry<strong>in</strong>g exceptional claw-like<br />

structures at its term<strong>in</strong>i which is currently <strong>the</strong> subject of<br />

detailed structural studies (Bettstetter et al. 2003; Goulet<br />

et al. 2009). Infection of A. hospitalis with AFV1 was shown<br />

to lead to a loss of <strong>the</strong> plasmid pAH1 and this contrasts with<br />

observations <strong>in</strong> bacteria where endogenous plasmids tend to<br />

determ<strong>in</strong>e <strong>the</strong> fate of an <strong>in</strong>com<strong>in</strong>g phage (Basta et al. 2009).<br />

In order to study fur<strong>the</strong>r <strong>the</strong> metabolic capability of an<br />

Acidianus species and to exam<strong>in</strong>e <strong>the</strong> molecular mechanisms<br />

<strong>in</strong>volved <strong>in</strong> virus–plasmid–host <strong>in</strong>teractions, it was<br />

important to sequence and annotate <strong>the</strong> A. hospitalis genome.<br />

To date, most genomic studies of <strong>the</strong> Sulfolobales<br />

have concentrated on Sulfolobus species that have revealed<br />

relatively large genomes generally exhibit<strong>in</strong>g high levels of<br />

transposable and <strong>in</strong>tegrated genetic elements, as well as<br />

considerable genetic diversity (Guo et al. 2011). Analysis<br />

of <strong>the</strong> A. hospitalis genome revealed a m<strong>in</strong>imally sized<br />

chromosome that appeared relatively stable with few<br />

transposable elements and no evidence of recent <strong>in</strong>tegration<br />

events, apart from <strong>the</strong> reversible <strong>in</strong>tegration of pAH1<br />

into a tRNA-Arg gene (Basta et al. 2009). Potentially,<br />

<strong>the</strong>refore, A. hospitalis W1 could provide a suitable host<br />

for develop<strong>in</strong>g genetic <strong>system</strong>s for <strong>the</strong> Acidianus genus.<br />

Materials and methods<br />

Genome sequenc<strong>in</strong>g and gap closure<br />

Genomic DNA of A. hospitalis was sequenced us<strong>in</strong>g a<br />

Roche 454 Genome Sequencer FLX <strong>in</strong>strument (Titanium)<br />

with an average 19-fold coverage. All useful reads were<br />

initially assembled into seven contigs (>500 bp) using the<br />

Newbler assembler software (http://www.454.com/). Gaps<br />

were closed by a Multiplex PCR strategy and PCR products<br />

were gel purified and sequenced us<strong>in</strong>g an ABI3730 DNA<br />

sequenator. Raw sequence data were assembled <strong>in</strong>to contigs<br />

us<strong>in</strong>g phred/phrap/consed software and <strong>the</strong> f<strong>in</strong>al consensus<br />

quality for each base was above 30 (http://www.phrap.org).<br />
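For readers unfamiliar with the quality scale used here, a consensus quality above 30 corresponds to an estimated error rate below 1 in 1,000 bases. The short sketch below illustrates the conversion, assuming the standard Phred definition Q = -10 log10(P). The function names are illustrative and are not part of the phred/phrap/consed software.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q into the estimated base-call error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an estimated error probability back into a Phred quality score."""
    return -10 * math.log10(p)


# Q = 30 is the threshold quoted for the final consensus
print(phred_to_error_probability(30))
```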

Sequence analysis and gene annotation<br />

Initially, ORFs were predicted us<strong>in</strong>g <strong>the</strong> programmes<br />

Glimmer and FgeneSB and prote<strong>in</strong> function predictions<br />

were obta<strong>in</strong>ed from <strong>the</strong> follow<strong>in</strong>g searches: (1) homology<br />

searches <strong>in</strong> <strong>the</strong> GenBank (http://www.ncbi.nlm.nih.gov/)<br />

and UniProt prote<strong>in</strong> (http://www.ebi.ac.uk/uniprot/) databases,<br />

(2) function assignment searches <strong>in</strong> <strong>the</strong> Sulfolobus<br />

database (http://www.Sulfolobus.org/), and (3) doma<strong>in</strong> or<br />

motif searches in the local CDD database<br />

(http://www.ncbi.nlm.nih.gov/cdd/), the InterPro and the Pfam databases.<br />

The KEGG database (http://www.genome.jp/kegg/)<br />

was used to reconstruct metabolic pathways <strong>in</strong> silico.<br />

Membrane prote<strong>in</strong>s were predicted by Phobius, TMHMM<br />

and ConPred II programmes. Secretory prote<strong>in</strong>s were<br />

divided <strong>in</strong>to two groups; those with a signal peptide were<br />

predicted using SignalP 3.0<br />

(http://www.cbs.dtu.dk/services/SignalP/) and non-classical secretory proteins,<br />

lacking a signal peptide, were predicted by the SecretomeP<br />

2.0 programme (http://www.cbs.dtu.dk/services/SecretomeP/).<br />

Transporters were predicted by searching the TCDB database<br />

(http://www.tcdb.org) using BLASTP with E values<br />

lower than 1e-05. Insertion sequence (IS) elements and<br />

transposases were identified by BLASTN searches aga<strong>in</strong>st<br />

<strong>the</strong> IS F<strong>in</strong>der database (http://www-is.biotoul.fr/). The<br />

MITE-like elements were detected us<strong>in</strong>g <strong>the</strong> programme<br />

LUNA (Brügger K, unpublished). Potential frameshifts<br />

were checked by sequenc<strong>in</strong>g after manual annotation and<br />

any rema<strong>in</strong><strong>in</strong>g frameshifts were considered to be au<strong>the</strong>ntic.<br />

tRNA genes and <strong>the</strong>ir <strong>in</strong>trons were identified us<strong>in</strong>g<br />

tRNAScan-SE (Lowe and Eddy 1997). All annotations<br />

were manually curated us<strong>in</strong>g Artemis software (Ru<strong>the</strong>rford<br />

et al. 2000). Start codons for s<strong>in</strong>gle genes and first genes of<br />

Sulfolobus operons were generally located 25–30 bp<br />

downstream from <strong>the</strong> archaeal hexameric TATA-like box.<br />

Only genes with<strong>in</strong> operons were preceded by Sh<strong>in</strong>e–<br />

Dalgarno motifs, where GGUG dom<strong>in</strong>ated (Torar<strong>in</strong>sson<br />

et al. 2005). Where alternative start codons occur, a<br />

selection was made on <strong>the</strong> basis of experimental data when<br />

available or on its location relative to a putative promoter<br />

and/or Sh<strong>in</strong>e–Dalgarno motif. The genome sequence<br />

accession number at Genbank/EMBL is CP002535.<br />
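The E-value cutoff applied in the TCDB search can be illustrated with a small filter over BLAST tabular output. This is only a sketch: it assumes the hits were written in the standard 12-column tabular format (BLAST+ `-outfmt 6`), which the text does not state, and the sequence identifiers in the example are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6)
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text, max_evalue=1e-5):
    """Return hits whose E value is strictly lower than max_evalue."""
    hits = []
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(BLAST_COLUMNS, row))
        if float(record["evalue"]) < max_evalue:
            hits.append(record)
    return hits


# Two hypothetical hits: only the first passes the 1e-05 cutoff
example = "\n".join([
    "Ahos_demo1\tTCDB_demo_A\t45.2\t310\t160\t5\t1\t305\t3\t312\t2e-60\t210",
    "Ahos_demo2\tTCDB_demo_B\t28.0\t90\t60\t3\t10\t95\t20\t108\t0.003\t35.1",
])
for hit in filter_hits(example):
    print(hit["qseqid"], hit["sseqid"], hit["evalue"])
```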

Results<br />

Genomic properties<br />

The A. hospitalis genome consists of a circular chromosome<br />

of 2,137,654 bp and a circular conjugative plasmid<br />

pAH1 of 28,644 bp. The chromosome has a GC content of<br />

34.2% and carries 2,389 predicted open read<strong>in</strong>g frames<br />

(ORFs), of which about half are assigned putative functions<br />

with many of <strong>the</strong> conserved hypo<strong>the</strong>tical prote<strong>in</strong>s be<strong>in</strong>g<br />

archaea-specific or specific to <strong>the</strong> Sulfolobales. About 320<br />

of <strong>the</strong> encoded prote<strong>in</strong>s are putative membrane prote<strong>in</strong>s<br />

and a fur<strong>the</strong>r 182 are predicted to be secretory prote<strong>in</strong>s.


The plasmid sequence is identical to that of <strong>the</strong> conjugative<br />

plasmid pAH1 isolated earlier from <strong>the</strong> A. hospitalis stra<strong>in</strong><br />

W1, except that it is 4 bp shorter (Basta et al. 2009).<br />
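The GC content quoted above is a simple base-composition statistic. A minimal sketch of how such a value could be computed from a sequence string is given below. The function name and the toy sequence are illustrative assumptions, and ambiguous bases (for example N) are ignored rather than counted.

```python
def gc_content(sequence: str) -> float:
    """Return the GC content of a DNA sequence as a percentage of A/C/G/T bases."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


# Toy sequence only; a real calculation would read the chromosome from a FASTA file
toy_sequence = "ATGCATTTAAGC"
print(round(gc_content(toy_sequence), 1))
```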

Comparison of <strong>the</strong> A. hospitalis genome with those of<br />

o<strong>the</strong>r members of <strong>the</strong> Sulfolobales provided no evidence of<br />

extensive conservation of gene synteny, <strong>in</strong> contrast to that<br />

observed for large regions of several Sulfolobus genomes<br />

(Guo et al. 2011), and consistent with A. hospitalis be<strong>in</strong>g<br />

relatively distant phylogenetically from <strong>the</strong>se stra<strong>in</strong>s (Basta<br />

et al. 2009). Never<strong>the</strong>less, <strong>the</strong> genome carries two major<br />

regions that are predicted to be relatively labile. They<br />

extend approximately from positions 75,000–444,500 and<br />

from 1,300,000–1,870,000 and carry most of <strong>the</strong> transposable<br />

elements, all of <strong>the</strong> <strong>CRISPR</strong> loci and cas and cmr<br />

family genes, most of <strong>the</strong> vapBC tox<strong>in</strong>–antitox<strong>in</strong> gene<br />

pairs, and many genes <strong>in</strong>volved <strong>in</strong> transport-related functions<br />

and metabolism, as well as a degenerate fuselloviral<br />

genome (Fig. 1). These two regions lack genes essential for<br />

<strong>in</strong>formational processes <strong>in</strong>clud<strong>in</strong>g DNA replication, transcription<br />

and translation and <strong>the</strong>y appear to constitute sites<br />

where non-essential genes are collected, <strong>in</strong>terchanged,<br />

exchanged <strong>in</strong>tercellularly and where genetic <strong>in</strong>novation<br />

may occur, similarly to a s<strong>in</strong>gle variable region observed <strong>in</strong><br />

several Sulfolobus genomes (Guo et al. 2011).<br />
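As a rough illustration of the scale of these two variable regions, the sketch below computes the fraction of the chromosome they span and checks whether a given coordinate falls inside them. It uses the approximate boundaries quoted above and treats them as exact, which is an assumption. The names are illustrative.

```python
CHROMOSOME_LENGTH = 2_137_654  # bp, as reported for the A. hospitalis chromosome

# Approximate boundaries of the two variable regions (treated as exact here)
VARIABLE_REGIONS = [(75_000, 444_500), (1_300_000, 1_870_000)]


def in_variable_region(position: int) -> bool:
    """Return True if a chromosome coordinate lies within either variable region."""
    return any(start <= position <= end for start, end in VARIABLE_REGIONS)


def variable_fraction() -> float:
    """Fraction of the chromosome covered by the two variable regions."""
    covered = sum(end - start for start, end in VARIABLE_REGIONS)
    return covered / CHROMOSOME_LENGTH


print(f"{variable_fraction():.1%}")
print(in_variable_region(1_075_876))
```

With these boundaries the two regions together cover roughly 44% of the chromosome, and the pAH1 integration site near position 1,075,876 lies outside both.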

Three orig<strong>in</strong>s of chromosomal replication, demonstrated<br />

experimentally for Sulfolobus species (Rob<strong>in</strong>son et al. 2004;<br />

Lundgren et al. 2004), were also predicted to occur <strong>in</strong> <strong>the</strong><br />

Acidianus genome. The Y component of a Z curve analysis<br />

(Zhang and Zhang 2003) revealed two major peaks correspond<strong>in</strong>g<br />

to <strong>the</strong> cdc6-3 gene (Ahos0001), and <strong>the</strong> whiP/cdt1<br />

gene (Ahos1370) and a broader peak co<strong>in</strong>cid<strong>in</strong>g with <strong>the</strong><br />

cdc6-1 gene (Ahos0780) (Fig. 1), where <strong>the</strong> three genes<br />

encode putative replication <strong>in</strong>itiators (Rob<strong>in</strong>son and Bell<br />

2007). The sequences of <strong>the</strong> cdc6 genes and whiP gene<br />

are quite conserved relative to the S. solfataricus and<br />

S. islandicus genomes, as is the synteny of the flanking genes<br />

except for the region immediately downstream from cdc6-3.<br />

Fig. 1 The Y component of a Z curve plot for the A. hospitalis<br />

chromosome showing the three putative replication origins<br />

(y-axis: y component; x-axis: genome length). The<br />

positions of the cdc6-3 gene (origin 2), cdc6-1 gene (origin 3) and<br />

the whiP/cdt1 gene (origin 1) are indicated as well as locations of the<br />

ribosomal RNA genes, the CRISPR-based systems (Family I and<br />

Family II CRISPRs), transposable<br />

elements of the IS200/605/607 family, and vapBC antitoxin–toxin<br />

gene pairs<br />
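The Y component plotted in Fig. 1 can be sketched as follows, assuming the usual Z curve definition in which the y component at position n is the cumulative count of amino bases (A, C) minus keto bases (G, T) over the first n nucleotides. The function name and toy sequence are illustrative. This is not the authors' analysis pipeline, and any smoothing or peak detection is left out.

```python
from itertools import accumulate


def z_curve_y(sequence: str) -> list:
    """Cumulative y component of the Z curve: (A + C) - (G + T) over the first n bases."""
    step = {"A": 1, "C": 1, "G": -1, "T": -1}
    # Ambiguous bases contribute 0 so that they do not shift the curve
    return list(accumulate(step.get(base, 0) for base in sequence.upper()))


# Toy sequence: the curve rises over the A/C-rich start and falls over the G/T-rich stretch
toy_sequence = "AACCGGTTAC"
print(z_curve_y(toy_sequence))
```

On a whole chromosome, pronounced peaks in such a curve are the kind of feature that the analysis above associates with the putative replication origins.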

Integrated genetic elements<br />

Integration of genetic elements, generally fuselloviruses or<br />

conjugative plasmids at tRNA genes, occurs commonly for<br />

genomes of <strong>the</strong> Sulfolobales (She et al. 1998; Guo et al.<br />

2011). Most <strong>in</strong>tegration events occur via a reversible<br />

archaea-specific mechanism whereby <strong>the</strong> <strong>in</strong>tegrase gene<br />

partitions <strong>in</strong>to two sections which border <strong>the</strong> <strong>in</strong>tegrated<br />

element and <strong>the</strong> N-term<strong>in</strong>al-encod<strong>in</strong>g region carry<strong>in</strong>g <strong>the</strong><br />

<strong>in</strong>tN sequence overlaps with <strong>the</strong> tRNA gene (Muskhelishvili<br />

et al. 1993). Elements that become encaptured with<strong>in</strong> <strong>the</strong><br />

chromosome subsequently degenerate and are gradually<br />

lost, but will never<strong>the</strong>less leave a trace because <strong>the</strong> <strong>in</strong>tN<br />

fragment overlapp<strong>in</strong>g <strong>the</strong> tRNA gene is generally reta<strong>in</strong>ed<br />

(She et al. 1998) (Table 1).<br />

Earlier, plasmid pAH1 was sequenced and shown to<br />

integrate reversibly into a tRNA-Arg gene (Basta et al.<br />

2009). Genome sequenc<strong>in</strong>g of A. hospitalis revealed that a<br />

low fraction of reads matched to <strong>the</strong> junctions of <strong>the</strong><br />

<strong>in</strong>tegrated plasmid whilst <strong>the</strong> majority matched <strong>the</strong><br />

unpartitioned <strong>in</strong>tegrase gene of pAH1, consistent with both<br />

<strong>in</strong>tegrated and free forms be<strong>in</strong>g present <strong>in</strong> <strong>the</strong> culture. The<br />

<strong>in</strong>tegration site of pAH1 was located at genome positions<br />

1,075,876–1,075,946 bp within the gene of tRNA-Arg<br />

[TCG] (Table 1). In addition, the chromosome carries<br />

remnants of <strong>in</strong>tegrated elements adjo<strong>in</strong><strong>in</strong>g ano<strong>the</strong>r five<br />

<strong>in</strong>tron-less tRNA genes, each consist<strong>in</strong>g of a few genes or<br />

pseudogenes (Table 1). Three derive from fuselloviruses,<br />

one from a pDL10-like plasmid of <strong>the</strong> pRN family of<br />

cryptic plasmids (Kletz<strong>in</strong> et al. 1999) and ano<strong>the</strong>r orig<strong>in</strong>ates<br />

from an unknown element (Table 1). Whe<strong>the</strong>r <strong>the</strong>se<br />

all derive from s<strong>in</strong>gle <strong>in</strong>tegration events rema<strong>in</strong>s unclear<br />

because, in principle, successive integrations can occur at a<br />

given tRNA gene (Redder et al. 2009). An additional 8–10<br />

genes and pseudogenes, most of which are fusellovirus-related,<br />

are clustered distantly from a tRNA gene and they<br />

may have become displaced from one of the three tRNA-integrated<br />

elements.<br />

Table 1 Integration events at tRNA genes showing the numbers of<br />

residual integrated genes<br />

tRNA | Intron | Ahos W1<br />
Arg–TCG | No | pAH1<br />
Pro–TGG | No | intN fragment<br />
Glu–CTC | No | 0986a–0988, fusellovirus<br />
Arg–TCT | No | 1232–1238, unknown element<br />
Cys–GCA | No | 1550–1558, plasmid pDL10<br />
Leu–GAG | No | 2147–2151, fusellovirus ASV1<br />
1604–1609 kb (no tRNA) | – | 1778–1786, fusellovirus SSV<br />

ASV1 Acidianus spindle-shaped virus, SSV Sulfolobus spindle-shaped<br />

virus<br />

Transposable elements<br />

The A. hospitalis genome carries five IS elements<br />

belonging to the IS200/607 family, only three of which<br />

carry intact transposase genes, and there are 11 copies of<br />

orphan orfB elements of the IS605 family, 10 of which<br />

carry intact orfB genes. None of these elements carry<br />

inverted terminal repeats and they all appear to be transposed<br />

by “cut-and-paste” mechanisms, with the orfB elements,<br />

at least, transposing via circular single-stranded<br />

intermediates and inserting after TTAC sequences (Filée<br />

et al. 2007; Ton-Hoang et al. 2010).<br />

Sulfolobus genomes generally carry IS elements from a<br />

wide variety of families most of which carry inverted terminal<br />

repeats and are mobilised by “copy-and-paste”<br />

mechanisms, and tend to be lost by gradual degeneration<br />

and not by deletion (Blount and Grogan 2005; Redder and<br />

Garrett 2006). None of these IS element classes were<br />

detected in the A. hospitalis genome and this suggests that<br />

the genome has rarely, if ever, taken up any of these IS<br />

element classes.<br />

A new class of MITE-like elements<br />

Although none of <strong>the</strong> MITE elements that are common to<br />

o<strong>the</strong>r Sulfolobus genomes were detected (Redder et al.<br />

2001; Guo et al. 2011), <strong>the</strong> A. hospitalis genome carries 10<br />

copies of a repeat sequence resembl<strong>in</strong>g a MITE-like element<br />

(Fig. 3). At one end, it carries a short open read<strong>in</strong>g<br />

frame correspond<strong>in</strong>g <strong>in</strong> am<strong>in</strong>o acid sequence to <strong>the</strong><br />

downstream end of an OrfB prote<strong>in</strong> (Fig. 3). The conserved<br />

terminal sequence and the internal similarity to the orfB<br />

element suggest that it could be a transposable element.<br />

This supposition is re<strong>in</strong>forced by <strong>the</strong> presence of 10 full<br />

copies <strong>in</strong> <strong>the</strong> genome (and a few degenerate copies), and<br />

also by <strong>the</strong> presence of multiple copies <strong>in</strong> some Sulfolobus<br />

and o<strong>the</strong>r crenarchaeal genomes (unpublished data).<br />

Non-cod<strong>in</strong>g RNAs<br />

Many untranslated RNAs have been characterised experimentally<br />

for different Sulfolobus species us<strong>in</strong>g a variety of<br />

techniques <strong>in</strong>clud<strong>in</strong>g prob<strong>in</strong>g cellular RNA extracts for<br />

K-turn-b<strong>in</strong>d<strong>in</strong>g motifs and generat<strong>in</strong>g cDNA libraries of<br />

total cellular RNA extracts, as well as numerous antisense<br />

RNAs (Tang et al. 2005; Omer et al. 2006; Wurtzel et al.<br />

2010). Most of <strong>the</strong>se RNAs were characterised for partial<br />

sequence and nucleotide length, and several were detected<br />

by more than one experimental approach. Based on <strong>the</strong><br />

genome sequence comparisons and gene contexts, 23<br />

putative conserved non-cod<strong>in</strong>g RNAs were annotated <strong>in</strong> <strong>the</strong><br />

A. hospitalis genome. Genes for 12 C/D box RNAs were<br />

localised of which 7 were predicted to modify rRNAs, 2 to<br />

target tRNAs and a fur<strong>the</strong>r 2 to modify unknown RNAs. In<br />

addition, a s<strong>in</strong>gle copy of a gene for an H/ACA box RNA<br />

was located which toge<strong>the</strong>r with aPus7 should generate<br />

pseudouridine-35 in Sulfolobus pre-tRNA-Tyr transcripts<br />

(Muller et al. 2009). However, <strong>in</strong> A. hospitalis, <strong>the</strong> aPus7<br />

gene (Ahos0631) is degenerate. A fur<strong>the</strong>r 10 genes were<br />

assigned to encode RNAs of unknown function. The relatively<br />

high conservation of sequence and gene synteny for<br />

<strong>the</strong>se RNAs between Sulfolobus and Acidianus species<br />

underl<strong>in</strong>es <strong>the</strong>ir potential functional importance.<br />

Read<strong>in</strong>g frame shifts and mRNA <strong>in</strong>tron splic<strong>in</strong>g<br />

Examples of translational read<strong>in</strong>g frame shifts yield<strong>in</strong>g<br />

s<strong>in</strong>gle polypeptides have been demonstrated experimentally<br />

for S. solfataricus P2 (Cobucci-Ponzano et al. 2010).<br />

For two of <strong>the</strong>se, a transketolase (Ahos1219/1218) and a<br />

putative O-sialoglycoprote<strong>in</strong> endopeptidase (Ahos0695/<br />

0696), <strong>the</strong> A. hospitalis genes overlap <strong>in</strong> a similar way, and<br />

are likely to undergo translational frame shifts. Moreover,<br />

transcripts of <strong>the</strong> <strong>in</strong>tron-carry<strong>in</strong>g cbf5 gene (Ahos0734/<br />

0735) are likely to undergo splic<strong>in</strong>g at <strong>the</strong> mRNA level by<br />

<strong>the</strong> archaeal splic<strong>in</strong>g enzyme complex (Ahos0689/0798/<br />

1417) as has been demonstrated experimentally for different<br />

crenarchaea (Yokobori et al. 2009).


Metabolic pathways<br />

Genome analyses <strong>in</strong>dicate <strong>the</strong> presence of versatile metabolic<br />

pathways <strong>in</strong> A. hospitalis. They suggest that it can<br />

grow autotrophically by fix<strong>in</strong>g CO2 or heterotrophically<br />

us<strong>in</strong>g yeast extract, as has been demonstrated experimentally<br />

(Basta et al. 2009). Genome analyses also revealed<br />

genes encod<strong>in</strong>g sugar transporters and glycosidases suggest<strong>in</strong>g<br />

that A. hospitalis can assimilate carbohydrates,<br />

such as starch, glucose, mannose and galactose. Moreover,<br />

enzymes are encoded that are implicated <strong>in</strong> energy generation<br />

from oxidis<strong>in</strong>g elemental sulphur, hydrogen sulphides<br />

and o<strong>the</strong>r reduced <strong>in</strong>organic sulphide compounds, but not<br />

ferrous ions. However, no hydrogenase genes were detected<br />

suggest<strong>in</strong>g that A. hospitalis cannot use H2 as electron<br />

donor for growth.<br />

Enzymes were identified for a complete TCA cycle that<br />

is important for generat<strong>in</strong>g different <strong>in</strong>termediates for <strong>the</strong><br />

biosyn<strong>the</strong>sis of many cellular components, as well as produc<strong>in</strong>g<br />

reduced electron carriers, such as NAD(P)H,<br />

reduced ferredox<strong>in</strong> (FdR) and FADH2. Formation of acetyl-<br />

CoA from pyruvate and <strong>the</strong> formation of succ<strong>in</strong>yl-<br />

CoA from 2-oxoglutarate were predicted to be catalysed,<br />

respectively, by pyruvate ferredox<strong>in</strong> oxidoreductase (Ahos<br />

1949-1952) and 2-oxoglutarate ferredox<strong>in</strong> oxidoreductase<br />

(Ahos0089/0090/0300/0301). Moreover, both enzymes<br />

were predicted to use ferredoxin instead of NAD+ as a<br />

cofactor.<br />

Genes encoding enzymes involved in pathways for fixing<br />

atmospheric N2, or reducing nitrate and nitrite, as<br />

nitrogen sources were absent, as observed for o<strong>the</strong>r Acidianus<br />

species, and <strong>the</strong> genome analyses suggest that<br />

ammonium is an exclusive source of nitrogen that is<br />
assimilated via formation of carbamoyl phosphate, glutamine<br />
and glutamate. Genes encoding putative carbamoyl<br />
phosphate synthetase (Ahos1106/1107), glutamine synthetase<br />
(Ahos0460, Ahos1272, Ahos2233) and glutamate<br />
dehydrogenase (Ahos0494) are present.<br />

Fig. 2 Model of pathways for oxidation and reduction of sulphur in A. hospitalis, indicating the predicted functions of genes in the A. hospitalis genome. Corresponding gene numbers are given for each step. The following abbreviations are used: OM outer membrane, IM inner membrane, SQR sulphide:quinone oxidoreductase, Fcc flavocytochrome c sulphide dehydrogenase, SOR sulphur oxygenase-reductase, TetH tetrathionate hydrolase, TQO thiosulphate–quinone oxidoreductase, SulP sulphate transporter permease, QH2 quinol pool<br />

Sulphur metabolism<br />

A. hospitalis encodes several enzymes <strong>in</strong>volved <strong>in</strong> sulphur<br />

metabolism, <strong>in</strong>clud<strong>in</strong>g <strong>the</strong> oxidation and reduction of sulphur,<br />

<strong>the</strong> thiosulphate–tetrathionate cycle which generates<br />

sulphate, and <strong>the</strong> participation of sulphur <strong>in</strong> electron<br />

transport. However, genes for some sulphur metabolism<br />

enzymes, <strong>in</strong>clud<strong>in</strong>g sulphite-acceptor oxidoreductase,<br />

adenos<strong>in</strong>e phosphosulphate reductase, sulphate adenylyl<br />

transferase and adenylylsulphate phosphate adenyltransferase<br />

were not found which suggested that A. hospitalis<br />

has some pathways differ<strong>in</strong>g from those of o<strong>the</strong>r Acidianus<br />

and Sulfolobus species (Kletz<strong>in</strong> 2007). Therefore, based on<br />

<strong>the</strong> gene annotations, a model is presented for <strong>the</strong> proposed<br />

sulphur oxidation and reduction pathways <strong>in</strong> A. hospitalis<br />

(Fig. 2). Extracellular H2S is oxidised by a secretory-type<br />

sulphide:qu<strong>in</strong>one oxidoreductase (Ahos0513) and flavocytochrome<br />

c sulphide dehydrogenase (Ahos0188) to produce<br />

a surface layer of sulphur on <strong>the</strong> outer cell membrane.<br />

Elemental sulphur is <strong>the</strong>n transported <strong>in</strong>to <strong>the</strong> cell by<br />

putative-SH radical transporter(s) us<strong>in</strong>g an unknown<br />

mechanism. Subsequently, sulphur is oxidised by sulphur<br />

oxygenase-reductase (Ahos0131) to yield sulphite, thiosulphate<br />

and hydrogen sulphide. Sulphite and elemental<br />

sulphur convert spontaneously and non-enzymatically to<br />

thiosulphate and elemental sulphur and, consistent with this<br />

mechanism, no candidate gene encod<strong>in</strong>g sulphite:acceptor<br />


oxidoreductase was identified <strong>in</strong> <strong>the</strong> A. hospitalis genome.<br />

Thiosulphate enters <strong>the</strong> putative thiosulphate/tetrathionate<br />

cycle and is f<strong>in</strong>ally oxidised to sulphate. The enzymes<br />

<strong>in</strong>volved <strong>in</strong> this cycle were all annotated: thiosulphate:<br />

qu<strong>in</strong>one oxidoreductase (Ahos0112-0113 and Ahos0238-<br />

0239) and tetrathionate hydrolase (Ahos1670). H2S is<br />

ei<strong>the</strong>r oxidised by <strong>the</strong> sulphide:qu<strong>in</strong>one oxidoreductase<br />

(Ahos1014) <strong>in</strong> <strong>the</strong> cytoplasm with qu<strong>in</strong>one-cytochrome as<br />

electron acceptor or it reacts with tetrathionate spontaneously<br />

under <strong>the</strong> high temperature growth conditions.<br />

F<strong>in</strong>ally, sulphate generated from sulphur oxidation is<br />

effluxed from <strong>the</strong> cell by a putative sulphate transport<br />

permease (Ahos1256). Electrons generated from sulphur<br />

oxidation enter <strong>the</strong> electron transport cha<strong>in</strong> via qu<strong>in</strong>one.<br />

Term<strong>in</strong>al qu<strong>in</strong>ol oxidase receives electrons from qu<strong>in</strong>one<br />

and transfers <strong>the</strong>m to O2 coupled with ATP generation.<br />

Some electrons may be transmitted to <strong>the</strong> NADH complex<br />

to produce NADH for use <strong>in</strong> o<strong>the</strong>r pathways.<br />

Transporters and proteolytic enzymes<br />

Twenty-eight gene products were predicted to be <strong>in</strong>volved <strong>in</strong><br />

<strong>the</strong> transport of am<strong>in</strong>o acids, oligopeptide/dipeptides and<br />

ammonium. Of <strong>the</strong>se, 19 are implicated <strong>in</strong> am<strong>in</strong>o acid<br />

transport, including five amino acid transporters (Ahos0100/<br />

0163/0197/0986/1721), three am<strong>in</strong>o acid permeases (Ahos<br />

0328/0439/1725) and 11 am<strong>in</strong>o acid permease-like prote<strong>in</strong>s<br />

(Ahos0272/0276/0958/1040/1086/1868/1891/1907/1953/<br />

2065/2251) of unknown specificity for am<strong>in</strong>o acid uptake.<br />

Genes encod<strong>in</strong>g an ammonium transporter (Ahos1467) and<br />

two oligopeptide/dipeptide ABC transporter gene clusters<br />

(Ahos0337-0342 and Ahos0170-0175) are present. In<br />

addition, 21 genes were predicted to encode proteolytic<br />

enzymes, <strong>in</strong>clud<strong>in</strong>g 20 peptidases. Of <strong>the</strong>se, four are<br />

endopeptidases (Ahos0428/0516/0695-6/0800), three are<br />

am<strong>in</strong>opeptidases (Ahos0013/0588/1941), two are peps<strong>in</strong>s<br />

(Ahos1929/2087) and one is a carboxypeptidase (Ahos<br />

0991). Five of <strong>the</strong> proteolytic enzymes are predicted to be<br />

membrane-bound and are designated secretory prote<strong>in</strong>s.<br />

These results suggest that A. hospitalis, like Acidianus<br />

brierleyi (Segerer et al. 1986), Acidianus tengchongensis<br />

(He and Li 2004) and Acidianus manzaensis (Yoshida et al.<br />

2006), can grow on organic compounds, such as yeast<br />

extract, peptone, tryptone and casam<strong>in</strong>o acids.<br />

Tox<strong>in</strong>–antitox<strong>in</strong> <strong>system</strong>s<br />

VapBC complexes constitute <strong>the</strong> ma<strong>in</strong> family of antitox<strong>in</strong>–<br />

tox<strong>in</strong>s that are encoded by members of <strong>the</strong> Sulfolobales<br />

(Pandey and Gerdes 2005; Guo et al. 2011), and <strong>the</strong>y occur<br />

ma<strong>in</strong>ly <strong>in</strong> variable genomic regions where <strong>the</strong>y may<br />

undergo loss or ga<strong>in</strong> events (Guo et al. 2011). The A.<br />

hospitalis genome carries 26 vapBC gene pairs that are<br />


concentrated <strong>in</strong> <strong>the</strong> genomic regions 350–410 and<br />

1,374–1,912 kb with a s<strong>in</strong>gle vapC-like gene ly<strong>in</strong>g <strong>in</strong> an<br />

operon (Fig. 1). The VapB antitox<strong>in</strong>s, <strong>in</strong> contrast to VapC<br />

tox<strong>in</strong>s, could be classified <strong>in</strong>to three families of transcriptional<br />

regulators, AbrB, CcdA/CopG and DUF217 (Fig. 4a), whilst<br />

no subclassification was observed for <strong>the</strong> VapC prote<strong>in</strong>s<br />

(Fig. 4b). Tree build<strong>in</strong>g based on <strong>the</strong> sequence alignments<br />

demonstrated that <strong>the</strong> sequences of <strong>the</strong>se antitox<strong>in</strong>s and<br />

tox<strong>in</strong>s are highly diverse, with sequence identities between<br />

<strong>the</strong>m rarely exceed<strong>in</strong>g 30%, as <strong>in</strong>dicated by all <strong>the</strong> prote<strong>in</strong>s<br />

exhibit<strong>in</strong>g long branches (Fig. 4). This result contrasted<br />

with <strong>the</strong> f<strong>in</strong>d<strong>in</strong>g that VapBC complexes with closely similar<br />

sequences are commonly found when compar<strong>in</strong>g different<br />

genomes of <strong>the</strong> Sulfolobales. For example, 11 of <strong>the</strong> 26<br />

VapBC prote<strong>in</strong> pairs have closely similar homologs encoded<br />

<strong>in</strong> at least 7 of <strong>the</strong> 13 available Sulfolobus genomes<br />

(Fig. 4b). This <strong>in</strong>dicates that <strong>the</strong>re is likely to be a selection<br />

aga<strong>in</strong>st <strong>the</strong> uptake of closely similar vapBC gene pairs <strong>in</strong> a<br />

given genome, despite <strong>the</strong> abundance of such gene pairs <strong>in</strong><br />

<strong>the</strong> environment.<br />
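The conservation figure quoted above can be tallied directly from the per-protein counts shown in Fig. 4b. The following Python sketch is illustrative only: the list reproduces the "other genomes" values as they appear in the figure (one per VapC entry, including the unpaired VapC-like Ahos0712), the threshold of 7 genomes is the one used in the text, and the variable names are ours.

```python
# Number of other Sulfolobus genomes (out of 13) that encode a closely similar
# homolog, one value per VapC entry of Fig. 4b (transcribed from the figure).
other_genomes = [12, 3, 1, 6, 8, 1, 12, 7, 0, 0, 4, 13, 7, 7,
                 2, 4, 0, 4, 0, 13, 7, 7, 0, 1, 8, 4, 0]

THRESHOLD = 7  # "at least 7 of the 13 available Sulfolobus genomes"

# Entries with a close homolog in at least THRESHOLD genomes.
widely_conserved = sum(1 for n in other_genomes if n >= THRESHOLD)
# Entries with no close homolog in any of the other genomes.
absent_elsewhere = sum(1 for n in other_genomes if n == 0)

print(f"{len(other_genomes)} entries listed; "
      f"{widely_conserved} conserved in >= {THRESHOLD} genomes; "
      f"{absent_elsewhere} absent from all other genomes")
```

Because the tally depends only on the values and not on their order, it does not rely on matching each count to a particular gene number.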

The A. hospitalis genome also encodes six copies of<br />

RelE-related tox<strong>in</strong> prote<strong>in</strong>s, <strong>in</strong> common with o<strong>the</strong>r Sulfolobus<br />

genomes (Pandey and Gerdes 2005, unpublished<br />

results). At least three of <strong>the</strong> relE genes occur <strong>in</strong> <strong>in</strong>tegrated<br />

regions carry<strong>in</strong>g degenerated conjugative plasmids, and<br />

<strong>the</strong>y show sequence similarity to prote<strong>in</strong>s encoded <strong>in</strong><br />

Sulfolobus conjugative plasmids pKEF9 (ORF69b), pING1<br />

(ORF98) and pL085 (gene no. 3195) (Greve et al. 2004;<br />

Stedman et al. 2000; Reno et al. 2009). However, none of<br />

<strong>the</strong> putative tox<strong>in</strong> genes are l<strong>in</strong>ked physically to antitox<strong>in</strong><br />

relB genes and <strong>the</strong>ir function rema<strong>in</strong>s unknown.<br />

Diverse <strong>CRISPR</strong>-based <strong>immune</strong> <strong>system</strong>s<br />

The <strong>CRISPR</strong>-based <strong>immune</strong> <strong>system</strong>s of A. hospitalis can<br />

be classified <strong>in</strong>to two ma<strong>in</strong> types based on analyses of <strong>the</strong>ir<br />

Cas1 prote<strong>in</strong>, leader and repeat sequences (Shah et al.<br />

2009; Lillestøl et al. 2009). In total, <strong>the</strong>re are six <strong>CRISPR</strong><br />

loci, carry<strong>in</strong>g 129 spacer-repeat units none of which are<br />

identical (Fig. 5). The first three loci <strong>in</strong> <strong>the</strong> genome (Ahos-<br />

53, -13 and -9a) are physically l<strong>in</strong>ked by cassettes of cmr<br />

and cas family genes, each of which conta<strong>in</strong>s a vapBC<br />

antitox<strong>in</strong>–tox<strong>in</strong> gene pair, and <strong>the</strong>y constitute a family II<br />

<strong>CRISPR</strong>/Cas <strong>system</strong> (Fig. 5a). The last two <strong>CRISPR</strong> loci<br />

(Ahos-9b and -5) are coupled into a typical family I paired<br />

<strong>CRISPR</strong>/Cas module (Fig. 5b) and <strong>the</strong>re is a vapBC gene<br />

pair immediately upstream. Preced<strong>in</strong>g <strong>the</strong> latter <strong>CRISPR</strong>/<br />

Cas module, <strong>the</strong>re is a s<strong>in</strong>gle unclassified locus (Ahos-40)<br />

that lacks both cas genes and a leader region (Fig. 5c)<br />

(Shah and Garrett 2011).<br />
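As a quick consistency check on the locus inventory, the total of 129 spacer-repeat units can be recomputed from the locus names. This sketch assumes that the numeral in each locus name (for example Ahos-53) gives its number of spacer-repeat units, a naming convention that is not stated explicitly in the text; the dictionary layout and variable names are ours.

```python
# Assumption: the numeral in each locus name equals its spacer-repeat unit count.
crispr_loci = {
    "Ahos-53": ("family II", 53),
    "Ahos-13": ("family II", 13),
    "Ahos-9a": ("family II", 9),
    "Ahos-9b": ("family I", 9),
    "Ahos-5": ("family I", 5),
    "Ahos-40": ("unclassified", 40),
}

# Total number of spacer-repeat units over all six loci.
total_units = sum(units for _, units in crispr_loci.values())

# Units per CRISPR family.
units_by_family = {}
for family, units in crispr_loci.values():
    units_by_family[family] = units_by_family.get(family, 0) + units

print(total_units)
print(units_by_family)
```

Under this assumption the six loci sum to the 129 units reported in the text.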

We analysed <strong>the</strong> degree to which <strong>CRISPR</strong> spacers<br />

exhibited sequence matches to <strong>the</strong> many diverse genetic<br />

elements available from Acidianus and Sulfolobus species



us<strong>in</strong>g an earlier approach exam<strong>in</strong><strong>in</strong>g nucleotide and translated<br />

sequences of <strong>the</strong> spacers (Shah et al. 2009; Lillestøl<br />

et al. 2009). Relatively few significant sequence matches<br />

were found and most of <strong>the</strong>se were to conjugative plasmids,<br />

with a few matches to members of five different viral<br />

families (Fig. 5).<br />
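The spacer comparison described above works at two levels, the nucleotide sequence and the translated sequence. A minimal Python sketch of the translation step is given below. It is not the pipeline used in the cited studies: it assumes the standard genetic code, the function names are ours, and the example spacer sequence is invented purely for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Translate a sequence in the three forward and three reverse frames."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


# Hypothetical spacer sequence, for illustration only.
spacer = "ATGAAAGGTCTTGCAGATTACGGTAAACCTGAAGTTCTGA"
for frame, peptide in six_frame_translations(spacer).items():
    print(frame, peptide)
```

Each translated frame could then be compared against protein sequences from viruses and plasmids, which can reveal matches that have diverged too far to be detected at the nucleotide level.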

Discussion<br />

At about 2.1 Mbp, <strong>the</strong> genome of A. hospitalis is much<br />

smaller than o<strong>the</strong>r sequenced genomes of members of <strong>the</strong><br />

Sulfolobales. Although this partly reflects <strong>the</strong> presence of<br />

low levels of transposable elements and few genes deriv<strong>in</strong>g<br />

from <strong>in</strong>tegrated elements, it also results from a lower<br />

diversity of metabolic and transporter genes (Guo et al.<br />

2011). The Z curve analysis suggests that <strong>the</strong> chromosome<br />

carries three replication orig<strong>in</strong>s as for Sulfolobus species<br />

(Fig. 1), although <strong>in</strong> contrast to <strong>the</strong> sequenced stra<strong>in</strong>s of S.<br />

solfataricus and S. islandicus, <strong>the</strong> whiP/cdt1 and cdc6-2<br />

genes are widely separated.<br />

Although no <strong>system</strong>atic analysis has been performed<br />

experimentally on <strong>the</strong> metabolic capacity of A. hospitalis,<br />

genome analyses revealed that A. hospitalis possesses <strong>the</strong><br />

capacity to assimilate a broad range of organic compounds,<br />

<strong>in</strong>clud<strong>in</strong>g different am<strong>in</strong>o acids and proteolytic products,<br />

which is similar to some o<strong>the</strong>r Acidianus and Sulfolobus<br />

species (Segerer et al. 1986; Grogan 1989; He et al. 2004;<br />

Yoshida et al. 2006; Plumb et al. 2007). The analyses also<br />

suggest that A. hospitalis can assimilate various carbohydrates,<br />

similarly to several Sulfolobus species (Grogan<br />

1989) but <strong>in</strong> contrast to some Acidianus species (Yoshida<br />

et al. 2006; Plumb et al. 2007).<br />

A. hospitalis, like o<strong>the</strong>r Acidianus and Sulfolobus species,<br />

obta<strong>in</strong>s energy for growth ma<strong>in</strong>ly via oxidation of<br />

reduced inorganic sulphur compounds (RISCs), and the<br />

enzymes <strong>in</strong>volved were predicted from <strong>the</strong> genome analyses<br />

(Fig. 2). A sulphur oxygenase-reductase was identified<br />

show<strong>in</strong>g am<strong>in</strong>o acid sequence similarity to o<strong>the</strong>r<br />

Acidianus and Sulfolobus SORs of 67–99%, and we<br />

<strong>in</strong>ferred that it is important for elemental sulphur oxidation<br />

and reduction, as occurs <strong>in</strong> both Acidianus and Sulfolobus<br />

species (Kletz<strong>in</strong> 1989, 1992; Sun et al. 2003; Chen et al.<br />

2005a). One product of sulphur oxygenase-reductase<br />

catalysis is sulphite. Ow<strong>in</strong>g to <strong>the</strong> apparent lack of <strong>the</strong> four<br />

enzymes, sulphite-acceptor oxidoreductase, adenos<strong>in</strong>e<br />

phosphosulphate reductase, sulphate adenylyl transferase<br />

and adenylylsulphate phosphate adenyltransferase, A.<br />

hospitalis must have adopted a strategy for sulphite oxidation<br />

that differs from <strong>the</strong> currently known pathway<br />

(Kletz<strong>in</strong> 2007). Here, we propose that sulphite is channelled<br />

to thiosulphate <strong>in</strong> A. hospitalis via a spontaneous<br />

reaction with elemental sulphur, but this rema<strong>in</strong>s to be<br />

tested experimentally. Some Acidianus species, such as<br />

A. manzaensis (Yoshida et al. 2006) and A. sulfidivorans<br />

(Plumb et al. 2007) grow chemolithoautotrophically with<br />

oxidation of molecular hydrogen, but this cannot occur <strong>in</strong><br />

A. hospitalis because it apparently lacks an encoded<br />

hydrogen dehydrogenase.<br />

Transposable elements <strong>in</strong>clude a few IS200/607 elements<br />

and several orphan orfB elements which all belong to <strong>the</strong><br />

IS200/605/607 family. They lack <strong>in</strong>verted term<strong>in</strong>al repeats<br />

and are mobilised by “cut-and-paste” mechanisms (Filée<br />

et al. 2007; Ton-Hoang et al. 2010). No representatives were<br />

found of the other transposable element families that are common in<br />

Sulfolobus genomes, which carry inverted terminal<br />

repeats and are mobilised by “copy-and-paste” mechanisms<br />

(Blount and Grogan 2005; Redder and Garrett 2006). It<br />

rema<strong>in</strong>s uncerta<strong>in</strong> whe<strong>the</strong>r <strong>the</strong> OrfB prote<strong>in</strong> is responsible<br />

for transposition of <strong>the</strong> orfB elements or whe<strong>the</strong>r <strong>the</strong>y are<br />

mobilised <strong>in</strong> trans by <strong>the</strong> TnpA transposase encoded by <strong>the</strong><br />

IS200/607 elements (Filée et al. 2007; Guo et al. 2011). The<br />

IS200/607 and orfB elements have been detected <strong>in</strong> Sulfolobus<br />

conjugative plasmids and orfB elements also occur <strong>in</strong> a<br />

few viruses of <strong>the</strong> Sulfolobales <strong>in</strong>clud<strong>in</strong>g four copies <strong>in</strong> <strong>the</strong><br />

Acidianus two-tailed bicaudavirus ATV (She et al. 1998;<br />

Greve et al. 2004; Prangishvili et al. 2006). Thus, <strong>the</strong>y are<br />

likely to be transmitted <strong>in</strong>tercellularly, and enter chromosomes,<br />

via such genetic elements.<br />

MITEs are common <strong>in</strong> Sulfolobus species and have been<br />

predicted to be mobilised by transposases encoded <strong>in</strong> different<br />

IS element families (Redder et al. 2001). The novel<br />

MITE-like elements <strong>in</strong> <strong>the</strong> A. hospitalis genome (Fig. 3)<br />

may derive from orfB elements and be mobilised by a<br />

similar mechanism but at present we can provide no evidence<br />

for <strong>the</strong>ir mobility. In this respect, <strong>the</strong>y may be<br />

similar to o<strong>the</strong>r Sulfolobus MITEs which show a low level<br />

of transpositional activity (Redder and Garrett 2006). This<br />

is consistent with <strong>the</strong> hypo<strong>the</strong>sis that MITEs drive <strong>the</strong><br />

evolutionary diversification of <strong>the</strong>ir mobilis<strong>in</strong>g transposases<br />

to <strong>the</strong> po<strong>in</strong>t that <strong>the</strong>y are no longer recognised which<br />

leads to <strong>the</strong>ir immobilisation and subsequent degeneration<br />

(Feschotte and Pritham 2007).<br />

All of <strong>the</strong> <strong>in</strong>tegrated elements, except one, could be<br />

identified as originating from fuselloviruses or a pDL10-like<br />

member of <strong>the</strong> pRN family of cryptic plasmids<br />

(Kletz<strong>in</strong> et al. 1999), and <strong>the</strong> conjugative plasmid pAH1<br />

was already shown to reversibly <strong>in</strong>tegrate at a tRNA Arg<br />

[TCG] gene (Basta et al. 2009). None of <strong>the</strong>se events<br />

occurred with<strong>in</strong> any of <strong>the</strong> 15 tRNA genes carry<strong>in</strong>g <strong>in</strong>trons<br />

and this observation is consistent with <strong>the</strong> hypo<strong>the</strong>sis that<br />

archaeal <strong>in</strong>trons protect tRNA genes aga<strong>in</strong>st <strong>in</strong>tegration<br />

events (Guo et al. 2011).<br />

Fig. 3 Alignment of 10 MITE-like repeat elements present in the genome of A. hospitalis. The shaded area denotes a small open reading frame corresponding to the downstream part of the OrfB found within transposable orfB elements<br />

Fig. 4 VapBC trees. Phylogenetic trees for a VapB antitoxins and b VapC toxins. They demonstrate that VapBs, despite their high sequence diversity, can be classified into three main families AbrB, CcdA/CopG and DUF217, whereas the VapCs are highly diverse in their sequences but cannot be classified into major subgroups. The Ahos gene numbers are given for each protein. Moreover, the class of the VapB corresponding to each VapC is given in b. The degree of conservation of the VapC proteins in the available 13 Sulfolobus genomes is indicated in b where 0 indicates it is absent from all the genomes whilst 13 indicates that it is present in all. The antitoxin corresponding to VapC-0183 is not annotated in the genome because it lacks a start codon but it is included in the figure. The VapC-like protein (Ahos0712) is part of the operon with a translation-related protein and lacks a VapB. The Ahos1664/1663 pair are variant ORFs where both VapB and VapC are longer than usual and the VapB does not cluster with the families in a<br />

a Antitoxins [VapB], ORF numbers by class (scale bar 5%):<br />
AbrB: 0374, 2101, 0399, 0394, 0264, 0209, 0412, 1712, 1520, 1610, 0183*, 0356<br />
CcdA/CopG: 0361, 1738, 1728, 0354, 1673, 1997, 1644, 1587, 2059<br />
DUF217: 0206, 1524, 1582, 1978<br />

b Toxins [VapC] (scale bar 5%):<br />
ORF     antitoxin class     other genomes<br />
0712    N/A                 12<br />
1521    AbrB                3<br />
0210    AbrB                1<br />
0355    AbrB                6<br />
0353    CcdA                8<br />
1674    CcdA                1<br />
1737    CcdA                12<br />
0362    CcdA                7<br />
1729    CcdA                0<br />
1996    CcdA                0<br />
0400    AbrB                4<br />
0265    AbrB                13<br />
0183    AbrB                7<br />
1583    DUF217              7<br />
1979    DUF217              2<br />
0375    AbrB                4<br />
1611    AbrB                0<br />
0205    DUF217              4<br />
1713    AbrB                0<br />
2058    CcdA                13<br />
0395    AbrB                7<br />
0413    AbrB                7<br />
1645    CcdA                0<br />
2102    AbrB                1<br />
1586    CcdA                8<br />
1663    unknown             4<br />
1525    DUF217              0<br />

VapBC constitutes the predominant antitoxin–toxin<br />
family found amongst the Sulfolobales and the A. hospitalis<br />
genome carries 26 vapBC gene pairs, more than occur<br />
in more rapidly growing Sulfolobus species (Pandey and<br />
Gerdes 2005; Guo et al. 2011). Moreover, the groups of<br />
VapB and VapC proteins are highly diverse in sequence<br />
(Fig. 4). Antitoxin–toxins were originally shown to<br />
enhance plasmid maintenance as a consequence of the<br />
growth of plasmid-free cells being preferentially inhibited<br />
by free toxins which are inherently more stable than the<br />
antitoxins (Gerdes 2000). By analogy with this mechanism,<br />

it was proposed that chromosomally encoded tox<strong>in</strong>s may<br />

facilitate ma<strong>in</strong>tenance of local DNA regions where vapBC<br />

gene pairs are located that might o<strong>the</strong>rwise be prone to loss<br />

(Magnuson 2007; Van Melderen 2010). This hypo<strong>the</strong>sis is<br />

consistent with <strong>the</strong> observation that most of <strong>the</strong> A. hospitalis<br />

vapBC gene pairs lie with<strong>in</strong> <strong>the</strong> two variable genomic<br />

regions where DNA regions are exchanged (Fig. 1).<br />

Moreover, it receives strong support from both <strong>the</strong> high



diversity, and <strong>the</strong> uniqueness of all <strong>the</strong> VapC prote<strong>in</strong>s<br />

encoded with<strong>in</strong> <strong>the</strong> A. hospitalis chromosome (Fig. 4b),<br />

because any similar VapBC complexes would compensate<br />

for <strong>the</strong> loss of one ano<strong>the</strong>r, <strong>the</strong>reby underm<strong>in</strong><strong>in</strong>g any DNA<br />

ma<strong>in</strong>tenance activity.<br />

In slowly growing organisms from nutrient-poor environments,<br />

multiple tox<strong>in</strong>s are also assumed to be <strong>in</strong>volved<br />

<strong>in</strong> stress response and/or quality control (Gerdes 2000;<br />

Pandey and Gerdes 2005). Involvement <strong>in</strong> stress response<br />

entails that <strong>the</strong> more stable tox<strong>in</strong>s <strong>in</strong>hibit growth and allow<br />

<strong>the</strong> host to lie <strong>in</strong> a dormant state dur<strong>in</strong>g <strong>the</strong> period of<br />

environmental stress (Gerdes 2000). However, <strong>the</strong>re may<br />

also be a negative effect on host growth due to <strong>the</strong> cont<strong>in</strong>uous<br />

presence of low levels of free tox<strong>in</strong> (Wilbur et al.<br />

2005). Thus, <strong>the</strong> presence of many vapBC gene pairs<br />

<strong>in</strong> A. hospitalis could reflect a compromise between <strong>the</strong><br />

ability to survive different environmental stresses and<br />

ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g an adequate growth rate under normal conditions.<br />

This would also be consistent with the presence of<br />

three families of VapB prote<strong>in</strong>s and high sequence diversity<br />

of <strong>the</strong> VapC prote<strong>in</strong>s, s<strong>in</strong>ce functionally overlapp<strong>in</strong>g<br />

<strong>system</strong>s would be redundant for stress responses and <strong>the</strong>y<br />

would confer an unnecessary burden on host growth. The<br />

proposed dual roles of ma<strong>in</strong>tenance of local chromosomal<br />

DNA regions and providing resistance to stress are not<br />

mutually exclusive.<br />

Fig. 5 Schematic representations of the CRISPR loci of A. hospitalis. a Family II CRISPR module carrying three CRISPR loci and Cmr and Cas family gene cassettes which are both interrupted by, or bordered by, four vapBC gene pairs (orange). b Paired family I CRISPR/Cas system flanked by one vapBC gene pair, and c an unclassified CRISPR locus lacking a leader region and adjacent cas genes. csm1 is a homolog of cmr2, csm2 is a homolog of cmr5 and csm3 is a homolog of cmr4. The light blue genes each carry two short RAMP motifs. a–c Structures of the individual CRISPR loci are shown together with the leader region (L) where each triangle represents a spacer-repeat unit. Significant spacer matches to sequenced viruses and plasmids are colour coded: red rudivirus, orange lipothrixvirus, yellow fusellovirus, green bicaudavirus, turquoise turreted icosahedral virus, blue conjugative plasmid and violet cryptic plasmid<br />
[Panel titles: a Family II + RAMP, b Family I, c unknown. Gene labels in the schematic, in order of appearance: cas4, csx1, vapBC, csm1, csm2, csm3, csa1, vapBC, PaREP, cas2, cas1, cas6; cas6, csaX, casHD, cas3, cas5, csa2, csa5, csa3, csa1, cas1, cas2, cas4, csa3]<br />

Although the mechanism of action of VapC toxins<br />
remains unknown (Arcus et al. 2011), in A. hospitalis, a<br />
single vapC-like gene (Ahos0712) is directly coupled to<br />
genes encoding proteins involved in transcription and initiator<br />
tRNA binding to the ribosome, and this gene cassette<br />

is highly conserved <strong>in</strong> gene content and sequence <strong>in</strong> o<strong>the</strong>r<br />

Sulfolobus genomes (Guo et al. 2011). This suggests that<br />

this VapC prote<strong>in</strong>, at least, may also regulate or <strong>in</strong>hibit<br />

translational <strong>in</strong>itiation by b<strong>in</strong>d<strong>in</strong>g at <strong>the</strong> ribosomal A-site,<br />

as demonstrated recently for a RelE type tox<strong>in</strong> (Neubauer<br />

et al. 2009). A similar <strong>in</strong>activation mechanism would be<br />

plausible for <strong>the</strong> VapC tox<strong>in</strong>s, if one assumes that<br />

expression of <strong>the</strong> <strong>in</strong>dividual VapBC complexes is stimulated<br />

by ei<strong>the</strong>r <strong>the</strong> requirement to ma<strong>in</strong>ta<strong>in</strong> different local<br />

regions of chromosomal DNA or different environmental<br />

stresses.<br />

Despite <strong>the</strong> complexity of <strong>the</strong> <strong>CRISPR</strong>-based <strong>immune</strong><br />

<strong>system</strong>s present <strong>in</strong> <strong>the</strong> genome, <strong>the</strong>y appear to be, at best,<br />

only partially functional. Thus, <strong>the</strong> family II <strong>CRISPR</strong>/Cas<br />

<strong>system</strong> is coupled with an archaeal family D Cmr module<br />

<strong>in</strong> A. hospitalis, but is apparently defective, reta<strong>in</strong><strong>in</strong>g only<br />

its putative RNA, but not DNA, target<strong>in</strong>g function. The<br />

<strong>system</strong> lacks <strong>the</strong> group 2 cas genes (cas3, cas5, csa2, csa5,<br />

csaX) which encode prote<strong>in</strong>s implicated <strong>in</strong> target<strong>in</strong>g and<br />

<strong>in</strong>activat<strong>in</strong>g foreign DNA elements (Fig. 5). However, <strong>the</strong><br />

cas group 1 genes (cas1, cas2, cas4, csa1), putatively<br />

<strong>in</strong>volved <strong>in</strong> <strong>in</strong>tegrat<strong>in</strong>g new spacers from <strong>in</strong>vad<strong>in</strong>g DNA<br />

elements, are present, as is <strong>the</strong> Cmr module implicated <strong>in</strong><br />

RNA target<strong>in</strong>g (Garrett et al. 2011; Shah<br />

et al. 2011). The family I <strong>system</strong> exhibits small <strong>CRISPR</strong><br />

loci, with <strong>in</strong>tact leader regions and group 2 cas genes.<br />

However, <strong>the</strong> cas2 gene <strong>in</strong> <strong>the</strong> group 1 cas gene cassette is<br />

truncated, hav<strong>in</strong>g <strong>in</strong>curred a po<strong>in</strong>t mutation which produces<br />

a premature stop codon. Thus, this <strong>system</strong> has apparently<br />

lost <strong>the</strong> ability to <strong>in</strong>tegrate new spacers. This suggests that<br />

nei<strong>the</strong>r <strong>CRISPR</strong>-based <strong>system</strong> is fully functional, despite<br />



496 Extremophiles (2011) 15:487–497<br />

<strong>the</strong>ir apparent complexity. The presence of five vapBC<br />

gene pairs located ei<strong>the</strong>r with<strong>in</strong> <strong>the</strong> cmr and cas gene<br />

cassettes of <strong>the</strong> family II <strong>CRISPR</strong>/Cas module, or immediately<br />

upstream from <strong>the</strong> modules of both families, may<br />

reflect that <strong>the</strong>y help to ma<strong>in</strong>ta<strong>in</strong> <strong>the</strong>se gene cassettes on <strong>the</strong><br />

chromosome (see above).<br />

Although a range of genetic <strong>system</strong>s have been developed<br />

for Sulfolobus species, at present no genetic <strong>system</strong>s<br />

are available for <strong>the</strong> Acidianus genus and A. hospitalis<br />

provides a promis<strong>in</strong>g candidate for such studies. It has a<br />

m<strong>in</strong>imal size and <strong>the</strong> relative stability of its chromosome<br />

suggests that it is likely to generate stable deletion mutants.<br />

This, comb<strong>in</strong>ed with its ability to host different plasmids<br />

and viruses, provides a promis<strong>in</strong>g start<strong>in</strong>g po<strong>in</strong>t for develop<strong>in</strong>g<br />

a genetic <strong>system</strong>.<br />

Acknowledgments We thank Mery P<strong>in</strong>a and Tamara Basta for help<br />

with <strong>the</strong> DNA preparation. The work was supported by <strong>the</strong> National<br />

Natural Science Foundation of Ch<strong>in</strong>a (30621005) and <strong>the</strong> M<strong>in</strong>istry of<br />

Science and Technology (2010CB630903), and by <strong>the</strong> Danish Natural<br />

Science Research Council (Grant no. 272-08-0391) and Danish<br />

National Research Foundation.<br />

Open Access This article is distributed under <strong>the</strong> terms of <strong>the</strong><br />

Creative Commons Attribution Noncommercial License which permits<br />

any noncommercial use, distribution, and reproduction <strong>in</strong> any<br />

medium, provided <strong>the</strong> orig<strong>in</strong>al author(s) and source are credited.<br />

References<br />

Arcus VL, McKenzie JL, Robson J, Cook GM (2011) The PIN-doma<strong>in</strong><br />

ribonucleases and <strong>the</strong> prokaryotic VapBC tox<strong>in</strong>–antitox<strong>in</strong><br />

array. Prot Eng<strong>in</strong> Design Select 24:33–40<br />

Basta T, Smyth J, Forterre P, Prangishvili D, Peng X (2009) Novel<br />

archaeal plasmid pAH1 and its <strong>in</strong>teractions with <strong>the</strong> lipothrixvirus<br />

AFV1. Mol Microbiol 71:23–34<br />

Bettstetter M, Peng X, Garrett RA, Prangishvili D (2003) AFV1, a<br />

novel virus <strong>in</strong>fect<strong>in</strong>g hyper<strong>the</strong>rmophilic archaea of <strong>the</strong> genus<br />

Acidianus. Virology 315:68–79<br />

Blount ZD, Grogan DW (2005) New <strong>in</strong>sertion sequences of<br />

Sulfolobus: functional properties and implications for genome<br />

evolution <strong>in</strong> hyper<strong>the</strong>rmophilic archaea. Mol Microbiol 55:312–<br />

325<br />

Chen Z-W, Jiang C-Y, She Q, Liu S-J, Zhou P-J (2005a) Key role of<br />

cyste<strong>in</strong>e residues <strong>in</strong> catalysis and subcellular localization of<br />

sulfur oxygenase reductase of Acidianus tengchongensis. Appl<br />

Environ Microbiol 71:621–628<br />

Chen L, Brügger K, Skovgaard M, Redder P, She Q, Torar<strong>in</strong>sson E,<br />

Greve B, Awayez M, Zibat A, Klenk HP, Garrett RA (2005b)<br />

The genome of Sulfolobus acidocaldarius, a model organism of<br />

<strong>the</strong> Crenarchaeota. J Bacteriol 187:4992–4999<br />

Cobucci-Ponzano B, Guzz<strong>in</strong>i L, Benelli D, Londei P, Perrodou E,<br />

Lecompte O, Tran D, Sun J, Wei J, Mathur EJ, Rossi M, Moracci<br />

M (2010) Functional characterisation and high-throughput<br />

proteomic analysis of <strong>in</strong>terrupted genes <strong>in</strong> <strong>the</strong> archaeon Sulfolobus<br />

solfataricus. J Proteome Res 9:2496–2507<br />

Feschotte C, Pritham EJ (2007) DNA transposons and <strong>the</strong> evolution<br />

of eukaryotic genomes. Annu Rev Genet 41:331–368<br />

Filée J, Siguier P, Chandler M (2007) Insertion sequence diversity <strong>in</strong><br />

archaea. Microbiol Mol Biol Revs 71:121–157<br />

Garrett RA, Shah SA, Vestergaard G, Deng L, Gudbergsdottir S,<br />

Kenchappa CS, Erdmann S, She Q (2011) <strong>CRISPR</strong>-based<br />

<strong>immune</strong> <strong>system</strong>s of <strong>the</strong> Sulfolobales—complexity and diversity.<br />

Biochem Soc Trans 39:51–57<br />

Gerdes K (2000) Tox<strong>in</strong>-antitox<strong>in</strong> modules may regulate synthesis<br />

of macromolecules dur<strong>in</strong>g nutritional stress. J Bacteriol 182:561–<br />

572<br />

Goulet A, Blangy S, Redder P, Prangishvili D, Felisberto-Rodrigues<br />

C, Forterre P, Campanacci V, Cambillau C (2009) Acidianus<br />

filamentous virus 1 coat prote<strong>in</strong>s display a helical fold spann<strong>in</strong>g<br />

<strong>the</strong> filamentous archaeal viruses l<strong>in</strong>eage. Proc Natl Acad Sci<br />

USA 106:21155–21160<br />

Greve B, Jensen S, Brügger K, Zillig W, Garrett RA (2004) Genomic<br />

comparison of archaeal conjugative plasmids from Sulfolobus.<br />

<strong>Archaea</strong> 1:231–239<br />

Grogan DW (1989) Phenotypic characterization of <strong>the</strong> archaebacterial<br />

genus Sulfolobus: comparison of five wild-type stra<strong>in</strong>s. J Bacteriol<br />

171:6710–6719<br />

Guo L, Brügger K, Liu C, Shah SA, Zheng H, Zhu Y, Wang S,<br />

Lillestøl RK, Chen L, Frank J, Prangishvili D, Paul<strong>in</strong> L, She Q,<br />

Huang L, Garrett RA (2011) Genome analyses of Icelandic<br />

stra<strong>in</strong>s of Sulfolobus islandicus: model organisms for genetic and<br />

virus-host <strong>in</strong>teraction studies. J Bacteriol 193:1672–1680<br />

He Z-G, Zhong H, Li Y (2004) Acidianus tengchongensis sp. nov., a<br />

new species of acido<strong>the</strong>rmophilic archaeon isolated from an<br />

acido<strong>the</strong>rmal spr<strong>in</strong>g. Curr Microbiol 48:156–193<br />

Kletz<strong>in</strong> A (1989) Coupled enzymatic production of sulfite, thiosulfate,<br />

and hydrogen sulfide from sulfur: purification and properties of a<br />

sulfur oxygenase reductase from <strong>the</strong> facultatively anaerobic<br />

archaebacterium Desulfurolobus ambivalens. J Bacteriol 171:1638–<br />

1643<br />

Kletz<strong>in</strong> A (1992) Molecular characterization of <strong>the</strong> sor gene, which<br />

encodes <strong>the</strong> sulfur oxygenase/reductase of <strong>the</strong> <strong>the</strong>rmoacidophilic<br />

Archaeum Desulfurolobus ambivalens. J Bacteriol 174:5854–<br />

5859<br />

Kletz<strong>in</strong> A (2007) Oxidation of sulfur and <strong>in</strong>organic sulfur compounds<br />

<strong>in</strong> Acidianus ambivalens. In: Dahl C, Friedrich CG (eds)<br />

Microbial sulfur metabolism. Spr<strong>in</strong>ger, Heidelberg, pp 184–199<br />

Kletz<strong>in</strong> A, Lieke A, Urich T, Charlebois RL, Sensen CW (1999)<br />

Molecular analysis of pDL10 from Acidianus ambivalens reveals<br />

a family of related plasmids from extremely <strong>the</strong>rmophilic and<br />

acidophilic archaea. Genetics 152:1307–1314<br />

Lawrence CM, Menon S, Eilers BJ, Bothner B, Khayat R, Douglas T,<br />

Young MJ (2009) Structural and functional studies of archaeal<br />

viruses. J Biol Chem 284:12599–12603<br />

Lillestøl RK, Shah SA, Brügger K, Redder P, Phan H, Christiansen J,<br />

Garrett RA (2009) <strong>CRISPR</strong> families of <strong>the</strong> crenarchaeal genus<br />

Sulfolobus: bidirectional transcription and dynamic properties.<br />

Mol Microbiol 72:259–272<br />

Lowe TM, Eddy SR (1997) tRNAscan-SE: a program for improved<br />

detection of transfer RNA genes <strong>in</strong> genomic sequence. Nucleic<br />

Acids Res 25:955–964<br />

Lundgren M, Andersson A, Chen L, Nilsson P, Bernander R (2004)<br />

Three replication orig<strong>in</strong>s <strong>in</strong> Sulfolobus species: synchronous<br />

<strong>in</strong>itiation of chromosome replication and asynchronous term<strong>in</strong>ation.<br />

Proc Natl Acad Sci USA 101:7046–7051<br />

Magnuson RD (2007) Hypo<strong>the</strong>tical functions of tox<strong>in</strong>–antitox<strong>in</strong><br />

<strong>system</strong>s. J Bacteriol 189:6089–6092<br />

Melderen LV (2010) Tox<strong>in</strong>-antitox<strong>in</strong> <strong>system</strong>s: why so many, what<br />

for? Curr Op<strong>in</strong> Microbiol 13:781–785<br />

Muller S, Urban A, Hecker A, Leclerc A, Branlant C, Motor<strong>in</strong> Y<br />

(2009) Deficiency of <strong>the</strong> tRNATyr:Ψ35-synthase aPus7 <strong>in</strong><br />

archaea of <strong>the</strong> Sulfolobales order might be rescued by <strong>the</strong>


H/ACA sRNA-guided mach<strong>in</strong>ery. Nucleic Acids Res 37:1308–<br />

1322<br />

Muskhelishvili G, Palm P, Zillig W (1993) SSV1-encoded site-specific<br />

recomb<strong>in</strong>ation <strong>system</strong> <strong>in</strong> Sulfolobus shibatae. Mol Gen<br />

Genet 237:334–342<br />

Neubauer C, Gao YG, Andersen KR, Dunham CM, Kelley AC,<br />

Hentschel J, Gerdes K, Ramakrishnan V, Brodersen DE (2009)<br />

The structural basis for mRNA recognition and cleavage by <strong>the</strong><br />

ribosome-dependent endonuclease RelE. Cell 139:1084–1095<br />

Omer AD, Zago M, Chang A, Dennis PP (2006) Prob<strong>in</strong>g <strong>the</strong> structure<br />

and function of an archaeal C/D-box methylation guide sRNA.<br />

RNA 12:1708–1720<br />

Pandey DP, Gerdes K (2005) Tox<strong>in</strong>-antitox<strong>in</strong> loci are highly abundant<br />

<strong>in</strong> free-liv<strong>in</strong>g but lost from host-associated prokaryotes. Nucleic<br />

Acids Res 33:966–976<br />

Plumb JJ, Haddad CM, Gibson JAE, Franzmann PD (2007) Acidianus<br />

sulfidivorans sp nov., an extremely acidophilic, <strong>the</strong>rmophilic<br />

archaeon isolated from a solfatara on Lihir Island, Papua New<br />

Gu<strong>in</strong>ea, and emendation of <strong>the</strong> genus description. Int J Syst Evol<br />

Microbiol 57:1418–1423<br />

Prangishvili D, Albers SV, Holz I, Arnold HP, Stedman K, Kle<strong>in</strong> T,<br />

S<strong>in</strong>gh H, Hiort J, Schweier A, Kristjansson JK, Zillig W (1998)<br />

Conjugation <strong>in</strong> archaea: frequent occurrence of conjugative<br />

plasmids <strong>in</strong> Sulfolobus. Plasmid 40:190–202<br />

Prangishvili D, Forterre P, Garrett RA (2006) Viruses of <strong>the</strong> <strong>Archaea</strong>:<br />

a unify<strong>in</strong>g view. Nat Rev Microbiol 4:837–848<br />

Rachel R, Bettstetter M, Hedlund BP, Här<strong>in</strong>g M, Kessler A, Stetter<br />

KO, Prangishvili D (2002) Arch Virol 147:2419–2429<br />

Redder P, Garrett RA (2006) Mutations and rearrangements <strong>in</strong> <strong>the</strong><br />

genome of Sulfolobus solfataricus P2. J Bacteriol 188:4198–4206<br />

Redder P, She Q, Garrett RA (2001) Non-autonomous elements <strong>in</strong> <strong>the</strong><br />

crenarchaeon Sulfolobus solfataricus. J Mol Biol 306:1–6<br />

Redder P, Peng X, Brügger K, Shah SA, Roesch F, Greve B, She Q,<br />

Schleper C, Forterre P, Garrett RA, Prangishvili D (2009) Four<br />

newly isolated fuselloviruses from extreme geo<strong>the</strong>rmal environments<br />

reveal unusual morphologies and a possible <strong>in</strong>terviral<br />

recomb<strong>in</strong>ation mechanism. Environ Microbiol 11:2849–2862<br />

Reno ML, Held NL, Fields CJ, Burke PV, Whitaker RJ (2009)<br />

Sulfolobus islandicus pan-genome. Proc Natl Acad Sci USA<br />

106:8605–8610<br />

Rob<strong>in</strong>son NP, Bell SD (2007) Extrachromosomal element capture and<br />

<strong>the</strong> evolution of multiple replication orig<strong>in</strong>s <strong>in</strong> archaeal<br />

chromosomes. Proc Natl Acad Sci USA 104:5806–5811<br />

Rob<strong>in</strong>son NP, Dionne I, Lundgren M, Marsh VL, Bernander R, Bell<br />

SD (2004) Identification of two orig<strong>in</strong>s of replication <strong>in</strong> <strong>the</strong><br />

s<strong>in</strong>gle chromosome of <strong>the</strong> archaeon Sulfolobus solfataricus. Cell<br />

116:25–38<br />

Ru<strong>the</strong>rford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream<br />

MA, Barrell B (2000) Artemis: sequence visualization and<br />

annotation. Bio<strong>in</strong>formatics 16:944–945<br />

Segerer A, Neuner A, Kristjansson JK, Stetter KO (1986) Acidianus<br />

<strong>in</strong>fernus gen. nov., sp. nov., and Acidianus brierleyi comb. nov.:<br />

facultatively aerobic, extremely acidophilic <strong>the</strong>rmophilic sulfur-metaboliz<strong>in</strong>g<br />

archaebacteria. Int J Syst Bacteriol 36:559–564<br />

Shah SA, Garrett RA (2011) <strong>CRISPR</strong>/Cas and Cmr modules, mobility<br />

and evolution of adaptive <strong>immune</strong> <strong>system</strong>s. Res Microbiol<br />

162:27–38<br />

Shah SA, Hansen NR, Garrett RA (2009) Distributions of <strong>CRISPR</strong><br />

spacer matches <strong>in</strong> viruses and plasmids of crenarchaeal acido<strong>the</strong>rmophiles<br />

and implications for <strong>the</strong>ir <strong>in</strong>hibitory mechanism.<br />

Biochem Soc Trans 37:23–28<br />

Shah SA, Vestergaard G, Garrett RA (2011) <strong>CRISPR</strong>/Cas and<br />

<strong>CRISPR</strong>/Cmr <strong>immune</strong> <strong>system</strong>s of archaea. In: Marchfelder A,<br />

Hess W (eds) Regulatory RNAs <strong>in</strong> prokaryotes. Spr<strong>in</strong>ger, Berl<strong>in</strong><br />

She Q, Phan H, Garrett RA, Albers S-V, Stedman KM, Zillig W<br />

(1998) Genetic profile of pNOB8 from Sulfolobus: <strong>the</strong> first<br />

conjugative plasmid from an archaeon. Extremophiles 2:417–<br />

425<br />

Stedman KM, She Q, Phan H, Holz I, S<strong>in</strong>gh H, Prangishvili D, Garrett<br />

RA, Zillig W (2000) The pING family of conjugative plasmids<br />

from <strong>the</strong> extremely <strong>the</strong>rmophilic archaeon Sulfolobus islandicus:<br />

<strong>in</strong>sights <strong>in</strong>to recomb<strong>in</strong>ation and conjugation <strong>in</strong> Crenarchaeota.<br />

J Bacteriol 182:7014–7020<br />

Sun CW, Chen ZW, He ZG, Zhou PJ, Liu SJ (2003) Purification and<br />

properties of <strong>the</strong> sulphur oxygenase/reductase from <strong>the</strong> acido<strong>the</strong>rmophilic<br />

archaeon, Acidianus stra<strong>in</strong> S5. Extremophiles<br />

7:131–134<br />

Tang TH, Polacek N, Zywicki M, Huber H, Brügger K, Garrett R,<br />

Bachellerie JP, Hüttenhofer A (2005) Identification of novel<br />

non-cod<strong>in</strong>g RNAs as potential antisense regulators <strong>in</strong> <strong>the</strong><br />

archaeon Sulfolobus solfataricus. Mol Microbiol 55:469–481<br />

Ton-Hoang B, Pasternak C, Siguier P, Guynet C, Hickman AB, Dyda<br />

F, Sommer S, Chandler M (2010) S<strong>in</strong>gle-stranded DNA<br />

transposition is coupled to host replication. Cell 142:398–408<br />

Torar<strong>in</strong>sson E, Klenk H-P, Garrett RA (2005) Divergent transcriptional<br />

and translational signals <strong>in</strong> <strong>Archaea</strong>. Environ Microbiol<br />

7:47–54<br />

Wilbur JS, Chivers PT, Mattison K, Potter L, Brennan RG, So M<br />

(2005) Neisseria gonorrheae FitA <strong>in</strong>teracts with FitB to b<strong>in</strong>d<br />

DNA through its ribbon–helix–helix motif. Biochemistry 44:<br />

12515–12524<br />

Wurtzel O, Sapra R, Chen F, Zhu ZY, Simmons BA, Sorek R (2010)<br />

A s<strong>in</strong>gle-base resolution map of an archaeal transcriptome.<br />

Genome Res 20:133–141<br />

Yokobori S, Itoh T, Yosh<strong>in</strong>ari S, Nomura N, Sako Y, Yamagishi A,<br />

Oshima T, Kita K, Watanabe Y (2009) Ga<strong>in</strong> and loss of an <strong>in</strong>tron<br />

<strong>in</strong> a prote<strong>in</strong>-cod<strong>in</strong>g gene <strong>in</strong> <strong>Archaea</strong>: <strong>the</strong> case of an archaeal<br />

RNA pseudourid<strong>in</strong>e synthase gene. BMC Evol Biol 9:198<br />

Yoshida N, Nakasato M, Ohmura N, Ando A, Saolo J, Ishii M,<br />

Igarashi Y (2006) Acidianus manzaensis sp. nov., a novel<br />

<strong>the</strong>rmoacidophilic Archaeon grow<strong>in</strong>g autotrophically by <strong>the</strong><br />

oxidation of H2 with <strong>the</strong> reduction of Fe3+. Curr Microbiol<br />

53:406–411<br />

Zhang R, Zhang CT (2003) Multiple replication orig<strong>in</strong>s of <strong>the</strong><br />

archaeon Halobacterium species NRC-1. Biochem Biophys Res<br />

Comm 302:728–734<br />


Review<br />

<strong>Archaea</strong>l <strong>CRISPR</strong>-based <strong>immune</strong><br />

<strong>system</strong>s: exchangeable functional<br />

modules<br />

Roger A. Garrett, Gisle Vestergaard and Shiraz A. Shah<br />

<strong>Archaea</strong> Centre, Department of Biology, Ole Maaløes Vej 5, University of Copenhagen, DK2200 Copenhagen N, Denmark<br />

<strong>CRISPR</strong> (clustered regularly <strong>in</strong>terspaced short pal<strong>in</strong>dromic<br />

repeats)-based <strong>immune</strong> <strong>system</strong>s are essentially<br />

modular with three primary functions: <strong>the</strong> excision and<br />

<strong>in</strong>tegration of new spacers, <strong>the</strong> process<strong>in</strong>g of <strong>CRISPR</strong><br />

transcripts to yield mature <strong>CRISPR</strong> RNAs (crRNAs), and<br />

<strong>the</strong> target<strong>in</strong>g and cleavage of foreign nucleic acid. The<br />

primary target appears to be <strong>the</strong> DNA of foreign genetic<br />

elements, but <strong>the</strong> <strong>CRISPR</strong>/Cmr <strong>system</strong> that is widespread<br />

amongst archaea also specifically targets and<br />

cleaves RNA <strong>in</strong> vitro. The archaeal <strong>CRISPR</strong> <strong>system</strong>s tend<br />

to be both diverse and complex. Here we exam<strong>in</strong>e evidence<br />

for exchange of functional modules between archaeal<br />

<strong>system</strong>s that is likely to contribute to <strong>the</strong>ir<br />

diversity, particularly of <strong>the</strong>ir nucleic acid target<strong>in</strong>g<br />

and cleavage functions. The molecular constra<strong>in</strong>ts that<br />

limit such exchange are considered. We also summarize<br />

mechanisms underly<strong>in</strong>g <strong>the</strong> dynamic nature of <strong>CRISPR</strong><br />

loci and <strong>the</strong> evidence for <strong>in</strong>tergenomic exchange of<br />

<strong>CRISPR</strong> <strong>system</strong>s.<br />

<strong>Archaea</strong> and <strong>CRISPR</strong> immunity<br />

The early evolutionary history of archaea rema<strong>in</strong>s unresolved.<br />

<strong>Archaea</strong> could have descended directly from a<br />

universal common ancestor, undergone a shared period<br />

of descent with eukarya, or have been streaml<strong>in</strong>ed from a<br />

more complex (and eukaryal-like) ancestor [1,2]. Although<br />

many cellular processes of archaea and eukarya share<br />

common features that are absent from bacteria [1], <strong>the</strong><br />

uniqueness of archaea appears to lie <strong>in</strong> <strong>the</strong>ir successful<br />

adaptation to extreme environmental conditions <strong>in</strong>clud<strong>in</strong>g<br />

high temperature, extremes of pH, high salt, high pressures,<br />

and strictly anaerobic conditions. These environments<br />

tend to be low <strong>in</strong> sources of energy consistent with<br />

<strong>the</strong> hypo<strong>the</strong>sis that some unique archaeal properties were<br />

ma<strong>in</strong>ta<strong>in</strong>ed through adaptation to chronic energy stress<br />

via, for example, <strong>the</strong>ir catabolic pathways and mechanisms<br />

of energy conservation facilitated by low permeability<br />

e<strong>the</strong>r-l<strong>in</strong>ked lipid membranes [3].<br />

This exceptional biology is reflected <strong>in</strong> <strong>the</strong> properties of<br />

<strong>the</strong> archaeal viruses. Most of those characterized, especially<br />

from extreme <strong>the</strong>rmophilic and halophilic environments,<br />

show morphotypes and genomic properties dist<strong>in</strong>ct from<br />

viruses of bacteria and eukarya [4–6]. There are also<br />

prelim<strong>in</strong>ary <strong>in</strong>dications that levels of free viruses, at least<br />

Correspond<strong>in</strong>g author: Garrett, R.A. (garrett@bio.ku.dk).<br />

<strong>in</strong> extreme <strong>the</strong>rmoacidophilic environments, tend to be low<br />

relative to cellular levels, suggest<strong>in</strong>g that <strong>the</strong>se viruses<br />

prefer to rema<strong>in</strong> ‘<strong>in</strong>side’ cells [7]. Moreover, archaeal<br />

viruses generally exist <strong>in</strong> stable relationships with <strong>the</strong>ir<br />

hosts at low copy-numbers and rarely cause cell lysis<br />

[4,6,8,9].<br />

<strong>CRISPR</strong> <strong>system</strong>s (Box 1) provide immunity aga<strong>in</strong>st<br />

<strong>in</strong>vasion by viruses and conjugative plasmids, are present<br />

<strong>in</strong> most studied archaea and <strong>in</strong> about 40% of bacteria, and<br />

have a common evolutionary orig<strong>in</strong> [10,11]. The <strong>CRISPR</strong><br />

<strong>system</strong>s <strong>in</strong> many archaea are unusual <strong>in</strong> that <strong>the</strong>y tend to<br />

be both diverse and complex, suggest<strong>in</strong>g that <strong>the</strong>y have <strong>the</strong><br />

potential to be more versatile functionally and with more<br />

possibilities for regulation than <strong>in</strong> many bacteria [11,12].<br />

Given <strong>the</strong> tendency of many archaeal viruses and conjugative<br />

plasmids to ma<strong>in</strong>ta<strong>in</strong> stable relationships with <strong>the</strong>ir<br />

hosts, and to avoid target<strong>in</strong>g by <strong>the</strong> <strong>CRISPR</strong> <strong>system</strong>,<br />

different regulatory <strong>system</strong>s might play an important role<br />

[4,6,13]. For example, <strong>the</strong> <strong>immune</strong> response may only be<br />

activated at certa<strong>in</strong> levels of viral DNA replication or<br />

transcription.<br />

All <strong>CRISPR</strong> <strong>system</strong>s have three basic functions. First,<br />

<strong>the</strong> excision of protospacer DNA from <strong>in</strong>vad<strong>in</strong>g genetic<br />

elements and <strong>in</strong>sertion <strong>in</strong>to <strong>CRISPR</strong> loci, a process termed<br />

adaptation. Second, transcripts from complete <strong>CRISPR</strong><br />

loci are processed to yield crRNAs that are <strong>the</strong>n assembled<br />

<strong>in</strong>to prote<strong>in</strong> complexes. Third, <strong>the</strong>se complexes target and<br />

cleave <strong>the</strong> DNA or RNA of <strong>in</strong>vad<strong>in</strong>g genetic elements,<br />

termed <strong>in</strong>terference. These steps are illustrated <strong>in</strong><br />

Figure 1 and <strong>the</strong> ma<strong>in</strong> components are def<strong>in</strong>ed <strong>in</strong> Box 1.<br />
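The three steps can be made concrete with a minimal toy model in Python. This is only an illustrative sketch: the class name, method names and sequences are hypothetical, an exact substring match stands in for crRNA base pairing, and protein components, strand orientation and repeat-derived crRNA ends are ignored.<br />

```python
from dataclasses import dataclass, field


@dataclass
class ToyCrisprLocus:
    """Hypothetical toy model; spacers[0] is the leader-proximal spacer."""

    repeat: str
    spacers: list[str] = field(default_factory=list)

    def adapt(self, invader: str, start: int, length: int) -> str:
        """Adaptation: excise a protospacer and insert it next to the leader."""
        protospacer = invader[start:start + length]
        self.spacers.insert(0, protospacer)
        return protospacer

    def process(self) -> list[str]:
        """Processing: one guide per spacer (repeat-derived ends omitted)."""
        return list(self.spacers)

    def interferes_with(self, invader: str) -> bool:
        """Interference: substring match as a stand-in for base pairing."""
        return any(guide in invader for guide in self.process())


# Made-up sequences, for illustration only.
virus = "TTGACCATGCAGTCAGTACCGTTAGCATTTCAGGCTAACGTGCATTCCGAGTAGCTTAGG"
locus = ToyCrisprLocus(repeat="GATTAATCCCAAAAGGAATTGAAAG")

print(locus.interferes_with(virus))  # False: no spacer acquired yet
locus.adapt(virus, start=5, length=38)
print(locus.interferes_with(virus))  # True: a matching guide now exists
```

In this sketch immunity exists only after adaptation has stored a matching spacer, which is the dependency between the steps that the text describes.<br />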

<strong>CRISPR</strong>-based <strong>system</strong>s have recently been reclassified<br />

<strong>in</strong>to three ma<strong>in</strong> types, of which only types I and III occur<br />

<strong>in</strong> archaea (Box 2). Prote<strong>in</strong> components of <strong>CRISPR</strong> <strong>system</strong>s<br />

are manifold and highly diverse. Several core prote<strong>in</strong><br />

functions have been predicted from sequence analyses or<br />

crystal structures [14,15] but with few exceptions <strong>the</strong>ir<br />

detailed mechanistic roles rema<strong>in</strong> to be determ<strong>in</strong>ed experimentally<br />

(Box 1). Similarities of essential components and<br />

core mechanisms of archaeal and bacterial <strong>CRISPR</strong> <strong>system</strong>s<br />

are consistent with <strong>the</strong>ir hav<strong>in</strong>g a common evolutionary<br />

orig<strong>in</strong> [10,11].<br />

Attempts to classify <strong>CRISPR</strong> <strong>system</strong>s phylogenetically<br />

have previously <strong>in</strong>volved sequence alignments of <strong>the</strong> most<br />

conserved Cas1 prote<strong>in</strong> [14,16]. This prote<strong>in</strong> is almost<br />

ubiquitous and is associated with <strong>the</strong> adaptation step<br />

(Figure 1). Phylogenetic studies on crenarchaeal <strong>system</strong>s<br />

0966-842X/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.tim.2011.08.002 Trends <strong>in</strong> Microbiology, November 2011, Vol. 19, No. 11 549


Box 1. Core components of <strong>CRISPR</strong> <strong>system</strong>s<br />

Here we summarize <strong>the</strong> ma<strong>in</strong> components of <strong>CRISPR</strong> <strong>system</strong>s.<br />

Leaders: all active <strong>CRISPR</strong> loci to date are preceded by a leader of<br />

about 300–400 bp, carry<strong>in</strong>g some low complexity sequence and<br />

conserved regions, that is likely to be <strong>in</strong>volved <strong>in</strong> <strong>the</strong> adaptation step<br />

at or near <strong>the</strong> first repeat [16,17]. The <strong>CRISPR</strong> proximal region of <strong>the</strong><br />

leader also carries <strong>the</strong> ma<strong>in</strong> promoter for <strong>CRISPR</strong> transcription [16].<br />

<strong>CRISPR</strong> loci: <strong>the</strong>se consist of arrays of identical direct repeats of 24–<br />

37 bp <strong>in</strong> size and, <strong>in</strong> archaea, often conta<strong>in</strong> up to 100 repeat units.<br />

These are <strong>in</strong>terspaced with similarly sized spacers (35–44 bp) carry<strong>in</strong>g<br />

unique sequences that derive from <strong>in</strong>vad<strong>in</strong>g DNA genetic elements.<br />

They are dynamic structures that undergo loss and exchange of<br />

spacer-repeat units, probably via recomb<strong>in</strong>ation events at repeats<br />

[16,35]. Thus <strong>the</strong>y provide a record, albeit <strong>in</strong>complete, of previous<br />

<strong>in</strong>vad<strong>in</strong>g genetic elements, although if <strong>CRISPR</strong> loci have recently<br />

exchanged between related organisms, as occurs for S. islandicus<br />

[17], <strong>the</strong> record will be erroneous. There is currently no evidence to<br />

<strong>in</strong>dicate whe<strong>the</strong>r spacers can orig<strong>in</strong>ate from RNA viruses.<br />
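As a rough illustration of the repeat-spacer organisation described above, the following Python sketch recovers the spacers from an array when the repeat sequence is already known. The function name and all sequences are hypothetical, and the approach assumes perfectly identical repeats, which real loci do not always show.<br />

```python
def split_array(array: str, repeat: str) -> list[str]:
    """Return the spacers of an array that starts and ends with the repeat."""
    parts = array.split(repeat)
    # str.split leaves an empty string before the first repeat and after the
    # last one when the array is bounded by repeats on both sides.
    if len(parts) < 2 or parts[0] != "" or parts[-1] != "":
        raise ValueError("array must start and end with the repeat")
    return parts[1:-1]


# Made-up sequences: a 25 bp repeat and spacers of 36 and 39 bp.
repeat = "GATTAATCCCAAAAGGAATTGAAAG"
spacer_1 = "ACGTACGTTTGACCATGCAGTCAGTACCGTTAGCAT"
spacer_2 = "TTCAGGCTAACGTGCATTCCGAGTAGCTTAGGCACTAAC"
array = repeat + spacer_1 + repeat + spacer_2 + repeat

for spacer in split_array(array, repeat):
    print(len(spacer), spacer)
```

Each recovered spacer could then be compared against sequenced viruses and plasmids, bearing in mind that the resulting record of past invaders is incomplete.<br />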

Protospacer: a segment of <strong>the</strong> <strong>in</strong>vad<strong>in</strong>g DNA genetic element that is<br />

<strong>in</strong>corporated <strong>in</strong>to a <strong>CRISPR</strong> locus at or near <strong>the</strong> first repeat, and <strong>in</strong> a<br />

direction predeterm<strong>in</strong>ed by <strong>the</strong> location of <strong>the</strong> adjacent protospacer-associated<br />

motif.<br />

Protospacer-associated motif (PAM): this motif is essential for <strong>the</strong><br />

<strong>immune</strong> response [19]. It corresponds to a short sequence, positioned at<br />

approximately –2 to –4 bp from <strong>the</strong> end of <strong>the</strong> protospacer that becomes<br />

leader-proximal on <strong>in</strong>sertion <strong>in</strong>to a <strong>CRISPR</strong> locus. This suggests that <strong>the</strong><br />

base-paired motif <strong>in</strong>fluences protospacer selection from genetic<br />

elements [16,19,31]. Ano<strong>the</strong>r proposed function of <strong>the</strong> PAM motif is<br />

that it ensures <strong>the</strong> presence of mismatched base-pairs between 5′ ends<br />

of crRNAs and targeted DNA as a prerequisite for avoid<strong>in</strong>g self-<strong>in</strong>terference<br />

of <strong>CRISPR</strong> loci [46]. The PAM motif may also play a more<br />

specific role <strong>in</strong> DNA <strong>in</strong>terference, although how it is recognized and <strong>the</strong><br />

degree of PAM sequence str<strong>in</strong>gency required rema<strong>in</strong> unknown [35,41].<br />

provided evidence for coevolution of Cas1 prote<strong>in</strong> and <strong>the</strong><br />

leader and repeat sequences, strongly suggest<strong>in</strong>g that<br />

<strong>the</strong>se structural components are functionally <strong>in</strong>terdependent<br />

<strong>in</strong> adaptation [16,17]. However, when attempts were<br />

made to extend <strong>the</strong>se analyses to conserved crenarchaeal<br />

<strong>CRISPR</strong> components implicated <strong>in</strong> RNA process<strong>in</strong>g or<br />

nucleic acid <strong>in</strong>terference, divergent trees were obta<strong>in</strong>ed,<br />

suggest<strong>in</strong>g that <strong>CRISPR</strong> <strong>system</strong>s are non-<strong>in</strong>tegral and that<br />

modular exchange can occur [17,18].<br />

[Figure labels: DNA; Virus; Plasmid; aCas complex; DNA excision; Adaptation; Leader; New spacer]<br />

crRNAs: <strong>the</strong> f<strong>in</strong>al products of process<strong>in</strong>g of pre-<strong>CRISPR</strong> RNAs, many<br />

of which exhibit short <strong>in</strong>verted repeats [58]. They are produced for<br />

DNA target<strong>in</strong>g by <strong>in</strong>troduc<strong>in</strong>g s<strong>in</strong>gle cuts <strong>in</strong> adjacent repeats, and<br />

each crRNA carries a spacer sequence flanked by repeat sequence<br />

fragments [20,32]. In Cmr-based RNA target<strong>in</strong>g, crRNAs are fur<strong>the</strong>r<br />

processed at <strong>the</strong> 3′ end by an unknown enzyme [21,22]. crRNA<br />

complexes with Cas, Csm or Cmr prote<strong>in</strong>s target <strong>in</strong>vad<strong>in</strong>g nucleic<br />

acids by base-pair<strong>in</strong>g to highly similar sequences, where perfect<br />

match<strong>in</strong>g of <strong>the</strong> 5′ term<strong>in</strong>al spacer sequence of <strong>the</strong> crRNA can be<br />

especially important for DNA target<strong>in</strong>g [35,40,43].<br />

<strong>CRISPR</strong>-associated prote<strong>in</strong>s (Cas): although many functions have<br />

been predicted bio<strong>in</strong>formatically for core Cas prote<strong>in</strong>s, few have been<br />

tested experimentally [14,15]. Cas1 and Cas2 are universally <strong>in</strong>volved<br />

<strong>in</strong> adaptation and <strong>the</strong> prote<strong>in</strong>s exhibit metal-dependent DNA and RNA<br />

endonuclease activity, respectively [59,60]. Cas4 carries a predicted<br />

RecB nuclease doma<strong>in</strong> and is sometimes fused to Cas1, and is<br />

<strong>the</strong>reby implicated <strong>in</strong> adaptation. DNA <strong>in</strong>terference by <strong>the</strong> <strong>CRISPR</strong>/Cas<br />

<strong>system</strong> requires at least three core prote<strong>in</strong>s (Cas5, Cas7, and Cas3),<br />

which carry helicase and s<strong>in</strong>gle-stranded DNA nuclease activities and<br />

are associated with <strong>in</strong>vader DNA cleavage [61]. A large group of RNA<br />

recognition motif-conta<strong>in</strong><strong>in</strong>g prote<strong>in</strong>s (RAMPs) also carry small<br />

glyc<strong>in</strong>e-rich motifs, <strong>in</strong>clud<strong>in</strong>g <strong>the</strong> diverse Cas6 prote<strong>in</strong>s <strong>in</strong>volved <strong>in</strong><br />

<strong>CRISPR</strong> RNA process<strong>in</strong>g and many of <strong>the</strong> prote<strong>in</strong>s mak<strong>in</strong>g up <strong>the</strong> Csm<br />

and Cmr prote<strong>in</strong> target<strong>in</strong>g complexes for DNA and RNA, respectively<br />

[23,46].<br />

CASCADE (<strong>CRISPR</strong>-associated complex for antiviral defense): first<br />

characterized for <strong>the</strong> E. coli <strong>CRISPR</strong>/Cas <strong>system</strong>, this constitutes a<br />

prote<strong>in</strong> complex of Cas5e, Cas6, Cas7 (six copies) and two subtype<br />

specific prote<strong>in</strong>s Cse1 and Cse 2 (two copies) [20,45]. It generates a<br />

seahorse-shaped structure encompass<strong>in</strong>g <strong>the</strong> crRNA and specifically<br />

targets <strong>the</strong> complementary strand of protospacer-like DNA (and<br />

unspecifically ssRNA) but does not cleave it. The presence of a Cas6<br />

homolog underl<strong>in</strong>es an additional l<strong>in</strong>k to process<strong>in</strong>g [45]. A similar<br />

structure was modeled for a Sulfolobus complex conta<strong>in</strong><strong>in</strong>g Cas5e,<br />

multiple copies of Cas7 and crRNA, that also targeted DNA but only<br />

<strong>in</strong>teracted weakly with Cas6 and o<strong>the</strong>r Cas prote<strong>in</strong>s [42]. The similarity<br />

of <strong>the</strong> two structures suggests that this may be a universal structure<br />

for DNA target<strong>in</strong>g.<br />

Figure 1. Scheme for the three primary functions of CRISPR systems. In the adaptation step, Cas proteins excise the protospacer sequence from a foreign DNA genetic<br />

element and insert it into the repeat adjacent to the leader of the CRISPR locus. Pre-CRISPR RNAs are then transcribed from within the leader and are subsequently<br />

processed into crRNAs each carrying a single spacer sequence and part of the adjoining repeat sequence. At the interference stage, crRNAs are assembled into protein<br />

targeting complexes that anneal to, and cleave, matching spacer sequences on either invading elements or their transcripts.<br />

Review Trends in Microbiology November 2011, Vol. 19, No. 11<br />

Box 2. Classification and nomenclature<br />

CRISPR-related proteins have been classified into eight types of<br />

CRISPR systems and up to 45 families of associated proteins [14,61].<br />

An attempt was recently made to simplify both the CRISPR<br />

classification and protein nomenclature and the results pertaining<br />

especially to archaeal systems (summarized below) are presented<br />

together with a suggested terminology that we use for labeling the<br />

diverse functional modules present in archaea [16].<br />

CRISPR systems: these are now grouped into three major classes,<br />

types I to III (with a few subtypes), based primarily on sequences of<br />

the Cas1 and Cas2 proteins implicated in adaptation, but also taking<br />

into account gene cassette contents [15]. Type I systems have been<br />

implicated in DNA targeting (exemplified in Figure 2a) and are<br />

generally characterized by a Cas3 endonuclease considered to cleave<br />

invading foreign DNA [61]. Type II systems are bacteria-specific and require a<br />

CRISPR-associated trans-encoded small RNA (tracrRNA) and<br />

host-encoded RNase III for processing. The large multifunctional Cas9<br />

protein alone appears to facilitate the final processing and interference<br />

steps [36]. Type III systems are over-represented in archaea<br />

and include all CRISPR systems carrying Cmr or Csm proteins,<br />

illustrated for archaea in Figure 2b–f. Some of these proteins (Cmr2/<br />

Csm1 and Cmr4/Csm3) are homologs, whereas others show minimal<br />

sequence conservation but carry RNA recognition and glycine-rich<br />

motifs (RAMP proteins) [12,14]. The Csm and Cmr protein complexes<br />

are implicated in targeting and cleavage of DNA and RNA, respectively<br />

[23,25]. In archaea the type I and type III systems are often<br />

functionally interdependent [17,44].<br />

Protein nomenclature: the names of proteins Cas1 to Cas6 are<br />

retained but they are extended to include many disparate homologs<br />

in different organisms. Cas7 to Cas10 represent new categories, each<br />

of which brings together a group of differently named homologs. The<br />

changes especially relevant to archaeal CRISPR systems are: Cas7 for<br />

Csa2, Cas8 for Csa4, and Cas10 is proposed for homologs Cmr2 and<br />

Csm1 of type III systems (Figure 3). Cas9 is exclusive to the<br />

bacteria-specific type II system.<br />

Functional module nomenclature: the following terms are introduced<br />

for the central mechanistic steps: adaptation, aCas; processing, pCas;<br />

and interference, iCas, iCmr, and iCsm, which are generally genetically<br />

discrete units but are also functionally interdependent (Figure 2). These<br />

terms are considered to provide a useful label for all components of the<br />

genetically diverse archaeal functional modules. Gene cassettes of all<br />

the functional modules often carry additional proteins that are<br />

conserved for different CRISPR subtypes, and gene cassettes for the<br />

three types of interference module are particularly diverse (Figures 2<br />

and 4). The terms are applied to components that are specifically<br />

involved in the main functional steps of different CRISPR systems, but<br />

exclude transcriptional regulators (Figures 2 and 4).<br />

In this review we focus primarily on archaeal CRISPR<br />

systems. The degree of functional and structural interdependence<br />

of the functional modules is summarized and<br />

evidence is provided for modular exchange. Further, molecular<br />

and sequence constraints that limit the capacity for<br />

exchange are considered and it is inferred that advantages<br />

of exchange lie primarily in generating interference diversity.<br />

Further, we summarize the evidence for CRISPR<br />

loci being dynamic structures and describe factors that<br />

contribute to their structural changes and, finally, evidence<br />

for intergenomic exchange of CRISPR systems is<br />

discussed. Detailed experimental data pertaining to the<br />

mechanisms involved in the core functional steps in archaea<br />

and bacteria have recently been reviewed [11] and<br />

will not be covered in depth here.<br />

Functional modules<br />

CRISPR systems all exhibit three basic functional steps<br />

illustrated in Figure 1. (i) Adaptation involves recognition<br />

and degradation of foreign DNA by Cas proteins and<br />

incorporation of a DNA fragment into the CRISPR locus<br />

as a new spacer presumed to occur at the repeat adjacent to<br />

the leader [16,19]. (ii) In the second step, the complete<br />

CRISPR locus is transcribed from within the leader and<br />

processed into multiple CRISPR RNAs (crRNAs) each<br />

carrying a single spacer sequence and one or more adjoining<br />

repeat regions. Proteins implicated in the archaeal<br />

RNA processing are the core protein Cas6 and at least<br />

one other unidentified protein [20–22]. (iii) Interference<br />

(or invader silencing) of DNA or RNA occurs when a<br />

protein–crRNA complex targets and cleaves a highly similar<br />

sequence of the genetic element [23–25]. At present<br />

three interference systems have been identified based on<br />

Cas and Csm protein complexes each targeting DNA in<br />

vivo and Cmr proteins targeting RNA in vitro. Here we<br />

introduce terms for the main molecular components involved<br />

in each functional step to simplify the discussion of<br />

functional module exchange as follows: aCas for adaptation;<br />

pCas for processing, and iCas, iCsm and iCmr for<br />

nucleic acid interference (Box 2).<br />

Currently about 165 CRISPR systems from 110 archaeal<br />

genomes are available in public sequence databases and<br />

have provided a basis for analyzing gene organization<br />

patterns of different functional modules [12,15,26]. They<br />

reveal six major combinations of gene cassettes illustrated<br />

with color-coded functional modules in Figure 2. Whereas<br />

the aCas cassette is relatively conserved in the first four<br />

combinations, the interference modules are diverse in<br />

both their gene contents and in their combinations<br />

(Figure 2a–d). About half of the archaeal iCmr and iCsm<br />

gene cassettes are physically separated on genomes from<br />

CRISPR loci and aCas genes (Figure 2e,f).<br />

Adaptation<br />

New spacer uptake involves excision of a protospacer from<br />

an invading DNA genetic element and its integration as a<br />

new spacer at the repeat sequence adjacent to the leader,<br />

resulting in duplication of the repeat. It has only been<br />

observed under laboratory conditions for Streptococcus<br />

thermophilus [27]. For archaea, evidence is limited to<br />

comparative genomic studies of closely related Sulfolobus<br />

strains where more recently incorporated spacers are clustered<br />

adjacent to the leader [16,28–30]. The short PAM<br />

motif adjacent to the protospacer (Box 1) has been implicated<br />

in determining the orientation of inserted spacers<br />

[16,26,31]. Most aCas modules are relatively conserved in<br />

content, generally carrying proteins Cas1, Cas2 and Cas4<br />

(Figure 2), of which the first two appear to be essential.<br />

Moreover, spacer integration at the first repeat, combined<br />

with phylogenetic evidence for coevolution of cas1, leader<br />

and repeat sequences, suggests that the leader is cofunctional<br />

[16,17].<br />
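The adaptation step described above can be pictured with a toy data model. The following is a minimal sketch in Python, not a biological simulation: the `CrisprLocus` class, its field names and the example sequences are hypothetical and chosen only for illustration. It assumes, as the text presumes, that a new spacer is integrated at the repeat adjacent to the leader and that each integration duplicates the repeat.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrisprLocus:
    """Toy model of a CRISPR locus (all names here are hypothetical).

    spacers[0] is the leader-proximal spacer, i.e. the most recent one.
    """
    leader: str
    repeat: str
    spacers: List[str] = field(default_factory=list)

    def acquire(self, protospacer: str) -> None:
        # Assumed rule from the text: integration happens at the repeat
        # adjacent to the leader, so the new spacer becomes leader-proximal.
        self.spacers.insert(0, protospacer)

    def repeat_count(self) -> int:
        # Modelled as one repeat after the leader plus one per spacer, so
        # each acquisition adds exactly one repeat copy (the duplication).
        return len(self.spacers) + 1

    def sequence(self) -> str:
        # Layout: leader, repeat, then (spacer, repeat) for every spacer.
        parts = [self.leader, self.repeat]
        for spacer in self.spacers:
            parts.append(spacer)
            parts.append(self.repeat)
        return "".join(parts)


# Arbitrary example sequences, not taken from any real genome.
locus = CrisprLocus(leader="ATATAT", repeat="GTTG", spacers=["AAAA"])
before = locus.repeat_count()
locus.acquire("CCCC")
print(locus.spacers)                  # newest spacer is listed first
print(locus.repeat_count() - before)  # repeats gained by one acquisition
print(locus.sequence())
```

Keeping the newest spacer at index 0 mirrors the observation that recently incorporated spacers cluster next to the leader, so comparing the leading entries of two strains' spacer lists is a simple way to think about the comparative genomic evidence mentioned above.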

RNA processing<br />

Transcripts initiate within leaders and terminate downstream<br />

from CRISPR loci [16]; early work on Archaeoglobus<br />

fulgidus indicated that processing occurs within<br />

adjacent repeats [32]. The primary processing enzyme is<br />

the ubiquitous and diverse Cas6 protein and, at least in<br />

Pyrococcus furiosus, the CRISPR transcript wraps around<br />

the Cas6 endonuclease and is cut once in each adjacent<br />

repeat [23,33]. The order and direction of processing<br />

remain unclear. Early work on Sulfolobus solfataricus<br />

suggested that, in contrast to A. fulgidus, initially every<br />

third repeat is cut and that processing occurs primarily<br />

from the 3′ end of the CRISPR transcript, but this remains<br />

to be confirmed [16,34]. Moreover, processing levels were<br />

higher in Sulfolobus during stationary phase when the<br />

cells are more vulnerable to viral attack [16,28]. Patterns of<br />

archaeal crRNAs are often complex, extending over the<br />

approximate size range 35–60 nt, and this probably reflects<br />

the diversity of CRISPR systems present [28,35]. To date,<br />

all the characterized crRNAs carry an 8 nt repeat sequence<br />

at the 5′ end. Larger crRNAs implicated in DNA targeting<br />

in vivo are 60–65 nt in length and carry partial repeat<br />

sequences at each end, whereas smaller crRNAs which can<br />

target RNA in vitro are 37–45 nt in length and lack repeat<br />

and partial spacer sequences at the 3′ end [20–22]. Processing<br />

at the 3′ end of these RNAs is performed by an<br />

unknown enzyme [21,22]. Processing within repeats in<br />

Streptococcus pyogenes is effected by a trans-encoded<br />

RNA and host-encoded RNase III [36]. This type II<br />

CRISPR/Cas system does not occur in archaea [15] where<br />

the cellular functions of RNase III appear to be performed<br />

by a general intron-splicing enzyme with a different substrate<br />

specificity [37].<br />
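The processing pattern described above (one cut per repeat, leaving an 8 nt repeat-derived tag at the 5′ end of each crRNA) can be sketched as a few lines of string handling. This is an illustrative model only: the function names, the 24 nt repeat and the 38 nt spacers are assumptions made for the example, and the 3′ trimming function is a hypothetical stand-in for the unidentified enzyme mentioned in the text.

```python
from typing import List

HANDLE_LEN = 8  # repeat-derived tag at the crRNA 5' end, as reported in the text


def process_pre_crrna(repeat: str, spacers: List[str]) -> List[str]:
    """Cut a toy pre-crRNA once in every repeat and return the crRNAs.

    The transcript is modelled as repeat-spacer-repeat-spacer-...-repeat.
    Each repeat is cut HANDLE_LEN nt before its 3' end, so every
    spacer-containing product is
    (last 8 nt of a repeat) + spacer + (remainder of the next repeat).
    Terminal fragments without a spacer are not returned.
    """
    if len(repeat) <= HANDLE_LEN:
        raise ValueError("repeat must be longer than the 5' handle")
    five_prime_handle = repeat[-HANDLE_LEN:]
    three_prime_part = repeat[:-HANDLE_LEN]
    return [five_prime_handle + spacer + three_prime_part for spacer in spacers]


def trim_three_prime(crrna: str, final_len: int) -> str:
    """Hypothetical stand-in for the unidentified 3' trimming activity."""
    return crrna[:final_len]


# Arbitrary example sequences chosen only to make the products easy to read.
repeat = "C" * 16 + "G" * 8
spacers = ["A" * 38, "U" * 38]
crrnas = process_pre_crrna(repeat, spacers)
print([len(c) for c in crrnas])        # lengths of the untrimmed crRNAs
print(trim_three_prime(crrnas[0], 40))  # shorter form lacking the 3' repeat part
```

With these assumed lengths the untrimmed products fall in the 60–65 nt range quoted for the larger crRNAs, and trimming to 40 nt removes the 3′ repeat fragment together with part of the spacer, as described for the smaller species.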

In most studies archaeal CRISPR loci are constitutively<br />

expressed and processed into mature crRNAs in the absence<br />

of invading DNA elements, but it remains unclear<br />

whether the CRISPR systems require further activation<br />

[16,21]. Bacterial studies have revealed diverse CRISPR<br />

regulatory mechanisms which can be activated on viral<br />

infection producing elevated expression [38,39].<br />

DNA interference<br />

Independent lines of evidence indicate that DNA is the<br />

primary target for most CRISPR systems. Putative protospacer<br />

sequences are essentially distributed randomly on<br />

archaeal virus and plasmid DNA with no significant bias of<br />

matching crRNAs to either genes relative to intergenic<br />

regions or to coding versus non-coding strands [26,28].<br />

Moreover, genetic studies on different Sulfolobus species<br />

have provided strong evidence for DNA targeting in vivo,<br />

presumably involving iCas rather than iCmr modules<br />

[35,40]. For bacteria, experimental evidence for DNA targeting<br />

in vivo was provided for the CRISPR/Csm system of<br />

Staphylococcus epidermidis [41] (equivalent to Figure 2d),<br />

the CRISPR/Csn (bacterial type II) system of S. thermophilus<br />

[25] and for the CRISPR/Cas system of Escherichia<br />

coli [20], although none of these studies precluded additional<br />

RNA targeting.<br />

A large protein complex, containing multiple protein<br />

components, was first characterized for an E. coli CRISPR/<br />

Cas system; it participates in crRNA maturation and<br />

facilitates annealing of the crRNA to the DNA target,<br />

but not cleavage [41]. It generates a seahorse form and is<br />

defined as a CASCADE complex (Box 1). A related structure<br />

is produced for a S. solfataricus CRISPR/Cas system;<br />

it is made up of only Cas5e and multiple copies of Cas7, and<br />

appears to be involved primarily in DNA targeting<br />

[42]. This is therefore likely to be a universal structure, at<br />

least for iCas targeting systems.<br />

Figure 2. Representative gene maps of six main classes of archaeal CRISPR systems. (a) CRISPR/aCas-pCas-iCas, common in archaea; in this example those of S. islandicus<br />

are shown. (b) CRISPR/aCas-pCas-iCas-iCmr; studied experimentally in P. furiosus. (c) CRISPR/aCas-pCas-iCas-iCsm; from Caldiarchaeum subterranum. (d) CRISPR/aCas-iCsm;<br />

shown for Thermoplasma volcanium. (e) iCmr from Hyperthermus butylicus. (f) iCsm from Methanocaldococcus vulcanius. Genes encoding the functional domains<br />

are color-coded: aCas module, light blue; pCas gene, orange; iCas module, yellow; iCsm and iCmr modules, red. t.r. genes in green encode putative transcriptional<br />

regulators that are not considered to be part of the functional modules. R indicates proteins carrying RNA-recognition motifs (RAMPs). (a) belongs to the type I CRISPR system;<br />

(b) and (c) are mixtures of type I and type III, whereas (d–f) are classified as type III [15].<br />

Studies on S. thermophilus demonstrated that effective<br />

interference requires perfect matches between crRNA and<br />

protospacers [19]. However, recent work on Sulfolobus species<br />

has demonstrated that three or more mismatches located<br />

near the centre of the protospacer or at the distal end from<br />

the PAM motif do not prevent interference [35,40]. Moreover,<br />

a systematic study of the E. coli CRISPR/Cas system<br />

has shown that only six of the seven nucleotides of the<br />

targeted protospacer strand proximal to the PAM motif<br />

must match the crRNA perfectly, and this was proposed<br />

to act as a recognition site, or seed, for the interference<br />

reaction [43]. Whether this is a general property of the<br />

<strong>CRISPR</strong> DNA target<strong>in</strong>g <strong>system</strong>s rema<strong>in</strong>s to be determ<strong>in</strong>ed.<br />
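The mismatch tolerance described above can be made concrete with a small check. The sketch below is a deliberately simplified reading of the seed idea: it assumes that index 0 of each string is the PAM-proximal position, that both sequences are written in the same sense so that a match is plain character identity, and that any six of the first seven positions suffice. The function names and thresholds are hypothetical, and position-specific effects are not modelled.

```python
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def passes_seed_rule(spacer: str, protospacer: str,
                     seed_len: int = 7, min_seed_matches: int = 6) -> bool:
    """Simplified seed check; index 0 is taken as the PAM-proximal position.

    Only the first seed_len positions are inspected, so mismatches in the
    centre or at the PAM-distal end do not affect the result.
    """
    seed_mismatches = count_mismatches(spacer[:seed_len], protospacer[:seed_len])
    return seed_len - seed_mismatches >= min_seed_matches


# Arbitrary 20 nt example; the variants differ only at the stated positions.
spacer = "ACGT" * 5
distal = spacer[:15] + "GGG" + spacer[18:]   # three PAM-distal mismatches
one_seed = "T" + spacer[1:]                  # one mismatch inside the seed
two_seed = "TT" + spacer[2:]                 # two mismatches inside the seed

print(count_mismatches(spacer, distal), passes_seed_rule(spacer, distal))
print(passes_seed_rule(spacer, one_seed))
print(passes_seed_rule(spacer, two_seed))
```

Separating the overall mismatch count from the seed check reflects the distinction drawn in the text between tolerated distal or central mismatches and the stricter requirement near the PAM.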

RNA interference<br />

In the CRISPR/Cmr system of P. furiosus (Figure 2b), a<br />

complex of Cmr proteins encompassing a small crRNA,<br />

lacking the 3′ end of the spacer sequence, targets and<br />

cleaves complementary single-stranded RNA (ssRNA) in<br />

vitro [24]. To date there is no evidence for or against in vivo<br />

RNA targeting, and it is too early to establish whether<br />

mRNAs, non-coding RNAs (ncRNAs), and/or RNA viruses<br />

can be targets. Nevertheless, iCmr modules are common in<br />

archaea and are encoded either together with aCas modules<br />

and CRISPR loci or as separate genetic entities<br />

(Figure 2c,e). Paradoxically, some Cmr proteins show significant<br />

sequence similarity to Csm proteins implicated in<br />

DNA targeting in S. epidermidis [41], and both are common<br />

in archaea. A phylogenetic tree of archaeal Cmr2 (Cas10)<br />

homologs shows five main subfamilies, four of which represent<br />

iCmr and iCsm modules (Figure 3) [12,44]. The fifth<br />

subfamily, A (represented by Csx11), is present in a few<br />

bacteria and methanoarchaea but has not been studied<br />

experimentally, and is therefore not considered further.<br />

Other components of iCmr and iCsm modules include the<br />

small conserved Cmr5/Csm2 protein and three to seven<br />

copies of highly diverse RNA binding motif-containing<br />

proteins (RAMP proteins denoted R in Figure 2)<br />

[12,14,44]. In summary, the degree of cofunctionality of<br />

the partly homologous iCmr and iCsm modules remains<br />

unclear.<br />

Figure 3. Phylogenetic tree of the archaeal Cas10 subtypes Cmr2, Csm1 and Csx11.<br />

These are the largest and most conserved subcomponents of the interference<br />

modules of type III CRISPR systems, where the iCmr module has been implicated<br />

in RNA targeting [23] and the iCsm system in DNA targeting [41]. The deep<br />

branching reflects the very divergent sequences. Analysis of the five subfamilies<br />

A–E indicates strong biases in their distributions among crenarchaea and<br />

euryarchaea, and family D is archaea-specific and is present in crenarchaea,<br />

euryarchaea and unclassified archaea. The Figure is reproduced with permission<br />

from [44]. 10% indicates the amount of amino acid sequence change for the given<br />

length on the tree branches.<br />

Module exchange<br />

Attempts to classify archaeal <strong>CRISPR</strong>/Cas <strong>system</strong>s of <strong>the</strong><br />

Sulfolobales on <strong>the</strong> basis of <strong>the</strong> cas1, leader and repeat<br />

sequences provided evidence for four families that were<br />

conserved <strong>in</strong> gene content and synteny and <strong>the</strong>y appeared<br />

to constitute <strong>in</strong>tegral genetic units [16,26]. However, more<br />

detailed phylogenetic analysis of <strong>the</strong> aCas and iCas genes<br />

of family I <strong>CRISPR</strong>/Cas <strong>system</strong>s of different Sulfolobus<br />

islandicus stra<strong>in</strong>s (Figure 2a) revealed that <strong>the</strong> aCas tree<br />

diverges from <strong>the</strong> iCas tree as well as from trees generated<br />

from all <strong>the</strong> concatenated genes of each host genome,<br />

consistent with exchange of aCas modules hav<strong>in</strong>g occurred<br />

[17]. The results of this analysis are illustrated <strong>in</strong><br />

Figure 4a for two divergent pairs of <strong>CRISPR</strong>/Cas <strong>system</strong>s<br />

from four selected S. islandicus stra<strong>in</strong>s [17]. For each<br />

similar pair <strong>the</strong> concatenated homologous Cas prote<strong>in</strong>s<br />

showed about 99% am<strong>in</strong>o acid sequence identity. However,<br />

when <strong>the</strong> prote<strong>in</strong> sequences of <strong>the</strong> two pairs were compared,<br />

<strong>the</strong> iCas modules ma<strong>in</strong>ta<strong>in</strong>ed <strong>the</strong>ir high sequence<br />

identity (99%), whereas <strong>the</strong> aCas identity was reduced to 74%<br />

(Figure 4a), consistent with <strong>the</strong> aCas module hav<strong>in</strong>g been<br />

exchanged [17]. Us<strong>in</strong>g <strong>the</strong> same approach, similar<br />

<strong>CRISPR</strong>/Cas <strong>system</strong>s of two divergent pairs of stra<strong>in</strong>s of<br />

<strong>the</strong> <strong>the</strong>rmoneutrophile Pyrobaculum were compared. All<br />

<strong>the</strong> concatenated homologous Cas prote<strong>in</strong> components of<br />

each similar pair showed 70% am<strong>in</strong>o acid sequence identity.<br />

However, when <strong>the</strong> two pairs were compared, aCas<br />

[Figure 4: (a) aCas exchange (S. islandicus), Group 1 vs Group 2; (b) iCas exchange (Pyrobaculum sp.), Group 1 vs Group 2]<br />

Figure 4. Examples of genetic exchange of functional modules where am<strong>in</strong>o acid<br />

sequences from shared genes <strong>in</strong> each functional module are compared [17]. (a)<br />

Comparison of <strong>the</strong> aCas and iCas modules for type I <strong>CRISPR</strong>/Cas <strong>system</strong>s of four<br />

closely related S. islandicus stra<strong>in</strong>s. Pairwise <strong>the</strong>y show a high sequence identity of<br />

99% for both modules, but when <strong>the</strong> two pairs are compared <strong>the</strong> comb<strong>in</strong>ed iCas<br />

prote<strong>in</strong>s rema<strong>in</strong> almost identical <strong>in</strong> sequence, whereas <strong>the</strong> aCas modules show<br />

only 74% sequence similarity between <strong>the</strong> pairs, consistent with <strong>the</strong> aCas module<br />

hav<strong>in</strong>g been exchanged for one of <strong>the</strong> groups of stra<strong>in</strong>s [17]. (b) A similar study was<br />

performed for shared genes of four <strong>the</strong>rmoneutrophilic Pyrobaculum stra<strong>in</strong>s,<br />

where two pairs each show similar levels of am<strong>in</strong>o acid sequence similarity for<br />

<strong>the</strong>ir aCas and iCas modules (about 70%), but when <strong>the</strong> two pairs are compared <strong>the</strong><br />

aCas sequences rema<strong>in</strong> constant at about 70% whereas <strong>the</strong> iCas module yields<br />

only 28% similarity – <strong>in</strong>dicative of <strong>the</strong> iCas modules hav<strong>in</strong>g been exchanged. Gene<br />

contents of <strong>the</strong> two pairs of iCas modules also <strong>in</strong>dicate that <strong>the</strong>y belong to different<br />

subtypes. Gene modules are color-coded as <strong>in</strong> Figure 2. Abbreviations: C, <strong>CRISPR</strong><br />

locus; t.r., transcriptional regulator (<strong>in</strong> green); n.d., gene identity not determ<strong>in</strong>ed.<br />


Review Trends <strong>in</strong> Microbiology November 2011, Vol. 19, No. 11<br />

prote<strong>in</strong> sequence identity rema<strong>in</strong>ed at 70%, but a much<br />

lower value of 28% was observed for <strong>the</strong> iCas prote<strong>in</strong>s,<br />

<strong>in</strong>dicative of exchange of <strong>the</strong> latter (Figure 4b).<br />
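
The comparison described above can be sketched in code. This is a minimal illustration only, not the procedure used in [17]: it assumes the concatenated Cas proteins of each module have already been aligned, and the function names and the 15-point cut-off are hypothetical choices made for the example. The identity values passed in are those quoted in the text for Figure 4.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent identity over the gap-free columns of two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip alignment gaps
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def exchanged_modules(within: Dict[str, float],
                      between: Dict[str, float],
                      min_drop: float = 15.0) -> List[str]:
    """Modules whose identity between pairs falls at least `min_drop` points
    below the identity within pairs (hypothetical cut-off)."""
    return [m for m in within if within[m] - between[m] >= min_drop]


# Percent identities quoted in the text for Figure 4.
s_islandicus = exchanged_modules({"aCas": 99.0, "iCas": 99.0},
                                 {"aCas": 74.0, "iCas": 99.0})
pyrobaculum = exchanged_modules({"aCas": 70.0, "iCas": 70.0},
                                {"aCas": 70.0, "iCas": 28.0})
print(s_islandicus, pyrobaculum)  # ['aCas'] ['iCas']
```

A module is flagged only when its identity drops between the pairs while the other module stays level, which is the pattern the text interprets as exchange of that module.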

Constra<strong>in</strong>ts on modular exchange<br />

Specific <strong>in</strong>teractions with <strong>the</strong> repeat sequence, ei<strong>the</strong>r at<br />

<strong>the</strong> DNA or RNA level, are crucial for <strong>the</strong> function of <strong>the</strong><br />

aCas, pCas and <strong>in</strong>terference modules, and <strong>the</strong> capacity of<br />

some prote<strong>in</strong> components to <strong>in</strong>teract specifically with <strong>the</strong><br />

repeat sequence might be a major constra<strong>in</strong>t on modular<br />

exchange. Integration of new spacers thus probably<br />

depends on Cas prote<strong>in</strong> recognition of <strong>the</strong> first repeat<br />

and adjo<strong>in</strong><strong>in</strong>g leader region [16,19]. Cas6 associates specifically<br />

with, and cleaves, <strong>the</strong> repeat dur<strong>in</strong>g process<strong>in</strong>g<br />

[20,22] and is sometimes cofunctional with different <strong>in</strong>terference<br />

modules. The iCas complex recognizes repeat sequence<br />

elements at <strong>the</strong> ends of crRNA for DNA target<strong>in</strong>g,<br />

and <strong>the</strong> iCmr complex b<strong>in</strong>ds to <strong>the</strong> repeat sequence at <strong>the</strong> 5′<br />

end of crRNAs target<strong>in</strong>g RNA [23,45,42]. The small PAM<br />

motif also differs <strong>in</strong> sequence for different <strong>CRISPR</strong>/Cas<br />

<strong>system</strong>s, and <strong>the</strong> motif is likely to be important for protospacer<br />

selection, for determ<strong>in</strong><strong>in</strong>g its orientation on <strong>in</strong>sertion<br />

<strong>in</strong> <strong>CRISPR</strong> loci [16,19,31] and, at some level, to be<br />

important for DNA target<strong>in</strong>g [19,35,46]. Moreover, <strong>the</strong><br />

length of <strong>the</strong> crRNA spacer sequence may <strong>in</strong>fluence <strong>the</strong><br />

target<strong>in</strong>g and cleavage by <strong>the</strong> iCas module [42]. Taken<br />

toge<strong>the</strong>r, <strong>the</strong>re appear to be multiple sequence and structural<br />

constra<strong>in</strong>ts on modular exchange that are likely to be<br />

offset partly by <strong>the</strong> relatively conserved sequence at <strong>the</strong><br />

leader-distal end of repeats. In support, putative examples<br />

of modular exchange, <strong>in</strong>clud<strong>in</strong>g those shown <strong>in</strong> Figure 4,<br />

exhibit fairly conserved repeat sequences, spacer sizes and<br />

predicted PAM motifs. These examples also show that on<br />

modular exchange <strong>the</strong> repeat <strong>in</strong>variably follows <strong>the</strong> aCas<br />

and not <strong>the</strong> iCas modules [17].<br />

Natural dynamics of <strong>CRISPR</strong> loci<br />

Changes can occur <strong>in</strong> <strong>CRISPR</strong> loci by a variety of mechanisms<br />

without compromis<strong>in</strong>g <strong>the</strong>ir overall viability. New<br />

spacer-repeat units are added, <strong>in</strong>termittently, at or near<br />

<strong>the</strong> repeat adjacent to <strong>the</strong> leader [16,19,28,29]. Moreover,<br />

comparative analyses of closely related archaeal species<br />

support: (i) <strong>the</strong> occurrence of large <strong>in</strong>dels, generally deletions;<br />

(ii) duplication of sets of spacer-repeat units, and (iii)<br />

<strong>in</strong>tracellular exchange of spacer-repeat units between<br />

<strong>CRISPR</strong> loci [12,16,28]. Changes can also be <strong>in</strong>duced <strong>in</strong><br />

<strong>CRISPR</strong> loci by <strong>in</strong>vad<strong>in</strong>g genetic elements carry<strong>in</strong>g, for<br />

example, essential metabolic genes or, possibly, tox<strong>in</strong>–<br />

antitox<strong>in</strong> ma<strong>in</strong>tenance <strong>system</strong>s [35,47]. Such changes were<br />

demonstrated by challeng<strong>in</strong>g <strong>CRISPR</strong> loci of different<br />

Sulfolobus species with plasmids carry<strong>in</strong>g match<strong>in</strong>g protospacers<br />

and appropriate PAM motifs ma<strong>in</strong>ta<strong>in</strong>ed under<br />

selection [35]. This resulted <strong>in</strong> loss of ei<strong>the</strong>r <strong>CRISPR</strong><br />

regions conta<strong>in</strong><strong>in</strong>g match<strong>in</strong>g spacers or complete<br />

<strong>CRISPR</strong>/Cas <strong>system</strong>s. In S. islandicus, 50% of viable<br />

transformants had specifically lost <strong>the</strong> match<strong>in</strong>g spacer-repeat<br />

unit, suggest<strong>in</strong>g that feedback and <strong>in</strong>terference of<br />

match<strong>in</strong>g spacers might occur rarely, followed by recomb<strong>in</strong>ational<br />

repair via adjacent repeats or by slippage occurr<strong>in</strong>g<br />

dur<strong>in</strong>g DNA replication [35]. Fur<strong>the</strong>rmore, some<br />

challenged spacers of S. solfataricus were <strong>in</strong>activated by<br />

<strong>the</strong> direct <strong>in</strong>sertion of <strong>in</strong>sertion sequence (IS) elements<br />

[35]. Bio<strong>in</strong>formatic analyses have also provided support for<br />

spacers be<strong>in</strong>g <strong>in</strong>activated by mutation of <strong>the</strong> border<strong>in</strong>g<br />

repeats, and this could generate defective crRNAs [48].<br />

Thus <strong>the</strong> <strong>in</strong>tegrity of <strong>CRISPR</strong> loci can be compromised by<br />

many different mechanisms.<br />
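
As a rough illustration of two of the changes described above, the following toy model treats a CRISPR locus as a repeat plus an ordered list of spacers. It is a sketch under stated assumptions, not a model taken from the cited studies: the class `CrisprArray`, its method names and the short placeholder sequences are all hypothetical, and the deletion step simply removes one unit, as recombination between the two flanking repeats would.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrisprArray:
    """Toy locus: leader, then spacer-repeat units; spacers[0] is leader-proximal."""
    repeat: str
    spacers: List[str] = field(default_factory=list)

    def acquire(self, spacer: str) -> None:
        """Add a new spacer-repeat unit next to the leader."""
        self.spacers.insert(0, spacer)

    def lose_unit(self, spacer: str) -> bool:
        """Delete one matching spacer-repeat unit; return False if none matches."""
        if spacer not in self.spacers:
            return False
        self.spacers.remove(spacer)
        return True

    def sequence(self) -> str:
        """First repeat, then each spacer followed by a repeat."""
        return self.repeat + "".join(s + self.repeat for s in self.spacers)


locus = CrisprArray("GG", ["AAA", "CCC"])  # placeholder sequences
locus.acquire("TTT")                       # new unit enters at the leader end
after_gain = locus.sequence()
lost = locus.lose_unit("AAA")              # the challenged spacer is lost
after_loss = locus.sequence()
print(after_gain, after_loss)  # GGTTTGGAAAGGCCCGG GGTTTGGCCCGG
```

The locus keeps a regular repeat-spacer-repeat structure after both events, which is consistent with the statement that such changes need not compromise overall viability.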

Anti-<strong>CRISPR</strong> mechanisms and defective <strong>CRISPR</strong>/Cas and Cmr modules<br />

Specific ways <strong>in</strong> which archaeal viruses and plasmids<br />

might circumvent <strong>CRISPR</strong> <strong>system</strong>s rema<strong>in</strong> speculative,<br />

<strong>in</strong>clud<strong>in</strong>g <strong>the</strong> observation that genomes of crenarchaeal<br />

rudiviruses and lipothrixviruses accrue 12 bp <strong>in</strong>dels, probably<br />

deletions, when passed through different hosts [49].<br />

However, given <strong>the</strong> complexity of many archaeal <strong>CRISPR</strong><br />

<strong>system</strong>s, <strong>the</strong>y are also vulnerable to mutation, rearrangements<br />

or transposition events [30,50]. The multiple transcriptional<br />

regulators present <strong>in</strong> many archaeal <strong>CRISPR</strong><br />

<strong>system</strong>s (Figures 2 and 4) are obvious targets. For example,<br />

<strong>in</strong> an S. islandicus stra<strong>in</strong> <strong>the</strong> putative provirus M164 is<br />

<strong>in</strong>tegrated <strong>in</strong>to <strong>the</strong> gene for Csa3, <strong>the</strong> putative transcriptional<br />

regulator of <strong>the</strong> aCas gene cassette, but apparently<br />

leaves <strong>the</strong> pCas and iCas modules unaffected [12,17].<br />

Moreover, bacteriophage EPV1 characterized <strong>in</strong> a metagenomics<br />

study encodes <strong>the</strong> proteobacterial transcriptional<br />

repressor H-NS [51] that can <strong>in</strong>activate <strong>the</strong> entire E. coli<br />

<strong>CRISPR</strong> <strong>system</strong> [38]. Many archaeal <strong>system</strong>s lack core<br />

genes, and <strong>CRISPR</strong> loci sometimes lack leaders<br />

[16,30,50]. It rema<strong>in</strong>s unclear whe<strong>the</strong>r <strong>the</strong>se defective<br />

modules can be complemented by Cas prote<strong>in</strong>s of ano<strong>the</strong>r<br />

module of a similar type with<strong>in</strong> a given organism. S.<br />

solfataricus stra<strong>in</strong>s P1 and P2 carry a <strong>CRISPR</strong> locus E<br />

lack<strong>in</strong>g an aCas module that is not complemented by aCas<br />

prote<strong>in</strong>s associated with <strong>the</strong> phylogenetically similar<br />

<strong>CRISPR</strong> loci C and D, but this could reflect sequence<br />

differences <strong>in</strong> <strong>the</strong> leaders [16]. There might also be advantages,<br />

at least temporarily, <strong>in</strong> ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g cofunctional<br />

process<strong>in</strong>g and <strong>in</strong>terference modules despite defective adaptation<br />

[26,28].<br />

Genomic mobility<br />

Comparative studies of <strong>the</strong> Sulfolobales <strong>in</strong>dicated that<br />

<strong>CRISPR</strong> <strong>system</strong>s are <strong>in</strong>variably located <strong>in</strong> genomic regions<br />

variable <strong>in</strong> gene content and often rich <strong>in</strong> transposable<br />

elements [44,52]. Fur<strong>the</strong>rmore, n<strong>in</strong>e genomes of closely<br />

related S. islandicus stra<strong>in</strong>s from different geographical<br />

locations carried two to four apparently viable comb<strong>in</strong>ations<br />

of different subfamilies of both <strong>CRISPR</strong>/Cas, and<br />

<strong>in</strong>dependent iCmr and iCsm modules, <strong>in</strong>dicative of <strong>the</strong>ir<br />

hav<strong>in</strong>g been transferred between stra<strong>in</strong>s [44]. Strong evidence<br />

for specific <strong>in</strong>tergenomic transfer of <strong>CRISPR</strong> loci<br />

carried on larger chromosomal fragments is available for<br />

Pyrococcus and Sulfolobus stra<strong>in</strong>s [12,53] and for lactic<br />

acid bacteria [50]. Whe<strong>the</strong>r such exchange is common for<br />

all archaea rema<strong>in</strong>s unclear because for stra<strong>in</strong>s of S.<br />

solfataricus, more distantly related than those of S. islandicus,<br />

<strong>CRISPR</strong>/Cas <strong>system</strong>s have been largely reta<strong>in</strong>ed<br />

and share many identical spacer sequences [12,16].<br />

Given <strong>the</strong> potential for mobility of <strong>CRISPR</strong> <strong>system</strong>s, it<br />

was speculated that tox<strong>in</strong>–antitox<strong>in</strong> <strong>system</strong>s, encoded near<br />

<strong>CRISPR</strong> loci, could help to stabilize <strong>the</strong> <strong>CRISPR</strong> genetic


<strong>system</strong>s with<strong>in</strong> chromosomes [52]. An extreme example of<br />

this occurs <strong>in</strong> Acidianus hospitalis, a slowly grow<strong>in</strong>g organism<br />

carry<strong>in</strong>g 26 vapBC antitox<strong>in</strong>–tox<strong>in</strong> gene pairs, four<br />

of which are <strong>in</strong>terwoven with <strong>the</strong> <strong>CRISPR</strong>/Cas/Csm <strong>system</strong><br />

(Figure 5) and a fifth is associated with a separate<br />

<strong>CRISPR</strong>/Cas <strong>system</strong> [52]. The absence of any encoded<br />

VapB or VapC prote<strong>in</strong>s with similar sequences <strong>in</strong> this<br />

organism is essential for <strong>the</strong> proposed capacity to ma<strong>in</strong>ta<strong>in</strong><br />

a <strong>CRISPR</strong>/Cas <strong>system</strong> when loss of <strong>the</strong> DNA region could<br />

lead to VapC-<strong>in</strong>duced cell death [52].<br />

Interdoma<strong>in</strong> mobility<br />

Genetic exchange between archaea and bacteria is restricted<br />

by many factors, <strong>in</strong>clud<strong>in</strong>g basic <strong>in</strong>compatibility of <strong>the</strong>ir<br />

virus–host <strong>in</strong>teractions and radically different conjugative<br />

mechanisms [4,6,54]. Moreover, even after successful DNA<br />

exchange, basic differences <strong>in</strong> <strong>the</strong> mechanisms of transcriptional<br />

<strong>in</strong>itiation and term<strong>in</strong>ation, and of translational<br />

<strong>in</strong>itiation, would present formidable barriers to viable gene<br />

expression [55,56]. Fur<strong>the</strong>rmore, as argued above, many<br />

archaea have adapted to extreme low-energy environments<br />

where levels of bacterial cells are low or nonexistent.<br />

In an attempt to <strong>in</strong>terpret <strong>the</strong> extent to which <strong>in</strong>terdoma<strong>in</strong><br />

exchange has <strong>in</strong>fluenced <strong>the</strong> evolution of archaeal <strong>CRISPR</strong><br />

<strong>system</strong>s, Markov cluster<strong>in</strong>g algorithm (MCL) techniques<br />

based on Cas1 sequences were used to compare phylogenetically<br />

<strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong>s of archaea and bacteria.<br />

The results support <strong>the</strong> absence of type II <strong>CRISPR</strong>/Cas<br />

<strong>system</strong>s <strong>in</strong> archaea and revealed clusters specific to, or<br />

strongly biased to, archaea or bacteria, with one cluster<br />

carry<strong>in</strong>g multiple archaeal, predom<strong>in</strong>antly methanoarchaeal,<br />

and bacterial members [17,18,57]. Qualitatively, <strong>the</strong><br />

analysis suggests that <strong>in</strong>terdoma<strong>in</strong> exchange of aCas modules<br />

occurs rarely and <strong>the</strong>n predom<strong>in</strong>antly <strong>in</strong> environments<br />

where archaea and bacteria are both abundant.<br />
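
For readers unfamiliar with Markov clustering, the core of the algorithm can be sketched in a few lines. This is a bare-bones illustration, not the pipeline used in the cited analyses: the text states only that MCL techniques based on Cas1 sequences were used, so the input similarity matrix, the inflation value of 2.0, the fixed number of rounds and all function names here are assumptions. The method alternates expansion (squaring a column-stochastic matrix) with inflation (raising entries to a power and renormalising) and reads clusters from the rows that retain weight.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def multiply(a: Matrix, b: Matrix) -> Matrix:
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity: Matrix, inflation: float = 2.0, rounds: int = 30) -> List[List[int]]:
    """Cluster items from a non-negative, symmetric similarity matrix."""
    n = len(similarity)
    # Self-loops keep every column sum positive.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(rounds):
        m = multiply(m, m)  # expansion
        m = normalize_columns([[v ** inflation for v in row] for row in m])  # inflation
    clusters = {frozenset(j for j, v in enumerate(row) if v > 1e-6) for row in m}
    return sorted(sorted(c) for c in clusters if c)


# Toy input: items 0-2 resemble each other, as do items 3-4; no cross-similarity.
toy = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
toy_clusters = mcl(toy)
print(toy_clusters)  # [[0, 1, 2], [3, 4]]
```

In practice the matrix entries would be derived from pairwise sequence comparisons, and a cluster containing both archaeal and bacterial members would be the kind of signal the text interprets as interdomain exchange.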

There is limited evidence for homologous <strong>CRISPR</strong>-like<br />

mechanisms operat<strong>in</strong>g <strong>in</strong> eukaryotes. The RNA-target<strong>in</strong>g<br />

<strong>CRISPR</strong>/Cmr <strong>system</strong> shows some mechanistic similarity to<br />

RNAi, <strong>the</strong> viral RNA <strong>in</strong>terference <strong>system</strong> of eukarya<br />

[14,23]. Moreover, DNA-target<strong>in</strong>g <strong>CRISPR</strong>/Cas <strong>system</strong>s,<br />

<strong>in</strong> general, share features of <strong>the</strong> Piwi/Argonaute-<strong>in</strong>teract<strong>in</strong>g<br />

(piRNA) <strong>system</strong> where RNA-encod<strong>in</strong>g DNA accumulates<br />

passively <strong>in</strong> a small number of chromosomal loci.<br />

Transcripts from <strong>the</strong>se loci are processed <strong>in</strong>to small<br />

ssRNAs that complex with Piwi/Argonaute prote<strong>in</strong>s and<br />

can <strong>in</strong>hibit DNA transposition activity [10,16,31]. Although<br />

early <strong>in</strong> evolution <strong>the</strong>re may have been limited<br />

coevolution of <strong>the</strong>se <strong>in</strong>terference <strong>system</strong>s for all three<br />

doma<strong>in</strong>s, <strong>the</strong> archaeal and bacterial <strong>system</strong>s have clearly<br />

coevolved and <strong>in</strong>terchanged to a significant degree, with<br />

<strong>the</strong> exception of <strong>the</strong> type II <strong>CRISPR</strong>/Cas <strong>system</strong> dependent<br />

on <strong>the</strong> bacteria-specific RNase III enzyme for process<strong>in</strong>g<br />

[36].<br />

[Figure 5: gene map with labels cas4, csx1, vapBC, iCsm, csa1, cas2, cas1 and cas6; repeat counts 53, 13 and 9]<br />

Figure 5. A type III <strong>CRISPR</strong> <strong>system</strong> of <strong>the</strong> acido<strong>the</strong>rmophile A. hospitalis carry<strong>in</strong>g four <strong>in</strong>terwoven antitox<strong>in</strong>–tox<strong>in</strong> vapBC gene pairs that are highly divergent <strong>in</strong> sequence<br />

[52]. Functional module genes are color-coded as <strong>in</strong> Figure 2, and <strong>in</strong>clude genes of unknown function (grey). Numbers of repeats are <strong>in</strong>dicated for each <strong>CRISPR</strong> locus.<br />

Conclud<strong>in</strong>g remarks<br />

One of <strong>the</strong> puzzles concern<strong>in</strong>g archaeal <strong>CRISPR</strong> <strong>system</strong>s is<br />

why <strong>the</strong>y are so diverse and complex. There are often<br />

multiple <strong>CRISPR</strong> loci with<strong>in</strong> a given archaeon carry<strong>in</strong>g<br />

hundreds of unique spacer sequences with multiple significant<br />

spacer matches to a given type of virus or conjugative<br />

plasmid [26,28,30,44]. Possibly <strong>the</strong> diversity and complexity<br />

reflect <strong>the</strong> large variety of different virus families<br />

characterized for extreme <strong>the</strong>rmophiles, and to a lesser<br />

extent haloarchaea [4–6]. Ano<strong>the</strong>r possibility is that, given<br />

<strong>the</strong>ir modular structures, and <strong>the</strong> diversity of <strong>the</strong>ir putative<br />

transcriptional regulators, <strong>the</strong> <strong>CRISPR</strong> <strong>system</strong>s may<br />

not necessarily elim<strong>in</strong>ate genetic elements. For example,<br />

<strong>the</strong> <strong>immune</strong> <strong>system</strong>s might only be activated when replication<br />

or transcription of genetic elements reaches a certa<strong>in</strong><br />

level, consistent with many viruses be<strong>in</strong>g stably<br />

ma<strong>in</strong>ta<strong>in</strong>ed at low copy-numbers with<strong>in</strong> cells [4–6].<br />

In addition to determ<strong>in</strong><strong>in</strong>g <strong>the</strong> detailed mechanistic roles<br />

of most of <strong>the</strong> core prote<strong>in</strong>s, many uncharacterized <strong>CRISPR</strong>-related<br />

prote<strong>in</strong>s rema<strong>in</strong>, some of which are archaea-specific<br />

and are commonly associated with <strong>in</strong>terference modules<br />

(Figure 4a), and <strong>the</strong>se might generate diversity <strong>in</strong><br />

target<strong>in</strong>g or cleavage mechanisms. Some unclassified<br />

<strong>CRISPR</strong>-related prote<strong>in</strong>s are likely to have secondary roles,<br />

as suggested for <strong>the</strong> antitox<strong>in</strong>–tox<strong>in</strong> <strong>system</strong> of A. hospitalis<br />

help<strong>in</strong>g to stabilize <strong>CRISPR</strong>/Cas <strong>system</strong>s on chromosomes<br />

[52]. Function(s) of <strong>the</strong> iCmr and iCsm modules need to be<br />

exam<strong>in</strong>ed more extensively <strong>in</strong> vivo to establish whe<strong>the</strong>r<br />

RNA viruses and/or transcripts of DNA viruses are targeted.<br />

Target<strong>in</strong>g of transcripts could be a means of regulat<strong>in</strong>g and<br />

stabiliz<strong>in</strong>g DNA viruses <strong>in</strong> vivo. At least for Sulfolobus<br />

species, robust genetic <strong>system</strong>s are now available to resolve<br />

<strong>the</strong>se questions [35,40]. Questions rema<strong>in</strong> as to whe<strong>the</strong>r<br />

crRNAs are selected for DNA or RNA target<strong>in</strong>g or whe<strong>the</strong>r<br />

any spacer RNA potentially can be used for ei<strong>the</strong>r <strong>system</strong>,<br />

and to what extent Cas6 prote<strong>in</strong>s are <strong>in</strong>terchangeable between<br />

<strong>the</strong> different <strong>in</strong>terference <strong>system</strong>s with<strong>in</strong> a given<br />

organism [42]. Ano<strong>the</strong>r press<strong>in</strong>g question is <strong>the</strong> extent to<br />

which defective functional modules are complemented by<br />

components of o<strong>the</strong>r <strong>CRISPR</strong> <strong>system</strong>s; a high priority will be<br />

to experimentally test a broad range of phylogenetically<br />

diverse <strong>CRISPR</strong> <strong>system</strong>s to establish <strong>the</strong> extent of <strong>the</strong>ir<br />

structural and functional diversity.<br />

Acknowledgments<br />

We thank Luciano Marraff<strong>in</strong>i, Mark Young, Qunx<strong>in</strong> She and Malcolm<br />

White for helpful discussions and <strong>the</strong> referees for <strong>the</strong>ir constructive <strong>in</strong>put.<br />

Research was supported by <strong>the</strong> Danish Natural Science Research<br />

Council.<br />

References<br />

1 Gribaldo, S. et al. (2010) The orig<strong>in</strong> of eukaryotes and <strong>the</strong>ir relationship<br />

with <strong>the</strong> <strong>Archaea</strong>: are we at a phylogenomic impasse? Nat. Rev.<br />

Microbiol. 8, 743–752<br />


2 Kurland, C.G. et al. (2006) Genomics and <strong>the</strong> irreducible nature of<br />

eukaryotic cells. Science 312, 1011–1014<br />

3 Valent<strong>in</strong>e, D.L. (2007) Adaptations to energy stress dictate <strong>the</strong> ecology<br />

and evolution of archaea. Nat. Rev. Microbiol. 5, 316–323<br />

4 Prangishvili, D. et al. (2006) Viruses of <strong>the</strong> <strong>Archaea</strong>: a unify<strong>in</strong>g view.<br />

Nat. Rev. Microbiol. 4, 837–848<br />

5 Porter, K. et al. (2007) Virus–host <strong>in</strong>teractions <strong>in</strong> salt lakes. Curr. Op<strong>in</strong>.<br />

Microbiol. 10, 418–424<br />

6 Lawrence, C.M. et al. (2009) Structural and functional studies of<br />

archaeal viruses. J. Biol. Chem. 284, 12599–12603<br />

7 Snyder, J.C. et al. (2010) Use of cellular <strong>CRISPR</strong> (clusters of regularly<br />

<strong>in</strong>terspaced short pal<strong>in</strong>dromic repeats) spacer-based microarrays for<br />

detection of viruses <strong>in</strong> environmental samples. Appl. Environ.<br />

Microbiol. 76, 7251–7258<br />

8 Brumfield, S.K. et al. (2009) Particle assembly and ultrastructural<br />

features associated with replication of <strong>the</strong> lytic archaeal virus<br />

Sulfolobus turreted icosahedral virus. J. Virol. 83, 5964–5970<br />

9 Bize, A. et al. (2009) A unique virus release mechanism <strong>in</strong> <strong>the</strong> archaea.<br />

Proc. Natl. Acad. Sci. U.S.A. 106, 11306–11311<br />

10 Karg<strong>in</strong>ov, F.V. and Hannon, G.J. (2010) The <strong>CRISPR</strong> <strong>system</strong>: small<br />

RNA-guided defense <strong>in</strong> bacteria and archaea. Mol. Cell 37, 7–19<br />

11 Terns, M.P. and Terns, R.M. (2011) <strong>CRISPR</strong>-based adaptive <strong>immune</strong><br />

<strong>system</strong>s. Curr. Op<strong>in</strong>. Microbiol. 14, 1–7<br />

12 Garrett, R.A. et al. (2011) <strong>CRISPR</strong>-based <strong>immune</strong> <strong>system</strong>s of <strong>the</strong><br />

Sulfolobales: complexity and diversity. Biochem. Soc. Trans. 39, 51–57<br />

13 Prangishvili, D. et al. (1998) Conjugation <strong>in</strong> archaea: frequent occurrence<br />

of conjugative plasmids <strong>in</strong> Sulfolobus. Plasmid 40, 190–202<br />

14 Makarova, K.S. et al. (2006) A putative RNA-<strong>in</strong>terference-based <strong>immune</strong><br />

<strong>system</strong> <strong>in</strong> prokaryotes: computational analysis of <strong>the</strong> predicted<br />

enzymatic mach<strong>in</strong>ery, functional analogies with eukaryotic RNAi, and<br />

hypo<strong>the</strong>tical mechanisms of action. Biol. Direct 1, 7<br />

15 Makarova, K.S. et al. (2011) Evolution and classification of <strong>the</strong><br />

<strong>CRISPR</strong>-Cas <strong>system</strong>s. Nat. Rev. Microbiol. 9, 467–477<br />

16 Lillestøl, R.K. et al. (2009) <strong>CRISPR</strong> families of <strong>the</strong> crenarchaeal genus<br />

Sulfolobus: bidirectional transcription and dynamic properties. Mol.<br />

Microbiol. 72, 259–272<br />

17 Shah, S.A. and Garrett, R.A. (2011) <strong>CRISPR</strong>/Cas and Cmr modules,<br />

mobility and evolution of adaptive <strong>immune</strong> <strong>system</strong>s. Res. Microbiol.<br />

162, 27–38<br />

18 Shah, S.A. et al. (2011) <strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr <strong>immune</strong><br />

<strong>system</strong>s of archaea. In Regulatory RNAs <strong>in</strong> Prokaryotes<br />

(Marchfelder, A. and Hess, W., eds), pp. 163–181, Spr<strong>in</strong>ger Press<br />

19 Deveau, H. et al. (2008) Phage response to <strong>CRISPR</strong>-encoded resistance<br />

<strong>in</strong> Streptococcus <strong>the</strong>rmophilus. J. Bacteriol. 190, 1390–1400<br />

20 Brouns, S.J. et al. (2008) Small <strong>CRISPR</strong> RNAs guide antiviral defense<br />

<strong>in</strong> prokaryotes. Science 321, 960–964<br />

21 Hale, C. et al. (2008) Prokaryotic silenc<strong>in</strong>g (psi)RNAs <strong>in</strong> Pyrococcus<br />

furiosus. RNA 14, 1–8<br />

22 Carte, J. et al. (2010) B<strong>in</strong>d<strong>in</strong>g and cleavage of <strong>CRISPR</strong> RNA by Cas6.<br />

RNA 16, 2181–2188<br />

23 Hale, C.R. et al. (2009) RNA-guided RNA cleavage by a <strong>CRISPR</strong> RNA–<br />

Cas prote<strong>in</strong> complex. Cell 139, 945–956<br />

24 Wang, R. et al. (2011) Interaction of Cas6 riboendonuclease with<br />

<strong>CRISPR</strong> RNAs: recognition and cleavage. Structure 19, 257–264<br />

25 Garneau, J.E. et al. (2010) The <strong>CRISPR</strong>/Cas bacterial <strong>immune</strong> <strong>system</strong><br />

cleaves bacteriophage and plasmid DNA. Nature 468, 67–71<br />

26 Shah, S.A. et al. (2009) Distributions of <strong>CRISPR</strong> spacer matches <strong>in</strong> viruses<br />

and plasmids of crenarchaeal acido<strong>the</strong>rmophiles and implications for<br />

<strong>the</strong>ir <strong>in</strong>hibitory mechanism. Biochem. Soc. Trans. 37, 23–28<br />

27 Barrangou, R. et al. (2007) <strong>CRISPR</strong> provides acquired resistance<br />

aga<strong>in</strong>st viruses <strong>in</strong> prokaryotes. Science 315, 1709–1712<br />

28 Lillestøl, R.K. et al. (2006) A putative viral defence mechanism <strong>in</strong><br />

archaeal cells. <strong>Archaea</strong> 2, 59–72<br />

29 Held, N.L. et al. (2010) <strong>CRISPR</strong> associated diversity with<strong>in</strong> a<br />

population of Sulfolobus islandicus. PLoS ONE 5, e12988<br />

30 Andersson, A.F. and Banfield, J.F. (2008) Virus population dynamics<br />

and acquired resistance <strong>in</strong> natural microbial communities. Science 320,<br />

1047–1049<br />

31 Mojica, F.J. et al. (2009) Short motif sequences determ<strong>in</strong>e <strong>the</strong> targets of<br />

<strong>the</strong> prokaryotic <strong>CRISPR</strong> <strong>system</strong>. Microbiology 155, 733–740<br />

32 Tang, T-H. et al. (2002) Identification of 86 candidates for small non-messenger<br />

RNAs from <strong>the</strong> archaeon Archaeoglobus fulgidus. Proc.<br />

Natl. Acad. Sci. U.S.A. 99, 7536–7541<br />

33 Haurwitz, R.E. et al. (2010) Sequence and structure-specific RNA<br />

process<strong>in</strong>g by a <strong>CRISPR</strong> endonuclease. Science 329, 1355–1358<br />

34 Tang, T-H. et al. (2005) Identification of novel non-cod<strong>in</strong>g RNAs as<br />

potential antisense regulators <strong>in</strong> <strong>the</strong> archaeon Sulfolobus solfataricus.<br />

Mol. Microbiol. 55, 469–481<br />

35 Gudbergsdottir, S. et al. (2011) Dynamic properties of <strong>the</strong> Sulfolobus<br />

<strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr <strong>system</strong>s when challenged with vector-borne<br />

viral and plasmid genes and protospacers. Mol. Microbiol. 79, 35–49<br />

36 Deltcheva, E. et al. (2011) <strong>CRISPR</strong> RNA maturation by trans-encoded<br />

small RNA and host factor RNase III. Nature 471, 602–607<br />

37 Lykke-Andersen, J. et al. (1997) <strong>Archaea</strong>l <strong>in</strong>trons: splic<strong>in</strong>g,<br />

<strong>in</strong>tercellular mobility and evolution. Trends Biochem. Sci. 22, 326–331<br />

38 Pul, U. et al. (2010) Identification and characterisation of E. coli<br />

<strong>CRISPR</strong>-cas promoters and <strong>the</strong>ir silenc<strong>in</strong>g by H-NS. Mol. Microbiol.<br />

75, 1495–1512<br />

39 Agari, Y. et al. (2011) Transcription profile of Thermus <strong>the</strong>rmophilus<br />

<strong>CRISPR</strong> <strong>system</strong>s after phage <strong>in</strong>fection. J. Mol. Biol. 395, 270–281<br />

40 Manica, A. et al. (2011) In vitro activity of <strong>CRISPR</strong>-mediated virus<br />

defence <strong>in</strong> a hyper<strong>the</strong>rmophilic archaeon. Mol. Microbiol. 80, 481–491<br />

41 Marraff<strong>in</strong>i, L.A. and Son<strong>the</strong>imer, E.J. (2008) <strong>CRISPR</strong> <strong>in</strong>terference<br />

limits horizontal gene transfer <strong>in</strong> Staphylococci by target<strong>in</strong>g DNA.<br />

Science 322, 1843–1845<br />

42 L<strong>in</strong>dtner, N.G. et al. (2011) Structural and functional characterisation<br />

of an archaeal CASCADE complex for <strong>CRISPR</strong>-mediated viral defense.<br />

J. Biol. Chem. 85, 6287–6292<br />

43 Semenova, E. et al. (2011) Interference by clustered regularly<br />

<strong>in</strong>terspaced short pal<strong>in</strong>dromic repeat (<strong>CRISPR</strong>) RNA is governed by<br />

a seed sequence. Proc. Natl. Acad. Sci. U.S.A. 108, 10098–10103<br />

44 Guo, L. et al. (2011) Genome analyses of Icelandic stra<strong>in</strong>s of Sulfolobus<br />

islandicus, model organisms for genetic and virus–host <strong>in</strong>teraction<br />

studies. J. Bacteriol. 193, 1672–1680<br />

45 Jore, M.M. et al. (2011) Structural basis of <strong>CRISPR</strong>-guided RNA<br />

recognition. Nat. Struct. Mol. Biol. 18, 529–537<br />

46 Marraff<strong>in</strong>i, L.A. and Son<strong>the</strong>imer, E.J. (2010) Self versus non-self<br />

discrim<strong>in</strong>ation dur<strong>in</strong>g <strong>CRISPR</strong> RNA-directed immunity. Nature 463,<br />

568–571<br />

47 Dyall-Smith, M. (2011) Dangerous weapons: a cautionary tale of<br />

<strong>CRISPR</strong> defence. Mol. Microbiol. 79, 3–6<br />

48 Stern, A. et al. (2010) Self-target<strong>in</strong>g by <strong>CRISPR</strong>: gene regulation or<br />

autioimmunity? Trends Genet. 26, 335–340<br />

49 Vestergaard, G. et al. (2008) SRV, a new rudiviral isolate from<br />

Stygiolobus and <strong>the</strong> <strong>in</strong>terplay of crenarchaeal rudiviruses with <strong>the</strong><br />

host viral-defence <strong>CRISPR</strong> <strong>system</strong>. J. Bacteriol. 190, 6837–6845<br />

50 Horvath et al. (2009) Comparative analysis of <strong>CRISPR</strong> loci <strong>in</strong> lactic acid<br />

bacteria genomes. Int. J. Food Microbiol. 131, 62–70<br />

51 Skennerton, C.T. et al. (2011) Phage encoded H-NS: a potential achilles<br />

heel <strong>in</strong> <strong>the</strong> bacterial defence <strong>system</strong>. PLoS ONE 6, e20095<br />

52 You, X-Y. et al. (2011) Genomic studies of Acidianus hospitalis W1 a<br />

crenarchaeal host for study<strong>in</strong>g virus and plasmid life cycles.<br />

Extremophiles 15, 487–497<br />

53 Portillo, M.C. and Gonzalez, J.M. (2009) <strong>CRISPR</strong> elements <strong>in</strong> <strong>the</strong><br />

Thermococcales: evidence for associated horizontal gene transfer <strong>in</strong><br />

Pyrococcus furiosus. J. Appl. Genet. 50, 421–430<br />

54 Greve et al. (2004) Genomic comparison of archaeal conjugative<br />

plasmids from Sulfolobus. <strong>Archaea</strong> 1, 231–239<br />

55 Torar<strong>in</strong>sson, E. et al. (2005) Divergent transcriptional and<br />

translational signals <strong>in</strong> <strong>Archaea</strong>. Environ. Microbiol. 7, 47–54<br />

56 Santangelo, T.J. et al. (2009) <strong>Archaea</strong>l <strong>in</strong>tr<strong>in</strong>sic transcription<br />

term<strong>in</strong>ation <strong>in</strong> vivo. J. Bacteriol. 191, 7102–7108<br />

57 Haft, D.H. et al. (2005) A guild of 45 <strong>CRISPR</strong>-associated (Cas) prote<strong>in</strong><br />

families and multiple <strong>CRISPR</strong>/Cas subtypes exist <strong>in</strong> prokaryotic<br />

genomes. PLoS Comput. Biol. 1, 474–483<br />

58 Kun<strong>in</strong>, V. et al. (2007) Evolutionary conservation of sequence and<br />

secondary structures <strong>in</strong> <strong>CRISPR</strong> repeats. Genome Biol. 8, R611–R617<br />

59 Wiedenheft, B. et al. (2009) Structural base for DNase activity of a<br />

conserved prote<strong>in</strong> implicated <strong>in</strong> <strong>CRISPR</strong>-mediated genome defense.<br />

Structure 17, 904–912<br />

60 Beloglazova, N. et al. (2008) A novel family of sequence-specific<br />

endoribonucleases associated with clustered regularly <strong>in</strong>terspaced<br />

short pal<strong>in</strong>dromic repeats. J. Biol. Chem. 283, 20361–20371<br />

61 S<strong>in</strong>kunas, T. et al. (2011) Cas3 is a s<strong>in</strong>gle-stranded DNA nuclease and<br />

ATP-dependent helicase <strong>in</strong> <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>immune</strong> <strong>system</strong>. EMBO J.<br />

30, 1335–1342


Chapter 10

CRISPR/Cas and CRISPR/Cmr Immune Systems of Archaea

Shiraz A. Shah, Gisle Vestergaard and Roger A. Garrett*

1 Introduction

The CRISPR/Cas (Clustered Regularly Interspaced Short Palindromic Repeats/CRISPR-Associated Genes) and CRISPR/Cmr systems (Cmr: Cas module-RAMP (Repeat-Associated Mysterious Proteins)) provide the basis for adaptive and heritable immune responses directed against the DNA and RNA, respectively, of invading genetic elements. The former consists of CRISPR loci physically linked to a cassette of cas genes, which together appear to constitute integral genetic modules. The cmr genes, clustered in Cmr modules, are sometimes physically linked to CRISPR/Cas modules. The CRISPR/Cas immune system occurs in almost all archaea and about 40% of bacteria. Cmr modules are less common, occurring in only about one third of genomes carrying CRISPR/Cas modules. An outline of how the CRISPR/Cas and CRISPR/Cmr systems function is given in Figure 1, where the former targets DNA and the latter RNA (mRNA and/or viral RNA) of the genetic elements.

Archaeal CRISPR loci consist of clusters of one to more than one hundred spacer-repeat units, each about 60–90 bp long, with repeats and spacers averaging 30 bp and 40 bp, respectively (Lillestøl et al., 2006; Grissa et al., 2008). CRISPR loci are preceded by a non-protein-coding leader region which varies in size from about 150 to 550 bp and is invariably physically linked to a cas gene cassette (Jansen et al., 2002; Haft et al., 2005; Makarova et al., 2006; Lillestøl et al., 2006; Lillestøl et al., 2009). Cas and Cmr proteins, involved in the two different targeting pathways, are functionally and phylogenetically diverse. The CRISPR/Cas system specifically targets DNA elements (Marraffini and Sontheimer, 2008; Shah et al., 2009) while the CRISPR/Cmr system targets RNA, although whether it targets mRNA, viral RNA or both remains unclear (Hale et al., 2009). CRISPR/Cas modules have been classified into families on the basis of sequences of their cas genes, leaders and repeats. Although these modules show a capacity for transfer between phyla of the archaeal and bacterial Domains and, supposedly only rarely, across Domain boundaries, archaea-specific features are nevertheless apparent.
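As a rough illustration of the sizes quoted above, the following Python sketch models a locus as a leader followed by spacer-repeat units and estimates its length. The class `CrisprLocus`, its field names and the 300 bp default leader are hypothetical choices made for this example; the 30 bp and 40 bp values are only the approximate averages given in the text, and the extra terminal repeat reflects the fact that each spacer is bordered by repeats on both sides.

```python
from dataclasses import dataclass


@dataclass
class CrisprLocus:
    """Hypothetical, simplified model of one CRISPR locus."""
    n_units: int            # number of spacer-repeat units
    leader_bp: int = 300    # assumed; the text gives a range of about 150-550 bp
    repeat_bp: int = 30     # approximate average repeat length from the text
    spacer_bp: int = 40     # approximate average spacer length from the text

    def unit_bp(self) -> int:
        """Length of one spacer-repeat unit."""
        return self.repeat_bp + self.spacer_bp

    def array_bp(self) -> int:
        """Length of the repeat-spacer array, including the closing repeat."""
        return self.n_units * self.unit_bp() + self.repeat_bp

    def total_bp(self) -> int:
        """Leader plus array."""
        return self.leader_bp + self.array_bp()


small = CrisprLocus(n_units=1)
large = CrisprLocus(n_units=100)
print(small.array_bp())  # 100
print(large.array_bp())  # 7030
print(large.total_bp())  # 7330
```

Under these assumed averages, even a locus of one hundred units occupies only a few kilobases of the chromosome.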

Fig. 1. Diagram illustrating how CRISPR/Cas and CRISPR/Cmr systems target genetic elements invading a host cell. crRNAs are processed from whole transcripts of CRISPR loci. For the CRISPR/Cas system, Cas proteins complex with the crRNA and guide it to the complementary protospacer sequence in the invading DNA element, where they anneal prior to DNA degradation. Cmr proteins also complex with crRNAs and guide them to either mRNA or viral RNA, targeting them for degradation

Crucial for the functioning of the immune systems are the spacer sequences, which derive from foreign invading elements (Mojica et al., 2005; Pourcel et al., 2005; Bolotin et al., 2005; Lillestøl et al., 2006; Barrangou et al., 2007). The CRISPR loci generate whole transcripts which initiate within the leader sequence adjacent to the first repeat (Lillestøl et al., 2009). These are subsequently processed in their repeat regions, yielding end-products that constitute single spacer-containing crRNAs (Tang et al., 2002; Tang et al., 2005; Lillestøl et al., 2006; Lillestøl et al., 2009). Processing is effected by specific Cas or Cmr proteins and, at least for the latter, two discrete archaeal crRNAs are produced, each carrying 8 nt of the repeat at the 5′-end and lacking 5 nt or 11 nt from the 3′-end of each spacer (Carte et al., 2008; Hale et al., 2009). Complexes of Cas or Cmr proteins transport the processed crRNAs to target, and inactivate, the DNA or RNA, respectively, of invading genetic elements (Brouns et al., 2008; Hale et al., 2008; Carte et al., 2008; Hale et al., 2009). Base-pairing mismatches between the 8 nt repeat sequence at the 5′-end of the crRNA and the Protospacer-Associated Motif (PAM) sequence adjacent to the targeted protospacer of the invading DNA are essential for subsequent degradation of the latter and for ensuring that the chromosomal CRISPR locus itself is not targeted (Horvath and Barrangou, 2010; Marraffini and Sontheimer, 2010; Lillestøl et al., 2009; Gudbergsdottir et al., 2010).
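The processing and self/non-self steps described above can be sketched in Python as follows. This is a toy model under stated assumptions, not the biochemical mechanism: sequences are invented and written in DNA letters, the transcript is taken to start at the first repeat, the repeat is assumed never to occur inside a spacer, and comparing the 8 nt handle with the flanking sequence on the same strand stands in for base pairing with the complementary strand. The function names and the `min_mismatches=1` threshold are hypothetical; the text states only that mismatches are essential, not how many are needed.

```python
HANDLE_NT = 8  # repeat nucleotides retained at the crRNA 5'-end (from the text)


def process_transcript(transcript: str, repeat: str, trim_3p: int = 5) -> list[str]:
    """Cut a repeat-spacer transcript into single-spacer crRNAs.

    trim_3p is the number of spacer nucleotides removed from the 3'-end
    (5 or 11 in the text for the Cmr pathway).
    """
    spacers = [s for s in transcript.split(repeat) if s]
    handle = repeat[-HANDLE_NT:]
    return [handle + s[: len(s) - trim_3p] for s in spacers]


def mismatches(handle: str, flank: str) -> int:
    """Count positions at which the handle and the flank differ."""
    return sum(a != b for a, b in zip(handle, flank))


def is_targeted(crrna: str, flank: str, min_mismatches: int = 1) -> bool:
    """Toy rule: target only if the flank does not fully match the handle."""
    return mismatches(crrna[:HANDLE_NT], flank) >= min_mismatches


# Invented sequences for illustration only.
repeat = "ACGTACGTACGTACGTGATTACAG"
spacers = ["TTTTTTTTTTCCCCC", "AAAAAAAAAAAAGGGGG"]
transcript = repeat + "".join(s + repeat for s in spacers)

crrnas = process_transcript(transcript, repeat, trim_3p=5)
print(crrnas[0])                                  # GATTACAGTTTTTTTTTT
print(is_targeted(crrnas[0], repeat[-HANDLE_NT:]))  # False: chromosomal locus
print(is_targeted(crrnas[0], "TTTTTCCG"))           # True: invader flank differs
```

In the chromosomal locus the sequence flanking each spacer is the repeat itself, so the handle matches perfectly and the locus is spared in this model, whereas any differing flank in an invading element permits targeting.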

2 Archaeal Viruses and Plasmids and Chromosomal Evolution

Although few comprehensive studies have been performed on the relative abundance of different virus-like particle (VLP) morphotypes in archaea-rich environments, available results indicate that spindles, filaments, rods and spheres predominate in terrestrial hot springs and hydrothermal vents, while spindle-shaped and spherical VLPs prevail in hypersaline environments (Rachel et al., 2002; Porter et al., 2007; Bize et al., 2008). Bacteriophage-like head-tail VLPs are found infrequently, although their proviruses have been detected in a few halo- and methanoarchaeal genomes (Porter et al., 2007; Krupovic et al., 2010). Several viruses, mainly from terrestrial hot springs, have been classified into eight new archaeal viral families, and examples of their diverse morphotypes are illustrated in Figure 2. Other viruses, including several haloarchaeal viruses, remain to be classified (Porter et al., 2007). Their classification is complicated by the absence of a consistent relationship between morphology and genomic properties for euryarchaeal and crenarchaeal viruses. Overall, these discoveries underline the major differences between the archaeal and bacterial virospheres (Prangishvili et al., 2006a; Lawrence et al., 2009).

Archaeal viral genomes consist of dsDNA in the size range 15 to 75 kb and are circular or linear. Some linear genomes have free ends whereas others, including those of rudiviruses and some lipothrixviruses, have modified ends or are covalently closed, and some genomes carry base-specific modifications (Zillig et al., 1998; Peng et al., 2001). Consistent with the unusual and sometimes unique viral morphologies (Figure 2), the viral genomes yielded very few significant sequence matches with genes in public sequence databases (Prangishvili et al., 2006b). These results are summarised in histograms for the major hyperthermophilic crenarchaeal viruses in Figure 3, where a large percentage of the genes are classified as unique for each virus. The most extreme case was the thermoneutrophilic virus PSV, whose genes yielded almost no significant sequence matches in the original study (Bettstetter et al., 2003).


Fig. 2. Typical morphologies of representatives of different archaeal viral families. a, SNDV; b, STSV1; c, ATV; d, SIFV; e, AFV1; f, PSV; g, SSV4; h, ARV1. Bars are 100 nm

With the availability of an increasing number of archaeal genome sequences, it has become clear that archaeal viruses and plasmids have played a major role in the evolution of host genomes. This process has apparently been fuelled by the entrapment of foreign DNA elements in host chromosomes via an archaea-specific integrative process. Many archaeal integrase genes are partitioned on integration such that, if the free form of the element is lost, the integrase will not be expressed and cannot effect excision of the genetic element from the chromosome (She et al., 2001). Many of the captured elements are recognisable as intact or degenerate genetic entities, and Markov-model analyses of whole archaeal genomes suggest that such genes of viral or plasmid origin contribute disproportionately to the genes of unknown function in archaeal chromosomes (Cortez et al., 2009).

Archaeal viruses and plasmids have also evolved complex relationships as dependents or antagonists. Thus, in the presence of a fusellovirus, the pRN family plasmids pSSVx and pSSVi are packaged into fusellovirus-like particles and spread through Sulfolobus host cultures as satellite viruses (Arnold et al., 1999; Wang et al., 2007). In contrast, when a strain of Acidianus hospitalis carrying the conjugative plasmid pAH1 was infected with the lipothrixvirus AFV1, plasmid replication appeared to be inhibited (Basta et al., 2009). Moreover, as mentioned below, the Sulfolobus conjugative plasmids pNOB8 and pKEF9 carry CRISPR loci which may directly target and inactivate archaeal viruses (She et al., 1998; Greve et al., 2004).

Fig. 3. Histogram showing a summary of archaeal viral gene homologies to other viruses (virus only genes) and cellular chromosomes (cellular); unique indicates no detectable homologs. Homologs in closely related viruses, including the rudiviruses ARV1 and SIRV1 and the spherical viruses PSV and TTSV1, are not included (Prangishvili et al., 2006b)

3 Diversity of Archaeal CRISPR/Cas and CRISPR/Cmr Immune Systems

Bioinformatic analyses have demonstrated that homologs of a few core Cas proteins occur widely throughout the archaeal and bacterial domains, while others occur less commonly and some are predominantly archaeal or bacterial in character. Core gene sets typify the cas and cmr gene cassettes (Figure 4). For the former, the cas genes fall into groups 1 and 2. This division is based on different factors including co-occurrence, co-regulation and synteny of the genes and, possibly, functional differences for the groups of proteins (see below). The cas6 gene can occur in either group and is likely to be cofunctional with both CRISPR/Cas and CRISPR/Cmr systems (Hale et al., 2009). For the cmr cassette, the two most conserved genes, cmr2 and cmr5, are interspersed with genes encoding diverse RAMP motif-containing proteins (Figure 4B).

Fig. 4. Gene maps of A. cas cassettes and B. a Cmr module showing only conserved core genes. Many other genes that occur less frequently are not included. The cas genes are divided into two groups, 1 and 2 (see text). The Cmr module contains the highly conserved cmr2 and cmr5 genes and genes a to e, shaded grey, which correspond to different genes encoding RAMP motif-containing proteins which are present in 3 to 5 copies in the different Cmr module families (Garrett et al., 2010b)

Fig. 5. CRISPR/Cas modules can be divided into families based on their unique characteristics, including the Cas1 protein sequence and nucleotide sequences of the repeat and leader regions.
a) Spheres represent Cas1 protein sequences from different organisms. Small distances between spheres reflect higher sequence similarity between them. All Cas1 sequences that are currently publicly available are represented. Markov clustering reveals that all the sequences fall within about 20 families (each coloured differently), 5 of which are very large. Strongly coloured spheres represent archaeal Cas1 sequences while bacterial sequences are shown in faded colours. It is evident that some families are specific to bacteria, whereas others are archaea-specific. A few CRISPR/Cas families are shared between both archaea and bacteria. Sulfolobales families I–IV are marked (Lillestøl et al. 2009) and others remain to be formally classified. An earlier broader classification, CASS1 to 7, is also included (Haft et al. 2005; Makarova et al. 2006).
b) Leaders from the Sulfolobales are clustered based on their sequence similarities and they fall into the same group of families (I–IV) as those found for the Cas1 proteins, and a similar result is obtained when repeat sequences are clustered (Lillestøl et al. 2009)

It has also been shown that there is a consistent phylogenetic linkage between sequences of selected Cas proteins and CRISPR locus repeats for archaea and bacteria (Haft et al., 2005; Makarova et al., 2006; Kunin et al., 2007; Shah et al., 2009). Furthermore, for the Sulfolobales, a broader analysis of sequences of repeats, leader regions and Cas1 proteins demonstrated that the CRISPR/Cas modules could be classified into distinct CRISPR/Cas families I to IV (Lillestøl et al., 2009; Shah and Garrett, 2010), which are components of an earlier, more broadly defined group of families CASS1 + 5 + 6 + 7 from archaea and bacteria (Haft et al., 2005; Makarova et al., 2006). Spatial distributions of all the archaeal and bacterial families are illustrated in Figure 5A using a Markov clustering approach based on Cas1 protein sequences. Whereas the crenarchaeal families I, II and III tend to cluster separately, the archaeal family IV sequences, which derive mainly from mesophilic euryarchaea, fall together with a family of bacterial sequences (in green). A closely similar spatial distribution is also observed when crenarchaeal families I to IV are clustered on the basis of their leader sequences (Figure 5B). Clearly, there are other archaea-specific families (strongly coloured in Figure 5A) which remain to be analysed and classified.
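For readers unfamiliar with Markov clustering, the following pure-Python sketch shows the generic algorithm (alternating expansion and inflation of a column-stochastic matrix) on a toy similarity matrix. It is not the pipeline used to produce Figure 5: the function names, the inflation value of 2, the fixed iteration count and the similarity values are all assumptions made for illustration, and a real analysis would start from pairwise Cas1 sequence similarity scores.

```python
def normalise_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def markov_cluster(similarity, inflation=2.0, iterations=50, eps=1e-6):
    """Toy Markov clustering of a symmetric similarity matrix."""
    n = len(similarity)
    # Self-loops (diagonal set to 1) keep every column non-empty.
    m = [[similarity[i][j] if i != j else 1.0 for j in range(n)]
         for i in range(n)]
    m = normalise_columns(m)
    for _ in range(iterations):
        # Expansion: matrix squaring spreads flow along longer walks.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: element-wise power, then renormalise, favouring strong flow.
        m = normalise_columns([[x ** inflation for x in row] for row in m])
    # Nodes joined by any surviving flow are placed in the same cluster.
    label = list(range(n))
    for i in range(n):
        for j in range(n):
            if m[i][j] > eps:
                old, new = label[j], label[i]
                label = [new if x == old else x for x in label]
    clusters = {}
    for node, lab in enumerate(label):
        clusters.setdefault(lab, []).append(node)
    return sorted(clusters.values())


# Invented similarities: sequences 0-2 and 3-4 form two tight groups
# joined only by a weak link between sequences 2 and 3.
sim = [
    [1.0, 0.9, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.9, 0.0, 0.0],
    [0.9, 0.9, 1.0, 0.05, 0.0],
    [0.0, 0.0, 0.05, 1.0, 0.8],
    [0.0, 0.0, 0.0, 0.8, 1.0],
]
clusters = markov_cluster(sim)
print(clusters)  # [[0, 1, 2], [3, 4]]
```

Raising the inflation parameter generally yields more, smaller clusters, which is one reason family boundaries obtained this way depend on the chosen settings.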

Family I CRISPR/Cas modules are the most common amongst the Sulfolobales and other crenarchaea, and the most conserved in structural organisation. The two conserved groups of cas genes are located, respectively, between the leaders and externally at one end of the module. The separation may be functionally significant, with the former involved in processing and insertion of DNA spacer-repeat units and the latter encoding RNA processing and effector proteins (Shah and Garrett, 2010).


The methanoarchaea and haloarchaea, which carry the majority of the family IV CRISPR/Cas modules, show the least conservation in their cas gene contents. In particular, their group 2 cas genes range from those typical of crenarchaea to those common amongst bacteria. Putative genetic exchange between archaea and bacteria has generally been attributed to the methanoarchaea and haloarchaea thriving in environments rich in bacteria.

Cmr modules invariably coexist with, and are sometimes physically linked to, CRISPR/Cas modules, but they occur less widely than the latter. For archaea they are found in about 70% of genomes carrying CRISPR/Cas modules, a higher proportion than for CRISPR/Cas-carrying bacterial genomes (about 30%). Both CRISPR/Cas modules and Cmr modules frequently occur in multiple copies in a given archaeal genome. The cmr genes are mainly co-transcribed and their protein products have been implicated in processing of crRNAs and in guiding crRNAs to target RNA of invading genetic elements, although whether the targets are viral RNA, transcripts, or both remains unclear (Hale et al., 2009).

Comparison of phylogenetic trees for the CRISPR/Cas and Cmr modules, based on archaeal and bacterial sequences of Cas1 and the Cmr2 protein, and its homologs Csm1 and Csx11, revealed five major families of Cmr modules, named A to E, showing distinctive gene syntenies (Garrett et al., 2010b).

Given that Cmr and CRISPR/Cas modules are sometimes physically linked and can potentially be mobilised as a unit, and that they have to recognise similar CRISPR repeat sequences, it is likely that some degree of coevolution has occurred. In support of this idea, there are many examples of family II CRISPR/Cas modules coexisting with family D Cmr modules amongst the Sulfolobales, and this relationship extends to other archaea including, for example, the euryarchaeon Methanospirillum hungatei.

Sizes of CRISPR loci vary from a single spacer bordered by repeats to more than 100 spacer-repeat units (Lillestøl et al., 2006; Grissa et al., 2008). New spacer-repeat units are added at the leader-repeat junction. The CRISPR loci also undergo deletions of one to several spacer-repeat units, probably via recombination at the direct repeats, without impairing the overall CRISPR/Cas functionality. Moreover, there are also putative examples of duplications of spacer-repeat units, or small groups thereof, and of their exchange between CRISPR loci within a genome (Lillestøl et al., 2006; Lillestøl et al., 2009; Shah and Garrett, 2010; Gudbergsdottir et al., 2010).
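The gain and loss of units described above can be pictured with a small Python sketch. It is a bookkeeping model only, under the assumption that recombination occurs between two identical direct repeats of the same locus; the function names and the spacer labels are hypothetical, and the one-letter repeat "R" is a placeholder rather than a sequence.

```python
def build_array(repeat, spacers):
    """Sequence of a repeat-spacer array: R s1 R s2 ... R."""
    return repeat + "".join(s + repeat for s in spacers)


def acquire(spacers, new_spacer):
    """Add a new unit at the leader-repeat junction (index 0)."""
    return [new_spacer] + spacers


def recombine(spacers, i, j):
    """Recombination between repeat i and repeat j (i < j).

    Repeat k precedes spacer k, so spacers i..j-1 are lost together
    with j - i repeat copies, leaving a well-formed array.
    """
    if not 0 <= i < j <= len(spacers):
        raise ValueError("need 0 <= i < j <= number of spacers")
    return spacers[:i] + spacers[j:]


locus = ["s3", "s2", "s1"]        # leader-proximal (newest) spacer first
locus = acquire(locus, "s4")      # ['s4', 's3', 's2', 's1']
shorter = recombine(locus, 1, 3)  # ['s4', 's1']
print(build_array("R", shorter))  # Rs4Rs1R
```

Because the deletion always removes whole units, the remaining array keeps its alternating repeat-spacer structure, which is consistent with the locus staying functional after such events.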

4 Development and Stability of CRISPR Loci

CRISPR loci generally appear to be quite stable, gradually adding spacer-repeat
units at the junction with the leader, albeit at different rates for different loci
within an organism. There is also a compensatory mechanism for gradual loss of
internal spacers which probably involves recombination between the identical direct
repeats of a given locus, and occasionally between loci carrying identical repeats
(Lillestøl et al., 2009; Shah and Garrett, 2010). A specific example of such changes
is illustrated in Figure 6, showing the pairwise alignments of CRISPR locus A of
Sulfolobus solfataricus strains P1, P2 and 98/2 where shared spacers are shaded, as
well as spacers added adjacent to the leader region after these strains diverged. The
pattern of shared spacers for each pair of organisms demonstrates that strain 98/2
separated prior to the divergence of strains P1 and P2, which carry more common
spacers. Those spacers which show significant matches to known genetic elements
are also colour-coded (Figure 6A,B), indicating a wide variety of matches especially
to rudiviruses, bicaudaviruses and conjugative plasmids (Lillestøl et al., 2009).
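The reasoning behind the pairwise comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each locus is reduced to an ordered list of spacer labels (leader-proximal first), the labels and the toy loci below are invented and are not the real spacer content of strains P1, P2 and 98/2, and the function names are hypothetical.

```python
from itertools import combinations


def shared_spacers(locus_a, locus_b):
    """Return the spacers of locus_a that also occur in locus_b, in locus_a order."""
    in_b = set(locus_b)
    return [spacer for spacer in locus_a if spacer in in_b]


def closest_pair(loci):
    """Count shared spacers for every pair of strains and return the pair sharing most."""
    counts = {
        (a, b): len(shared_spacers(loci[a], loci[b]))
        for a, b in combinations(sorted(loci), 2)
    }
    return max(counts, key=counts.get), counts


# Invented toy loci: leader-proximal (newest) spacers come first.
TOY_LOCI = {
    "P1": ["p1_new1", "s5", "s4", "s3", "s2", "s1"],
    "P2": ["p2_new1", "p2_new2", "s5", "s4", "s3", "s2", "s1"],
    "98/2": ["x2", "x1", "s3", "s2", "s1"],
}

best_pair, pair_counts = closest_pair(TOY_LOCI)
```

In this toy case the two P strains share the most spacers, so the third strain would be inferred to have diverged first. Spacers unique to one strain at the leader end correspond to additions made after divergence.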

Earlier evidence suggested that CRISPR loci were strongly resistant to integrative
events (Lillestøl et al., 2006). For example, three strains of S. solfataricus, P1,
P2 and 98/2, carry multiple large CRISPR loci, in addition to locus A in Figure 6.
They are also extremely rich in active transposable elements (about 350 in strain
P2) which have contributed to extensive genome shuffling (Brügger et al., 2004) but
no IS insertions were detected in the extensive CRISPR loci (Lillestøl et al., 2009;
Shah and Garrett, 2010). Thus, although IS insertions do occasionally occur
intergenically in the cas and cmr gene clusters, there appears to be a strong
selective pressure to maintain the integrity of CRISPR loci, which are essential for
the function of both CRISPR/Cas and CRISPR/Cmr systems. Whether this is a general
rule for archaea or is dependent on environmental conditions, including the levels of
viruses and plasmids present, is unclear. A different picture has emerged from
bacterial studies. For example, in a biofilm carrying acidophilic Leptospirillum
group II bacteria, about 20 % of the partially sequenced CRISPR loci contained IS
elements (Tyson and Banfield, 2008).

Many archaeal and bacterial chromosomes, with or without CRISPR/Cas modules,
carry short CRISPR-like clusters lacking associated leader regions and cas
genes (Grissa et al., 2008). Although their origin(s) remain unknown, they may
have separated from intact CRISPR/Cas modules, possibly via transposable elements.
If preceded by promoters their transcripts can, in principle, be processed
and activated. Such CRISPR loci are present in Sulfolobus conjugative plasmids
pNOB8 and pKEF9 (She et al., 1998; Greve et al., 2004) and, at least for the latter,
the spacer-repeat cluster is transcribed and the RNA is processed in a S. solfataricus
host, suggesting that at least some of these small clusters can be activated and
functional if complementary Cas or Cmr proteins are present (Lillestøl et al., 2009;
Shah and Garrett, 2010).

Fig. 6. Pairwise comparison of the spacer-repeat units of CRISPR A locus of three closely
related strains of S. solfataricus P1, P2 and 98/2. Shaded regions indicate identical
spacer-repeat units shared by two CRISPR loci. Colour-coded spacer-repeat units indicate
that spacers have significant sequence matches to the viruses or plasmid families indicated
on the Figure.

5 Mobility of CRISPR/Cas and Cmr Modules

Genomic analyses of closely related Sulfolobus species have provided strong evidence
for CRISPR/Cas modules being mobilised, given that they occur at different
genomic positions even when there is a high level of gene synteny present and they
are generally confined to the variable genetic regions (Shah and Garrett, 2010).
Their ability to transfer between organisms is also supported by the different
combinations of CRISPR/Cas families found in closely related organisms (Lillestøl
et al., 2009; Shah and Garrett, 2010). For example, of the S. islandicus strains
HVE10/4 and REY15A, the former carries family I and III CRISPR/Cas modules and one
Cmr module, while the latter exhibits a family I CRISPR/Cas module and two family
B Cmr modules (Shah and Garrett, 2010; Guo et al., 2010). Further support for
such transfer was provided by analysis of the Pyrococcus furiosus genome where
a 155 kb fragment bordered by a CRISPR locus and a repeat shows significantly
different properties of G+C content, third codon position and codon usage from the
rest of the genome (Portillo and Gonzalez, 2009).
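As an illustration of the compositional measures mentioned for the P. furiosus fragment, the following minimal Python sketch computes overall G+C content and G+C content at third codon positions. It is not the procedure of the cited study; the function names are hypothetical, sequences are assumed to be plain DNA strings, and coding sequences are assumed to be in frame from their first base.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def gc3_content(cds):
    """G+C fraction at third codon positions of an in-frame coding sequence."""
    third_positions = cds.upper()[2::3]  # bases 3, 6, 9, ... of the reading frame
    return gc_content(third_positions)


def gc_difference(region, genome):
    """G+C content of a candidate region minus that of the whole genome."""
    return gc_content(region) - gc_content(genome)
```

A region acquired from another organism would be expected to give a `gc_difference` clearly away from zero, although judging significance requires a comparison with the variation seen across the rest of the genome, which this sketch does not attempt.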

Evidence for gene exchange within the CRISPR/Cas modules was derived from
examination of the structural integrities of the paired family I CRISPR/Cas modules
of several closely related Sulfolobus strains. The results indicated that the
internal group 1 cas genes, which are functionally implicated in spacer addition at
the leader-repeat junction (Figure 4), seem to coevolve, and be mobilised, with the
CRISPR locus whereas the group 2 cas genes, putatively involved in RNA processing
and crRNA mobility (Figure 4), were retained within the strains, suggesting that
some exchange within cas gene cassettes can occur (Shah and Garrett, 2010).

The mechanism(s) of transfer of CRISPR/Cas modules, varying in size from
about 7 kb to 25 kb, remains unclear. The larger CRISPR/Cas modules, at least,
may be too large to be borne on plasmids as has been proposed for bacteria
(Godde and Bickerton, 2006). At least for the crenarchaea, genetic elements are
relatively small and, although small CRISPR loci have been detected in crenarchaeal
conjugative plasmids, transfer is more likely to result from chromosomal
conjugation which may well be facilitated by integrated conjugative plasmids
(Lillestøl et al., 2009).

6 Targets of the CRISPR/Cas and CRISPR/Cmr Systems

Bioinformatic evidence indicated that the spacer crRNAs carrying significant
sequence matches to the protospacer sequence were complementary to either strand
of genes, implying that they were not exclusively targeting mRNAs (Lillestøl et al.,
2006). Moreover, extensive analyses of significant matches to the many known
viruses and plasmids of the Sulfolobales revealed several matches to protospacers
lying between genes. They demonstrated, further, that the locations of the
protospacers were randomly distributed along, and on either strand of, the genetic
elements. This is illustrated in Figure 7 for five crenarchaeal viruses and two
plasmids, where the positions of the significant matches are shown in relation to the
annotated gene locations. A similar conclusion, that DNA and not mRNA was targeted by
the CRISPR/Cas system of the bacterium Staphylococcus epidermidis, was reached
experimentally (Marraffini and Sontheimer, 2008).

Fig. 7. Significant CRISPR spacer matches to protospacer sequences are superimposed on
genomes of the following representative viruses and plasmids: SIRV1 – rudiviruses, AFV9 –
betalipothrixviruses, SSV2 – fuselloviruses, STIV – turreted icosahedral viruses, ATV –
bicaudavirus, pNOB8 – conjugative plasmids, and pHEN7 – cryptic plasmids, where circular
genomes (SSV2, STIV, ATV, pNOB8 and pHEN7) are presented in a linear form. Protein
coding regions are boxed and shaded, as indicated on the Figure, according to their levels
of conservation for those genomes. No comparative genomic data were used for ATV. Spacer
sequence matches are indicated by lines above and below the genomes for the two DNA
strands and they are colour-coded according to whether they occur exclusively at a
nucleotide level (red) or additionally at an amino acid level (green). Significant spacer
matches were found by setting an e-value cut off corresponding to a 10 % false positive
ratio, which was estimated by using the genome of S. acidocaldarius as a negative control
(Chen et al., 2005). These data are updated from an earlier study (Shah et al., 2009).
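The strand and gene-location argument made in this section can be illustrated with a short Python sketch. It is a deliberate simplification: it looks only for exact matches, whereas the analyses described here used similarity searches with an e-value cut off, and the sequences, gene coordinates and function names are invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_protospacers(spacer, genome):
    """Return sorted (start, strand) pairs for exact spacer matches on either strand."""
    genome = genome.upper()
    hits = []
    for strand, query in (("+", spacer.upper()), ("-", reverse_complement(spacer))):
        start = genome.find(query)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(query, start + 1)  # allow overlapping matches
    return sorted(hits)


def within_gene(start, length, genes):
    """True if the match [start, start + length) lies inside any (gene_start, gene_end)."""
    return any(g_start <= start and start + length <= g_end for g_start, g_end in genes)


# Invented toy element: one match on each strand, one annotated gene at 0..10.
TOY_SPACER = "GACCGTA"
TOY_GENOME = "TTGACCGTAAGCTACGGTCAA"
TOY_GENES = [(0, 10)]

toy_hits = find_protospacers(TOY_SPACER, TOY_GENOME)
```

Matches reported on both strands, together with matches for which `within_gene` is false, are the kind of observation that argues against exclusive mRNA targeting.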

However, more recently it was demonstrated that crRNAs complexed with Cmr
proteins target RNA carrying matching protospacers (Hale et al., 2009) but it is still
unclear whether this includes both mRNAs and viral RNAs. For archaea, this will
only be resolved when the first archaeal RNA viruses have been characterised.

A few sequence matches have been detected between archaeal CRISPR spacers
and IS elements, suggesting that CRISPR/Cas systems can target transposable elements
(Lillestøl et al., 2006; Held and Whitaker, 2009; Mojica et al., 2009; Shah
et al., 2009). However, most of those reported can be attributed to transposase
genes carried on viral genomes or plasmids, including, for example, spacer matches
to each of the four transposase genes of the bicaudavirus ATV (Figure 7). These
transposase genes/IS elements are presumably indistinguishable from any other
viral/plasmid genomic target if they carry appropriate PAM motifs adjacent to
protospacer sites.

7 Formation of crRNAs and Targeting of Foreign Elements

The few archaeal CRISPR loci that have been tested experimentally for transcription,
including some lacking intact leader regions, produced processed transcripts
(Tang et al., 2002; Tang et al., 2005; Lillestøl et al., 2006; Carte et al., 2008;
Lillestøl et al., 2009). Sulfolobus acidocaldarius carries five CRISPR loci with
sizes of 133, 78, 11, 5 and 2 spacer-repeat units. For the four smaller clusters,
whole length transcripts were detected experimentally and for locus-78, the maximum
transcript size of about 5000 nt exceeded the size of the 4930 bp CRISPR locus,
consistent with the whole transcript extending from within the leader region and
terminating downstream from the locus (Lillestøl et al., 2006; Lillestøl et al., 2009).
However, a large fraction of the transcripts also fell in the size range 3000–3500 nt,
suggesting that endogenous degradation, premature termination or processing had
occurred towards the 3’-end of the transcript. Given that promoter and terminator
motifs will be randomly taken up from the foreign genetic elements into spacers of
CRISPR loci (Shah et al., 2009), there must be some form of transcriptional
regulation to ensure the formation of whole CRISPR transcripts, possibly involving
the Sulfolobus CRISPR repeat binding protein (Peng et al., 2003).

In the euryarchaeon P. furiosus and in Escherichia coli, RNA transcripts are processed
within repeats, 8 nt from the spacer start, by the Cas6-type endonuclease.
The processing of the 3’-end is less clear but for P. furiosus it occurs at two sites
within the spacer, at 5 nt and 11 nt from the 3’-end of the spacer sequence.
Complexes of Cas or Cmr proteins guide the mature crRNAs to their targets (Brouns
et al., 2008; Hale et al., 2009). Annealing of the spacer sequence of the crRNA to
the protospacer of the invading element is crucial for the recognition and
inactivation of the target. For the bacterium Streptococcus thermophilus it was
claimed that 100 % sequence matching between the crRNAs and protospacer RNAs was
essential for target inactivation (Barrangou et al., 2007; Horvath and Barrangou, 2010).
However, for S. solfataricus and Sulfolobus islandicus the requirements appear to
be much less stringent because even with 3 mismatches between crRNA and protospacer
targeting was still effective (Gudbergsdottir et al., 2010).
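The processing positions given above for P. furiosus can be turned into a small worked example. The sketch below assumes an idealised transcript of identical repeats alternating with spacers, a cut inside every repeat 8 nt upstream of the following spacer, and 3’-trimming to 5 nt or 11 nt before the spacer end. The repeat and spacer sequences are invented, DNA letters stand in for RNA, and the function names are hypothetical.

```python
def cas6_cleave(repeat, spacers, tag_len=8):
    """Cut a repeat-spacer transcript inside every repeat, tag_len nt before the next spacer."""
    transcript = repeat + "".join(spacer + repeat for spacer in spacers)
    sites = []
    pos = len(repeat)                    # start of the first spacer
    for spacer in spacers:
        sites.append(pos - tag_len)      # cut site inside the upstream repeat
        pos += len(spacer) + len(repeat)
    sites.append(pos - tag_len)          # matching cut inside the final repeat
    # Each unit is: 8-nt repeat tag + spacer + remainder of the downstream repeat.
    return [transcript[a:b] for a, b in zip(sites, sites[1:])]


def trim_3prime(unit, spacer_len, tag_len=8, trims=(5, 11)):
    """Return the variants that end 5 nt or 11 nt before the 3' end of the spacer."""
    spacer_end = tag_len + spacer_len
    return [unit[: spacer_end - trim] for trim in trims]


TOY_REPEAT = "g" * 16 + "attgaaag"       # invented 24-nt repeat (lower case)
TOY_SPACERS = ["ACGT" * 9, "TTGCA" * 7]  # invented spacers of 36 and 35 nt

units = cas6_cleave(TOY_REPEAT, TOY_SPACERS)
mature = [trim_3prime(u, len(s)) for u, s in zip(units, TOY_SPACERS)]
```

Under these assumptions a 36 nt spacer yields products of 39 nt and 33 nt, each carrying the 8 nt repeat-derived tag at its 5’-end.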

There may also be differences between some archaea and bacteria in the role
of the family-specific Protospacer-Associated Motif (PAM) complementary to part
of the 5’-repeat sequence of the crRNA which, in Sulfolobus species, constitutes a
conserved dinucleotide (Lillestøl et al., 2009). For S. islandicus it was shown that
altering the PAM motif inhibited protospacer targeting (Gudbergsdottir et al., 2010)
whereas for the bacterium Staphylococcus epidermidis it was concluded that any
sequence mismatch with the 5’-end of the crRNA ensured protospacer targeting and
that sequence complementarity to the PAM motif was not essential (Marraffini and
Sontheimer, 2010).
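The two Sulfolobus observations, tolerance of a few mismatches and dependence on the PAM, can be combined into a toy decision rule. Everything specific in the sketch is an assumption made for illustration: the PAM sequence `"CC"`, its position immediately upstream of the protospacer, and the use of 3 as an upper limit (the text reports only that targeting was still effective with 3 mismatches, not where it fails). The function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_targeted(spacer, target, site, pam="CC", max_mismatches=3):
    """Toy rule: PAM directly upstream of the protospacer and few enough mismatches.

    `pam` and `max_mismatches` are illustrative placeholders, not measured values.
    """
    protospacer = target[site: site + len(spacer)]
    if site < len(pam) or len(protospacer) != len(spacer):
        return False  # no room for the PAM, or the protospacer runs off the end
    upstream = target[site - len(pam): site]
    return upstream == pam and hamming(spacer, protospacer) <= max_mismatches


TOY_PROTOSPACER = "ACGTACGTACGT"
TOY_TARGET = "GG" + "CC" + TOY_PROTOSPACER      # PAM present
TOY_TARGET_NO_PAM = "GG" + "AT" + TOY_PROTOSPACER  # PAM altered
```

A rule for the S. epidermidis case would look different, since there the text reports that mismatch with the 5’-end of the crRNA, rather than complementarity to a PAM, permitted targeting.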

The CRISPR-like locus of pKEF9 lacks an associated cas cassette and leader
region but when transformed into S. solfataricus P2 it produced transcripts covering
the whole CRISPR locus, initiating 32 bp upstream from the first repeat, and
these were found to be processed. Processing sites were detected within each
repeat-spacer unit but some of the sites occurred within the spacer. At the time it
was presumed that some inaccurate processing had occurred, possibly reflecting
mismatches occurring between the plasmid repeat sequence and the host Cas proteins
(Lillestøl et al., 2009), but it was not known then that Cmr proteins process within
the 3’-ends of spacers (Hale et al., 2009).

In contrast to reports on the euryarchaeal CRISPR transcripts (Carte et al., 2008)
and a bacterium (Brouns et al., 2008), transcripts were detected from both DNA
strands of each of the five CRISPR loci of S. acidocaldarius (Lillestøl et al., 2006;
Lillestøl et al., 2009). The largest CRISPR locus Saci-133 was probed against spacer
sequences distributed along the cluster and each yielded clear signals in Northern
analyses. The smallest processed products in the size range 55–60 nt were larger
than those of leader strand crRNAs and were less regularly processed. These small
RNAs were observed for all five S. acidocaldarius repeat-clusters and must contain
most or all of the spacer sequence because the corresponding band was not detected
when the spacer probe was replaced by a repeat probe. Whether these have a role in
protecting the mature crRNAs when there are no invading elements present remains
unclear.

8 Anti CRISPR/Cas and CRISPR/Cmr Systems

Examples have been recorded of archaeal CRISPR/Cas modules being lost from
genomes. For example, a variant strain of S. solfataricus P2 (strain P2A) was
characterised that had lost four closely linked CRISPR/Cas modules, A to D, apparently
via a single recombination event between bordering IS elements (Redder and Garrett,
2006). Bordering IS elements also have the potential to generate transposons
carrying whole CRISPR/Cas or Cmr modules. Possibly this loss reflects S. solfataricus
P2A being a laboratory strain where the immune system had become an unnecessary
burden on the cell’s energy resources in the absence of invading genetic elements,
and this may be analogous to the many bacterial endosymbionts which lack
functional CRISPR/Cas systems (Grissa et al., 2008; Mojica et al., 2009).

There are also examples of viruses which circumvent or interfere with the
CRISPR systems. Some members of the viral families Rudiviridae and Lipothrixviridae
carry 12 bp indels, probably deletions, in their genomes, often lying within,
but not disrupting, open reading frames (Peng et al., 2004; Vestergaard et al., 2008).
Although the function of these elements is unknown, they may be generated in
response to the CRISPR-based immune systems to avoid crRNA targeting. The
presence of multiple recombination sites in some archaeal viruses and conjugative
plasmids may also facilitate genomic rearrangements and sequence changes (Greve
et al., 2004; Garrett et al., 2010a).

Analysis of the genome of S. islandicus strain M.16.4 isolated in Kamchatka,
Russia (Reno et al., 2009), revealed the presence of a more direct viral interference
where an M164 provirus 1 has integrated into, and disrupted, the csa3 gene encoding
a putative transcriptional regulator of the group 1 cas genes (Figure 8). The
insertion event seems to be recent since the truncated parts of the csa3 gene show
high sequence similarity to genes of closely related species, and it may be reversible.
The closely related strain M.16.27 carries an intact csa3 gene (Figure 8A) but
also, unlike strain M.16.4, carries a CRISPR spacer sequence perfectly matching
the provirus.

[Figure 8. Panel a: cas gene cassettes (csa1, cas1, cas2, cas4, csa3) of strains M1627 and M164. Panel b: sequence alignment at the att site (attL, attR). Panel c: gene map of M164 provirus 1 (13,908 bp) with labels sugar binding, integrase, phospholipase D and DNA primase/polymerase.]

Fig. 8. An example of a cas gene cassette that has been inactivated in the gene for the
putative transcriptional regulator csa3 of S. islandicus M.16.4 by the integration of an
M164 provirus 1.
a) Strain M.16.27, lacking the integrated provirus, carries a CRISPR spacer with a perfect
match to the provirus whereas S. islandicus M.16.4, carrying the integrated provirus,
contains no spacer sequence matching the proviral sequence.
b) The integration att site in the csa3 gene.
c) Gene map of the integrated provirus showing some predicted functional assignments.



9 Evolutionary Considerations

The view that archaeal and bacterial CRISPR/Cas systems are closely related
has prevailed since their discovery and was underpinned by the similar ordering
of spacer-repeat units in the CRISPR loci and by extensive sequence similarities
between Cas proteins (Haft et al., 2005; Godde and Bickerton, 2006; Makarova
et al., 2006). This view has been reinforced by the shared mechanism of elongation
of CRISPR loci at the leader-repeat junction as well as similarities in the processing
mechanisms of crRNAs in both Domains (Tang et al., 2002; Tang et al., 2005;
Brouns et al., 2008; Hale et al., 2008; Hale et al., 2009). Nevertheless, there are
distinctive features. CRISPR/Cas modules are more common amongst archaea and
tend to be larger, structurally more complex and more labile (Lillestøl et al., 2006;
Grissa et al., 2008; Shah and Garrett, 2010). Many repeat sequences show a bias to
archaeal or bacterial CRISPR loci, and many archaeal repeats lack inverted repeats
common to those of bacteria, suggesting that different RNA processing signals occur
within transcript repeats (Lillestøl et al., 2006; Kunin et al., 2007). Moreover, many
crenarchaea encode the CRISPR repeat binding protein of elusive function (Peng
et al., 2003).

Phylogenetic analyses imply that periodic inter-Domain exchange of CRISPR/Cas
modules has occurred (Haft et al., 2005; Godde and Bickerton, 2006; Makarova
et al., 2006). Clearly, crossing Domain boundaries would be a very complex process
given the basic differences in the transcriptional and translational mechanisms of
archaea and bacteria (Torarinsson et al., 2005; Santangelo et al., 2009). Moreover,
conjugal DNA transfer would also have to overcome the major barriers of different
membrane and cell wall structures, and different conjugative systems, of archaea
and bacteria (Greve et al., 2004; Veith et al., 2009). Nevertheless, coevolution
of archaeal and bacterial CRISPR/Cas systems would only require cross-Domain
events to succeed rarely. The more archaea-specific components may be associated
with systems that have evolved in environments of high temperature, extremes of
pH, or hypersaline conditions where levels of bacteria are relatively low, which is
also supported by the cas gene compositions of different CRISPR/Cas families.
Other mechanistic differences may surface as the different systems are studied in
more depth. Importantly, however, crenarchaeal viruses have radically different
virus-host relationships from those of bacteria that may require altered responses
from the immune systems (Prangishvili et al., 2006a; Bize et al., 2009) and it is
likely that the CRISPR-based immune systems have maintained and/or undergone
Domain-specific adaptations during evolution.

Small interference RNA systems (siRNA) are widespread in eukarya where they
have multiple roles including the discrimination and targeting of “foreign” genetic
elements such as viruses and transposons (Hannon, 2002; Jinek and Doudna,
2009). There are broad mechanistic parallels between these eukaryal siRNA systems
and the DNA- and RNA-targeting CRISPR systems. They all have to distinguish
foreign DNA from self-DNA, and target nucleic acids which show little
sequence similarity and can undergo continual sequence change. However, whereas
the CRISPR systems employ ssRNAs for targeting foreign elements, the eukaryal
anti-viral systems generate small 21–22 bp dsRNAs for targeting viruses which are
subsequently converted to ssRNAs by an Argonaute protein-RISC complex.

The closest parallel to the crRNAs and CRISPR loci amongst the eukaryal siRNA
systems is provided by the Argonaute Piwi-interacting RNAs (piRNAs), which are
directly processed from large transcripts of piRNA clusters which are rich in
transposons and repeat-sequence elements and, as for the CRISPR loci, occur at
specific chromosomal sites (Lillestøl et al., 2009; Karginov and Hannon, 2010). This
eukaryal system probably plays a role in maintaining germline integrity and
development (Aravin et al., 2007; Klattenhoff and Theurkauf, 2008). As for CRISPR
loci, the piRNA clusters increase their informational capacity by the insertion of
transposon sequences which provide novel sequence content and are maintained in the
piRNA clusters by selection. Thus, continual expansion of piRNA clusters occurs, as
for CRISPR loci, but the process is passive rather than directed. Moreover, as for the
CRISPR/Cas system, the newly incorporated DNA derives exclusively from genetic
elements that are to be targeted. Sequence analyses have detected no homology
between proteins of the eukaryal siRNA systems and those of the CRISPR system,
although similarities may appear at a tertiary structural level.

10 Conclusions<br />

The CRISPR/Cas and CRISPR/Cmr immune machineries provide an effective<br />

defence against foreign genetic elements in archaea and some bacteria. The system<br />

is dynamic and heritable, although the benefit for the cell in evolutionary terms is<br />

transient because spacers taken up into CRISPR loci from extrachromosomal elements<br />

have a rapid turnover and are lost again via recombination at repeats<br />

and/or transpositional events. Current evidence suggests that <strong>CRISPR</strong>/Cas and Cmr<br />

modules can behave like <strong>in</strong>tegral genetic elements. They tend to be located <strong>in</strong> <strong>the</strong><br />

most variable regions of chromosomes, sometimes physically l<strong>in</strong>ked, and are frequently<br />

displaced as a result of genome shuffl<strong>in</strong>g, <strong>in</strong>clud<strong>in</strong>g possibly transposition<br />

of whole modules. <strong>CRISPR</strong> loci may be broken up, and dispersed, <strong>in</strong> chromosomes<br />

with <strong>the</strong> potential for creat<strong>in</strong>g genetic novelty. Small leaderless <strong>CRISPR</strong>-like loci<br />

are commonly found <strong>in</strong> chromosomes, and <strong>in</strong> plasmids, and some can be transcribed<br />

and processed and <strong>the</strong>refore constitute potentially functional accessories to<br />

<strong>the</strong> <strong>CRISPR</strong>-based <strong>immune</strong> <strong>system</strong>s. Both <strong>CRISPR</strong>/Cas and Cmr modules appear<br />

to exchange readily between closely related organisms, possibly via chromosomal<br />

conjugation, where <strong>the</strong>y may be subjected to strong selective pressure. While universal<br />

phylogenetic trees based on the Cas1 and Cmr2 proteins of the CRISPR/Cas<br />

and Cmr modules, respectively, suggest that transfers between archaea and<br />

bacteria have occurred, <strong>the</strong> relatively large number of archaea-specific Cas/Cmr<br />

prote<strong>in</strong>s suggests that <strong>the</strong>se may have been very rare events, consistent with <strong>the</strong><br />

<strong>in</strong>compatibility of <strong>the</strong> transcriptional, translational and conjugative <strong>system</strong>s of <strong>the</strong><br />

two Domains (Shah and Garrett, 2010). Parallels exist with the eukaryal siRNA systems,<br />

especially the germ cell piRNAs, which also guide effector proteins to silence<br />

or destroy invading foreign DNA and transposons.


Shiraz A. Shah, Gisle Vestergaard and Roger A. Garrett 69<br />

References<br />

Arav<strong>in</strong> AA, Hannon GJ, Brennecke J (2007) The Piwi-piRNA pathway provides an adaptive<br />

defense <strong>in</strong> <strong>the</strong> transposon arms race. Science 318: 761–764<br />

Arnold HP, She Q, Phan H, Stedman K, Prangishvili D, Holz I et al. (1999) The genetic element<br />

pSSVx of <strong>the</strong> extremely <strong>the</strong>rmophilic crenarchaeon Sulfolobus is a hybrid between a plasmid<br />

and a virus. Mol Microbiol 34: 217–226<br />

Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Mo<strong>in</strong>eau S et al. (2007) <strong>CRISPR</strong><br />

provides acquired resistance aga<strong>in</strong>st viruses <strong>in</strong> prokaryotes. Science 315: 1709–1712<br />

Basta T, Smyth J, Forterre P, Prangishvili D, Peng X (2009) Novel archaeal plasmid pAH1 and its<br />

<strong>in</strong>teractions with <strong>the</strong> lipothrixvirus AFV1. Mol Microbiol 71: 23–34<br />

Bettstetter M, Peng X, Garrett RA, Prangishvili D (2003) AFV1, a novel virus <strong>in</strong>fect<strong>in</strong>g hyper<strong>the</strong>rmophilic<br />

archaea of <strong>the</strong> genus Acidianus. Virology 315: 68–79<br />

Bize A, Karlsson EA, Ekefjard K, Quax TE, P<strong>in</strong>a M, Prevost MC et al. (2009) A unique virus<br />

release mechanism <strong>in</strong> <strong>the</strong> <strong>Archaea</strong>. Proc Natl Acad Sci U S A 106: 11306–11311<br />

Bize A, Peng X, Prokofeva M, Maclellan K, Lucas S, Forterre P et al. (2008) Viruses <strong>in</strong> acidic<br />

geo<strong>the</strong>rmal environments of <strong>the</strong> Kamchatka Pen<strong>in</strong>sula. Res Microbiol 159: 358–366<br />

Bolot<strong>in</strong> A, Qu<strong>in</strong>quis B, Sorok<strong>in</strong> A, Ehrlich SD (2005) Clustered regularly <strong>in</strong>terspaced short pal<strong>in</strong>drome<br />

repeats (<strong>CRISPR</strong>s) have spacers of extrachromosomal orig<strong>in</strong>. Microbiology 151:<br />

2551–2561<br />

Brouns SJ, Jore MM, Lundgren M, Westra ER, Slijkhuis RJ, Snijders AP et al. (2008) Small<br />

<strong>CRISPR</strong> RNAs guide antiviral defense <strong>in</strong> prokaryotes. Science 321: 960–964<br />

Brügger K, Torar<strong>in</strong>sson E, Redder P, Chen L, Garrett RA (2004) Shuffl<strong>in</strong>g of Sulfolobus genomes<br />

by autonomous and non-autonomous mobile elements. Biochem Soc Trans 32: 179–183<br />

Carte J, Wang R, Li H, Terns RM, Terns MP (2008) Cas6 is an endoribonuclease that generates<br />

guide RNAs for <strong>in</strong>vader defense <strong>in</strong> prokaryotes. Genes Dev 22: 3489–3496<br />

Chen L, Brugger K, Skovgaard M, Redder P, She Q, Torar<strong>in</strong>sson E et al. (2005) The genome of<br />

Sulfolobus acidocaldarius, a model organism of <strong>the</strong> Crenarchaeota. J Bacteriol 187: 4992–<br />

4999<br />

Cortez D, Forterre P, Gribaldo S (2009) A hidden reservoir of <strong>in</strong>tegrative elements is <strong>the</strong> major<br />

source of recently acquired foreign genes and ORFans <strong>in</strong> archaeal and bacterial genomes.<br />

Genome Biol 10: R65<br />

Garrett RA, Prangishvili D, Shah SA, Reuter M, Stetter KO, Peng X (2010a) Metagenomic analyses<br />

of novel viruses and plasmids from a cultured environmental sample of hyper<strong>the</strong>rmophilic<br />

neutrophiles. Environ Microbiol doi: 10.1111/j.1462-2920.2010.02266.x<br />

Garrett RA, Shah SA, Vestergaard G, Deng L, Gudbergsdottir S, Kenchappa CS et al. (2010b)<br />

<strong>CRISPR</strong>-based <strong>immune</strong> <strong>system</strong>s of <strong>the</strong> Sulfolobales – complexity and diversity. Biochem<br />

Soc Trans <strong>in</strong> press<br />

Godde JS and Bickerton A (2006) The repetitive DNA elements called <strong>CRISPR</strong>s and <strong>the</strong>ir associated<br />

genes: evidence of horizontal transfer among prokaryotes. J Mol Evol 62: 718–729<br />

Greve B, Jensen S, Brugger K, Zillig W, Garrett RA (2004) Genomic comparison of archaeal conjugative<br />

plasmids from Sulfolobus. <strong>Archaea</strong> 1: 231–239<br />

Grissa I, Vergnaud G, Pourcel C (2008) <strong>CRISPR</strong>compar: a website to compare clustered regularly<br />

<strong>in</strong>terspaced short pal<strong>in</strong>dromic repeats. Nucleic Acids Res 36: W145-W148<br />

Gudbergsdottir S, Deng L, Chen Z, Jensen JVK, Jensen LR, She Q et al. (2010) Dynamic properties<br />

of the Sulfolobus CRISPR/Cas and CRISPR/Cmr systems when challenged with vector-borne<br />

viral and plasmid genes and protospacers. Mol Microbiol under review<br />

Guo L, Brugger K, Chao Liu C, Shah SA, Zheng H, Zhu Y et al. (2010) Comparative genomics of<br />

two stra<strong>in</strong>s of Sulfolobus islandicus from Iceland: Hosts for study<strong>in</strong>g crenarchaeal genetics<br />

and virus life cycles. submitted<br />

Haft DH, Selengut J, Mongod<strong>in</strong> EF, Nelson KE (2005) A guild of 45 <strong>CRISPR</strong>-associated (Cas)<br />

prote<strong>in</strong> families and multiple <strong>CRISPR</strong>/Cas subtypes exist <strong>in</strong> prokaryotic genomes. PLoS<br />

Comput Biol 1: e60



Hale C, Kleppe K, Terns RM, Terns MP (2008) Prokaryotic silenc<strong>in</strong>g (psi)RNAs <strong>in</strong> Pyrococcus<br />

furiosus. RNA 14: 2572–2579<br />

Hale CR, Zhao P, Olson S, Duff MO, Graveley BR, Wells L et al. (2009) RNA-guided RNA cleavage<br />

by a <strong>CRISPR</strong> RNA-Cas prote<strong>in</strong> complex. Cell 139: 945–956<br />

Hannon GJ (2002) RNA <strong>in</strong>terference. Nature 418: 244–251<br />

Held NL and Whitaker RJ (2009) Viral biogeography revealed by signatures <strong>in</strong> Sulfolobus islandicus<br />

genomes. Environ Microbiol 11: 457–466<br />

Horvath P and Barrangou R (2010) <strong>CRISPR</strong>/Cas, <strong>the</strong> <strong>immune</strong> <strong>system</strong> of bacteria and archaea.<br />

Science 327: 167–170<br />

Jansen R, Embden JD, Gaastra W, Schouls LM (2002) Identification of genes that are associated<br />

with DNA repeats <strong>in</strong> prokaryotes. Mol Microbiol 43: 1565–1575<br />

J<strong>in</strong>ek M and Doudna JA (2009) A three-dimensional view of <strong>the</strong> molecular mach<strong>in</strong>ery of RNA<br />

<strong>in</strong>terference. Nature 457: 405–412<br />

Karg<strong>in</strong>ov FV and Hannon GJ (2010) The <strong>CRISPR</strong> <strong>system</strong>: small RNA-guided defense <strong>in</strong> bacteria<br />

and archaea. Mol Cell 37: 7–19<br />

Klattenhoff C and Theurkauf W (2008) Biogenesis and germl<strong>in</strong>e functions of piRNAs. Development<br />

135: 3–9<br />

Krupovic M, Forterre P, Bamford DH (2010) Comparative analysis of <strong>the</strong> mosaic genomes of<br />

tailed archaeal viruses and proviruses suggests common <strong>the</strong>mes for virion architecture and<br />

assembly with tailed viruses of bacteria. J Mol Biol 397: 144–160<br />

Kun<strong>in</strong> V, Sorek R, Hugenholtz P (2007) Evolutionary conservation of sequence and secondary<br />

structures <strong>in</strong> <strong>CRISPR</strong> repeats. Genome Biol 8: R61<br />

Lawrence CM, Menon S, Eilers BJ, Bothner B, Khayat R, Douglas T et al. (2009) Structural and<br />

functional studies of archaeal viruses. J Biol Chem 284: 12599–12603<br />

Lillestøl RK, Redder P, Garrett RA, Brugger K (2006) A putative viral defence mechanism <strong>in</strong><br />

archaeal cells. <strong>Archaea</strong> 2: 59–72<br />

Lillestøl RK, Shah SA, Brugger K, Redder P, Phan H, Christiansen J et al. (2009) <strong>CRISPR</strong> families<br />

of <strong>the</strong> crenarchaeal genus Sulfolobus: bidirectional transcription and dynamic properties.<br />

Mol Microbiol 72: 259–272<br />

Makarova KS, Grish<strong>in</strong> NV, Shabal<strong>in</strong>a SA, Wolf YI, Koon<strong>in</strong> EV (2006) A putative RNA-<strong>in</strong>terference-based<br />

<strong>immune</strong> <strong>system</strong> <strong>in</strong> prokaryotes: computational analysis of <strong>the</strong> predicted enzymatic<br />

mach<strong>in</strong>ery, functional analogies with eukaryotic RNAi, and hypo<strong>the</strong>tical mechanisms<br />

of action. Biol Direct 1: 7<br />

Marraff<strong>in</strong>i LA and Son<strong>the</strong>imer EJ (2008) <strong>CRISPR</strong> <strong>in</strong>terference limits horizontal gene transfer <strong>in</strong><br />

staphylococci by target<strong>in</strong>g DNA. Science 322: 1843–1845<br />

Marraff<strong>in</strong>i LA and Son<strong>the</strong>imer EJ (2010) Self versus non-self discrim<strong>in</strong>ation dur<strong>in</strong>g <strong>CRISPR</strong><br />

RNA-directed immunity. Nature 463: 568–571<br />

Mojica FJ, Diez-Villasenor C, Garcia-Mart<strong>in</strong>ez J, Almendros C (2009) Short motif sequences<br />

determ<strong>in</strong>e <strong>the</strong> targets of <strong>the</strong> prokaryotic <strong>CRISPR</strong> defence <strong>system</strong>. Microbiology 155: 733–<br />

740<br />

Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E (2005) Intervening sequences of regularly<br />

spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 60: 174–182<br />

Peng X, Blum H, She Q, Mallok S, Brugger K, Garrett RA et al. (2001) Sequences and replication<br />

of genomes of <strong>the</strong> archaeal rudiviruses SIRV1 and SIRV2: relationships to <strong>the</strong> archaeal<br />

lipothrixvirus SIFV and some eukaryal viruses. Virology 291: 226–234<br />

Peng X, Brugger K, Shen B, Chen L, She Q, Garrett RA (2003) Genus-specific prote<strong>in</strong> b<strong>in</strong>d<strong>in</strong>g<br />

to <strong>the</strong> large clusters of DNA repeats (short regularly spaced repeats) present <strong>in</strong> Sulfolobus<br />

genomes. J Bacteriol 185: 2410–2417<br />

Peng X, Kessler A, Phan H, Garrett RA, Prangishvili D (2004) Multiple variants of <strong>the</strong> archaeal<br />

DNA rudivirus SIRV1 <strong>in</strong> a s<strong>in</strong>gle host and a novel mechanism of genomic variation. Mol<br />

Microbiol 54: 366–375<br />

Porter K, Russ BE, Dyall-Smith ML (2007) Virus-host <strong>in</strong>teractions <strong>in</strong> salt lakes. Curr Op<strong>in</strong> Microbiol<br />

10: 418–424<br />

Portillo MC and Gonzalez JM (2009) <strong>CRISPR</strong> elements <strong>in</strong> <strong>the</strong> Thermococcales: evidence for<br />

associated horizontal gene transfer <strong>in</strong> Pyrococcus furiosus. J Appl Genet 50: 421–430



Pourcel C, Salvignol G, Vergnaud G (2005) <strong>CRISPR</strong> elements <strong>in</strong> Yers<strong>in</strong>ia pestis acquire new<br />

repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary<br />

studies. Microbiology 151: 653–663<br />

Prangishvili D, Forterre P, Garrett RA (2006a) Viruses of <strong>the</strong> <strong>Archaea</strong>: a unify<strong>in</strong>g view. Nat Rev<br />

Microbiol 4: 837–848<br />

Prangishvili D, Garrett RA, Koon<strong>in</strong> EV (2006b) Evolutionary genomics of archaeal viruses:<br />

unique viral genomes <strong>in</strong> <strong>the</strong> third doma<strong>in</strong> of life. Virus Res 117: 52–67<br />

Rachel R, Bettstetter M, Hedlund BP, Har<strong>in</strong>g M, Kessler A, Stetter KO et al. (2002) Remarkable<br />

morphological diversity of viruses and virus-like particles <strong>in</strong> hot terrestrial environments.<br />

Arch Virol 147: 2419–2429<br />

Redder P and Garrett RA (2006) Mutations and rearrangements <strong>in</strong> <strong>the</strong> genome of Sulfolobus solfataricus<br />

P2. J Bacteriol 188: 4198–4206<br />

Reno ML, Held NL, Fields CJ, Burke PV, Whitaker RJ (2009) Biogeography of <strong>the</strong> Sulfolobus<br />

islandicus pan-genome. Proc Natl Acad Sci U S A 106: 8605–8610<br />

Santangelo TJ, Cubonova L, Sk<strong>in</strong>ner KM, Reeve JN (2009) <strong>Archaea</strong>l <strong>in</strong>tr<strong>in</strong>sic transcription term<strong>in</strong>ation<br />

<strong>in</strong> vivo. J Bacteriol 191: 7102–7108<br />

Shah SA and Garrett RA (2010) <strong>CRISPR</strong>/Cas and Cmr modules, mobility and evolution of adaptive<br />

<strong>immune</strong> <strong>system</strong>s. Res Microbiol<br />

Shah SA, Hansen NR, Garrett RA (2009) Distribution of <strong>CRISPR</strong> spacer matches <strong>in</strong> viruses and<br />

plasmids of crenarchaeal acido<strong>the</strong>rmophiles and implications for <strong>the</strong>ir <strong>in</strong>hibitory mechanism.<br />

Biochem Soc Trans 37: 23–28<br />

She Q, Peng X, Zillig W, Garrett RA (2001) Gene capture <strong>in</strong> archaeal chromosomes. Nature 409:<br />

478<br />

She Q, Phan H, Garrett RA, Albers SV, Stedman KM, Zillig W (1998) Genetic profile of pNOB8<br />

from Sulfolobus: <strong>the</strong> first conjugative plasmid from an archaeon. Extremophiles 2: 417–425<br />

Tang TH, Bachellerie JP, Rozhdestvensky T, Bortol<strong>in</strong> ML, Huber H, Drungowski M et al. (2002)<br />

Identification of 86 candidates for small non-messenger RNAs from <strong>the</strong> archaeon Archaeoglobus<br />

fulgidus. Proc Natl Acad Sci U S A 99: 7536–7541<br />

Tang TH, Polacek N, Zywicki M, Huber H, Brugger K, Garrett R et al. (2005) Identification of<br />

novel non-cod<strong>in</strong>g RNAs as potential antisense regulators <strong>in</strong> <strong>the</strong> archaeon Sulfolobus solfataricus.<br />

Mol Microbiol 55: 469–481<br />

Torar<strong>in</strong>sson E, Klenk HP, Garrett RA (2005) Divergent transcriptional and translational signals <strong>in</strong><br />

<strong>Archaea</strong>. Environ Microbiol 7: 47–54<br />

Tyson GW and Banfield JF (2008) Rapidly evolv<strong>in</strong>g <strong>CRISPR</strong>s implicated <strong>in</strong> acquired resistance of<br />

microorganisms to viruses. Environ Microbiol 10: 200–207<br />

Veith A, Kl<strong>in</strong>gl A, Zolghadr B, Lauber K, Mentele R, Lottspeich F et al. (2009) Acidianus, Sulfolobus<br />

and Metallosphaera surface layers: structure, composition and gene expression. Mol<br />

Microbiol 73: 58–72<br />

Vestergaard G, Shah SA, Bize A, Reitberger W, Reuter M, Phan H et al. (2008) Stygiolobus rod-shaped<br />

virus and the interplay of crenarchaeal rudiviruses with the CRISPR antiviral system.<br />

J Bacteriol 190: 6837–6845<br />

Wang Y, Duan Z, Zhu H, Guo X, Wang Z, Zhou J et al. (2007) A novel Sulfolobus non-conjugative<br />

extrachromosomal genetic element capable of <strong>in</strong>tegration <strong>in</strong>to <strong>the</strong> host genome and spread<strong>in</strong>g<br />

<strong>in</strong> <strong>the</strong> presence of a fusellovirus. Virology 363: 124–133<br />

Zillig W, Arnold HP, Holz I, Prangishvili D, Schweier A, Stedman K et al. (1998) Genetic elements<br />

<strong>in</strong> <strong>the</strong> extremely <strong>the</strong>rmophilic archaeon Sulfolobus. Extremophiles 2: 131–140


Chapter 7<br />

<strong>Archaea</strong>l Type II TA Loci<br />

Shiraz A. Shah and Roger A. Garrett<br />

Correspond<strong>in</strong>g: garrett@bio.ku.dk<br />

<strong>Archaea</strong> Centre, Department of Biology, Ole Maaløes Vej 5, DK-2200<br />

Copenhagen N, Denmark<br />

Abstract A few of <strong>the</strong> bacterial type II TA <strong>system</strong>s, primarily those<br />

involved in translational inhibition, occur widely throughout the<br />

archaeal domain. Using a bioinformatic approach, the frequency and<br />

distribution of these diverse TA loci were examined within<br />

completed genomes of 124 archaea, which are distributed fairly evenly<br />

throughout <strong>the</strong> major archaeal phyla. Results for <strong>the</strong> frequency and<br />

diversity of TA loci are summarised for archaea isolated from<br />

environmental niches generally characterised by extreme conditions<br />

<strong>in</strong>clud<strong>in</strong>g high temperature, high salt concentrations, high pressures,<br />

extremes of pH or strictly anaerobic conditions. No clear correlations<br />

were found between <strong>the</strong> number of TA loci present and ei<strong>the</strong>r <strong>the</strong><br />

genome size or particular environmental conditions. Multiple TA loci<br />

tend to be concentrated <strong>in</strong> variable genomic regions where <strong>the</strong><br />

occurrence of <strong>in</strong>tra- or <strong>in</strong>ter-genomic gene transfer is most prevalent.<br />

For members of <strong>the</strong> Sulfolobales which are uniformly rich <strong>in</strong> TA loci,<br />

a case is made for some TA <strong>system</strong>s facilitat<strong>in</strong>g ma<strong>in</strong>tenance of<br />

important genomic regions.<br />

7.1. Introduction<br />

Until recently, type II TA <strong>system</strong>s have received relatively little<br />

attention <strong>in</strong> comparative genomic studies of archaea. This reflects a<br />

general uncerta<strong>in</strong>ty regard<strong>in</strong>g <strong>the</strong>ir functions, <strong>the</strong> significance of<br />

<strong>the</strong>ir structural diversity and, to some degree, <strong>the</strong>ir identities.<br />

Moreover, this uncerta<strong>in</strong>ty was compounded by <strong>the</strong> small gene sizes,<br />

especially for <strong>the</strong> antitox<strong>in</strong>s, which rendered <strong>the</strong>ir annotation<br />

difficult. This widespread deficiency was first highlighted by Gerdes'<br />



group, who identified large numbers of non-annotated TA loci in<br />

archaeal and bacterial genomes and demonstrated <strong>the</strong> structural<br />

diversity of <strong>the</strong> prote<strong>in</strong> components with<strong>in</strong> different TA families<br />

(Pandey et al., 2005; Gerdes et al., 2005; Jørgensen et al., 2009). This<br />

development, comb<strong>in</strong>ed with contemporary <strong>in</strong>sights ga<strong>in</strong>ed <strong>in</strong>to<br />

molecular mechanisms of tox<strong>in</strong> <strong>in</strong>hibitory activity (reviewed <strong>in</strong><br />

Gerdes et al., 2005), served to focus attention on <strong>the</strong> profound<br />

importance of TA <strong>system</strong>s for cellular viability and survival.<br />

Genome-based surveys of bacterial type II TA <strong>system</strong>s,<br />

carry<strong>in</strong>g two genes, have identified eight major families denoted<br />

vapBC, relBE, hicBA, mazEF, phd/doc, parDE, ccdAB and higBA with an<br />

additional system in a Streptococcus plasmid carrying three genes (ω,<br />

ε and ζ, a repressor, antitoxin and toxin, respectively). VapC, RelE,<br />

MazF and HicA toxins have all been demonstrated experimentally to<br />

<strong>in</strong>hibit translation and Doc has also been implicated, at least<br />

<strong>in</strong>directly, <strong>in</strong> affect<strong>in</strong>g translation. In contrast, ParE and CcdB target<br />

<strong>the</strong> bacterial DNA gyrase <strong>the</strong>reby block<strong>in</strong>g DNA replication<br />

(reviewed <strong>in</strong> Gerdes et al., 2005).<br />

Only three of <strong>the</strong>se tox<strong>in</strong> families VapC, RelE and HicA, each<br />

target<strong>in</strong>g translation, occur commonly amongst archaea and this<br />

chapter is ma<strong>in</strong>ly focussed on <strong>the</strong>se three TA <strong>system</strong>s. In <strong>the</strong><br />

bacterium Shigella flexneri, VapC toxins act by cleaving initiator tRNA<br />

within the anticodon loop, thereby inhibiting translational initiation<br />

(Dienemann et al., 2011; Winther and Gerdes, 2011), while RelE binds<br />

at the ribosomal A-site, cutting the bound mRNA within the codon<br />

(Neubauer et al., 2009), and HicA is a translation-independent mRNA<br />

interferase (Jørgensen et al., 2009; Makarova et al., 2009a). MazF and<br />

Doc have also been implicated <strong>in</strong> target<strong>in</strong>g translation, but <strong>the</strong>ir<br />

homologs are rarely found amongst archaea (Pandey and Gerdes,<br />

2005; Makarova et al., 2009b). Most archaea do not carry a homolog<br />

of <strong>the</strong> bacterial gyrase, <strong>the</strong> target of <strong>the</strong> ParE and CcdB tox<strong>in</strong>s,<br />

employ<strong>in</strong>g <strong>in</strong>stead <strong>the</strong> archaea-specific topoisomerase VI (Gadelle et<br />

al., 2003; Yamashiro and Yamagishi, 2005).<br />

An extensive genomic survey of bacterial and archaeal type II TA<br />

systems by Makarova et al. (2009b), which did not take into account the<br />

many non-annotated genes, reinforced the considerable structural<br />

diversity of <strong>the</strong> major TA families and identified additional subtypes,<br />

especially of <strong>the</strong> antitox<strong>in</strong> components. This study also provided<br />

bio<strong>in</strong>formatical evidence for a possible additional TA locus encod<strong>in</strong>g<br />

MNT (M<strong>in</strong>imal Nucleotidyl Transferase) and HEPN (Higher<br />

Eukaryote and Prokaryote Nucleotide b<strong>in</strong>d<strong>in</strong>g). Although <strong>the</strong>re is<br />

currently no experimental support for any tox<strong>in</strong> activity (Makarova<br />

et al., 2009b), we never<strong>the</strong>less <strong>in</strong>cluded MNT/HEPN gene pairs <strong>in</strong> <strong>the</strong><br />

2


present analysis because <strong>the</strong>y occur commonly <strong>in</strong> archaea, especially<br />

amongst <strong>the</strong> hyper<strong>the</strong>rmophiles and, moreover, <strong>the</strong>ir frequency of<br />

genome occurrence mirrors partially that of vapBC gene pairs.<br />

A bio<strong>in</strong>formatical approach was employed to identify archaeal<br />

type II TA loci with<strong>in</strong> 124 completed archaeal genomes. Exhaustive<br />

searches were made for <strong>the</strong> major families of TA gene loci, vapBC,<br />

relBE and hicAB, and for the HEPN/MNT gene pairs, and attempts<br />

were made to identify non-annotated antitoxin and toxin genes.<br />

7.2. The archaeal perspective<br />

<strong>Archaea</strong> differ from bacteria <strong>in</strong> <strong>the</strong>ir cellular biology <strong>in</strong><br />

fundamental ways and <strong>the</strong>y share many cellular processes<br />

exclusively with eukaryotes albeit generally <strong>in</strong> less complex forms.<br />

Although the evolutionary history of archaea and their relationship<br />

to early eukarya remain enigmatic (Gribaldo et al., 2010; Kurland,<br />

2006), the maintenance of unique cellular properties amongst archaea<br />

is likely to be due to their successful adaptation to extreme<br />

environmental conditions. These include high temperature, extremes<br />

of pH, high salt, high pressures and strictly anaerobic conditions,<br />

and such environments also tend to be low in energy sources<br />

(Kletz<strong>in</strong>, 2007). It has been argued that some of <strong>the</strong> properties unique<br />

to archaea arose from adaptation to chronic energy stress through<br />

modify<strong>in</strong>g catabolic pathways and by conserv<strong>in</strong>g energy via <strong>the</strong>ir<br />

low permeability e<strong>the</strong>r-l<strong>in</strong>ked lipid membranes (Valent<strong>in</strong>e, 2007).<br />

Thus, stress <strong>in</strong> bacteria and archaea cannot simply be equated when<br />

consider<strong>in</strong>g <strong>the</strong> modes of action of tox<strong>in</strong>s.<br />

TA <strong>system</strong>s that are shared between bacteria and archaea appear<br />

primarily to <strong>in</strong>hibit translation, cleav<strong>in</strong>g ei<strong>the</strong>r mRNA bound <strong>in</strong> <strong>the</strong><br />

ribosomal A-site (RelE), <strong>the</strong> anticodon of <strong>the</strong> <strong>in</strong>itiator tRNA (VapC)<br />

or mRNA directly (HicA). The ribosomal tRNA b<strong>in</strong>d<strong>in</strong>g sites,<br />

decod<strong>in</strong>g site and peptidyl transferase centre constitute <strong>the</strong> most<br />

conserved regions of <strong>the</strong> translational apparatus, <strong>in</strong> both bacteria and<br />

archaea (and also <strong>in</strong> eukarya), as judged by <strong>the</strong>ir shared sensitivities<br />

to a wide range of antibiotics which specifically target <strong>the</strong>se sites <strong>in</strong><br />

both Doma<strong>in</strong>s (e.g. Rodriguez-Fonseca et al., 1995). Experimental<br />

studies <strong>in</strong>dicate that bacterial TAs have alternative cellular targets,<br />

<strong>in</strong>clud<strong>in</strong>g <strong>the</strong> bacterial DNA gyrase, but it rema<strong>in</strong>s unknown<br />

whe<strong>the</strong>r <strong>the</strong>re are unidentified archaeal tox<strong>in</strong>s which b<strong>in</strong>d to<br />

archaea-specific cellular sites.<br />

7.3. A bioinformatical approach<br />

All archaeal genomes publicly available at <strong>the</strong> beg<strong>in</strong>n<strong>in</strong>g of 2012<br />

were screened for <strong>the</strong> presence of type II TA loci of <strong>the</strong> superfamilies<br />

vapBC, relBE and hicAB, as well as gene pairs of <strong>the</strong> predicted<br />

HEPN/MNT TA locus, by first construct<strong>in</strong>g tox<strong>in</strong>-specific hidden<br />

Markov models (HMMs) with <strong>the</strong> jackhmmer program (Eddy,<br />

2011), run aga<strong>in</strong>st <strong>the</strong> genomes with known tox<strong>in</strong> genes as queries.<br />

Subsequently, all open read<strong>in</strong>g frames (ORFs) between 50 and 250<br />

aa that did not overlap previously annotated ORFs above 250 aa <strong>in</strong><br />

length were extracted and screened us<strong>in</strong>g <strong>the</strong> constructed HMMs.<br />

Every upstream or downstream ORF, depend<strong>in</strong>g on <strong>the</strong> TA family<br />

type, located with<strong>in</strong> a fixed distance of <strong>the</strong> match<strong>in</strong>g tox<strong>in</strong> ORF,<br />

was extracted and clustered accord<strong>in</strong>g to sequence similarity. Some<br />

of <strong>the</strong>se clusters were judged to comprise antitox<strong>in</strong> gene families<br />

based on manual <strong>in</strong>spection of <strong>the</strong>ir genomic contexts, and <strong>the</strong>y were<br />

paired with <strong>the</strong> correspond<strong>in</strong>g tox<strong>in</strong> genes to generate TA<br />

loci. Subsequently, we found that significant numbers of TA loci<br />

were partially overlapp<strong>in</strong>g with larger annotated ORFs, particularly<br />

for members of <strong>the</strong> Thermococcales and, <strong>the</strong>refore, we extended <strong>the</strong><br />

analyses to <strong>in</strong>clude <strong>the</strong>se genes, which <strong>in</strong>volved extensive manual<br />

<strong>in</strong>spection of <strong>the</strong> genomes.<br />
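
As a rough illustration of the filtering and pairing steps described above, the following Python sketch keeps ORFs of 50 to 250 aa that do not overlap larger annotated ORFs and then collects the neighbouring ORF on the expected side of a toxin hit. The `Orf` record, the function names, the 30 nt `max_gap` and the example coordinates are all hypothetical: the text specifies only "a fixed distance", and the HMM search and clustering steps are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive, stop codon included
    strand: str  # "+" or "-"

    @property
    def aa_length(self) -> int:
        # Codons in the span minus the stop codon.
        return (self.end - self.start + 1) // 3 - 1


def overlaps(a: Orf, b: Orf) -> bool:
    return a.start <= b.end and b.start <= a.end


def candidate_orfs(orfs, annotated, min_aa=50, max_aa=250):
    """Keep ORFs of min_aa..max_aa that overlap no annotated ORF above max_aa."""
    large = [o for o in annotated if o.aa_length > max_aa]
    return [
        o for o in orfs
        if min_aa <= o.aa_length <= max_aa
        and not any(overlaps(o, big) for big in large)
    ]


def neighbours(toxin, orfs, side="upstream", max_gap=30):
    """Same-strand ORFs within max_gap nt of the toxin on the requested side.

    max_gap is an assumed value; the source only says 'a fixed distance'.
    Short overlaps between the two genes give a negative gap and are kept.
    """
    hits = []
    for o in orfs:
        if o == toxin or o.strand != toxin.strand:
            continue
        gap = max(o.start, toxin.start) - min(o.end, toxin.end) - 1
        # Upstream is relative to the direction of transcription.
        if toxin.strand == "+":
            is_upstream = o.start < toxin.start
        else:
            is_upstream = o.end > toxin.end
        if gap <= max_gap and is_upstream == (side == "upstream"):
            hits.append(o)
    return hits


# Hypothetical coordinates, for illustration only.
annotated = [Orf("big", 8000, 9500, "+")]   # 499 aa
orfs = [
    Orf("antitoxin", 1000, 1240, "+"),      # 79 aa, overlaps the toxin by 4 nt
    Orf("toxin", 1237, 1650, "+"),          # 137 aa
    Orf("tiny", 5000, 5100, "+"),           # 32 aa, too short
    Orf("shadowed", 8100, 8400, "+"),       # 99 aa, overlaps "big"
]
kept = candidate_orfs(orfs, annotated)
partners = neighbours(kept[1], kept, side="upstream")
print([o.name for o in kept], [o.name for o in partners])
```

Whether the partner is sought upstream or downstream would depend on the TA family, as the text notes, so `side` is left as a parameter.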

7.4. Phylogenetic distribution and frequency of archaeal TA loci<br />

A phylogenetic tree based on 16S rRNA sequences was<br />

generated for 124 archaea for which genome sequences were<br />

available. The genome size and natural habitat are given for each<br />

organism, and <strong>the</strong>rmophiles are dist<strong>in</strong>guished from mesophiles with<br />

a boundary at an optimal growth temperature of 50 °C (Table 1). More details of <strong>the</strong><br />

natural environments and optimal growth conditions for many of<br />

<strong>the</strong> organisms are given by Kletz<strong>in</strong> (2007). Some orders, <strong>in</strong>clud<strong>in</strong>g<br />

<strong>the</strong> hyper<strong>the</strong>rmophilic Sulfolobales, Thermoproteales and<br />

Thermococcales, are relatively overrepresented by closely related<br />

organisms <strong>in</strong>clud<strong>in</strong>g several Sulfolobus islandicus, Pyrobaculum and<br />

Thermococcus stra<strong>in</strong>s, while <strong>the</strong> less well characterised Korarchaea (K)<br />

and Thaumarchaea (T) are underrepresented. This bias primarily<br />

reflects that <strong>the</strong> former group are relatively easy to isolate and<br />

culture and that some of <strong>the</strong>m have been employed as model<br />

organisms for molecular, cellular and genetic studies. The total<br />

numbers of identified TA loci are given for vapBC, relBE and hicAB<br />

families and for <strong>the</strong> HEPN/MNT gene pairs <strong>in</strong> Table 1.<br />

The results reveal a wide range of type II TA contents. Several<br />

organisms carry 30 or more TA loci but many have very few or no<br />

detectable loci. vapBC constitute <strong>the</strong> dom<strong>in</strong>ant TA family and <strong>the</strong>y<br />

are most prevalent amongst <strong>the</strong>rmophiles, <strong>in</strong> particular <strong>in</strong> members<br />

of <strong>the</strong> <strong>the</strong>rmoacidophilic Sulfolobales (Pandey and Gerdes, 2005; Guo<br />

et al., 2011) and <strong>in</strong> some Thermococcus species. In contrast, relBE or<br />

hicAB gene pairs are quite rare, especially amongst <strong>the</strong> 40<br />

crenarchaeal genomes. For <strong>the</strong> euryarchaea relBE gene pairs were<br />

observed <strong>in</strong> about half of <strong>the</strong> genomes and several of <strong>the</strong>se carried 1<br />

to 9 copies. Similarly, hicAB pairs were identified <strong>in</strong> about half <strong>the</strong><br />

euryarchaeal genomes with multiple copies occurr<strong>in</strong>g ma<strong>in</strong>ly<br />

amongst <strong>the</strong> Methanomicrobiales and Methanosarc<strong>in</strong>ales.<br />

MNT/HEPN gene pairs occur much more frequently but are<br />

irregularly distributed. They are most common amongst<br />

crenarchaeal <strong>the</strong>rmoacidophiles and <strong>the</strong>rmoneutrophiles and <strong>the</strong><br />

euryarchaeal hyper<strong>the</strong>rmophiles (Table 1).<br />

7.5. TA loci frequency and <strong>the</strong>ir relationship to genome size and<br />

environmental factors<br />

Generally, <strong>the</strong>re is no simple correlation between genome<br />

size and TA locus frequency for <strong>the</strong> different archaeal phyla.<br />

For example, for most members of <strong>the</strong> Sulfolobales <strong>the</strong><br />

estimated number of TA loci varies from 17 to 49 but <strong>the</strong><br />

smallest genome, of Acidianus hospitalis (2.2 Mb), carries 38<br />

while <strong>the</strong> largest genome, of Sulfolobus solfataricus P2 (3 Mb),<br />

conta<strong>in</strong>s 33. For o<strong>the</strong>r phyla, a clearer picture emerges when<br />

compar<strong>in</strong>g stra<strong>in</strong>s with<strong>in</strong> <strong>the</strong> same genus, e.g. <strong>the</strong> seven<br />

Thermococcus stra<strong>in</strong>s which have genome sizes rang<strong>in</strong>g from<br />

1.8 to 2.1 Mb. When ordered accord<strong>in</strong>g to <strong>in</strong>creas<strong>in</strong>g<br />

approximate size (Table 1), <strong>the</strong>se genomes carry 4, 10, 24, 25, 26,<br />

47 and 58 TA loci respectively, show<strong>in</strong>g that <strong>the</strong> TA frequency<br />

<strong>in</strong>creases disproportionately with genome size. A similar<br />

pattern is seen with Pyrobaculum stra<strong>in</strong>s. These results also<br />

underl<strong>in</strong>e <strong>the</strong> often large differences <strong>in</strong> <strong>the</strong> TA contents of<br />

pairs of closely related organisms.<br />
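
The "disproportionate" increase for the seven Thermococcus strains can be made concrete with a short calculation. The sketch below uses only the figures quoted in the text; because individual genome sizes are not listed here, it assumes that the end points of the 1.8 to 2.1 Mb range belong to the strains with 4 and 58 loci, as the size ordering implies.

```python
# Values quoted in the text for the seven Thermococcus strains.
ta_loci = [4, 10, 24, 25, 26, 47, 58]   # ordered by increasing genome size
size_min_mb, size_max_mb = 1.8, 2.1     # only the range is given, not each size

size_fold = size_max_mb / size_min_mb
ta_fold = ta_loci[-1] / ta_loci[0]

# Loci per Mb at the two ends of the range (assumes the end-point sizes
# belong to the strains with 4 and 58 loci, as the ordering implies).
density_small = ta_loci[0] / size_min_mb
density_large = ta_loci[-1] / size_max_mb

print(f"genome size: {size_fold:.2f}-fold increase")
print(f"TA loci:     {ta_fold:.1f}-fold increase")
print(f"density:     {density_small:.1f} -> {density_large:.1f} loci per Mb")
```

Under these assumptions the genome grows by roughly a sixth while the TA locus count rises more than tenfold.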

There is little correlation between TA loci numbers and optimum<br />

growth temperatures. Although Hyper<strong>the</strong>rmus butylicus, which can<br />

grow at up to 108 °C, has a relatively high TA locus content of 18 (ma<strong>in</strong>ly<br />

vapBC loci) for a member of <strong>the</strong> Thermoproteales, Methanopyrus<br />

kandleri, which grows at up to 110 °C, has no detectable TA loci and some of<br />

<strong>the</strong> hyper<strong>the</strong>rmophilic Methanocaldococcus stra<strong>in</strong>s also exhibit few TA<br />

loci.<br />

More difficult to assess is <strong>the</strong> impact of <strong>the</strong> natural environments<br />

and <strong>the</strong> available nutrients, although <strong>in</strong> this respect <strong>the</strong> S. islandicus<br />

stra<strong>in</strong>s may be <strong>in</strong>formative (Reno et al., 2009; Guo et al., 2011). They<br />

were all isolated from terrestrial acidic hot spr<strong>in</strong>gs with similar<br />

maximum growth temperatures and pH ranges but widely<br />

separated, and isolated, geographically: <strong>in</strong> Iceland, <strong>in</strong> Kamchatka,<br />

Russia, and <strong>in</strong> Yellowstone and Lassen National Parks, USA, while <strong>the</strong><br />

related S. solfataricus P2 stra<strong>in</strong> derives from Naples, Italy. Each of <strong>the</strong><br />

stra<strong>in</strong>s carries 26 to 36 TA loci, which suggests that <strong>the</strong> nature of <strong>the</strong><br />

environment is important. Moreover, active terrestrial hot spr<strong>in</strong>gs are<br />

likely to be particularly challeng<strong>in</strong>g for cells because temperatures<br />

can cont<strong>in</strong>uously change from maxima of around 80 °C to 0 °C, if<br />

surrounded by ice, and pH values and nutrient availability can also<br />

change rapidly. A def<strong>in</strong>itive answer to <strong>the</strong> effect of environmental<br />

factors on TA activity would require detailed and time consum<strong>in</strong>g<br />

experimental analyses of archaea cultivated under a wide range of<br />

conditions.<br />

7.6. Orphan tox<strong>in</strong> and antitox<strong>in</strong> genes<br />

Many orphan tox<strong>in</strong> and some orphan antitox<strong>in</strong> genes were<br />

detected <strong>in</strong> <strong>the</strong> genomes and <strong>the</strong> numbers tend to be proportional to<br />

<strong>the</strong> numbers of type II TA loci. For example, <strong>the</strong>re are many orphan<br />

tox<strong>in</strong> genes amongst <strong>the</strong> Sulfolobales. Some of <strong>the</strong>se may have been<br />

classed as orphans because <strong>the</strong> adjacent antitox<strong>in</strong> prote<strong>in</strong> gene was<br />

not identified (Pandey and Gerdes, 2005) and o<strong>the</strong>rs may be located<br />

adjacent to unidentified type III RNA antitox<strong>in</strong> genes (see Chapter<br />

14).<br />

Presumably, over time, antitox<strong>in</strong>s or tox<strong>in</strong>s may become<br />

associated with o<strong>the</strong>r cellular functions by selection. One such<br />

example could be provided by a s<strong>in</strong>gle vapC-like gene (Ahos0712) of<br />

A. hospitalis. It lies <strong>in</strong> an operon with genes encod<strong>in</strong>g prote<strong>in</strong>s<br />

<strong>in</strong>volved <strong>in</strong> transcription and <strong>in</strong>itiator tRNA b<strong>in</strong>d<strong>in</strong>g to <strong>the</strong> ribosome<br />

(You et al., 2011). This gene cassette is highly conserved <strong>in</strong> gene<br />

content, gene synteny and sequence <strong>in</strong> o<strong>the</strong>r Sulfolobus genomes<br />

(Guo et al. 2011). A possible explanation is that this orphan VapC-like<br />

prote<strong>in</strong> acts as a VapC competitor and may regulate or <strong>in</strong>hibit<br />

<strong>in</strong>itiator tRNA cleavage.<br />

7.7. Locations with<strong>in</strong> genomes<br />

Earlier comparative genomic analyses of closely related Sulfolobus<br />

species <strong>in</strong>dicated that TA gene pairs tend to be concentrated <strong>in</strong><br />

relatively large genomic regions (0.7 to 1 Mbp). These regions are <strong>the</strong><br />

most variable <strong>in</strong> gene synteny and gene content (Guo et al., 2011)<br />

consistent with <strong>the</strong> extensive exchange of genes hav<strong>in</strong>g occurred<br />

<strong>in</strong>tra- and/or <strong>in</strong>ter-genomically. This is illustrated for <strong>the</strong> genomes of<br />

S. islandicus REY15A and <strong>the</strong> related S. solfataricus P2 where a high<br />

level of gene synteny is ma<strong>in</strong>ta<strong>in</strong>ed throughout about two thirds of<br />

<strong>the</strong> genome while <strong>the</strong> rema<strong>in</strong><strong>in</strong>g one third is extensively shuffled<br />

(Figure 7.1A). Most of <strong>the</strong> TA loci of both species fall with<strong>in</strong> <strong>the</strong><br />

variable region. Although few pairs of genome sequences from<br />

closely related archaeal species are available which show extensive<br />

gene synteny, a comparable analysis was possible for <strong>the</strong> genomes of<br />

two Thermococcus species, T. kodakarensis, which carries many TA loci, and<br />

T. onnur<strong>in</strong>eus, which exhibits very few TA loci (Figure 7.1B). Here <strong>the</strong><br />

gene synteny is more limited and extends only over about one half of<br />

<strong>the</strong> genome but aga<strong>in</strong> <strong>the</strong> TA loci of T. kodakarensis are concentrated<br />

<strong>in</strong> <strong>the</strong> shuffled genome region. The latter example also illustrates <strong>the</strong><br />

stark differences <strong>in</strong> <strong>the</strong> numbers of TA loci between some fairly<br />

closely related species.<br />

Figure 7.1. Comparison of genomes from pairs of closely related archaea. Dot<br />

plots of (A) Sulfolobus species S. islandicus REY15A and S. solfataricus P2, and (B)<br />

Thermococcus species T. onnur<strong>in</strong>eus and T. kodakarensis show<strong>in</strong>g regions of gene<br />

synteny (red) and <strong>in</strong>verted synteny (blue). The total genome sizes are given and<br />

<strong>the</strong> large variable regions <strong>in</strong> each genome are shaded. TA loci are denoted by black<br />

l<strong>in</strong>es along <strong>the</strong> correspond<strong>in</strong>g genome axes.<br />

Figure 7.2. Phylogenetic trees for VapB antitox<strong>in</strong>s and VapC tox<strong>in</strong>s of <strong>the</strong><br />

acido<strong>the</strong>rmophile A. hospitalis W1. (A) The VapB tree demonstrates that <strong>the</strong><br />

highly diverse antitox<strong>in</strong>s can be classified <strong>in</strong>to three ma<strong>in</strong> subfamilies AbrB,<br />

CcdA/CopG and DUF217. In (B) <strong>the</strong> VapC tree shows <strong>the</strong> highly diverse tox<strong>in</strong><br />

sequences fall<strong>in</strong>g <strong>in</strong>to one major group<strong>in</strong>g. The VapB subfamily l<strong>in</strong>ked to each<br />

VapC is given. Moreover, <strong>the</strong> number of closely similar VapC prote<strong>in</strong>s present <strong>in</strong><br />

<strong>the</strong> available 13 Sulfolobales genomes (Table 1) is listed: 0 <strong>in</strong>dicates that no VapC<br />

with a similar sequence is encoded <strong>in</strong> <strong>the</strong> genomes, while 13 <strong>in</strong>dicates that a VapC<br />

with a closely similar sequence is encoded <strong>in</strong> each of <strong>the</strong> genomes. Ahos Genbank<br />

numbers are given for each prote<strong>in</strong>. Modified from You et al. (2011).<br />
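
One way to quantify the concentration of TA loci in a variable region is to compare the observed fraction of loci inside it with the fraction expected if loci were spread uniformly along the chromosome. The Python sketch below does this for a single genome; all coordinates, the genome length and the function name are hypothetical and are not taken from Figure 7.1 or Table 1.

```python
def fraction_in_regions(positions, regions):
    """Fraction of positions falling inside any (start, end) interval, inclusive."""
    if not positions:
        return 0.0
    inside = sum(
        any(start <= p <= end for start, end in regions) for p in positions
    )
    return inside / len(positions)


# Hypothetical locus midpoints (kb) on a 2,500 kb genome with one variable region.
ta_positions_kb = [120, 450, 900, 1500, 1600, 1750, 1900, 2100]
variable_regions_kb = [(1400, 2200)]
genome_kb = 2500

observed = fraction_in_regions(ta_positions_kb, variable_regions_kb)
# Share of the chromosome covered by the variable region(s).
expected = sum(end - start for start, end in variable_regions_kb) / genome_kb
print(f"observed {observed:.3f} vs expected {expected:.3f} under uniform placement")
```

An observed fraction well above the expected one would be consistent with the clustering described in the text, although a formal test would be needed before drawing conclusions.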

Although several genomes, <strong>in</strong>clud<strong>in</strong>g some Sulfolobus species,<br />

conta<strong>in</strong> many transposable elements and TA loci <strong>the</strong>re is no general<br />

proportionality between <strong>the</strong> two. For example, both Thermococcus<br />

genomes carry few IS elements but one of <strong>the</strong> species, T. kodakarensis,<br />

exhibits several TA loci (Figure 7.1B). Moreover, several of <strong>the</strong> genomes<br />

carry many transposable elements but few TA loci (e.g. Pyrococcus<br />

furiosus, Halobacterium NRC1 and Thermoplasma volcanium) while<br />

o<strong>the</strong>rs exhibit few transposable elements but conta<strong>in</strong> many TA loci<br />

(e.g. Sulfolobus acidocaldarius, H. butylicus and Thermococcus sp. AM4)<br />

(Brügger et al., 2002; Filée et al., 2007).<br />

7.8. TA sequence diversity with<strong>in</strong> genomes<br />

The A. hospitalis genome carries 24 vapBC loci concentrated with<strong>in</strong><br />

<strong>the</strong> genomic regions 350-410 kb and 1,374-1,912 kb (You et al., 2011).<br />

Whereas <strong>the</strong> VapC tox<strong>in</strong>s are all PIN doma<strong>in</strong> prote<strong>in</strong>s (PilT N-term<strong>in</strong>al<br />

doma<strong>in</strong>), <strong>the</strong> VapB antitox<strong>in</strong>s were classified <strong>in</strong>to three<br />

families of transcriptional regulators, AbrB, CcdA/CopG and<br />

DUF217 (Figure 7.2A) (You et al., 2011). Tree build<strong>in</strong>g based on<br />

sequence alignments demonstrated that <strong>the</strong> sequences of <strong>the</strong>se<br />

antitox<strong>in</strong>s and tox<strong>in</strong>s are all highly diverse, with sequence identities<br />

between <strong>the</strong>m rarely exceed<strong>in</strong>g 30%, as <strong>in</strong>dicated by all <strong>the</strong> long tree<br />

branches for each prote<strong>in</strong> (Figure 7.2). A parallel tree build<strong>in</strong>g study of<br />

<strong>the</strong> closely related S. islandicus stra<strong>in</strong>s REY15A and HVE10/4 carry<strong>in</strong>g<br />

18 and 19 vapBC gene pairs, respectively, yielded a similar pattern of<br />

long branches for each VapB and VapC prote<strong>in</strong> (Guo et al., 2011).<br />

Thus all antitox<strong>in</strong>s and tox<strong>in</strong>s with<strong>in</strong> each archaeon are highly<br />

diverse <strong>in</strong> sequence.<br />
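
The sequence identities referred to above are computed from pairwise alignments. As a minimal sketch, the Python function below calculates percent identity for two already aligned sequences, counting only columns where neither sequence has a gap; this choice of denominator is an assumption, since several conventions exist and the one used in the cited studies is not stated here. The two short sequences are invented for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Invented aligned fragments, not real VapB or VapC sequences.
seq_a = "MKLV-DTSAL"
seq_b = "MRIVADSN-L"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity over ungapped columns")
```

Applied to every pair of proteins within one genome, values that rarely exceed 30% would correspond to the long tree branches described in the text.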

In contrast, when <strong>in</strong>tergenomic comparisons were made for o<strong>the</strong>r<br />

members of <strong>the</strong> Sulfolobales, isolated from terrestrial hot spr<strong>in</strong>gs at both<br />

closely and distantly separated geographical locations, several VapBC<br />

complexes showed high sequence similarity. For example, 11 of <strong>the</strong><br />

24 VapBC prote<strong>in</strong> pairs identified <strong>in</strong> A. hospitalis (Figure 7.2) exhibit<br />

closely similar sequences to homologs encoded <strong>in</strong> at least seven of<br />

<strong>the</strong> 13 available Sulfolobus genomes (You et al., 2011). A fur<strong>the</strong>r<br />

example is illustrated for <strong>the</strong> VapC tox<strong>in</strong>s of Pyrococcus species<br />

(Figure 7.3A) and for <strong>the</strong> predicted MNT tox<strong>in</strong> of <strong>the</strong> MNT/HEPN<br />

gene pairs for Pyrobaculum species (Figure 7.3B). The result shows that<br />

<strong>the</strong> VapC and MNT sequences with<strong>in</strong> each cluster of short branches<br />

derive from different species. Thus, <strong>the</strong>re is apparently selection<br />

aga<strong>in</strong>st <strong>the</strong> uptake of closely similar vapBC loci or MNT/HEPN gene<br />

pairs <strong>in</strong> a given genome, despite <strong>the</strong> abundance of many similar gene<br />

pairs <strong>in</strong> <strong>the</strong> environment.<br />

The tree-build<strong>in</strong>g results of <strong>the</strong> analysis demonstrated fur<strong>the</strong>r that<br />

for given gene pairs <strong>the</strong> subtypes of VapB and VapC do not always<br />

correspond, imply<strong>in</strong>g that gene pairs exchange partners, <strong>the</strong>reby<br />

potentially creat<strong>in</strong>g <strong>in</strong>creased functional diversity of <strong>the</strong> TA <strong>system</strong>s<br />

(You et al., 2011), consistent with an earlier hypo<strong>the</strong>sis (Gerdes et al.,<br />

2005).<br />

Figure 7.3. Phylogenetic trees for <strong>the</strong> tox<strong>in</strong> VapC and <strong>the</strong> predicted tox<strong>in</strong> MNT.<br />

(A) VapC prote<strong>in</strong>s encoded <strong>in</strong> different Pyrococcus species, and (B) MNT<br />

prote<strong>in</strong>s encoded <strong>in</strong> diverse Pyrobaculum species. Prote<strong>in</strong>s that fall with<strong>in</strong> <strong>the</strong> small<br />

clusters of short branches derive from different organisms. Trees generated for<br />

prote<strong>in</strong>s deriv<strong>in</strong>g exclusively from one organism yield long branches. Gene<br />

numbers are given for each of <strong>the</strong> genomes analysed (see Table 1).<br />

7.9. Stress response<br />

Antitox<strong>in</strong>-tox<strong>in</strong>s were orig<strong>in</strong>ally shown to enhance plasmid<br />

ma<strong>in</strong>tenance as a consequence of <strong>the</strong> growth of plasmid-free cells<br />

be<strong>in</strong>g preferentially <strong>in</strong>hibited, post segregation, by free tox<strong>in</strong>s that<br />

are <strong>in</strong>herently more stable than antitox<strong>in</strong>s (Gerdes et al. 2005). To<br />

date, relatively few archaeal plasmids have been sequenced and<br />

<strong>the</strong>re is no current evidence for type II TA loci occurr<strong>in</strong>g widely <strong>in</strong><br />

plasmids. Never<strong>the</strong>less, <strong>the</strong> plasmid ma<strong>in</strong>tenance mechanism led to<br />

<strong>the</strong> hypo<strong>the</strong>sis that <strong>the</strong> TA <strong>system</strong>s encoded widely <strong>in</strong> chromosomes<br />

facilitate retention of local DNA regions carry<strong>in</strong>g important genes<br />

that might o<strong>the</strong>rwise be prone to loss (Magnuson, 2007; Van<br />

Melderen 2010).<br />

This hypo<strong>the</strong>sis receives support from <strong>the</strong> observation that vapBC<br />

loci and <strong>the</strong> HEPN/MNT gene pairs are concentrated with<strong>in</strong> variable<br />

genomic regions of members of <strong>the</strong> Sulfolobales and Thermococcales<br />

where <strong>in</strong>tergenomic DNA exchange appears to be most active<br />

(Figure 7.1). Fur<strong>the</strong>rmore, <strong>the</strong> hypo<strong>the</strong>sis is re<strong>in</strong>forced by <strong>the</strong> high<br />

sequence diversity of each of <strong>the</strong> numerous VapC prote<strong>in</strong>s encoded<br />

with<strong>in</strong> <strong>the</strong>se genomes, exemplified for A. hospitalis (Figure 7.2). For any<br />

pair of similar VapBC complexes, <strong>the</strong> loss of one would be<br />

compensated for by <strong>the</strong> presence of <strong>the</strong> o<strong>the</strong>r, <strong>the</strong>reby underm<strong>in</strong><strong>in</strong>g<br />

any DNA ma<strong>in</strong>tenance capability.<br />

For bacteria which grow slowly <strong>in</strong> nutrient poor environments,<br />

multiple tox<strong>in</strong>s are strongly implicated <strong>in</strong> respond<strong>in</strong>g to different<br />

types of nutrient deficiency and/or <strong>in</strong> enhanc<strong>in</strong>g quality control<br />

(Gerdes, 2000; Pandey and Gerdes, 2005). Involvement <strong>in</strong> stress<br />

response entails that <strong>the</strong> tox<strong>in</strong>s <strong>in</strong>hibit growth, allow<strong>in</strong>g <strong>the</strong> host to<br />

lie <strong>in</strong> a dormant state dur<strong>in</strong>g <strong>the</strong> period of environmental stress<br />

(Pedersen et al., 2002; Gerdes et al. 2005). In this context, tox<strong>in</strong>s have<br />

also been implicated <strong>in</strong> produc<strong>in</strong>g persistent cells which are able to<br />

rema<strong>in</strong> dormant for longer periods and to withstand prolonged<br />

exposure to stress factors <strong>in</strong>clud<strong>in</strong>g antibiotics (Maisonneuve et al.,<br />

2011).<br />

There may well be a negative effect on host growth as a<br />

consequence of carry<strong>in</strong>g large numbers of TA loci (30 to 40 TA loci<br />

for some Sulfolobus species and a few o<strong>the</strong>r archaea (Table 1))<br />

because of <strong>the</strong> likelihood of <strong>the</strong> cont<strong>in</strong>uous presence of low levels of<br />

free tox<strong>in</strong> (Wilbur et al. 2005). Although only highly diverse vapBC<br />

loci are present, presumably <strong>in</strong> order to avoid redundancy, <strong>the</strong> total<br />

number of TA loci present per genome may reflect a compromise<br />

between <strong>the</strong> ability to ma<strong>in</strong>ta<strong>in</strong> important genes and to survive<br />

different environmental stresses while reta<strong>in</strong><strong>in</strong>g an adequate growth<br />

rate under normal conditions.<br />

In conclusion, <strong>the</strong>re is a major deficit <strong>in</strong> experimental work on<br />

archaeal TA <strong>system</strong>s, especially with regard to stress responses.<br />

Almost all research to date has focussed on bacteria. One exception<br />

was <strong>the</strong> demonstration that <strong>the</strong> modes of action of RelE<br />

tox<strong>in</strong>s from M. jannaschii and bacteria were similar <strong>in</strong> vitro (Christensen<br />

and Gerdes, 2003). Moreover, heat shock of S. solfataricus (from 80 °C<br />

to 90 °C) was shown to <strong>in</strong>duce expression of some TA loci while<br />

knockout of a s<strong>in</strong>gle vapBC locus <strong>in</strong>creased heat shock lability<br />

(Cooper et al., 2009). Clearly, however, many challeng<strong>in</strong>g<br />

experiments rema<strong>in</strong> to be performed <strong>in</strong> this rapidly develop<strong>in</strong>g field.<br />

7.10. Type II TA <strong>system</strong>s and viral defence<br />

It has been proposed that bacterial TA <strong>system</strong>s could be <strong>in</strong>volved<br />

<strong>in</strong> combat<strong>in</strong>g bacteriophage <strong>in</strong>fection by, for example, block<strong>in</strong>g<br />

ribosomes and prevent<strong>in</strong>g <strong>the</strong> viruses from dom<strong>in</strong>at<strong>in</strong>g <strong>the</strong><br />

translational apparatus, prior to <strong>the</strong>ir propagat<strong>in</strong>g and lys<strong>in</strong>g cells<br />

(see Chapter 5). The <strong>in</strong>ferred result would be that only <strong>the</strong> phage-<strong>in</strong>fected<br />

cells would die. In pr<strong>in</strong>ciple, archaeal TA <strong>system</strong>s which<br />

primarily target translation could act similarly. However, most<br />

archaeal viruses, and especially those from extremely <strong>the</strong>rmophilic<br />

and halophilic environments, show morphotypes and genomic<br />

properties dist<strong>in</strong>ct from bacterial and eukaryal viruses and <strong>the</strong>y<br />

generally exist <strong>in</strong> stable relationships with <strong>the</strong>ir hosts at low copy<br />

numbers, <strong>in</strong>frequently, if ever, caus<strong>in</strong>g cell lysis (Prangishvili et al.,<br />

2006; Porter et al., 2007). Consistent with <strong>the</strong>se properties,<br />

circumstantial evidence suggests that <strong>the</strong> level of free viruses, at least<br />

<strong>in</strong> extreme <strong>the</strong>rmoacidophilic environments, tends to be low relative<br />

to cellular levels, suggest<strong>in</strong>g that <strong>the</strong>se viruses prefer to rema<strong>in</strong><br />

with<strong>in</strong> cells under <strong>the</strong>se challeng<strong>in</strong>g conditions (Snyder et al., 2010).<br />

Ano<strong>the</strong>r <strong>in</strong>trigu<strong>in</strong>g possibility arises from <strong>the</strong> juxtaposition of TA<br />

loci and <strong>CRISPR</strong> loci (Clustered Regularly Interspaced Short<br />

Pal<strong>in</strong>dromic Repeats) <strong>in</strong> some archaea. <strong>CRISPR</strong>-based adaptive<br />

<strong>immune</strong> <strong>system</strong>s target <strong>in</strong>vad<strong>in</strong>g genetic elements, primarily viruses<br />

and conjugative plasmids, and <strong>the</strong>y have been classified <strong>in</strong>to three<br />

major types, of which only two (types I and III) occur <strong>in</strong> archaea,<br />

often with both major types present <strong>in</strong> <strong>the</strong> same archaeon (Garrett et<br />

al., 2011). The <strong>CRISPR</strong> arrays carry spacer regions taken up from<br />

<strong>in</strong>vad<strong>in</strong>g genetic elements and <strong>the</strong>ir processed transcripts are able to<br />

facilitate target<strong>in</strong>g and cleavage of genetic elements with match<strong>in</strong>g<br />

sequences. An example of a complex assembly of a type III <strong>CRISPR</strong>-based<br />

<strong>system</strong>, present <strong>in</strong> <strong>the</strong> A. hospitalis genome, is shown <strong>in</strong><br />

Figure 7.4. The <strong>CRISPR</strong> arrays and associated gene cassettes are <strong>in</strong>terwoven<br />

with four vapBC loci for which all <strong>the</strong> antitox<strong>in</strong>s and tox<strong>in</strong>s carry<br />

highly divergent sequences (Figure 7.2). Thus, <strong>the</strong>se <strong>CRISPR</strong>-associated<br />

TA <strong>system</strong>s could play a secondary role <strong>in</strong> combat<strong>in</strong>g<br />

<strong>in</strong>vad<strong>in</strong>g genetic elements by help<strong>in</strong>g to ma<strong>in</strong>ta<strong>in</strong> <strong>the</strong> functional<br />

<strong>CRISPR</strong> <strong>immune</strong> <strong>system</strong>s, which also tend to be located with<strong>in</strong> <strong>the</strong><br />

variable chromosomal regions. Ano<strong>the</strong>r <strong>in</strong>terest<strong>in</strong>g aspect of this<br />

<strong>system</strong> is that one vapBC locus associated with <strong>the</strong> type III<br />

<strong>in</strong>terference <strong>system</strong> <strong>in</strong> A. hospitalis (Figure 7.4B) shows a high level of<br />

sequence identity with vapBC loci specifically associated with a<br />

different subclass of type III <strong>in</strong>terference <strong>system</strong>s found <strong>in</strong> <strong>the</strong> S.<br />

islandicus stra<strong>in</strong>s REY15A and HVE10/4 (Figure 7.4C) (Guo et al., 2011),<br />

suggest<strong>in</strong>g that <strong>in</strong>dividual types of TA loci may coevolve with genes<br />

exhibit<strong>in</strong>g specific functions.<br />
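
The sequence matching that underlies CRISPR targeting, as outlined above, can be illustrated with a deliberately simplified Python sketch that looks for exact occurrences of a spacer on either strand of a target sequence. This is an assumption-laden toy model: real spacers are longer, and the actual interference machinery is not reduced to exact string matching. The function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def spacer_hits(spacer: str, target: str):
    """Exact matches of a spacer on either strand of the target.

    Returns (position, strand) tuples; position is the 0-based start of the
    match on the forward strand of the target.
    """
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        start = target.find(query)
        while start != -1:
            hits.append((start, strand))
            start = target.find(query, start + 1)
    return hits


# Invented sequences, far shorter than real spacers or viral genomes.
target = "TTGACCGTAGGCTAACGT"
print(spacer_hits("CCGTAGG", target))   # match on the forward strand
print(spacer_hits("TTAGCC", target))    # match on the reverse strand
```

A spacer with no hit on either strand returns an empty list, which in this toy model corresponds to a genetic element that the array does not recognise.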

Figure 7.4. Type III <strong>CRISPR</strong> <strong>system</strong>s l<strong>in</strong>ked to vapBC gene pairs. (A) <strong>CRISPR</strong> loci<br />

and genes of <strong>the</strong> acido<strong>the</strong>rmophile A. hospitalis W1. <strong>CRISPR</strong> loci (black) show <strong>the</strong><br />

numbers of repeats present. The genes shown <strong>in</strong>clude those encod<strong>in</strong>g prote<strong>in</strong>s <strong>in</strong>volved <strong>in</strong> uptake of new<br />

spacers (adaptation), labelled aCas, a gene encod<strong>in</strong>g <strong>the</strong> RNA process<strong>in</strong>g enzyme<br />

Cas6, and a gene cassette encod<strong>in</strong>g type III <strong>in</strong>terference prote<strong>in</strong>s. Four vapBC gene<br />

pairs that are highly divergent <strong>in</strong> sequence are also present. (B) Expansion of <strong>the</strong><br />

type III <strong>in</strong>terference cassette of A. hospitalis, and (C) a highly similar<br />

vapBC gene pair located next to a different class of type III <strong>CRISPR</strong> <strong>in</strong>terference<br />

cassette (denoted Cmr) <strong>in</strong> S. islandicus HVE10/4. Numbers of repeats are <strong>in</strong>dicated<br />

for each <strong>CRISPR</strong> locus.<br />

7.11. Conclusions<br />

Clearly, <strong>the</strong>se are early days for studies of archaeal TA loci.<br />

Almost all of <strong>the</strong> experimental work to date has been performed on<br />

different bacterial TA <strong>system</strong>s some of which have no equivalent<br />

amongst archaea. Support is provided here for a role <strong>in</strong> ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g<br />

important regions of chromosomal DNA for those organisms,<br />

particularly members of <strong>the</strong> Sulfolobales and Thermococcales, which<br />

exhibit large variable genomic regions and often carry many TA loci.<br />

Involvement <strong>in</strong> response to nutrient deficiency and o<strong>the</strong>r stress<br />

factors is highly probable and <strong>the</strong>se potential functional roles are<br />

not mutually exclusive. A rationale is provided for parts of <strong>the</strong><br />

highly conserved translational apparatus be<strong>in</strong>g <strong>the</strong> primary target<br />

for some tox<strong>in</strong>s that archaea share with bacteria. F<strong>in</strong>ally, it rema<strong>in</strong>s to<br />

12


e seen whe<strong>the</strong>r <strong>the</strong>re are undiscovered archaea-specific TA <strong>system</strong>s,<br />

or possibly hybrid <strong>system</strong>s with bacterial and archaeal antitox<strong>in</strong>tox<strong>in</strong><br />

components, which exclusively target archaeal cellular<br />

components.<br />

Table 1 Phylogenetic tree of archaea for which complete genome<br />
sequences are available, together with the estimated number of TA<br />
loci of the vapBC, relBE and hicAB families, and the numbers of<br />
MNT/HEPN gene pairs. In the kingdom/phylum column (P) C denotes<br />
Crenarchaeota, E - Euryarchaeota, T - Thaumarchaeota, K -<br />
Korarchaeota and N - Nanoarchaeota. In the Order column (O) S<br />
denotes Sulfolobales, D - Desulfurococcales, O - Acidolobales, P -<br />
Thermoproteales, Y - Methanopyrales, T - Thermococcales, A -<br />
Archaeoglobales, C - Methanococcales, B - Methanobacteriales, M -<br />
Methanomicrobiales, N - Methanosarcinales, E - Methanocellales, H -<br />
Halobacteriales and L - Thermoplasmatales. The ecological niches of<br />
the different organisms are indicated together with their degree of<br />
thermophilicity, with the boundary set at an optimal growth<br />
temperature of 50 °C. The numbers of the different TA loci and<br />
MNT/HEPN gene pairs are colour-shaded, extending from bright red<br />
(> 20) through light red (20 to 11), pink (10 to 6) and violet (5 to 3)<br />
to light blue (2 to 1). Approximate genome sizes and the<br />
GenBank/EMBL accession numbers are given for the genomes.<br />
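The colour shading described in the Table 1 caption amounts to a simple binning of counts. The sketch below is only an illustration of that binning and is not taken from the original work: the names `SHADE_BINS` and `shade_for_count` are hypothetical, and the handling of a zero count (left unshaded) is an assumption, since the caption does not say how zero is displayed.

```python
from typing import Optional

# Shading bins as stated in the Table 1 caption: (lower, upper, shade).
# An upper bound of None means "no upper limit".
SHADE_BINS = [
    (21, None, "bright red"),  # > 20
    (11, 20, "light red"),     # 20 to 11
    (6, 10, "pink"),           # 10 to 6
    (3, 5, "violet"),          # 5 to 3
    (1, 2, "light blue"),      # 2 to 1
]


def shade_for_count(count: int) -> Optional[str]:
    """Return the caption's shade for a number of TA loci or gene pairs.

    Assumption: a count of zero is left unshaded (None); the caption
    does not state how zero counts are shown.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    for lower, upper, shade in SHADE_BINS:
        if count >= lower and (upper is None or count <= upper):
            return shade
    return None


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise each bin.
    for n in (0, 1, 4, 8, 15, 30):
        print(n, shade_for_count(n))
```

Keeping the bins in a single table makes the thresholds easy to compare against the caption and to adjust if a different scale is used.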

17


BIBLIOGRAPHY<br />

[1] A F Andersson and J F Banfield. Virus population dynamics<br />

and acquired virus resistance <strong>in</strong> natural microbial communities.<br />

Science, 320(5879):1047–1050, May 2008.<br />

[2] Kathryne S Auernik, Yukari Maezato, Paul H Blum,<br />

and Robert M Kelly. The genome sequence of <strong>the</strong><br />

metal-mobiliz<strong>in</strong>g, extremely <strong>the</strong>rmoacidophilic archaeon<br />

metallosphaera sedula provides <strong>in</strong>sights <strong>in</strong>to bioleach<strong>in</strong>gassociated<br />

metabolism. Appl Environ Microbiol, 74(3):682–92,<br />

Feb 2008.<br />

[3] R Barrangou, C Fremaux, H Deveau, M Richards, P Boyaval,<br />

S Mo<strong>in</strong>eau, D A Romero, and P Horvath. Crispr provides<br />

acquired resistance aga<strong>in</strong>st viruses <strong>in</strong> prokaryotes. Science,<br />

315(5819):1709–1712, Mar 2007.<br />

[4] Elizabeth R Barry and Stephen D Bell. Dna replication <strong>in</strong><br />

<strong>the</strong> archaea. Microbiol Mol Biol Rev, 70(4):876–87, Dec 2006.<br />

[5] C Bath and M L Dyall-Smith. His1, an archaeal virus of<br />

<strong>the</strong> fuselloviridae family that <strong>in</strong>fects haloarcula hispanica. J<br />

Virol, 72(11):9392–5, Nov 1998.<br />

[6] David L Bernick, Courtney L Cox, Patrick P Dennis, and<br />

Todd M Lowe. Comparative genomic and transcriptional<br />

analyses of crispr <strong>system</strong>s across <strong>the</strong> genus pyrobaculum.<br />

Front Microbiol, 3:251, 2012.<br />

[7] A Bolot<strong>in</strong>, B Qu<strong>in</strong>quis, A Sorok<strong>in</strong>, and S D Ehrlich. Clustered<br />

regularly <strong>in</strong>terspaced short pal<strong>in</strong>drome repeats (crisprs)<br />

have spacers of extrachromosomal orig<strong>in</strong>. Microbiology,<br />

151(Pt 8):2551–2561, Aug 2005.<br />

[8] Cel<strong>in</strong>e Brochier, Simonetta Gribaldo, Yvan Zivanovic, Fabrice<br />

Confalonieri, and Patrick Forterre. Nanoarchaea: representatives<br />

of a novel archaeal phylum or a fast-evolv<strong>in</strong>g<br />

euryarchaeal l<strong>in</strong>eage related to <strong>the</strong>rmococcales? Genome<br />

Biol, 6(5):R42, 2005.<br />

[9] C Brochier-Armanet, B Boussau, S Gribaldo, and P Forterre.<br />

Mesophilic crenarchaeota: proposal for a third archaeal<br />

195


196 Bibliography<br />

phylum, <strong>the</strong> thaumarchaeota. Nat Rev Microbiol, 6(3):245–<br />

252, Mar 2008.<br />

[10] S J Brouns, M M Jore, M Lundgren, E R Westra, R J Slijkhuis,<br />

A P Snijders, M J Dickman, K S Makarova, E V Koon<strong>in</strong>, and<br />

J van der Oost. Small crispr rnas guide antiviral defense <strong>in</strong><br />

prokaryotes. Science, 321(5891):960–964, Aug 2008.<br />

[11] Kimberly K Busiek and William Margol<strong>in</strong>. Split decision:<br />

a thaumarchaeon encod<strong>in</strong>g both ftsz and cdv cell division<br />

prote<strong>in</strong>s chooses cdv for cytok<strong>in</strong>esis. Mol Microbiol, 82(3):535–<br />

8, Nov 2011.<br />

[12] J Carte, R Wang, H Li, R M Terns, and M P Terns. Cas6 is<br />

an endoribonuclease that generates guide rnas for <strong>in</strong>vader<br />

defense <strong>in</strong> prokaryotes. Genes Dev, 22(24):3489–3496, Dec<br />

2008.<br />

[13] L Chen, K Brügger, M Skovgaard, P Redder, Q She, E Torar<strong>in</strong>sson,<br />

B Greve, M Awayez, A Zibat, H P Klenk, and R A<br />

Garrett. The genome of sulfolobus acidocaldarius, a model<br />

organism of <strong>the</strong> crenarchaeota. J Bacteriol, 187(14):4992–4999,<br />

Jul 2005.<br />

[14] Benoît Dayrat. The roots of phylogeny: how did haeckel<br />

build his trees? Syst Biol, 52(4):515–27, Aug 2003.<br />

[15] L<strong>in</strong>g Deng, Haojun Zhu, Zhengjun Chen, Yun Xiang Liang,<br />

and Qunx<strong>in</strong> She. Unmarked gene deletion and host-vector<br />

<strong>system</strong> for <strong>the</strong> hyper<strong>the</strong>rmophilic crenarchaeon sulfolobus<br />

islandicus. Extremophiles, 13(4):735–46, Jul 2009.<br />

[16] James G Elk<strong>in</strong>s, Mircea Podar, David E Graham, Kira S<br />

Makarova, Yuri Wolf, Lennart Randau, Brian P Hedlund,<br />

Cél<strong>in</strong>e Brochier-Armanet, Victor Kun<strong>in</strong>, Ia<strong>in</strong> Anderson, Alla<br />

Lapidus, Eugene Goltsman, Kerrie Barry, Eugene V Koon<strong>in</strong>,<br />

Phil Hugenholtz, Nikos Kyrpides, Gerhard Wanner, Paul<br />

Richardson, Mart<strong>in</strong> Keller, and Karl O Stetter. A korarchaeal<br />

genome reveals <strong>in</strong>sights <strong>in</strong>to <strong>the</strong> evolution of <strong>the</strong> archaea.<br />

Proc Natl Acad Sci U S A, 105(23):8102–7, Jun 2008.<br />

[17] Susanne Erdmann and Roger A Garrett. Selective and hyperactive<br />

uptake of foreign dna by adaptive <strong>immune</strong> <strong>system</strong>s<br />

of an archaeon via two dist<strong>in</strong>ct mechanisms. Mol Microbiol,<br />

Jul 2012.


Bibliography 197<br />

[18] Thijs J G Ettema, Ann-Christ<strong>in</strong> L<strong>in</strong>dås, and Rolf Bernander.<br />

An act<strong>in</strong>-based cytoskeleton <strong>in</strong> archaea. Mol Microbiol,<br />

80(4):1052–61, May 2011.<br />

[19] Roger A Garrett, David Prangishvili, Shiraz A Shah, Monika<br />

Reuter, Karl O Stetter, and Xu Peng. Metagenomic analyses<br />

of novel viruses and plasmids from a cultured environmental<br />

sample of hyper<strong>the</strong>rmophilic neutrophiles. Environ<br />

Microbiol, 12(11):2918–30, Nov 2010.<br />

[20] Roger A Garrett, Shiraz A Shah, Gisle Vestergaard, L<strong>in</strong>g<br />

Deng, Soley Gudbergsdottir, Chandra S Kenchappa, Susanne<br />

Erdmann, and Qunx<strong>in</strong> She. Crispr-based <strong>immune</strong> <strong>system</strong>s<br />

of <strong>the</strong> sulfolobales: complexity and diversity. Biochem Soc<br />

Trans, 39(1):51–7, Jan 2011.<br />

[21] Roger A Garrett, Gisle Vestergaard, and Shiraz A Shah.<br />

<strong>Archaea</strong>l crispr-based <strong>immune</strong> <strong>system</strong>s: exchangeable functional<br />

modules. Trends Microbiol, 19(11):549–56, Nov 2011.<br />

[22] Aurore Gorlas, Eugene V Koon<strong>in</strong>, Nadège Bienvenu, Daniel<br />

Prieur, and Claire Gesl<strong>in</strong>. Tpv1, <strong>the</strong> first virus isolated<br />

from <strong>the</strong> hyper<strong>the</strong>rmophilic genus <strong>the</strong>rmococcus. Environ<br />

Microbiol, 14(2):503–16, Feb 2012.<br />

[23] I Grissa, G Vergnaud, and C Pourcel. The crisprdb database<br />

and tools to display crisprs and to generate dictionaries of<br />

spacers and repeats. BMC Bio<strong>in</strong>formatics, 8(1):172–172, May<br />

2007.<br />

[24] Soley Gudbergsdottir, L<strong>in</strong>g Deng, Zhengjun Chen, Jaide<br />

V K Jensen, L<strong>in</strong>da R Jensen, Qunx<strong>in</strong> She, and Roger A<br />

Garrett. Dynamic properties of <strong>the</strong> sulfolobus crispr/cas<br />

and crispr/cmr <strong>system</strong>s when challenged with vector-borne<br />

viral and plasmid genes and protospacers. Mol Microbiol,<br />

79(1):35–49, Jan 2011.<br />

[25] Li Guo, Kim Brügger, Chao Liu, Shiraz A Shah, Huajun<br />

Zheng, Yongqiang Zhu, Shengyue Wang, Reidun K Lillestøl,<br />

Lanm<strong>in</strong>g Chen, Jeremy Frank, David Prangishvili, Lars<br />

Paul<strong>in</strong>, Qunx<strong>in</strong> She, Li Huang, and Roger A Garrett. Genome<br />

analyses of icelandic stra<strong>in</strong>s of sulfolobus islandicus,<br />

model organisms for genetic and virus-host <strong>in</strong>teraction studies.<br />

J Bacteriol, 193(7):1672–80, Apr 2011.


198 Bibliography<br />

[26] D H Haft, J Selengut, E F Mongod<strong>in</strong>, and K E Nelson. A<br />

guild of 45 crispr-associated (cas) prote<strong>in</strong> families and multiple<br />

crispr/cas subtypes exist <strong>in</strong> prokaryotic genomes. PLoS<br />

Comput Biol, 1(6), Nov 2005.<br />

[27] Caryn R Hale, Peng Zhao, Sara Olson, Michael O Duff,<br />

Brenton R Graveley, Lance Wells, Rebecca M Terns, and<br />

Michael P Terns. Rna-guided rna cleavage by a crispr rnacas<br />

prote<strong>in</strong> complex. Cell, 139(5):945–56, Nov 2009.<br />

[28] M Här<strong>in</strong>g, R Rachel, X Peng, R A Garrett, and D Prangishvili.<br />

Viral diversity <strong>in</strong> hot spr<strong>in</strong>gs of pozzuoli, italy, and characterization<br />

of a unique archaeal virus, acidianus bottleshaped<br />

virus, from a new family, <strong>the</strong> ampullaviridae. J Virol,<br />

79(15):9904–9911, Aug 2005.<br />

[29] P Horvath, A C Coûté-Monvois<strong>in</strong>, D A Romero, P Boyaval,<br />

C Fremaux, and R Barrangou. Comparative analysis of crispr<br />

loci <strong>in</strong> lactic acid bacteria genomes. Int J Food Microbiol, Jul<br />

2008.<br />

[30] Y Ish<strong>in</strong>o, H Sh<strong>in</strong>agawa, K Mak<strong>in</strong>o, M Amemura, and A Nakata.<br />

Nucleotide sequence of <strong>the</strong> iap gene, responsible<br />

for alkal<strong>in</strong>e phosphatase isozyme conversion <strong>in</strong> escherichia<br />

coli, and identification of <strong>the</strong> gene product. J Bacteriol,<br />

169(12):5429–33, Dec 1987.<br />

[31] R Jansen, J D Embden, W Gaastra, and L M Schouls. Identification<br />

of genes that are associated with dna repeats <strong>in</strong><br />

prokaryotes. Mol Microbiol, 43(6):1565–1575, Mar 2002.<br />

[32] Matthijs M Jore, Magnus Lundgren, Es<strong>the</strong>r van Duijn, Jelle B<br />

Bultema, Edze R Westra, Sakharam P Waghmare, Blake<br />

Wiedenheft, Umit Pul, Re<strong>in</strong>hild Wurm, Rolf Wagner, Marieke<br />

R Beijer, Arjan Barendregt, Kaihong Zhou, Ambrosius<br />

P L Snijders, Mark J Dickman, Jennifer A Doudna, Egbert J<br />

Boekema, Albert J R Heck, John van der Oost, and Stan J J<br />

Brouns. Structural basis for crispr rna-guided dna recognition<br />

by cascade. Nat Struct Mol Biol, 18(5):529–36, May<br />

2011.<br />

[33] Y Kawarabayasi, Y H<strong>in</strong>o, H Horikawa, K J<strong>in</strong>-no, M Takahashi,<br />

M Sek<strong>in</strong>e, S Baba, A Ankai, H Kosugi, A Hosoyama,<br />

S Fukui, Y Nagai, K Nishijima, R Otsuka, H Nakazawa,<br />

M Takamiya, Y Kato, T Yoshizawa, T Tanaka, Y Kudoh,<br />

J Yamazaki, N Kushida, A Oguchi, K Aoki, S Masuda,


Bibliography 199<br />

M Yanagii, M Nishimura, A Yamagishi, T Oshima, and<br />

H Kikuchi. Complete genome sequence of an aerobic <strong>the</strong>rmoacidophilic<br />

crenarchaeon, sulfolobus tokodaii stra<strong>in</strong>7.<br />

DNA Res, 8(4):123–140, Aug 2001.<br />

[34] M Kessel and F Kl<strong>in</strong>k. Archaebacterial elongation factor is<br />

adp-ribosylated by diph<strong>the</strong>ria tox<strong>in</strong>. Nature, 287(5779):250–1,<br />

Sep 1980.<br />

[35] Eugene V Koon<strong>in</strong> and Kira S Makarova. Crispr-cas: an<br />

adaptive immunity <strong>system</strong> <strong>in</strong> prokaryotes. F1000 Biol Rep,<br />

1:95, Dec 2009.<br />

[36] V Kun<strong>in</strong>, R Sorek, and P Hugenholtz. Evolutionary conservation<br />

of sequence and secondary structures <strong>in</strong> crispr<br />

repeats. Genome Biol, 8(4), Apr 2007.<br />

[37] R K Lillestol, P Redder, R A Garrett, and K Brügger. A<br />

putative viral defence mechanism <strong>in</strong> archaeal cells. <strong>Archaea</strong>,<br />

2(1):59–72, Aug 2006.<br />

[38] R K Lillestol, S A Shah, K Brügger, P Redder, H Phan,<br />

J Christiansen, and R A Garrett. Crispr families of <strong>the</strong><br />

crenarchaeal genus sulfolobus: bidirectional transcription<br />

and dynamic properties. Mol Microbiol, 72(1):259–272, Apr<br />

2009.<br />

[39] Nathanael G L<strong>in</strong>tner, Mel<strong>in</strong>a Kerou, Susan K Brumfield,<br />

Shirley Graham, Huant<strong>in</strong>g Liu, James H Naismith, Mat<strong>the</strong>w<br />

Sdano, Nan Peng, Qunx<strong>in</strong> She, Valérie Copié, Mark J<br />

Young, Malcolm F White, and C Mart<strong>in</strong> Lawrence. Structural<br />

and functional characterization of an archaeal clustered<br />

regularly <strong>in</strong>terspaced short pal<strong>in</strong>dromic repeat (crispr)associated<br />

complex for antiviral defense (cascade). J Biol<br />

Chem, 286(24):21643–56, Jun 2011.<br />

[40] G Lipps. Plasmids and viruses of <strong>the</strong> <strong>the</strong>rmoacidophilic<br />

crenarchaeote sulfolobus. Extremophiles, 10(1):17–28, Feb<br />

2006.<br />

[41] Li-Jun Liu, Xiao-Yan You, Huajun Zheng, Shengyue<br />

Wang, Cheng-Y<strong>in</strong>g Jiang, and Shuang-Jiang Liu. Complete<br />

genome sequence of metallosphaera cupr<strong>in</strong>a, a metal<br />

sulfide-oxidiz<strong>in</strong>g archaeon from a hot spr<strong>in</strong>g. J Bacteriol,<br />

193(13):3387–8, Jul 2011.


200 Bibliography<br />

[42] M Lundgren, A Andersson, L Chen, P Nilsson, and<br />

R Bernander. Three replication orig<strong>in</strong>s <strong>in</strong> sulfolobus species:<br />

synchronous <strong>in</strong>itiation of chromosome replication and asynchronous<br />

term<strong>in</strong>ation. Proc Natl Acad Sci U S A, 101(18):7046–<br />

7051, May 2004.<br />

[43] K S Makarova, N V Grish<strong>in</strong>, S A Shabal<strong>in</strong>a, Y I Wolf, and E V<br />

Koon<strong>in</strong>. A putative rna-<strong>in</strong>terference-based <strong>immune</strong> <strong>system</strong><br />

<strong>in</strong> prokaryotes: computational analysis of <strong>the</strong> predicted<br />

enzymatic mach<strong>in</strong>ery, functional analogies with eukaryotic<br />

rnai, and hypo<strong>the</strong>tical mechanisms of action. Biol Direct,<br />

1:7–7, 2006.<br />

[44] Kira S Makarova, L Arav<strong>in</strong>d, Yuri I Wolf, and Eugene V<br />

Koon<strong>in</strong>. Unification of cas prote<strong>in</strong> families and a simple<br />

scenario for <strong>the</strong> orig<strong>in</strong> and evolution of crispr-cas <strong>system</strong>s.<br />

Biol Direct, 6:38, 2011.<br />

[45] Kira S Makarova, Daniel H Haft, Rodolphe Barrangou, Stan<br />

J J Brouns, Emmanuelle Charpentier, Philippe Horvath,<br />

Sylva<strong>in</strong> Mo<strong>in</strong>eau, Francisco J M Mojica, Yuri I Wolf, Alexander<br />

F Yakun<strong>in</strong>, John van der Oost, and Eugene V Koon<strong>in</strong>.<br />

Evolution and classification of <strong>the</strong> crispr-cas <strong>system</strong>s. Nat<br />

Rev Microbiol, 9(6):467–77, Jun 2011.<br />

[46] Aron Marchler-Bauer, Shennan Lu, John B Anderson,<br />

Farideh Chitsaz, Myra K Derbyshire, Carol DeWeese-Scott,<br />

Jessica H Fong, Lewis Y Geer, Renata C Geer, Noreen R<br />

Gonzales, Marc Gwadz, David I Hurwitz, John D Jackson,<br />

Zhaoxi Ke, Christopher J Lanczycki, Fu Lu, Gabriele H<br />

Marchler, Mikhail Mullokandov, Mar<strong>in</strong>a V Omelchenko,<br />

Cynthia L Robertson, James S Song, Narmada Thanki, Roxanne<br />

A Yamashita, Dachuan Zhang, Naigong Zhang, Chanjuan<br />

Zheng, and Stephen H Bryant. Cdd: a conserved<br />

doma<strong>in</strong> database for <strong>the</strong> functional annotation of prote<strong>in</strong>s.<br />

Nucleic Acids Res, 39(Database issue):D225–9, Jan 2011.<br />

[47] L A Marraff<strong>in</strong>i and E J Son<strong>the</strong>imer. Crispr <strong>in</strong>terference limits<br />

horizontal gene transfer <strong>in</strong> staphylococci by target<strong>in</strong>g dna.<br />

Science, 322(5909):1843–1845, Dec 2008.<br />

[48] Luciano A Marraff<strong>in</strong>i and Erik J Son<strong>the</strong>imer. Self versus<br />

non-self discrim<strong>in</strong>ation dur<strong>in</strong>g crispr rna-directed immunity.<br />

Nature, Jan 2010.


Bibliography 201<br />

[49] F J Mojica, C Díez-Villaseñor, J García-Martínez, and C Almendros.<br />

Short motif sequences determ<strong>in</strong>e <strong>the</strong> targets of<br />

<strong>the</strong> prokaryotic crispr defence <strong>system</strong>. Microbiology, 155(Pt<br />

3):733–740, Mar 2009.<br />

[50] F J Mojica, C Díez-Villaseñor, J García-Martínez, and E Soria.<br />

Interven<strong>in</strong>g sequences of regularly spaced prokaryotic repeats<br />

derive from foreign genetic elements. J Mol Evol,<br />

60(2):174–182, Feb 2005.<br />

[51] F J Mojica, C Díez-Villaseñor, E Soria, and G Juez. Biological<br />

significance of a family of regularly spaced repeats <strong>in</strong><br />

<strong>the</strong> genomes of archaea, bacteria and mitochondria. Mol<br />

Microbiol, 36(1):244–246, Apr 2000.<br />

[52] Sab<strong>in</strong> Mulepati, Amberly Orr, and Scott Bailey. Crystal<br />

structure of <strong>the</strong> largest subunit of a bacterial rna-guided<br />

<strong>immune</strong> complex and its role <strong>in</strong> dna target b<strong>in</strong>d<strong>in</strong>g. J Biol<br />

Chem, 287(27):22445–9, Jun 2012.<br />

[53] Ki Hyun Nam, Charles Haitjema, Xueqi Liu, Fran D<strong>in</strong>g,<br />

Hongwei Wang, Mat<strong>the</strong>w P Delisa, and Ailong Ke. Cas5d<br />

prote<strong>in</strong> processes pre-crrna and assembles <strong>in</strong>to a cascadelike<br />

<strong>in</strong>terference complex <strong>in</strong> subtype i-c/dvulg crispr-cas<br />

<strong>system</strong>. Structure, 20(9):1574–84, Sep 2012.<br />

[54] Takuro Nunoura, Yoshihiro Takaki, Jungo Kakuta, Sh<strong>in</strong>ro<br />

Nishi, Junichi Sugahara, Hiromi Kazama, Gab-Joo Chee,<br />

Masahira Hattori, Akio Kanai, Haruyuki Atomi, Ken Takai,<br />

and Hideto Takami. Insights <strong>in</strong>to <strong>the</strong> evolution of archaea<br />

and eukaryotic prote<strong>in</strong> modifier <strong>system</strong>s revealed by <strong>the</strong> genome<br />

of a novel archaeal group. Nucleic Acids Res, 39(8):3204–<br />

23, Apr 2011.<br />

[55] Maija K Pietilä, El<strong>in</strong>a Ro<strong>in</strong>e, Lars Paul<strong>in</strong>, Nisse Kalkk<strong>in</strong>en,<br />

and Dennis H Bamford. An ssdna virus <strong>in</strong>fect<strong>in</strong>g archaea:<br />

a new l<strong>in</strong>eage of viruses with a membrane envelope. Mol<br />

Microbiol, 72(2):307–19, Apr 2009.<br />

[56] André Plagens, Britta Tjaden, Anna Hagemann, Lennart<br />

Randau, and Re<strong>in</strong>hard Hensel. Characterization of <strong>the</strong> crispr/cas<br />

subtype i-a <strong>system</strong> of <strong>the</strong> hyper<strong>the</strong>rmophilic crenarchaeon<br />

<strong>the</strong>rmoproteus tenax. J Bacteriol, 194(10):2491–500,<br />

May 2012.


202 Bibliography<br />

[57] C Pourcel, G Salvignol, and G Vergnaud. Crispr elements<br />

<strong>in</strong> yers<strong>in</strong>ia pestis acquire new repeats by preferential uptake<br />

of bacteriophage dna, and provide additional tools for<br />

evolutionary studies. Microbiology, 151(Pt 3):653–663, Mar<br />

2005.<br />

[58] D Prangishvili, P Forterre, and R A Garrett. Viruses of <strong>the</strong><br />

archaea: a unify<strong>in</strong>g view. Nat Rev Microbiol, 4(11):837–848,<br />

Nov 2006.<br />

[59] D Prangishvili, G Vestergaard, M Här<strong>in</strong>g, R Aramayo,<br />

T Basta, R Rachel, and R A Garrett. Structural and genomic<br />

properties of <strong>the</strong> hyper<strong>the</strong>rmophilic archaeal virus atv<br />

with an extracellular stage of <strong>the</strong> reproductive cycle. J Mol<br />

Biol, 359(5):1203–1216, Jun 2006.<br />

[60] G Pühler, H Leffers, F Gropp, P Palm, H P Klenk, F Lottspeich,<br />

R A Garrett, and W Zillig. Archaebacterial dnadependent<br />

rna polymerases testify to <strong>the</strong> evolution of <strong>the</strong><br />

eukaryotic nuclear genome. Proc Natl Acad Sci U S A,<br />

86(12):4569–73, Jun 1989.<br />

[61] Marco Punta, Penny C Coggill, Ruth Y Eberhardt, Ja<strong>in</strong>a<br />

Mistry, John Tate, Chris Boursnell, N<strong>in</strong>gze Pang, Kristoffer<br />

Forslund, Goran Ceric, Jody Clements, Andreas Heger, Liisa<br />

Holm, Erik L L Sonnhammer, Sean R Eddy, Alex Bateman,<br />

and Robert D F<strong>in</strong>n. The pfam prote<strong>in</strong> families database.<br />

Nucleic Acids Res, 40(Database issue):D290–301, Jan 2012.<br />

[62] P Redder, X Peng, K Brügger, S A Shah, F Roesch,<br />

B Greve, Q She, C Schleper, P Forterre, R A Garrett, and<br />

D Prangishvili. Four newly isolated fuselloviruses from<br />

extreme geo<strong>the</strong>rmal environments reveal unusual morphologies<br />

and a possible <strong>in</strong>terviral recomb<strong>in</strong>ation mechanism.<br />

Environ Microbiol, Jul 2009.<br />

[63] W D Reiter, P Palm, S Yeats, and W Zillig. Gene expression<br />

<strong>in</strong> archaebacteria: physical mapp<strong>in</strong>g of constitutive and uv<strong>in</strong>ducible<br />

transcripts from <strong>the</strong> sulfolobus virus-like particle<br />

ssv1. Mol Gen Genet, 209(2):270–5, Sep 1987.<br />

[64] M L Reno, N L Held, C J Fields, P V Burke, and R J Whitaker.<br />

Biogeography of <strong>the</strong> sulfolobus islandicus pan-genome. Proc<br />

Natl Acad Sci U S A, 106(21):8605–8610, May 2009.<br />

[65] Christ<strong>in</strong>e Rousseau, Jacques Nicolas, and Mathieu Gonnet.<br />

Crispi: a crispr <strong>in</strong>teractive database. Bio<strong>in</strong>formatics, Oct 2009.


Bibliography 203<br />

[66] Rachel Y Samson, Takayuki Obita, Stefan M Freund, Roger L<br />

Williams, and Stephen D Bell. A role for <strong>the</strong> escrt <strong>system</strong> <strong>in</strong><br />

cell division <strong>in</strong> archaea. Science, 322(5908):1710–3, Dec 2008.<br />

[67] Ekaterina Semenova, Matthijs M Jore, Kirill A Datsenko,<br />

Anna Semenova, Edze R Westra, Barry Wanner, John van der<br />

Oost, Stan J J Brouns, and Konstantin Severinov. Interference<br />

by clustered regularly interspaced short palindromic repeat<br />

(CRISPR) RNA is governed by a seed sequence. Proc Natl Acad<br />

Sci U S A, 108(25):10098–103, Jun 2011.<br />

[68] S A Shah, N R Hansen, and R A Garrett. Distribution of<br />

CRISPR spacer matches in viruses and plasmids of crenarchaeal<br />

acidothermophiles and implications for their inhibitory<br />

mechanism. Biochem Soc Trans, 37(Pt 1):23–28, Feb 2009.<br />

[69] Shiraz A Shah and Roger A Garrett. CRISPR/Cas and Cmr<br />

modules, mobility and evolution of adaptive immune systems.<br />

Res Microbiol, 162(1):27–38, Jan 2011.<br />

[70] Q She, R K Singh, F Confalonieri, Y Zivanovic, G Allard,<br />

M J Awayez, C C Chan-Weiher, I G Clausen, B A Curtis,<br />

A De Moors, G Erauso, C Fletcher, P M Gordon, I Heikamp-de<br />

Jong, A C Jeffries, C J Kozera, N Medina, X Peng, H P<br />

Thi-Ngoc, P Redder, M E Schenk, C Theriault, N Tolstrup,<br />

R L Charlebois, W F Doolittle, M Duguet, T Gaasterland, R A<br />

Garrett, M A Ragan, C W Sensen, and J Van der Oost. The<br />

complete genome of the crenarchaeon Sulfolobus solfataricus<br />

P2. Proc Natl Acad Sci U S A, 98(14):7835–7840, Jul 2001.<br />

[71] Daan C Swarts, Cas Mosterd, Mark W J van Passel, and Stan<br />

J J Brouns. CRISPR interference directs strand specific spacer<br />

acquisition. PLoS One, 7(4):e35888, 2012.<br />

[72] T H Tang, J P Bachellerie, T Rozhdestvensky, M L Bortolin,<br />

H Huber, M Drungowski, T Elge, J Brosius, and A Hüttenhofer.<br />

Identification of 86 candidates for small nonmessenger<br />

RNAs from the archaeon Archaeoglobus fulgidus.<br />

Proc Natl Acad Sci U S A, 99(11):7536–7541, May 2002.<br />

[73] David L Valentine. Adaptations to energy stress dictate the<br />

ecology and evolution of the archaea. Nat Rev Microbiol,<br />

5(4):316–23, Apr 2007.<br />

[74] John van der Oost, Matthijs M Jore, Edze R Westra, Magnus<br />

Lundgren, and Stan J J Brouns. CRISPR-based adaptive<br />


and heritable immunity in prokaryotes. Trends Biochem Sci,<br />

34(8):401–7, Aug 2009.<br />

[75] G Vestergaard, S A Shah, A Bize, W Reitberger, M Reuter,<br />

H Phan, A Briegel, R Rachel, R A Garrett, and D Prangishvili.<br />

Stygiolobus rod-shaped virus and the interplay of crenarchaeal<br />

rudiviruses with the CRISPR antiviral system. J Bacteriol,<br />

190(20):6837–6845, Oct 2008.<br />

[76] Michaela Wagner, Silvia Berkner, Malgorzata Ajon, Arnold<br />

J M Driessen, Georg Lipps, and Sonja-Verena Albers. Expanding<br />

and understanding the genetic toolbox of the hyperthermophilic<br />

genus Sulfolobus. Biochem Soc Trans, 37(Pt<br />

1):97–101, Feb 2009.<br />

[77] Finn Werner and Dina Grohmann. Evolution of multisubunit<br />

RNA polymerases in the three domains of life. Nat Rev<br />

Microbiol, 9(2):85–98, Feb 2011.<br />

[78] Edze R Westra, Benedikt Nilges, Paul B G van Erp, John<br />

van der Oost, Remus T Dame, and Stan J J Brouns. Cascade-mediated<br />

binding and bending of negatively supercoiled<br />

DNA. RNA Biol, 9(9), Sep 2012.<br />

[79] Edze R Westra, Paul B G van Erp, Tim Künne, Shi Pey Wong,<br />

Raymond H J Staals, Christel L C Seegers, Sander Bollen,<br />

Matthijs M Jore, Ekaterina Semenova, Konstantin Severinov,<br />

Willem M de Vos, Remus T Dame, Renko de Vries, Stan<br />

J J Brouns, and John van der Oost. CRISPR immunity relies<br />

on the consecutive binding and degradation of negatively<br />

supercoiled invader DNA by Cascade and Cas3. Mol Cell,<br />

46(5):595–605, Jun 2012.<br />

[80] Blake Wiedenheft, Gabriel C Lander, Kaihong Zhou, Matthijs<br />

M Jore, Stan J J Brouns, John van der Oost, Jennifer A<br />

Doudna, and Eva Nogales. Structures of the RNA-guided surveillance<br />

complex from a bacterial immune system. Nature,<br />

477(7365):486–9, Sep 2011.<br />

[81] Blake Wiedenheft, Esther van Duijn, Jelle B Bultema,<br />

Sakharam P Waghmare, Kaihong Zhou, Arjan Barendregt,<br />

Wiebke Westphal, Albert J R Heck, Egbert J Boekema,<br />

Mark J Dickman, and Jennifer A Doudna.<br />


RNA-guided complex from a bacterial immune system enhances<br />

target recognition through seed sequence interactions.<br />

Proc Natl Acad Sci U S A, 108(25):10092–7, Jun 2011.<br />

[82] C R Woese and G E Fox. Phylogenetic structure of the<br />

prokaryotic domain: the primary kingdoms. Proc Natl Acad<br />

Sci U S A, 74(11):5088–90, Nov 1977.<br />

[83] C R Woese, O Kandler, and M L Wheelis. Towards a natural<br />

system of organisms: proposal for the domains Archaea,<br />

Bacteria, and Eucarya. Proc Natl Acad Sci U S A,<br />

87(12):4576–9, Jun 1990.<br />

[84] Ido Yosef, Moran G Goren, and Udi Qimron. Proteins and<br />

DNA elements essential for the CRISPR adaptation process in<br />

Escherichia coli. Nucleic Acids Res, 40(12):5569–76, Jul 2012.<br />

[85] Xiao-Yan You, Chao Liu, Sheng-Yue Wang, Cheng-Ying Jiang,<br />

Shiraz A Shah, David Prangishvili, Qunxin She,<br />

Shuang-Jiang Liu, and Roger A Garrett. Genomic analysis of<br />

Acidianus hospitalis W1 a host for studying crenarchaeal virus<br />

and plasmid life cycles. Extremophiles, 15(4):487–97, Jul 2011.<br />

[86] Jing Zhang, Christophe Rouillon, Melina Kerou, Judith<br />

Reeks, Kim Brugger, Shirley Graham, Julia Reimann,<br />

Giuseppe Cannone, Huanting Liu, Sonja-Verena Albers,<br />

James H Naismith, Laura Spagnolo, and Malcolm F White.<br />

Structure and mechanism of the Cmr complex for CRISPR-mediated<br />

antiviral immunity. Mol Cell, 45(3):303–13, Feb 2012.<br />
