2007-8

student.portal.chalmers.se


Using Bioinformatics to Study

Cellular Internal Ribosomal Entry Sites

Tejashwari Meerupati

Thesis for the Degree of MSc Bioinformatics

Supervisor

Tore Samuelsson

Dept of Medical Biochemistry

International Master’s Programme in Bioinformatics

Chalmers University of Technology and Göteborg University

SE – 41296 Göteborg, Sweden

Göteborg, January 2007


Göteborg University


Acknowledgements

I would like to thank my parents for their love, care and continual support throughout my postgraduate years of study.

To my supervisor, Prof. Tore Samuelsson: thank you for allowing me to work on this interesting project and for encouraging me to explore. Your faultless guidance and unwavering trust will remain a standard of comparison for me in my future academic endeavours. I would also like to thank the people at the Lundberg Laboratory, Göteborg University.

To Prof. Olle Nerman, for providing continuous encouragement in completing my studies: thank you for your succour and faith.

Finally, thanks to all the teachers of the Bioinformatics Master's Programme, and to my fellow students for making the past 18 months in the programme a great learning experience for me.

Tejashwari

January 2007



Dedication

To my Parents, Sister and Brother



Contents

ACKNOWLEDGEMENTS
ABSTRACT
BACKGROUND
  Basic mechanism of protein synthesis
  Prokaryotic translation
    Initiation
    Elongation
    Termination
    Recycling
    Polysomes
  Eukaryotic translation
    Cap-dependent initiation
    Eukaryotic initiation factors and their functions
    Cap-independent initiation
  Internal ribosome entry sites (IRES)
  IRES structures
    Viral IRES
    Cellular IRES
  Factors regulating cellular IRES
  Growth regulatory genes transcribed in response to stress contain IRES elements
MATERIALS AND METHODS
  Materials
  Methods
RESULTS AND DISCUSSIONS
  Identification of homologues to previously known human and mouse IRESs
  Analysis of novel sequences with Rfam CMs
CONCLUSIONS
REFERENCES
APPENDIX A. IRES description
APPENDIX B. UCSC Genome browser
APPENDIX C. Multiple sequence alignment example of IRES VEGF A
APPENDIX D. cmsearch output example with interpretation
APPENDIX E. Comparison of Rfam model vs RNAalifold

List of Figures

1. Process of translation initiation in prokaryotes
2. Cap-dependent ribosome scanning
3. Internal ribosome entry sites

List of Tables

Table 1. Eukaryotic initiation factors and their functions
Table 2. Functional RNA-protein interactions in viral IRES elements
Table 3. List of cellular IRES families in Rfam
Table 4. List of organisms present in each Rfam family and the novel homologues, with accession numbers, start and end positions, and cmsearch bit scores


ABSTRACT

Ribosome recruitment via an internal ribosome entry site (IRES) is an alternative way to initiate translation in eukaryotic cells. IRESs were originally discovered in some viruses, but several cellular mRNAs have since been proposed to contain IRES sequences as well. These sequences are very diverse and are present in a growing list of mRNAs. The Rfam database is a comprehensive resource for internal ribosome entry sites and presents currently available general information as well as detailed data for each IRES. Several subsets of data are classified according to the viral taxon (for viral IRESs), the gene product function (for cellular IRESs), the possible cellular regulation, or the trans-acting factor that mediates IRES function.

Each Rfam IRES family is associated with a secondary structure model. In many cases this model is based on a limited number of primary sequences. The aim of this study was to examine the reliability of these secondary structure models by considering a larger number of sequences. To this end, a number of novel IRES homologues were identified in a range of vertebrates, and their secondary structures were predicted or examined with different bioinformatics tools. In general, however, these novel sequences provided no support for the secondary structure models proposed in Rfam.


BACKGROUND

This chapter provides an overview of protein synthesis. Genes are crucial for metabolism, growth and differentiation in living cells, and they exert these roles by directing the synthesis of proteins, which in turn catalyze numerous biological functions. The multi-step biochemical pathway of translation is divided into three main steps: initiation (with cap-dependent and cap-independent pathways in eukaryotes), elongation and termination. Regulation of protein synthesis underlies cellular growth and differentiation, and translational regulation contributes significantly to the overall regulation of gene expression in cells.

Basic mechanism of protein synthesis

The mRNA carries genetic information, encoded as a ribonucleotide sequence, from the chromosomes to the ribosomes. The ribonucleotide sequence is "read" by the translational machinery as a series of nucleotide triplets called codons, each of which codes for a specific amino acid. The ribosome and tRNA molecules translate the mRNA information to produce proteins. The ribosome is a multisubunit structure containing rRNA and proteins; it is the "factory" where amino acids are assembled into proteins. tRNAs are small non-coding RNA chains (74-93 nucleotides) with a site for amino acid attachment and a site called the anticodon. The anticodon is an RNA triplet complementary to the mRNA codon that specifies the amino acid carried by the tRNA. Aminoacyl-tRNA synthetases are enzymes that attach each amino acid to the tRNAs whose anticodons specify it; the product of this reaction is an aminoacyl-tRNA. At the ribosome, mRNA codons are matched to specific tRNA anticodons through complementary base pairing. Thus the ribosome uses the sequence of codons in the mRNA to produce a polypeptide with a particular sequence of amino acids [2].
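The triplet-by-triplet reading described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard genetic code and an mRNA written with A, C, G and U; the names `CODON_TABLE` and `translate` are our own and do not come from any tool used in this work.

```python
# Standard genetic code built from the conventional UCAG base ordering:
# the codon with bases (BASES[i], BASES[j], BASES[k]) maps to AMINO_ACIDS[16*i + 4*j + k].
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(mrna, start=0):
    """Read the mRNA in non-overlapping triplets from `start` until a stop codon ('*')."""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon: no tRNA matches, so the chain ends here
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUGUUUGGCUAA"))  # MFG
```

Building the table from the UCSC-style ordering string avoids typing 64 entries by hand; the one-letter amino acid codes are used throughout.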



Prokaryotic translation

Initiation

Figure 1: Process of translation initiation in prokaryotes (Mitra et al. [2])

Initiation of translation in prokaryotes involves the assembly of the components of the translation system: the two ribosomal subunits, the mRNA to be translated, the first aminoacyl-tRNA (the tRNA charged with the first amino acid), GTP as a source of energy, and initiation factors that help assemble the initiation complex. Prokaryotic initiation results in the association of the small and large ribosomal subunits and the binding of the first aminoacyl-tRNA (fMet-tRNA) through anticodon-codon base pairing with the initiation codon of the mRNA. The ribosome has three tRNA-binding sites: the A site, the P site and the E site. The A site is the point of entry for aminoacyl-tRNAs (except for the first aminoacyl-tRNA, fMet-tRNA, which enters at the P site). The P site holds the peptidyl-tRNA. The E site is the exit site for the uncharged tRNA after it has delivered its amino acid to the growing peptide chain [2].

Initiation of translation begins with the 50S and 30S ribosomal subunits dissociated. IF1 (initiation factor 1) blocks the A site to ensure that fMet-tRNA can bind only to the P site and that no other aminoacyl-tRNA can bind in the A site during initiation, while IF3 blocks the E site and prevents the two subunits from associating. IF2 is a small GTPase that binds fMet-tRNA and facilitates its binding to the small ribosomal subunit. The 16S rRNA of the small (30S) ribosomal subunit recognises the ribosome-binding site on the mRNA, the Shine-Dalgarno sequence, located 5-10 nucleotides upstream of the start codon (AUG). The Shine-Dalgarno sequence is found only in prokaryotes. It helps to position the ribosome correctly on the mRNA so that the P site is at the AUG initiation codon. IF3 helps to position fMet-tRNA in the P site, such that fMet-tRNA base-pairs with the mRNA initiation codon (AUG). Initiation ends when the large ribosomal subunit joins the complex, causing the dissociation of the initiation factors. An important point is that prokaryotes can distinguish between an internal AUG (coding for methionine) and the AUG initiation codon (coding for N-formylmethionine and marking the start of translation) [2].
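The positioning rule above (a Shine-Dalgarno motif 5-10 nucleotides upstream of the AUG) can be expressed as a small search. This is a hypothetical sketch: it assumes the commonly cited SD consensus AGGAGG, accepts exact motif matches only, and measures the spacer from the last SD nucleotide to the A of the AUG. Real SD recognition tolerates mismatches and variable spacing, and the function name is our own.

```python
def find_sd_start_codon(mrna, sd="AGGAGG", min_gap=5, max_gap=10):
    """Return the index of an AUG whose A lies min_gap..max_gap nt after the end of
    an occurrence of the SD-like motif `sd`, or None if no such AUG exists."""
    pos = mrna.find(sd)
    while pos != -1:
        sd_end = pos + len(sd)  # index just after the motif
        for gap in range(min_gap, max_gap + 1):
            start = sd_end + gap
            if mrna[start:start + 3] == "AUG":
                return start
        pos = mrna.find(sd, pos + 1)  # try the next occurrence of the motif
    return None

mrna = "AGGAGG" + "U" * 7 + "AUGGCAUAA"  # hypothetical message with a 7-nt spacer
print(find_sd_start_codon(mrna))  # 13
```

An AUG lying only two nucleotides after the motif (for example in "AGGAGGUUAUGGCA") is rejected, because it falls outside the assumed 5-10 nt window.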

Elongation

Elongation of the polypeptide chain involves the addition of amino acids to the carboxyl end of the growing chain. Elongation starts when fMet-tRNA occupies the P site, causing a conformational change that opens the A site for a new aminoacyl-tRNA to bind. This binding is facilitated by elongation factor Tu (EF-Tu), a small GTPase. The P site now contains the beginning of the peptide chain, and the A site contains the next amino acid to be added. The growing polypeptide is detached from the tRNA in the P site, and a peptide bond is formed between the last amino acid of the polypeptide and the amino acid still attached to the tRNA in the A site. This reaction, peptide bond formation, is catalyzed by peptidyl transferase, a ribozyme activity intrinsic to the 23S rRNA of the 50S ribosomal subunit. The A site now holds the tRNA carrying the newly extended peptide, while the P site holds an unloaded tRNA (a tRNA without an amino acid). In the final stage of elongation, translocation, the ribosome moves 3 nucleotides towards the 3' end of the mRNA. Because the tRNAs are linked to the mRNA by codon-anticodon base pairing, they move relative to the ribosome, taking the nascent polypeptide from the A site to the P site and moving the uncharged tRNA to the E (exit) site. This process is catalyzed by elongation factor G (EF-G). The ribosome continues to translate the remaining codons as more aminoacyl-tRNAs bind to the A site, until it reaches a stop codon on the mRNA (UAA, UGA or UAG).
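The A, P and E site bookkeeping over repeated translocation steps can be traced with a toy model. The sketch below is a simplification assumed purely for illustration: it ignores EF-Tu and EF-G, treats the E site as holding the codon of the most recently translocated tRNA, and stops as soon as a stop codon enters the A site. `elongation_states` is a hypothetical name.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def elongation_states(mrna, start=0):
    """List the (E, P, A) codons at each ribosome position, advancing one codon
    (3 nt) per translocation until a stop codon reaches the A site."""
    codons = [mrna[i:i + 3] for i in range(start, len(mrna) - 2, 3)]
    states = []
    p = 0  # index of the codon in the P site; the initiator AUG starts there
    while p + 1 < len(codons):
        e_site = codons[p - 1] if p > 0 else None  # E site empty before the first translocation
        a_site = codons[p + 1]
        states.append((e_site, codons[p], a_site))
        if a_site in STOP_CODONS:  # stop codon in the A site: elongation ends
            break
        p += 1  # translocation: the ribosome moves 3 nt towards the 3' end
    return states

for state in elongation_states("AUGUUUGGCUAA"):
    print(state)
```

For the four-codon example the ribosome occupies three positions, and the last one has UAA in the A site, which is where termination takes over.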

Termination

Termination occurs when one of the three termination codons moves into the A site. These codons are not recognized by any tRNA. Instead, they are recognized by proteins called release factors: RF1 (recognizing the UAA and UAG stop codons) or RF2 (recognizing the UAA and UGA stop codons). A third release factor, RF3, catalyzes the release of RF1 and RF2 at the end of the termination process. These factors trigger hydrolysis of the ester bond in the peptidyl-tRNA and release of the newly synthesized protein from the ribosome [2].
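The codon specificities of the two codon-reading release factors described above amount to a small lookup table. The sketch below only encodes the RF1 and RF2 specificities stated in the text. The function name `release_factors_for` is hypothetical.

```python
# Stop codons recognized by each codon-reading bacterial release factor (from the text above).
RELEASE_FACTORS = {
    "RF1": {"UAA", "UAG"},
    "RF2": {"UAA", "UGA"},
}

def release_factors_for(codon):
    """Return the release factors that recognize `codon` in the A site (empty for sense codons)."""
    return sorted(rf for rf, codons in RELEASE_FACTORS.items() if codon in codons)

print(release_factors_for("UAA"))  # ['RF1', 'RF2']
```

UAA is the only stop codon that both factors can read. A sense codon such as AUG returns an empty list, because it is decoded by a tRNA instead.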

Recycling

The post-termination complex formed at the end of the termination step consists of the mRNA with the termination codon in the A site, tRNAs and the ribosome. The ribosome recycling step is responsible for disassembling this post-termination complex once the nascent protein has been released. Ribosome recycling factor (RRF) and elongation factor G (EF-G) release the mRNA and tRNAs from the ribosome and dissociate the 70S ribosome into its 30S and 50S subunits. IF3 also helps the recycling process by binding to the 30S subunits, which converts transiently dissociated subunits into stable ones. This "recycles" the ribosomes for additional rounds of translation.

Polysomes

An mRNA is typically translated by more than one ribosome simultaneously. Because of their relatively large size, ribosomes can only attach to sites on the mRNA that are at least about 35 nucleotides apart. The complex of one mRNA and several ribosomes is called a polysome or polyribosome [2].
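As a rough worked example of the spacing constraint, assume each ribosome needs its own 35-nucleotide stretch of mRNA (a simplification of the spacing stated above). Under that assumption, the maximum number of ribosomes on a coding region is its length divided by 35, rounded down. The 1,500-nt length below is hypothetical.

```python
def max_ribosomes(length_nt, spacing_nt=35):
    """Upper bound on ribosomes packed on an mRNA stretch, one per spacing_nt nucleotides."""
    return max(length_nt, 0) // spacing_nt

print(max_ribosomes(1500))  # 42, since 35 * 42 = 1470 <= 1500 < 35 * 43
```

Real polysomes are usually much less densely loaded than this bound, so the figure should be read as a ceiling rather than a typical value.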

Eukaryotic translation

The fundamental difference in translation initiation between prokaryotes and eukaryotes is that in prokaryotes the small ribosomal subunit recognises the Shine-Dalgarno sequence of the mRNA, whereas in eukaryotes it recognises the 5' cap structure on the mRNA and translation initiates at the first (5'-proximal) AUG. The initiator Met-tRNA is N-formylated in prokaryotes but not in eukaryotes, and polycistronic mRNAs are common in prokaryotes but rare in eukaryotes.

Cap-dependent initiation

Figure 2. Schematic representation of the cap-dependent ribosome scanning (left panel) and internal initiation (right panel) pathways for the formation of 80S initiation complexes (Anton A. Komar and Maria Hatzoglou [1])


The cap structure m7GpppN (where m7G is 7-methylguanosine, ppp is the triphosphate bridge and N is any nucleotide) is present at the 5' end of all nuclear-transcribed mRNAs and plays an important role in the initiation process. The cap is recognised by the initiation factor eIF4E, which, through its interaction with eIF4G, directs the translation machinery to the 5' end of the mRNA. Eukaryotic initiation factor 3 (eIF3) is associated with the small ribosomal subunit and helps keep the large ribosomal subunit from binding prematurely. eIF3 also interacts with eIF4A, eIF4E and eIF4G. eIF4G is called the scaffolding protein because it associates directly with eIF3 and with the other two components, eIF4A and eIF4E. eIF4E is the cap-binding protein, and cap binding is the rate-limiting step of cap-dependent initiation. Interfering with this step is therefore also a way of hijacking the host machinery in favour of cap-independent messages. eIF4A is an ATP-dependent RNA helicase that helps the ribosome resolve certain secondary structures formed by mRNA transcripts. Another protein associated with the eIF4F complex is the poly(A)-binding protein (PABP), which binds the poly(A) tail of most eukaryotic mRNA molecules and plays a crucial role in circularization of the mRNA during translation.

The pre-initiation complex (the 43S complex, built on the 40S subunit) binds the mRNA and, accompanied by the protein factors, moves along the mRNA towards its 3' end, scanning for the start codon (AUG) that marks where the mRNA begins coding for the protein. In both eukaryotes and prokaryotes, the amino acid encoded by the start codon is methionine. The initiator tRNA charged with Met forms part of the ribosomal complex, and thus all proteins start with this amino acid. The Met-charged initiator tRNA is brought to the P site of the small ribosomal subunit by eukaryotic initiation factor 2 (eIF2). This factor hydrolyses GTP to GDP in the presence of eIF5, signalling the dissociation of several factors from the small ribosomal subunit, which results in the joining of the large (60S) subunit. The complete (80S) ribosome then commences translation elongation, during which the sequence between the start and stop codons is translated from mRNA into an amino acid sequence, resulting in the synthesis of the protein [2] (refer to Table 1 [39]).
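The scanning model described above can be sketched as follows. In this model the pre-initiation complex moves 5' to 3' and initiates at the first AUG it meets. This is a deliberately simplified illustration: it ignores the nucleotide context around the AUG, RNA secondary structure and the helicase activity of eIF4A. The example sequence and the function name are invented.

```python
def scan_for_start(mrna):
    """Move 5'->3' one nucleotide at a time and return the index of the first AUG, or None."""
    for pos in range(len(mrna) - 2):
        if mrna[pos:pos + 3] == "AUG":
            return pos
    return None

utr_and_cds = "GCCACCAUGGCAUGA"  # hypothetical 5' UTR followed by a short ORF
start = scan_for_start(utr_and_cds)
print(start, utr_and_cds[start:start + 3])  # 6 AUG
```

The example also contains a second, out-of-frame AUG at position 11. The scanning model never considers it, because initiation already happens at position 6.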

Table 1: Eukaryotic initiation factors and their functions. The specific non-ribosomal proteins required for accurate translation initiation are termed initiation factors; in eukaryotes they are called eIFs (Michael E. King [39]).

Initiation factor                        Function
eIF-1                                    Repositioning of Met-tRNA to facilitate mRNA binding
eIF-2                                    Ternary complex formation
eIF-2A                                   AUG-dependent Met-tRNAi binding to the 40S ribosome
eIF-2B (GEF)                             Guanine nucleotide exchange factor; GTP/GDP exchange during eIF-2 recycling
eIF-3 (~10 subunits)                     Ribosome subunit antiassociation; binding to the 40S subunit
eIF-4F (initiation factor complex)       mRNA binding to the 40S subunit; ATPase-dependent RNA helicase activity; interaction between the poly(A) tail and the cap structure
PABP (poly(A)-binding protein)           Binds to the poly(A) tail of mRNAs and provides a link to eIF-4G
Mnk1 and Mnk2 (eIF-4E kinases)           Phosphorylate eIF-4E, increasing its association with the cap structure
eIF-4A                                   ATPase-dependent RNA helicase
eIF-4E                                   5' cap recognition
4E-BP (also called PHAS; 3 known forms)  When dephosphorylated, 4E-BP binds eIF-4E and represses its activity; phosphorylation of 4E-BP occurs in response to many growth stimuli, leading to release of eIF-4E and increased translation initiation
eIF-4G                                   Acts as a scaffold for the assembly of eIF-4E and eIF-4A in the eIF-4F complex; interaction with PABP allows the 5' and 3' ends of mRNAs to interact
eIF-4B                                   Stimulates helicase; binds simultaneously with eIF-4F
eIF-5                                    Release of eIF-2 and eIF-3; ribosome-dependent GTPase
eIF-6                                    Ribosome subunit antiassociation

Cap-independent initiation

The best-studied example of cap-independent translation initiation in eukaryotes is that mediated by an internal ribosome entry site (IRES). Cap-independent translation differs from cap-dependent translation in that it does not require the ribosome to scan from the 5' cap of the mRNA to the start codon. Instead, the ribosome can be recruited to the start site by ITAFs (IRES trans-acting factors), bypassing the need to scan from the 5' end of the untranslated region. This mode of translation was discovered relatively recently. It has been found to be important under conditions that require the translation of specific mRNAs despite cellular stress or a general inability to translate most mRNAs. Examples include factors involved in apoptosis and stress-induced responses [2].

Internal ribosome entry sites (IRESs)

An internal ribosome entry site (IRES) is a nucleotide sequence that allows translation to initiate internally in a messenger RNA (mRNA). An IRES mimics the function of the 5' cap structure and is recognized by the 40S pre-initiation complex [40]. IRES sequences were first discovered in 1988, in poliovirus and encephalomyocarditis virus RNA, in the laboratories of Sonenberg and Wimmer respectively. IRESs are described as distinct regions of RNA molecules that are able to attract the eukaryotic ribosome to the mRNA and make it initiate translation; the process became known as "internal initiation of translation". Regions with IRES activity probably have a distinct secondary or even tertiary structure, but no primary or secondary structure feature common to all IRES segments has so far been identified.


IRESs are located in the 5' UTR of RNA viruses and allow translation of the viral RNAs in a cap-independent manner; it was later discovered that some mammalian mRNAs also contain IRESs.

Viral IRESs are much better documented than cellular IRESs, and the evidence that viral sequences can promote internal initiation is much stronger. Many RNA viruses possess a protein covalently linked to the 5' end of their RNAs, which cannot functionally substitute for the cap structure of cellular mRNAs. Therefore, translation of such messages can only occur through internal initiation. For example, hepatitis C virus-related IRESs bind the 40S ribosomal subunit directly, in such a way that their initiation codons are located in the ribosomal P site without mRNA scanning. These IRESs do not require the initiation factors eIF1, eIF1A, eIF4A, eIF4B and eIF4F. Another group of IRESs has been identified in picornavirus mRNAs. These do not attract the 40S subunit directly, but rather through a high-affinity eIF4G-binding site [3].

Figure 3. Internal ribosome entry sites (IRES) (Matthews and Vanholde [41])

RNA motifs recognized by RNA-binding proteins often consist of primary sequence motifs or of secondary or tertiary structures. Structures can be preserved without a conserved primary sequence, but proteins often also recognize specific primary sequence motifs. Therefore, a search for common attributes of known IRES sequences should take both principles into account. Although it is the 5' UTR sequence that is responsible for recruiting the ribosome, the 3' UTR may play a similar role [15, 16], and motifs in 3' UTRs are known to play a role in translation [17]. However, virtually all characterized IRESs function independently of their 3' UTRs; for this reason, only the 5' UTR is considered here.
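As an illustration of the primary-sequence side of such a search, the sketch below scans a 5' UTR for pyrimidine-rich stretches. This is the kind of sequence bound by the polypyrimidine tract binding protein (PTB), which is listed among the trans-acting factors in Table 2. The minimum tract length of 8 nucleotides and the function name are arbitrary assumptions. A real search would also have to consider secondary structure, as argued above.

```python
import re

def find_pyrimidine_tracts(utr, min_len=8):
    """Return (start, tract) for each maximal run of at least min_len pyrimidines (C/U)."""
    rna = utr.upper().replace("T", "U")  # also accept DNA-style input
    pattern = re.compile("[CU]{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(rna)]

print(find_pyrimidine_tracts("GGAUCUUCCUUCAGGCUCUAG"))  # [(3, 'UCUUCCUUC')]
```

The shorter CUCU run near the 3' end of the example is ignored because it falls below the assumed threshold.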

IRES Structures - Viral IRES

Several viral IRESs share similar secondary structures, suggesting that similar structures, rather than specific sequence recognition sites, are used to bind the initiation factors needed for cap-independent translation. Viral IRESs are separated into three groups.


Type I viral IRESs: include the IRESs of entero- and rhinoviruses; they translate poorly in rabbit reticulocyte lysate (RRL) and require the ribosome to bind and then scan downstream to a start codon 30-150 nt away.

Type II viral IRESs: include the IRESs of cardio- and aphthoviruses; they translate very efficiently in RRL, encompass the AUG start codon, and do not require scanning.

Type III viral IRESs: are typified by hepatitis A virus; they do not translate at all in RRL, and they encompass the AUG start codon.

Type I IRESs are stimulated by the eIF4G cleavage products generated by the viral protease, but Type II IRESs are not. Comparative studies involving covariation analysis demonstrated conservation of structure between members of the enterovirus genus of the Picornaviridae (poliovirus and coxsackievirus B3) and the human rhinoviruses. Similar studies indicated sequence and structural conservation between the picornaviral cardioviruses (EMCV, TMEV) and the aphthovirus FMDV [18].
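The principle behind covariation analysis can be sketched as two checks on each base pair in a proposed consensus structure. The first is whether every aligned sequence can still form the pair. The second is whether some sequences show compensatory changes at both positions. This is a minimal illustration, not the method used in [18]. It assumes equal-length aligned sequences and a consensus structure in dot-bracket notation, and it accepts G-U wobble pairs. The example alignment is invented.

```python
CANONICAL_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}  # Watson-Crick plus G-U wobble

def parse_dot_bracket(structure):
    """Return the base pairs (i, j), i < j, encoded by a dot-bracket string."""
    stack, pairs = [], []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

def covariation_support(alignment, structure):
    """Map each consensus pair to (all_sequences_pair, compensatory_change_seen)."""
    report = {}
    for i, j in parse_dot_bracket(structure):
        observed = {seq[i] + seq[j] for seq in alignment}
        all_pair = observed <= CANONICAL_PAIRS
        # Compensatory: two observed pairs that differ at BOTH positions yet still pair.
        compensatory = all_pair and any(
            a[0] != b[0] and a[1] != b[1] for a in observed for b in observed
        )
        report[(i, j)] = (all_pair, compensatory)
    return report

alignment = ["GGGAAACCC", "GCGAAACGC", "AGGAAACCU"]  # invented example
for pair, support in covariation_support(alignment, "(((...)))").items():
    print(pair, support)
```

Pairs (0, 8) and (1, 7) show double changes (G-C to A-U and G-C to C-G) that preserve pairing, which is the signature covariation analysis looks for. Pair (2, 6) is conserved, so it is consistent with the structure but not independently supported by it.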

The HCV IRES is an example in which both tertiary and secondary structure are important for IRES function. Conservation of specific stems in domain II preserves IRES function regardless of the sequence used [19]. Point mutations that alter the tertiary structure of interdomain interactions [21] and mutations in the pseudoknot [22] severely affect IRES activity. The importance of these domains has become clearer since domain III was shown to interact directly with eIF3 and the 40S ribosomal subunit [2].

Table 2: Functional RNA-protein interactions in viral IRES elements (Kaminski & Jackson, 1998 [17])

IRES        Translation initiation factors   Trans-acting factors
EMCV        eIF4G-Ct, eIF4A, eIF3, eIF2      Polypyrimidine tract binding protein (PTB*)
FMDV        eIF4G-Ct, eIF4A, eIF3, eIF2      PTB, ITAF45
Poliovirus  Not known                        PTB, PCBP2, La, unr
Rhinovirus  Not known                        PTB, PCBP2, La, unr
HCV         eIF3, eIF2                       PTB, PCBP2, La
CrPV        None                             Not known

Cellular IRES

The structural features of cellular IRES elements remain largely unknown. A common Y-shaped structure (Le and Maizel 1997) has been predicted for cellular IRESs on the basis of a computational comparison of several orthologues of the BiP and FGF2 UTRs. Some studies have proposed that mRNA-binding proteins open up the native RNA structure of the IRES and present single-stranded RNA for other ITAFs or for the small ribosomal subunit to bind. For example, poly(rC)-binding protein 1 (PCBP1) appears to open the Bag-1 IRES structure, allowing PTB-1 to bind. Mutations that open up the PCBP1 binding region of the Bag-1 IRES seem to remove any requirement for this factor, and even enhance IRES activity once PTB-1 is added [24].

The c-myc IRES structure has been studied in some detail. Initially, full activity was assigned to a 394-base region upstream of the AUG, whose structure was derived using chemical and enzymatic probing [24]. Mapping of the ribosome landing site by engineering the 5' UTR revealed that start codons located at least 220 bases upstream of the standard AUG had no effect on translation of the downstream reporter ORF. Interestingly, domain 1 (-380 to -221), upstream of the proposed ribosome binding site, could have a negative effect on IRES-initiated translation if some of its double-stranded RNA regions were disrupted, suggesting some structural requirements. Mutations that disrupt the predicted pseudoknots at positions -347 to -185 seemed to enhance IRES activity. This may indicate that, in this IRES, the pseudoknots attenuate activity and that proteins able to open them up would permit IRES activity.

Recent work has shown that the major portion of c-myc IRES activity resides in a 50-base region from -143 to -94 [26]. The overall structure of the UTR may play a role in IRES regulation. Increased levels of PCBP1 and PCBP2 enhance c-myc IRES activity, possibly by opening up the structure for other ITAFs and the ribosome, as with Bag-1.
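Positions such as -143 to -94 are counted relative to the AUG, and the stated length can be checked directly: (-94) - (-143) + 1 = 50. The sketch below extracts such a region from a sequence. It assumes the common convention that -1 is the nucleotide immediately 5' of the A of the AUG and that there is no position 0. The helper name and the test sequence are hypothetical and are not the c-myc UTR.

```python
def upstream_region(seq, aug_index, from_pos, to_pos):
    """Return nucleotides from_pos..to_pos (inclusive) relative to the A of the AUG at aug_index.

    Assumed convention: -1 is the nucleotide immediately 5' of the AUG; there is no position 0.
    """
    if not from_pos <= to_pos < 0:
        raise ValueError("expected from_pos <= to_pos < 0")
    if aug_index + from_pos < 0:
        raise ValueError("region extends beyond the 5' end of the sequence")
    return seq[aug_index + from_pos: aug_index + to_pos + 1]

region_length = -94 - (-143) + 1
print(region_length)  # 50
```

The explicit bounds check matters in Python. Without it, a negative start index would silently wrap around to the 3' end of the string instead of signalling that the UTR is too short.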

The FGF2 UTR contains a unique G-quartet structure within its IRES. Deletion of this element reduces relative IRES activity to 50%, and stem-loop II has the greatest effect on overall activity. The minimal IRES sequence of FGF1 seems to be conserved across several mammalian species: computationally predicted structures for the mouse, rat, cow and two primate sequences are very similar to the empirically derived structure of the human sequence [27].

A number of mRNAs containing IRESs have been identified, but little is known about the mechanism by which naturally occurring cellular IRES elements capture 40S subunits. On the other hand, substantial evidence points to translational regulation of IRES-containing mRNAs during the cell cycle and during various stress situations that can lead to cell death. Le and Maizel have predicted an RNA motif consisting of a Y-shaped double-hairpin structure followed by a small hairpin. According to their prediction, this motif can be found upstream of the start codon in a variety of cellular IRES elements.

IRES elements in viral RNA genomes contain higher-order structures whose integrity is essential for IRES activity. The IRES elements in BiP [6], vascular endothelial growth factor (VEGF) and c-myc contain several non-contiguous sequence elements that separately display IRES activity. These findings suggested that elusive Shine-Dalgarno-like sequences may exist in IRES elements and that they can be isolated from complex IRES elements by functional means. Many eukaryotic mRNAs contain rRNA-like sequences, in both sense and antisense orientations, in their UTRs, coding sequences and introns [7]. A functional role for these sequences was examined in the mRNAs encoding ribosomal protein S15 [8] and the homeodomain protein Gtx [9]. Both mRNAs contain several sequence motifs complementary to the 3' end of 18S rRNA. These sequences could be cross-linked to 40S subunits, and cell-free translation assays showed that the strength of the mRNA-rRNA interactions was inversely correlated with mRNA translation efficiency. A nine-nucleotide sequence element with complementarity to 18S rRNA also displayed IRES activity in the dicistronic assay, approximately threefold over background. However, 10 linked copies of this mini-IRES stimulated second-cistron translation by ~500-fold [10]. The same nine-nucleotide sequence can also function as a translational repressor when present in the 5' UTR of a capped monocistronic mRNA.
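The complementarity tests described above can be sketched as a search for 9-nucleotide UTR segments whose exact Watson-Crick complement occurs in an rRNA sequence. This is a hypothetical illustration only: it ignores G-U pairs and mismatches. The sequences in the example are invented rather than taken from 18S rRNA or from any of the mRNAs discussed.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Return the antiparallel Watson-Crick complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]

def rrna_complementary_sites(utr, rrna, k=9):
    """Return (utr_pos, rrna_pos) for every UTR k-mer that could pair perfectly with the rRNA."""
    sites = []
    for pos in range(len(utr) - k + 1):
        target = reverse_complement(utr[pos:pos + k])  # the rRNA segment this k-mer would pair with
        rpos = rrna.find(target)
        while rpos != -1:
            sites.append((pos, rpos))
            rpos = rrna.find(target, rpos + 1)
    return sites

utr = "CCC" + "CGAAGGAUC" + "CCC"      # invented UTR
rrna = "CCCC" + "GAUCCUUCG" + "CCCC"   # invented rRNA fragment
print(rrna_complementary_sites(utr, rrna))  # [(3, 4)]
```

The reverse complement is used because the two strands pair antiparallel. A UTR segment read 5' to 3' therefore matches the rRNA segment whose complement it is, read in the opposite direction.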

Factors regulating cellular IRES elements

It is likely that the canonical eIFs that recruit 40S subunits to picornaviral IRES elements are also used to recruit 40S subunits to most cellular IRES elements [11]. The role of non-canonical ITAFs in cellular IRES elements has only recently been explored. Elroy-Stein provided evidence that a phosphorylated form of hnRNP C interacts with the differentiation-induced IRES in PDGF2 mRNA [12].

Another striking property of many cellular IRESs is that their activity shows strong cell type-specific variation and in some instances is developmentally controlled [13]. The inability of an IRES to mediate initiation in specific cells or under specific physiological conditions could be due to positive or negative regulatory ITAFs that influence IRES function but not cap-mediated initiation.

Growth regulatory genes and genes transcribed in response to stress contain IRES elements

To identify mRNAs that can still be translated when the concentration of the intact cap-binding complex eIF4F is reduced, the polyribosomal association of mRNAs was examined in poliovirus-infected cells at a time when both isoforms of eIF4G were significantly proteolyzed [14]. The analysis revealed the classes of mRNA that were preferentially translated as a consequence of the viral infection. These were mRNAs whose translational elongation rates were slowed, mRNAs requiring only low concentrations of intact eIF4F (such as those carrying the shunting-promoting late leader of adenovirus), and mRNAs containing IRES elements. (Ribosome shunting describes a pathway of translation initiation in which ribosomes bind to the mRNA in a cap-dependent manner but then "jump" over large regions of the mRNA containing RNA secondary structure, upstream AUGs and open reading frames, to "land" at or upstream of the initiator AUG.)

Approximately 3% of the mRNAs examined remained on polysomes in infected cells, and some of these have been shown to harbour IRES elements. These IRES-containing mRNAs encode proteins that are produced in response to a variety of stress situations, such as inflammation, angiogenesis and the response to serum [6]. Although these studies have the caveat of being performed in virus-infected cells, it is significant that some of the IRES-containing mRNAs were also identified by different experimental approaches. For example, the c-myc IRES has been shown to be active both during mitosis [15] and during the induction of apoptosis [1].


MATERIALS AND METHODS

MATERIALS

The web resources used in this work are:

• Rfam: Rfam is a large collection of multiple sequence alignments and covariance models representing many common non-coding RNA families. For each family in Rfam one can:

View and download multiple sequence alignments

Read family annotation

Examine species distribution of family members

Follow links to other databases [28].

• UCSC Genome Browser - BLAT: BLAT is an alignment tool like BLAST, but it is structured differently. On DNA, BLAT works by keeping an index of an entire genome in memory. It quickly finds sequences of 95% or greater similarity that are 40 bases or longer; it may miss more divergent or shorter sequence alignments. It finds perfect sequence matches of 33 bases, and sometimes finds them down to 20 bases. BLAT on proteins finds sequences of 80% or greater similarity that are 20 amino acids or longer. In practice, DNA BLAT works well on primates, and protein BLAT on land vertebrates. The index consists of all non-overlapping 11-mers except those heavily involved in repeats. The index is used to find areas of probable homology, which are then loaded into memory for a detailed alignment. Protein BLAT works in a similar manner, except with 4-mers rather than 11-mers. The protein index takes a little more than 2 gigabytes [29].
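The seed-finding idea behind BLAT can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not BLAT's actual implementation: the genome is indexed by its non-overlapping 11-mers as described above, every overlapping 11-mer of the query is looked up, and the seeds are grouped by diagonal (genome position minus query position) to locate a region of probable homology. Repeat masking, the detailed alignment stage and protein 4-mers are omitted, and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def build_index(genome: str, k: int = 11) -> Dict[str, List[int]]:
    """Index the non-overlapping k-mers of the genome (positions 0, k, 2k, ...)."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return dict(index)


def find_seeds(query: str, index: Dict[str, List[int]], k: int = 11) -> List[Tuple[int, int]]:
    """Look up every overlapping k-mer of the query; return (query_pos, genome_pos) seeds."""
    seeds = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            seeds.append((qpos, gpos))
    return seeds


def best_diagonal(seeds: List[Tuple[int, int]]) -> Optional[int]:
    """Return the diagonal (genome_pos - query_pos) supported by most seeds, or None.

    This diagonal marks the region of probable homology that a real aligner
    would pass on to a detailed alignment step.
    """
    if not seeds:
        return None
    diagonals = Counter(gpos - qpos for qpos, gpos in seeds)
    return diagonals.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(2000))  # toy random "genome"
    query = genome[500:560]  # 60-base exact copy of genome positions 500..559
    print(best_diagonal(find_seeds(query, build_index(genome))))
```

Because the genome k-mers do not overlap while the query k-mers do, any exact match of at least 2k - 1 = 21 bases is guaranteed to contain at least one indexed k-mer. Indexing only every k-th position is what keeps the index small enough to hold a whole genome in memory.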

• ClustalW: A fully automatic program that produces biologically meaningful global multiple sequence alignments of divergent sequences, in which identities, similarities and differences can be seen [30].

• T-Coffee: T-Coffee is a multiple sequence alignment program. Multiple sequence alignment programs are meant to align a set of sequences previously gathered using other programs such as BLAST or FASTA. The main characteristic of T-Coffee is that it allows results obtained with several alignment methods to be combined. For instance, if we have one alignment from ClustalW, another from DIALIGN and a structural alignment of some of the sequences, T-Coffee will combine all that information and produce a new multiple sequence alignment having the best agreement with all these methods. By default, T-Coffee compares all the sequences two by two, producing a global alignment and a series of local alignments (using LALIGN). The program then combines all these alignments into a multiple alignment [31].
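The idea of combining several alignment methods can be sketched in Python. This is a simplified, hypothetical illustration rather than T-Coffee's actual algorithm: each pairwise alignment (two gapped rows) is reduced to the set of residue pairs it aligns, a "library" counts how many input alignments support each residue pair, and a candidate alignment is scored by the total support of its pairs. T-Coffee itself weights library entries by sequence identity and extends the library through intermediate sequences before building the multiple alignment.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def aligned_pairs(row_a: str, row_b: str) -> Set[Tuple[int, int]]:
    """Return the (i, j) residue indices aligned without a gap in a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs, i, j = set(), 0, 0
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1  # advance residue index of sequence A
        if b != "-":
            j += 1  # advance residue index of sequence B
    return pairs


def build_library(alignments: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many input alignments (e.g. from different methods) support each pair."""
    library: Counter = Counter()
    for row_a, row_b in alignments:
        library.update(aligned_pairs(row_a, row_b))
    return library


def consistency_score(row_a: str, row_b: str, library: Counter) -> int:
    """Total library support for the residue pairs in a candidate alignment."""
    return sum(library[pair] for pair in aligned_pairs(row_a, row_b))
```

With three input alignments of "ACGU" and "AGU", two of which are ("ACGU", "A-GU") and one ("ACGU", "AG-U"), the first arrangement collects more support (8 versus 7) and would be preferred.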



Other Bioinformatics tools used in this work are:

• cmsearch: cmsearch uses the covariance model (CM) in cmfile to search for homologous RNAs in seqfile, and outputs the high-scoring alignments. CM files are profiles of RNA consensus secondary structure. A CM file is produced by the cmbuild program from a given RNA sequence alignment of known consensus structure. All hits scoring greater than zero bits are output as alignments, in the order in which they are found [32].

Some of the options are:

- Top-only option: only the top strand of the sequences in seqfile is searched. By default both strands are searched.

- Local option (--local): turns on the local alignment algorithm, which aligns a subsequence of the query to a subsequence of the target. In aligning the query RNA structure to a target sequence, local alignment means that the alignment may start and end at points inside the query structure.

• RNAalifold: RNAalifold implements an extension of the Zuker-Stiegler algorithm for computing a consensus structure from RNA alignments. It reads aligned RNA sequences from stdin or file.aln and predicts the consensus secondary structure for sets of aligned RNA or DNA sequences, calculating their minimum free energy (mfe) structure, partition function (pf) and base pairing probability matrix. The algorithm works by combining the standard secondary structure energy model with a covariance term. Each possible consensus secondary structure is assigned the mean energy averaged over all sequences in the alignment, plus bonus energies for compensatory and consistent mutations and penalties for pairs that cannot be formed by all sequences. Currently the input alignment has to be in ClustalW format. RNAalifold writes to stdout the mfe structure in bracket notation, its energy, the free energy of the thermodynamic ensemble and the frequency of the mfe structure in the ensemble. It also produces PostScript files with plots of the resulting secondary structure graph ("alirna.ps") and a "dotplot" of the base pairing matrix ("alidot.ps"). The file "alifold.out" will contain a list of likely pairs sorted by credibility, suitable for viewing with "Alidot.pl" [33].
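The covariance term rewards base pairs supported by compensatory mutations (both bases change, e.g. GC to AU) or consistent mutations (one base changes, e.g. GC to GU). The Python sketch below computes a simplified covariation measure for two alignment columns in that spirit. It is an illustration under stated assumptions, not the exact RNAalifold energy term, and the function name is hypothetical; gaps and non-canonical combinations are simply counted as sequences that cannot form the pair.

```python
from itertools import combinations
from typing import List, Tuple

# Watson-Crick and G-U wobble pairs, written as (5' base, 3' base)
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def column_pair_evidence(seqs: List[str], i: int, j: int) -> Tuple[float, int]:
    """Simplified covariation evidence for a putative pair between columns i and j.

    Returns (covariation, n_nonpairing):
    - covariation: mean, over all pairs of sequences, of how many of the two
      positions differ (0, 1 or 2), counted only when both sequences can form a
      canonical pair; 2 reflects a compensatory change, 1 a consistent change.
    - n_nonpairing: number of sequences that cannot form the pair (gaps included).
    """
    pairs = [(s[i] + s[j]).upper().replace("T", "U") for s in seqs]
    n_nonpairing = sum(p not in CANONICAL for p in pairs)
    seq_pairs = list(combinations(pairs, 2))
    if not seq_pairs:
        return 0.0, n_nonpairing
    total = 0
    for p, q in seq_pairs:
        if p in CANONICAL and q in CANONICAL:
            total += (p[0] != q[0]) + (p[1] != q[1])
    return total / len(seq_pairs), n_nonpairing
```

For the toy columns 0 and 4 of "GAAAC", "AAAAU", "GAAAU" and "CAAAA", the pairs are GC, AU, GU and CA: the first three support the pair with one compensatory and two consistent changes (covariation 4/6), while the fourth sequence cannot pair and would be penalized.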



METHODS

All IRES sequences were retrieved from the Rfam database. The sequences were renamed so that the organism is clearly shown, for example:

>AF23232…… => >Homo_sapiens

When more than one sequence was retrieved from an organism, the sequences were distinguished by adding a number to the name:

>Homo_sapiens_1 AF3232…..
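The renaming step can be sketched in Python. The sketch assumes that the organism of each Rfam sequence is already known (for example from the Rfam species annotation) and is passed in together with the accession; the function name and input layout are hypothetical. Organisms with a single sequence get a plain name, while organisms with several sequences get a numeric suffix followed by the original accession, following the naming examples above.

```python
from collections import Counter
from typing import List, Tuple


def rename_records(records: List[Tuple[str, str, str]]) -> str:
    """Rename (accession, organism, sequence) records after their organism.

    A single sequence from an organism becomes '>Genus_species'; several
    sequences from one organism become '>Genus_species_1 ACCESSION', '_2', etc.
    """
    names = [organism.strip().replace(" ", "_") for _, organism, _ in records]
    totals = Counter(names)  # how many sequences each organism contributes
    seen: Counter = Counter()  # running number per organism
    lines = []
    for (accession, _, sequence), name in zip(records, names):
        if totals[name] > 1:
            seen[name] += 1
            lines.append(f">{name}_{seen[name]} {accession}")
        else:
            lines.append(f">{name}")
        lines.append(sequence)
    return "\n".join(lines) + "\n"
```

The accessions used when trying this out ("ACC1", "ACC2", ...) are placeholders, not Rfam entries.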

Next, novel homologous sequences were identified with the UCSC Genome Browser using two different methods. First, the human sequence was used as a BLAT query against different genomes. Second, precomputed multiple genome alignments available in the UCSC browser were used to identify homologues to the human IRES sequence (see Appendix B).

In the first approach, the sequences used as queries in BLAT searches were retrieved from the Rfam database (Appendix A, Table 3). BLAT was used to search the available genomes, such as human, rhesus, chimp, cow, dog, rat and chicken.

The BLAT search returns a list of one or more genome locations that match the input sequence. To view an alignment, the "browser" link for the match is followed; the "details" link can be used to preview the alignment and determine whether the match quality is sufficient. For some examples, see Appendix B.

For each of the IRES families, a set of sequences was thus obtained that included both previously known sequences from Rfam and novel sequences. Gaps were removed and ClustalW was run to produce a multiple alignment. In addition, cmsearch was used to analyze the individual sequences, with the local option:

% cmsearch --local [cm_model] [sequence]

Secondary structures were predicted with RNAalifold.
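The gap-removal step performed before realigning the sequences with ClustalW can be sketched as follows. The sketch assumes FASTA input in which gaps are written as "-" or "." (the gap symbols of Rfam-style alignments); the helper names are hypothetical.

```python
from typing import Dict, Optional

GAP_CHARS = {"-", "."}


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining multi-line sequences."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def degap(records: Dict[str, str]) -> Dict[str, str]:
    """Remove gap characters so the sequences can be realigned from scratch."""
    return {name: "".join(c for c in seq if c not in GAP_CHARS) for name, seq in records.items()}


def to_fasta(records: Dict[str, str]) -> str:
    """Write {header: sequence} back out as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())
```

The ungapped FASTA text returned by to_fasta is what would then be given to ClustalW.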



RESULTS AND DISCUSSION

A number of IRES families are listed in Rfam, and secondary structures have been inferred for each of these families. It appears that in many instances Pfold has been used to predict the secondary structure. In a majority of cases, there is no experimental support for the proposed structure. Furthermore, the number of organisms represented in most families is very small. For instance, Bag1, L-myc, FGF1, HIF1, VEGF A, IGF2 and mnt contain only human and mouse sequences, whereas Bip and FGF2 contain only human sequences.

In this project we have examined the IRES families from a bioinformatics perspective and have attempted to find support for the structures proposed in Rfam. We have done this by identifying homologues in animals not present in Rfam, typically mammals other than human and rodents, and lower vertebrates such as fishes.

As a first step, information about the different "cellular IRES" families was collected from Rfam. An overview of the different IRES families is shown in Table 3, Appendix A. When available, genome databases (UCSC Genome Browser) were browsed to obtain novel homologues. Some of the results of this analysis are described in Table 4.

Identification of homologues to previously known human and mouse IRESs.

In order to identify as many homologs as possible to previously known IRES

families in Rfam we made use of information available at the UCSC browser as

described under 'Materials and Methods'. The resulting sequences are listed in

Table 4.

For a number of IRES families, homologues were identified in a large number of organisms not previously listed in Rfam. Examples include Bag-1, Bip, n-myc, FGF2, L-myc, APC, IGF2 and HSP70, where homologues were identified in all vertebrates, including fishes. In the case of other families, such as FGF1, Kv1.4, HIF1, mnt, VEGF A and c-myc, no homologues could be detected in organisms more deeply branching than man/rodents. It would therefore appear that the primary sequences of these IRESs are poorly conserved in evolution.

As the aim of this study was to analyze the phylogenetic support for the proposed

secondary structures of the different IRESs, the examples where we found

homologues in all vertebrates are of particular interest. Therefore, the discussion

below will be focussed on this group of IRESs.



Analysis of novel sequences with Rfam CMs.

All available IRES sequences of the respective Rfam families, including the novel sequences identified here, were aligned using ClustalW to further verify and examine the evolutionary relationships between the members of each family. For some examples of multiple alignments, see Appendix C.

To examine how well the novel sequences fit the respective Rfam CM, individual sequences were analyzed with the relevant CMs using cmsearch. The results are shown in Table 4.

Rfam families where we identified a fish homologue are of particular interest. In a majority of cases the fish sequence scored very poorly, in most cases so poorly that no hit was reported at all by cmsearch. The results for the different IRESs where we found a fish homologue may be summarized as follows.

For the Rfam families Bip, n-myc, FGF2 and IGF2 we had access to homologous fish sequences that aligned well with previously available mammalian sequences, but there was no evidence from cmsearch that the fish sequences fit the relevant Rfam models.

For the families Bag1 and HSP70, the multiple alignments leave us in doubt as to whether the Tetraodon/Fugu sequences are true homologues. It is therefore difficult to draw conclusions from the cmsearch results in these cases.

In the case of L-myc we obtained a very good score with cmsearch for the Fugu sequence. However, its primary sequence is nearly identical to the other vertebrate sequences. Because a conserved secondary structure is supported mainly by base-pair covariation, which cannot be observed between nearly identical sequences, we are not able to draw conclusions about conserved secondary structure in this case.

We also carried out secondary structure predictions using RNAalifold, with the ClustalW alignments as input. In all cases (IRES Bag-1, Bip, n-myc, c-myc, L-myc, FGF1, FGF2, mnt, Kv1.4, VEGF A, HIF1, IGF2, APC and HSP70) the structure predicted by RNAalifold was entirely inconsistent with the secondary structure proposed in Rfam (see Appendix E).
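The comparison between an RNAalifold prediction and an Rfam structure can be made quantitative by converting both bracket strings into sets of base pairs and counting the shared pairs. The Python sketch below is a hypothetical illustration of such a comparison, not the procedure used in this work. It assumes both structures have already been mapped onto the same coordinates (for example the positions of one reference sequence); characters other than matching brackets, such as ".", ",", ":" and "_" in Rfam's notation or letters used for pseudoknots, are treated as unpaired.

```python
from typing import Set, Tuple

OPENERS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def pairs_from_brackets(structure: str) -> Set[Tuple[int, int]]:
    """Convert a bracket-notation structure into a set of 0-based (i, j) base pairs.

    Each bracket type has its own stack, so crossing pairs such as '([)]' are
    allowed; every other character is treated as unpaired.
    """
    stacks = {closer: [] for closer in OPENERS.values()}
    pairs = set()
    for pos, ch in enumerate(structure):
        if ch in OPENERS:
            stacks[OPENERS[ch]].append(pos)
        elif ch in stacks:
            if not stacks[ch]:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.add((stacks[ch].pop(), pos))
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return pairs


def compare_structures(reference: str, predicted: str) -> Tuple[float, float]:
    """Return (sensitivity, positive predictive value) of predicted vs reference pairs."""
    if len(reference) != len(predicted):
        raise ValueError("structures must use the same coordinates (equal length)")
    ref = pairs_from_brackets(reference)
    pred = pairs_from_brackets(predicted)
    shared = len(ref & pred)
    sensitivity = shared / len(ref) if ref else 0.0
    ppv = shared / len(pred) if pred else 0.0
    return sensitivity, ppv
```

Sensitivity is the fraction of reference pairs that are recovered and positive predictive value the fraction of predicted pairs that are present in the reference; values near zero for both would correspond to an "entirely inconsistent" prediction.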



CONCLUSIONS

For most of the Rfam families the secondary structure is not known. Pfold has been used to predict the IRES structures of FGF2, MNT, BIP, HSP-70 and BAG-1 in the Rfam database, using UTRs from several transcripts or orthologues. Many of these models are not reliable, as they are based on a small number of sequences and the set of sequences is highly biased towards a very small number of organisms.

To improve the prediction of secondary structure, we have here identified a number of novel homologous IRES sequences. A complete list of previously known Rfam families corresponding to IRESs was first compiled. Novel homologues were identified using BLAT or by making use of precomputed genome alignments available through the UCSC browser. Novel homologues not previously reported in Rfam were identified in a number of vertebrates, including Canis familiaris, Bos taurus, opossum, Gallus gallus, Xenopus tropicalis, Fugu rubripes and Tetraodon nigroviridis.

For none of the Rfam families were we able to find support for the postulated secondary structures using any of the novel sequences. Furthermore, the secondary structures predicted by RNAalifold from the ClustalW multiple alignments were entirely inconsistent with the models proposed in Rfam. These results suggest that many of the secondary structures proposed in Rfam are not correct.

An alternative explanation for our results is that the secondary structure of the IRES is not well conserved among vertebrates. It should also be noted that for some of the IRES families there is no experimental support for a biologically significant secondary structure. Hence, in some of these cases we should not expect to find a conserved secondary structure at all.



REFERENCES:

[1] Komar, A.A. and Hatzoglou, M. Internal ribosome entry sites in cellular mRNAs: Mystery of their existence. J. Biol. Chem. 280.

[2] Mitra, K., et al. 2005. Structure of the E. coli protein-conducting channel bound to a translating ribosome. Nature 438.

[3] Hellen, C.U.T. and Sarnow, P. Internal ribosome entry sites in eukaryotic mRNA molecules.

[4] Nomoto, A., Kitamura, N., Golini, F., and Wimmer, E. 1977. The 5'-terminal structures of poliovirion RNA and poliovirus mRNA differ only in the genome-linked protein VPg. Proc. Natl. Acad. Sci. 74: 5345-5349.

[5] Pelletier, J., Flynn, M.E., Kaplan, G., Racaniello, V., and Sonenberg, N. 1988.

Mutational analysis of upstream AUG codons of poliovirus RNA. J. Virol. 62: 4486-4492.

[6] Yang, Q. and Sarnow, P. 1997. Location of the internal ribosome entry site in the 5'

non-coding region of the immunoglobulin heavy-chain binding protein (BiP) mRNA:

Evidence for specific RNA-protein interactions. Nucleic Acids Res. 25: 2800-2807.

[7] Mauro, V.P. and Edelman, G.M. 1997. rRNA-like sequences occur in diverse primary

transcripts: Implications for the control of gene expression. Proc. Natl. Acad. Sci. 94: 422-

427.

[8] Tranque, P., Hu, M.C., Edelman, G.M., and Mauro, V.P. 1998. rRNA complementarity

within mRNAs: A possible basis for mRNA-ribosome interactions and translational

control. Proc. Natl. Acad. Sci. 95: 12238-12243.

[9] Hu, M.C., Tranque, P., Edelman, G.M., and Mauro, V.P. 1999. rRNA-complementarity

in the 5' untranslated region of mRNA specifying the Gtx homeodomain protein:

Evidence that base- pairing to 18S rRNA affects translational efficiency. Proc. Natl. Acad.

Sci. 96: 1339-1344.

[10] Chappell, S.A., Edelman, G.M., and Mauro, V.P. 2000a. A 9-nt segment of a cellular

mRNA can function as an internal ribosome entry site (IRES) and when present in linked

multiple copies greatly enhance IRES activity. Proc. Natl. Acad. Sci. 97: 1536-1541.

[11] Hayashi, S., Nishimura, K., Fukuchi-Shimogori, T., Kashiwagi, K., and Igarashi, K.

2000. Increase in cap- and IRES-dependent protein synthesis by overproduction of

translation initiation factor eIF4G. Biochem. Biophys. Res. Commun. 277: 117-123.

[12] Ghetti, A., Pinol-Roma, S., Michael, W.M., Morandi, C., and Dreyfuss, G. 1992.

HnRNP I, the polypyrimidine tract-binding protein: Distinct nuclear localization and

association with hnRNAs. Nucleic Acids Res. 20: 3671-3678.



[13] Creancier, L., Morello, D., Mercier, P., and Prats, A.C. 2000. Fibroblast growth factor

2 internal ribosome entry site (IRES) activity ex vivo and in transgenic mice reveals a

stringent tissue-specific regulation. J. Cell Biol. 150: 275-281.

[14] Johannes, G., Carter, M.S., Eisen, M.B., Brown, P.O., and Sarnow, P. 1999. Identification of eukaryotic mRNAs that are translated at reduced cap binding complex eIF4F concentrations using a cDNA microarray. Proc. Natl. Acad. Sci. 96: 13118-13123.

[15] Pyronnet, S., Pradayrol, L., and Sonenberg, N. 2000. A cell cycle-dependent internal

ribosome entry site. Mol. Cell. 5: 607-616.

[16] Stoneley, M., Chappell, S.A., Jopling, C.L., Dickens, M., MacFarlane, M., and Willis,

A.E. 2000a. C-myc protein synthesis is initiated from the internal ribosome entry segment

during apoptosis. Mol. Cell. Biol. 20: 1162-1169.

[17] Lopez de Quinto, S., Saiz, M., de la Morena, D., Sobrino, F., and Martinez-Salas, E. 2002. IRES-driven translation is stimulated separately by the FMDV 3'-NCR and poly(A) sequences. Nucleic Acids Res. 30: 4398-4405.

[18] Dobrikova, E., Florez, P., Bradrick, S., and Gromeier, M. 2003. Activity of a type 1 picornavirus internal ribosome entry site is determined by sequences within the 3' nontranslated region. Proc. Natl. Acad. Sci. 100: 15125-15130.

[19] Mazumder, B., Seshadri, V., and Fox, P.L. 2003. Translational control by the 3'-UTR: The ends specify the means. Trends Biochem. Sci. 28: 91-98.

[20] Pilipenko, E.V., Blinov, V.M., Romanova, L.I., Sinyakov, Y.N., Maslova, S.V., and Agol, V.I. 1989b. Conserved structural domains in the 5'-untranslated region of picornaviral genomes: An analysis of the segment controlling translation and neurovirulence. Virology 168: 201-209.

[21] Honda, M., Beard, M.R., Ping, L.H., and Lemon, S.M. 1999. A phylogenetically conserved stem-loop structure at the 5' border of the internal ribosome entry site of hepatitis C virus is required for cap-independent viral translation. J. Virol. 73: 1165-1174.

[22] Wang, C., Le, S.Y., Ali, N., and Siddiqui, A. 1995. An RNA pseudoknot is an essential structural element of the internal ribosome entry site located within the hepatitis C virus 5' noncoding region. RNA 1: 526-537.

[23] Kieft, J.S., Zhou, K., Jubin, R., and Doudna, J.A. 2001. Mechanism of ribosome recruitment by hepatitis C IRES RNA. RNA 7: 194-196.

[24] Pickering, B.M., Mitchell, S.A., Spriggs, K.A., Stoneley, M., and Willis, A.E. 2004. Bag-1 internal ribosome entry segment activity is promoted by structural changes mediated by poly(rC) binding protein 1. Mol. Cell. Biol. 24: 5595-5605.



[25] Le Quesne, J.P., Stoneley, M., Fraser, G.A., and Willis, A.E. 2001. Derivation of a structural model for the c-myc IRES. J. Mol. Biol. 310: 111-126.

[26] Cencig, S., Nanbru, C., Le, S.Y., Gueydan, C., Huez, G., and Kruys, V. 2004. Mapping and characterization of the minimal internal ribosome entry segment in the human c-myc mRNA 5' untranslated region. Oncogene 23: 267-277.

[27] Martineau, Y., Le Bec, C., Monbrun, L., Allo, V., Chiu, I.M., Danos, O., Moine, H., Prats, H., and Prats, A.C. 2004. Internal ribosome entry site structural motifs conserved among mammalian fibroblast growth factor 1 alternatively spliced mRNAs. Mol. Cell. Biol. 24: 7622-7635.

[28] http://www.sanger.ac.uk/Software/Rfam

[29] http://genome.ucsc.edu/cgi-bin/hgBlat?command=start

[30] http://www2.ebi.ac.uk/clustalw

[31] www.tcoffee.org

[32] infernal.tar.gz, version 0.70.

[33] http://www.tbi.univie.ac.at/RNA/ALIDOT/

[34] www.cbse.ucsc.edu/~jsp/EvoFold

[35] Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A., and Eddy, S.R. Rfam: an RNA family database.

[36] Baird, S.D., Turcotte, M., Korneluk, R.G., and Holcik, M. Searching for IRES.

[37]http://www.sanger.ac.uk//Software/Rfam/help/software.shtml#ViennaRNA

[38] http://www.tbi.univie.ac.at/~ivo/RNA/RNAalifold.html

[39] web.indstate.edu/thcme/mwking/protein-synthesis.html

Eukaryotic Initiation Factors and Their Functions. The specific non-ribosomally

associated proteins required for accurate translational initiation.

[40] en.wikipedia.org/wiki/Internal_ribosome_entry_site

Internal ribosome entry site. From Wikipedia, An internal ribosome entry site is a

nucleotide sequence that allows for translation initiation...

[41] http://departments.oxy.edu/biology/Stillman/bi221/091300/091300

Matthews and VanHolde, Biochemistry, 4th ed. (2000).



Table 3: List of cellular IRES families in Rfam

| Family | Description | Accession number |
|---|---|---|
| IRES_Bag1 | Involved in the heat shock response; enhances the anti-apoptotic properties of Bcl-2. | RF00222 |
| IRES_Bip | Involved in the heat shock response. Essential for the survival of cells under stress. | RF00223 |
| IRES_n-myc | Involved in the control of cell growth, differentiation and apoptosis. | RF00226 |
| IRES_c-myc | Involved in cell proliferation, transformation and death; c-myc is a proto-oncogene. | RF00216 |
| IRES_FGF2 | Involved in cell proliferation, differentiation, survival, wound healing and embryogenesis. | RF00224 |
| IRES_L-myc | Involved in the control of cell growth, differentiation and apoptosis. | RF00261 |
| IRES_FGF1 | Involved in cell proliferation, differentiation, survival, wound healing and embryogenesis. | RF00387 |
| IRES_Kv1.4 | Mediates internal ribosome entry in the mRNA of the voltage-gated potassium channel Kv1.4. | RF00447 |
| IRES_HIF1 | Involved in the cellular response to hypoxia. | RF00449 |
| IRES_mnt | Transcriptional repressor related to the Myc/Mad family. | RF00457 |
| IRES_VEGF A | Involved in embryonic development and wound healing. | RF00461 |
| IRES_APC | Tumour suppressor gene associated with adenomatous polyposis coli (APC). | RF00462 |
| IRES_IGF2 | Involved in rapidly dividing cells during development. | RF00483 |
| IRES_Hsp70 | Allows translation during heat shock and stress. | RF00495 |



Table 4: List of organisms present in different IRES families and novel homologues found.

IRES_Bag-1

| Organism | Present in Rfam | Novel homologue | Accession no., start and end positions | cmsearch bit score |
|---|---|---|---|---|
| Mus musculus_1 | + | - | BC069918.1/9-386 | 27.58 |
| Homo sapiens_1 | + | - | AF116273.1/1-363 | 140.20 |
| Homo sapiens_2 | - | + | chr15:43962590-43963260 | 140.20 |
| Rattus norvegicus | - | + | chr5:58341786-58342457 | no hit |
| Canis familiaris | - | + | chr11:53393383-53393876 | no hit |
| Bos taurus | - | + | chr8:40364419-40364929 | no hit |
| Macaca mulatta | - | + | chr11:53393383-53393876 | no hit |
| Mus musculus_2 | - | + | chr4:41136423-41137091 | no hit |
| Tetraodon nigroviridis | - | + | chr13:9838659-9839140 | no hit |

IRES_Bip

| Organism | Present in Rfam | Novel homologue | Accession no., start and end positions | cmsearch bit score |
|---|---|---|---|---|
| Homo sapiens_1 | + | - | X87949.1/8-255 | 75.02 |
| Homo sapiens_2 | - | + | chr9:127043100-127043637 | no hit |
| Pan troglodytes | - | + | chr9:124964482-124965011 | 70.84 |
| Macaca mulatta | - | + | chr15:101398420-101398971 | 31.90 |
| Tetraodon nigroviridis | - | + | chr14611823-14612374 | no hit |


IRES_n-myc

| Organism | Present in Rfam | Novel homologue | Accession no., start and end positions | cmsearch bit score |
|---|---|---|---|---|
| Homo sapiens | + | - | X06993.1/138-376 | 90.28 |
| Spermophilus beecheyi | + | - | X93018.1/698-1022 | 96.44 |
| Mus musculus_1 | + | - | X06993.1/138-376 | 53.92 |
| Mus musculus_2 | - | + | chr12:12967479-12968004 | no hit |
| Pan troglodytes | - | + | chr2a:16232376-16232846 | no hit |
| Macaca mulatta | - | + | chr13:15939472-15939941 | 90.28 |
| Bos taurus | - | + | 77787:30-499 | 65.81 |
| Tetraodon nigroviridis | - | + | chrUn_random:11792490-11793057 | no hit |

IRES_FGF2

| Organism | Present in Rfam | Novel homologue | Accession no., start and end positions | cmsearch bit score |
|---|---|---|---|---|
| Homo sapiens_1 | + | - | S81809.1/991-1486 | 68.31 |
| Homo sapiens_2 | - | + | chr4:123967191-123967978 | 70.53 |
| Macaca mulatta | - | + | chr4:126227469-126228061 | 66.53 |
| Fugu rubripes | - | + | chrUn:202570304-202571033 | no hit |

IRES_L-myc

| Organism | Present in Rfam | Novel homologue | Accession no., start and end positions | cmsearch bit score |
|---|---|---|---|---|
| Murine | + | - | X13945.1/710-931 | 11.19 |
| Homo sapiens_1 | + | - | M19720.1/224-444 | 14.65 |
| Homo sapiens_2 | - | + | chr1:40139960-40140470 | 12.96 |
| Pan troglodytes | - | + | chr1:40540474-40540983 | 9.21 |
| Canis familiaris | - | + | chr15:6011270-6011801 | 11.19 |
| Bos taurus | - | + | scaffold712:268597-269106 | 11.19 |
| Rattus norvegicus | - | + | chr5:142310558-142311069 | 7.94 |
| Fugu rubripes | - | + | chrUn:202570304-202571033 | 12.96 |


IRES_FGF1

| Organism | Present in Rfam | Novel homologue | Accession no., start and end positions | cmsearch bit score |
|---|---|---|---|---|
| Mus musculus_1 | + | - | AF067191.1/1071-1232 | 111.24 |
| Homo sapiens_1 | + | - | M60515.1/145-312 | 114.63 |
| Pan troglodytes | - | + | chr5:144505672-144506021 | 111.8 |
| Macaca mulatta | - | + | chr6:139129398-139129855 | 111.8 |
| Canis familiaris | - | + | chr17:39193107-39193554 | 109.46 |
| Bos taurus | - | + | chr22:30011022-30011576 | 109.46 |
| Rattus norvegicus | - | + | chr18:31868289-31868744 | 84.38 |
| Mus musculus_2 | - | + | chr18:39054994-39055446 | 109.46 |

IRES_Kv1.4

| Organism | Present in Rfam | Novel homologue | Accession no., start and end positions | cmsearch bit score |
|---|---|---|---|---|
| Mus musculus_1 | + | - | BX293548.5/121402-121143 | 69.94 |
| Rattus norvegicus_1 | + | - | AX16002.1/236-494 | 80.34 |
| Homo sapiens_1 | + | - | AC124657.5/132728-132987 | 76.31 |
| Bos taurus_1 | + | - | AF286022.1/918-1158 | 75.97 |
| Homo sapiens_2 | - | + | chr11:29990705-29991254 | 76.31 |
| Macaca mulatta | - | + | chr14:41761952-41762501 | 76.00 |
| Pan troglodytes | - | + | chr11:30186396-30186945 | 76.31 |
| Mus musculus_2 | - | + | chr2:107095448-107095960 | 75.71 |
| Bos taurus_2 | - | + | chr15:37814647-37815177 | 75.9 |
| Rattus norvegicus_2 | - | + | chr3:92781585-92782070 | 80.34 |


IRES_HIF1

| Organism | Present in Rfam | Novel homologue | Accession no., start and end positions | cmsearch bit score |
|---|---|---|---|---|
| Homo sapiens_1 | + | - | BC026139.1/3-278 | 72.18 |
| Mus musculus_1 | + | - | U59496.1/1-270 | 61.15 |
| Mus musculus_2 | - | + | chr12:74826769-74827326 | 71.49 |
| Rattus norvegicus | - | + | chr6:96418687-96419245 | 38.68 |

IRES_mnt

| Organism | Present in Rfam | Novel homologue | Accession no., start and end positions | cmsearch bit score |
|---|---|---|---|---|
| Mus musculus_1 | + | - | Y07609.1/151-354 | 124.97 |
| Homo sapiens_1 | + | - | AC006435.7/74032-73830 | 129.39 |
| Pan troglodytes | - | + | chr1:52080378-52080702 | 127.07 |
| Macaca mulatta | - | + | chr16:2194298-2194790 | 127.07 |
| Canis familiaris | - | + | chr9:49809084-49809577 | 103.48 |
| Bos taurus | - | + | chr19:17937161-17937649 | 106.17 |
| Mus musculus_2 | - | + | chr11:74647230-74647722 | 129.39 |
| Rattus norvegicus | - | + | chr10:62146693-62147184 | 124.75 |
| Neomysis mercedis | - | + | chr8:34772600-34772948 | 118.27 |
| Homo sapiens_2 | - | + | chr17:2250660-2251152 | 129.39 |


IRES_APC

| Organism | Present in Rfam | Novel homologue | Accession no., start and end positions | cmsearch bit score |
|---|---|---|---|---|
| Xenopus laevis | + | - | U64442.1/511-563 | 41.43 |
| Gallus gallus | + | - | BX932893.1 | 40.23 |
| Mus musculus_1 | + | - | M88127.1/505-557 | 38.33 |
| Homo sapiens_1 | + | - | M73548.1/549-601 | 42.64 |
| Homo sapiens_2 | - | + | chr5:112139219-112144613 | 5.66 |
| Pan troglodytes | - | + | chr5:114040437-114040755 | 5.66 |
| Macaca mulatta | - | + | chr6:109096314-109096632 | 7.56 |
| Mus musculus_2 | - | + | chr18:34397234-34401592 | 6.24 |
| Fugu rubripes | - | + | chrUn:92950995-92951444 | no hit |

IRES_VEGF A

| Organism | Present in Rfam | Novel homologue | Accession no., start and end positions | cmsearch bit score |
|---|---|---|---|---|
| Mus musculus_1 | + | - | AC127690.4/2577-2882 | 10.68 |
| Spalax judaei | + | - | AJ544164.1/1765-2070 | 10.59 |
| Homo sapiens_1 | + | - | BC011177.1/220-528 | 0.70 |
| Bovine heparin | + | - | M32976.1/252-560 | 0.70 |
| Homo sapiens_2 | - | + | chr6:43846574-43847172 | 10.68 |
| Pan troglodytes | - | + | chr6:44807515-44807979 | 10.68 |
| Macaca mulatta | - | + | chr4:43683275-43683776 | 10.68 |
| Canis familiaris | - | + | chr12:15212592-15213161 | 10.68 |
| Bos taurus | - | + | chr23:18085631-18086108 | 10.68 |
| Mus musculus_2 | - | + | chr17:45494829-45495424 | 10.68 |
| Rattus norvegicus | - | + | chr9:10521359-10521954 | 12.66 |


IRES_IGF2

| Organism | Present in Rfam | Novel homologue | Accession no., start and end positions | cmsearch bit score |
|---|---|---|---|---|
| Mus musculus_1 | + | - | X71918.1/588-708 | 116.50 |
| Rattus norvegicus | + | - | X14833.1/165-285 | 128.64 |
| Homo sapiens_1 | + | - | AC132217.15/106569-106689 | 130.06 |
| Homo sapiens_2 | - | + | chr11:2118651-2119060 | 130.06 |
| Pan troglodytes | - | + | chr11:2189525-2189934 | 121.28 |
| Macaca mulatta | - | + | chr14:2148499-2148911 | 130.06 |
| Canis familiaris | - | + | chr18:49328161-49328571 | 129.52 |
| Bos taurus | - | + | scaffold52535:660-1066 | 124.90 |
| Rattus norvegicus | - | + | chr1:202916654-202917063 | 128.65 |
| Fugu rubripes | - | + | chrUn:92950995-92951444 | no hit |

IRES_HSP70

| Organism | Present in Rfam | Novel homologue | Accession no., start and end positions | cmsearch bit score |
|---|---|---|---|---|
| Chlorocebus aethiops | + | - | X70684.1/2-184 | 136.72 |
| Macaca mulatta_1 | + | - | AC148662.1/67681-67897 | 159.90 |
| Homo sapiens_1 | - | + | AB018045.1/3434-3652 | 169.01 |
| Homo sapiens_2 | - | + | chr6_qbl_hap2:3030873-3043525 | 174.74 |
| Pan troglodytes | - | + | chr6:32396857-32397309 | 173.64 |
| Macaca mulatta_2 | - | + | chr4:31464733-31465200 | 164.76 |
| Tetraodon nigroviridis | - | + | chr13:9838659-9839140 | no hit |
| Fugu rubripes | - | + | chrUn:131662204-131662494 | no hit |


IRES_c-myc

| Organism | Present in Rfam | Novel homologue | Accession no., start and end positions | cmsearch bit score |
|---|---|---|---|---|
| Mus musculus_1 | + | - | J00603.1/348-39 | 57.99 |
| Rattus norvegicus | + | - | M18819.1/223-533 | 92.54 |
| Woodchuck | + | - | M35498.1/1005-1299 | 125.66 |
| Macaca mulatta | + | - | AY232835.1/377-679 | 264.92 |
| Homo sapiens_1 | + | - | M13930.1/810-1111 | 279.02 |
| Hylobates pileatus | + | - | M88115.1/1007-1310 | 274.76 |
| Callithrix jacchus | + | - | M88116.1/1311-1614 | 261.62 |
| Felis catus | + | - | M22726.1/258-560 | 261.63 |
| Sus scrofa | + | - | X97040/1238-1567 | 129.61 |
| Mus musculus_2 | - | + | chr15:61815100-61815735 | 84.90 |
| Homo sapiens_2 | - | + | chr8:128817578-128818167 | 279.02 |
| Macaca mulatta_2 | - | + | chr8:130349208-130349728 | 264.92 |
| Bos taurus | - | + | chr14:6561750-6562273 | 94.11 |
| Rattus norvegicus | - | + | chr7:98953189-98953825 | 98.90 |
| Neomysis mercedis | - | + | chr3:403996901-403997486 | 86.34 |
| Canis familiaris | - | + | chr13:28237920-28238497 | 96.87 |


APPENDIX A

IRES Description:

[1] IRES_Bag1

This family represents the Bag-1 internal ribosome entry site (IRES). When expressed, the Bag-1 protein is known to enhance the anti-apoptotic properties of Bcl-2. Although Bag-1 translation can occur via a cap-dependent mechanism, the Bag-1 mRNA has been found to contain an IRES in its 5' UTR. Translation via the IRES has been found to be common following heat shock, when cap-dependent scanning is compromised.

[2] IRES_ Bip

This family represents the Bip internal ribosome entry site (IRES). Bip protein expression has been found to be significantly enhanced during the heat shock response owing to IRES-dependent translation. It is thought that this translational mechanism is essential for the survival of cells under stress.

[3] IRES_ n-myc

This family represents the n-myc internal ribosome entry site (IRES). Genes of the myc family are known to be involved in the control of cell growth, differentiation and apoptosis. n-myc mRNA has an alternative method of translation via internal ribosome entry, in which ribosomes are recruited to the IRES located in the 5' UTR, thus bypassing the typical eukaryotic cap-dependent translation pathway.

[4] IRES _c-myc

This family represents the c-myc internal ribosome entry site (IRES). The mammalian c-myc gene is a proto-oncogene which is required for cell proliferation, transformation and death. c-myc mRNA has an alternative method of translation via internal ribosome entry, in which ribosomes are recruited to the IRES located in the 5' UTR, thus bypassing the typical eukaryotic cap-dependent translation pathway.

[5] IRES _FGF2

This family represents the fibroblast growth factor 2 (FGF-2) internal ribosome entry site (IRES). When expressed, the FGF-2 protein plays a pivotal role in cell proliferation, differentiation and survival, as well as being involved in wound healing [1, 2]. FGF-2 IRES activity has been found to be strictly controlled and highly tissue specific. It is thought that IRES-dependent translational activation of FGF-2 plays a pivotal role in embryogenesis and in the adult brain.

[6] IRES_L-myc

This family represents the L-myc internal ribosome entry site (IRES). Genes of the myc family are known to be involved in the control of cell growth, differentiation and apoptosis. L-myc undergoes translation via internal ribosome entry and bypasses the typical eukaryotic cap-dependent translation pathway.

[7] IRES_FGF1

This family represents the FGF1 internal ribosome entry site (IRES). The FGF-1 IRES is

present in the 5’ UTR of the mRNA and allows cap-independent translation. It is thought

that FGF-1 IRES activity is strictly controlled and highly tissue specific.

[8] IRES _KV1.4

This family represents the Kv1.4 voltage-gated potassium channel internal ribosome entry site (IRES). This region has been shown to mediate internal ribosome entry in cells derived from brain, heart and skeletal muscle, tissues known to express Kv1.4 mRNA species.

[9] IRES_HIF1

This family represents the hypoxia-inducible factor 1-alpha internal ribosome entry site. HIF-1a is a subunit of the HIF-1 transcription factor, which induces transcription of several genes involved in the cellular response to hypoxia. The HIF-1a IRES allows translation to be maintained under hypoxic conditions that inhibit cap-dependent translation.

[10] IRES_mnt

This family represents the mnt internal ribosome entry site (IRES). Mnt is a transcriptional repressor related to the Myc/Mad family of transcription factors. It is thought that this IRES allows efficient Mnt synthesis when cap-dependent translation is reduced.

[11] IRES_VEGF_A

This family represents the vascular endothelial growth factor (VEGF) internal ribosome entry site (IRES). VEGF is an endothelial cell mitogen with many crucial functions, for example in embryonic development and wound healing. The 5’ UTR of VEGF mRNA contains two IRES elements that are able to promote efficient translation at the AUG start codon.

[12] IRES_APC

This family represents an APC internal ribosome entry site (IRES) that is located in the coding sequence of the gene. APC is a tumour suppressor gene associated with the inherited disease adenomatous polyposis coli (APC). IRES-mediated translation of APC is thought to be important for an apoptotic cascade.



[13] IRES_Hsp70

This family represents the heat shock protein 70 (Hsp70) internal ribosome entry site (IRES), which allows cap-independent translation under conditions such as heat shock and other forms of stress.

[14] IRES_IGF2

This family represents the insulin-like growth factor II (IGF-II) internal ribosome entry site (IRES), which is found in the 5’ UTR of the IGF-II leader 2 mRNA. The IRES allows cap-independent translation of this mRNA and is thought to facilitate continuous IGF-II production in rapidly dividing cells during development.



APPENDIX B

UCSC Genome Browser



APPENDIX C

MULTIPLE SEQUENCE ALIGNMENT

EXAMPLE OF IRES_VEGF_A FAMILY



APPENDIX D

cmsearch output example

sequence: Homo_sapiens_1
hit 0 : 15 214 75.02 bits
Alignment strategy:CYKDivideAndConquer:nb:small
***returning from CYKDivideAndConquer() sc: 75.022003

    :::((((((,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,)))))):::
181 cugGACUGCCUGCUGCUGCCCAACUGGCUGGCAAGAUGAAGCUCUCCCUGGuggccgCGA 240
    CUGGACUGCCUGCUGCUGCCCAACUGGCUGGCAA
181 CUGGACUGCCUGCUGCUGCCCAACUGGCUGGCAA-------------------------- 214

    ::::::::
241 UGCUGCUG 248
  - -------- -
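For a larger search it is convenient to extract the hit lines from such a report automatically. The Python sketch below assumes the plain-text layout of the example above, in which each target is introduced by a "sequence:" line and each hit by a line of the form "hit N : start end score bits". Other INFERNAL versions format their output differently, and the names HIT_RE, Hit and parse_hits are illustrative helpers, not part of INFERNAL.

```python
import re
from typing import NamedTuple

# Matches hit lines such as "hit 0 : 15 214 75.02 bits". The layout is assumed
# from the example report; column widths may vary, hence \s+ and \s*.
HIT_RE = re.compile(r"^hit\s+(\d+)\s*:\s*(\d+)\s+(\d+)\s+(-?\d+(?:\.\d+)?)\s+bits")


class Hit(NamedTuple):
    index: int   # hit number within the target sequence
    start: int   # target coordinate printed first on the hit line
    end: int     # target coordinate printed second on the hit line
    bits: float  # bit score reported by cmsearch


def parse_hits(lines, min_bits=float("-inf")):
    """Return (sequence name, Hit) pairs whose bit score is at least min_bits."""
    hits = []
    current_seq = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("sequence:"):
            # Every following hit line belongs to this target sequence.
            current_seq = line.split(":", 1)[1].strip()
            continue
        match = HIT_RE.match(line)
        if match:
            idx, start, end, bits = match.groups()
            hit = Hit(int(idx), int(start), int(end), float(bits))
            if hit.bits >= min_bits:
                hits.append((current_seq, hit))
    return hits


if __name__ == "__main__":
    report = """sequence: Homo_sapiens_1
hit 0 : 15 214 75.02 bits
Alignment strategy:CYKDivideAndConquer:nb:small"""
    print(parse_hits(report.splitlines()))
```

Run on the three-line excerpt in the main block, it prints [('Homo_sapiens_1', Hit(index=0, start=15, end=214, bits=75.02))]. The min_bits argument keeps only hits above a chosen bit-score cut-off; the threshold is left to the user and is not a value taken from this study.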

Annotation: Base pairs are annotated by nested matching pairs of the symbols <>, (), [] or {}. The different symbols indicate the "depth" of the helix in the RNA structure as follows: <> are used for simple terminal stems; () are used for "internal" helices enclosing a multifurcation of all terminal stems; [] are used for internal helices enclosing a multifurcation that already includes at least one annotated () stem; and {} are used for all internal helices enclosing deeper multifurcations.

Hairpin loops: Hairpin loop residues are indicated by underscores (_). Simple stem loops therefore stand out as, e.g., <<<<____>>>>.

Bulge and interior loops: Bulge and interior loop residues are indicated by dashes (-).

Multifurcation loops: Multifurcation loop residues are indicated by commas (,). The mnemonic is "stem 1, stem 2", e.g. <<<___>>>,,<<<___>>>.

External residues: Unstructured single-stranded residues completely outside the structure (not enclosed by any base pair) are annotated by colons (:).

Insertions: Insertions relative to a known structure are indicated by periods (.). Regions where local structural alignment was invoked, leaving regions of both target and query sequence unaligned, are indicated by tildes (~). These symbols only appear in alignments of a known (query) structure annotation to a target sequence of unknown structure.



Pseudoknots: WUSS notation allows pseudoknots to be annotated as pairs of upper-case/lower-case letters: for example, <<<<_AAAA____>>>>aaaa annotates a simple pseudoknot; additional pseudoknotted stems could be annotated by Bb, Cc, etc. INFERNAL cannot handle pseudoknots, however: pseudoknot notation never appears in INFERNAL output, and although it is accepted in input files, it is ignored.
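The WUSS rules above are simple enough to check mechanically. The following Python sketch recovers the base pairs from a structure line with a single stack, so brackets of different types must nest properly, and tallies the single-stranded residues by category. The function parse_wuss and its category labels are illustrative, not part of INFERNAL; pseudoknot letters are counted but left unpaired, which is one possible reading of "accepted but ignored".

```python
from collections import Counter

# Opening -> closing symbol for the four nested base-pair types in WUSS.
OPEN_TO_CLOSE = {"<": ">", "(": ")", "[": "]", "{": "}"}
CLOSE_TO_OPEN = {close: open_ for open_, close in OPEN_TO_CLOSE.items()}

# Single-stranded WUSS symbols and the residue categories they denote.
LOOP_SYMBOLS = {
    "_": "hairpin loop",
    "-": "bulge/interior loop",
    ",": "multifurcation loop",
    ":": "external",
    ".": "insertion",
    "~": "unaligned (local alignment)",
}


def parse_wuss(structure: str) -> tuple[list[tuple[int, int]], Counter]:
    """Return (sorted 0-based base pairs, residue counts per category).

    A single stack is used, so brackets of different types must nest
    properly. Pseudoknot letters (Aa, Bb, ...) are counted but left
    unpaired (an assumed interpretation of INFERNAL ignoring them).
    """
    stack: list[tuple[str, int]] = []
    pairs: list[tuple[int, int]] = []
    counts: Counter = Counter()
    for pos, ch in enumerate(structure):
        if ch in OPEN_TO_CLOSE:
            stack.append((ch, pos))
            counts["base-paired"] += 1
        elif ch in CLOSE_TO_OPEN:
            # The innermost open bracket must be of the matching type.
            if not stack or stack[-1][0] != CLOSE_TO_OPEN[ch]:
                raise ValueError(f"unbalanced {ch!r} at position {pos}")
            pairs.append((stack.pop()[1], pos))
            counts["base-paired"] += 1
        elif ch in LOOP_SYMBOLS:
            counts[LOOP_SYMBOLS[ch]] += 1
        elif ch.isalpha():
            counts["pseudoknot letter"] += 1
        else:
            raise ValueError(f"unknown WUSS symbol {ch!r} at position {pos}")
    if stack:
        symbol, pos = stack[-1]
        raise ValueError(f"unclosed {symbol!r} at position {pos}")
    return sorted(pairs), counts
```

For the made-up structure ":::<<<___>>>,,<<<___>>>:::", parse_wuss returns the pairs [(3, 11), (4, 10), (5, 9), (14, 22), (15, 21), (16, 20)] together with counts of 12 paired, 6 hairpin loop, 2 multifurcation loop and 6 external residues.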



APPENDIX E

COMPARISON OF Rfam MODEL VS RNAalifold

RNAalifold prediction

(secondary structure)

Description: IRES_FGF2 family

The structure predicted by RNAalifold was entirely inconsistent with the

secondary structure proposed in Rfam.


Rfam model prediction

(secondary structure)
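The disagreement between the two structures can also be expressed as a number. One possible measure (not necessarily the one used in this comparison) is the fraction of Rfam base pairs that RNAalifold reproduces (sensitivity) and the fraction of RNAalifold base pairs that also occur in the Rfam structure (positive predictive value). The Python sketch below assumes both structures are given as bracket strings over the same alignment columns: the Rfam consensus line in WUSS notation, and only the bracket part of the RNAalifold output, without the energy values printed after it. The function names are illustrative.

```python
# Closing symbol -> matching opening symbol for every bracket type used in
# WUSS; plain dot-bracket (as printed by RNAalifold) uses only "(" and ")".
OPENER_OF = {">": "<", ")": "(", "]": "[", "}": "{"}


def structure_pairs(structure: str) -> set[tuple[int, int]]:
    """Return 0-based base pairs (i, j) from a WUSS or dot-bracket string.

    Every symbol that is not a bracket (dots, loop symbols, pseudoknot
    letters) is treated as unpaired.
    """
    stack: list[tuple[str, int]] = []
    pairs: set[tuple[int, int]] = set()
    for pos, ch in enumerate(structure):
        if ch in "<([{":
            stack.append((ch, pos))
        elif ch in OPENER_OF:
            if not stack or stack[-1][0] != OPENER_OF[ch]:
                raise ValueError(f"unbalanced {ch!r} at position {pos}")
            pairs.add((stack.pop()[1], pos))
    if stack:
        raise ValueError(f"unclosed {stack[-1][0]!r} at position {stack[-1][1]}")
    return pairs


def compare_structures(reference: str, predicted: str) -> dict:
    """Base-pair agreement between a reference and a predicted structure."""
    if len(reference) != len(predicted):
        raise ValueError("structures must cover the same alignment columns")
    ref = structure_pairs(reference)
    pred = structure_pairs(predicted)
    shared = len(ref & pred)
    return {
        "shared_pairs": shared,
        # Fraction of reference pairs recovered by the prediction.
        "sensitivity": shared / len(ref) if ref else 0.0,
        # Fraction of predicted pairs that are also in the reference.
        "ppv": shared / len(pred) if pred else 0.0,
    }
```

For example, compare_structures("((....))....", "....((....))") returns {'shared_pairs': 0, 'sensitivity': 0.0, 'ppv': 0.0}. A prediction that is entirely inconsistent with the Rfam model, as described here for IRES_FGF2, would be expected to score close to zero on both measures.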
