
MiR-155 Target Prediction and Validation in Nasopharyngeal Carcinoma

ILQAR ABDULLAYEV

Master of Science Thesis

Stockholm, Sweden 2010



Master’s Thesis in Biomedical Engineering (30 ECTS credits)

at the Computational and Systems Biology Master Programme

Royal Institute of Technology year 2010

Supervisor at CSC was Erik Aurell

Examiner was Anders Lansner

TRITA-CSC-E 2010:164

ISRN-KTH/CSC/E--10/164--SE

ISSN-1653-5715

Royal Institute of Technology

School of Computer Science and Communication

KTH CSC

SE-100 44 Stockholm, Sweden

URL: www.kth.se/csc


MiR-155 target prediction and validation in nasopharyngeal carcinoma

Abstract

MicroRNAs (miRNAs) play an important role in controlling gene expression in eukaryotes. They target many mRNAs and either degrade them or inhibit their translation into protein. Finding the targets of miRNAs has therefore been an active research topic since miRNAs were first discovered. Many tools have been designed for target prediction. Different tools use different approaches and hence predict different targets, so finding the best-performing tool or combination of tools is important. MicroRNA-155 (miR-155) is a well-studied miRNA that is associated with numerous diseases, in most of which it is upregulated. These include nasopharyngeal carcinoma (NPC), one of the most common malignancies in certain areas of southern China and Africa. This project aims to find the best-scoring miRNA target prediction tool, apply it to miR-155, compare the predictions with the results of a microarray experiment, and in this way shed some light on NPC.

Target prediction and validation of miR-155 in nasopharyngeal carcinoma

Summary

MicroRNAs (miRNAs) play a major role in regulating gene expression in eukaryotes. A significant proportion of cellular mRNAs is affected by such miRNAs, either through degradation or through inhibition of their translation into proteins. The search for targets of different miRNAs has therefore been a hot topic ever since miRNAs were first discovered. Many different tools have been designed and developed for this purpose, so finding the best tool is important. MicroRNA-155 (miR-155) is a well-studied miRNA that is associated with a number of different diseases, for example nasopharyngeal carcinoma (NPC), one of the most common malignant cancers in certain parts of southern China and Africa. The aim of this project is to find the best tool for miRNA target prediction, apply it to miR-155, correlate the predictions with results already obtained from microarray experiments, and in this way increase the understanding of NPC.


Acknowledgement

First of all, I would like to thank my supervisor Erik Aurell for taking me into his group and introducing me to his collaborators, with whom I ended up doing my thesis. He also helped me learn how to do good research and improve my writing skills. I would also like to thank Aymeric for his valuable contributions, especially to the computational prediction part of my thesis.

Secondly, I am grateful to my supervisor at the Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Professor Ingemar Ernberg, for providing me with this thesis and sustaining a suitable scientific environment as well as an experimental platform. I am deeply thankful to his doctoral student, Ziming Du, for helping me with the wet-lab experiments. I learned a lot from Ziming.

I would like to thank my friends Rustam, Rasim, Emre, Alejandro, James and Shaghayegh, who supported and motivated me through the entire process. I would also like to thank Ann Bengston for her coordination during the administrative processes. My special thanks go to my family, who have believed in and supported me throughout my entire life, whatever the conditions. I am, and always will be, deeply grateful to them. Finally, all my heart goes to my lovely wife, Aysegul.


Table of Contents

1. Introduction

1.1 General Information about miRNAs

1.1.1 Biogenesis

1.1.2 Plant miRNA target prediction works perfectly

1.1.3 Animal miRNAs

1.2. Target Prediction of miRNAs

1.2.1 Features/Parameters for miRNA target prediction

1.2.2 Target prediction software packages

1.3. Gene set analysis

1.3.1 Gene Ontology Enrichment Analysis Software Toolkit (GOEAST)

2 Methodology

2.1 Target prediction - Gathering and handling data

2.2 Database of experimentally validated genes

2.3 Comparison

2.4 Microarray experiment set-up

2.4.1 Cell lines and tissue samples

2.4.2 MiRNA transfections

2.5 Polymerase Chain Reaction (PCR) assays

2.5.1 Real-time polymerase chain reaction (qPCR)

2.5.2 PCR

2.6 Microarray Analysis

2.6.1 Defining parameters used

2.7 How to use microarray data and target predictions

3 Results

Result I: The comparison between predicted targets and experimentally validated

Part 1 of Result I: The precision test by using manually constructed database

Part 2 of Result I: The precision test by using miRWalk

Result II: Amalgamation of predicted targets of top 4 software packages

Result III: Quality control of microarray data

Result IV: Elucidating microarray data

Result V: GOEAST analysis

Result VI: Validation of microarray results by qPCR

4 Discussion

5 References

Appendices

Appendix 1

Appendix 2

Appendix 3


Abbreviations:

miRNA: MicroRNA

miR-155: MicroRNA-155

3′ UTR: 3′ untranslated region

RISC: RNA-induced silencing complex

mRNA: Messenger RNA

Ago: Argonaute protein

CDS: Coding sequence

qPCR: Quantitative (real-time) polymerase chain reaction

DAVID: Database for Annotation, Visualization and Integrated Discovery

RNA Pol II: RNA Polymerase II

Pri-miRNA: Primary microRNA

Pre-miRNA: Precursor microRNA

FOXO3A: Forkhead box O3A

GO: Gene Ontology

KEGG: Kyoto Encyclopedia of Genes and Genomes

GOEAST: Gene Ontology Enrichment Analysis Software Toolkit

MAMI: Meta MiR:Target Inference

ENG: Ensembl gene ID

WC: Watson–Crick

kb: kilobase


1. Introduction

1.1 General Information about miRNAs

MicroRNAs (miRNAs) are short (19–24 nucleotides in length), endogenously expressed RNA molecules that regulate gene expression by binding directly, and preferentially, to the 3′ untranslated regions (UTRs) of protein-coding genes [1]. It has been estimated that miRNAs regulate up to 60% of all mammalian genes [22]. MiRNAs are well conserved among species, which marks them as an evolutionarily important component of gene regulation [22].

The first miRNA was discovered in 1993 during the study of the gene lin-4 in the nematode Caenorhabditis elegans [2]. It was found that lin-4 does not encode a protein. Instead, it produces a small RNA that regulates the translation of the LIN-14 protein. This endogenous RNA, also called lin-4, acted as a post-transcriptional regulator, and at the time this was thought to be a unique property of nematodes [2].

Plant miRNAs are usually complementary to the coding regions of mRNAs, which promotes cleavage of the mRNA. In contrast, animal microRNAs pair only partially with the target mRNA and inhibit its translation into protein. This mechanism also exists in plants, but it is less common there. MicroRNAs that are partially complementary to the target can also speed up deadenylation (shortening of the poly(A) tail of the mRNA), so that the mRNA is degraded comparatively quickly.

A single miRNA is thought to have hundreds of targets. At the time of writing, the miRBase database lists 14,197 known miRNAs in 133 species [26].

1.1.1 Biogenesis

Mature miRNAs are processed from longer transcripts called primary miRNAs (pri-miRNAs), which are usually transcribed by RNA Polymerase II (RNA Pol II). Pri-miRNAs are further processed in the nucleus into ~70-nucleotide stem-loop structures called precursor miRNAs (pre-miRNAs) (see Figure 1).

In the cytoplasm, the endonuclease Dicer then cleaves the pre-miRNAs into a duplex of complementary short RNA molecules. One strand of this duplex is incorporated into the RNA-induced silencing complex (RISC) and guides the whole complex to a target messenger RNA (mRNA). In other words, the miRNA provides the specificity that selects individual gene targets, through (partially) complementary base pairing between the miRNA and the mRNA transcript of its target gene (see Figure 1).


Figure 1: Regulation of gene expression by miRNAs. Adapted from [25]. Pri-miRNAs are first processed by the Drosha/Pasha complex into 60–70 nt pre-miRNAs in the nucleus. Exportin 5 transports these pre-miRNAs into the cytoplasm. Dicer then cleaves the pre-miRNAs into duplexes. Only one strand of each duplex is incorporated into the RISC. By binding to the target mRNA, the final complex then acts both as an mRNA-cleaving complex and as a translational repressor.

Target selection then brings the mRNA transcripts within the acting range of the RISC effector proteins. The principal components of these are a miRNA-specific Argonaute protein (Ago) and GW182 (a scaffold protein) [27]. Purification of the RISC has shown that it contains at least one member of the Ago protein family. Furthermore, mutagenesis studies suggest that Ago2 is specifically responsible for the cleavage activity of RISC [25].

Figure 2: A speculative model showing the roles of each miRNA region and the way the miRNA binds to the Ago protein. (A) The microRNA (red) is bound to Argonaute (AGO). The first nucleotide is twisted away from the helix and is permanently unavailable for pairing. Nucleotides 2–8 are bound (to Ago) in a way that preorganizes them for efficient pairing. Nucleotides 9–11 face away from an incoming mRNA and are unavailable for binding. The remainder of the miRNA is bound in a configuration that is not preorganized for efficient pairing. (B) An 8mer site recognized by the complex. (C) Conformational accommodation of extensively paired sites, which allows the miRNA and mRNA to wrap around each other. (D) Pairing suitable for mRNA cleavage: Ago locks the paired duplex down so that the active site (black arrow) cleaves the mRNA. (E) 3′-supplementary pairing, in which the message pairs to nucleotides 13–16; in this model the miRNA and mRNA are not wrapped around each other. Adapted from [1].

MiRNAs are involved in diverse biological functions, such as development, proliferation, differentiation and apoptosis [9, 10]. Accumulating evidence suggests that deregulated microRNAs contribute to the pathogenesis of tumors. Approximately 50% of all miRNA genes are located in cancer-associated regions of the genome [11]. Several miRNAs function as tumor suppressors or as oncogenes [11].

Individual miRNAs are much better studied than the cooperative action of multiple miRNAs. miRNAs may act synergistically, but such cooperativity is largely unexplored [12], and this makes target prediction very complicated. Microarray studies do not reveal full information about miRNA targets because they capture only mRNA degradation, not the effect of translational inhibition. Proteomics studies, on the other hand, uncover more information because they yield data at the protein level. However, very few large proteomics studies exist because of their cost, so proteomics is also expected to produce less data overall [8, 13].

A further avenue for miRNA target prediction could be to examine pathways. Particular miRNAs could act on particular pathways, so this systems approach might give a better understanding of the target prediction problem and help solve it.


1.1.2 Plant miRNA target prediction works perfectly

Plant miRNAs are involved in various aspects of plant growth and development. These include root formation, leaf morphology and polarity, molecular signaling, diverse phase transitions, flowering time and floral organ identity. Plant miRNAs also help plants cope with stress by regulating target genes post-transcriptionally. Plant miRNA genes are transcribed by RNA polymerase II [34].

Plant miRNA target prediction is highly successful at finding direct targets. Simply checking for high complementarity between a miRNA and the coding sequences (CDS) of potential target mRNAs reveals the most probable targets [3].

Since plant miRNA target prediction works so well in silico, there is little need for novel prediction software or for combinations of different software packages.
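The complementarity check described above can be sketched as a sliding-window scan. This is a minimal illustration, not the algorithm of [3] or of any particular plant tool. It assumes sequences in the RNA alphabet (DNA `T` is read as `U`), counts only Watson–Crick pairs as matches, ignores G:U wobbles and gaps, and uses a hypothetical mismatch threshold `max_mismatches`.

```python
# Minimal complementarity scan (illustrative sketch, not a published tool).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence; DNA 'T' is read as 'U'."""
    rna = rna.upper().replace("T", "U")
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def scan_complementary_sites(mirna: str, mrna: str, max_mismatches: int = 3):
    """Return (start, mismatches) for each mRNA window that differs from a
    perfectly complementary site in at most max_mismatches positions.
    Starts are 0-based positions on the mRNA."""
    site = reverse_complement(mirna)  # what a perfectly complementary target carries
    mrna = mrna.upper().replace("T", "U")
    hits = []
    for start in range(len(mrna) - len(site) + 1):
        window = mrna[start:start + len(site)]
        mismatches = sum(1 for a, b in zip(window, site) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Hypothetical toy input: one perfect site embedded in flanking sequence.
mirna = "UUAAUGCUAAUCGUGAUAGGGG"
mrna = "GGG" + reverse_complement(mirna) + "CCC"
print(scan_complementary_sites(mirna, mrna, max_mismatches=0))  # [(3, 0)]
```

Raising `max_mismatches` admits imperfect sites. Plant tools typically also weight mismatch position and G:U pairs, which this sketch does not attempt.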

1.1.3 Animal miRNAs

Genetics has been important for identifying animal miRNA targets. In contrast to plant miRNAs, lin-4 and let-7 were found to regulate their gene targets through imperfect complementarity to the 3′UTRs of those targets. It has been established that animal miRNAs generally do not show extensive complementarity to any endogenous transcripts [4, 5].

Numerous target prediction software packages try to shed light on the animal miRNA targeting problem. Different prediction tools take different approaches and introduce different parameters, so they produce different sets of predicted targets. The challenging part is to identify which prediction tool(s), or combination of tools, work best. The goal of this study was to find the best-performing prediction tool(s). With the help of that finding, we then tried to validate some of the predicted targets of microRNA-155 (miR-155) in nasopharyngeal carcinoma.
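Comparing and combining tools can be illustrated with two small functions. The first computes the precision of a tool's predictions against a set of experimentally validated targets. The second builds a consensus set of genes predicted by at least a chosen number of tools. This is a sketch under stated assumptions: predictions are plain sets of gene identifiers. The tool names, gene names and the `min_tools` threshold are hypothetical placeholders, not data or results from this thesis.

```python
from collections import Counter


def precision(predicted: set, validated: set) -> float:
    """Fraction of predicted targets that are experimentally validated."""
    if not predicted:
        return 0.0
    return len(predicted & validated) / len(predicted)


def consensus(predictions: dict, min_tools: int) -> set:
    """Genes predicted by at least `min_tools` of the tools in `predictions`."""
    votes = Counter(gene for genes in predictions.values() for gene in set(genes))
    return {gene for gene, n in votes.items() if n >= min_tools}


# Hypothetical toy data, for illustration only.
predictions = {
    "tool_A": {"GENE1", "GENE2", "GENE3"},
    "tool_B": {"GENE2", "GENE3", "GENE4"},
    "tool_C": {"GENE3", "GENE5"},
}
validated = {"GENE2", "GENE3"}

# Rank tools by precision, then score a two-tool consensus.
ranking = sorted(predictions, key=lambda t: precision(predictions[t], validated), reverse=True)
for tool in ranking:
    print(tool, round(precision(predictions[tool], validated), 2))
combined = consensus(predictions, min_tools=2)
print(sorted(combined), precision(combined, validated))
```

Requiring agreement between tools tends to trade sensitivity for precision. This mirrors the trade-off between seed stringency and sensitivity discussed in Section 1.2.1.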

1.1.3.1 MiR-155

MiR-155 is contained in Bic, a non-coding gene on chromosome 21. The miR-155 locus spans chromosome 21: 25868163–25868227, i.e. 65 nucleotides (see Figure 3). The primary microRNA transcript is transcribed from Bic and processed into pre-miR-155, which is 62 nucleotides long, whereas mature miR-155 is 22 nucleotides long. According to miRBase [26], miR-155 is expressed in 16 species. Well-studied examples include Homo sapiens, Mus musculus, Gallus gallus, Danio rerio, Ciona savignyi and Ciona intestinalis. The miR-155 gene is present in only one copy, and miR-155 does not share significant sequence similarity with other reported miRNAs [26, 35].
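The length of the quoted locus follows from its coordinates. This assumes they are 1-based and inclusive at both ends, as in genome-browser position strings. With 0-based, half-open coordinates (as in BED files), the length would simply be end minus start. The helper name below is illustrative.

```python
def interval_length(start: int, end: int) -> int:
    """Length of a closed interval [start, end] in 1-based genomic coordinates."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Coordinates of the miR-155 locus on chromosome 21, as quoted in the text.
print(interval_length(25868163, 25868227))  # 65
```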

MiR-155 is involved in various biological processes, including immunity, haematopoiesis and inflammation. MiR-155 is highly expressed in Hodgkin's lymphoma and in large B-cell lymphomas, and this overexpression suggests that it acts as an oncogene. MiR-155 null mice have serious defects in both adaptive and innate immunity [35].


Figure 3: Representation of the precursor miR-155 sequence (65 bp) in the Genome Browser. The sequence resides on chromosome 21: 25868163–25868227 (+ strand). Adapted from [16].

Accumulating evidence indicates that miR-155 is an oncogenic miRNA. Many profiling studies have already shown that miR-155 is upregulated in various types of human malignancies [23, 24]. These include B-cell lymphoma and breast, nasopharyngeal, colon, lung and kidney carcinomas. For instance, in breast cancer miR-155 promotes cell survival and plays a role in chemoresistance [24]. Its anti-apoptotic function is mediated by direct inhibition of FOXO3a, a member of the forkhead family of transcription factors that has been associated with acute leukemia. Furthermore, elevated miR-155 levels have recently been observed in patients with late-stage disease and poor overall survival across several types of malignancy. Knock-down of miR-155 has been associated with impaired immune activity [24]. In addition, miR-155 has been linked to inflammation [24].


Figure 4: The secondary structure of precursor miR-155 predicted by miRNAMap. Adapted from [17].

1.2. Target Prediction of miRNAs

1.2.1 Features/Parameters for miRNA target prediction

Determining the parameters that are crucial for target prediction has been quite challenging, mainly because of the limited pairing between miRNAs and their target mRNAs. To address this problem, many computational and experimental approaches have been used in combination. The widely proposed parameters/features fall into six categories: ‘seed site’ pairing, site location, conservation, site accessibility, multiple sites and expression profile.

1.2.1.1 ‘Seed site’ is the most important feature for target recognition

MiRNA targets contain at least one region that forms Watson–Crick (WC) pairs with the 5′ end of the miRNA. In these pairs, adenine (A) pairs with uracil (U), which takes the place of thymine (T) in RNA, and cytosine (C) pairs with guanine (G). Specifically, this region of the miRNA, located at positions 2–7 from its 5′ end, is known as the ‘seed’. RISC uses the seed as a nucleation signal for recognizing target mRNAs.
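Finding seed matches can be sketched in a few lines. The code takes miRNA positions 2–7 and forms their Watson–Crick complement, read 5′→3′ on the mRNA, because miRNA and target pair antiparallel. It then searches a 3′UTR for that complement. The 22-nt input sequence is illustrative only. Its 5′ end is meant to resemble miR-155, but it should be checked against miRBase before any real use.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed(mirna: str, first: int = 2, last: int = 7) -> str:
    """miRNA positions first..last (1-based, inclusive) counted from the 5' end."""
    return mirna.upper().replace("T", "U")[first - 1:last]


def seed_match(mirna: str) -> str:
    """mRNA sequence (5'->3') that pairs Watson-Crick (A-U, G-C) with the seed."""
    return "".join(COMPLEMENT[b] for b in reversed(seed(mirna)))


def find_seed_matches(mirna: str, utr: str) -> list:
    """0-based start positions of every exact seed match in a 3'UTR."""
    site = seed_match(mirna)
    utr = utr.upper().replace("T", "U")
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


mirna = "UUAAUGCUAAUCGUGAUAGGGG"  # illustrative sequence (see lead-in)
print(seed(mirna), seed_match(mirna))                # UAAUGC GCAUUA
print(find_seed_matches(mirna, "CCGCAUUACCGCATTA"))  # [2, 10]
```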

A stringent-seed site has perfect Watson–Crick pairing and can be divided into four ‘seed’ types: 8mer, 7mer-m8, 7mer-A1 and 6mer. These types differ in the nucleotide at position 1 and in whether position 8 is paired. An 8mer has both an adenine at position 1 of the target site and base pairing at position 8. A 7mer-A1 has an adenine at position 1 but no base pairing at position 8. A 7mer-m8, on the other hand, has base pairing at position 8 but no adenine at position 1. Finally, a 6mer has neither an adenine at position 1 nor base pairing at position 8 [14]. The adenine at position 1 matters because it increases the efficiency of target recognition [8]. The hierarchy of the stringent-seed types can be stated as: 8mer > 7mer-m8 > 7mer-A1 > 6mer [14].
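Two checks around a 6mer core (the match to positions 2–7) are enough to tell the four stringent-seed types apart. The first is whether the next base upstream on the mRNA pairs with miRNA position 8. The second is whether the base opposite position 1 is an adenine. The sketch below assumes ungapped, exact Watson–Crick matching and RNA-alphabet input. The `RANK` dictionary encodes the hierarchy stated above, and the input sequence is illustrative.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Hierarchy from the text: 8mer > 7mer-m8 > 7mer-A1 > 6mer.
RANK = {"8mer": 4, "7mer-m8": 3, "7mer-A1": 2, "6mer": 1}


def revcomp(rna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def classify_sites(mirna: str, utr: str) -> list:
    """Return (start, type) for every stringent seed site in a 3'UTR.

    `start` is the 0-based position of the 6mer core that pairs with miRNA
    positions 2-7. On the mRNA (5'->3') the base just before the core faces
    miRNA position 8 and the base just after it faces position 1.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = revcomp(mirna[1:7])      # pairs with positions 2-7
    m8_base = COMPLEMENT[mirna[7]]  # base that would pair with position 8
    sites = []
    for i in range(len(utr) - len(core) + 1):
        if utr[i:i + len(core)] != core:
            continue
        m8 = i > 0 and utr[i - 1] == m8_base
        a1 = i + len(core) < len(utr) and utr[i + len(core)] == "A"
        if m8 and a1:
            kind = "8mer"
        elif m8:
            kind = "7mer-m8"
        elif a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((i, kind))
    return sites


def best_site(sites: list):
    """Highest-ranked site according to RANK, or None if there is none."""
    return max(sites, key=lambda s: RANK[s[1]]) if sites else None


mirna = "UUAAUGCUAAUCGUGAUAGGGG"  # illustrative sequence
print(classify_sites(mirna, "AGCAUUAA"))  # [(1, '8mer')]
```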

In addition, moderate-stringent-seed matching is also functional, because the RISC can tolerate small mismatches or G:U wobble pairs within the seed region. Moderate-stringent-seed matching has five ‘seed’ types, GUM, GUT, BM, BT and LP, defined according to the mismatch type [14].
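One simplified way to see the difference between stringent and moderate-stringent seed matching is to classify each pair in the seed as Watson–Crick, G:U wobble or mismatch. This sketch tolerates only G:U wobbles, at most `max_wobble` of them (a hypothetical parameter). It does not reproduce the five categories GUM, GUT, BM, BT and LP, whose exact definitions are given in [14].

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def seed_pairing(mirna_seed: str, mrna_window: str) -> dict:
    """Count pair types between a miRNA seed and an mRNA window, both 5'->3'.

    The strands are antiparallel, so the seed is read against the reversed window.
    """
    if len(mirna_seed) != len(mrna_window):
        raise ValueError("seed and window must have the same length")
    counts = {"wc": 0, "wobble": 0, "mismatch": 0}
    for m, t in zip(mirna_seed.upper(), reversed(mrna_window.upper())):
        if (m, t) in WATSON_CRICK:
            counts["wc"] += 1
        elif (m, t) in WOBBLE:
            counts["wobble"] += 1
        else:
            counts["mismatch"] += 1
    return counts


def seed_class(mirna_seed: str, mrna_window: str, max_wobble: int = 1) -> str:
    """'stringent', 'wobble-tolerant' or 'no seed match' (simplified)."""
    c = seed_pairing(mirna_seed, mrna_window)
    if c["wc"] == len(mirna_seed):
        return "stringent"
    if c["mismatch"] == 0 and c["wobble"] <= max_wobble:
        return "wobble-tolerant"
    return "no seed match"


print(seed_class("UAAUGC", "GCAUUA"))  # stringent
print(seed_class("UAAUGC", "GUAUUA"))  # wobble-tolerant (G:U at miRNA position 6)
```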

The preferred number of matching nucleotides in the 3′ part of the miRNA differs between sites with stringent-seed pairing and sites with moderate-stringent-seed pairing. Stringent seeds require 3–4 matches at positions 13–16, whereas moderate-stringent seeds require 4–5 matches at positions 13–19. Sites with this additional 3′ pairing are called 3′-supplementary sites.
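3′-supplementary pairing can be counted with a simple ungapped model. In this model, miRNA position k pairs with the mRNA base at index `core_start + 7 - k`, where `core_start` is the 0-based start of the 6mer core that pairs with positions 2–7. Real sites often need a small offset or loop between the seed and the 3′ pairing, which this simplification ignores. The default window (positions 13–16) and threshold (3 matches) follow the stringent-seed values quoted above. The sequence is illustrative.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def revcomp(rna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def supplementary_matches(mirna: str, utr: str, core_start: int,
                          first: int = 13, last: int = 16) -> int:
    """Count Watson-Crick pairs between miRNA positions first..last and the
    mRNA, assuming ungapped antiparallel pairing anchored on the 6mer core
    that starts at `core_start` (0-based) and pairs with positions 2-7."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    count = 0
    for k in range(first, min(last, len(mirna)) + 1):
        j = core_start + 7 - k  # mRNA index facing miRNA position k
        if 0 <= j < len(utr) and (mirna[k - 1], utr[j]) in WATSON_CRICK:
            count += 1
    return count


def has_3p_supplementary(mirna: str, utr: str, core_start: int,
                         min_matches: int = 3) -> bool:
    return supplementary_matches(mirna, utr, core_start) >= min_matches


mirna = "UUAAUGCUAAUCGUGAUAGGGG"  # illustrative sequence
utr = revcomp(mirna)              # fully complementary toy target
core_start = utr.find(revcomp(mirna[1:7]))
print(core_start, supplementary_matches(mirna, utr, core_start))  # 15 4
```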

Using a broader set of seed types increases sensitivity. Conversely, considering only stringent-seed types gives high specificity, but some targets may then be missed (for example, those with tolerated mismatches or wobbles).

Figure 5: Types of miRNA target sites and multiple sites. (a) Stringent-seed site, 7mer-A1; vertical lines indicate Watson–Crick pairing. (b) Moderate-stringent-seed site, showing BM as an example. (c) 3′-supplementary site, which requires pairing of more than three to four nucleotides. (d) Optimal distance between two miRNA target sites. Adapted from [15].

1.2.1.2 Site location

Most miRNA target sites are located in the 3′UTRs of target genes. For reasons that are not fully understood, RISC prefers to act on the 3′UTR. Target sites are not uniformly distributed within 3′UTRs. Instead, they tend to cluster near the ends when the 3′UTR is longer than 2 kb. Some genes, e.g. housekeeping genes, have comparatively short 3′UTRs, which is believed to help them avoid interference from miRNAs. If the 3′UTR is short, the binding sites (if there are any) are usually located 15–20 nucleotides away from the stop codon [15].

Alternative splicing and polyadenylation make miRNA target prediction difficult, because they produce transcript features that are unexpected or hard to calculate. As a consequence, software packages predict many false-positive targets. More specifically, alternative polyadenylation can shorten the 3′ UTR, while alternative splicing generates different sets of potential target sites [15].

Even though most known miRNA target sites are located in the 3′ UTR, some have also been reported in the 5′ UTR and the coding sequence (CDS) [19]. Acting on the CDS or the 5′ UTR is plausibly more difficult for RISC than acting on the 3′ UTR, since it may have to compete with ribosomes, translation factors and many other regulatory proteins. This is believed to be one of the reasons why RISC prefers the 3′ UTR [15].

1.2.1.3 Conservation: Targets and miRNAs are conserved among related species

miRNAs that share the same seed sequence belong to the same miRNA family and are well conserved among related species. In addition, miRNA families have targets that are conserved among related species [9]. Applying conservation filters decreases the false-positive rate and is especially effective for conserved miRNAs. On the other hand, it has been reported that 30% of all experimentally validated miRNA target genes may not be well conserved.
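A conservation filter can be sketched as keeping a candidate site only if the same site occurs in the orthologous 3′ UTRs of enough species. This is a deliberately simplified illustration. Real tools work on multiple sequence alignments rather than on plain substring searches. The sequences, the species set and the `min_species` threshold below are hypothetical.

```python
def count_species_with_site(site: str, utrs_by_species: dict[str, str]) -> int:
    """Number of species whose 3' UTR contains `site` as an exact substring."""
    site = site.upper().replace("T", "U")
    return sum(site in utr.upper().replace("T", "U") for utr in utrs_by_species.values())


def conservation_filter(sites: list[str], utrs_by_species: dict[str, str],
                        min_species: int) -> list[str]:
    """Keep only sites present in at least `min_species` of the orthologous UTRs."""
    return [s for s in sites if count_species_with_site(s, utrs_by_species) >= min_species]


# Toy orthologous UTR fragments (hypothetical sequences).
utrs = {
    "human": "AAAGCAUUAAGG",
    "mouse": "CCAGCAUUAAUU",
    "chicken": "GGGCAUUCCC",
}
print(conservation_filter(["AGCAUUAA", "CCCCCCCC"], utrs, min_species=2))  # ['AGCAUUAA']
```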

1.2.1.4 Accessibility

The secondary structure of the mRNA significantly affects target accessibility. Target sites have to be accessible, meaning that they must be open and must not interact with other sites within the mRNA. After the initial interaction, RISC may disrupt the mRNA secondary structure around the binding site, allowing the hybrid to extend [15].



Figure 6: Accessibility of mRNA. To bind the miRNA, the target site has to be accessible, meaning it must be open and must not interact with other sites within the mRNA. Opening costs a certain amount of energy, $\Delta G_{\mathrm{open}}$. The total free energy change is $\Delta\Delta G = \Delta G_{\mathrm{duplex}} - \Delta G_{\mathrm{open}}$. $\Delta\Delta G$ serves as a score for the accessibility of the target site and for the probability of a miRNA-target interaction. Adapted from [15].
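To make the formula in the caption concrete, the sketch below evaluates $\Delta\Delta G = \Delta G_{\mathrm{duplex}} - \Delta G_{\mathrm{open}}$ and ranks candidate sites by it. It assumes a PITA-style sign convention in which $\Delta G_{\mathrm{open}}$ is reported as a non-positive number, so that subtracting it adds the cost of opening the site. The energies are invented. In practice they would come from an RNA folding program.

```python
def delta_delta_g(dg_duplex: float, dg_open: float) -> float:
    """ddG = dG_duplex - dG_open in kcal/mol (PITA-style signs assumed)."""
    return dg_duplex - dg_open


def rank_sites_by_ddg(sites: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Sort sites from most to least favourable (most negative ddG first)."""
    scored = [(name, delta_delta_g(dup, opn)) for name, (dup, opn) in sites.items()]
    return sorted(scored, key=lambda item: item[1])


# Invented energies: (dG_duplex, dG_open) for two candidate sites.
example = {"site_1": (-20.0, -12.0), "site_2": (-18.0, -4.0)}
print(rank_sites_by_ddg(example))  # [('site_2', -14.0), ('site_1', -8.0)]
```

In this toy example the site with the stronger duplex (site_1) ranks lower, because opening it costs more energy.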

Higher A:U content is preferential. Because A:U pairs form fewer hydrogen bonds than G:C pairs, an A:U-rich region is less stably folded, which makes the mRNA easier to access and bind. In particular, the A:U content surrounding the binding site can be used as a significant parameter for calculating accessibility. Efficient target sites preferentially have an A:U-rich context within ~30 nucleotides upstream and downstream of the seed site [14].
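The A:U context can be quantified as the fraction of A and U nucleotides in the ~30 nt on either side of a site. The sketch below computes this unweighted fraction. It is an illustrative assumption and not TargetScan's actual local-AU feature, which weights nucleotides by their distance from the site. The function name and example sequence are hypothetical.

```python
def flanking_au_fraction(utr: str, site_start: int, site_end: int, window: int = 30) -> float:
    """Unweighted A/U fraction in up to `window` nt on each side of utr[site_start:site_end]."""
    seq = utr.upper().replace("T", "U")
    upstream = seq[max(0, site_start - window):site_start]
    downstream = seq[site_end:site_end + window]
    flanks = upstream + downstream
    if not flanks:  # site spans the whole sequence
        return 0.0
    return sum(base in "AU" for base in flanks) / len(flanks)


print(flanking_au_fraction("AAAAGCAUUAGGGG", 4, 10))  # 0.5
```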



1.2.1.5 Multiple sites in a single target

Multiple binding sites may exist in the same 3′ UTR. This can result in cooperativity, which may enhance overall miRNA function. miRNAs can act on their targets synergistically, and two target sites within the optimal distance have been shown to enhance target-site efficacy [14]. The optimal distance is often between 17 and 35 nucleotides [14, 13].
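Pairs of sites within the optimal distance can be found by checking all pairs of site positions. This sketch assumes that spacing is measured between the start positions of the two seed matches, and it uses the 17–35 nt window from the text as the default. The function name and positions are hypothetical.

```python
from itertools import combinations


def cooperative_pairs(site_starts: list[int], min_dist: int = 17,
                      max_dist: int = 35) -> list[tuple[int, int]]:
    """Pairs of site start positions whose spacing lies within [min_dist, max_dist]."""
    pairs = []
    for a, b in combinations(sorted(site_starts), 2):  # a <= b after sorting
        if min_dist <= b - a <= max_dist:
            pairs.append((a, b))
    return pairs


print(cooperative_pairs([100, 10, 40, 130]))  # [(10, 40), (100, 130)]
```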

1.2.1.6 Expression profile: miRNA:mRNA pairs are negatively correlated in expression profiles

A single miRNA can regulate many genes, so mRNA expression profiles may vary considerably depending on miRNA expression levels. In addition, many miRNAs are expressed at different levels in different tissues. As a result, if negatively correlated expression values of a miRNA:mRNA pair are detected across different tissue profiles, the mRNA of the pair is probably targeted by the miRNA [15]. This approach effectively reduces false positives. The majority of miRNA targets appear to be regulated at both the mRNA and the protein level, but some targets show an effect only at the protein level [32].
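The negative-correlation criterion can be sketched with a Pearson correlation between a miRNA's expression profile and each mRNA's profile across the same tissues. The cut-off of -0.5, the gene names and the expression values are hypothetical. With only a handful of tissues, a correlation coefficient on its own says little about statistical significance.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; NaN if either profile is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def anticorrelated_targets(mirna_profile: list[float],
                           mrna_profiles: dict[str, list[float]],
                           threshold: float = -0.5) -> dict[str, float]:
    """Genes whose expression correlates with the miRNA at or below `threshold`."""
    result = {}
    for gene, profile in mrna_profiles.items():
        r = pearson(mirna_profile, profile)
        if r <= threshold:  # NaN compares False and is skipped
            result[gene] = r
    return result


mirna = [1.0, 2.0, 3.0, 4.0]  # hypothetical miRNA levels in four tissues
mrnas = {"GENE_A": [8.0, 6.0, 4.0, 2.0], "GENE_B": [1.0, 2.0, 3.0, 4.0]}
print(sorted(anticorrelated_targets(mirna, mrnas)))  # ['GENE_A']
```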

1.2.2 Target prediction software packages

1.2.2.1 Mirtarget2

Mirtarget2 is a machine-learning tool that was developed by analyzing thousands of genes downregulated by miRNAs. Its database provides miRNA target predictions for five species: human, mouse, rat, dog and chicken. Mirtarget2 incorporates four parameters: moderately stringent seeds, site positions, site accessibility and a conservation filter [6, 7].

1.2.2.2 TargetScan

TargetScan offers several approaches for predicting microRNA target sites in several species. The first version of TargetScan searched for seed pairing, and targets were ranked by the thermodynamic stability of the binding site. Predicted targets for multiple species were then combined to obtain predictions of conserved target sites [18].

The context score for a specific site is the sum of the contributions of these four features:

i. Site (seed) contribution
ii. 3′ pairing contribution
iii. Local AU content
iv. Positional contribution
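The additive structure of the context score can be sketched as follows. TargetScan derives each of the four contributions from its own regression models, which are not reproduced here, so the values below are invented placeholders. This sketch also assumes that the per-gene total (the "total context score" shown by the web server) is the sum of the site scores.

```python
from dataclasses import dataclass


@dataclass
class SiteContributions:
    """The four additive terms of a TargetScan-style context score (values supplied by the user)."""
    site_type: float    # i. site (seed) contribution
    three_prime: float  # ii. 3' pairing contribution
    local_au: float     # iii. local AU content contribution
    position: float     # iv. positional contribution

    def context_score(self) -> float:
        return self.site_type + self.three_prime + self.local_au + self.position


def total_context_score(sites: list[SiteContributions]) -> float:
    """Sum of per-site context scores for one gene (more negative = stronger predicted repression)."""
    return sum(site.context_score() for site in sites)


# Invented contribution values for two sites in one 3' UTR.
sites = [SiteContributions(-0.31, -0.01, -0.05, 0.02), SiteContributions(-0.16, 0.0, -0.04, 0.0)]
print(round(total_context_score(sites), 2))  # -0.55
```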

Imperfect seed matching with additional 3′ compensatory pairing was later incorporated into the TargetScan algorithm. Site efficacy is estimated from the 3′ UTR context of the target site. The TargetScan web server provides miRNA target predictions for human, dog, chimpanzee, rat, mouse, chicken, rhesus, cow, frog, opossum, worm and fly. TargetScan quantifies conservation carefully through a measure called $P_{CT}$ (probability of conserved targeting). For a gene with multiple sites, the probability of conserved targeting is given by the aggregate $P_{CT}$ [22]:

$$P_{CT}^{\mathrm{aggregate}} = 1 - \left[(1 - P_{CT,\mathrm{site1}})(1 - P_{CT,\mathrm{site2}})(1 - P_{CT,\mathrm{site3}})\cdots\right]$$
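The aggregate $P_{CT}$ formula translates directly into code. The per-site $P_{CT}$ values in the example are invented, and the function name is hypothetical.

```python
import math


def aggregate_pct(site_pcts: list[float]) -> float:
    """Aggregate P_CT = 1 - prod(1 - P_CT,i) over the sites of one gene."""
    for p in site_pcts:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"P_CT must be in [0, 1], got {p}")
    return 1.0 - math.prod(1.0 - p for p in site_pcts)


print(aggregate_pct([0.5, 0.5]))  # 0.75
```

Each additional site can only raise the aggregate value, which matches the intuition that several conserved sites make conserved targeting more likely.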

Figure 7: Snapshot taken from the TargetScan web server while searching for putative miR-155 targets. TargetScan gives a clear overview of predicted targets. Both the gene symbol and the gene name are reported, together with the number of sites of each seed type, the conservation class (conserved or poorly conserved), the total context score and the aggregate PCT.

1.2.2.3 DIANA-MicroT v3.0

The DIANA-MicroT algorithm searches target mRNAs for stringent seed pairing, i.e. at least 7 consecutive Watson–Crick (WC) pairs. In addition, 6mer seeds and seeds with a G:U wobble are accepted if the 3′ end of the miRNA has compensating pairing with the target [21].



The performance of various target prediction programs was assessed using the targets identified by pSILAC, an experimental proteomics method developed in [13]. DIANA-microT v3.0 achieved the highest score: 66% of all its predicted targets were accurate [21].

The DIANA-microT web server is user-friendly, and prediction results are organized in expandable tabs (see Fig. 8). Predictions for human and mouse are available at http://diana.cslab.ece.ntua.gr/microT/. DIANA allows users to search both for the targets of a specific miRNA and for the miRNA(s) targeting a specific mRNA (target gene). Furthermore, DIANA-microT v3.0 reports a signal-to-noise ratio (SNR), a miTG score and a precision score. Results are ranked by miTG score, with a user-defined miTG threshold. Official gene symbols and Ensembl gene IDs are used as identifiers. Finally, results can be downloaded as a spreadsheet for independent work.

Figure 8: Snapshot taken from the DIANA-microT web server while predicting miR-155 targets. The expandable tab shows almost all necessary information about a predicted target (in this case BACH1). One of the most important items is the seed type, shown on the far left. Here, the 3′ UTR of BACH1 contains 4 miR-155 target sites. The number of species in which each binding site is conserved is also shown. On the far right, one can see whether the prediction is confirmed by other well-known software packages.



1.2.2.4 PicTar

PicTar (probabilistic identification of combinations of target sites) is an algorithm for predicting miRNA targets. It takes a different approach: it ranks targets by also considering whether the mRNA is a target for combinations of other miRNAs.

The PicTar algorithm requires a perfect 7mer of WC pairing at either nucleotides 1–7 or 2–8. Imperfect seed pairing is also allowed in PicTar, but it does not increase the overall score. PicTar uses RNAhybrid to calculate the free energy required to form a miRNA:mRNA hybrid, and this free energy is used to filter the potential targets. Additionally, PicTar uses a conservation filter to reduce the number of false positives. Finally, all inputs are combined in the PicTar sequence-scoring algorithm, which uses a hidden Markov model (HMM) to compute a maximum-likelihood score (MLS). The MLS expresses the likelihood of a gene being a target of a specific miRNA. It is calculated for every species separately, and the per-species scores are combined into the final PicTar score, which is in turn used to rank the potential targets. Typical MLS values for top predicted targets range from 5 to 10.

At http://pictar.mdc-berlin.de/ precompiled predictions for vertebrates, flies, mice and nematodes are available.

1.2.2.5 MAMI

MAMI (Meta MiR:Target Inference) is a software package and database that uses precompiled target lists from other programs to increase the reliability of predictions. MAMI also allows users to choose their preferred sensitivity and specificity values, defined as:

Sensitivity = True positives / (True positives + False negatives)
Specificity = True negatives / (True negatives + False positives)

Sensitivity and specificity can easily be tuned to the user's needs. Five different levels are offered, so that the level best suited to the experimental goals can be chosen.

The internal cutoff values that were used to generate each performance level on the validated set were applied to all human miR-target predictions. The aim was to calculate the percentage of predictions that satisfy these cutoffs.

1.2.2.6 Other prediction tools

Other prediction tools include PITA, EIMMO, miRanda, RNAhybrid, TargetRank and RNA22.



Table 2: List of miRNA prediction tools and their features. Adapted from [15].

a Seed pairing. ●: stringent seeds; ○: moderately stringent seeds; blank: seed sites not considered.
b Site location. ●: target positions considered; blank: target positions not considered.
c Conservation. ●: with/without conservation filter; ○: with conservation filter; blank: conservation not considered.
d Site accessibility. ●: site accessibility with minimum free energy considered; ○: A:U-rich flanking considered; blank: site accessibility not considered.
e Multiple sites in a single mRNA. ●: multiple sites considered; ○: the number of putative sites considered; blank: multiple-site cooperativity not considered.
f Expression profile. ●: expression profiles used; blank: expression profiles not used.

1.3. Gene set analysis

Several methods have been developed for gene set analysis of microarray data. These methods assess the differential expression patterns of groups of functionally related genes rather than of individual genes. The basic goal is to discover gene sets whose expression patterns are associated with phenotypes of interest. Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) are good examples of resources that collect genes into functional groups.

1.3.1 Gene Ontology Enrichment Analysis Software Toolkit (GOEAST)

GOEAST is a web-based software toolkit that provides an easy way to analyze high-throughput experimental results, e.g. microarray data. Its user-friendly interface makes it easy to visualize extensive data and perform GO analysis. The main function of GOEAST is to identify significantly enriched GO terms among given lists of genes using the desired statistical method [31].
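One statistical method commonly used for GO enrichment is the one-sided hypergeometric test, which, to our understanding, is among the options offered by GOEAST. The sketch below computes its p-value with the Python standard library. The gene counts are hypothetical, and no correction for testing many GO terms is applied.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided P(X >= k), X ~ Hypergeometric(population N, K annotated, n selected)."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("inconsistent gene counts")
    upper = min(K, n)
    # comb() returns 0 for impossible draws, so no extra lower bound is needed
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 10 genes on the array, 5 annotated with a GO term,
# 5 genes in the list of interest, all 5 of them annotated.
print(hypergeom_enrichment_p(N=10, K=5, n=5, k=5))  # about 0.00397, i.e. 1/252
```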



2 Methodology

2.1 Target prediction - Gathering and handling data

First, all miR-155 predictions were obtained from each website. The target prediction tools and their websites are listed below:

Table 1: List of target prediction software packages/databases and their corresponding websites:

PicTar  http://pictar.mdc-berlin.de/
TargetScan 5.1  www.targetscan.org
DIANA-MicroT 3.0  http://diana.cslab.ece.ntua.gr/microT/
MAMI  http://mami.med.harvard.edu/
EIMMO 3  www.mirz.unibas.ch/ElMMo3/
MirTarget2  http://mirdb.org/miRDB/
PITA  http://genie.weizmann.ac.il/pubs/mir07/
TargetRank  http://genes.mit.edu/targetrank/
RNA22  http://cbcsrv.watson.ibm.com/rna22.html

The prediction tools do not use a common gene identifier. DIANA-MicroT 3.0 gives the gene symbol and the Ensembl gene ID. TargetScan 5.1 and MirTarget2 give the gene symbol and gene name. PicTar gives the gene name and RefSeq ID, MAMI gives only gene symbols, and so on. The results were therefore mapped to a single identifier type, the Ensembl gene ID, because most genes have a unique Ensembl gene ID.
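The identifier harmonization step can be sketched as a dictionary lookup from each tool's native identifiers to Ensembl gene IDs, followed by set operations. In practice the mapping table would be exported from a resource such as Ensembl BioMart. The tiny mapping and the Ensembl-style IDs below are placeholders, not real entries, and `to_ensembl` is a hypothetical helper.

```python
def to_ensembl(identifiers: list[str], mapping: dict[str, str]) -> tuple[set[str], list[str]]:
    """Map tool-specific IDs (symbols, RefSeq IDs, ...) to Ensembl gene IDs.

    Returns the set of mapped Ensembl IDs (duplicates collapse) and the IDs
    that could not be mapped, so they can be checked by hand.
    """
    mapped: set[str] = set()
    unmapped: list[str] = []
    for ident in identifiers:
        ensembl_id = mapping.get(ident.strip().upper())  # mapping keys are upper-case
        if ensembl_id is None:
            unmapped.append(ident)
        else:
            mapped.add(ensembl_id)
    return mapped, unmapped


# Placeholder mapping; real Ensembl IDs would come from e.g. a BioMart export.
mapping = {"BACH1": "ENSG_BACH1", "E2F2": "ENSG_E2F2", "NM_004091": "ENSG_E2F2"}
diana, _ = to_ensembl(["BACH1", "E2F2"], mapping)
pictar, missing = to_ensembl(["NM_004091", "NM_UNKNOWN"], mapping)
print(diana & pictar, missing)  # {'ENSG_E2F2'} ['NM_UNKNOWN']
```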

2.2 Database of experimentally validated genes

The sets of experimentally validated genes were constructed using TarBase [28] and miRWalk [29]. These databases report downregulation at both the mRNA and the protein level. Therefore, mRNA-level downregulations (validated by luciferase reporter assay) were considered separately in this study; these were collected by manually checking TarBase [28] and publications. In this way, 37 experimentally validated genes downregulated at the mRNA level were obtained (see Table 3). These targets allow only mRNA degradation to be studied, because translational inhibition is not detectable in the luciferase reporter assay. The second database was miRWalk [29], which includes all targets in TarBase. It was also used as a validation source, keeping in mind that validated targets in miRWalk are derived from online publications, which may report any kind of miRNA-target interaction. As a result, 528 "DIRECT" and "INDIRECT" miR-155 targets were collected using miRWalk [29]. These labels refer to studies that include and do not include a luciferase reporter assay, respectively.

Gene_symbol  Gene_name
AGTR1  Angiotensin II receptor, type 1
AGTRAP  Angiotensin II receptor-associated protein
AID  Activation-induced cytidine deaminase
ARID2  AT rich interactive domain 2 (ARID, RFX-like)
ARNTL  Aryl hydrocarbon receptor nuclear translocator-like
AT1R  Angiotensin II receptor 1B
BACH1  BTB and CNC homology 1, basic leucine zipper transcription factor 1
BCL2L13  BCL2-like 13 (apoptosis facilitator)
BIRC4BP  XIAP associated factor 1
CEBPB  CCAAT/enhancer binding protein (C/EBP), beta
CSF1R  Colony stimulating factor 1 receptor
CUTL1  Cut-like homeobox 1
Ets-1  v-ets erythroblastosis virus E26 oncogene homolog 1
FGF7  Fibroblast growth factor 7 (keratinocyte growth factor)
FOS  FBJ murine osteosarcoma viral oncogene homolog
HIF1A  Hypoxia inducible factor 1, alpha subunit (basic helix-loop-helix transcription factor)
HIVEP2  Human immunodeficiency virus type I enhancer binding protein 2
IKBKE  Inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase epsilon
JARID2  Jumonji, AT rich interactive domain 2
MAF  V-maf musculoaponeurotic fibrosarcoma oncogene homolog (avian)
MAP3K10  Mitogen-activated protein kinase kinase kinase 10
MEIS1  Meis homeobox 1
PDCD6  Programmed cell death 6
PICALM  Phosphatidylinositol binding clathrin assembly protein
PU.1  Spleen focus forming virus (SFFV) proviral integration oncogene spi1
RFK  Riboflavin kinase
RHOA  Ras homolog gene family, member A
RPS6KA3  Ribosomal protein S6 kinase, 90kDa, polypeptide 3
SAMHD1  SAM domain and HD domain 1
SHIP1  Inositol polyphosphate-5-phosphatase
SLA  Src-like-adaptor
SMAD5  SMAD family member 5
TAB2  TGF-beta activated kinase 1/MAP3K7 binding protein 2
TP53INP1  Tumor protein p53 inducible nuclear protein 1
ZIC3  Zic family member 3 (odd-paired homolog, Drosophila)
ZNF537  Zinc finger protein 537
ZNF652  Zinc finger protein 652

Table 3: The list of the 37 experimentally validated genes.

2.3 Comparison

The 37 validated genes were compared with the predicted targets of each software package/database. The results were collected in a list containing, for each software package, the total number of predicted targets and the number of validated targets among them. Precision, the percentage of validated targets among all predicted targets, was calculated for each software package/database. This parameter reflects the combined effect of sensitivity and specificity. Since the precompiled results were obtained directly from the websites of the different programs, specificity could not be calculated (because the number of true negatives is unknown), unless the program already reports it (e.g. MAMI). On the other hand, sensitivity with respect to the validated targets could be calculated, since the numbers of true positives (TP) and false negatives (FN) are known.

Precision = True positives / Total predicted targets<br />
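Both quantities can be computed from sets of gene identifiers, as in the sketch below. The sketch treats the validated list as the only known positives. Every predicted target outside that list therefore counts against precision, even though it may simply be untested. The predicted set is hypothetical, and the validated set is a small subset of Table 3 used for illustration.

```python
def precision_and_sensitivity(predicted: set[str], validated: set[str]) -> tuple[float, float]:
    """Precision = TP / |predicted|, sensitivity = TP / (TP + FN) against a validated set."""
    tp = len(predicted & validated)  # validated genes that the tool predicted
    fn = len(validated - predicted)  # validated genes that the tool missed
    precision = tp / len(predicted) if predicted else 0.0
    sensitivity = tp / (tp + fn) if validated else 0.0
    return precision, sensitivity


predicted = {"BACH1", "TP53INP1", "E2F2", "PERP"}  # hypothetical tool output
validated = {"BACH1", "TP53INP1", "SHIP1"}         # subset of Table 3, for illustration
print(precision_and_sensitivity(predicted, validated))  # (0.5, 0.6666666666666666)
```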

2.4 Microarray experiment set-up

The microarray experiment was designed at the Department of Microbiology, Tumor and Cell Biology (MTC), Karolinska Institute, with the help of doctoral student Ziming Du, under the supervision of Prof. Ingemar Ernberg. The whole experimental procedure, from harvesting cells to extracting RNA, took place in March 2010. The microarray experiment was performed on the Affymetrix platform at the core facility for Bioinformatics and Expression Analysis (BEA), located at the Department of Biosciences and Nutrition at Novum, Huddinge.

2.4.1 Cell lines and tissue samples

Cells of the human NPC cell line TW03 were cultured in IMEM (Gibco, USA) containing 10% fetal calf serum (FCS). The immortalized nasopharyngeal epithelial cell line NP69 was cultured in keratinocyte serum-free medium (Invitrogen) supplemented with 5% FCS, 25 μg/ml bovine pituitary extract and 0.2 ng/ml recombinant epidermal growth factor, as suggested by the manufacturer. All cell lines were grown in a humidified incubator at 37 °C with 5% CO2.

2.4.2 miRNA transfections

Before transfection, 2 × 10⁵ cells per well were plated into 6-well plates and grown for one day in antibiotic-free medium containing 10% FCS. When the cells reached 40% to 60% confluence, they were transfected using Lipofectamine 2000 (Invitrogen, USA) according to the manufacturer's instructions. The transfected molecules were one of the following:
- miR-155 Pre-miR miRNA Precursor Molecules (miR-155 mimic; Cat#: PM12601, Ambion, USA)
- Pre-miR miRNA Precursor Molecules-Negative Control #1 (Cat#: AM17110, Ambion, USA)
- miR-155 Anti-miR miRNA Inhibitor (Cat#: AM12601, Ambion, USA)
- Anti-miR miRNA Inhibitors-Negative Control #1 (Cat#: AM17010, Ambion, USA)

Transfected cells (miR-155 mimic 100 nM, miR-155 mimic 50 nM, miR-155 control 50 nM) were grown at 37 °C for 6 h, followed by incubation with complete medium. For the miR-155 assay and Western blot analysis, cells were harvested for RNA and protein after 48 h.

2.5 Polymerase Chain Reaction (PCR) assays

The PCR assays were performed at the Department of Microbiology, Tumor and Cell Biology (MTC), Karolinska Institute, with the help of doctoral student Ziming Du, under the supervision of Prof. Ingemar Ernberg. The whole experimental procedure took place in June 2010.

Symbol  mimic_100nM  mimic_50nM  control_50nM  NP69  TW03  LOG2_100  LOG2_50  LOG2_TW03  Prediction
C9orf5  884.29  828.1  1656.07  791.85  975.8  -0.91  -1  0.3  TargetScan
PERP  1531.4  594.97  1328.54  2120.2  1098.3  0.21  -1.16  -0.95  DIANA-MicroT
TP53INP1  48.38  50.02  164.9  7.19  111.41  -1.77  -1.72  3.95  TargetScan
TERF1  422.37  350.07  553.02  312.8  449.22  -0.39  -0.66  0.52  DIANA-MicroT + TargetScan
BCLAF1  530.82  455.01  691.37  748.89  453.08  -0.38  -0.6  -0.72  DIANA-MicroT
E2F2  95.39  101.94  129.26  168.52  142.3  -0.44  -0.34  -0.24  DIANA-MicroT

Table 4: Six genes considered interesting enough for validation experiments, since at least one software package predicted them as potential targets. In addition, their microarray expression values are downregulated compared with control_50nM or with the normal NP69 cells. LOG2_100 = log2(mimic_100nM / control_50nM), LOG2_50 = log2(mimic_50nM / control_50nM) and LOG2_TW03 = log2(TW03 / NP69).
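The LOG2 columns of Table 4 can be recomputed from the expression values. The column definitions used here are inferred by checking them against the C9orf5 row and should be read as an assumption: mimic 100 nM and mimic 50 nM are each taken relative to control 50 nM, and TW03 relative to NP69.

```python
import math


def log2_columns(row: dict[str, float]) -> dict[str, float]:
    """Recompute the LOG2 columns of Table 4 (column definitions inferred, see lead-in)."""
    return {
        "LOG2_100": round(math.log2(row["mimic_100nM"] / row["control_50nM"]), 2),
        "LOG2_50": round(math.log2(row["mimic_50nM"] / row["control_50nM"]), 2),
        "LOG2_TW03": round(math.log2(row["TW03"] / row["NP69"]), 2),
    }


c9orf5 = {"mimic_100nM": 884.29, "mimic_50nM": 828.1, "control_50nM": 1656.07,
          "NP69": 791.85, "TW03": 975.8}
print(log2_columns(c9orf5))  # {'LOG2_100': -0.91, 'LOG2_50': -1.0, 'LOG2_TW03': 0.3}
```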

2.5.1 Real-time polymerase chain reaction (qPCR)

For the qPCR assay, total RNA was isolated from the cell lines using TRIzol reagent (Invitrogen) according to the manufacturer's instructions. It was then treated with RNase-free DNase I (Cat#: 04716728001, Roche). The miR-155 qPCR assay was performed with TaqMan® MicroRNA Assays (Cat#: 4373124, Applied Biosystems, USA), and RNU6B (Cat#: 4373381, Applied Biosystems, USA) was used as the internal control. The relative expression level was determined as $2^{-\Delta\Delta C_t}$.

Data are presented as the expression level relative to the calibrator, with the standard error of the mean of triplicate measurements for each test sample.
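The $2^{-\Delta\Delta C_t}$ calculation can be illustrated as follows. $\Delta C_t$ is the Ct of the target minus the Ct of the internal control (RNU6B, or GAPDH for the mRNA assays). $\Delta\Delta C_t$ is the sample $\Delta C_t$ minus the calibrator $\Delta C_t$. This sketch assumes that the triplicate Ct values are averaged before the calculation, which is one common choice. The Ct values are invented.

```python
import math
from statistics import mean, stdev


def relative_expression(target_ct: list[float], reference_ct: list[float],
                        calib_target_ct: list[float], calib_reference_ct: list[float]) -> float:
    """2^-ddCt, using the mean Ct of replicate wells."""
    delta_sample = mean(target_ct) - mean(reference_ct)             # dCt of the test sample
    delta_calib = mean(calib_target_ct) - mean(calib_reference_ct)  # dCt of the calibrator
    return 2.0 ** -(delta_sample - delta_calib)


def sem(values: list[float]) -> float:
    """Standard error of the mean (sample standard deviation / sqrt(n))."""
    return stdev(values) / math.sqrt(len(values))


# Invented triplicate Ct values: miR-155 vs RNU6B, test sample vs calibrator.
print(relative_expression([24.5, 25.0, 25.5], [20.0, 20.0, 20.0],
                          [26.5, 27.0, 27.5], [20.0, 20.0, 20.0]))  # 4.0
```

Here the test sample's $\Delta C_t$ is 2 cycles lower than the calibrator's, which corresponds to a 4-fold higher relative expression.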

After reverse transcription of the total RNA, the first-strand cDNA was used as the template for detecting PERP, TP53INP1, TERF1, BCLAF1 and E2F2 expression. Detection was done by quantitative real-time PCR (qRT-PCR) with SYBR Green I chemistry (Power SYBR Green PCR Master Mix, Cat#: 4367659, ABI Inc., USA). GAPDH was used as the internal control.


The following genes were picked from the microarray data for further validation. Their primers are listed below:

qRT-PCR primers for ZDHHC2 (NM_016353)
ZDHHC2 Forward: TCTTAGGCGAGCAGCCAAGGAT
ZDHHC2 Reverse: CAGTGATGGCAGCGATCTGGTT

qRT-PCR primers for KDM5B (NM_006618)
KDM5B Forward: AGCCAGAGACTGGCTTCAGGAT
KDM5B Reverse: AGCCTGAACCTCAGCTACTAGG

qRT-PCR primers for E2F2 (NM_004091)
E2F2 Forward: CTCTCTGAGCTTCAAGCACCTG
E2F2 Reverse: CTTGACGGCAATCACTGTCTGC

qRT-PCR primers for BCLAF1 (NM_014739)
BCLAF1 Forward: CCTAAACGAGCGGTTCACTTCG
BCLAF1 Reverse: GCTAAACGGGTATGCTTCCTCAG

qRT-PCR primers for TERF1 (NM_017489)
TERF1 Forward: CATGGAACCCAGCAACAAGACC
TERF1 Reverse: CTGCTTTCAGTGGCTCTTCTGC

qRT-PCR primers for TP53INP1 (NM_033285)
TP53INP1 Forward: TGATGAATGGATTCTTGTTGACTTC
TP53INP1 Reverse: TGAAGGGTGCTCAGTAGGTGAC

qRT-PCR primers for PERP (NM_022121)
PERP Forward: CCAGATGCTTGTCTTCCTGAGAG
PERP Reverse: AGTGACAGCAGGGTTGGCATGA

2.5.2 PCR

For the conventional PCR assay, total RNA was extracted from the cell lines using TRIzol reagent (Invitrogen). This was done as a quality check before running qPCR.

2.6 Microarray Analysis

The microarray analysis was performed at the Department of Computational Biology at KTH with the help of doctoral student Aymeric Fouquier d'Hérouel, under the supervision of Prof. Erik Aurell. Annotations were obtained from the Affymetrix probe set annotation file HuGene-1_0st-v1.r3.cdf. The whole analysis took place in June 2010. The PLIER algorithm was used for gene expression analysis. The primary analysis includes the following individual operations:

1) Image correction
2) Global and local background correction
3) Feature normalization
4) Spatial normalization
5) Global normalization

2.6.1 Defining the parameters used

To analyze large microarray data sets, it is important to introduce parameters that filter out noise. Gene expression values range approximately from 0.01 to 10000. The following criteria were chosen to eliminate noise without losing useful information:

1. Expression values > 30 (applied to all samples simultaneously) AND
2. Log2 (miR-155 mimic 100nM / miR-155 control 50nM) < -0.5 AND
3. Log2 (miR-155 mimic 50nM / miR-155 control 50nM) < -0.5 AND
4. 2 < Log2 (miR-155 control 50nM / NP69) < 0.5
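The four criteria can be expressed as a predicate over one gene's expression values. This is a sketch under stated assumptions. The column names are hypothetical, and criterion 1 is assumed to apply to all five arrays. The bounds of criterion 4 are left as parameters (`ctrl_np69_low`, `ctrl_np69_high`) so that they can be set explicitly.

```python
import math

# Hypothetical column names for the five arrays; criterion 1 is applied to all of them.
SAMPLES = ("mimic_100nM", "mimic_50nM", "control_50nM", "NP69", "TW03")


def passes_filter(expr: dict[str, float], ctrl_np69_low: float, ctrl_np69_high: float,
                  min_expr: float = 30.0, max_log2_fc: float = -0.5) -> bool:
    """Apply criteria 1-4 of Section 2.6.1 to one gene's expression values."""
    if any(expr[s] <= min_expr for s in SAMPLES):                    # criterion 1
        return False
    fc_100 = math.log2(expr["mimic_100nM"] / expr["control_50nM"])  # criterion 2
    fc_50 = math.log2(expr["mimic_50nM"] / expr["control_50nM"])    # criterion 3
    ctrl_vs_np69 = math.log2(expr["control_50nM"] / expr["NP69"])   # criterion 4
    return (fc_100 < max_log2_fc and fc_50 < max_log2_fc
            and ctrl_np69_low < ctrl_vs_np69 < ctrl_np69_high)


c9orf5 = {"mimic_100nM": 884.29, "mimic_50nM": 828.1, "control_50nM": 1656.07,
          "NP69": 791.85, "TW03": 975.8}
print(passes_filter(c9orf5, -2.0, 2.0), passes_filter(c9orf5, -0.5, 0.5))  # True False
```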

2.7 How to use microarray data and target predictions

Microarray data show changes in mRNA expression in vitro, whereas target prediction tools predict miRNA-mRNA interactions in silico. The targeting mechanism was investigated by combining these two types of data.



3 Results

Result I: The comparison between predicted targets and experimentally validated targets

The list of predicted targets for PicTar was obtained from the online database at http://pictar.mdc-berlin.de/ in February 2010. In total, 199 miR-155 target genes were obtained. The list of predicted targets for TargetScan 5.1 was obtained from the online database at www.targetscan.org in February 2010. In total, 281 miR-155 target genes were obtained. The list of predicted targets for DIANA-microT 3.0 was obtained from the online database at http://diana.cslab.ece.ntua.gr/microT/ in February 2010. In total, 166 miR-155 target genes were obtained. The list of predicted targets for MAMI was obtained from the online database at http://mami.med.harvard.edu/ in February 2010. In total, 205 miR-155 target genes were obtained.

A manually constructed database was created using TarBase [28] and various publications. In total, 37 genes were identified as experimentally validated miR-155 targets. These genes were used to check the precision of the software packages in the downstream analysis.

miRWalk [29] was used to construct the second database. In total, 528 genes were identified as indirect miR-155 targets. These genes were also used to check the precision of the software packages in the downstream analysis.

Eleven software packages/databases were tested using the manually constructed database (37 targets) and the miRWalk [29] database (528 genes). Their reliability was assessed by checking the precision score of each of the eleven packages against these 2 different sets of validated targets. The ones that showed the highest precision and sensitivity at the same time were chosen to perform further predictions.



Part 1 of Result I: The precision test using the manually constructed database

The software benchmark was performed using 37 direct targets. The precision and sensitivity scores of the eleven software packages/databases were checked and ranked. The top four were selected for further analysis.

Software/Database   True positives   Total # of targets   Precision (%)   Sensitivity
DIANA-microT 3.0    12               166                  7.23            0.32
TargetScan          22               281                  7.83            0.59
PicTar              17               199                  8.54            0.46
MAMI                16               205                  7.80            0.43
EIMMO               31               2955                 1.05            0.84
miRanda             26               1952                 1.33            0.70
PITA                29               1266                 2.29            0.78
RNA22               1                332                  0.30            0.03
TargetRank          27               682                  3.96            0.73
miRGator            28               723                  3.87            0.76
miRBase             15               854                  1.75            0.40

Table 5: The software benchmark using 37 direct targets. Precision is given as a percentage (true positives / total number of predicted targets × 100) and sensitivity as a fraction (true positives / 37). The precision and sensitivity scores of the eleven software packages/databases were checked and ranked; the top four were selected for further analysis.
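The values in Table 5 are reproduced by precision = 100 × TP / total and sensitivity = TP / 37, so the ranking can be recomputed directly from the counts. A short Python sketch (the function names are illustrative):

```python
from typing import Dict, List, Tuple

# (true positives, total number of predicted targets), copied from Table 5.
TABLE5: Dict[str, Tuple[int, int]] = {
    "DIANA-microT 3.0": (12, 166),
    "TargetScan": (22, 281),
    "PicTar": (17, 199),
    "MAMI": (16, 205),
    "EIMMO": (31, 2955),
    "miRanda": (26, 1952),
    "PITA": (29, 1266),
    "RNA22": (1, 332),
    "TargetRank": (27, 682),
    "miRGator": (28, 723),
    "miRBase": (15, 854),
}


def precision_pct(tp: int, predicted: int) -> float:
    """Percentage of a tool's predicted targets that are validated."""
    return 100.0 * tp / predicted


def sensitivity(tp: int, n_validated: int) -> float:
    """Fraction of the validated targets that the tool recovers."""
    return tp / n_validated


def rank_by_precision(
    table: Dict[str, Tuple[int, int]], n_validated: int
) -> List[Tuple[str, float, float]]:
    """Return (tool, precision %, sensitivity) tuples, best precision first."""
    rows = [
        (tool, precision_pct(tp, total), sensitivity(tp, n_validated))
        for tool, (tp, total) in table.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for tool, prec, sens in rank_by_precision(TABLE5, n_validated=37)[:4]:
        print(f"{tool:18s} {prec:5.2f} {sens:4.2f}")
```

The same functions apply to Table 6 with `n_validated=528`.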

Part 2 of Result I: The precision test using miRWalk

The software benchmark was performed using 528 indirect targets. The precision and sensitivity scores of the eleven software packages/databases were checked and ranked. The same top four were obtained as in the previous test case (see Table 5). Therefore, these four software packages/databases were selected for further analysis.



Software/Database   True positives   Total # of targets   Precision (%)   Sensitivity
DIANA-microT 3.0    24               166                  14.46           0.05
TargetScan          38               281                  13.52           0.07
PicTar              24               199                  12.06           0.05
MAMI                33               205                  16.10           0.06
EIMMO               123              2955                 4.16            0.23
miRanda             118              1952                 6.05            0.22
PITA                82               1266                 6.48            0.16
RNA22               18               332                  5.42            0.03
TargetRank          53               682                  7.77            0.10
miRGator            57               723                  7.88            0.11
miRBase             35               854                  4.10            0.07

Table 6: The software benchmark using 528 direct and indirect targets from miRWalk, a database of experimentally validated miRNA targets. Precision is given as a percentage (true positives / total number of predicted targets × 100) and sensitivity as a fraction (true positives / 528). The precision and sensitivity scores of the eleven software packages/databases were checked and ranked; the top four were selected for further analysis.

Result II: Amalgamation of predicted targets of the top 4 software packages yielded 9 miR-155 target candidates

Each top-scoring software package predicted its own gene set, and without combining the targets of the four packages it is not obvious which genes are likely targets. Thus, by amalgamating the predicted targets from all four, the following list was constructed, showing which software package predicted each gene:

Gene symbol DIANA  TargetScan  MAMI  PicTar  Total  Gene name
NUFIP2      +      +           +     +       4      nuclear fragile X mental retardation protein interacting protein 2
MAP3K7IP2   +      +           +     +       4      mitogen-activated protein kinase kinase kinase 7 interacting protein 2
SGK3        +      +           +     +       4      serum/glucocorticoid regulated kinase family, member 3
TSHZ3       +      +           +     +       4      teashirt zinc finger homeobox 3
SEMA5A      +      +           +     +       4      sema domain, seven thrombospondin repeats (type 1 and type 1-like), transmembrane
RAB11FIP2   +      +           +     +       4      RAB11 family interacting protein 2 (class I)
SEPT11      +      +           +     +       4      septin 11
FAR1        +      +           +     +       4      fatty acyl CoA reductase 1
KRAS        +      +           +     +       4      v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog
ETS1        +      +           +     +       4      v-ets erythroblastosis virus E26 oncogene homolog 1 (avian)
BACH1       +      +           +     +       4      BTB and CNC homology 1, basic leucine zipper transcription factor 1
ZNF236      +      +           +     +       4      zinc finger protein 236
DCUN1D3     -      +           +     +       3      DCN1, defective in cullin neddylation 1, domain containing 3
ETNK2       -      +           +     +       3      ethanolamine kinase 2
DNAJB7      -      +           +     +       3      DnaJ (Hsp40) homolog, subfamily B, member 7
IKBKE       -      +           +     +       3      inhibitor of kappa light polypeptide gene enhancer in B-cells
HDAC4       +      -           +     +       3      histone deacetylase 4
FBXO11      +      +           -     +       3      F-box protein 11
CACNA1C     -      +           +     +       3      hypothetical protein LOC100131098;
C3orf18     -      +           +     +       3      chromosome 3 open reading frame 18
UBQLN1      +      -           +     +       3      ubiquilin 1
CSF1R       -      +           +     +       3      colony stimulating factor 1 receptor
CD47        -      +           +     +       3      CD47 molecule
CARHSP1     -      +           +     +       3      calcium regulated heat stable protein 1, 24kDa
YWHAE       -      +           +     +       3      similar to 14-3-3 protein epsilon (14-3-3E)
MIDN        +      +           +     -       3      midnolin
MAP3K14     +      +           -     +       3      mitogen-activated protein kinase kinase kinase 14
MAP3K10     -      +           +     +       3      mitogen-activated protein kinase kinase kinase 10
NFAT5       -      +           +     +       3      nuclear factor of activated T-cells 5, tonicity-responsive
N4BP1       -      +           +     +       3      NEDD4 binding protein 1
MYO10       -      +           +     +       3      myosin X
KPNA1       +      +           -     +       3      karyopherin alpha 1 (importin alpha 5)
KIAA1274    -      +           +     +       3      KIAA1274
JARID2      +      +           +     -       3      jumonji, AT rich interactive domain 2
LRRC59      -      +           +     +       3      leucine rich repeat containing 59

Table 7: Intersection of the targets predicted by the four software packages. Genes shown in blue are validated direct targets of miR-155. The whole list is shown in Appendix 1.

The precision score from test case I is ~8%. After combining the prediction results of the four software packages, this percentage increases to ~25% when only genes with 4 hits are considered: 3 of the 12 genes predicted by all four software packages are experimentally validated direct miR-155 targets. This suggests that the other 9 genes (see Table 8) are strong potential miR-155 targets, which could be checked in further validation experiments.
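The amalgamation step amounts to counting, for every gene, how many of the four tools predicted it, and then measuring precision among the genes above a given hit count (for example, 3 validated genes among the 12 four-hit genes gives 3/12 = 25%). A minimal Python sketch with toy gene sets; the function names are illustrative:

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def consensus_counts(predictions: Dict[str, Set[str]]) -> Counter:
    """Count, for each gene, how many tools predicted it."""
    counts: Counter = Counter()
    for genes in predictions.values():
        counts.update(set(genes))  # each tool votes at most once per gene
    return counts


def genes_with_hits(counts: Counter, min_hits: int) -> List[str]:
    """Genes predicted by at least `min_hits` tools, sorted by name."""
    return sorted(gene for gene, n in counts.items() if n >= min_hits)


def precision_among(genes: Iterable[str], validated: Set[str]) -> float:
    """Fraction of the given genes that are experimentally validated."""
    genes = list(genes)
    if not genes:
        return 0.0
    return sum(gene in validated for gene in genes) / len(genes)


if __name__ == "__main__":
    # Toy gene sets; these are not the real predictions.
    toy = {
        "DIANA": {"A", "B", "C"},
        "TargetScan": {"A", "B", "D"},
        "MAMI": {"A", "C", "D"},
        "PicTar": {"A", "B", "D"},
    }
    counts = consensus_counts(toy)
    print(genes_with_hits(counts, 4))  # ['A']
    print(precision_among(genes_with_hits(counts, 3), {"A"}))  # 1 of 3 genes
```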



Gene symbol DIANA  TargetScan  MAMI  PicTar  Total  Gene name
NUFIP2      +      +           +     +       4      nuclear fragile X mental retardation protein interacting protein 2
MAP3K7IP2   +      +           +     +       4      mitogen-activated protein kinase kinase kinase 7 interacting protein 2
SGK3        +      +           +     +       4      serum/glucocorticoid regulated kinase family, member 3
SEMA5A      +      +           +     +       4      sema domain, seven thrombospondin repeats (type 1 and type 1-like), transmembrane
RAB11FIP2   +      +           +     +       4      RAB11 family interacting protein 2 (class I)
SEPT11      +      +           +     +       4      septin 11
FAR1        +      +           +     +       4      fatty acyl CoA reductase 1
KRAS        +      +           +     +       4      v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog

Table 8: The list of 9 miR-155 target candidates. All of these genes were predicted by all 4 top-scoring target prediction software packages.

Result III: Quality control of microarray data

The reproducibility of the microarray experiment was checked with a scatter plot. Although the comparison is not between exact replicates (the miR-155 mimic 50nM sample was intended to represent, roughly, a duplicate of the miR-155 mimic 100nM sample), it still shows that the data are reproducible.


Figure 9: Scatter plot of miR-155 mimic 100nM versus miR-155 mimic 50nM. The figure suggests a rough correlation between the miR-155 mimic 100nM and miR-155 mimic 50nM data.
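The reproducibility check above is visual. If a numerical summary is wanted, one common choice (an assumption here, not what was done in this study) is the Pearson correlation of log2-transformed expression values between the two mimic samples. A dependency-free Python sketch:

```python
import math
from typing import List, Sequence


def log2_values(values: Sequence[float]) -> List[float]:
    """Log2-transform positive expression values before comparing samples."""
    return [math.log2(v) for v in values]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A value close to 1 for `pearson(log2_values(mimic_100), log2_values(mimic_50))`, where the two arguments are the hypothetical per-gene expression lists of the two samples, would support the visual impression of the scatter plot.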

Result IV: Elucidating microarray data revealed some potential miR-155 target genes

Part I: DAVID analysis revealed two candidate genes: WEE1 and DPY19L1

The microarray analysis, using the parameters described in the Methods section, yielded 395 genes (not shown). Only 363 of these 395 genes were annotated in the Database for Annotation, Visualization and Integrated Discovery v6.7 (DAVID), and these were chosen for further functional analysis [36, 37]. The human genome was used as the background for the functional annotation. Results are sorted according to p-values (see Figure 10).

The first significant functionally annotated group identified was “urinary bladder tumor_disease_3rd”, which belongs to the “UNIGENE_EST_QUARTILE” database. The list of 105 genes belonging to “urinary bladder tumor_disease_3rd” is provided in Appendix 2.

Furthermore, other enriched “UNIGENE_EST_QUARTILE” datasets are: adrenal tumor_disease_3rd, oral tumor_disease_3rd, thyroid tumor_disease_3rd, ear_normal_3rd, esophageal tumor_disease_3rd, tongue_normal_3rd, pharynx_normal_3rd, mammary gland_normal_3rd, larynx_normal_3rd, laryngeal cancer_disease_3rd, pharyngeal tumor_disease_3rd and esophagus_normal_3rd. These datasets contain lists of genes related to the corresponding tissues. Their high enrichment in this study means that many of the genes downregulated by the miR-155 mimic belong to these datasets. In addition, many of these tissues are located on either the digestive or the respiratory tract, anatomically close to the nasopharyngeal tissue. To give a concrete example, pharynx_normal_3rd has 68 genes that are enriched among the 363 annotated genes (see Appendix 3). These genes are related to normal pharynx tissue according to the “UNIGENE_EST_QUARTILE” database. Since we are dealing with nasopharyngeal tissue, these 68 genes were considered particularly relevant for further validation analysis. A quick look-up in the DIANA-microT prediction results found 2 genes from the pharynx_normal_3rd dataset:

DPY19L1, dpy-19-like 1 (C. elegans); similar to hCG1645499 [36]
WEE1, WEE1 homolog (S. pombe) [36]

WEE1 is predicted by three top-scoring software packages: DIANA-microT, TargetScan and PicTar. According to DIANA-microT, it has a 9mer site (a 9-nucleotide match in the seed region). This increases the likelihood that WEE1 is a miR-155 target. DPY19L1 is also predicted by DIANA-microT, with two 8mer sites.
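To make the seed-match terminology concrete, the following sketch counts canonical seed sites in a 3'UTR. It uses the TargetScan-style definitions of 7mer-m8 and 8mer sites, not DIANA-microT's own algorithm, and it assumes the mature hsa-miR-155-5p sequence as listed in miRBase. The function names are illustrative.

```python
from typing import Dict

# Mature hsa-miR-155-5p, 5'->3' (assumed from miRBase).
MIR155 = "UUAAUGCUAAUCGUGAUAGGGGU"

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def count_seed_sites(utr: str, mirna: str = MIR155) -> Dict[str, int]:
    """Count canonical 8mer and 7mer-m8 sites in a 3'UTR sequence."""
    utr = utr.upper().replace("T", "U")
    # The 7mer-m8 site pairs with miRNA positions 2-8.
    site7 = reverse_complement(mirna[1:8])
    counts = {"8mer": 0, "7mer-m8": 0}
    start = utr.find(site7)
    while start != -1:
        end = start + len(site7)
        # An A opposite miRNA position 1 (just 3' of the site) makes an 8mer.
        if end < len(utr) and utr[end] == "A":
            counts["8mer"] += 1
        else:
            counts["7mer-m8"] += 1
        start = utr.find(site7, start + 1)
    return counts
```

For miR-155 the 7mer-m8 site is AGCAUUA; `count_seed_sites("CCAGCATTAAGGAGCATTACC")` finds one 8mer and one 7mer-m8 site.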



Figure 10: Functional annotation of the 395 genes obtained from the microarray data. DAVID [36, 37] was used to perform the functional annotation.

Part II: Comparing predictions to microarray data slightly increased the accuracy and revealed potential miR-155 target candidate genes, such as KRAS, SGK3, MAP3K7IP2 and FAR1

Among the genes predicted by at least one top-scoring software package, the most downregulated ones are also predicted more than once. In addition, experimentally validated direct targets were enriched. The precision increased slightly, to ~30% (9 of the 30 genes in Table 9 are validated targets). Moreover, the 4 genes shown in red in Table 9, KRAS, SGK3, MAP3K7IP2 and FAR1, are predicted by all of the top-scoring software packages and are also downregulated in the microarray experiment. These 4 genes are strong miR-155 candidate target genes that could be considered for further validation.
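Combining prediction counts with microarray data can be sketched as a simple filter-and-sort. The sketch below uses a handful of rows copied from Table 9 (number of top tools and LOG2_50); the function name is illustrative, and the sketch does not exclude already-validated targets.

```python
from typing import Dict, List, Tuple

# Selected rows of Table 9: gene -> (top tools predicting it, LOG2_50).
TABLE9_SUBSET: Dict[str, Tuple[int, float]] = {
    "KRAS": (4, -1.18),
    "ETS1": (4, -0.66),
    "SGK3": (4, -0.50),
    "MAP3K7IP2": (4, -0.32),
    "FAR1": (4, -0.28),
    "ADD3": (3, -1.15),
    "SLA": (3, -0.28),
}


def rank_candidates(table: Dict[str, Tuple[int, float]], min_tools: int) -> List[str]:
    """Genes predicted by at least `min_tools` tools, most downregulated first."""
    selected = [
        (log2fc, gene)
        for gene, (n_tools, log2fc) in table.items()
        if n_tools >= min_tools
    ]
    # Sorting (log2fc, gene) tuples puts the most negative values first;
    # ties are broken alphabetically by gene symbol.
    return [gene for _, gene in sorted(selected)]


if __name__ == "__main__":
    print(rank_candidates(TABLE9_SUBSET, min_tools=4))
    # ['KRAS', 'ETS1', 'SGK3', 'MAP3K7IP2', 'FAR1']
```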



Gene symbol DIANA  TargetScan  MAMI  PicTar  Total  LOG2_100  LOG2_50  LOG2_TW03
p53DINP1    -      +           +     +       3      -1.77     -1.72    3.95
Myo1d       -      +           +     +       3      0.41      -1.59    -1.85
VAV3        +      +           -     -       2      0.26      -1.28    -0.94
KRAS        +      +           +     +       4      -0.69     -1.18    -1.08
ADD3        +      -           +     +       3      -1.39     -1.15    1.6
ETNK2       -      +           +     +       3      -0.77     -1.01    3.15
PICALM      +      -           +     -       2      -0.62     -0.83    -1.41
BCAT1       +      +           -     +       3      0.15      -0.73    -2.1
ZNF652      +      +           +     -       3      -0.66     -0.71    -0.55
TSGA14      -      +           +     +       3      -0.97     -0.66    1.37
ETS1        +      +           +     +       4      -0.34     -0.66    0.29
CARHSP1     -      +           +     +       3      -0.39     -0.65    -0.48
JARID2      +      +           +     -       3      0.17      -0.65    -0.88
SDCBP       -      +           +     +       3      0.03      -0.6     -0.54
USP48       -      +           +     +       3      -0.47     -0.57    0.27
SMAD1       +      +           -     -       2      0.08      -0.55    -0.22
MEIS1       -      +           -     +       2      0.08      -0.54    1.55
kcip-1      +      +           -     +       3      -0.1      -0.54    -0.42
MYO10       -      +           +     +       3      0.14      -0.53    -1.88
SGK3        +      +           +     +       4      -0.32     -0.5     -0.62
WWC1        -      +           +     +       3      0         -0.48    -1.1
CSNK1G2     -      +           +     +       3      -0.04     -0.47    -0.86
HIF1A       -      -           +     +       2      -0.18     -0.46    -0.22
UBQLN1      +      -           +     +       3      -0.19     -0.46    -0.71
YWHAE       -      +           +     +       3      -0.1      -0.38    -0.37
ARID2       +      +           -     -       2      -0.29     -0.33    -0.01
MAP3K7IP2   +      +           +     +       4      -0.17     -0.32    -0.53
KPNA1       +      +           -     +       3      0.1       -0.29    -1.49
FAR1        +      +           +     +       4      -0.2      -0.28    -1.32
SLA         -      +           +     +       3      -0.06     -0.28    -0.16

Table 9: Combination of microarray data with prediction data. The microarray data are incorporated into the list of targets in Appendix 1. Genes shown in blue on the left have been validated by wet-lab experiments. LOG2_100 indicates log2(miR-155 mimic 100 nM / miR-155 control 50 nM), LOG2_50 indicates log2(miR-155 mimic 50 nM / miR-155 control 50 nM), and LOG2_TW03 indicates log2(TW03 / NP69).



Result V: GOEAST analysis revealed the importance of protein and nucleotide binding related genes via Gene Ontology

The analysis of the 395 genes using GOEAST revealed the importance of protein and nucleotide binding related genes. In particular, a significant portion of the 395 genes is annotated with nucleotide binding (GO:0000166).

Another significantly enriched GO term is GO:0005072, transforming growth factor beta receptor, cytoplasmic mediator activity. This molecular function term describes the activity of any molecule that transmits the signal from a TGF-beta receptor through the cytoplasm to the nucleus [40]. As seen in Figure 11, GO:0005072 contains 10 genes in total (see Table 10), and 4 of them are present in the gene list analyzed here.

Database   ID       Gene symbol   Reference       Evidence   Gene name
UniProtKB  O15105   SMAD7         PMID:9256479    IDA        Mothers against decapentaplegic homolog 7
UniProtKB  O15198   SMAD9         PMID:19018011   TAS        Mothers against decapentaplegic homolog 9
UniProtKB  O43541   SMAD6         PMID:9256479    IDA        Mothers against decapentaplegic homolog 6
UniProtKB  P17813   ENG           PMID:12015308   IDA        Endoglin
UniProtKB  P46527   CDKN1B        PMID:8033212    TAS        Cyclin-dependent kinase inhibitor 1B
UniProtKB  P84022   SMAD3         PMID:9111321    IDA        Mothers against decapentaplegic homolog 3
UniProtKB  Q13485   SMAD4         PMID:9389648    IDA        Mothers against decapentaplegic homolog 4
UniProtKB  Q15796   SMAD2         PMID:9256479    IDA        Mothers against decapentaplegic homolog 2

Table 10: List of genes in GO:0005072, transforming growth factor beta receptor, cytoplasmic mediator activity [40].

Parameters chosen in GOEAST:
Statistical test method: hypergeometric
Multi-test adjustment method: Yekutieli (FDR under dependency)
Significance level of enrichment: 0.001
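GOEAST carries out these calculations internally. Purely to illustrate what the two settings mean, the sketch below implements a one-sided hypergeometric test for over-representation of a GO term in a gene list, and the Benjamini-Yekutieli adjustment that GOEAST labels "Yekutieli (FDR under dependency)". The function names are illustrative and are not part of GOEAST.

```python
import math
from typing import List


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: genes in the background (e.g. the annotated genome)
    successes:  background genes annotated with the GO term
    draws:      genes in the study list
    k:          study genes annotated with the GO term
    """
    upper = min(successes, draws)
    hits = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / math.comb(population, draws)


def benjamini_yekutieli(pvalues: List[float]) -> List[float]:
    """Benjamini-Yekutieli adjusted p-values (FDR under arbitrary dependency)."""
    m = len(pvalues)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For instance, with a toy background of 10 genes, 5 of them annotated, and a study list of 5 genes that are all annotated, P(X ≥ 5) = 1/252 ≈ 0.004. A term would then be reported as enriched if its adjusted p-value fell below the chosen level of 0.001.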



Figure 11: GOEAST analysis of the 395 genes obtained from the microarray data. The intensity of the yellow color indicates the significance of the corresponding gene ontology term (the more intense the yellow, the lower the p-value and the higher the significance).

Result VI: Validation of microarray results by qPCR revealed that ZDHHC2 and TP53INP1 are significantly downregulated

The qPCR experiment quantified the expression of selected genes, allowing us to determine accurately which genes are downregulated in 5 different samples. ZDHHC2 and TP53INP1 showed significant downregulation in both miR-155 mimic 100nM and miR-155 mimic 50nM compared to miR-155 control 50nM (see Figures 11 and 12).
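The text does not state how relative expression was calculated from the qPCR data. Assuming the widely used comparative Ct (2^-ΔΔCt, Livak) method, with a housekeeping reference gene and the miR-155 control 50nM sample as calibrator, the calculation can be sketched as follows (the function name is illustrative):

```python
from typing import Tuple


def relative_expression(
    target_ct: Tuple[float, float], reference_ct: Tuple[float, float]
) -> float:
    """Fold change of a target gene by the 2^-ddCt (Livak) method.

    target_ct:    (Ct in treated sample, Ct in calibrator) for the gene of interest
    reference_ct: (Ct in treated sample, Ct in calibrator) for the reference gene
    """
    # Normalize each sample to the reference gene.
    d_ct_treated = target_ct[0] - reference_ct[0]
    d_ct_calibrator = target_ct[1] - reference_ct[1]
    # Compare the treated sample with the calibrator.
    dd_ct = d_ct_treated - d_ct_calibrator
    return 2.0 ** (-dd_ct)
```

For example, a target Ct of 25 in the mimic sample versus 24 in the control, with an unchanged reference Ct of 20, gives ΔΔCt = 1 and a relative expression of 0.5, i.e. a twofold downregulation.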



Figure 11: The qPCR results for the E2F2, KDM5B and ZDHHC2 genes. ZDHHC2 showed significant downregulation in NP69 mimic 50nM and NP69 mimic 100nM compared with NP69 control 50nM, NP69 parental and TW03. The other genes did not show significant downregulation.

Figure 12: The qPCR results for the BCLAF1, TERF1 and TP53INP1 genes. TP53INP1 showed significant downregulation in NP69 mimic 50nM and NP69 mimic 100nM compared with NP69 control 50nM, NP69 parental and TW03.



Figure 13: The qPCR results for the PERP gene. This gene did not show significant downregulation.



4 Discussion

Finding the best-performing software packages for miRNA target prediction is complicated for several reasons. Different software packages use different parameters as well as different 3'UTRs (some consider only the longest 3'UTR, while others consider all possible 3'UTRs). These differences result in different sets of targets for a particular miRNA. Another complication is that different software packages produce different output formats. These must be converted into a common identifier, and sometimes it is difficult to find the proper identifier.
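A minimal sketch of such identifier harmonization, assuming gene symbols are the common identifier. The alias table is illustrative only (p53DINP1 is an alias of TP53INP1, and MAP3K7IP2 is a former symbol of TAB2); a real analysis would load an up-to-date mapping, for example from HGNC, instead of hard-coding entries. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Illustrative alias table only; not a complete or authoritative mapping.
ALIASES: Dict[str, str] = {
    "P53DINP1": "TP53INP1",
    "MAP3K7IP2": "TAB2",
}


def normalize_symbols(
    symbols: Iterable[str], aliases: Dict[str, str] = ALIASES
) -> List[str]:
    """Upper-case gene symbols and replace known aliases with one symbol."""
    result = []
    for symbol in symbols:
        key = symbol.strip().upper()
        result.append(aliases.get(key, key))
    return result


def common_targets(*tool_outputs: Iterable[str]) -> Set[str]:
    """Genes shared by all tools after identifier normalization."""
    sets = [set(normalize_symbols(output)) for output in tool_outputs]
    return set.intersection(*sets) if sets else set()
```

Without the normalization step, "p53DINP1" from one tool and "TP53INP1" from another would be counted as different genes and the intersection would miss them.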

The biological difference between the animal and plant miRNA targeting mechanisms remains largely unknown. The obvious difference is the site of miRNA binding, which in plants is the CDS, while in animals it is the 3'UTR. Theoretically, miRNAs can bind to the CDS in animals, too. Perhaps this is where the difference arises: binding to the CDS may be more difficult in animals than in plants because ribosomes or other translation factors occupy the mRNA, whereas in plants binding to the CDS might be difficult to avoid. Another hypothesis is a difference in the effect of RISC: in plants, RISC might bind the corresponding miRNA in a way that favors complete complementarity. This would explain why miRNA target prediction is easier in plants.

The first conclusion of this study is that 4 miRNA prediction software packages, namely TargetScan 5.1, PicTar 5, MAMI and DIANA-microT 3.0, could be used for further investigation. Incorporating microarray data into the study slightly strengthened the results obtained from these packages. However, overall, combining computational predictions with microarray data did not have a dramatic effect.

Another contribution of this study is the identification, using computational predictions and microarray data, of 2 potentially strong miR-155 target genes, TP53INP1 and ZDHHC2. These two targets will be considered for further validation, especially by luciferase reporter assay.

Using the 395 genes for functional enrichment studies could be considered another source for finding potential targets. As described in Methods, these 395 genes have a log2 fold change below -0.5, i.e. at least ~29% downregulation. One might consider downregulation of this size insignificant or noise, but for miRNAs, many reported studies indicate that downregulation of around 25% also matters. Therefore, the candidate target genes found using DAVID, DPY19L1 and WEE1, are also worth considering for further validation experiments. The amount of WEE1 (a nuclear tyrosine kinase) decreases at M phase, when it is hyperphosphorylated, which is consistent with the idea that it acts as a negative regulator of entry into mitosis. A possible storyline would be: by downregulating WEE1 activity, miR-155 would keep mitosis active, resulting in proliferation, which is favored by almost all cancer tissues.
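The relation between a log2 threshold and the percentage of downregulation can be made explicit: a log2 fold change L corresponds to a decrease of (1 - 2^L) × 100%. The -0.5 cut-off used in Methods therefore means a decrease of about 29%, while a 25% decrease corresponds to log2(0.75) ≈ -0.415. A small Python sketch of both conversions (function names are illustrative):

```python
import math


def percent_downregulation(log2_fold_change: float) -> float:
    """Percentage decrease in expression implied by a log2 fold change."""
    return (1.0 - 2.0 ** log2_fold_change) * 100.0


def log2_threshold_for(percent: float) -> float:
    """Log2 fold change corresponding to a given percentage decrease."""
    return math.log2(1.0 - percent / 100.0)
```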

The target prediction and validation procedure could be improved by using alternative technologies. An alternative to microarrays is RNA-Seq, a technique that quantifies the transcriptome of cells using deep sequencing. A substantial number of publications support that RNA-Seq reveals more information than microarrays, because it is not a hybridization-dependent technique; with hybridization-based microarrays, detecting different isoforms is less likely. Since multi-exon genes can produce different isoforms, and microarrays mostly do not distinguish between them, microarray data can be misleading. A hypothetical example: a specific miRNA binds to a specific isoform of a gene (alternative 3'UTR splicing events give rise to different 3'UTRs of a single gene) and eventually downregulates it. The microarray hybridization probe is unique to the gene, but it does not specifically bind the 3'UTR (it can bind anywhere on the mRNA). Thus, while one isoform goes down, the other isoforms still exist and contribute to the total amount of mRNA in the sample. Consequently, the microarray data will not show strong downregulation, which can lead to misinterpretation of the microarray data. For these reasons, RNA-Seq could be used for miRNA studies.

Another part of the experiment was to validate the microarray results by checking the relative expression of some “significant” genes by qPCR. Here, a gene was considered significant if it has a tumor-suppressive function. The following 5 genes are related to tumor suppression, negative regulation of mitosis or positive regulation of apoptosis. This means that in the absence of these genes, there is a certain risk of the cell becoming highly proliferative and turning into a tumor cell.

TERF1, BCLAF1 --> Negative regulation of mitosis, GO:0045839.
PERP, TP53INP1 --> Positive regulation of apoptosis, GO:0043065.
E2F2 --> Plays a crucial role in the control of the cell cycle and in the action of tumor suppressor proteins, and is also a target of the transforming proteins of DNA tumor viruses.

A recent study [41] reported that SMAD2 is a direct target of miR-155. SMAD2 was also enriched in this study in the GOEAST GO analysis. SMAD2 belongs to GO:0005072, which has 10 genes, mostly from the SMAD family, that act as mediators of signaling by TGF-β, a pleiotropic cytokine with important effects on processes such as fibrosis, angiogenesis and immunosuppression. Upregulation of miR-155 altered the response mechanisms to TGF-β by changing the expression of target genes involved in inflammation, fibrosis and angiogenesis. In short, this suggests that the other SMAD family genes enriched in our study could be checked in further validations.



5 References

[1] Bartel DP (2009) MicroRNAs: Target Recognition and Regulatory Functions. Cell 136(2): 215-233

[2] Lee RC, Feinbaum RL, Ambros V (1993) The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75: 843-854

[3] Rhoades MW, Reinhart BJ, Lim LP, Burge CB, Bartel B, Bartel DP (2002) Prediction of plant microRNA targets. Cell 110: 513-520

[4] Re<strong>in</strong>hart BJ, Slack F, Basson M, Pasqu<strong>in</strong>elli A, Bett<strong>in</strong>ger J, Rougvie A, Horvitz HR,<br />

Ruvkun G: The 21-nucleotide let-7 RNA regulates developmental tim<strong>in</strong>g <strong>in</strong> Caenorhabditis<br />

elegans. Nature 2000,<br />

403:901-906.<br />

[5] Lee RC, Fe<strong>in</strong>baum RL, Ambros V: The C. elegans heterochronic gene l<strong>in</strong>-4 encodes<br />

small RNAs with antisense complementarity to l<strong>in</strong>-14. Cell 1993, 75:843-854.<br />

[6] Xiaowei Wang <strong>and</strong> Issam M. El Naqa (2008) <strong>Prediction</strong> of both conserved <strong>and</strong><br />

nonconserved microRNA targets <strong>in</strong> animals. Bio<strong>in</strong>formatics 24(3):325-332.<br />

[7] Xiaowei Wang (2008) miRDB: a microRNA target prediction <strong>and</strong> functional annotation<br />

database with a wiki <strong>in</strong>terface. RNA 14(6):1012-1017<br />

[8] Lewis BP, Burge CB, Bartel DP. Conserved seed pair<strong>in</strong>g, often flanked by adenos<strong>in</strong>es,<br />

<strong>in</strong>dicates that thous<strong>and</strong>s of human genes are microRNA targets. Cell 2005;120:15-20.<br />

[9] Ambros V (2004). The functions of animal microRNAs. Nature 431: 350–355<br />

[10] Bushati N, Cohen SM (2007) microRNA functions. Annu Rev Cell Dev Biol 23: 175–<br />

205<br />

[11] Sevignani C, Cal<strong>in</strong> GA, Nnadi SC, Shimizu M, Davuluri RV, Hyslop T, Demant P,<br />

Croce CM, Siracusa LD (2007) MicroRNA genes are frequently located near mouse cancer<br />

susceptibility loci. Proc Natl Acad Sci USA 104: 8017– 8022<br />

[12] Asangani IA, Rasheed SA, Nikolova DA, Leupold JH, Colburn NH, Post S, Allgayer<br />

H (2008) MicroRNA-21 (miR-21) post-transcriptionally downregulates tumor suppressor<br />

Pdcd4 <strong>and</strong> stimulates <strong>in</strong>vasion, <strong>in</strong>travasation <strong>and</strong> metastasis <strong>in</strong> colorectal cancer. Oncogene<br />

27: 2128–2136<br />

[13] Selbach M, Schwanhausser B, Thierfelder N, Fang Z, Khan<strong>in</strong> R, Rajewsky N (2008)<br />

35


Widespread changes <strong>in</strong> prote<strong>in</strong> synthesis <strong>in</strong>duced by microRNAs. Nature 455: 58– 63<br />

[14] Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP: MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 2007, 27(1):91-105.

[15] Saito T, Sætrom P: MicroRNAs – targeting and target prediction. 2010.

[16] UCSC Genome Browser on Human Mar. 2006 (NCBI36/hg18) Assembly. Retrieved 04.04.2010 from http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&position=chr21:25868163-25868227&hgt.customText=http://mirnamap.mbc.nctu.edu.tw/cache/bed/hsa-mirna.bed

[17] The pre-miRNA of MI0000681. Retrieved 04.04.2010 from http://mirnamap.mbc.nctu.edu.tw/php/mirna_entry.php?acc=MI0000681

[18] Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB: Prediction of mammalian microRNA targets. Cell 2003, 115:787-798.

[19] Lytle JR, et al.: Target mRNAs are repressed as efficiently by microRNA-binding sites in the 5′ UTR as in the 3′ UTR. Proc Natl Acad Sci USA 2007, 104:9667-9672.

[20] Arvey A, Larsson E, Sander C, Leslie CS, Marks DS: Target mRNA abundance dilutes microRNA and siRNA activity. Molecular Systems Biology 2010, 6:363.

[21] Maragkakis M, Reczko M, Simossis VA, Alexiou P, Papadopoulos GL, Dalamagas T, Giannopoulos G, Goumas G, Koukis E, Kourtis K, Vergoulis T, Koziris N, Sellis T, Tsanakas P, Hatzigeorgiou AG: DIANA-microT web server: elucidating microRNA functions through target prediction. Nucleic Acids Res 2009, 37(Web Server issue):W273-W276.

[22] Friedman RC, Farh KK, Burge CB, Bartel DP: Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 2009, 19:92-105.

[23] Chen K, Rajewsky N: Natural selection on human microRNA binding sites inferred from SNP data. Nature Genetics 2006, 38:1452-1456.

[24] Kong W, He L, Coppola M, Guo J, Esposito NN, Coppola D, Cheng JQ: MicroRNA-155 regulates cell survival, growth and chemosensitivity by targeting FOXO3a in breast cancer. J Biol Chem 2010 Apr 6.

[25] Brown JR, Sanseau P: A computational view of microRNAs and their targets. Drug Discov Today 2005, 10:595-601.

[26] Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ: miRBase: tools for microRNA genomics. Nucleic Acids Res 2008, 36:D154-D158.

[27] Hammell CM: The microRNA-argonaute complex: a platform for mRNA modulation. RNA Biol 2008, 5(3):123-127.

[28] Papadopoulos GL, Reczko M, Simossis VA, Sethupathy P, Hatzigeorgiou AG: The database of experimentally supported targets: a functional update of TarBase. Nucleic Acids Res 2009, 37(Database issue):D155-D158. Epub 2008 Oct 27.

[29] miRWalk. http://www.ma.uni-heidelberg.de/apps/zmf/mirwalk/index.html

[30] UniProt. http://www.uniprot.org/keywords/?query=name:"Phosphoprotein"

[31] Zheng Q, Wang XJ: GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis. Nucleic Acids Res 2008 May 16. PMID: 18487275.

[32] Lim LP, Lau NC, Garrett-Engele P, Grimson A, Schelter JM, Castle J, Bartel DP, Linsley PS, Johnson JM: Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 2005, 433:769-773.

[33] Eulalio A, Huntzinger E, Nishihara T, Rehwinkel J, Fauser M, Izaurralde E: Deadenylation is a widespread effect of miRNA regulation. RNA 2009, 15(1):21-32. doi:10.1261/rna.1399509. PMID: 19029310.

[34] Sunkar R, Jagadeeswaran G: In silico identification of conserved microRNAs in large number of diverse plant species. BMC Plant Biol 2008, 8:37.

[35] Moffett HF, Novina CD: A small RNA makes a Bic difference. Genome Biology 2007, 8:221. doi:10.1186/gb-2007-8-7-221.

[36] Huang DW, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources. Nature Protoc 2009, 4(1):44-57.

[37] Dennis G Jr, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA: DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 2003, 4(5):P3.

[38] GeneCards. http://www.genecards.org. Retrieved June 2010.

[39] Millar AA, Waterhouse PM: Plant and animal microRNAs: similarities and differences. Functional & Integrative Genomics 2005, 5(3):129-135. doi:10.1007/s10142-005-0145-2.

[40] Gene Ontology. Retrieved from www.geneontology.org

[41] Louafi F, Martinez-Nunez RT, Sanchez-Elsner T: MicroRNA-155 (miR-155) targets SMAD2 and modulates the response of macrophages to transforming growth factor-β. J Biol Chem 2010.



Appendices

Supplementary Figure 1

Supplementary Figure 1: Scatter plot comparing NP69 tissue with NP69 control (50 nM). The two samples showed no significant correlation.
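Supplementary Figure 1 concerns the correlation between two expression profiles. As a hedged illustration only, the sketch below computes Pearson's r and the matching t statistic, t = r·sqrt((n − 2)/(1 − r²)). Under the null hypothesis of zero correlation, this statistic follows Student's t distribution with n − 2 degrees of freedom. The sketch assumes paired log-scale expression values for the same probes in both samples. The thesis does not state which correlation measure or test was used, and the values in the usage lines are made up. In practice, `scipy.stats.pearsonr` returns both r and the p-value.

```python
from math import copysign, inf, sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return copysign(inf, r)  # perfect correlation: statistic diverges
    return r * sqrt((n - 2) / (1.0 - r * r))


# Hypothetical log2 expression values for five probes in two samples.
tissue = [7.1, 8.4, 5.9, 9.2, 6.6]
control = [6.8, 5.2, 7.7, 6.1, 8.0]
r = pearson_r(tissue, control)
print(f"r = {r:.3f}, t = {t_statistic(r, len(tissue)):.3f} (df = {len(tissue) - 2})")
```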



Appendix 1

DIANA-microT 3.0 http://diana.cslab.ece.ntua.gr/microT/

TargetScan 5.1 http://www.targetscan.org/vert_50/

PicTar 5.0 http://pictar.mdc-berlin.de/

MAMI http://mami.med.harvard.edu/

Genes shown in blue are validated direct targets of miR-155. In the tool columns, 1 means that the tool predicts the gene as a miR-155 target (TargS = TargetScan). TOTAL is the number of tools that predict the gene.

Gene_Symbol DIANA TargS MAMI Pictar TOTAL Gene_name

NUFIP2 1 1 1 1 4 nuclear fragile X mental retardation protein interacting protein 2

MAP3K7IP2 1 1 1 1 4 mitogen-activated protein kinase kinase kinase 7 interacting protein 2

SGK3 1 1 1 1 4 serum/glucocorticoid regulated kinase family, member 3

TSHZ3 1 1 1 1 4 teashirt zinc finger homeobox 3

SEMA5A 1 1 1 1 4 sema domain, seven thrombospondin repeats (type 1 and type 1-like)

RAB11FIP2 1 1 1 1 4 RAB11 family interacting protein 2 (class I)

SEPT11 1 1 1 1 4 septin 11

FAR1 1 1 1 1 4 fatty acyl CoA reductase 1

KRAS 1 1 1 1 4 v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog

ETS1 1 1 1 1 4 v-ets erythroblastosis virus E26 oncogene homolog 1 (avian)

BACH1 1 1 1 1 4 BTB and CNC homology 1, basic leucine zipper transcription factor 1

ZNF236 1 1 1 1 4 zinc finger protein 236

DCUN1D3 1 1 1 3 DCN1, defective in cullin neddylation 1, domain containing 3 (S. cerevisiae)

ETNK2 1 1 1 3 ethanolamine kinase 2

DNAJB7 1 1 1 3 DnaJ (Hsp40) homolog, subfamily B, member 7

IKBKE 1 1 1 3 inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase epsilon

HDAC4 1 1 1 3 histone deacetylase 4

FBXO11 1 1 1 3 F-box protein 11

CACNA1C 1 1 1 3 hypothetical protein LOC100131098; calcium channel

C3orf18 1 1 1 3 chromosome 3 open reading frame 18

UBQLN1 1 1 1 3 ubiquilin 1

CSF1R 1 1 1 3 colony stimulating factor 1 receptor

CD47 1 1 1 3 CD47 molecule

CARHSP1 1 1 1 3 calcium regulated heat stable protein 1, 24kDa

YWHAE 1 1 1 3 similar to 14-3-3 protein epsilon (14-3-3E)

MIDN 1 1 1 3 midnolin

MAP3K14 1 1 1 3 mitogen-activated protein kinase kinase kinase 14

MAP3K10 1 1 1 3 mitogen-activated protein kinase kinase kinase 10

NFAT5 1 1 1 3 nuclear factor of activated T-cells 5, tonicity-responsive

N4BP1 1 1 1 3 NEDD4 binding protein 1

MYO10 1 1 1 3 myosin X

KPNA1 1 1 1 3 karyopherin alpha 1 (importin alpha 5)

KIAA1274 1 1 1 3 KIAA1274

JARID2 1 1 1 3 jumonji, AT rich interactive domain 2

LRRC59 1 1 1 3 leucine rich repeat containing 59

ZIC3 1 1 1 3 Zic family member 3 (odd-paired homolog, Drosophila)

KPNA4 1 1 1 3 karyopherin alpha 4 (importin alpha 3)

SLA 1 1 1 3 Src-like-adaptor

SKV 1 1 1 3 v-ski sarcoma viral oncogene homolog (avian)

RNF123 1 1 1 3 ring finger protein 123

ZNF652 1 1 1 3 zinc finger protein 652

Nova1 1 1 1 3 neuro-oncological ventral antigen 1

FGF7 1 1 1 3 hypothetical LOC100132771; fibroblast growth factor 7

CEBPB 1 1 1 3 CCAAT/enhancer binding protein (C/EBP), beta

Myo1d 1 1 1 3 myosin ID

ZNF703 1 1 1 3 zinc finger protein 703

kcip-1 1 1 1 3 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein

p53DINP1 1 1 1 3 tumor protein p53 inducible nuclear protein 1

LRP1B 1 1 1 3 low density lipoprotein-related protein 1B (deleted in tumors)

C1QL2 1 1 1 3 complement component 1, q subcomponent-like 2

ARL5B 1 1 1 3 ADP-ribosylation factor-like 5B

AICDA 1 1 1 3 activation-induced cytidine deaminase

ADD3 1 1 1 3 adducin 3 (gamma)

BOC 1 1 1 3 Boc homolog (mouse)

BCAT1 1 1 1 3 branched chain aminotransferase 1, cytosolic

ASTN2 1 1 1 3 astrotactin 2

c-myb 1 1 1 3 v-myb myeloblastosis viral oncogene homolog (avian)

ZNF198 1 1 1 3 zinc finger, MYM-type 2

CSNK1G2 1 1 1 3 casein kinase 1, gamma 2

TLE4 1 1 1 3 transducin-like enhancer of split 4 (E(sp1) homolog, Drosophila)

EHD1 1 1 1 3 EH-domain containing 1

Olfml3 1 1 1 3 olfactomedin-like 3

PSKH1 1 1 1 3 protein serine kinase H1

USP48 1 1 1 3 ubiquitin specific peptidase 48


SOX1 1 1 1 3 SRY (sex determining region Y)-box 1

TOMM20 1 1 1 3 similar to translocase of outer mitochondrial membrane

WIT1 1 1 1 3 Wilms tumor upstream neighbor 1

UPP2 1 1 1 3 uridine phosphorylase 2

SOCS1 1 1 1 3 suppressor of cytokine signaling 1

SPI1 1 1 1 3 spleen focus forming virus (SFFV)

TSGA14 1 1 1 3 testis specific, 14

SDCBP 1 1 1 3 syndecan binding protein (syntenin)

WWC1 1 1 1 3 WW and C2 domain containing 1

TRIM2 1 1 1 3 tripartite motif-containing 2

SUFU 1 1 1 3 suppressor of fused homolog (Drosophila)

SMARCA4 1 1 2 SWI/SNF related, matrix associated

AKAP10 1 1 2 A kinase (PRKA) anchor protein 10

Itk 1 1 2 IL2-inducible T-cell kinase

VAV3 1 1 2 vav 3 guanine nucleotide exchange factor

SMNDC1 1 1 2 survival motor neuron domain containing 1

RCN2 1 1 2 reticulocalbin 2, EF-hand calcium binding domain

RREB1 1 1 2 ras responsive element binding protein 1

SPIN3 1 1 2 spindlin family, member 3

JMJD1A 1 1 2 lysine (K)-specific demethylase 3A

CSNK1A1 1 1 2 casein kinase 1, alpha 1

SOX10 1 1 2 SRY (sex determining region Y)-box 10

SOS1 1 1 2 son of sevenless homolog 1 (Drosophila)

SMAD1 1 1 2 SMAD family member 1

ATP2B1 1 1 2 ATPase, Ca++ transporting, plasma membrane 1

ANTXR2 1 1 2 anthrax toxin receptor 2

SMAD2 1 1 2 SMAD family member 2

SLC39A10 1 1 2 solute carrier family 39 (zinc transporter), member 10

BCL11A 1 1 2 B-cell CLL/lymphoma 11A (zinc finger protein)

ZNF642 1 1 2 zinc finger protein 642

BAG5 1 1 2 BCL2-associated athanogene 5

TSPAN14 1 1 2 tetraspanin 14

ZNF644 1 1 2 zinc finger protein 644

BRD1 1 1 2 bromodomain containing 1

SOCS6 1 1 2 suppressor of cytokine signaling 6

TRPS1 1 1 2 trichorhinophalangeal syndrome I

ABHD2 1 1 2 abhydrolase domain containing 2

ACTA1 1 1 2 actin, alpha 1, skeletal muscle



RRP15 1 1 2 ribosomal RNA processing 15 homolog (S. cerevisiae)

MLCK 1 1 2 myosin light chain kinase

KBTBD2 1 1 2 kelch repeat and BTB (POZ) domain containing 2

FLJ90013 1 1 2 transmembrane anterior posterior transformation 1

BSN2 1 1 2 basonuclin 2

SP3 1 1 2 Sp3 transcription factor

PSIP1 1 1 2 PC4 and SFRS1 interacting protein 1

WDFY3 1 1 2 WD repeat and FYVE domain containing 3

INPP5D 1 1 2 inositol polyphosphate-5-phosphatase, 145kDa

TYRP1 1 1 2 tyrosinase-related protein 1

ARID2 1 1 2 AT rich interactive domain 2 (ARID, RFX-like)

ZFYVE14 1 1 2 ankyrin repeat and FYVE domain containing 1

PELI1 1 1 2 pellino homolog 1 (Drosophila)

WDR45 1 1 2 WD repeat domain 45

ZNF238 1 1 2 zinc finger protein 238

LCORL 1 1 2 ligand dependent nuclear receptor corepressor-like

SP1 1 1 2 Sp1 transcription factor

NR2F2 1 1 2 nuclear receptor subfamily 2, group F, member 2

Mon1a 1 1 2 MON1 homolog A (yeast)

RAB1A 1 1 2 RAB1A, member RAS oncogene family

cab39 1 1 2 calcium binding protein 39

TMEM178 1 1 2 transmembrane protein 178

TFDP2 1 1 2 transcription factor Dp-2 (E2F dimerization partner 2)

SSH2 1 1 2 slingshot homolog 2 (Drosophila)

NDFIP1 1 1 2 Nedd4 family interacting protein 1

EHF 1 1 2 ets homologous factor

STRN3 1 1 2 striatin, calmodulin binding protein 3

DNCI1 1 1 2 dynein, cytoplasmic 1, intermediate chain 1

AHCYL2 1 1 2 adenosylhomocysteinase-like 2

TRIM32 1 1 2 tripartite motif-containing 32

H3.3B 1 1 2 H3 histone, family 3B (H3.3B);

MEIS1 1 1 2 Meis homeobox 1

KCNN3 1 1 2 potassium intermediate/small conductance channel

LOC389458 1 1 2 hypothetical LOC389458; RB-associated KRAB zinc finger

RBMS3 1 1 2 RNA binding motif, single stranded interacting protein

HIF1A 1 1 2 hypoxia inducible factor 1, alpha subunit

RNF146 1 1 2 ring finger protein 146

IRF2BP2 1 1 2 interferon regulatory factor 2 binding protein 2


HIVEP2 1 1 2 human immunodeficiency virus type I enhancer binding protein 2

HNRPA3 1 1 2 heterogeneous nuclear ribonucleoprotein A3

KCNA1 1 1 2 potassium voltage-gated channel

KIAA1267 1 1 2 KIAA1267

RICTOR 1 1 2 RPTOR independent companion of MTOR,

JHDM1D 1 1 2 jumonji C domain containing histone demethylase 1

SGCB 1 1 2 sarcoglycan, beta

GPR85 1 1 2 G protein-coupled receptor 85

GTF2A1L 1 1 2 stonin 1

GDF6 1 1 2 growth differentiation factor 6

GOLPH3L 1 1 2 golgi phosphoprotein 3-like

RSPO2 1 1 2 R-spondin 2 homolog (Xenopus laevis)

HERC4 1 1 2 hect domain and RLD 4

H3F3B 1 1 2 H3 histone, family 3B (H3.3B)

HBP1 1 1 2 HMG-box transcription factor 1

MECP2 1 1 2 methyl CpG binding protein 2 (Rett syndrome)

PKN2 1 1 2 protein kinase N2

NAV3 1 1 2 neuron navigator 3; similar to neuron navigator 3

PLEKHK1 1 1 2 rhotekin 2

PLAG1 1 1 2 pleiomorphic adenoma gene 1

PEA15 1 1 2 phosphoprotein enriched in astrocytes 15

PHF17 1 1 2 PHD finger protein 17

PKIA 1 1 2 protein kinase (cAMP-dependent, catalytic) inhibitor alpha

PICALM 1 1 2 phosphatidylinositol binding clathrin assembly protein

REPS2 1 1 2 RALBP1 associated Eps domain containing 2

RC3H2 1 1 2 ring finger and CCCH-type zinc finger domains 2

RBM47 1 1 2 RNA binding motif protein 47

KIAA1715 1 1 2 KIAA1715

RCOR1 1 1 2 REST corepressor 1

RAB34 1 1 2 RAB34, member RAS oncogene family

ZBTB41 1 1 2 zinc finger and BTB domain containing 41

LOC646270 1 1 2 elongation factor, RNA polymerase II, 2

LOC646438 1 1 2 H3 histone, family 3B (H3.3B);

CDC73 1 1 2 cell division cycle 73

CHD7 1 1 2 chromodomain helicase DNA binding protein 7

SF3B1 1 1 2 splicing factor 3b, subunit 1, 155kDa

CBL 1 1 2 Cas-Br-M (murine) ecotropic retroviral seq

COL21A1 1 1 2 collagen, type XXI, alpha 1


COL7A1 1 1 2 collagen, type VII, alpha 1

CKAP5 1 1 2 cytoskeleton associated protein 5

CNTN4 1 1 2 contactin 4

BCORL1 1 1 2 BCL6 co-repressor-like 1

C10orf26 1 1 2 chromosome 10 open reading frame 26

C10orf46 1 1 2 chromosome 10 open reading frame 46

SLC12A6 1 1 2 solute carrier family 12

C10orf12 1 1 2 chromosome 10 open reading frame 12

C5orf41 1 1 2 chromosome 5 open reading frame 41

C9orf5 1 1 2 chromosome 9 open reading frame 5

SIM2 1 1 2 single-minded homolog 2 (Drosophila)

SHOX 1 1 2 short stature homeobox

FAM134C 1 1 2 family with sequence similarity 134, member C

FBXO33 1 1 2 F-box protein 33

FLJ37543 1 1 2 hypothetical protein FLJ37543

FAM135A 1 1 2 family with sequence similarity 135, member A

S100PBP 1 1 2 S100P binding protein

GABRA1 1 1 2 gamma-aminobutyric acid (GABA) A receptor, alpha 1

GCN5L2 1 1 2 K(lysine) acetyltransferase 2A

FOS 1 1 2 v-fos FBJ murine osteosarcoma viral oncogene homolog

FZD5 1 1 2 frizzled homolog 5 (Drosophila)

COPS3 1 1 2 COP9 constitutive photomorphogenic homolog subunit 3

DNAJB1 1 1 2 DnaJ (Hsp40) homolog, subfamily B, member 1

SCG2 1 1 2 secretogranin II (chromogranin C)

SEC14L5 1 1 2 SEC14-like 5 (S. cerevisiae)

DET1 1 1 2 de-etiolated homolog 1 (Arabidopsis)

SATB1 1 1 2 SATB homeobox 1

SALL1 1 1 2 sal-like 1 (Drosophila)

E2F2 1 1 2 E2F transcription factor 2

EDG1 1 1 2 sphingosine-1-phosphate receptor 1
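The TOTAL column in Appendix 1 counts how many of the four prediction tools (DIANA-microT, TargetScan, MAMI, PicTar) list a gene as a miR-155 target. The sketch below shows that vote count. It assumes each tool's output has already been reduced to a set of gene symbols. The function name `consensus_rows` and the toy gene symbols are hypothetical and do not come from the tools' actual outputs.

```python
from typing import Dict, List, Set, Tuple

# Column order used in the Appendix 1 table.
TOOLS = ("DIANA", "TargS", "MAMI", "Pictar")


def consensus_rows(
    predictions: Dict[str, Set[str]], min_total: int = 1
) -> List[Tuple[str, List[int], int]]:
    """Return (gene, per-tool 0/1 flags, TOTAL) rows, most-supported genes first.

    predictions maps a tool name from TOOLS to the gene symbols it predicts.
    Genes supported by fewer than min_total tools are dropped.
    """
    genes = set().union(*predictions.values())
    rows = []
    for gene in genes:
        flags = [1 if gene in predictions.get(tool, set()) else 0 for tool in TOOLS]
        total = sum(flags)
        if total >= min_total:
            rows.append((gene, flags, total))
    # Sort by TOTAL (descending), then alphabetically by gene symbol.
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows


# Toy input with hypothetical gene symbols.
toy = {
    "DIANA": {"GENE_A", "GENE_B"},
    "TargS": {"GENE_A", "GENE_C"},
    "MAMI": {"GENE_A", "GENE_B"},
    "Pictar": {"GENE_A"},
}
for gene, flags, total in consensus_rows(toy, min_total=2):
    print(gene, *flags, total)
```

With `min_total=2`, the loop prints `GENE_A 1 1 1 1 4` and `GENE_B 1 0 1 0 2`, mirroring the row layout of the table above. The lowest TOTAL listed in Appendix 1 is 2.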



Appendix 2

ID Gene Name

HMGCS1 3-hydroxy-3-methylglutaryl-Coenzyme A synthase 1 (soluble)

AAK1 AP2 associated kinase 1

ATP6V1C1 ATPase, H+ transporting, lysosomal 42kDa, V1 subunit C1

CAP2 CAP, adenylate cyclase-associated protein, 2 (yeast)

CNOT1 CCR4-NOT transcription complex, subunit 1

CDC42BPA CDC42 binding protein kinase alpha (DMPK-like)

COMMD2 COMM domain containing 2

F11R F11 receptor

H3F3A H3 histone, family 3B (H3.3B); H3 histone, family 3A pseudogene; H3 histone, family 3A; similar to H3 histone, family 3B; simi

HBS1L HBS1-like (S. cerevisiae)

KIAA1671 KIAA1671 protein

LASS6 LAG1 homolog, ceramide synthase 6

LRBA LPS-responsive vesicle trafficking, beach and anchor containing

MLF1IP MLF1 interacting protein

PERP PERP, TP53 apoptosis effector

ARHGAP5 Rho GTPase activating protein 5

SMAD2 SMAD family member 2

SMAD6 SMAD family member 6

SMEK2 SMEK homolog 2, suppressor of mek1 (Dictyostelium)

ST6GALNAC2 ST6 (alpha-N-acetyl-neuraminyl-2,3-beta-galactosyl-1,3)-N-acetylgalactosaminide alpha-2,6-sialyltransferase 2

TAF9B TAF9B RNA polymerase II, TATA box binding protein (TBP)-associated factor, 31kDa

WEE1 WEE1 homolog (S. pombe)

XIAP X-linked inhibitor of apoptosis

ACOX1 acyl-Coenzyme A oxidase 1, palmitoyl

ANLN anillin, actin binding protein

ANXA2P2 annexin A2 pseudogene 2

ANXA2P1, ANXA2 annexin A2 pseudogene 3; annexin A2; annexin A2 pseudogene 1

ATL3 atlastin GTPase 3

CREBL2 cAMP responsive element binding protein-like 2

CDH1 cadherin 1, type 1, E-cadherin (epithelial)

CHP calcium binding protein P22

CREG1 cellular repressor of E1A-stimulated genes 1

CENPF centromere protein F, 350/400ka (mitosin)

CBX5 chromobox homolog 5 (HP1 alpha homolog, Drosophila)

C18orf10 chromosome 18 open reading frame 10

CSDE1 cold shock domain containing E1, RNA-binding

COL12A1 collagen, type XII, alpha 1

CBFB core-binding factor, beta subunit

DLGAP5 discs, large (Drosophila) homolog-associated protein 5

ENAH enabled homolog (Drosophila)

ERMP1 endoplasmic reticulum metallopeptidase 1

EGFR epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian)



ANKRD36B similar to KIAA1641; similar to ankyr<strong>in</strong> repeat doma<strong>in</strong> 26; ankyr<strong>in</strong> repeat doma<strong>in</strong> 36B<br />

SNRNP200 similar to U5 snRNP-specific prote<strong>in</strong>, 200 kDa; small nuclear ribonucleoprote<strong>in</strong> 200kDa (U5)<br />

HMGB3 similar to high mobility group box 3; high-mobility group box 3<br />

PRKDC similar to prote<strong>in</strong> k<strong>in</strong>ase, DNA-activated, catalytic polypeptide; prote<strong>in</strong> k<strong>in</strong>ase, DNA-activated, catalytic polypeptide<br />

TOMM20 similar to translocase of outer mitochondrial membrane 20 homolog; similar to mitochondrial outer membrane prote<strong>in</strong> 19; trans<br />

SLC35B4 solute carrier family 35, member B4<br />

SLC9A6 solute carrier family 9 (sodium/hydrogen exchanger), member 6<br />

FAM173B family with sequence similarity 173, member B<br />

SKA2 family with sequence similarity 33, member A; similar to Spindle and kinetochore-associated protein 2<br />

GLTP glycolipid transfer protein; glycolipid transfer protein pseudogene 1<br />

GPC1 glypican 1<br />

GNAQ guanine nucleotide binding protein (G protein), q polypeptide<br />

HSPB1 heat shock 27kDa protein-like 2 pseudogene; heat shock 27kDa protein 1<br />

HELZ helicase with zinc finger<br />

HP1BP3 heterochromatin protein 1, binding protein 3<br />

HNRNPA3 heterogeneous nuclear ribonucleoprotein A3<br />

HIST1H1B histone cluster 1, H1b<br />

HIP1 huntingtin interacting protein 1<br />

ID1 inhibitor of DNA binding 1, dominant negative helix-loop-helix protein<br />

ID3 inhibitor of DNA binding 3, dominant negative helix-loop-helix protein<br />

ITGAV integrin, alpha V (vitronectin receptor, alpha polypeptide, antigen CD51)<br />

IL13RA1 interleukin 13 receptor, alpha 1<br />

KPNA6 karyopherin alpha 6 (importin alpha 7)<br />

KRT17 keratin 17; keratin 17 pseudogene 3<br />

KTN1 kinectin 1 (kinesin receptor)<br />

KREMEN1 kringle containing transmembrane protein 1<br />

LNX2 ligand of numb-protein X 2<br />

MANEA mannosidase, endo-alpha<br />

MID1 midline 1 (Opitz/BBB syndrome)<br />

MSN moesin<br />

MYH9 myosin, heavy chain 9, non-muscle<br />

MARCKS myristoylated alanine-rich protein kinase C substrate<br />

NCAPD2 non-SMC condensin I complex, subunit D2<br />

PAK2 p21 protein (Cdc42/Rac)-activated kinase 2<br />

PPL periplakin<br />

PICALM phosphatidylinositol binding clathrin assembly protein<br />

PKP1 plakophilin 1 (ectodermal dysplasia/skin fragility syndrome); similar to plakophilin 1 isoform 1a<br />

PABPC1 poly(A) binding protein, cytoplasmic pseudogene 5; poly(A) binding protein, cytoplasmic 1<br />

PMEPA1 prostate transmembrane protein, androgen induced 1<br />

PCMTD2 protein-L-isoaspartate (D-aspartate) O-methyltransferase domain containing 2<br />

RIPK4 receptor-interacting serine-threonine kinase 4<br />

RPL5 ribosomal protein L5 pseudogene 34; ribosomal protein L5 pseudogene 1; ribosomal protein L5<br />

RPLP0 ribosomal protein, large, P0 pseudogene 2; ribosomal protein, large, P0 pseudogene 3; ribosomal protein, large, P0 pseudoge<br />

SEMA3C sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3C<br />

SYNE2 spectrin repeat containing, nuclear envelope 2<br />

SGPL1 sphingosine-1-phosphate lyase 1<br />

SKAP2 src kinase associated phosphoprotein 2<br />

STON2 stonin 2<br />

SMC4 structural maintenance of chromosomes 4<br />

SNAP23 synaptosomal-associated protein, 23kDa<br />

TNKS tankyrase, TRF1-interacting ankyrin-related ADP-ribose polymerase<br />

TLL1 tolloid-like 1<br />

TOP2A topoisomerase (DNA) II alpha 170kDa<br />

TOP2B topoisomerase (DNA) II beta 180kDa<br />

TOB2 transducer of ERBB2, 2<br />

TBL1XR1 transducin (beta)-like 1 X-linked receptor 1<br />

TM7SF3 transmembrane 7 superfamily member 3<br />

TMEM14A transmembrane protein 14A<br />

TMEM56 transmembrane protein 56<br />

TWF1 twinfilin, actin-binding protein, homolog 1 (Drosophila)<br />

UBE4A ubiquitination factor E4A (UFD2 homolog, yeast)<br />

ZFAT zinc finger and AT hook domain containing<br />

ZBTB41 zinc finger and BTB domain containing 41<br />

Appendix 3:<br />

ID Gene Name<br />

HMGCS1 3-hydroxy-3-methylglutaryl-Coenzyme A synthase 1 (soluble)<br />

OXCT1 3-oxoacid CoA transferase 1<br />

AHNAK AHNAK nucleoprotein<br />

AHNAK2 AHNAK nucleoprotein 2<br />

ATP6V1D ATPase, H+ transporting, lysosomal 34kDa, V1 subunit D<br />

AGAP1 ArfGAP with GTPase domain, ankyrin repeat and PH domain 1<br />

DIP2A DIP2 disco-interacting protein 2 homolog A (Drosophila)<br />

ELAVL2 ELAV (embryonic lethal, abnormal vision, Drosophila)-like 2 (Hu antigen B)<br />

F11R F11 receptor<br />

GPSM2 G-protein signaling modulator 2 (AGS3-like, C. elegans)<br />

IQGAP1 IQ motif containing GTPase activating protein 1<br />

KIAA1671 KIAA1671 protein<br />

KLF7 Kruppel-like factor 7 (ubiquitous)<br />

LFNG LFNG O-fucosylpeptide 3-beta-N-acetylglucosaminyltransferase<br />

MLF1IP MLF1 interacting protein<br />

NDRG1 N-myc downstream regulated 1<br />

WEE1 WEE1 homolog (S. pombe)<br />

XIAP X-linked inhibitor of apoptosis<br />

ANXA2P1, ANXA2 annexin A2 pseudogene 3; annexin A2; annexin A2 pseudogene 1<br />

CDH1 cadherin 1, type 1, E-cadherin (epithelial)<br />

C18orf10 chromosome 18 open reading frame 10<br />

PSAT1 chromosome 8 open reading frame 62; phosphoserine aminotransferase 1<br />

CIT citron (rho-interacting, serine/threonine kinase 21)<br />

CLOCK clock homolog (mouse)<br />

COL12A1 collagen, type XII, alpha 1<br />

DLG1 discs, large homolog 1 (Drosophila)<br />

DPY19L1 dpy-19-like 1 (C. elegans); similar to hCG1645499<br />

ENAH enabled homolog (Drosophila)<br />

EGFR epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian)<br />

FAM173B family with sequence similarity 173, member B<br />

GGCX gamma-glutamyl carboxylase<br />

GLTP glycolipid transfer protein; glycolipid transfer protein pseudogene 1<br />

GTDC1 glycosyltransferase-like domain containing 1<br />

GNG12 guanine nucleotide binding protein (G protein), gamma 12<br />

HDLBP high density lipoprotein binding protein<br />

HIP1 huntingtin interacting protein 1<br />

KRT17 keratin 17; keratin 17 pseudogene 3<br />

MAN1A2 mannosidase, alpha, class 1A, member 2<br />

MBOAT2 membrane bound O-acyltransferase domain containing 2<br />

MMGT1 membrane magnesium transporter 1<br />

MSN moesin<br />

MYO5A myosin VA (heavy chain 12, myoxin)<br />

MYH9 myosin, heavy chain 9, non-muscle<br />

PAK2 p21 protein (Cdc42/Rac)-activated kinase 2<br />

PALLD palladin, cytoskeletal associated protein<br />

PPL periplakin<br />

PLD1 phospholipase D1, phosphatidylcholine-specific<br />

PKP1 plakophilin 1 (ectodermal dysplasia/skin fragility syndrome); similar to plakophilin 1 isoform 1a<br />

PABPC1 poly(A) binding protein, cytoplasmic pseudogene 5; poly(A) binding protein, cytoplasmic 1<br />

PRKCA protein kinase C, alpha<br />

PTPN11 protein tyrosine phosphatase, non-receptor type 11; similar to protein tyrosine phosphatase, non-receptor<br />

PTPRK protein tyrosine phosphatase, receptor type, K<br />

PHTF2 putative homeodomain transcription factor 2<br />

RIPK4 receptor-interacting serine-threonine kinase 4<br />

KDM5B similar to Jumonji, AT rich interactive domain 1B (RBP2-like); lysine (K)-specific demethylase 5B<br />

PRKDC similar to protein kinase, DNA-activated, catalytic polypeptide; protein kinase, DNA-activated, catalytic p<br />

SLC39A9 solute carrier family 39 (zinc transporter), member 9<br />

SPTBN1 spectrin, beta, non-erythrocytic 1<br />

SGPL1 sphingosine-1-phosphate lyase 1<br />

SREBF2 sterol regulatory element binding transcription factor 2<br />

SVIL supervillin<br />

TBL1XR1 transducin (beta)-like 1 X-linked receptor 1<br />

TGFBI transforming growth factor, beta-induced, 68kDa<br />

TNRC6B trinucleotide repeat containing 6B<br />

TUFT1 tuftelin 1<br />

TWF1 twinfilin, actin-binding protein, homolog 1 (Drosophila)<br />

VPS13B vacuolar protein sorting 13 homolog B (yeast)<br />

ZNF185 zinc finger protein 185 (LIM domain)<br />

TRITA-CSC-E 2010:164<br />

ISRN-KTH/CSC/E--10/164--SE<br />

ISSN-1653-5715<br />

www.kth.se
