11.07.2015 Views

Research Report 2010 - MDC



Miguel Andrade

Structure of the Group

Group Leader
Dr. Miguel Andrade

Scientists
Dr. Enrique Muro
Dr. Jean-Fred Fontaine
Dr. Adriano Barbosa
Dr. Nancy Mah

Graduate Student
Martin Schaefer

Technical Assistant
Matthew Huska

Secretariat
Sylvia Olbrich
Timkehet Teffera*

*part of the period reported

Computational Biology and Data Mining

Our group focuses on the development and application of computational methods that are used to research the molecular and genetic components of human disease. Often, we work with data from high throughput gene expression, proteomics, and protein-protein interaction experiments performed at a genomic level. Our primary research tools, besides hardware and software, are the increasing collection of public repositories of biological information such as molecular sequence and structure databases, literature databases like MEDLINE, and other resources related to human disease such as the OMIM database. The results of our work are distributed as software or online web services.

Prediction of transcript 3’UTR ends

Methodologies for the prediction of gene transcripts from genomic and expression data are still under development, and the increase in the amount of transcript data in public databases offers a chance to improve such methods. We observed that the databases of expressed sequence tags (ESTs) contain abundant evidence of alternative 3’UTR ends that are currently absent from the public database records for many genes, or invalidate many transcript ends included in databases like RefSeq or Ensembl. We proposed and experimentally verified a method to predict transcript ends using EST data and analysis of poly-adenylation signals (Muro et al., 2008).
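The general idea of combining EST evidence with poly-adenylation signals can be illustrated with a minimal Python sketch. This is not the published method of Muro et al. (2008); it is a simplified illustration under stated assumptions. The function names, the EST support threshold, and the 10 to 30 nucleotide search window are hypothetical choices made for the example. The only biological facts relied on are that AATAAA is the canonical poly(A) signal hexamer, that ATTAAA is its most common variant, and that the signal typically lies a few tens of nucleotides upstream of the cleavage site.

```python
from collections import Counter

# Canonical poly(A) signal and its most common variant.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def has_upstream_polya_signal(sequence, end_pos, min_dist=10, max_dist=30):
    """Return True if a signal hexamer starts min_dist..max_dist nt upstream of end_pos.

    sequence is the genomic sequence on the transcript strand; end_pos is a
    0-based candidate 3' end. The window bounds are illustrative assumptions.
    """
    last_start = end_pos - min_dist
    if last_start < 0:
        return False  # not enough upstream sequence to hold a signal
    first_start = max(0, end_pos - max_dist)
    # Any hexamer found in this slice starts between first_start and last_start.
    window = sequence.upper()[first_start:last_start + 6]
    return any(signal in window for signal in POLYA_SIGNALS)


def predict_transcript_ends(est_end_positions, sequence, min_support=2):
    """Keep candidate ends supported by enough ESTs and by an upstream signal.

    est_end_positions lists the 3' end coordinate of each aligned EST.
    min_support is a hypothetical threshold, not a value from the report.
    """
    support = Counter(est_end_positions)
    return sorted(
        pos
        for pos, n_ests in support.items()
        if n_ests >= min_support and has_upstream_polya_signal(sequence, pos)
    )
```

In this sketch a candidate end is reported only when both lines of evidence agree: several ESTs terminate at the same position, and a signal hexamer sits in the expected window upstream of it. A real pipeline would also need to handle alignment quality, strand, internal priming on genomic A-rich stretches, and clustering of nearby end positions, none of which are modelled here.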
The results of the analysis of the complete human and murine genomes are available as data tables and through the Transcriptome Sailor web tool [http://www.ogic.ca/ts/], which allows examination of particular genomic regions for predictions and evidence.

Analysis of gene expression in stem cell differentiation

Gene expression repositories generated for a particular objective within a single laboratory overcome some of the problems that pervade such data, such as platform heterogeneity and poor sample quality. The Stem Cell Genomics Project (http://www.ottawagenomecenter.ca/projects/stemcellgenomics) used this approach to produce a resource of gene expression experiments that follow stem cell differentiation. In this context, we produced a database of stem cell microarray data [StemBase; http://www.stembase.ca/] describing gene expression (from cDNA microarrays and serial analysis of gene expression experiments) in more than 200 samples of stem cells and their derivatives in mouse and human (Sandie et al., 2009). We illustrated how to use these data to study stem cell biology at three levels: cells, genetic networks, and particular genes. The database provides a variety of querying mechanisms for different tasks, including the detection of gene markers [http://www.ogic.ca/projects/markerserver/] and the use of the UCSC Genome Browser to display the data in genomic context.

Identification of protein repeats

Since 1995, when we found the first homology of the Huntington’s disease protein (huntingtin) to other protein sequences to be a repeat of around 40 amino acids (HEAT repeat; Andrade and Bork, 1995), few advances have been made in the characterization of these repeats in huntingtin or in other proteins. In the case of huntingtin, this enormously hampers the elucidation of its normal function and of the mechanisms that trigger Huntington’s chorea. To approach this problem, we developed a neural network [ARD; http://www.ogic.ca/
