13.07.2015

The Genom of Homo sapiens.pdf

HAPLOTYPE EXPRESSION ASSOCIATIONS

…significance of the result in the face of multiple testing of many markers. For this purpose, the omnibus statistic is recalculated for up to 6,250 randomly permuted data sets representing the null hypothesis of no true association. The permutation was done in such a way that the expression values were randomly redistributed among the subjects, whereas the genetic assignments were left unchanged. The fraction of these instances of the null hypothesis that result in a more extreme omnibus statistic estimates the probability of obtaining such an extreme statistic by chance alone, and is called the p value. More specifically, if N random permutations are performed and M of them yield an omnibus statistic more extreme than that of the original data, the p value is p = M/N. This procedure is nonparametric, meaning it makes no assumptions about the distribution of the expression values. In addition, because an omnibus statistic is used, the testing of multiple markers is fully accounted for. In fact, most other methods used to adjust for multiple comparisons would result in a large loss of sensitivity, since they cannot accurately account for the high degree of correlation between the markers, many of which differ only slightly from each other.
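The permutation procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the omnibus statistic is not defined in this part of the text, so `omnibus_stat` is a hypothetical stand-in (the largest absolute Pearson correlation between any marker column and the expression vector, with each subject's genotype at each marker coded numerically). The staged schedule N = 50, 250, 1,250, 6,250 with the stop at M ≥ 10 is taken from the next paragraph of the text; treating the stages as nested, so that each stage tops up the permutations already drawn, is an assumption.

```python
import numpy as np
from typing import Sequence, Tuple


def omnibus_stat(genotypes: np.ndarray, expression: np.ndarray) -> float:
    """Hypothetical stand-in for the omnibus statistic: the largest absolute
    Pearson correlation between any marker column and the expression vector."""
    g = genotypes - genotypes.mean(axis=0)  # center each marker column
    e = expression - expression.mean()      # center the expression vector
    num = g.T @ e
    den = np.sqrt((g ** 2).sum(axis=0) * (e ** 2).sum())
    # Monomorphic markers (zero variance) get r = 0 instead of NaN.
    r = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(np.max(np.abs(r)))


def permutation_p_value(
    genotypes: np.ndarray,
    expression: np.ndarray,
    stages: Sequence[int] = (50, 250, 1250, 6250),
    stop_at: int = 10,
    seed: int = 0,
) -> Tuple[float, int, int]:
    """Estimate p = M/N by shuffling expression values among subjects while the
    genotypes stay fixed. Assumption: stages are nested (each one tops up the
    permutations already drawn to the next N). Returns (p, M, N)."""
    rng = np.random.default_rng(seed)
    observed = omnibus_stat(genotypes, expression)
    m = 0       # permutations with a strictly more extreme statistic
    n_done = 0  # permutations drawn so far
    for n_target in stages:
        while n_done < n_target:
            permuted = rng.permutation(expression)  # redistribute expression values
            if omnibus_stat(genotypes, permuted) > observed:
                m += 1
            n_done += 1
        if m >= stop_at:
            break  # p is already clearly large; skip the larger stages
    return m / n_done, m, n_done
```

A returned p of 0 means that no permutation exceeded the observed statistic, i.e., p < 1/N for the final N. Because the stand-in statistic takes a maximum over markers and the genotypes are never shuffled, each permutation re-evaluates all markers jointly, so the correlation between markers is preserved in the null distribution rather than corrected for separately.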
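The expression-quality filters that the text applies after the full association run (described in the paragraph that follows) can be sketched the same way. The function names are hypothetical; the inputs are assumed to be one expression level's values and its MAS 5.0 present/absent calls across the 89 samples, signal values are assumed positive so that the max/min ratio is defined, and the replicate check simply takes the three values from one individual, since their layout in the data is not given here.

```python
from typing import Optional, Sequence

import numpy as np


def passes_quality_filters(
    values: np.ndarray,    # expression values for one expression level, one per sample
    present: np.ndarray,   # boolean present/absent calls for the same samples
    min_present: int = 10,
    min_fold_change: Optional[float] = 2.0,
) -> bool:
    """Keep an expression level only if it has at least `min_present` present
    calls and, when `min_fold_change` is set, a max/min ratio of at least that
    value over the samples with present calls."""
    if int(present.sum()) < min_present:
        return False  # too few present calls (e.g. fewer than 10 of 89)
    if min_fold_change is not None:
        v = values[present]  # only samples with "present" calls
        # Assumes positive signal values, so the ratio is well defined.
        if v.max() / v.min() < min_fold_change:
            return False
    return True


def replicates_consistent(replicate_values: Sequence[float], max_fold: float = 2.0) -> bool:
    """False if the replicate samples from one individual differ by more than
    `max_fold`-fold; such genes were excluded from the analysis."""
    r = np.asarray(replicate_values, dtype=float)
    return bool(r.max() / r.min() <= max_fold)
```

Both thresholds are parameters because the text applies the fold-change filter only for some analyses; passing `min_fold_change=None` disables it while keeping the present-call filter.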
For efficiency, the permutation testing was done on successively larger sets of permutations, N = 50, 250, 1,250, and 6,250, and the next larger set was skipped when the number of false positives M in the previous set was 10 or more.

After the full association analysis was run, we further filtered the results based on the quality of the expression data. Each expression level for each sample is associated with both a quantitative expression value and a binary present/absent label. This label is calculated by a statistical algorithm in the MAS 5.0 software; a present label means that there is a probability of less than 5% that there is no signal and that what is measured is only noise. For most analyses, we excluded fragments for which there were fewer than 10 present calls out of 89 in total. Additionally, for some analyses, we excluded fragments for which the maximum fold change was less than two. The maximum fold change is the ratio of the maximum to the minimum expression value for that expression level, taken over the subset of samples with “present” calls. In addition, we excluded from the analysis genes whose mRNA expression levels differed more than twofold among the three samples from the same individual.

RESULTS

Association analysis was carried out examining 6,184 gene loci against 22,242 expression levels. At a significance level of p ≤ 0.001, we found 66,141 significant associations between gene loci and expression levels. Because this is such a large and rich data set, we have concentrated our analysis on two interesting subsets of associations. The first of these analyses concentrates on a set of cis-acting SNPs, i.e., SNPs that affect the expression levels of the genes in which they reside.
The second analysis examines a gene coding for a largely uncharacterized zinc finger protein (ZNF295), which likely forms part of a transcription factor complex. Two amino acid-changing SNPs in this locus are strongly associated with the expression levels of a large number of other genes. Finally, we examine the global statistics of the data set and attempt to compare our results with some reported previously.

cis-Acting Genetic Variations

Not all of the gene loci for which we have haplotype data are represented by expression probes, and, vice versa, not all of the genes represented by expression probes have known haplotypes associated with them. In our experiment, there were 4,762 gene loci and 7,553 expression probes for which each of the loci belonged to the same gene as one or more of the probes. We identified 22 genes from our initial screen that have candidate cis-associations significant at a level of p