12.07.2015 Views

View - ResearchGate

182 Ho et al.

In the special case of two genes, the class-specific, standardized entropy is a function of the class-specific correlation coefficient $\rho_k$:

$$E_k = -\left[\frac{1+\rho_k}{2}\log_2\!\left(\frac{1+\rho_k}{2}\right) + \frac{1-\rho_k}{2}\log_2\!\left(\frac{1-\rho_k}{2}\right)\right] \tag{11}$$

Although there is no simple closed-form expression for entropy for larger sets of genes, eigenvalues can be calculated efficiently, and methods for doing so are implemented in many computational programs.

To assess significance, class labels are repeatedly permuted and the entropy score is recalculated over all gene sets under consideration. Each permutation gives a distribution of null scores, which are averaged to produce a stable reference distribution. As usual, the p-value is the proportion of null scores that exceed the observed value. There is one caveat concerning the calculation of the pooled correlation value. If the class-specific sample sizes are very different, the larger one may dominate the pool. In that case, one might weight by sample size when calculating the pooled correlation to equalize the influence of the two classes.

3.2. Simulation-Based Evaluation of Methods

To compare methods, data were simulated from each of the archetypical two-class examples, cross and shift, and the performance of each method was evaluated by calculating power. In the two-class case, LA is equivalent to $S_{\text{cross}}$, and so the two methods coincide in these simulations. The data were simulated from normal distributions, with a sample size of 50 for each of the two classes. The type I error rate was set to α = 0.05 throughout. Null distributions for all methods were obtained by recalculating scores after permuting class labels. The power was computed as the frequency of simulated data sets with a test statistic exceeding the 95th percentile of the null distribution.

To simulate shift patterns, samples were drawn from class-specific bivariate normal distributions.
Class 1 was drawn from a $N(\mu_1 = d, \mu_2 = d, \sigma_1 = 1, \sigma_2 = 1, \rho = \rho_0)$ distribution and class 2 was drawn from $N(\mu_1 = -d, \mu_2 = -d, \sigma_1 = 1, \sigma_2 = 1, \rho = \rho_0)$, where d is allowed to vary. Thus, the expression levels for both genes are increased in one class and decreased in the other, whereas the correlation for the two classes remains identical.

Figure 4 demonstrates the power of the three methods to detect shift patterns. Power is shown as a function of the shift d between the distributions of the two classes (on the x-axis) and the correlation of the class-conditional distributions (by panel). The largest ECF-statistic is consistently among the most powerful. $S_{\text{shift}}$ matches its power at low correlations, whereas the entropy score matches it at higher correlations. For all methods, power increases with both the shift and the class-conditional correlation, with the exception of combinations of low correlation and large shifts, a situation in which increasing the shift will decrease the
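As a rough illustration of the shift simulation and the two-gene entropy of Eq. (11), the following Python sketch draws the two classes and evaluates the class-specific entropy from each sample correlation. It is a minimal sketch under stated assumptions, not the authors' code: the function names `two_gene_entropy` and `simulate_shift` are hypothetical, the values d = 0.5 and ρ₀ = 0.6 are chosen only for illustration, and only the per-class sample size of 50 comes from the text.

```python
import numpy as np

def two_gene_entropy(rho):
    """Standardized entropy of a 2x2 correlation matrix, following Eq. (11)."""
    # The eigenvalues of [[1, rho], [rho, 1]] are 1 + rho and 1 - rho;
    # dividing by the number of genes (2) standardizes them to sum to 1.
    lam = np.array([(1.0 + rho) / 2.0, (1.0 - rho) / 2.0])
    lam = lam[lam > 0]  # treat 0 * log2(0) as 0
    return float(-np.sum(lam * np.log2(lam)))

def simulate_shift(d, rho0, n=50, rng=None):
    """Draw n samples per class: means (d, d) and (-d, -d), unit variances,
    and the same correlation rho0 in both classes."""
    rng = np.random.default_rng() if rng is None else rng
    cov = np.array([[1.0, rho0], [rho0, 1.0]])
    class1 = rng.multivariate_normal([d, d], cov, size=n)
    class2 = rng.multivariate_normal([-d, -d], cov, size=n)
    return class1, class2

# Illustrative values only (assumptions, not taken from the text).
rng = np.random.default_rng(0)
class1, class2 = simulate_shift(d=0.5, rho0=0.6, n=50, rng=rng)

for name, x in (("class 1", class1), ("class 2", class2)):
    r = np.corrcoef(x, rowvar=False)[0, 1]  # class-specific sample correlation
    print(name, "means:", x.mean(axis=0).round(2),
          "rho:", round(r, 2), "entropy:", round(two_gene_entropy(r), 3))
```

Because both classes share the same correlation, their entropies should differ only by sampling noise, while the class means differ by roughly 2d in each gene.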
