12.07.2015 Views

View - ResearchGate


188 Ho et al.

this section were not included in the simulation-based analysis. For notation in this section, consider a candidate set of G genes and two phenotypic groups with $n_1$ and $n_2$ samples, respectively. Then $u_i$, $i = 1, \ldots, n_1$, and $v_i$, $i = 1, \ldots, n_2$, are G-dimensional vectors representing sample intensities measured over the genes in the set.

In a recent article, Xiao and colleagues (4) considered multivariate searches for differentially expressed gene combinations. Their algorithm is built on a previously proposed multivariate test statistic (16) and successive selection of differentially expressed sets of genes (17). Their goal is to uncover subsets of predefined size G such that the multivariate distributions of expression in the two phenotypes differ. To score candidate gene sets, users need to choose a kernel function F(u, v) and calculate

$$
S = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{i'=1}^{n_2} F(u_i, v_{i'})
  - \frac{1}{2 n_1^2} \sum_{i=1}^{n_1} \sum_{i'=1}^{n_1} F(u_i, u_{i'})
  - \frac{1}{2 n_2^2} \sum_{i=1}^{n_2} \sum_{i'=1}^{n_2} F(v_i, v_{i'})
$$

wherein the sums are taken over all pairs of samples in each class. Distance functions are classical choices for F(u, v); the authors use the Euclidean distance function throughout. With that choice, the score S can be described as average between-group distance minus average within-group distance.

The search starts with an arbitrary set of G genes, which are then exchanged, one at a time, at random, with candidate genes from outside the set. Exchanges that do not improve the score are discarded, whereas if an exchange improves the score, the set is modified accordingly and the search continues for a set number of steps, or until a predefined criterion is met. Cross-validation is used to stabilize the results of the search procedure. A permutation test is used to evaluate significance, and a multiple-testing procedure is developed to control the familywise error rate when selecting combinations of genes. The approach uncovers sets that potentially consist of combinations of jointly and marginally differentially expressed genes.
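The score and the exchange search described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of Xiao and colleagues: the function names (`pairwise_mean_distance`, `score`, `exchange_search`), the fixed step budget, and the use of NumPy are assumptions made here, and the cross-validation, permutation test, and multiple-testing steps are omitted.

```python
import numpy as np


def pairwise_mean_distance(a, b):
    """Mean Euclidean distance over all pairs (row of a, row of b)."""
    diff = a[:, None, :] - b[None, :, :]          # shape (len(a), len(b), G)
    return np.sqrt((diff ** 2).sum(axis=2)).mean()


def score(u, v):
    """Score S with F chosen as the Euclidean distance.

    u: (n1, G) samples of group 1, v: (n2, G) samples of group 2.
    The within-group means run over all n^2 ordered pairs (including
    i = i'), so halving them matches the 1/(2 n^2) factors in S.
    """
    return (pairwise_mean_distance(u, v)
            - 0.5 * pairwise_mean_distance(u, u)
            - 0.5 * pairwise_mean_distance(v, v))


def exchange_search(x1, x2, set_size, n_steps, rng):
    """Random one-gene-at-a-time exchange search (hypothetical helper).

    x1: (n1, P) and x2: (n2, P) expression matrices over all P genes.
    Returns the sorted gene indices of the final set and its score.
    """
    n_genes = x1.shape[1]
    current = rng.choice(n_genes, size=set_size, replace=False)  # arbitrary start
    best = score(x1[:, current], x2[:, current])
    for _ in range(n_steps):
        outside = np.setdiff1d(np.arange(n_genes), current)
        candidate = current.copy()
        # Swap one randomly chosen member for a random gene outside the set.
        candidate[rng.integers(set_size)] = rng.choice(outside)
        s = score(x1[:, candidate], x2[:, candidate])
        if s > best:                               # keep only improving exchanges
            current, best = candidate, s
    return np.sort(current), best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1 = rng.normal(size=(20, 10))
    x2 = rng.normal(size=(20, 10))
    x2[:, :2] += 10.0                              # genes 0 and 1 differ between groups
    genes, best = exchange_search(x1, x2, set_size=2, n_steps=500, rng=rng)
    print(genes, best)
```

Because only improving exchanges are accepted, a search of this kind can stall in a local optimum, which is one reason a stabilizing step such as the cross-validation mentioned above is useful.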
Kostka and Spang (10) took a different approach to the basic problem. The goal of their methodology was to identify sets of genes that are normally tightly coregulated but become dysregulated in a diseased state. The measure of coregulation of a gene set G within class k, denoted $S(G, k)$, was previously suggested by Cheng and Church (18). It is calculated as the mean squared residual obtained over values of g in $1, \ldots, G$ and i in $1, \ldots, n_k$, after fitting the following model

$$
y_{ig} = a_{Gk} + b_{ik} + c_{gk} + \varepsilon_{ig}
$$

Small values of S indicate strong correlation, so if the genes in G are tightly correlated in phenotypic group k but not in group k′, then the ratio $S(G, k)/S(G, k')$ will be small. The search procedure begins with an arbitrary set of genes,
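The coregulation score $S(G, k)$ and the ratio described above can be sketched as follows. This is a minimal illustration under stated assumptions: the model is read as a two-way additive fit with an overall effect, a sample effect, and a gene effect (the form behind Cheng and Church's mean squared residue), and the function names `mean_squared_residual` and `coregulation_ratio` are hypothetical, not taken from Kostka and Spang.

```python
import numpy as np


def mean_squared_residual(y):
    """Mean squared residual of a two-way additive fit.

    y: (n_k, G) matrix of expression values for the samples of one
    class, restricted to the genes of the candidate set.
    For a complete matrix, the least-squares fit of
    overall + sample effect + gene effect is
    row mean + column mean - grand mean.
    """
    row = y.mean(axis=1, keepdims=True)    # per-sample means
    col = y.mean(axis=0, keepdims=True)    # per-gene means
    resid = y - row - col + y.mean()
    return (resid ** 2).mean()


def coregulation_ratio(y_k, y_kprime):
    """S(G, k) / S(G, k'): small when the set is tightly coregulated
    in class k but not in class k'."""
    return mean_squared_residual(y_k) / mean_squared_residual(y_kprime)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample_effect = rng.normal(size=(15, 1))
    tight = sample_effect + 0.1 * rng.normal(size=(15, 5))   # genes move together
    loose = rng.normal(size=(15, 5))                          # genes unrelated
    print(coregulation_ratio(tight, loose))
```

A matrix that is exactly additive in sample and gene effects has a residual of zero, so a smaller score corresponds to tighter coregulation, in line with the interpretation given in the text.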
