12.07.2015 Views

View - ResearchGate


Identifying Differentially Expressed Gene Combinations

adding or removing genes to improve the score until further improvement is not possible, or until a predetermined number of iterations are completed. Gene sets are not required to be of a particular size, so they are added or removed individually.

In recent years, several network-based methods for discovering gene coexpression patterns have been proposed. Bayesian networks are most frequently used in this fashion, although Boolean networks and other approaches have been applied as well (19). Bayesian networks offer a graphical representation of the dependence structure among a set of variables. In the gene-expression setting, genes are represented in the network by nodes, with edges connecting those nodes when genes strongly coregulate. The parameters of the underlying Bayesian model can be estimated independently of the graphical component of the network model and summarized by nongraphical means. However, the graphical network representation offers additional intuitive, potentially informative, and possibly biologically relevant features with which gene interactions can be characterized. Examples include the degree of connectivity seen in a set of correlated genes and the number of distinct components, or gene sets that can be identified. Candidate network structures can be scored for goodness of fit of the dependence relationships observed in the data. Graphical features that characteristically associate with high-scoring network structures are likely to be interesting.

Every possible network structure corresponds to a Bayesian model, which can be fitted to the data. The score for a network is calculated as the log likelihood of the corresponding model, and so the best-fitting network is one that corresponds to the maximum likelihood model.

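
To make the scoring step concrete, here is a minimal sketch of a log-likelihood score for one candidate structure. It assumes expression levels have already been discretized, and the function name `log_likelihood_score`, the row-of-dicts data layout, and the toy data are all hypothetical choices made for this illustration rather than anything specified in the text.

```python
import math
from collections import Counter

def log_likelihood_score(data, parents):
    """Log likelihood of a discrete Bayesian network at its ML parameters.

    data    : list of dicts mapping gene name -> discretized expression level
    parents : dict mapping each gene to a tuple of its parent genes (a DAG)
    """
    score = 0.0
    for gene, pa in parents.items():
        # Counts of (parent configuration, gene value) and of parent configuration.
        joint = Counter((tuple(row[p] for p in pa), row[gene]) for row in data)
        marginal = Counter(tuple(row[p] for p in pa) for row in data)
        # The ML estimate of P(gene | parents) is the ratio of the two counts.
        for (pa_state, _), n in joint.items():
            score += n * math.log(n / marginal[pa_state])
    return score

# Toy data (hypothetical): gene B always copies gene A.
rows = [{"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 1}]
with_edge = log_likelihood_score(rows, {"A": (), "B": ("A",)})
no_edge = log_likelihood_score(rows, {"A": (), "B": ()})
print(with_edge, no_edge)
```

On this toy data the structure with the edge A -> B scores about -2.77 against about -5.55 for the empty network, so the edge is preferred. Note that adding an edge can never lower a pure maximum-likelihood score, so practical scoring schemes usually add a complexity penalty or a prior; that refinement is not shown here.
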
The space of all possible networks grows exponentially with the number of genes/nodes under consideration and so, as in the methods described earlier in this section, greedy stochastic search algorithms are used to navigate the network space. Edges are added or removed at random to improve the overall fit, and the search stops after a predetermined number of steps or when improvement is no longer possible.

Work by Friedman and colleagues (11,20) is representative of results in this area. The investigators search for the network structure that best fits a set of gene-expression data, identify biologically interesting graphical features of that network, and assign bootstrap-based confidences to the discoveries. The Markov blanket of a set of genes/nodes is one such feature. Imagine that a set of nodes X is isolated in a corner of the network, relating to the remaining nodes only through the mediation of a small set of neighbors Y. Then Y is described as the Markov blanket of X. A bootstrap procedure is used to assign confidences to discovered features. The data are repeatedly resampled with replacement, and each time the search for the best-fitting network structure is performed on the resampled data. The proportion of bootstrap samples exhibiting the feature under study is taken as the confidence level for the feature.
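
The search and bootstrap steps described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Friedman and colleagues: the names `penalized_score`, `greedy_search`, `markov_blanket`, and `bootstrap_confidence` are hypothetical, the BIC-style penalty is an addition not taken from the text, the search stops only after a fixed number of steps, and the Markov blanket is computed for a single node using the standard definition (parents, children, and the children's other parents).

```python
import math
import random
from collections import Counter

def penalized_score(data, parents):
    """Log likelihood minus a BIC-style penalty (the penalty is an assumption)."""
    total = 0.0
    for gene, pa in parents.items():
        pa = sorted(pa)
        joint = Counter((tuple(r[p] for p in pa), r[gene]) for r in data)
        marg = Counter(tuple(r[p] for p in pa) for r in data)
        total += sum(c * math.log(c / marg[s]) for (s, _), c in joint.items())
        levels = len({r[gene] for r in data})
        total -= 0.5 * math.log(len(data)) * (levels - 1) * len(marg)
    return total

def creates_cycle(parents, u, v):
    """True if adding the edge u -> v would close a directed cycle."""
    stack, seen = [u], set()
    while stack:
        node = stack.pop()
        if node == v:          # v is already an ancestor of u
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def greedy_search(data, genes, steps, rng):
    """Toggle one random edge per step; keep the change only if the score improves."""
    parents = {g: frozenset() for g in genes}
    best = penalized_score(data, parents)
    for _ in range(steps):
        u, v = rng.sample(genes, 2)
        if u in parents[v]:
            trial_pa = parents[v] - {u}      # try removing u -> v
        elif creates_cycle(parents, u, v):
            continue                         # skip moves that break the DAG
        else:
            trial_pa = parents[v] | {u}      # try adding u -> v
        trial = {**parents, v: trial_pa}
        score = penalized_score(data, trial)
        if score > best:
            parents, best = trial, score
    return parents

def markov_blanket(parents, x):
    """Parents, children, and the children's other parents of node x."""
    children = {g for g, pa in parents.items() if x in pa}
    blanket = set(parents[x]) | children
    for child in children:
        blanket |= parents[child]
    blanket.discard(x)
    return blanket

def bootstrap_confidence(data, genes, feature, n_boot=30, steps=60, seed=0):
    """Fraction of bootstrap resamples whose learned network shows the feature."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]   # resample with replacement
        hits += feature(greedy_search(sample, genes, steps, rng))
    return hits / n_boot

# Toy data (hypothetical): B follows A 90% of the time, C is independent noise.
rng = random.Random(1)
data = []
for _ in range(150):
    a = rng.randint(0, 1)
    b = a if rng.random() < 0.9 else 1 - a
    data.append({"A": a, "B": b, "C": rng.randint(0, 1)})
genes = ["A", "B", "C"]
conf_ab = bootstrap_confidence(data, genes, lambda net: "B" in markov_blanket(net, "A"))
conf_ac = bootstrap_confidence(data, genes, lambda net: "C" in markov_blanket(net, "A"))
print(conf_ab, conf_ac)
```

Markov blanket membership is used as the feature because it does not depend on edge direction, which the data alone often cannot determine: A -> B and B -> A fit equally well here, but both place B in the blanket of A.
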
