12.07.2015 Views

View - ResearchGate

136 Davuluri

TF-map alignments of orthologous promoters is suggested. The effectiveness of either of the approaches largely depends on the quality and availability of orthologous promoter sequences. The use of the OMGProm database (61), which contains the promoters of orthologous mammalian genes and their sequence alignments, is suggested.

3.1.3. Classification of Target Promoters From Nontargets and Inferring cis-Regulatory Modules (TFBS Clusters) Using Decision Tree Methodology

TF interaction is an important aspect of mammalian gene regulation. Through the fine tuning of different partners, a specific TF could be involved in different cellular processes and achieve opposite downstream effects by either activating or repressing the direct target promoters (3). Different methods to infer cis-regulatory modules in a given set of target promoters have been developed (3,24,46,62,63). Most of the methods rely on discriminating a set of target promoters from nontarget promoters by using TFBSs or sequence motifs as feature variables in a classification function. The best discriminating feature variables (e.g., TFBSs) are then extracted to infer the cis-regulatory modules. In this protocol, the use of decision tree approaches is recommended for their simplicity and interpretability.

Tree-based statistical methods have become increasingly popular since the publication of the CART monograph (64). These approaches have many advantages over discriminant analysis, as tree-based models are easy to interpret, are nonparametric, and make no assumptions regarding the covariance structure of the two groups. CART analysis provides a better understanding of the dependence of the response variable (y_i) (promoter status: target or nontarget in the present case) on the structure of the relationships of potential explanatory variables (x_i) (e.g., TFBSs: present or not present in a given promoter) and their combinations, together with their high-level interactions.
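As a rough illustration of the setup described above, the sketch below encodes each promoter as a row of binary TFBS presence values (the explanatory variables) with a target/nontarget label (the response), and ranks the TFBSs by how well a single split on each one separates the two classes. The text does not say which purity measure is used; Gini impurity is assumed here because it is a common choice for classification trees. The TFBS names, the toy data, and the helper functions `gini` and `impurity_decrease` are all hypothetical and are not taken from the protocol.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(feature, labels):
    """Drop in Gini impurity when promoters are split on one binary feature."""
    present = [y for x, y in zip(feature, labels) if x == 1]
    absent = [y for x, y in zip(feature, labels) if x == 0]
    n = len(labels)
    weighted = (len(present) / n) * gini(present) + (len(absent) / n) * gini(absent)
    return gini(labels) - weighted


# Hypothetical toy data: 1 = TFBS present in the promoter, 0 = absent.
tfbs_presence = {
    "TFBS_A": [1, 1, 1, 1, 0, 0, 0, 0],
    "TFBS_B": [1, 0, 1, 0, 1, 0, 1, 0],
    "TFBS_C": [1, 1, 1, 0, 1, 0, 0, 0],
}
# Response variable: 1 = target promoter, 0 = nontarget promoter.
promoter_status = [1, 1, 1, 1, 0, 0, 0, 0]

# Rank the features from most to least discriminating.
ranking = sorted(
    ((name, impurity_decrease(col, promoter_status)) for name, col in tfbs_presence.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, gain in ranking:
    print(f"{name}: impurity decrease = {gain:.3f}")
```

In this toy data, TFBS_A separates targets from nontargets perfectly and ranks first, while TFBS_B carries no information about promoter status. A full tree would repeat this ranking within each resulting subset of promoters rather than stopping at a single split.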
If (y_i) is binary, CART produces a classification tree, whereas if the response variable is continuous, a regression tree is produced. In essence, CART uses recursive partitioning and asymmetric stratification to develop tree-like models. CART splits the data at a parent node by determining a cutoff value along the range of values for an explanatory variable, thus producing two child nodes with greater homogeneity (purity) than the parent node.

Child nodes are recursively treated as parent nodes, thereby continuously splitting the data until a stopping criterion is reached and a set of terminal nodes is produced, which in total resemble an inverted tree. Overfitted trees are grown, and then pruning trims the trees to a more optimal size using test samples or cross-validation. Each terminal node is assigned a class that is determined by the
