12.07.2015 Views

View - ResearchGate


Modeling Transcription Factor Target Promoters

Select a subset of variables by removing a certain percentage (20–25%) of the least important variables, using the mean decrease in accuracy and/or Gini index values in columns 3 and 4 of the importance matrix (see Notes 3 and 4).

8. Repeat steps 8 and 9 by using the selected subset of variables 5–10 times, and finally select a subset of 10–20% of the most important variables.

9. Using the subset of variables selected in step 9, run CART to produce the decision tree that classifies the two sets of promoters. The splitting rules at each node can be interpreted as “if-then” statements that show which TFBSs are present in each class of promoters and thereby identify the regulatory modules.

3.2.2. Results of Application to ER-α Targets

The above steps (except the Random Forest steps 8 and 9) have been successfully implemented in earlier studies to classify ER-α targets from nontargets (24) and acetylated ER-α targets from methylated ER-α targets (3,24). A manuscript describing the above algorithm is currently under preparation. An automated version of the computational pipeline will soon be made available (see Note 5).

To demonstrate the above steps, the ER-α target data set from Cheng et al. (3), consisting of acetylated ER-α promoters (target set) and methylated ER-α promoters (nontarget set), is used. Briefly, ChIP-chip experiments were conducted by probing the 12 K CpG-island microarray (30) with a series of different ChIP assays using antibodies against ER-α, acetyl-, and dimethyl-H3-K9 in MCF7 cells treated with E2 for 0, 3, 12, and 24 h. Integrated statistical and genome analysis of these data identified 92 ER-α target promoters, of which 40 were classified as acetylated (upregulated) and 28 as methylated (downregulated) targets. Retrieve the human and mouse orthologous promoter sequences that correspond to these probes from the OMGProm database, and run the MATCH program on both the human and mouse sequences.
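The variable-elimination loop described in the steps above (drop the least important 20–25% of variables, then repeat several times) can be sketched as follows. This is a minimal illustration in Python, not the authors' pipeline. The function name `eliminate_variables` and the callable `importance_fn` are hypothetical: `importance_fn` stands in for refitting a Random Forest on the current subset and returning one importance score per variable (for example, the mean decrease in accuracy). The default drop fraction and number of rounds are taken from the ranges quoted in the text.

```python
import math
from typing import Callable, Dict, List


def eliminate_variables(
    variables: List[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    drop_fraction: float = 0.2,
    rounds: int = 5,
) -> List[str]:
    """Repeatedly drop the least important variables.

    importance_fn is a hypothetical stand-in for refitting a Random Forest
    on the current subset and reading off one score per variable.
    """
    current = list(variables)
    for _ in range(rounds):
        if len(current) <= 1:
            break
        scores = importance_fn(current)
        # Rank from most to least important.
        ranked = sorted(current, key=lambda v: scores[v], reverse=True)
        # Drop at least one variable per round, but never all of them.
        n_drop = max(1, math.floor(drop_fraction * len(ranked)))
        n_keep = max(1, len(ranked) - n_drop)
        current = ranked[:n_keep]
    return current
```

In practice `importance_fn` would wrap whatever Random Forest implementation is in use. The importance scores are recomputed in every round because they can shift once weak variables are removed.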
First find the TFBSs of ER-α (the primary TF of interest). Table 2 gives the list of genes and the genomic coordinates of the sequences analyzed. Then locate all the TFBSs within the −220 to +220 region around the predicted ERE, and prepare the data matrix as explained in step 7. Table 3 presents part of the data matrix, which includes the top-ranking TFs as determined by Random Forest variable importance (in step 9). The original data matrix contains all the TFs that have at least one TFBS in 20% of either of the promoter sets. Figure 2 presents the plot of variable importance obtained in step 9. Then select the top 10 ranking variables, ranked according to the mean decrease in accuracy, for step 10. Here the number of variables selected was chosen arbitrarily; the user should repeat step 10 while varying this number. Using the subdata matrix that contains only the selected 10 variables, run the CART and/or rpart program.

Figure 3A presents a minimal cost tree constructed based on these TFBSs as the categorical predictor variables. The prediction rate based on 10-fold
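The windowing and frequency filter described above (TFBSs within ±220 of the predicted ERE, and TFs with at least one TFBS in 20% of either promoter set) can be sketched as follows. This is an illustration under stated assumptions, not the chapter's implementation. The hit format `(tf_name, position)`, the function names, the inclusive window boundary, and the 0/1 presence coding are all hypothetical, because step 7 is not reproduced here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical hit format: (TF name, position), in the same coordinate
# system as the predicted ERE position.
Hit = Tuple[str, int]


def tfs_near_ere(hits: List[Hit], ere_pos: int, window: int = 220) -> Set[str]:
    """Return TFs with at least one site within +/- window of the ERE."""
    return {tf for tf, pos in hits if abs(pos - ere_pos) <= window}


def frequent_tfs(
    promoter_sets: Dict[str, List[Set[str]]], min_fraction: float = 0.2
) -> List[str]:
    """Keep TFs present in at least min_fraction of promoters in any one set."""
    kept: Set[str] = set()
    for tf_sets in promoter_sets.values():
        if not tf_sets:
            continue
        for tf in set().union(*tf_sets):
            n_present = sum(tf in s for s in tf_sets)
            if n_present / len(tf_sets) >= min_fraction:
                kept.add(tf)
    return sorted(kept)


def build_matrix(
    promoter_sets: Dict[str, List[Set[str]]], tfs: List[str]
) -> List[Tuple[List[int], str]]:
    """Build presence/absence rows (1/0, an assumed coding) with class labels."""
    rows = []
    for label, tf_sets in promoter_sets.items():
        for s in tf_sets:
            rows.append(([1 if tf in s else 0 for tf in tfs], label))
    return rows
```

Each row of the resulting matrix is one promoter, each column one retained TF, and the label records whether the promoter belongs to the target or nontarget set.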
