12.07.2015 Views

View - ResearchGate

138 Davuluri

discussed which programs to choose and how to use those programs in practice. Given two sets of genome coordinates (usually genomic locations of target probes and nontarget probes from ChIP-chip experimental data analysis), the following steps are recommended in classifying target promoters from nontargets and inferring the cis-regulatory modules. Several alternatives for target and nontarget sets are possible (e.g., acetylated promoters vs methylated promoters, target promoters of a specific TF vs nontarget promoters, and methylated vs unmethylated CpG islands).

3.2.1. Procedures

1. Retrieve human and mouse orthologous sequence regions by extending 500 bp at both ends of each probe of the input set of genome coordinates. Each of the retrieved sequences is then 1 kb plus the probe length (60 bp for the Agilent promoter array; Agilent Technologies [http://www.home.agilent.com.]), with a corresponding orthologous region of similar length in human or mouse. The user can use either OMGProm or the UCSC Genome Browser to retrieve the sequences.
2. Use MATCH to predict the TFBSs in each of the sequences by using the minSum cutoff profile.
3. Consider the conserved TFBSs by comparing the MATCH predictions in orthologous promoter pairs. See Note 2 for alternative approaches to predict TFBSs.
4. Choose a primary TF of interest and locate its conserved binding sites. For example, Estrogen Receptor (ER)-α would be the primary TF of interest if the ChIP-chip data were obtained by using an antibody against the ER-α TF. If multiple TFBSs of the primary TF are predicted within a given sequence region, choose the TFBS closest to the center of the probe.
5. Locate all the TFBSs within the −220- to +220-bp region of the primary TFBS for each sequence. Prepare a data matrix ($x_{ij}$), in which the i-th row (object) and j-th column (variable) correspond to the i-th promoter and j-th TF, respectively. The data matrix is binary in nature, such that $x_{ij} = 1$ or $0$ depending on whether the j-th TF has its binding site located or not located in the i-th promoter. Similarly, prepare the classification vector or response variable $y_i$, such that $y_i = 1$ or $0$ depending on whether the i-th promoter is a target or nontarget. The user may also use the actual counts (number of TFBSs present in the promoter for each TF) in the data matrix, in which case the data matrix is not binary but quantitative in nature.
6. Run the RandomForest program in the R console by using the “randomForest” command (e.g., ER.rf
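Steps 4 and 5 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's own code: the input layout (a list of `(TF name, position)` pairs per promoter, with positions in bp relative to the probe center), the function names `pick_primary_site` and `build_matrix`, the TF names, and the choice to give an all-zero row to a promoter with no primary-TF site are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical input layout: one list of (TF name, position) pairs per promoter.
# Positions are in bp relative to the probe center (0 = center of the probe).
Site = Tuple[str, int]


def pick_primary_site(sites: List[Site], primary_tf: str) -> Optional[int]:
    """Step 4: position of the primary-TF site closest to the probe center."""
    positions = [pos for tf, pos in sites if tf == primary_tf]
    if not positions:
        return None
    return min(positions, key=abs)


def build_matrix(promoters: List[List[Site]], primary_tf: str, tfs: List[str],
                 window: int = 220, counts: bool = False) -> List[List[int]]:
    """Step 5: rows are promoters, columns are TFs.

    An entry is 1 (or a count, if counts=True) when the TF has a site within
    +/- `window` bp of the chosen primary-TF site, else 0.
    Assumption: a promoter without any primary-TF site gets an all-zero row.
    """
    column = {tf: j for j, tf in enumerate(tfs)}
    matrix = []
    for sites in promoters:
        row = [0] * len(tfs)
        anchor = pick_primary_site(sites, primary_tf)
        if anchor is not None:
            for tf, pos in sites:
                if tf in column and abs(pos - anchor) <= window:
                    row[column[tf]] += 1
        if not counts:
            row = [1 if c > 0 else 0 for c in row]
        matrix.append(row)
    return matrix


# Toy data with made-up sites; "ER" stands in for the primary TF.
promoters = [
    [("ER", 30), ("ER", -400), ("AP1", 200), ("SP1", 300), ("AP1", 100)],
    [("ER", -50), ("FOXA1", -260), ("SP1", 500)],
]
tfs = ["AP1", "SP1", "FOXA1"]

x_binary = build_matrix(promoters, "ER", tfs)               # binary data matrix
x_counts = build_matrix(promoters, "ER", tfs, counts=True)  # quantitative variant
y = [1, 0]  # response vector: 1 = target promoter, 0 = nontarget (made-up labels)
```

The binary and count variants share one code path so that switching between them, as step 5 allows, only changes the final thresholding. The resulting matrix and response vector would then be exported to whatever classifier is used in step 6.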
