12.07.2015 Views

View - ResearchGate

134 Davuluri

Although PWM-based models do not capture the complexity of TF–DNA interactions and produce too many false predictions at genome scale, these simple and easily interpretable models provide a very good approximation to reality (38). To reduce the number of predictions found by chance, recent methods have incorporated additional information, such as the use of complex sequence motif models (39–41), conservation of TFBSs in orthologous promoters of closely related species (42–45), and clustering of binding sites in promoters of coregulated genes (3,24,46). In this protocol, a combination of sequence conservation and clustering of TFBSs of known PWMs is described for predicting and classifying the target promoters (see Note 1). Readers are encouraged to read recent reviews (47–49) for practical strategies to scan for TFBSs.

3.1.1. Identifying Candidate TFBSs by PWM Approach

A number of databases of experimentally supported TFBSs have been assembled (Table 1). The largest and perhaps most widely used databases are TRANSFAC (12) and JASPAR (11), which catalog eukaryotic TFs, associated binding sites, and PWMs. Similarly, PWM-based sequence scanning programs, such as MatInspector (50), MATCH (19), and MATRIX SEARCH (51), can be used to search the query sequences for candidate TFBSs by matching the corresponding PWMs. These programs are quite similar in their use of PWM databases (e.g., TRANSFAC or JASPAR) and of statistically principled methods for scoring the sites.

Choosing a cutoff threshold for the PWM score is the main requirement in determining whether a sequence site is a putative TFBS or not: the number of TFBS predictions in a candidate sequence decreases as the cutoff value is raised. A basic procedure to scan a query sequence using a PWM is illustrated in Fig. 1. MATCH uses the matrix library collected in the TRANSFAC database. MATCH has built-in optimized matrix cutoff values (called profiles), which were precalculated to provide three different search modes of varying stringency.
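The basic scanning idea described above can be illustrated with a minimal Python sketch. It is not the MATCH, MatInspector, or MATRIX SEARCH algorithm: it uses a plain log-odds score rescaled to the range 0 to 1, scans the forward strand only, and assumes a uniform background and a pseudocount of 1. The count matrix, the query sequence, and the function names `log_odds_matrix` and `scan` are hypothetical and chosen only for illustration; real matrices would come from a database such as TRANSFAC or JASPAR.

```python
import math

BASES = "ACGT"

# Hypothetical count matrix for a 4-bp motif (one dict per position).
# Illustrative numbers only; not taken from TRANSFAC or JASPAR.
COUNTS = [
    {"A": 1, "C": 1, "G": 1, "T": 7},
    {"A": 0, "C": 1, "G": 9, "T": 0},
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 1, "C": 7, "G": 1, "T": 1},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append(
            {b: math.log2((column[b] + pseudocount) / total / background)
             for b in BASES}
        )
    return pwm


def scan(sequence, pwm, cutoff):
    """Return (start, site, relative_score) for windows scoring >= cutoff.

    The raw log-odds sum is rescaled to [0, 1] using the lowest and
    highest scores attainable with this matrix (an assumed convention).
    """
    width = len(pwm)
    min_score = sum(min(col.values()) for col in pwm)
    max_score = sum(max(col.values()) for col in pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        if any(base not in BASES for base in site):
            continue  # skip windows containing ambiguous bases such as N
        raw = sum(pwm[i][base] for i, base in enumerate(site))
        relative = (raw - min_score) / (max_score - min_score)
        if relative >= cutoff:
            hits.append((start, site, round(relative, 3)))
    return hits


# Hypothetical query sequence, scanned at three cutoffs of rising stringency.
QUERY = "ACGTTGACGGTGACATGTCA"
PWM = log_odds_matrix(COUNTS)
for cutoff in (0.5, 0.8, 0.95):
    print(cutoff, scan(QUERY, PWM, cutoff))
```

Because a window is reported only when its score meets the cutoff, raising the cutoff can only shrink the hit list. This is the trade-off between false positives and false negatives that the choice of threshold controls.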
The user can choose one of the three predefined MATCH profiles: (1) minFP, cutoffs minimizing the false-positive rate; (2) minFN, cutoffs minimizing the false-negative rate; and (3) minSum, cutoffs minimizing the sum of both errors. The use of the minSum profile is suggested, because sequence conservation is added as an additional criterion to minimize the false-positive predictions in the next step.

3.1.2. Identification of Conserved TFBSs in Orthologous Promoters

As PWM-based methods tend to produce an overwhelming number of false positives, the phylogenetic footprinting, or comparative genomics, approach has been widely used by both experimental and computational biologists to aid regulatory element identification by examining orthologous sequences from multiple species (52). Recent studies (28,53,54) have identified blocks of highly conserved regions
