Combinatorial and High-Throughput Screening of Materials ...


ACS Combinatorial Science

possible to recognize these relationships with far greater speed and without necessarily depending on a priori models, provided the relevant data are available. The predictive aspect of data mining tasks can serve both classification and regression operations. Data mining, which is an interdisciplinary blend of statistics, machine learning, artificial intelligence, and pattern recognition, is viewed as having four core tasks:

Cluster analysis seeks to find groups of closely related observations and is valuable in targeting groups of data that may have well-behaved correlations and can form the basis of physics-based as well as statistically based models. Cluster analysis, when integrated with CHT experimentation, can serve as a powerful tool for rapidly screening combinatorial libraries.

Predictive modeling helps build models for targeted objectives (e.g., a specific materials property) as a function of input or exploratory variables. The success of these models also helps refine the relevance of the input parameters.

Association analysis is used to discover patterns that describe strongly associated features in data (for instance, the frequency of association of a specific materials property with materials chemistry). Such an analysis over extremely large data sets is made possible by the development of very high-speed search algorithms and can help to develop heuristic rules for materials behavior governed by many factors.

Anomaly detection does the opposite by identifying data or observations significantly different from the norm. The ability to identify such anomalies or outliers is critical in CHT experiments and high-throughput screening, especially to quantitatively assess the uncertainty and accuracy of results and to distinguish between true “discoveries” and false positives.
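As a rough illustration of the anomaly-detection task, the sketch below flags outliers in a single measured property using the modified z-score (median and median absolute deviation). The rule, the 3.5 threshold, the function name `mad_outliers`, and the sample readings are all assumptions chosen for illustration; the text does not prescribe a particular outlier test.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Hypothetical helper: uses the median and the median absolute deviation
    (MAD), which are less distorted by the outliers themselves than the
    mean and standard deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median; this rule cannot flag anything
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]

# Made-up readings from one library row; the sixth value is suspicious.
readings = [1.02, 0.98, 1.01, 0.99, 1.00, 1.75, 1.03, 0.97]
flagged = mad_outliers(readings)
print(flagged)
```

A flagged sample is only a candidate: whether it is a true "discovery" or a false positive still has to be settled by replicate measurements.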

The massive data generated by high-throughput characterization tools lead to the need for effective data analysis and interpretation to identify the trends and relationships in the data. 9 Advanced mathematical and statistical techniques are used in CHT screening to determine, often by indirect means, the properties of substances that otherwise would be very difficult to measure directly. 82,83 These techniques have been successfully applied for visualization and pattern recognition, 84,85 lead identification and optimization, 86,87 process optimization, 66,88 and development of predictive models and quantitative structure-activity relationships (QSARs) and quantitative structure-property relationships (QSPRs). 85,89,90 The implementation of multivariate calibration improves analysis selectivity when spectroscopic or any other types of measurements are performed with relatively low-resolution instruments. 91,92

Multivariate data analysis techniques are essential analytical tools in combinatorial materials science. The attractiveness of the partial least-squares (PLS) method has been demonstrated for in situ monitoring of combinatorial reactions using single-bead FTIR spectroscopy. 93 Using these tools, quantitative and qualitative analysis of combinatorial samples was performed with IR spectra with severely overlapped bands. A limitation of the PLS method is that it reveals only the changes due to reaction and cannot provide spectra for pure components. 93 This limitation can be overcome by using other multivariate analysis tools such as evolving factor analysis and multivariate curve resolution methods. 8

In combinatorial materials science, pattern recognition techniques find similarities and differences between samples of combinatorial libraries, serve as a powerful visualization and descriptor-identification tool, and can warn of the occurrence of abnormalities in the measured data. 84 These techniques can reveal correlated patterns in large data sets obtained from combinatorial experiments, large databases, or simulations, can determine the structural relationship among screening hits, and can significantly reduce data dimensionality to make it more manageable in the database with minimal loss of information. 79–81,94 Methods of pattern recognition include principal components analysis (PCA), hierarchical cluster analysis (HCA), soft independent modeling of class analogies (SIMCA), neural networks, and some others. 95,96

PCA is a widely used approach to searching for patterns by reducing the dimensionality of data because it provides an unsupervised and robust way to analyze multivariate data. PCA projects the data set onto a subspace of lower dimensionality with collinearity removed. PCA achieves this objective by explaining the variance of the data matrix in terms of weighted sums of the original variables without significant loss of information. These weighted sums of the original variables are called principal components (PCs). Each PC is a suitable linear combination of all the original variables. The first PC accounts for the maximum variance (eigenvalue) in the original data set. The second PC is orthogonal (uncorrelated) to the first and accounts for most of the remaining variance. Thus, the mth PC is orthogonal to all others and has the mth largest variance in the set of PCs. Once the N PCs have been calculated using eigenvalue/eigenvector matrix operations, only M PCs with variances above a critical level are retained. This M-dimensional PC space retains most of the information from the initial N-dimensional variable space by projecting it onto orthogonal axes of high variance and forming a suitable PCA model. 18
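The procedure described above can be sketched in a few lines of NumPy: center the data, take the eigenvalues and eigenvectors of the covariance matrix, sort the PCs by variance, and retain only those above a critical level. This is a minimal sketch under stated assumptions rather than the implementation used in the cited work. The function name `pca_eig`, the `critical_fraction` retention rule (keep PCs holding more than 5% of the total variance), and the synthetic data are all hypothetical.

```python
import numpy as np

def pca_eig(X, critical_fraction=0.05):
    """PCA by eigendecomposition of the covariance matrix.

    X is an (n_samples, N) data matrix. `critical_fraction` is an assumed
    retention rule: keep PCs whose variance exceeds this share of the total.
    """
    Xc = X - X.mean(axis=0)                   # mean-center each variable
    cov = np.cov(Xc, rowvar=False)            # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh handles symmetric matrices
    order = np.argsort(eigvals)[::-1]         # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > critical_fraction * eigvals.sum()
    loadings = eigvecs[:, keep]               # N x M, orthonormal columns
    scores = Xc @ loadings                    # samples in the M-dim PC space
    return scores, loadings, eigvals[keep]

# Synthetic example: three collinear variables driven by one latent factor.
rng = np.random.default_rng(0)
t = rng.normal(size=(50, 1))
X = np.hstack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(50, 3))
scores, loadings, variances = pca_eig(X)
print(scores.shape, loadings.shape)
```

Because the three variables here are nearly collinear, a single PC carries almost all of the variance, which is the dimensionality reduction the text describes. In practice the variables are often also scaled to unit variance before the decomposition; that step is omitted here.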

In PCA, the scores plots show the relations between analyzed samples, while the loadings plots show the relations between analyzed variables. Several statistical tools are also available to ensure the quality of the data analyzed using PCA. Multivariate control charts use two statistical indicators of the PCA model, Hotelling’s T² and Q values, plotted as a function of combinatorial sample or time. The significant principal components of the PCA model are used to develop the T²-chart, and the remaining PCs contribute to the Q-chart. Hotelling’s T² statistic is the sum of normalized squared scores and is a measure of the variation in each sample within the PCA model. The T² contributions describe how individual variables contribute to the Hotelling’s T² value for a given sample. The Q residual is the squared prediction error and describes how well the PCA model fits each sample. It is a measure of the amount of variation in each sample not captured by the principal components retained in the model. The graphical representation of PCA with T² and Q statistics for multiple samples in the PCA model is illustrated in Figure 5.
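The two statistics can be computed directly from their definitions: T² as the sum of squared scores, each normalized by the variance (eigenvalue) of its PC, and Q as the squared reconstruction error left by the retained PCs. The following is a minimal sketch under assumed conditions, not the procedure behind Figure 5. The function name `t2_and_q` and the synthetic data are hypothetical, and no control limits are computed.

```python
import numpy as np

def t2_and_q(X, n_components):
    """Per-sample Hotelling's T^2 and Q residual for a PCA model of X."""
    Xc = X - X.mean(axis=0)                       # mean-center each variable
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]             # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P = eigvecs[:, :n_components]                 # retained loadings
    lam = eigvals[:n_components]                  # variances of retained PCs
    T = Xc @ P                                    # scores
    t2 = np.sum(T**2 / lam, axis=1)               # sum of normalized squared scores
    residual = Xc - T @ P.T                       # part the model does not capture
    q = np.sum(residual**2, axis=1)               # squared prediction error
    return t2, q

# Synthetic library: one latent factor, plus a final sample that breaks
# the correlation pattern shared by the other fifty.
rng = np.random.default_rng(1)
t = rng.normal(size=(50, 1))
X = np.hstack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(50, 3))
X = np.vstack([X, [1.0, -2.0, 1.0]])
t2, q = t2_and_q(X, n_components=1)
print(int(np.argmax(q)))
```

A sample with a large Q lies off the model plane, as the appended sample does here. A sample with a large T² lies within the model but far from the center of the data. Plotting both against sample index gives the two control charts described in the text.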

It is important not only to discover a target feature in a combinatorial library but also to glean from these combinatorial experiments the variations in the library. An illustrative example can be provided from combinatorial polymer libraries of polyanhydrides (see Figure 6), which are important biomaterials for biomedical applications and drug delivery because of their degradation and erosion properties. 97 For drug delivery applications, the choice of correct chemistry and management of various degradation factors (for example, kinetics of drug release and polymer-drug interaction) are essential for proper drug stabilization. High-throughput FTIR screening has been shown to be an appropriate tool for screening compositional libraries of polyanhydride copolymers. FTIR spectra (see Figure 6A, B) and images (see Figure 6C) were utilized for the PCA-based tracking of the subtle changes in chemistry of polyanhydride copolymers. 98

586 dx.doi.org/10.1021/co200007w |ACS Comb. Sci. 2011, 13, 579–633
