Combinatorial and High-Throughput Screening of Materials ...
Combinatorial and High-Throughput Screening of Materials ...
Combinatorial and High-Throughput Screening of Materials ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
ACS <strong>Combinatorial</strong> Science<br />
possible to recognize these relationships with far greater speed<br />
<strong>and</strong> not necessarily depending on a priori models, provided the<br />
availability <strong>of</strong> the relevant data. The predictive aspect <strong>of</strong> data<br />
mining tasks can serve for both classification <strong>and</strong> regression<br />
operations. Data mining, which is an interdisciplinary blend <strong>of</strong><br />
statistics, machine learning, artificial intelligence, <strong>and</strong> pattern<br />
recognition, is viewed to have four core tasks:<br />
Cluster analysis seeks to find groups <strong>of</strong> closely related observations<br />
<strong>and</strong> is valuable in targeting groups <strong>of</strong> data that may have<br />
well behaved correlations <strong>and</strong> can form the basis <strong>of</strong> physics-based<br />
as well statistically based models. Cluster analysis when integrated<br />
with CHT experimentation, can serve as a powerful tool<br />
for rapidly screening combinatorial libraries.<br />
Predictive modeling helps build models for targeted objectives<br />
(e.g., a specific materials property) as a function <strong>of</strong> input or<br />
exploratory variables. The success <strong>of</strong> these models also helps<br />
refine the relevance <strong>of</strong> the input parameters.<br />
Association analysis is used to discover patterns that describe<br />
strongly associated features in data (for instance the frequency <strong>of</strong><br />
association <strong>of</strong> a specific materials property to materials chemistry).<br />
Such an analysis over extremely large data sets is made possible<br />
with the development <strong>of</strong> very high-speed search algorithms <strong>and</strong><br />
can help to develop heuristic rules for materials behavior governed<br />
by many factors.<br />
Anomaly detection does the opposite by identifying data or<br />
observations significantly different from the norm. To be able to<br />
identify such anomalies or outliers are critical in CHT experiments<br />
<strong>and</strong> high throughput screening especially to quantitatively<br />
assess uncertainty <strong>and</strong> accuracy <strong>of</strong> results <strong>and</strong> distinguish between<br />
true “discoveries” <strong>and</strong> false positives.<br />
Massive data generated from high-troughput characterization<br />
tools leads to the need for effective data analysis <strong>and</strong> interpretation<br />
to identify the trends <strong>and</strong> relationships in the data. 9<br />
Advanced mathematical <strong>and</strong> statistical techniques are used in<br />
CHT screening to determine, <strong>of</strong>ten by indirect means, the<br />
properties <strong>of</strong> substances that otherwise would be very difficult<br />
to measure directly. 82,83 These techniques have been successfully<br />
applied for visualization <strong>and</strong> pattern recognition, 84,85 lead identification<br />
<strong>and</strong> optimization, 86,87 process optimization, 66,88 <strong>and</strong><br />
development <strong>of</strong> predictive models <strong>and</strong> quantitative structure<br />
activity relationships (QSARs) <strong>and</strong> quantitative structure<br />
property relationships (QSPRs). 85,89,90 The implementation<br />
<strong>of</strong> multivariate calibration improves analysis selectivity when<br />
spectroscopic or any other types <strong>of</strong> measurements are performed<br />
with relatively low resolution instruments. 91,92<br />
Multivariate data analysis techniques are essential analytical<br />
tools for in combinatorial materials science. The attractiveness <strong>of</strong><br />
partial least-squares (PLS) method has been demonstrated for<br />
in situ monitoring <strong>of</strong> combinatorial reactions using single-bead<br />
FTIR spectroscopy. 93 Using these tools, quantitative <strong>and</strong> qualitative<br />
analysis <strong>of</strong> combinatorial samples was performed with IR<br />
spectra with severely overlapped b<strong>and</strong>s. A limitation <strong>of</strong> PLS<br />
method is in the ability to reveal only the changes due to reaction<br />
but it cannot provide spectra for pure components. 93 This<br />
limitation can be overcome by using other multivariate analysis<br />
tools such as evolving factor analysis <strong>and</strong> multivariate curve<br />
resolution methods. 8<br />
In combinatorial materials science, pattern recognition techniques<br />
find similarities <strong>and</strong> differences between samples <strong>of</strong><br />
combinatorial libraries, serve as a powerful visualization <strong>and</strong><br />
descriptor-identification tool, <strong>and</strong> can warn <strong>of</strong> the occurrence<br />
<strong>of</strong> abnormalities in the measured data. 84 These techniques can<br />
REVIEW<br />
reveal correlated patterns in large data sets obtained from<br />
combinatorial experiments, large databases or simulations, can<br />
determine the structural relationship among screening hits, <strong>and</strong><br />
can significantly reduce data dimensionality to make it more manageable<br />
in the database with the minimal loss <strong>of</strong> information.<br />
79 81,94<br />
Methods <strong>of</strong> pattern recognition include principal components<br />
analysis (PCA), hierarchical cluster analysis (HCA), s<strong>of</strong>t independent<br />
modeling <strong>of</strong> class analogies (SIMCA), neural networks,<br />
<strong>and</strong> some others. 95,96<br />
PCA is a widely used approach to search for patterns by<br />
seeking to reduce the dimensionality <strong>of</strong> data because it provides<br />
an unsupervised <strong>and</strong> robust way for analysis <strong>of</strong> multivariate data.<br />
PCA projects the data set onto a subspace <strong>of</strong> lower dimensionality<br />
with removed collinearity. PCA achieves this objective by<br />
explaining the variance <strong>of</strong> the data matrix in terms <strong>of</strong> the weighted<br />
sums <strong>of</strong> the original variables without significant loss <strong>of</strong> information.<br />
These weighted sums <strong>of</strong> the original variables are called<br />
principal components (PCs). Each PC is a suitable linear<br />
combination <strong>of</strong> all the original variables. The first PC accounts<br />
for the maximum variance (eigenvalue) in the original data set.<br />
The second PC is orthogonal (uncorrelated) to the first <strong>and</strong><br />
accounts for most <strong>of</strong> the remaining variance. Thus, the mth PC is<br />
orthogonal to all others <strong>and</strong> has the mth largest variance in the set<br />
<strong>of</strong> PCs. Once the N PCs have been calculated using eigenvalue/<br />
eigenvector matrix operations, only M PCs with variances above<br />
a critical level are retained. This M-dimensional PC space has<br />
retained most <strong>of</strong> the information from the initial N-dimensional<br />
variable space, by projecting it into orthogonal axes <strong>of</strong> high<br />
variance <strong>and</strong> forming a suitable PCA model. 18<br />
In PCA, the scores plots show the relations between analyzed<br />
samples, while the loadings plots show the relations between<br />
analyzed variables. To ensure the quality <strong>of</strong> the data analyzed<br />
using PCA, several statistical tools are also available. Multivariate<br />
control charts use two statistical indicators <strong>of</strong> the PCA model such<br />
as Hotelling’s T 2 <strong>and</strong> Q values plotted as a function <strong>of</strong> combinatorial<br />
sample or time. The significant principal components <strong>of</strong> the<br />
PCA model are used to develop the T 2 -chart <strong>and</strong> the remaining<br />
PCs contribute to the Q-chart. The Hotelling’s T 2 statistic is the<br />
sum <strong>of</strong> normalized squared scores <strong>and</strong> is a measure <strong>of</strong> the variation<br />
in each sample within the PCA model. The T 2 contributions<br />
describe how individual variables contribute to the Hotelling’s T 2<br />
value for a given sample. The Q residual is the squared prediction<br />
error <strong>and</strong> describes how well the PCA model fits each sample. It is<br />
a measure <strong>of</strong> the amount <strong>of</strong> variation in each sample not captured<br />
by principal components retained in the model. The graphical<br />
representation <strong>of</strong> PCA with T 2 <strong>and</strong> Q statistics for multiple<br />
samples in the PCA model is illustrated in Figure 5.<br />
It is important not only to discover a target feature in a<br />
combinatorial library but also to glean from these combinatorial<br />
experiments the variations in the library. An illustrative example<br />
can be provided from combinatorial polymer libraries <strong>of</strong> polyanhydrides<br />
(see Figure 6), which are important biomaterials for<br />
biomedical applications <strong>and</strong> drug delivery because <strong>of</strong> their<br />
degradation <strong>and</strong> erosion properties. 97 For drug delivery applications,<br />
the choice <strong>of</strong> correct chemistry <strong>and</strong> management <strong>of</strong> various<br />
degradation factors (for example, kinetics <strong>of</strong> drug release <strong>and</strong><br />
polymer-drug interaction) are essential for proper drug stabilization.<br />
<strong>High</strong> throughput FTIR screening has been shown to be an<br />
appropriate tool for screening compositional libraries <strong>of</strong> polyanhydride<br />
copolymers. FTIR spectra (see Figure 6A, B) <strong>and</strong><br />
images (see Figure 6C) were utilized for the PCA-based tracking <strong>of</strong><br />
the subtle changes in chemistry <strong>of</strong> polyanhydride copolymers. 98<br />
586 dx.doi.org/10.1021/co200007w |ACS Comb. Sci. 2011, 13, 579–633