Research report - Lorentz JÄNTSCHI
[25]. In this
situation, three alternative linear regression methods may be considered: a) Ridge
Regression (RR), b) Principal Component Regression (PCR) and c) Partial Least Squares (PLS).
These three methods remain useful even when the independent variables are highly
correlated. In the ridge regression method, descriptors are transformed into principal components
(PCs). All of the principal components are used in the regression, but they are first shrunk
differentially according to their eigenvalues and a ridging constant. In principal component
regression, the descriptors are transformed into principal components, after which a subset of the
PCs is used in an ordinary least squares regression. Partial least squares also uses a set of linear
combinations of the descriptors but, in this approach, the dependent variable is also considered in
this step. Each of these methods makes use of the entire available pool of independent variables as
opposed to selecting a subset, which introduces bias and may result in the elimination of important
parameters from the study. Formal comparisons have consistently shown subsetting to be less
effective than alternative methods, such as these, that retain all of the independent variables and
use other approaches to deal with the rank deficiency [26]. Statistical theory suggests that RR is
the best of the three methods and this has been generally borne out in multiple comparative studies
[26–28]. As such, the RR models developed in the current study are analyzed in more detail than
the PCR and PLS models. The RR vector of regression coefficients, b, is given by
$b = (X^T X + kI)^{-1} X^T Y$, where X is the matrix of descriptors, Y is the vector of observed activities, I is an identity
matrix, and k is a nonnegative constant known as the “ridge” constant.
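
The ridge formula and the principal-component description above can be illustrated with a minimal NumPy sketch. It is not the code used in the cited study: the function names, the synthetic data and the choice k = 1.0 are assumptions made purely for illustration. The sketch computes b from the closed form, recomputes it by shrinking each principal component by the factor d^2 / (d^2 + k) (d being the singular value of that component), and shows PCR as an ordinary least squares fit on a subset of components. PLS is not shown.

```python
import numpy as np

def ridge_closed_form(X, y, k):
    """Ridge coefficients b = (X^T X + k I)^{-1} X^T y (solved, not inverted)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def ridge_via_components(X, y, k):
    """Same coefficients obtained by shrinking every principal component."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    # d / (d^2 + k) equals the OLS weight 1/d times the shrink factor d^2 / (d^2 + k)
    shrunk = (d / (d**2 + k)) * (U.T @ y)
    return Vt.T @ shrunk

def pcr(X, y, n_components):
    """Principal component regression: OLS on the leading components only."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    coef = (U[:, :n_components].T @ y) / d[:n_components]
    return Vt[:n_components].T @ coef

# Synthetic, purely illustrative data: x2 is nearly collinear with x1.
rng = np.random.default_rng(0)
n = 40
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + 0.5 * x3 + 0.1 * rng.normal(size=n)

# Center descriptors and response so that no intercept term is needed.
X = X - X.mean(axis=0)
y = y - y.mean()

k = 1.0  # hypothetical ridge constant; in practice it is tuned
b_ridge = ridge_closed_form(X, y, k)
b_ridge_pc = ridge_via_components(X, y, k)
b_pcr = pcr(X, y, n_components=2)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("ridge (closed form):", b_ridge)
print("ridge (components): ", b_ridge_pc)
print("PCR (2 components): ", b_pcr)
print("OLS:                ", b_ols)
```

The two ridge routes agree because X^T X + kI has the same eigenvectors as X^T X, with every eigenvalue d^2 raised to d^2 + k. Components with small eigenvalues, which are the ones inflated by correlated descriptors, are therefore shrunk the most, whereas PCR discards them outright.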
÷ Paper: Multivariate analysis of experimental and computational descriptors of molecular
lipophilicity
÷ Authors: Raimund Mannhold, Gabriele Cruciani, Karl Dross, Roelof Rekker
÷ Source: Journal of Computer-Aided Molecular Design, Volume 12, Number 6, 1998, p. 573-
581(9).
÷ Abstract:
Two experimental (log P, RMw) and 17 calculation descriptors for molecular lipophilicity
(fragmental, atom-based or based on molecular properties) were investigated by multivariate
analysis for a database of 159 compounds including both simple structures as well as more
complex drug molecules. Principal component analysis (PCA) of the entire database exhibits a
clustering of chemical groups; preciseness of clustering corresponds to chemical similarity. Thus,
diversity searching in databases might effectively be performed by PCA on the basis of
calculated log P. The comparative validity check of experimental and computational procedures by
regression analysis and PCA was performed with a chemically balanced, reduced data set (n = 55)
representing 11 chemical groups with 5 members each. Regression of experimental descriptors
(log Poct versus RMw) proves that chromatographic data, obtained under well-defined
experimental conditions, can be used as valid substitutes for log P. Regression of calculated versus
experimental lipophilicity data shows a superiority of fragmental over atom-based methods and
approaches based on molecular properties, as indicated by correlation coefficients, slopes and
intercepts. In addition, PCA revealed that fragmental methods (Rekker-type, KOWWIN, KLOGP)
sense the compound ranking in log P data to almost the same extent as experimental approaches.
For atom-based procedures and CLOGP, both the comparability of absolute values and the
sensing of the compound ranking in the database are slightly less. This trend is more pronounced
for the methods based on molecular properties, with the exception of BLOGP.
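
The validity check by regression mentioned in the abstract (calculated versus experimental log P, judged by correlation coefficient, slope and intercept) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the helper `linear_fit` and the five data pairs are hypothetical and are not taken from the paper. For a calculation method that reproduces the experimental values well, the slope would be close to 1, the intercept close to 0 and the correlation coefficient close to 1.

```python
from math import sqrt

def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept, plus Pearson's r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical values for illustration only (not data from the paper).
log_p_experimental = [0.5, 1.2, 2.0, 2.9, 3.6]
log_p_calculated = [0.6, 1.1, 2.2, 2.8, 3.7]

# Regress calculated (y) on experimental (x) values.
slope, intercept, r = linear_fit(log_p_experimental, log_p_calculated)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```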
÷ Details of interest:
Validity check of calculation methods by regression analysis and PCA. Regression analysis of<br />
calculated versus experimental data shows that in general fragmental methods are superior to<br />
atom-based and 3D-related approaches. These results are in accord with our earlier analysis with a<br />
smaller dataset [35]. A limited applicability is often attributed to fragmental methods due to<br />
missing fragment values. This is true for CLOGP and Rekker-type methods, but not for<br />
KOWWIN. Information obtained by PCA on the same dataset in general parallels the regression
data, but unravels more precisely the comparability in absolute values and the reflection of<br />