20.07.2013 Views

Research report - Lorentz JÄNTSCHI


[25]. In this situation, three alternative linear regression methods may be considered: a) Ridge Regression (RR), b) Principal Component Regression (PCR) and c) Partial Least Squares (PLS). These three methods remain useful even when the independent variables are highly correlated. In ridge regression, the descriptors are transformed into principal components (PCs). All of the PCs are used in the regression, but they are first shrunk differentially according to their eigenvalues and a ridging constant. In principal component regression, the descriptors are transformed into PCs, after which a subset of the PCs is used in an ordinary least squares regression. Partial least squares also uses a set of linear combinations of the descriptors but, in this approach, the dependent variable is also considered in this step. Each of these methods makes use of the entire available pool of independent variables rather than selecting a subset; subset selection introduces bias and may result in the elimination of important parameters from the study. Formal comparisons have consistently shown subsetting to be less effective than alternative methods, such as these, that retain all of the independent variables and use other approaches to deal with the rank deficiency [26]. Statistical theory suggests that RR is the best of the three methods, and this has been generally borne out in multiple comparative studies [26–28]. As such, the RR models developed in the current study are analyzed in more detail than the PCR and PLS models. The RR vector of regression coefficients, b, is given by $b = (X^{T}X + kI)^{-1}X^{T}Y$, where X is the matrix of descriptors, Y is the vector of observed activities, I is an identity matrix, and k is a nonnegative constant known as the “ridge” constant.
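
The ridge formula and the description of shrinking principal components by their eigenvalues can be checked against each other numerically. The following is a minimal sketch, not the cited study's code: the function names, the synthetic data, the choice k = 1.0, the number of retained components and the column centering are all assumptions made for illustration. It computes the RR coefficients from the closed form, recomputes them by shrinking each principal component by the factor s²/(s² + k) (where s² is an eigenvalue of X^T X), and adds a PCR fit on a subset of the PCs for comparison.

```python
import numpy as np

def ridge_closed_form(X, y, k):
    """Ridge coefficients b = (X^T X + k I)^{-1} X^T y, via a linear solve."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def ridge_via_pcs(X, y, k):
    """Same estimate written as differential shrinkage of principal components."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Eigenvalues of X^T X are s**2. The unshrunk PC coefficient would be
    # (U^T y) / s; multiplying by s^2 / (s^2 + k) gives s / (s^2 + k) * (U^T y).
    coef_pc = (s / (s**2 + k)) * (U.T @ y)
    return Vt.T @ coef_pc

def pcr(X, y, n_components):
    """Principal component regression: least squares on the leading PCs only."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    m = n_components
    coef_pc = (U[:, :m].T @ y) / s[:m]
    return Vt[:m].T @ coef_pc

# Synthetic, hypothetical data: more descriptors than compounds,
# so X^T X is rank deficient and ordinary least squares is not unique.
rng = np.random.default_rng(0)
n, p = 20, 30
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)          # assumption: descriptors are column-centered
y = rng.normal(size=n)
y -= y.mean()                # assumption: activities are centered

b_closed = ridge_closed_form(X, y, k=1.0)
b_pcs = ridge_via_pcs(X, y, k=1.0)
b_pcr = pcr(X, y, n_components=5)

print("ridge forms agree:", np.allclose(b_closed, b_pcs))
```

With k > 0 the matrix X^T X + kI is invertible even when the descriptors outnumber the compounds, which is how RR handles the rank deficiency while keeping every descriptor in the model.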

• Paper: Multivariate analysis of experimental and computational descriptors of molecular lipophilicity

• Authors: Raimund Mannhold, Gabriele Cruciani, Karl Dross, Roelof Rekker

• Source: Journal of Computer-Aided Molecular Design, Volume 12, Number 6, 1998, p. 573-581(9).

• Abstract:

Two experimental (log P, RMw) and 17 calculated descriptors for molecular lipophilicity (fragmental, atom-based or based on molecular properties) were investigated by multivariate analysis for a database of 159 compounds including both simple structures and more complex drug molecules. Principal component analysis (PCA) of the entire database exhibits a clustering of chemical groups; the preciseness of clustering corresponds to chemical similarity. Thus, diversity searching in databases might effectively be performed by PCA on the basis of calculated log P. The comparative validity check of experimental and computational procedures by regression analysis and PCA was performed with a chemically balanced, reduced data set (n = 55) representing 11 chemical groups with 5 members each. Regression of experimental descriptors (log Poct versus RMw) proves that chromatographic data, obtained under well-defined experimental conditions, can be used as valid substitutes for log P. Regression of calculated versus experimental lipophilicity data shows a superiority of fragmental over atom-based methods and approaches based on molecular properties, as indicated by correlation coefficients, slopes and intercepts. In addition, PCA revealed that fragmental methods (Rekker-type, KOWWIN, KLOGP) sense the compound ranking in log P data to almost the same extent as experimental approaches. For atom-based procedures and CLOGP, both the comparability of absolute values and the sensing of the compound ranking in the database are slightly less. This trend is more pronounced for the methods based on molecular properties, with the exception of BLOGP.
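
The three diagnostics named in the abstract (correlation coefficient, slope, intercept) can be computed with a few lines. This is only an illustrative sketch: the function name `validity_check` and the numbers are hypothetical and are not data from the paper, and taking the calculated values as the dependent variable is an assumption about the direction of the regression. A calculation method that reproduces experimental log P well would be expected to give a slope near 1, an intercept near 0 and a correlation coefficient near 1.

```python
import numpy as np

def validity_check(experimental, calculated):
    """Regress calculated on experimental values; return (slope, intercept, r)."""
    x = np.asarray(experimental, dtype=float)
    y = np.asarray(calculated, dtype=float)
    # np.polyfit returns coefficients from highest degree down: [slope, intercept]
    slope, intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

# Hypothetical values for illustration only (not taken from the paper)
log_p_exp = np.array([0.5, 1.2, 2.0, 2.9, 3.7])
log_p_calc = 0.9 * log_p_exp + 0.2   # a method that compresses and offsets log P

slope, intercept, r = validity_check(log_p_exp, log_p_calc)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```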

• Details of interest:

Validity check of calculation methods by regression analysis and PCA. Regression analysis of calculated versus experimental data shows that, in general, fragmental methods are superior to atom-based and 3D-related approaches. These results are in accord with our earlier analysis with a smaller dataset [35]. A limited applicability is often attributed to fragmental methods due to missing fragment values. This is true for CLOGP and Rekker-type methods, but not for KOWWIN. Information obtained by PCA on the same dataset in general parallels the regression data, but unravels more precisely the comparability in absolute values and the reflection of
