10.07.2015 Views

The book of abstracts is available. - Poznań

The book of abstracts is available. - Poznań

The book of abstracts is available. - Poznań

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

14 th QTL-MAS Workshop, Poznań University <strong>of</strong> Life Sciences, Poland 2010A compar<strong>is</strong>on <strong>of</strong> random forests, boosting and support vector machines forgenomic selection with SNP markersJoseph O. Ogutu 1∗ , Hans-Peter Piepho, Torben Schulz-Streeck 11Bioinformatics Unit, Institute for Crop Science, University <strong>of</strong> Hohenheim, Fruwirthstrasse 23, 70599 Stuttgart,Germany∗ Presenting author: Joseph Ogutu, email: jogutu2007@gmail.comBackground. Genomic selection (GS) <strong>is</strong> a method for estimating breeding values usingmolecular markers spanning the entire genome. It involves jointly estimating the effects <strong>of</strong> allgenes or chromosomal segments and combining the estimates to predict the total genomicbreeding value (GBV). Accurate prediction <strong>of</strong> GBVs thus represents a central challenge tocontemporary plant and animal breeders. However, the ex<strong>is</strong>tence <strong>of</strong> a wide array <strong>of</strong>approaches for predicting breeding values using markers makes it essential to evaluate andcompare their relative predictive performances to identify approaches able to accuratelypredict breeding values. Here, we compare the predictive performances <strong>of</strong> three key machinelearning methods, namely random forests; boosting and support vector machines, forpredicting GBV using SNP-marker data. We also explore the utility <strong>of</strong> random forests forimportance ranking <strong>of</strong> markers prior to mapping their chromosomal positions.Methods. We predicted GBV for one quantitative trait in a simulated data set <strong>of</strong> 3226individuals spanning five generations and an associated genome encompassing fivechromosomes with 10031 biallelic SNP-marker loci generated for the 14 th QTL-MASworkshop. Of the 3226 individuals 2326 in the first four generations were phenotyped andgenotyped and were used to train the random forest, boosting and support vector machineregression models. <strong>The</strong> models were used to predict GBV for the remaining 900 individuals inthe fifth generation without phenotypic records. A 5-fold cross-validation was used toevaluate the predictive performance <strong>of</strong> each method. Predictive accuracy was measured bymean squared error and the correlation between GBV and the simulated values. Randomforest was used to rank the importance <strong>of</strong> markers and positions <strong>of</strong> the most importantmarkers mapped on the pertinent chromosomes.Results. <strong>The</strong> correlation between the predicted and simulated breeding values for the 5-foldcross validation for random forests was modest and ranged from 0.393 to 0.5878 and themean squared error from 65.47 to 70.71. For support vector machines the mean squared errorfor the 5-fold cross-validation ranged from 196.4599 to 236.4377.Conclusions. <strong>The</strong> support vector machine predicted GBV better than random forests andboosting but our evaluation <strong>of</strong> the performance <strong>of</strong> boosting <strong>is</strong> still ongoing.32

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!