Prediction of continuous phenotypes in mouse, fly, and rice genome wide association studies with support vector regression SNPs and ridge regression classifier

Document Type

Conference Proceeding

Publication Date

3-2-2016

Abstract

The ranking of SNPs and prediction of phenotypes in continuous genome-wide association studies is a subject of increasing interest, with applications in personalized medicine and in animal and plant breeding. SNP ranking in case-control (discrete-label) genome-wide association studies has been examined in several previous studies with machine learning techniques, but it remains poorly explored for studies with quantitative labels. Here we study the rankings of SNPs in mouse, fly, and rice continuous genome-wide association studies given by the popular univariate Pearson correlation coefficient and by multivariate support vector regression and ridge regression. We perform cross-validation with the support vector regression and ridge regression models on the top-ranked SNPs and compute correlation coefficients between true and predicted phenotypes. Our results show that ridge regression prediction on the top-ranked support vector regression SNPs gives the highest accuracy. On all datasets we achieve accuracies comparable to previously published values but with fewer SNPs. Our work shows that parsimonious SNP models can be learned for predicting continuous labels in genome-wide studies.
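
The pipeline the abstract describes (rank SNPs, keep the top-ranked ones, cross-validate a regression model, and correlate true with predicted phenotypes) can be illustrated with a minimal NumPy sketch. This is not the authors' code. It covers only the Pearson and ridge rankings with a ridge predictor; the support vector regression ranking is omitted because it needs a dedicated solver. The function names, the synthetic data generator, the fold count, the ridge penalty `alpha`, and the choice to re-rank SNPs inside each training fold are all assumptions made for illustration.

```python
import numpy as np


def make_toy_data(n_samples=120, n_snps=200, n_causal=5, seed=0):
    """Hypothetical synthetic genotypes (0/1/2) and a continuous phenotype."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)
    effects = np.zeros(n_snps)
    effects[:n_causal] = 1.0  # the first n_causal SNPs drive the phenotype
    y = X @ effects + rng.normal(scale=0.5, size=n_samples)
    return X, y


def pearson_scores(X, y):
    """Univariate ranking: |Pearson correlation| of each SNP with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns receive a score of 0
    return np.abs(Xc.T @ yc) / denom


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(gram, Xc.T @ yc)
    return w, y_mean - x_mean @ w


def ridge_scores(X, y, alpha=1.0):
    """Multivariate ranking: absolute ridge weight of each SNP."""
    w, _ = ridge_fit(X, y, alpha)
    return np.abs(w)


def cv_correlation(X, y, score_fn, top_k, n_folds=5, alpha=1.0, seed=0):
    """Correlation between true and cross-validated predicted phenotypes.

    SNPs are re-ranked on each training fold (an assumption here) so that
    held-out samples never influence which SNPs are selected.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_pred = np.empty(len(y), dtype=float)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        scores = score_fn(X[train_idx], y[train_idx])
        top = np.argsort(scores)[::-1][:top_k]  # indices of top-ranked SNPs
        w, b = ridge_fit(X[train_idx][:, top], y[train_idx], alpha)
        y_pred[test_idx] = X[test_idx][:, top] @ w + b
    return np.corrcoef(y, y_pred)[0, 1]


if __name__ == "__main__":
    X, y = make_toy_data()
    for name, fn in [("pearson", pearson_scores), ("ridge", ridge_scores)]:
        r = cv_correlation(X, y, fn, top_k=10)
        print(f"{name} ranking, top 10 SNPs: r = {r:.3f}")
```

To approximate the combination the abstract reports as most accurate, one could pass a scoring function that returns the absolute weights of a linear support vector regression model as `score_fn`, leaving the ridge predictor unchanged.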

Identifier

84969641304 (Scopus)

ISBN

9781509002870

Publication Title

Proceedings 2015 IEEE 14th International Conference on Machine Learning and Applications, ICMLA 2015

External Full Text Location

https://doi.org/10.1109/ICMLA.2015.224

First Page

1246

Last Page

1250
