Distributed Generalized Cross-Validation for Divide-and-Conquer Kernel Ridge Regression and Its Asymptotic Optimality
Document Type
Article
Publication Date
10-2-2019
Abstract
Tuning parameter selection is of critical importance for kernel ridge regression. To date, a data-driven tuning method for divide-and-conquer kernel ridge regression (d-KRR) has been lacking in the literature, which limits the applicability of d-KRR to large datasets. In this article, by modifying the generalized cross-validation (GCV) score, we propose a distributed generalized cross-validation (dGCV) score as a data-driven tool for selecting the tuning parameters in d-KRR. Not only is the proposed dGCV computationally scalable for massive datasets, it is also shown, under mild conditions, to be asymptotically optimal in the sense that minimizing the dGCV score is equivalent to minimizing the true global conditional empirical loss of the averaged function estimator, extending the existing optimality results of GCV to the divide-and-conquer framework. Supplemental materials for this article are available online.
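To make the setup concrete, the following is a minimal illustrative sketch of divide-and-conquer KRR with a GCV-style score aggregated across data blocks. It is not the authors' exact dGCV: the RBF kernel choice, the use of per-block in-sample residuals in the numerator, and the 1/s degrees-of-freedom correction in the denominator are assumptions made for illustration (the 1/s factor reflects that averaging s local fits scales each local hat matrix's diagonal by 1/s); the precise dGCV definition is given in the article.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Gaussian (RBF) kernel matrix between rows of X and Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def local_krr(X, y, lam, gamma=1.0):
    """Fit KRR on one data block.

    Returns the in-sample fit A(lam) y and the trace of the local
    smoothing matrix A(lam) = K (K + n*lam*I)^{-1}.
    """
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    evals, evecs = np.linalg.eigh(K)        # K = V diag(evals) V^T
    shrink = evals / (evals + n * lam)      # eigenvalues of A(lam)
    fitted = evecs @ (shrink * (evecs.T @ y))
    return fitted, shrink.sum()

def dgcv_score(blocks, lam, gamma=1.0):
    """GCV-style score for divide-and-conquer KRR, aggregated over blocks.

    NOTE: an illustrative stand-in for the article's dGCV, combining
    per-block residual sums of squares with a hypothetical averaged
    degrees-of-freedom correction.
    """
    N = sum(len(y) for _, y in blocks)
    s = len(blocks)
    rss, tr = 0.0, 0.0
    for X, y in blocks:
        fitted, trA = local_krr(X, y, lam, gamma)
        rss += ((y - fitted) ** 2).sum()
        tr += trA
    denom = (1.0 - tr / (s * N)) ** 2   # averaging scales effective df by 1/s
    return (rss / N) / denom
```

A usage sketch on toy data, selecting the tuning parameter by minimizing the score over a grid:

```python
rng = np.random.default_rng(0)
blocks = []
for _ in range(4):                           # 4 blocks of 100 points each
    X = rng.uniform(0, 1, size=(100, 1))
    y = np.sin(4 * np.pi * X[:, 0]) + 0.3 * rng.standard_normal(100)
    blocks.append((X, y))
lams = np.logspace(-6, 0, 25)
best_lam = min(lams, key=lambda lam: dgcv_score(blocks, lam))
```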
Identifier
85076878734 (Scopus)
Publication Title
Journal of Computational and Graphical Statistics
External Full Text Location
https://doi.org/10.1080/10618600.2019.1586714
e-ISSN
1537-2715
ISSN
1061-8600
First Page
891
Last Page
908
Issue
4
Volume
28
Grant
1811812
Fund Ref
Simons Foundation
Recommended Citation
Xu, Ganggang; Shang, Zuofeng; and Cheng, Guang, "Distributed Generalized Cross-Validation for Divide-and-Conquer Kernel Ridge Regression and Its Asymptotic Optimality" (2019). Faculty Publications. 7279.
https://digitalcommons.njit.edu/fac_pubs/7279
