Document Type

Dissertation

Date of Award

12-31-2021

Degree Name

Doctor of Philosophy in Mathematical Sciences - (Ph.D.)

Department

Mathematical Sciences

First Advisor

Ji Meng Loh

Second Advisor

Sunil Kumar Dhar

Third Advisor

Wenge Guo

Fourth Advisor

Sundarraman Subramanian

Fifth Advisor

Yixin Fang

Abstract

Stochastic gradient descent (SGD) is a popular iterative method for model parameter estimation in large-scale data and online learning settings since it goes through the data in only one pass. While SGD has been well studied for independent data, its application to spatially correlated data remains largely unexplored. This dissertation develops SGD-based parameter estimation and statistical inference algorithms for the spatial autoregressive (SAR) model, a common model for spatial lattice data.

This research contains three parts. (I) The first part concerns SGD estimation and inference for the SAR mean regression model. A new SGD algorithm based on the maximum likelihood estimator (MLE) is proposed to accommodate the spatial correlation in the SAR model. A statistical inference algorithm is also proposed, based on the online bootstrap resampling procedure (Fang et al., 2018). Asymptotic properties of the estimators are then developed, and their finite-sample properties are investigated by simulation. The SGD-based parameter estimation procedures are shown to be more than 40 times faster than MLE for the settings examined. The SGD estimators for all parameters are close to the true values. The empirical coverages of confidence intervals (CIs) are at the nominal levels for the coefficients of the covariates but not for the spatial parameter. Two methods are proposed to improve the empirical coverage of the CI for the spatial parameter. (II) The second part concerns the SAR quantile regression model. SGD algorithms based on one-stage quantile regression (1SQR) and two-stage quantile regression (2SQR) are developed for parameter estimation and statistical inference. Simulation results show that the SGD estimator based on 2SQR is unbiased while that based on 1SQR is biased. Also, the empirical coverages of CIs constructed using SGD based on 2SQR are all at the nominal levels. (III) In the last part, this research analyzes a real dataset on charges for medical services provided by physicians and healthcare professionals. Both SAR mean regression and quantile regression models are fitted to study the effect of location and other characteristics of medical facilities on medical prices. Modeling results show that the spatial correlation parameter is significantly different from 0 (95% CI is (-0.27, -0.23) for the mean regression), suggesting spatial correlation of medical charges. The models also find that charges depend on the total number of services provided yearly, gender of the provider, facility type, and whether the provider is in a metropolitan area.
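
For readers unfamiliar with the SAR mean regression model named above, the following is a minimal sketch of its standard textbook form, y = rho * W y + X beta + eps, simulated on a small lattice. It is not the dissertation's SGD algorithm. The rook-contiguity, row-standardised weight matrix, the 10 x 10 grid, the coefficient values, and the helper names `rook_weights` and `simulate_sar` are all assumptions chosen for illustration; the value of rho is illustrative only and is not an estimate.

```python
import numpy as np


def rook_weights(nrow, ncol):
    """Row-standardised rook-contiguity weights for an nrow x ncol lattice.

    Hypothetical helper: the abstract does not say which weight matrix is used.
    """
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            # Neighbours share an edge: up, down, left, right.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    W[i, rr * ncol + cc] = 1.0
    # Scale each row to sum to 1.
    return W / W.sum(axis=1, keepdims=True)


def simulate_sar(W, X, beta, rho, rng):
    """Draw y from y = rho * W y + X beta + eps with standard normal eps."""
    n = W.shape[0]
    eps = rng.standard_normal(n)
    # Reduced form: solve (I - rho * W) y = X beta + eps.
    y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)
    return y, eps


rng = np.random.default_rng(0)
W = rook_weights(10, 10)                      # assumed 10 x 10 lattice
X = np.column_stack([np.ones(100), rng.standard_normal(100)])
beta = np.array([1.0, 2.0])                   # illustrative coefficients
rho = -0.25                                   # illustrative spatial parameter
y, eps = simulate_sar(W, X, beta, rho, rng)
```

Because each response depends on its neighbours' responses through W, the observations are not independent, which is the feature that SGD methods designed for independent data do not account for.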
