Date of Award
Doctor of Philosophy
Dr. Joseph W. McKean
Dr. Rajib Paul
Dr. Jeffrey Terpstra
Dr. Bradley Huitema
Interrupted time series, interventional design, double bootstrap, variational Bayes, multivariate logistic regression, MCAR
Linear models are among the most commonly used statistical methods in many disciplines. One of the model assumptions is that the error terms (residuals) are independent and identically distributed. This assumption is often violated, and researchers frequently encounter autoregressive error terms. The most popular technique for handling linear models with autoregressive errors is perhaps the autoregressive integrated moving average (ARIMA) model. Another common approach is generalized least squares, such as Cochrane-Orcutt estimation and Prais-Winsten estimation. However, these methods usually perform poorly when fitted to small samples. To address this problem, McKnight et al. (2000) proposed a double bootstrap method. One purpose of this study is to port their algorithm from Fortran to the R computing environment and, ultimately, to develop an R software package which, like R itself, is freeware and runs on all platforms. Furthermore, this study fixes some flaws of the original method and develops a rank-based alternative that is robust in the sense of being resistant to outliers. An R package is created, and its usage is demonstrated via examples. Monte Carlo studies for different sample sizes (20, 30, 50, and 100) show that both the original and the robust algorithms have the expected properties, even for small sample sizes.
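To make the generalized least squares idea concrete, the following is a minimal sketch of iterated Cochrane-Orcutt estimation for a simple regression with AR(1) errors. It is written in plain Python for illustration only; it is neither the dissertation's R package nor the double bootstrap of McKnight et al. (2000). The function names (`ols_line`, `cochrane_orcutt`), the simulated data, and the true values (intercept 1, slope 2, autocorrelation 0.5) are assumptions made for this example.

```python
import random


def ols_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cochrane_orcutt(x, y, max_iter=50, tol=1e-8):
    """Iterated Cochrane-Orcutt estimation for AR(1) errors (illustrative)."""
    intercept, slope = ols_line(x, y)  # start from the plain OLS fit
    rho = 0.0
    for _ in range(max_iter):
        # Estimate the lag-1 autocorrelation from the current residuals.
        resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
        num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
        den = sum(r * r for r in resid[:-1])
        rho_new = num / den
        # Quasi-difference the data; the first observation is dropped.
        x_star = [x[t] - rho_new * x[t - 1] for t in range(1, len(x))]
        y_star = [y[t] - rho_new * y[t - 1] for t in range(1, len(y))]
        # The transformed intercept equals intercept * (1 - rho).
        a_star, slope = ols_line(x_star, y_star)
        intercept = a_star / (1.0 - rho_new)
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return intercept, slope, rho


# Simulated example (assumed values): y = 1 + 2 x + e, e_t = 0.5 e_{t-1} + noise.
random.seed(1)
n = 500
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = []
error = 0.0
for xi in x:
    error = 0.5 * error + random.gauss(0.0, 1.0)
    y.append(1.0 + 2.0 * xi + error)

intercept_hat, slope_hat, rho_hat = cochrane_orcutt(x, y)
print(intercept_hat, slope_hat, rho_hat)
```

The sample here is deliberately large. With the small samples the abstract is concerned with (20 to 100 observations), estimates of the autocorrelation from this kind of procedure are known to be biased, which is the motivation for a bootstrap correction.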
In addition to the original algorithm, we also develop a robust rank-based alternative algorithm. By adopting the rank-based estimator, this new algorithm is resistant to outliers, which is the most important feature of the rank-based estimator. At the same time, this estimator does not lose much efficiency compared to the ordinary least squares (OLS) estimator when the random errors are normally distributed. The new algorithm is compared with the original one in simulation studies under different settings.
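The resistance to outliers described above can be illustrated with a small sketch of a rank-based (Wilcoxon-score) fit for a simple regression. For one predictor, minimizing the Wilcoxon dispersion of the residuals is equivalent to taking a weighted median of the pairwise slopes, with weights equal to the absolute differences in x; the intercept is then taken as the median of the residuals. This is a generic textbook construction, not the dissertation's algorithm for autoregressive errors, and the names `wilcoxon_fit` and `weighted_median` as well as the contaminated data set are assumptions made for illustration.

```python
import random
from statistics import median


def ols_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= half:
            return value


def wilcoxon_fit(x, y):
    """Rank-based simple regression using Wilcoxon scores (illustrative)."""
    slopes, weights = [], []
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[j] - x[i]
            if dx != 0:
                slopes.append((y[j] - y[i]) / dx)
                weights.append(abs(dx))
    slope = weighted_median(slopes, weights)
    # The dispersion does not depend on the intercept, so it is
    # estimated separately as the median of the residuals.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


# Assumed example: y = 1 + 2 x + noise, with the last three responses corrupted.
random.seed(2)
x = [float(i) for i in range(30)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 1.0) for xi in x]
for k in (27, 28, 29):
    y[k] += 100.0

ols_intercept, ols_slope = ols_line(x, y)
rank_intercept, rank_slope = wilcoxon_fit(x, y)
print("OLS slope:", ols_slope)
print("Rank-based slope:", rank_slope)
```

In this setup the three corrupted responses pull the OLS slope well away from 2, while the rank-based slope stays close to it, because the contaminated pairwise slopes carry less than half of the total weight.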
This research also includes an application of the variational approximation to fitting multivariate logistic regression with spatial effects in the Bayesian framework. Variational approximation is much faster than Markov Chain Monte Carlo (MCMC), without losing accuracy. Hence this technique becomes an important alternative to MCMC. Spatial models, such as Conditional Autoregressive (CAR) models, are extremely popular for characterizing spatial dependencies when datasets are collected over aggregated spatial regions, such as counties, census tracts, and zip codes. Modeling spatially correlated multiple health outcomes requires specification of cross-correlations. Statisticians have developed several forms of multivariate conditional autoregressive (MCAR) models for joint modeling of multiple diseases. More specifically, this research investigates the generalized multivariate logistic regression with the spatial random effects modeled via MCAR. For Bayesian inference on the parameters, both variational approximation and MCMC algorithms are developed. They are then compared in terms of point estimates, confidence intervals (CIs), and the deviance information criterion (DIC). The simulation results demonstrate the speedup and the accuracy of the estimation and inference of the parameters.
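As a small illustration of how a CAR model encodes dependence between neighboring regions, the sketch below builds the precision matrix of a univariate proper CAR prior, tau * (D - rho * W), where W is the region adjacency matrix and D is the diagonal matrix of neighbor counts. The abstract does not state which CAR or MCAR parameterization the dissertation uses, so this common form, the function name `car_precision`, the 2 x 2 grid of regions, and the values of `rho` and `tau` are all assumptions.

```python
def car_precision(adjacency, rho, tau):
    """Precision matrix tau * (D - rho * W) of a proper CAR prior.

    adjacency: symmetric 0/1 matrix W with a zero diagonal.
    rho: spatial dependence parameter, |rho| < 1.
    tau: overall precision parameter, tau > 0.
    """
    n = len(adjacency)
    precision = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal entry: tau times the number of neighbors of region i.
                precision[i][j] = tau * sum(adjacency[i])
            else:
                # Off-diagonal entry: nonzero only for neighboring regions.
                precision[i][j] = -tau * rho * adjacency[i][j]
    return precision


# Four regions on a 2 x 2 grid (0 1 / 2 3); regions sharing an edge are neighbors.
W = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
Q = car_precision(W, rho=0.9, tau=2.0)
for row in Q:
    print(row)
```

Under this form, the conditional mean of a region's random effect given the others is rho times the average of its neighbors' effects, and regions that are not neighbors (such as 0 and 3 here) are conditionally independent given the rest, which shows up as a zero entry in the precision matrix. One common multivariate extension combines such a spatial matrix with a between-outcome precision matrix through a Kronecker product, but whether the dissertation uses that form is not stated in the abstract.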
Zhang, Shaofeng, "Development of Traditional and Rank-Based Algorithms for Linear Models with Autoregressive Errors and Multivariate Logistic Regression with Spatial Random Effects" (2017). Dissertations. 3122.