Date of Award


Degree Name

Doctor of Philosophy



First Advisor

Dr. Hyun Keun Cho

Second Advisor

Dr. Magdalena Niewiadomska-Bugaj

Third Advisor

Dr. Joshua Naranjo

Fourth Advisor

Dr. Jun-Seok Oh


Correlated data analysis, estimation efficiency, estimating functions


Correlated data arise frequently in studies where multiple response variables, or repeated measurements of a response, are collected within subjects. My dissertation broadly concerns the development of statistical methodologies for correlated data, such as longitudinal data, clustered data, and multivariate data.

Multiple response variables might be relevant within subjects. A univariate procedure that fits each response separately does not take into account the correlation among responses. To improve estimation efficiency for the regression parameter, this study proposes two estimation procedures that accommodate correlations among the response variables. The proposed procedures neither require knowledge of the true correlation structure nor estimate the parameters associated with the correlation. We further propose simple and powerful inference procedures for a goodness-of-fit test that possess chi-squared asymptotic properties.

For longitudinal count data with overdispersion, the overdispersion parameter plays a significant role in efficient estimation of the regression parameter. In this study, we develop a correlation structure for longitudinal count data in the negative binomial regression model, which is incorporated into a joint estimating equation to estimate the regression parameter and the overdispersion parameter simultaneously. On the other hand, including the overdispersion parameter can hinder efficient estimation and inference for the regression parameter when overdispersion is not present. This study provides a new modeling approach for longitudinal count data and proposes a test for detecting the presence of overdispersion.
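
As a rough illustration of what "presence of overdispersion" means for count data, the sketch below computes the Pearson dispersion statistic for an intercept-only Poisson fit. This is a generic textbook diagnostic, not the test proposed in the dissertation, and it ignores within-subject correlation. The function name and both data sets are hypothetical and made up for illustration.

```python
from statistics import mean


def pearson_dispersion(counts):
    """Pearson dispersion statistic for an intercept-only Poisson fit.

    Values near or below 1 are consistent with a Poisson model
    (variance equal to the mean); values well above 1 suggest
    overdispersion. Assumes independent counts with a common mean.
    """
    n = len(counts)
    mu = mean(counts)  # Poisson maximum likelihood estimate of the mean
    if n < 2 or mu <= 0:
        raise ValueError("need at least two counts with a positive mean")
    # Sum of squared Pearson residuals, divided by residual degrees of freedom.
    chi2 = sum((y - mu) ** 2 / mu for y in counts)
    return chi2 / (n - 1)


# Hypothetical counts, invented for illustration only.
near_poisson = [2, 3, 1, 4, 2, 3, 2, 3]
overdispersed = [0, 0, 1, 0, 12, 0, 9, 2]

print(pearson_dispersion(near_poisson))
print(pearson_dispersion(overdispersed))
```

The first data set yields a statistic below 1 and the second a statistic well above 1. A formal test for longitudinal counts would also need to account for the correlation among repeated measurements, which this sketch does not attempt.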

In clinical trials, the outcome of a disease is measured at baseline and again as a response variable recorded over time during follow-up treatments. The baseline measurement is one of the most important determinants of the proper treatment for patients. This study proposes a nonparametric polynomial regression model to assess the treatment effects over time at various baseline levels. We further propose a model selection procedure based on the empirical log-likelihood method, which identifies the optimal polynomial at each baseline level. In addition, we provide a hypothesis test to assess the therapeutic effects by comparing the treatment groups.

Access Setting

Dissertation-Open Access