Date of Award


Degree Name

Doctor of Philosophy


Interdisciplinary Health Sciences

First Advisor

Dr. Rob Lyerla

Second Advisor

Dr. Ray Neff

Third Advisor

Dr. Kieran Fogarty


Keywords

Public health, statistics, community health, publicly available data, logistic regression, life expectancy

Abstract


Where we live can affect nearly every aspect of our lives, including how long those lives will be. Previous studies have demonstrated large health disparities between communities in the United States. Because the health of a community can significantly affect its residents, evaluating and predicting community health is of broad interest. Currently, the federal government collects and publishes data on community-level demographics, housing, crime rates, and disease. State-level and non-governmental agencies track and share community housing prices, school performance, neighborhood safety, and other characteristics. In short, there is a wealth of information, but the work done with these data has been narrowly focused, supplemented with private data, or largely descriptive. This research uses these data to provide insight into a community’s health by examining the change in its life-expectancy estimate over time. Although life expectancy is only one attribute of community health, it represents the broad concept of community health in an objective and straightforward way.

Over 100 variables covering every county in the United States were obtained from reputable, publicly available sources such as the U.S. Census Bureau. A previous factor analysis found that a three-factor model was sufficiently parsimonious to represent these variables, and the variable loadings identified the underlying structure as a socioeconomic factor, a lifestyle and community factor, and a culture and environment factor. That analysis served as the framework for building a predictive diagnostic tool of a community’s health. Per guidelines from the literature, only variables with a loading of at least 0.5 on a factor were considered for inclusion in the tool. Stepwise logistic regression was used to select the indicator variables and establish the significance of the model. Discriminant analysis and stratified likelihood ratios were used to assess model performance. From the fitted logistic regression model, a probability score was calculated for each county, and counties were then classified as being at low, moderate, or high risk of an adverse life expectancy change based on the quartiles of the scores.
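The scoring and classification steps described above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names, the coefficient and value dictionaries, and in particular the mapping of quartiles to three risk levels (at or below the first quartile is low, above the third is high, everything between is moderate) are assumptions made for illustration, since the abstract does not state the exact cut points. The stratum-specific likelihood ratio uses the standard definition, the share of adverse-outcome counties in a stratum divided by the share of non-adverse counties in that stratum.

```python
import math
import statistics


def probability_score(intercept, coefficients, values):
    """Logistic-model probability of an adverse change for one county.

    `coefficients` and `values` are dicts keyed by (hypothetical)
    indicator-variable names.
    """
    log_odds = intercept + sum(
        coefficients[name] * values[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


def classify_by_quartiles(scores):
    """Label each score low / moderate / high using quartile cut points.

    Assumed mapping: <= Q1 is low, > Q3 is high, otherwise moderate.
    """
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    labels = []
    for score in scores:
        if score <= q1:
            labels.append("low")
        elif score <= q3:
            labels.append("moderate")
        else:
            labels.append("high")
    return labels


def stratum_likelihood_ratios(labels, outcomes):
    """Likelihood ratio per stratum: P(stratum | adverse) / P(stratum | not adverse).

    `outcomes` holds 1 for an adverse change and 0 otherwise. A stratum
    containing no non-adverse counties gets math.inf.
    """
    n_adverse = sum(outcomes)
    n_other = len(outcomes) - n_adverse
    ratios = {}
    for stratum in sorted(set(labels)):
        adverse_in = sum(
            1 for lab, out in zip(labels, outcomes) if lab == stratum and out == 1
        )
        other_in = sum(
            1 for lab, out in zip(labels, outcomes) if lab == stratum and out == 0
        )
        if other_in == 0:
            ratios[stratum] = math.inf
        else:
            ratios[stratum] = (adverse_in * n_other) / (other_in * n_adverse)
    return ratios
```

A likelihood ratio well above 1 in the high-risk stratum and well below 1 in the low-risk stratum would indicate that the risk categories separate counties usefully.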

Seven indicator variables were selected for the final model, which was statistically significant (p < 0.001). After performance was assessed, the finalized model was exported to an Excel™ workbook to produce a prototype of the prediction tool. In the workbook, users enter a value for each indicator variable and receive the community’s risk category, alongside a table highlighting the top protective and adverse influences in the community. This research continues the ongoing process of transforming publicly available data into prospective action that may improve health equity across the country. The resulting diagnostic tool gives public health professionals a checklist for targeting existing disparities: by focusing future health improvement projects on the measures with the greatest effect on life expectancy, they can work toward improving community health.
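The abstract does not say how the workbook ranks protective and adverse influences. One plausible approach, sketched here in Python purely as an assumption, is to rank each indicator by its contribution to the log-odds (coefficient times entered value), treating negative contributions as protective and positive ones as adverse. The function name, the variable names, and the premise that entered values are on a comparable (for example, standardized) scale are all hypothetical.

```python
def top_influences(coefficients, values, k=3):
    """Rank indicators by their contribution to the log-odds of an adverse change.

    Hypothetical sketch: contribution = coefficient * entered value.
    Negative contributions are treated as protective, positive as adverse.
    Returns (protective, adverse), each a list of up to k variable names,
    strongest influence first.
    """
    contributions = {
        name: coefficients[name] * values[name] for name in coefficients
    }
    protective = sorted(
        (name for name, c in contributions.items() if c < 0),
        key=contributions.get,
    )[:k]
    adverse = sorted(
        (name for name, c in contributions.items() if c > 0),
        key=contributions.get,
        reverse=True,
    )[:k]
    return protective, adverse


# Usage with made-up indicator names and numbers (illustration only)
protective, adverse = top_influences(
    {"indicator_a": 1.0, "indicator_b": -2.0, "indicator_c": 0.5},
    {"indicator_a": 2.0, "indicator_b": 1.0, "indicator_c": 1.0},
)
```

Ranking by signed contribution keeps the table interpretable: the same quantities that sum to the model's log-odds also explain which inputs pushed a community toward or away from the high-risk category.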

Access Setting

Dissertation-Open Access

Included in

Public Health Commons